geotribu_scraper.pipelines module
Custom pipelines.
See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
- class geotribu_scraper.pipelines.CustomImagesPipeline(store_uri, download_func=None, settings=None)[source]
Bases: ImagesPipeline
Customizes how images are downloaded. Stores images in a subfolder named full under the path defined by the IMAGES_STORE setting. Inherits from ImagesPipeline, the generic images pipeline from Scrapy. See: https://doc.scrapy.org/en/latest/topics/media-pipeline.html?#using-the-images-pipeline
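To put this pipeline in context, here is a minimal sketch of what the project's Scrapy settings might look like. The pipeline priority and the IMAGES_STORE value are assumptions for illustration, not the project's actual configuration.

```python
# Hypothetical excerpt of a Scrapy settings.py enabling the custom pipeline.
# The priority (1) and store path are illustrative assumptions.
ITEM_PIPELINES = {
    "geotribu_scraper.pipelines.CustomImagesPipeline": 1,
}

# ImagesPipeline stores downloaded files under <IMAGES_STORE>/full/.
IMAGES_STORE = "./images"
```

With this in place, Scrapy routes scraped items through the pipeline and writes each image under `./images/full/`.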
- class geotribu_scraper.pipelines.ScrapyCrawlerPipeline[source]
Bases: object
- close_spider(spider)[source]
This method is called when the spider is closed.
- Parameters
spider (Spider) – the spider which was closed
- process_content(in_md_str)[source]
Checks images in the content and tries to replace broken paths using a mapping dict (stored in settings).
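The path-replacement logic described above can be sketched as follows. The mapping name, the function body, and the Markdown image regex are assumptions; the actual implementation lives in the pipeline and may differ.

```python
import re

# Hypothetical mapping of broken image paths to fixed ones, as might be
# stored in the Scrapy settings (the name is an assumption).
IMAGES_REPLACEMENTS = {
    "http://old.host/img/logo.png": "https://cdn.geotribu.fr/img/logo.png",
}

def process_content(in_md_str: str) -> str:
    """Replace known broken image paths in a Markdown string."""
    def fix(match: re.Match) -> str:
        alt, path = match.group(1), match.group(2)
        # Fall back to the original path when no replacement is known.
        return "![{}]({})".format(alt, IMAGES_REPLACEMENTS.get(path, path))

    # Match Markdown image syntax: ![alt](path)
    return re.sub(r"!\[([^\]]*)\]\(([^)]+)\)", fix, in_md_str)
```

Only paths listed in the mapping are rewritten; all other images pass through untouched.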
- process_item(item, spider)[source]
Process each item output by a spider. It performs these steps:
- extract the date, handling different formats
- use it to format the output filename
- convert the content into a Markdown file, handling different cases
- Parameters
item (GeoRdpItem) – output item to process
spider (Spider) – Scrapy spider which is used
- Returns
the item passed
- Return type
Item
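The first two steps above (extracting a date that may arrive in different formats, then using it to build the output filename) could be sketched like this. The candidate formats and helper names are assumptions for illustration.

```python
from datetime import datetime

# Date formats a scraped item might use (assumed for illustration).
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")

def extract_date(raw_date: str) -> datetime:
    """Try each known format until one parses."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(raw_date, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw_date!r}")

def output_filename(raw_date: str, slug: str) -> str:
    """Build an output filename such as '2020-05-15_rdp.md'."""
    return f"{extract_date(raw_date):%Y-%m-%d}_{slug}.md"
```

Normalizing every date to ISO `YYYY-MM-DD` up front keeps the generated filenames sortable regardless of the source format.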
- static title_builder(raw_title, append_year_at_end=True, item_date_clean=None)[source]
Helper method to build a clean title, optionally appending the year.
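A minimal sketch of what such a title builder might do, matching the signature above. The cleaning rules (whitespace collapsing, year deduplication) and the assumption that item_date_clean starts with an ISO year are illustrative, not the documented behavior.

```python
from typing import Optional

def title_builder(raw_title: str, append_year_at_end: bool = True,
                  item_date_clean: Optional[str] = None) -> str:
    """Clean a raw title and optionally append the item's year.

    The rules here are assumptions sketching the documented purpose.
    """
    # Collapse stray whitespace into single spaces.
    title = " ".join(raw_title.split())
    if append_year_at_end and item_date_clean:
        year = item_date_clean[:4]  # assumes an ISO 'YYYY-MM-DD' date string
        if year not in title:  # avoid doubling a year already present
            title = f"{title} {year}"
    return title
```

Skipping the append when the year already appears in the title avoids outputs like "GeoRDP 2020 2020".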