Python - Remove scraped data with an empty value
Suppose I'm scraping data, and some of the fields are scraped as "" (meaning no value). I don't want a row with "" in it. How can I do that? Example:
field1   field2    field3
place    blurred   trying
house    fan
door     mouse     hat
What I want is for the program not to write the entire 2nd row to the CSV, because field3 is empty.
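Outside of Scrapy, the desired behaviour can be sketched with the standard `csv` module: check every field before writing the row. This is a minimal sketch, not the answer's Scrapy approach; the `rows` data mirrors the example table, and the helper `write_complete_rows` is a hypothetical name used only for illustration.

```python
import csv
import io

# Hypothetical rows mirroring the example table above.
rows = [
    {"field1": "place", "field2": "blurred", "field3": "trying"},
    {"field1": "house", "field2": "fan", "field3": ""},
    {"field1": "door", "field2": "mouse", "field3": "hat"},
]

def write_complete_rows(rows, fieldnames, out):
    """Write only rows whose fields are all non-empty; return how many were written."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    written = 0
    for row in rows:
        # Skip the row if any field is missing or an empty string.
        if all(row.get(name) for name in fieldnames):
            writer.writerow(row)
            written += 1
    return written

buf = io.StringIO()
count = write_complete_rows(rows, ["field1", "field2", "field3"], buf)
print(buf.getvalue())
```

The "house" row is skipped because its `field3` is an empty string, so only the header and two data rows are written.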
You can write and configure an item pipeline, following the instructions in [the Scrapy docs], and drop the item based on a test of its values.
Add this to your pipeline.py file:
```python
from scrapy.exceptions import DropItem


class DropIfEmptyFieldPipeline(object):

    def process_item(self, item, spider):
        # To test only whether "job_id" is empty,
        # change the condition to:
        # if not(item["job_id"]):
        if not all(item.values()):
            # At least one field is empty: drop the whole item
            raise DropItem()
        else:
            return item
```
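To see which items survive that check without running a crawl, here is a rough stand-alone sketch. It assumes plain dicts in place of Scrapy items and defines a local stand-in for `scrapy.exceptions.DropItem`; the `run_pipeline` helper is hypothetical and only loosely imitates how the engine passes items through a pipeline.

```python
# Stand-in for scrapy.exceptions.DropItem so this sketch runs without Scrapy.
class DropItem(Exception):
    pass


class DropIfEmptyFieldPipeline:
    def process_item(self, item, spider):
        # all() is False if any value is falsy: "", None, 0, False, [] ...
        if not all(item.values()):
            raise DropItem(f"empty field in {item!r}")
        return item


def run_pipeline(items, pipeline):
    """Hypothetical helper: feed items through one pipeline and return the kept ones."""
    kept = []
    for item in items:
        try:
            kept.append(pipeline.process_item(item, spider=None))
        except DropItem:
            pass  # a dropped item is simply not exported
    return kept


items = [
    {"field1": "place", "field2": "blurred", "field3": "trying"},
    {"field1": "house", "field2": "fan", "field3": ""},
    {"field1": "door", "field2": "mouse", "field3": "hat"},
    {"field1": "box", "field2": "cat", "field3": 0},
]
kept = run_pipeline(items, DropIfEmptyFieldPipeline())
print([it["field1"] for it in kept])  # ['place', 'door']
```

Note the last item: `all()` treats `0` as empty too. If a field can legitimately hold `0` or `False`, a stricter test such as `any(v == "" or v is None for v in item.values())` may be closer to what you want.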
and enable it in settings.py (adapt it to your project's name):
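```python
# Recent Scrapy versions expect a dict mapping the class path to an order value (0-1000)
ITEM_PIPELINES = {
    'myproject.pipeline.DropIfEmptyFieldPipeline': 300,
}
```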
Edit, after the OP's comment about testing for "nurse":
```python
import re

from scrapy.exceptions import DropItem


class DropIfEmptyFieldPipeline(object):

    # Case-insensitive search for the string "nurse"
    regex_nurse = re.compile(r'nurse', re.IGNORECASE)

    def process_item(self, item, spider):
        # Use .search(), not .match(), to test for a substring match
        if not self.regex_nurse.search(item["job_id"]):
            raise DropItem()
        else:
            return item
```
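The difference between `.search()` and `.match()` is easy to check in isolation. This small sketch uses the same case-insensitive pattern on a few made-up job titles (the `titles` list is purely illustrative):

```python
import re

# Same pattern as the pipeline: case-insensitive "nurse".
regex_nurse = re.compile(r"nurse", re.IGNORECASE)

titles = ["Registered Nurse", "nurse practitioner", "Head NURSE", "Doctor"]

# .match() only succeeds if the pattern is at the very start of the string.
matched = [t for t in titles if regex_nurse.match(t)]
# .search() succeeds if the pattern occurs anywhere in the string.
searched = [t for t in titles if regex_nurse.search(t)]

print(matched)   # ['nurse practitioner']
print(searched)  # ['Registered Nurse', 'nurse practitioner', 'Head NURSE']
```

With `.match()`, the pipeline would wrongly drop "Registered Nurse" and "Head NURSE". Also note that `.search()` raises a `TypeError` if `item["job_id"]` is `None`, so a missing value may need its own check.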