Python - Remove scraped data with an empty value


Suppose I am scraping data and some of the scraped fields are "" (meaning no value). I don't want any row with "" in it. How can I do that? Example:

    field1    field2     field3
    place     blurred    trying
    house     fan
    door      mouse      hat

What I want is for the program not to write the entire 2nd row to the CSV, because field3 is empty.

You can write and configure an item pipeline, following the instructions in [the Scrapy docs], and drop the item based on a test of its values.

Add this to your pipeline.py file:

    from scrapy.exceptions import DropItem

    class DropIfEmptyFieldPipeline(object):

        def process_item(self, item, spider):

            # Drop the item if any of its fields is empty.
            # To test only whether "job_id" is empty, change this to:
            # if not item["job_id"]:
            if not all(item.values()):
                raise DropItem()
            else:
                return item

And set this in your settings.py (adapt it to your project's name):

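    # Recent Scrapy versions expect a dict mapping each pipeline to an
    # order value instead, e.g. {'myproject.pipeline.DropIfEmptyFieldPipeline': 300}
    ITEM_PIPELINES = [
        'myproject.pipeline.DropIfEmptyFieldPipeline',
    ]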
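To see what the `all(item.values())` test does without running a crawl, here is a minimal sketch that uses plain dicts in place of Scrapy items, with the three rows from the question. The `rows` list and the `has_no_empty_field` helper are made up for illustration, and the CSV is written to an in-memory buffer rather than by Scrapy's feed exporter.

```python
import csv
import io

# Hypothetical stand-ins for scraped items: the three rows from the question.
rows = [
    {"field1": "place", "field2": "blurred", "field3": "trying"},
    {"field1": "house", "field2": "fan", "field3": ""},
    {"field1": "door", "field2": "mouse", "field3": "hat"},
]


def has_no_empty_field(item):
    """Same test as in the pipeline: True only if every value is truthy."""
    return all(item.values())


# Keep only the rows that would survive the pipeline.
kept = [row for row in rows if has_no_empty_field(row)]

# Write the surviving rows as CSV into an in-memory buffer for the demo.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["field1", "field2", "field3"],
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(kept)
csv_text = buffer.getvalue()
print(csv_text)
```

Note that `all()` treats every falsy value as empty, so a field holding `None`, `0` or an empty list would also cause the row to be dropped, not just `""`.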

Edit, after the OP's comment about testing for "nurse":

    from scrapy.exceptions import DropItem
    import re

    class DropIfEmptyFieldPipeline(object):

        # case-insensitive search for the string "nurse"
        REGEX_NURSE = re.compile(r'nurse', re.IGNORECASE)

        def process_item(self, item, spider):
            # use .search() and not .match() to test for a substring match
            if not self.REGEX_NURSE.search(item["job_id"]):
                raise DropItem()
            else:
                return item
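The difference between `.search()` and `.match()` is easy to check in isolation. The sketch below runs the same pattern on a few sample `job_id` strings, which are invented for illustration: `.match()` only succeeds when "nurse" is at the very start of the string, while `.search()` finds it anywhere.

```python
import re

# Same pattern as in the pipeline: case-insensitive "nurse".
regex_nurse = re.compile(r'nurse', re.IGNORECASE)

# Hypothetical job_id values, not taken from any real site.
job_ids = ["Registered Nurse 123", "nurse-42", "Doctor 7"]

# .search() looks for the pattern anywhere in the string.
searched = [bool(regex_nurse.search(job_id)) for job_id in job_ids]

# .match() only tests the beginning of the string.
matched = [bool(regex_nurse.match(job_id)) for job_id in job_ids]

for job_id, s, m in zip(job_ids, searched, matched):
    print(f"{job_id!r}: search={s}, match={m}")
```

With `.match()`, an item such as "Registered Nurse 123" would be dropped even though it contains "nurse", which is why the pipeline uses `.search()`.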
