python 3.x - Impossible to remove \n and \t in python3 string?

- April 15, 2014

So I've been trying to format a webpage taken from CL so I can send it to my email, but this is what I come up with every time I try to remove the \n and \t:

b'\n\n\n\t\n\t\n\t\n\t\n\t\n\t\n\n\n\n\t\n\n\n\t \n\t\t\t \n\t \n\t\t \n\t\t\t \n 0 favorites\n \n\n\t\t \n\t\t ∨ \n\t\t ∧ \n\t\t \n \n \n \n\t \tcl wenatchee personals casual encounters\n \n \n\t\t \n\t \n \n\n\t\t \n\t\t\t \n\t\n\t\t\n\t\n\n\n\nreply to: 59nv6-4031116628@pers.craigslist.org\n \n\n\n\t \n\t\n\t\tflag [?] :\n\t\t\n\t\t\tmiscategorized\n\t\t\n\t\t\tprohibited\n\t\t\n\t\t\tspam\n\t\t\n\t\t\tbest of\n\t\n \n\n\t\t  posted: 2013-08-28, 8:23am pdt \n \n\n \n \n well... - w4m - 22 (wenatchee)\n

I have tried strip, replace, and even regex, but nothing fazes it. It always comes up in my email unaffected by everything.

Here's the code:

    try:
        if url.find('http://') == -1:
            url = 'http://wenatchee.craigslist.org' + url
        html = urlopen(url).read()
        html = str(html)
        html = re.sub('\s+',' ', html)
        print(html)
        part2 = MIMEText(html, 'html')
        msg.attach(part2)
        s = smtplib.SMTP('localhost')
        s.sendmail(me, you, msg.as_string())
        s.quit()
    except Exception as e:
        print(e)

Your issue is that, despite all evidence to the contrary, you still have a bytes object rather than the str you're hoping for. Your attempts come to nothing because, without an encoding specified, there's no way to match anything (regexes, replacement parameters, etc.) against your html string.

What you need to do is decode the bytes first.
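To see why the decode step matters, here is a minimal sketch. It assumes a short made-up bytes value standing in for what urlopen(url).read() returns; the variable names are illustrative only. Calling str() on bytes without an encoding produces the repr of the object, so every newline becomes the two characters backslash and n, which \s+ cannot match.

```python
import re

# Made-up stand-in for the bytes returned by urlopen(url).read()
raw = b"\n\n\t hello \n\t world \n"

# No encoding given: str() returns the repr, including the b'...' wrapper
as_repr = str(raw)

# Decoding yields real text with real newline and tab characters
decoded = raw.decode("utf-8")

# The repr holds backslash-n pairs, not newlines, so nothing is collapsed
print(re.sub(r"\s+", " ", as_repr))

# The decoded text collapses as expected and prints " hello world "
print(re.sub(r"\s+", " ", decoded))
```

The first print still shows the escape sequences because they are ordinary visible characters in that string. Only the decoded version contains actual whitespace for the regex to act on.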

And personally, my favorite method for cleaning up whitespace is to use str.split and str.join. Here's a working example. It removes runs of any kind of whitespace and replaces them with single spaces.

    from urllib.request import urlopen

    try:
        html = urlopen('http://wenatchee.craigslist.org').read()
        html = html.decode("utf-8")  # decode the bytes into a useful string
        # Split the string on whitespace, then join it again with single spaces.
        html = ' '.join(html.split())
        print(html)
    except Exception as e:
        print(e)
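Carried back to the email code in the question, the same two steps might be wrapped up as below. This is only a sketch: the helper name html_part_from_bytes is hypothetical, UTF-8 is an assumption about the page's encoding, and errors="replace" is an added choice so that a stray undecodable byte does not abort the whole message.

```python
from email.mime.text import MIMEText

def html_part_from_bytes(raw, encoding="utf-8"):
    """Decode page bytes, collapse whitespace, and wrap the result as an HTML MIME part."""
    # Assumption: the page is UTF-8; undecodable bytes are replaced rather than raising
    text = raw.decode(encoding, errors="replace")
    # split() with no arguments splits on any run of whitespace
    cleaned = " ".join(text.split())
    return MIMEText(cleaned, "html")

# Small made-up input in place of urlopen(url).read()
part2 = html_part_from_bytes(b"\n\t<p>well...\n\t - w4m</p>\n")
print(part2.get_payload())
```

The returned part can then be passed to msg.attach(part2) exactly as in the original code.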
