Python 3: unescaping non-ASCII characters


(Python 3.3.2) I have to unescape non-ASCII escaped characters returned by a call to re.escape(). I've seen the methods described here and here, but they don't work. I'm working in a 100% UTF-8 environment.
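If the only goal is to undo re.escape(), a codec may not be needed at all, because re.escape() works by putting a backslash in front of characters. A minimal sketch, assuming Python 3.7+, where re.escape() only ever adds a single backslash before a character (older versions, including 3.3, turned NUL into \000, which this sketch does not handle). unescape_re is a hypothetical helper name:

```python
import re

# Hypothetical helper: reverse re.escape() by dropping the backslash in
# front of every escaped character. Assumes Python 3.7+, where re.escape()
# only prefixes a single backslash (no special case for NUL).
# DOTALL lets "." match an escaped newline as well.
_ESCAPED_CHAR = re.compile(r"\\(.)", re.DOTALL)


def unescape_re(escaped: str) -> str:
    # Replace each "\x" pair with "x"; the group text is inserted literally.
    return _ESCAPED_CHAR.sub(r"\1", escaped)


original = "€ costs 1.50\n"
escaped = re.escape(original)
print(repr(escaped))
print(unescape_re(escaped) == original)  # True
```

Since re.sub() scans left to right without overlapping matches, an escaped backslash ("\\\\") is consumed as one pair and correctly becomes a single backslash.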

    import codecs

    # pure ASCII string: OK
    mystring = "a\n"  # expected unescaped string: "a\n"
    cod = codecs.getencoder('unicode_escape')
    print(cod(mystring))

    # non-ASCII string: method #1
    mystring = "€\n"
    # equivalent: mystring = codecs.unicode_escape_decode(mystring)
    cod = codecs.getdecoder('unicode_escape')
    print(cod(mystring))
    # result = ('â\x82¬\n', 5) instead of ("€\n", 2)

    # non-ASCII string: method #2
    mystring = "€\n"
    mystring = bytes(mystring, 'utf-8').decode('unicode_escape')
    print(mystring)
    # result = â\202¬ instead of "€\n"
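The results above are consistent with how the unicode_escape decoder treats bytes that are not part of an escape sequence: each one is mapped to the code point with the same value (Latin-1), so the three UTF-8 bytes of € come back as 'â', '\x82' and '¬'. A commonly used workaround, sketched here under the assumption that the input is a str containing literal backslash escapes such as \n, is to encode with Latin-1 plus the 'backslashreplace' error handler, so characters outside Latin-1 become \uXXXX escapes before decoding. decode_escapes is a hypothetical name, and it is not a safe inverse of re.escape() when a backslash sits directly in front of a non-ASCII character:

```python
# Why method #2 yields mojibake: unicode_escape maps each raw byte to the
# code point with the same value (Latin-1), so UTF-8 input is misread.
utf8_bytes = "€".encode("utf-8")  # b'\xe2\x82\xac'
print(repr(utf8_bytes.decode("unicode_escape")))  # 'â\x82¬'


def decode_escapes(text: str) -> str:
    """Hypothetical helper: interpret backslash escapes in a str.

    Characters that fit in Latin-1 are encoded as single bytes, which
    unicode_escape maps back to the same characters; anything else
    (like '€') is turned into a \\uXXXX escape that unicode_escape decodes.
    """
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")


print(repr(decode_escapes("€\\n")))  # '€\n'
```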

Is this a bug, or have I misunderstood something?

Any help is appreciated!

PS: post edited following Michael Foukarakis' remark.

You seem to misunderstand encodings. To be protected against common errors, we encode a string when it leaves our application, and decode it when it comes in.
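The "decode on the way in, encode on the way out" rule can be sketched as two small boundary functions. read_message and write_message are hypothetical names, and UTF-8 is assumed as the external encoding, matching the asker's environment:

```python
import io
from typing import BinaryIO


def read_message(stream: BinaryIO) -> str:
    # Bytes arrive from the outside world: decode them immediately.
    return stream.read().decode("utf-8")


def write_message(stream: BinaryIO, text: str) -> None:
    # Text leaves the application: encode it at the last moment.
    stream.write(text.encode("utf-8"))


# Inside the application, only str objects are handled.
buffer = io.BytesIO()
write_message(buffer, "€\n")
print(buffer.getvalue())  # b'\xe2\x82\xac\n'
buffer.seek(0)
print(repr(read_message(buffer)))  # '€\n'
```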

Firstly, let's look at the documentation for unicode_escape, which states:

Produce[s] a string that is suitable as a Unicode literal in Python source code.
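To see what "suitable as a Unicode literal" means in practice, this sketch encodes a str with unicode_escape and parses the result back with ast.literal_eval after wrapping it in quotes. It assumes the text contains no quote characters, since the codecs documentation notes that unicode_escape does not escape quotes:

```python
import ast

text = "€\n"
escaped = text.encode("unicode_escape")  # ASCII-only bytes
print(escaped)  # b'\\u20ac\\n'

# Wrap the escaped bytes in quotes and parse them as a Python string literal.
literal = '"' + escaped.decode("ascii") + '"'
print(ast.literal_eval(literal) == text)  # True
```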

Here is a byte string that came from the network, or from a file that claims its contents are Unicode-escaped:

    b'\\u20ac\\n'

Now, we have to decode it to use it in our app:

    >>> s = b'\\u20ac\\n'.decode('unicode_escape')
    >>> s
    '€\n'
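The same decode step applies when the escaped data comes from a file: open it in binary mode and decode the bytes once, at the boundary. A minimal sketch in which a temporary file stands in for the file that "claims its contents are Unicode-escaped":

```python
import os
import tempfile

# Create a stand-in file whose raw contents are Unicode-escaped ASCII.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as fh:
    fh.write(b"\\u20ac\\n")

# Read the raw bytes (binary mode) and decode them exactly once.
with open(path, "rb") as fh:
    s = fh.read().decode("unicode_escape")
os.remove(path)

print(repr(s))  # '€\n'
```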

And if we wanted to write it to, say, a Python source file:

    with open('/tmp/foo', 'wb') as fh:  # binary mode
        fh.write(b'print("' + s.encode('unicode_escape') + b'")')
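To check that the generated file really is valid Python, it can be compiled and executed while capturing what it prints. A sketch, assuming a temporary file in place of the hard-coded /tmp/foo so it also runs on systems without /tmp:

```python
import contextlib
import io
import os
import tempfile

s = "€\n"

fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as fh:  # binary mode, as in the answer
    fh.write(b'print("' + s.encode("unicode_escape") + b'")')

with open(path, "rb") as fh:
    source = fh.read()
os.remove(path)
print(source)  # b'print("\\u20ac\\n")'

# Execute the generated source and capture its standard output.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    exec(compile(source, path, "exec"), {})
print(captured.getvalue() == s + "\n")  # True: print() adds its own newline
```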
