regex - Convert multiple Unicode in a string to character
Problem: I have a string such as buna$002c_texasbuna$002c_texas, where each $ is followed by a four-digit hexadecimal Unicode code point. I want to replace each of these with the Unicode character it represents.
In Perl, if the code point is written in the form "\x{002c}", it is converted to the respective Unicode character. Below is sample code.
#!/usr/bin/perl
$string = "hello \x{263a}!\n";
@arr = split //, $string;
print "@arr";
I am processing a file that contains 10 million records, and I have these strings in a scalar variable. Following the same idea as above, I am substituting $4_digit_unicode with \x{4_digit_unicode}, as below.
$str = 'buna$002c_texasbuna$002c_texas';
$str =~ s/\$(.{4})/\\x\{$1\}/g;
$str = "$str";
It gives me:
buna\x{002c}_texasbuna\x{002c}_texas
This is because in $str = "$str", only the variable $str is interpolated, not the text it contains. The \x{002c} inside the value is not interpolated by Perl.
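A minimal sketch of that distinction, assuming nothing beyond core Perl (the variable names are illustrative only): an escape such as \x{002c} is converted when Perl compiles a double-quoted literal in the source, whereas interpolating a variable simply copies its current contents without parsing them again.

```perl
use strict;
use warnings;

my $literal = "\x{002c}";   # escape is processed when the literal is compiled
my $runtime = '\x{002c}';   # single quotes: eight ordinary characters
my $copied  = "$runtime";   # the variable is interpolated; its contents are not re-parsed

print length($literal), "\n";                             # 1
print length($copied), "\n";                              # 8
print $copied eq $runtime ? "unchanged\n" : "changed\n";  # unchanged
```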
Is there a way to force Perl to interpolate the contents of $str too?
Or
is there another method to achieve this? I do not want to take out each of the Unicode values, pack it using pack "u4",0x002c, and substitute it back. Something in one line (like the unsuccessful attempt below) would be OK.
$str =~ s/\$(.{4})/pack("u4",$1)/g;
I know the above is wrong, but can something like it be done?
For the input string $str = 'buna$002c_texasbuna$002c_texas', the desired output is buna,_texasbuna,_texas.
This gives the desired result:
use strict;
use warnings;
use feature 'say';
my $str = 'buna$002c_texasbuna$002c_texas';
$str =~ s/\$(.{4})/chr(hex($1))/eg;
say $str;
The main item of interest is the e in s///eg. The e means the replacement text is treated as code to be executed. hex() converts a string of hexadecimal characters to a number. chr() converts a number to the corresponding character. The substitution might be better written as below, to avoid trying to convert a dollar sign followed by non-hexadecimal characters.
$str =~s/\$([0-9a-f]{4})/chr(hex($1))/egi;
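As a rough sketch, the stricter substitution could be wrapped in a small helper so it can be applied to each record in turn. The subroutine name decode_dollar_hex and the second test string are assumptions made for illustration; they do not come from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is illustrative): replaces each '$' that is
# followed by exactly four hex digits with the character at that code point.
sub decode_dollar_hex {
    my ($text) = @_;
    $text =~ s/\$([0-9a-f]{4})/chr(hex($1))/egi;
    return $text;
}

print decode_dollar_hex('buna$002c_texasbuna$002c_texas'), "\n";  # buna,_texasbuna,_texas
print decode_dollar_hex('price$zzzz_and$002C'), "\n";             # price$zzzz_and,
```

In the second call, $zzzz is left untouched because it is not four hexadecimal digits, while the i modifier lets the upper-case $002C match.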