regex - Convert multiple Unicode code points in a string to characters
Problem: I have a string, 'buna$002c_texasbuna$002c_texas', in which each $ is followed by a four-digit hexadecimal Unicode code point. I want to replace each of these with the Unicode character it represents.
In Perl, a code point written in the form "\x{002c}" inside a double-quoted string is converted to the respective Unicode character. Below is sample code.
    #!/usr/bin/perl
    $string = "hello \x{263a}!\n";
    @arr = split //, $string;
    print "@arr";

I am processing a file that contains 10 million records, and I have these strings in a scalar variable. To get the same effect as above, I am substituting $4_digit_unicode with \x{4_digit_unicode}, as below.
    $str = 'buna$002c_texasbuna$002c_texas';
    $str =~ s/\$(.{4})/\\x\{$1\}/g;
    $str = "$str";

It gives me

    buna\x{002c}_texasbuna\x{002c}_texas

This is because at the line $str = "$str", it is $str that is being interpolated, not its value. The \x{002c} inside the value is not interpolated by Perl.
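The behaviour described above can be seen in a small sketch. As far as I understand it, \x{...} escapes are translated when Perl compiles a double-quoted string literal, and interpolating a variable only copies its current value. The variable names here are made up for illustration.

```perl
use strict;
use warnings;

# Escape written directly in a double-quoted literal: translated at compile time.
my $literal = "a\x{002c}b";

# The same characters assembled at run time are ordinary text in the value.
my $built = 'a' . '\\x{002c}' . 'b';

# Interpolating $built copies its value as-is; there is no second pass
# over the contents looking for escapes.
my $copy = "$built";

print length($literal), "\n";   # 3  (a , b)
print length($copy), "\n";      # 10 (the backslash sequence is still literal text)
```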
Is there a way to force Perl to interpolate the contents of $str too?
or
Is there another method to achieve this? I do not want to take out each of the code points, pack it using pack "u4",0x002c, and substitute it back. Doing it in one line (like the unsuccessful attempt below) is OK.
    $str =~ s/\$(.{4})/pack("u4",$1)/g;

I know the above is wrong; can something like the above be done?
For the input string $str = 'buna$002c_texasbuna$002c_texas', the desired output is buna,_texasbuna,_texas.
This gives the desired result:

    use strict;
    use warnings;
    use feature 'say';

    my $str = 'buna$002c_texasbuna$002c_texas';
    $str =~ s/\$(.{4})/chr(hex($1))/eg;
    say $str;

The main item of interest is the e in s///eg. The e means: treat the replacement text as code to be executed. hex() converts a string of hexadecimal characters to a number. chr() converts a number to a character. The replace line might be better written as below, to avoid trying to convert a dollar sign followed by non-hexadecimal characters.

    $str =~ s/\$([0-9a-f]{4})/chr(hex($1))/egi;
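Putting the pieces together, here is a minimal self-contained sketch of the approach. The sub name decode_dollar_escapes is hypothetical, and the binmode line assumes the output should be written as UTF-8 (without an output layer, Perl warns about wide characters for code points above 0xFF).

```perl
use strict;
use warnings;

# Hypothetical helper: replace each "$" followed by exactly four hex digits
# with the character that has that code point.
sub decode_dollar_escapes {
    my ($text) = @_;
    # /e evaluates the replacement as code; /g applies it to every match.
    $text =~ s/\$([0-9a-fA-F]{4})/chr(hex($1))/eg;
    return $text;
}

# Assumption: output is wanted as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');

print decode_dollar_escapes('buna$002c_texasbuna$002c_texas'), "\n";
# prints: buna,_texasbuna,_texas

# A dollar sign not followed by four hex digits is left untouched,
# while $263a becomes the U+263A character.
print decode_dollar_escapes('cost$zzzz and smile$263a'), "\n";
```

For a file with millions of records, the same substitution can be applied one line at a time inside a while (<$fh>) loop, so only the current record is held in memory. If pack is preferred, pack("U", hex($1)) in place of chr(hex($1)) should give the same result, as long as the /e modifier is kept.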