regex - Convert multiple Unicode escapes in a string to characters


Problem: I have a string, buna$002c_texasbuna$002c_texas, in which each $ is followed by a four-digit hexadecimal Unicode code point. I want to replace each of these with the Unicode character it represents.

In Perl, if the Unicode escape is written in the form "\x{002c}" inside a double-quoted string literal, it is converted to the corresponding Unicode character. Below is sample code.

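```perl
#!/usr/bin/perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';   # avoid "Wide character in print" for U+263A

my $string = "hello \x{263a}!\n";     # \x{263a} is decoded when the literal is compiled
my @arr    = split //, $string;       # one element per character
print "@arr";
```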

I am processing a file that contains 10 million records, and I have these strings in a scalar variable. To do the same as above, I substitute each $4_digit_unicode with \x{4_digit_unicode}, as below:

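```perl
my $str = 'buna$002c_texasbuna$002c_texas';
$str =~ s/\$(.{4})/\\x\{$1\}/g;   # inserts the text \x{002c}, not the character
$str = "$str";                     # re-interpolating does not decode \x{...} inside the value
```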

It gives me:

buna\x{002c}_texasbuna\x{002c}_texas 

This is because, at the line $str = "$str", only the variable $str is interpolated; its value is inserted as-is, and the \x{002c} sequence inside that value is not interpreted by Perl as an escape.
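The distinction can be checked with a short sketch: an escape such as \x{002c} is decoded only when Perl compiles a double-quoted literal, so interpolating a variable whose value contains that text copies it unchanged. The variable names here are illustrative only.

```perl
use strict;
use warnings;

my $literal = "\x{002c}";   # escape decoded when the literal is compiled: one character, ","
my $escaped = '\x{002c}';   # single quotes: eight plain characters, backslash included
my $copy    = "$escaped";   # interpolates the value of $escaped; the \x{...} in it is not re-parsed

print length($literal), " ", length($copy), "\n";   # prints "1 8"
```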

Is there a way to force Perl to interpolate the contents of $str too?

Or

Is there another method to achieve this? I do not want to take out each of the Unicode values, pack it using pack "u4",0x002c, and substitute it back. Something in one line (like my unsuccessful attempt below) is OK.

$str =~ s/\$(.{4})/pack("u4",$1)/g; 

I know the above is wrong; can something like it be done?

For the input string $str = 'buna$002c_texasbuna$002c_texas', the desired output is buna,_texasbuna,_texas.
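The pack attempt in the question fails for two reasons: the lowercase template letter u means uuencoded data, whereas uppercase U packs a Unicode code point, and without the /e modifier the replacement is treated as text rather than run as code. A minimal sketch of a working pack version, assuming the four characters after each $ are hex digits:

```perl
use strict;
use warnings;

my $str = 'buna$002c_texasbuna$002c_texas';

# /e runs the replacement as Perl code; hex() turns "002c" into 44,
# and pack("U", 44) builds the one-character string for code point U+002C.
(my $packed = $str) =~ s/\$([0-9A-Fa-f]{4})/pack("U", hex $1)/eg;

print "$packed\n";    # buna,_texasbuna,_texas
```

For a single code point, pack("U", $n) and chr($n) give the same character, so either can be used inside the /e replacement.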

This gives the desired result:

```perl
use strict;
use warnings;
use feature 'say';

my $str = 'buna$002c_texasbuna$002c_texas';
$str =~ s/\$(.{4})/chr(hex($1))/eg;   # /e: evaluate chr(hex($1)) as code for each match
say $str;                             # buna,_texasbuna,_texas
```

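The main item of interest is the e in s///eg: e means the replacement text is treated as code to be executed. hex() converts a string of hexadecimal characters to a number, and chr() converts a number to the corresponding character. The replace line might be better written as below, to avoid trying to convert a dollar sign followed by non-hexadecimal characters (with .{4}, such input would make hex() warn and produce the wrong character).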

$str =~s/\$([0-9a-f]{4})/chr(hex($1))/egi; 
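Since the question mentions a file of about 10 million records, here is a sketch of applying the same substitution one line at a time, so memory use does not grow with file size. The helper name decode_dollar_escapes, the in-memory input, and the choice of UTF-8 output are assumptions for illustration; in real use $in would be opened on the actual file.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each $XXXX (exactly four hex digits, any case)
# with the character for code point U+XXXX; other text is left untouched.
sub decode_dollar_escapes {
    my ($s) = @_;
    $s =~ s/\$([0-9a-f]{4})/chr(hex($1))/egi;
    return $s;
}

# Assumption: an in-memory handle stands in for the real input file here;
# in practice something like: open my $in, '<', $path or die ...
my $data = "buna\$002c_texas\nsmile\$263a\nprice\$zz99\n";
open my $in, '<', \$data or die "Cannot open input: $!";

# Assumption: output is written as UTF-8 (U+263A does not fit in one byte).
binmode STDOUT, ':encoding(UTF-8)';

# Convert and print one record at a time; "$zz99" is not hex, so it is left alone.
while (my $line = <$in>) {
    print decode_dollar_escapes($line);
}
close $in;
```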
