Perl regular expression to extract a value from nested HTML tags


    $match = q(<a href="#google"><h1><b>google</b></h1></a>);
    if ($match =~ /<a.*?href.*?><.?>(.*?)<\/a>/) {
        $title = $1;
    } else {
        $title = "";
    }
    print "$title";

Output: google</b></h1>

Expected output: google
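A diagnostic sketch, not part of the original post: wrapping each piece of the same pattern in its own capture group shows where the extra text comes from. `<.?>` allows at most one character between `<` and `>`, so it cannot match `<h1>`. The lazy `.*?>` therefore backtracks past `<h1>`, `<.?>` matches `<b>` instead, and `(.*?)<\/a>` captures everything up to `</a>`, including the closing `</b></h1>`.

```perl
use strict;
use warnings;

my $match = q(<a href="#google"><h1><b>google</b></h1></a>);

# Same pattern as in the question, with extra groups added for illustration.
if ($match =~ /(<a.*?href.*?>)(<.?>)(.*?)<\/a>/) {
    print "prefix: $1\n";   # <a href="#google"><h1>
    print "tag:    $2\n";   # <b>
    print "title:  $3\n";   # google</b></h1>
}
```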

I am unable to extract the link text with a regex in Perl when there is one more (or one less) level of nesting, for example:

<h1><b><i>google</i></b></h1> 

Please try it against these inputs:

1) <td><a href="/wiki/unix_shell" title="unix shell">unix shell</a>

2) <a href="http://www.hp.com"><h1><b>hp</b></h1></a>

3) <a href="/wiki/generic_programming" title="generic programming">generic</a></td>

4) <a href="#cite_note-1"><span>[</span>1<span>]</span></a>

Expected output:

unix shell

hp

generic

[1]

Try this:

if($match =~ /<a.*?href.*?><b>(.*?)<\/b>/) 

That captures the text between <b> and </b> after the href attribute, so it only works when the link text is wrapped in <b>.
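A quick sketch running that pattern over the four test inputs from the question. The `bold_text` helper and the `@inputs` list are illustrative names, not from the post. Only the `hp` link contains a `<b>` tag, so it is the only input that matches; the other three print `(no match)`.

```perl
use strict;
use warnings;

# Test inputs from the question.
my @inputs = (
    q(<td><a href="/wiki/unix_shell" title="unix shell">unix shell</a>),
    q(<a href="http://www.hp.com"><h1><b>hp</b></h1></a>),
    q(<a href="/wiki/generic_programming" title="generic programming">generic</a></td>),
    q(<a href="#cite_note-1"><span>[</span>1<span>]</span></a>),
);

# Pattern from the answer: capture the text between <b> and </b>.
# Returns undef when the input has no <b> element after the href.
sub bold_text {
    my ($html) = @_;
    return $html =~ /<a.*?href.*?><b>(.*?)<\/b>/ ? $1 : undef;
}

for my $html (@inputs) {
    my $title = bold_text($html);
    print defined $title ? "$title\n" : "(no match)\n";
}
```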

If you instead want "everything after the last > and before the first </", you can use:

<a.*?href.*?>([^>]*?)<\/ 
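A sketch comparing that pattern with an alternative technique that is not from the answer: capture everything between the opening `<a ...>` tag and `</a>`, then delete any tags left inside it. `innermost_text` and `link_text` are hypothetical helper names. Because `[^>]*?` stops at the first `</`, the answer's pattern returns `[` for the citation link (input 4), while the tag-stripping version returns `[1]`; both return `unix shell`, `hp` and `generic` for the other three inputs.

```perl
use strict;
use warnings;

# Test inputs from the question.
my @inputs = (
    q(<td><a href="/wiki/unix_shell" title="unix shell">unix shell</a>),
    q(<a href="http://www.hp.com"><h1><b>hp</b></h1></a>),
    q(<a href="/wiki/generic_programming" title="generic programming">generic</a></td>),
    q(<a href="#cite_note-1"><span>[</span>1<span>]</span></a>),
);

# Pattern from the answer: text after the last '>' before the first '</'.
sub innermost_text {
    my ($html) = @_;
    return $html =~ /<a.*?href.*?>([^>]*?)<\// ? $1 : undef;
}

# Alternative (assumption, not from the answer): take the whole content of
# <a ...>...</a>, then remove every remaining tag from it.
sub link_text {
    my ($html) = @_;
    return undef unless $html =~ /<a\b[^>]*>(.*?)<\/a>/is;
    (my $text = $1) =~ s/<[^>]*>//g;
    return $text;
}

for my $html (@inputs) {
    printf "%-12s %s\n",
        innermost_text($html) // '(no match)',
        link_text($html)      // '(no match)';
}
```

Both approaches are still regex heuristics: they will break on input such as a `>` inside an attribute value or HTML comments. For real pages, an HTML parser module such as HTML::TreeBuilder or Mojo::DOM is the more robust choice.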
