regex - Perl Regular Expression to extract value from nested html tags
$match = q(<a href="#google"><h1><b>google</b></h1></a>);
if ($match =~ /<a.*?href.*?><.?>(.*?)<\/a>/) {
    $title = $1;
} else {
    $title = "";
}
print "$title";
Output: google</b></h1>
It should be: google
I am unable to extract the link text using a regex in Perl. The link may have one more or one fewer level of nesting, for example:
<h1><b><i>google</i></b></h1>
Please try it with these inputs:
1) <td><a href="/wiki/unix_shell" title="unix shell">unix shell</a>
2) <a href="http://www.hp.com"><h1><b>hp</b></h1></a>
3) <a href="/wiki/generic_programming" title="generic programming">generic</a></td>
4) <a href="#cite_note-1"><span>[</span>1<span>]</span></a>
Expected output:
unix shell
hp
generic
[1]
Try this:
if($match =~ /<a.*?href.*?><b>(.*?)<\/b>/)
That should capture everything after the href that lies between the <b>...</b> tags.
If you instead want everything after the last > and before the first </, you can use:
<a.*?href.*?>([^>]*?)<\/
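To see how the second pattern behaves on the inputs from the question, here is a minimal sketch. The helper names `innermost_text` and `link_text` are hypothetical, and the tag-stripping variant is not part of the answer above; it is one possible way to also cover the footnote link, where the text is split across several nested tags.

```perl
use strict;
use warnings;

# Sample inputs taken from the question.
my @samples = (
    q(<a href="#google"><h1><b>google</b></h1></a>),
    q(<td><a href="/wiki/unix_shell" title="unix shell">unix shell</a>),
    q(<a href="http://www.hp.com"><h1><b>hp</b></h1></a>),
    q(<a href="/wiki/generic_programming" title="generic programming">generic</a></td>),
    q(<a href="#cite_note-1"><span>[</span>1<span>]</span></a>),
);

# Pattern from the answer: the text after the last ">" and before the first "</".
sub innermost_text {
    my ($html) = @_;
    my ($text) = $html =~ /<a.*?href.*?>([^>]*?)<\//;
    return defined $text ? $text : "";
}

# Hypothetical alternative (an assumption, not from the answer):
# capture the whole body of the <a> element, then delete every nested tag.
sub link_text {
    my ($html) = @_;
    my ($body) = $html =~ m{<a\b[^>]*\bhref\b[^>]*>(.*?)</a>}is;
    return "" unless defined $body;
    $body =~ s/<[^>]+>//g;    # strip <h1>, <b>, <span>, ...
    return $body;
}

# Print both results side by side for each sample.
for my $html (@samples) {
    printf "%-12s %s\n", innermost_text($html), link_text($html);
}
```

With the question's inputs, both helpers return google, unix shell, hp and generic. For the footnote link, the answer's pattern stops at [ because the text is interrupted by a closing </span>, while the tag-stripping variant returns [1].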