C# - Regex to get URL from HTML
I'm using the following regex (which I found online) to obtain URLs within an HTML page:
Regex regex = new Regex(@"url\((?<char>['""])?(?<url>.*?)\k<char>?\)");
It works fine for the HTML below:
<div style="background:url(images/logo.png) no-repeat;">uk</div>
However, it returns more than I need when the HTML page contains the following JavaScript, returning 'destpage':
function buildurl(destpage)
I tried the following regex to include the colon, but it appears to be invalid:
:url\((?<char>['""])?(?<:url>.*?)\k<char>?\)
Any help is appreciated.
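One possible reading of the problem, offered as a sketch rather than a definitive fix: the pattern `url\(` also matches the tail of `buildurl(`, because nothing requires `url` to start a word. The second attempt is rejected because .NET group names may not contain punctuation such as a colon, so `(?<:url>` does not parse. Adding a `\b` word boundary in front of `url` is one way to skip `buildurl(` while still matching `background:url(`. The class and method names below (`CssUrlFinder`, `FindUrls`) are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CssUrlFinder
{
    // The original pattern with a leading \b, so "url(" must begin at a word boundary.
    // "buildurl(" no longer matches because there is no boundary between 'd' and 'u'.
    private static readonly Regex UrlPattern =
        new Regex(@"\burl\((?<char>['""])?(?<url>.*?)\k<char>?\)");

    // Returns the captured "url" group of every match found in the input text.
    public static List<string> FindUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match match in UrlPattern.Matches(html))
        {
            urls.Add(match.Groups["url"].Value);
        }
        return urls;
    }
}
```

A word boundary is still a heuristic: a script that calls a function literally named `url(...)` would be matched as well, so restricting the search to style attributes or stylesheets is likely the more robust route.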
To get URLs, use HtmlAgilityPack instead of regex. An example from its page:
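HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// DocumentNode is the root node in current HtmlAgilityPack releases.
// Note that SelectNodes returns null when nothing matches.
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
}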
You can expand on this to obtain the style URLs by, for example, using //@style to get the style nodes, then iterating through them to extract the url value.
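As a rough sketch of that idea, assuming the HtmlAgilityPack NuGet package is referenced: the code below selects elements that carry a style attribute using `//*[@style]` (a substitution for `//@style`, chosen so that each result is an element whose attribute can be read with `GetAttributeValue`), then applies a word-boundary variant of the question's regex to each style value. `StyleUrlExtractor` and `GetStyleUrls` are hypothetical names, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class StyleUrlExtractor
{
    // The question's pattern with a leading \b so "url(" must start at a word boundary.
    private static readonly Regex UrlPattern =
        new Regex(@"\burl\((?<char>['""])?(?<url>.*?)\k<char>?\)");

    // Collects url(...) values found in style attributes only; script text is never inspected.
    public static List<string> GetStyleUrls(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var urls = new List<string>();

        // SelectNodes returns null rather than an empty collection when nothing matches.
        HtmlNodeCollection styled = doc.DocumentNode.SelectNodes("//*[@style]");
        if (styled == null)
        {
            return urls;
        }

        foreach (HtmlNode node in styled)
        {
            string style = node.GetAttributeValue("style", string.Empty);
            foreach (Match match in UrlPattern.Matches(style))
            {
                urls.Add(match.Groups["url"].Value);
            }
        }
        return urls;
    }
}
```

Because only style attributes are searched, JavaScript such as `function buildurl(destpage)` never reaches the regex. URLs declared inside `<style>` blocks or external stylesheets are not covered by this sketch.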