C# - Regex to get URL from HTML


I'm using the following regex (which I found online) to obtain URLs within an HTML page:

        Regex regex = new Regex(@"url\((?<char>['""])?(?<url>.*?)\k<char>?\)");

It works fine for HTML like the below:

<div style="background:url(images/logo.png) no-repeat;">uk</div> 

However, it returns more than I need when the HTML page contains the following JavaScript, where it returns 'destpage':

function buildurl(destpage)  

I tried the following regex to include the colon, but it appears to be invalid:

:url\((?<char>['""])?(?<:url>.*?)\k<char>?\) 

Any help is appreciated.
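A likely reason the colon attempt is rejected is that `(?<:url>...)` puts the colon inside the group name, and .NET group names cannot contain a colon. If the regex is kept, one possible adjustment is to put a word boundary (`\b`) in front of `url`, so that `background:url(` still matches while `buildurl(` does not. The sketch below assumes that approach; the class and method names (`CssUrlFinder`, `FindUrls`) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CssUrlFinder
{
    // Same pattern as in the question, with \b added in front of "url".
    // \b only matches where a word character meets a non-word character,
    // so "background:url(" qualifies but "buildurl(" does not.
    private static readonly Regex CssUrl = new Regex(
        @"\burl\((?<char>['""])?(?<url>.*?)\k<char>?\)");

    // Hypothetical helper: returns the captured "url" group of every match.
    public static List<string> FindUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match m in CssUrl.Matches(html))
        {
            urls.Add(m.Groups["url"].Value);
        }
        return urls;
    }

    public static void Main()
    {
        string html =
            "<div style=\"background:url(images/logo.png) no-repeat;\">uk</div>\n" +
            "function buildurl(destpage)";

        foreach (string url in FindUrls(html))
        {
            Console.WriteLine(url); // prints "images/logo.png"
        }
    }
}
```

This is still only a heuristic: a script that calls a function literally named `url(...)`, or a method such as `foo.url(...)`, would still match.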

To get URLs, use HtmlAgilityPack instead of a regex. An example from its page:

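        HtmlDocument doc = new HtmlDocument();
        doc.Load("file.htm");
        // DocumentNode is the document root in current HtmlAgilityPack releases.
        foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
        {
            // Process each <a> element that has an href attribute here.
        }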

You can expand on this to obtain the style URLs by, for example, using //@style to get the style nodes and then iterating through them to extract the url value.
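As a rough sketch of that idea, the code below collects every element that carries a style attribute and applies the `url(...)` pattern from the question to the attribute text only, so script bodies are never scanned. It assumes the HtmlAgilityPack NuGet package is installed, and it selects elements with `//*[@style]` and reads the attribute through `GetAttributeValue`, which is a substitution for the `//@style` form mentioned above. The names `StyleUrlExtractor` and `ExtractStyleUrls` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class StyleUrlExtractor
{
    // The url(...) pattern from the question, applied to style attribute text only.
    private static readonly Regex CssUrl = new Regex(
        @"url\((?<char>['""])?(?<url>.*?)\k<char>?\)");

    // Hypothetical helper: returns every url(...) value found in inline styles.
    public static List<string> ExtractStyleUrls(string html)
    {
        var urls = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so guard before iterating.
        HtmlNodeCollection styled = doc.DocumentNode.SelectNodes("//*[@style]");
        if (styled == null)
        {
            return urls;
        }

        foreach (HtmlNode node in styled)
        {
            string style = node.GetAttributeValue("style", "");
            foreach (Match m in CssUrl.Matches(style))
            {
                urls.Add(m.Groups["url"].Value);
            }
        }
        return urls;
    }

    public static void Main()
    {
        string html =
            "<html><body>" +
            "<div style=\"background:url(images/logo.png) no-repeat;\">uk</div>" +
            "<script>function buildurl(destpage) { }</script>" +
            "</body></html>";

        foreach (string url in ExtractStyleUrls(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

Because only attribute values are handed to the regex, the JavaScript function from the question is never a candidate. URLs declared in `<style>` blocks or external stylesheets are not covered by this sketch.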

