Getting the source code of a redirected HTTP site via C# WebClient
I have a problem with a site. I was provided with a list of product ID numbers (about 2000), and my job is to pull data from the producer's site. I tried forming the URLs of the product pages, but there are some unknown variables I can't fill in to get results. There is a search field, so I can use a URL like this: http://www.hansgrohe.de/suche.htm?searchtext=10117000&searchsubmit=suchen. The problem is that this page displays some info (probably via JavaScript) and then redirects straight to the desired page, the one I need to pull data from.
Is there any way of tracking this redirection?
Here is the code I have so far, but I find it unhelpful because it only downloads the source of the page before the redirect.
using System.Net;
using System.Text;

// Downloads the raw HTML of a page. Any JavaScript in it is NOT executed.
public static string Download(string uri)
{
    using (WebClient client = new WebClient())
    {
        client.Encoding = Encoding.UTF8;
        client.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
        return client.DownloadString(uri);
    }
}
Also, the suggested answer isn't helpful in my case, because the redirection doesn't come from the HTTP request: the page is redirected a few seconds after loading the http://www.hansgrohe.de/suche.htm?searchtext=10117000&searchsubmit=suchen URL.
I found a solution, and since I'm new and have to wait a few hours before I can answer my own question, I'll put it here:
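I hope other users find it useful (C#, WinForms):
// Needs a WinForms WebBrowser control named webBrowser1 (System.Windows.Forms).
string url = "http://www.hansgrohe.de/suche.htm?searchtext=10117000&searchsubmit=suchen";
webBrowser1.Navigate(url);
// Pump messages so the control can load the page and run its JavaScript,
// until it has left both the blank start page and the search page.
while (webBrowser1.Url == null || webBrowser1.Url.AbsoluteUri == "about:blank" || webBrowser1.Url.AbsoluteUri == url)
    Application.DoEvents();
string desiredUri = webBrowser1.Url.AbsoluteUri;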
Thanks for the answers.
Welcome to the wonderful world of page scraping. The short answer is "you can't do that." Not in the general case, anyway, and not with WebClient. The problem appears to be a JavaScript redirection. And since WebClient only downloads the page, it's not going to download the JavaScript, much less parse and execute it.
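WebClient really only gives you the HTML. That said, if the page happens to spell out its redirect target literally, as a meta refresh tag or a location = '...' assignment, you can sometimes pull it out of the downloaded source without running any script. The following is a minimal heuristic sketch under that assumption: the class name RedirectSniffer and its regex patterns are illustrative, not anything known about the Hansgrohe site, and a URL that the page builds dynamically in JavaScript will be missed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RedirectSniffer
{
    // Heuristic patterns for common client-side redirects (assumptions, not a
    // complete list). They are tried in order; the first match wins.
    private static readonly Regex[] Patterns =
    {
        // <meta http-equiv="refresh" content="3; url=/some/page.htm">
        new Regex(@"<meta[^>]*http-equiv\s*=\s*[""']?refresh[""']?[^>]*content\s*=\s*[""']?\s*\d+\s*;\s*url\s*=\s*([^""'>\s]+)",
                  RegexOptions.IgnoreCase),
        // location.replace('...') or location.assign('...')
        new Regex(@"location\.(?:replace|assign)\(\s*[""']([^""']+)[""']\s*\)",
                  RegexOptions.IgnoreCase),
        // window.location = '...' or location.href = "..."
        new Regex(@"location(?:\.href)?\s*=\s*[""']([^""']+)[""']",
                  RegexOptions.IgnoreCase),
    };

    // Returns the absolute redirect target, or null if no known pattern matched.
    public static string FindRedirect(string html, string pageUrl)
    {
        foreach (Regex pattern in Patterns)
        {
            Match m = pattern.Match(html);
            if (m.Success)
            {
                // Resolve relative targets against the URL of the downloaded page.
                return new Uri(new Uri(pageUrl), m.Groups[1].Value).AbsoluteUri;
            }
        }
        return null;
    }
}
```

With the Download method from the question, a call might look like RedirectSniffer.FindRedirect(Download(url), url). A null result means none of the patterns matched, and you are back to the WebBrowser or developer-tools route.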
You might be able to do it by creating a program that uses the WebBrowser class. You can have it load the page. It should do the redirect, and then you can inspect the result, which should be the page you're looking for. I haven't done this, but it does seem possible.
Your other option is to fire up your web browser's developer tools (like IE's F12 Developer Tools) and watch what's happening. You can inspect the JavaScript that's being executed and the modified DOM, and see where the redirect happens.
Yes, it's tedious work. But once you figure out the redirect for one page, you can generate the URLs for the other pages you want automatically.
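Assuming the developer tools reveal that the target page depends only on the product ID, turning the list of roughly 2000 IDs into URLs is a simple substitution. A sketch: the helper name ProductUrls.Build and the {0} placeholder convention are hypothetical, and since the real product-page pattern is still unknown, the usage below just plugs in the search URL from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProductUrls
{
    // Hypothetical helper: fills the {0} placeholder of a URL template with each
    // product ID. The real product-page template has to come from your own
    // analysis of the site (e.g. in the browser's developer tools).
    public static List<string> Build(string template, IEnumerable<string> productIds)
    {
        return productIds
            .Select(id => id.Trim())  // tolerate stray whitespace in the ID list
            .Select(id => string.Format(template, Uri.EscapeDataString(id)))
            .ToList();
    }
}
```

For example, ProductUrls.Build("http://www.hansgrohe.de/suche.htm?searchtext={0}&searchsubmit=suchen", ids) yields one search URL per ID, which could then be fed to Download or to the WebBrowser loop one at a time. Escaping with Uri.EscapeDataString keeps an ID containing spaces or symbols from breaking the query string.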