Getting the source code of a redirected HTTP site via C# WebClient


I have a problem with a site. I was provided with a list of product ID numbers (about 2000), and my job is to pull data from the producer's site. I tried forming the URLs of the product pages myself, but they contain unknown variables, so I can't build them from the IDs alone. There is a search field, so I can use a URL like this: http://www.hansgrohe.de/suche.htm?searchtext=10117000&searchsubmit=suchen. The problem is that this page only displays some info (probably via JavaScript) and then redirects straight to the desired page, the one I need to pull data from.

Is there any way of tracking this redirection?

Here is some of the code I have so far, though I find it unhelpful because it only downloads the source of the page before the redirect.

    using System.Net;
    using System.Text;

    public static string Download(string uri)
    {
        // Fetches only the raw HTML of the given URI; scripts in the page are never executed.
        using (WebClient client = new WebClient())
        {
            client.Encoding = Encoding.UTF8;
            client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
            return client.DownloadString(uri);
        }
    }

Also, the suggested answer is not helpful in my case, because the redirection doesn't come from the HTTP request: the page is redirected a few seconds after the http://www.hansgrohe.de/suche.htm?searchtext=10117000&searchsubmit=suchen URL has loaded.

I found a solution, and since I'm new and have to wait a few hours before I can answer my own question, I'm putting it here:

I hope other users find it useful: {pseudocode}

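    webBrowser1.Navigate(url);
    // Wait while the browser is still on the search URL, i.e. until the JavaScript redirect has fired.
    while (webBrowser1.Url == null || webBrowser1.Url.AbsoluteUri == url)
    {
        Application.DoEvents(); // keep the WinForms message loop running so navigation can proceed
    }
    string desiredUri = webBrowser1.Url.AbsoluteUri;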

Thanks for the answers.

Welcome to the wonderful world of page scraping. The short answer is "you can't do that." Not in the general case, anyway, and not with WebClient. The problem appears to be a JavaScript redirection. Since WebClient only downloads the page itself, it's not going to download the JavaScript, much less parse and execute it.

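You might be able to do this by creating a program that uses the WebBrowser class. You can have it load the page. It should then redirect, and you can inspect the result, which should be the page you are looking for. I haven't done this myself, but it does seem possible.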

Your other option is to fire up your web browser's developer tools (like IE's F12 developer tools) and watch what's happening. You can inspect the JavaScript that's being executed and the modified DOM, and see how the redirect happens.

Yes, it's tedious work. But once you figure out how the redirect works for one page, you can generate the URLs for the other pages you want automatically.
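As a rough illustration of that last step: if the developer tools show that the redirect is a plain assignment to `window.location` or `location.href` with a string literal, the target could be pulled out of the HTML that WebClient already downloads. This is only a sketch under that assumption; the class name `RedirectSniffer`, the method `FindRedirectTarget`, and the regular expression are hypothetical, and the real page may build its target in a different way.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RedirectSniffer
{
    // Assumed redirect shape (not confirmed for the real site):
    //   window.location = "/some/path";   or   location.href = '/some/path';
    private static readonly Regex LocationAssignment = new Regex(
        @"(?:window\.|document\.)?location(?:\.href)?\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Returns the absolute redirect target, or null when no such assignment is found.
    public static string FindRedirectTarget(string html, string pageUri)
    {
        Match match = LocationAssignment.Match(html);
        if (!match.Success)
        {
            return null;
        }

        // Resolve relative targets against the address of the page that contained the script.
        return new Uri(new Uri(pageUri), match.Groups[1].Value).AbsoluteUri;
    }
}
```

It could be combined with the question's download helper as `RedirectSniffer.FindRedirectTarget(Download(searchUrl), searchUrl)`. A null result means the page does not redirect in the assumed way, and the WebBrowser approach is the fallback.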

