I'm trying to use HttpWebRequest to check if the URL content is text/html before trying to parse it w

EDN Admin

This is the function I'm using now:

[code]
private static HtmlAgilityPack.HtmlDocument getHtmlDocumentWebClient(string url, bool useProxy, string proxyIp, int proxyPort, string usename, string password)
{
    HttpWebRequest myHttpWebRequest = null; // Declare an HTTP-specific implementation of the WebRequest class.
    HttpWebResponse myHttpWebResponse = null; // Declare an HTTP-specific implementation of the WebResponse class.
    // Create Request
    myHttpWebRequest = (HttpWebRequest)HttpWebRequest.Create(url);
    myHttpWebRequest.Method = "GET";
    myHttpWebRequest.ContentType = "text/html; encoding=utf-8";
    // Get Response
    myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    /*WebClient client = new WebClient();
    //client.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
    client.Credentials = CredentialCache.DefaultCredentials;
    client.Proxy = WebRequest.DefaultWebProxy;
    if (useProxy)
    {
        //Proxy
        if (!string.IsNullOrEmpty(proxyIp))
        {
            WebProxy p = new WebProxy(proxyIp, proxyPort);
            if (!string.IsNullOrEmpty(usename))
            {
                if (password == null)
                    password = string.Empty;
                NetworkCredential nc = new NetworkCredential(usename, password);
                p.Credentials = nc;
            }
        }
    }*/

    Stream data = myHttpWebResponse.GetResponseStream(); //client.OpenRead(url);
    doc.Load(data);
    data.Close();
    return doc;
}
[/code]
Before HttpWebRequest I used WebClient.
The problem was, and still is, that the variable url can contain a link such as:
[code]http://appldnld.apple.com/iTunes10/041-7196.20120912.Ber43/iTunesSetup.exe[/code]
If I open this link in Internet Explorer, it downloads the file to my hard disk.
But that's not what I need.
When I get a link to an exe file, I need to decide what to do with it. First I need to detect that its content is not text/html, and then decide what to do with it, maybe nothing at all, just skipping to the next link.
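
The detection step described here comes down to looking at the Content-Type header of the response before the stream is handed to the parser. Below is a minimal sketch of such a check. The class and method names (ContentTypeCheck, IsHtml) are hypothetical, and treating a missing header as "not HTML" is an assumption, not something the server guarantees.

```csharp
using System;

public static class ContentTypeCheck
{
    // Returns true when a Content-Type header value names the text/html media type.
    // Parameters such as "; charset=utf-8" are ignored.
    public static bool IsHtml(string contentType)
    {
        // Assumption: a missing or empty header is treated as "not HTML".
        if (string.IsNullOrEmpty(contentType))
            return false;

        // Keep only the media type, i.e. the part before the first ';'.
        int semicolon = contentType.IndexOf(';');
        string mediaType = semicolon >= 0
            ? contentType.Substring(0, semicolon)
            : contentType;

        // Media types are case-insensitive, so compare without regard to case.
        return mediaType.Trim().Equals("text/html", StringComparison.OrdinalIgnoreCase);
    }
}
```

With the function above, this could be called as ContentTypeCheck.IsHtml(myHttpWebResponse.ContentType) right after GetResponse(), so that doc.Load(data) only runs when it returns true.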
The problem is that when execution reaches the line doc.Load(data); the application hangs and then freezes, and I have to use Task Manager to force it closed because it is not responding.

Right after closing it I get this exception:

[code]
StackOverflowException was unhandled
An unhandled exception of type System.StackOverflowException occurred in HtmlAgilityPack.dll

System.StackOverflowException was unhandled
Message: An unhandled exception of type System.StackOverflowException occurred in HtmlAgilityPack.dll
[/code]
I used a breakpoint and found that it hangs on the line doc.Load(data); and eventually throws the exception there.
So the questions are:

1. How should I handle URLs like this? Should I skip them, or keep them in case I want to download the files in the future?
2. How should I use HttpWebRequest? As my function stands now, I get the same problem.
I want to be able to decide later whether these links are useful, for example for downloading the files, or not useful, in which case I just ignore them because they do not contain text/html content.
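
One possible shape for the HttpWebRequest side of these two questions is sketched below. It is not a definitive implementation. The names HtmlFetcher, ReadHtmlOrNull and FetchHtmlOrNull are hypothetical. The sketch assumes that a Content-Type starting with text/html is a good enough test, and it reads the body with StreamReader's default UTF-8 decoding rather than the charset the server reports.

```csharp
using System;
using System.IO;
using System.Net;

public static class HtmlFetcher
{
    // Reads the body as text only when the Content-Type starts with "text/html".
    // For anything else it returns null and never reads from the stream.
    public static string ReadHtmlOrNull(string contentType, Stream body)
    {
        if (contentType == null ||
            !contentType.TrimStart().StartsWith("text/html", StringComparison.OrdinalIgnoreCase))
        {
            return null;
        }

        // Simplification: StreamReader decodes as UTF-8 unless a byte order mark says otherwise.
        using (StreamReader reader = new StreamReader(body))
        {
            return reader.ReadToEnd();
        }
    }

    // Sends a GET request and returns the HTML text,
    // or null when the response is not text/html (for example an .exe download).
    public static string FetchHtmlOrNull(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.Accept = "text/html"; // a hint to the server, not a guarantee

        // The using blocks dispose the response and its stream in every case,
        // including when the body is skipped.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream body = response.GetResponseStream())
        {
            return ReadHtmlOrNull(response.ContentType, body);
        }
    }
}
```

A caller could then write string html = HtmlFetcher.FetchHtmlOrNull(url); and call doc.LoadHtml(html) only when the result is not null. A null result marks the link as one to skip for now, or to put on a separate list for downloading later.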
danieli
