zydjohn
Guest
Hello:
I have a C# .NET Core project to download around 200 web pages after I log in to a web site.
The following is my C# code to download the HTML from a URL:
public static async Task<HttpClient> Create_HttpClient()
{
    try
    {
        ServicePointManager.Expect100Continue = false;
        ServicePointManager.DefaultConnectionLimit = 1000;
        ServicePointManager.SecurityProtocol =
            SecurityProtocolType.Tls12 | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls;
        ServicePointManager.ServerCertificateValidationCallback += (sender, cert, chain, sslPolicyErrors) => true;
        HttpClientHandler clientHandler = new HttpClientHandler()
        {
            AllowAutoRedirect = true,
            AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip,
        };
        HttpClient _client1 = new HttpClient(clientHandler);
        _client1.DefaultRequestHeaders.Accept.Clear();
        _client1.DefaultRequestHeaders.Add("Accept-Encoding", "gzip, deflate");
        _client1.DefaultRequestHeaders.Add("X-Requested-With", "XMLHttpRequest");
        return _client1;
    }
    catch (Exception ex)
    {
    }
    return null;
}
public static async Task<string> Get_WebContent(string url1)
{
    try
    {
        using (HttpClient client1 = await Create_HttpClient())
        using (HttpResponseMessage http_reply1 = await client1.GetAsync(url1))
        {
            string html_content = await http_reply1.Content.ReadAsStringAsync();
            if ((html_content != "") &&
                (!html_content.Contains("503 Service Temporarily Unavailable")))
            {
                string page_html = html_content.Replace("\n", "");
                return page_html;
            }
        }
    }
    catch (Exception ex)
    {
    }
    return null;
}
static async Task Main()
{
    List<string> web_links = new List<string>();
    for (int i = 1; i <= 200; i++)
    {
        string page_url1 = string.Format("https://myweb.com/markets/page={0}", i);
        web_links.Add(page_url1);
    }
    for (int i = 1; i <= 200; i++)
    {
        using (HttpClient client1 = await Create_HttpClient())
        using (HttpResponseMessage http_reply1 = await client1.GetAsync(web_links[i - 1]))
        {
            string html_content = await http_reply1.Content.ReadAsStringAsync();
            if ((html_content != "") &&
                (!html_content.Contains("503 Service Temporarily Unavailable")))
            {
                string page_html = html_content.Replace("\n", "");
            }
        }
    }
}
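One thing I wonder about, though I have not confirmed it, is whether creating a new HttpClient (and a fresh connection pool) for every request makes the server throttle me. A minimal sketch of what I mean by reusing one client for all 200 pages, with the same URL pattern as my code above:

```csharp
// Sketch: one shared HttpClient for all 200 requests, instead of one per request.
// The URL pattern and the page count of 200 come from my loop above.
static async Task Download_With_Shared_Client()
{
    using (HttpClient client1 = await Create_HttpClient())
    {
        for (int i = 1; i <= 200; i++)
        {
            string url = string.Format("https://myweb.com/markets/page={0}", i);
            using (HttpResponseMessage reply = await client1.GetAsync(url))
            {
                string html = await reply.Content.ReadAsStringAsync();
                // process html here
            }
        }
    }
}
```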
I can run my program, but nearly half of the time I get a 503 error: Service Temporarily Unavailable. (Sometimes most of the requests, 70%+, return the 503 error.)
But if I use a web browser to visit each of the web links, most of the time (90%+) I can see the page content.
However, visiting all 200 web links in a web browser takes too much time: every link redirects, and due to some timeout issue the browser can get stuck from time to time, so visiting them in a browser is not suitable for a C# program.
I guessed this is because the web browser waits longer for the DOM to download, but even after I changed the timeout on the HttpClient, it didn't help at all.
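For reference, this is how I changed the timeout (the 100-second value here is just an example; I tried larger values too):

```csharp
// Sketch: Timeout must be set before the first request is sent.
HttpClient client1 = await Create_HttpClient();
client1.Timeout = TimeSpan.FromSeconds(100); // default is 100 seconds
```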
I also tried to use the following code:
IEnumerable<Task<string>> download_all_links =
    web_links.Select(x => Get_WebContent(x));
await Task.WhenAll(download_all_links);
However, I found this was much worse than my previous code: I got the 503 error for nearly 100% of the web links.
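Since firing all 200 requests at once made things worse, maybe limiting the concurrency would help. Here is a sketch using SemaphoreSlim to allow at most 5 requests at a time; the limit of 5 is my own guess, not something the server documents:

```csharp
// Sketch: throttle Get_WebContent to at most 5 concurrent requests.
// The limit of 5 is an assumed value; the right number depends on the server.
static async Task<string[]> Download_Throttled(List<string> web_links)
{
    var throttle = new SemaphoreSlim(5);
    IEnumerable<Task<string>> tasks = web_links.Select(async url =>
    {
        await throttle.WaitAsync();
        try
        {
            return await Get_WebContent(url);
        }
        finally
        {
            throttle.Release();
        }
    });
    return await Task.WhenAll(tasks);
}
```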
Please advise how I can fix my issue: I quite often get a 503 error using HttpClient, but I don't get it in a web browser, and using a web browser is not suitable for a C# .NET Core project.
By the way, I am using Visual Studio 2019 Version 16.2.5 on Windows 10 (Version 1903).
Thanks,