HTML Screen Scraping using C# .Net WebClient

EDN Admin

Well-known member
Joined
Aug 7, 2010
Messages
12,794
Location
In the Machine
I am trying the sample code from here:
http://www.codersource.net/MicrosoftNet/CAdvanced/HTMLScreenScrapinginC.aspx http://www.codersource.net/MicrosoftNet/CAdvanced/HTMLScreenScrapinginC.aspx
I have a Textbox1, Button1, and DataGrid1. Below is my code:
using System;<br/>
using System.Collections.Generic;<br/>
using System.ComponentModel;<br/>
using System.Data;<br/>
using System.Drawing;<br/>
using System.Linq;<br/>
using System.Text;<br/>
using System.Windows.Forms;<br/>
using System.Net;<br/>
using System.IO;<br/>
<br/>
// creates a button protected <br/>
using System.Web.UI.WebControls.Button Button1; <br/>
// creates a byte array <br/>
private byte[] aRequestHTML; <br/>
// creates a string private string <br/>
myString = null; <br/>
// creates a datagrid protected <br/>
using System.Web.UI.WebControls.DataGrid DataGrid1; <br/>
// creates a textbox protected <br/>
using System.Web.UI.WebControls.TextBox TextBox1; <br/>
// creates the label protected <br/>
using System.Web.UI.WebControls.Label Label1; <br/>
// creates the arraylist <br/>
private ArrayList a = new ArrayList();

namespace WindowsFormsApplication1<br/>
{<br/>
public partial class Form1 : Form<br/>
{<br/>
public Form1()<br/>
{<br/>
InitializeComponent();<br/>
}<br/>
<br/>
private void Button1_Click(object sender, System.EventArgs e)<br/>
{<br/>
// make an object of the WebClient class
<br/>
WebClient objWebClient = new WebClient();<br/>
// gets the HTML from the url written in the textbox<br/>
aRequestHTML = objWebClient.DownloadData(TextBox1.Text);
<br/>
// creates UTf8 encoding object<br/>
UTF8Encoding utf8 = new UTF8Encoding(); <br/>
// gets the UTF8 encoding of all the html we got in aRequestHTML<br/>
myString = utf8.GetString(aRequestHTML);
<br/>
// this is a regular expression to check for the urls<br/>
Regex r = new Regex("href\s*=\s*(?:(?:\"(?[^\"]*)\")|(?[^\s]* ))");
<br/>
// get all the matches depending upon the regular expression<br/>
MatchCollection mcl = r.Matches(myString);
<br/>
<br/>
foreach(Match ml in mcl)
<br/>
{<br/>
foreach(Group g in ml.Groups)
<br/>
{<br/>
string b = g.Value + " "; <br/>
// Add the extracted urls to the array list<br/>
a.Add(b); <br/>
<br/>
}<br/>
}<br/>
// assign arraylist to the datasource<br/>
DataGrid1.DataSource = a;
<br/>
// binds the databind<br/>
DataGrid1.DataBind();
<br/>
<br/>
// The following lines of code writes the extracted Urls to the file named test.txt<br/>
StreamWriter sw = new StreamWriter(Server.MapPath("test.txt"));
<br/>
sw.Write(myString);
<br/>
sw.Close(); <br/>
}<br/>
<br/>
<br/>
}<br/>
}<br/>

Im guessing that there needs to be a reference to a WebService; not exactly sure how ot set that up, but I tried the following:
<img alt="" src="http://social.msdn.microsoft.com/Forums/getfile/169870

DataGridView Tasks > Choose Data Source > Add Project Data Source . . . > Service > Next > then, in the address bar I entered this: http:www.ebay.com

Then I got an error message:
<img alt="" src="http://social.msdn.microsoft.com/Forums/getfile/169871

Can someone please tell me what Im doing wrong here?
<
Ryan Shuell
<br/>

View the full article
 
Back
Top