Hello,
I am working on a .NET application that extracts all the hyperlinks from a web page.
Here is my code:
--------
Dim href_pattern As String = "<a\s+(.*?)href\s*=\s*""*\s*(?<hrf>.+?)""*\s*>\s*(?<content>.*?)</a>"
Dim Regex As New Regex(href_pattern, RegexOptions.IgnoreCase)
Dim mt As Match
' strng is a String holding the HTML read from a web page
mt = Regex.Match(strng)
Dim i As Integer
i = 0
While mt.Success
    ' A Match holds exactly one capture (itself), so no inner loop over mt.Captures is needed.
    arr_href(i) = Trim(mt.Result("${hrf}"))
    arr_text(i) = mt.Result("${content}")
    i = i + 1
    mt = mt.NextMatch()
End While
-------
Sometimes I get the hyperlink URLs right, but other times "target=..." is returned along with the href URL.
Is there a better regular expression to parse the URL and description of a hyperlink?
Thank you,
Burak
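
One possible direction, offered as a sketch rather than a definitive fix: the extra "target=..." text most likely appears because the closing quote is optional and `.+?` may run across quotes until it reaches the `>`. Stopping the href value at its own closing quote avoids that. The module name `LinkExtractor`, the function `ExtractLinks`, and the sample HTML below are hypothetical, made up for illustration. The pattern assumes reasonably well-formed anchors; regular expressions are fragile on real-world HTML, so an HTML parser is generally the more robust choice.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LinkExtractor
    ' The href value is captured only up to its own closing quote (or up to
    ' whitespace or ">" when unquoted), so later attributes such as target=
    ' are not included. .NET allows the same group name in each alternative.
    Private Const HrefPattern As String = _
        "<a\s+[^>]*?href\s*=\s*(?:""(?<hrf>[^""]*)""|'(?<hrf>[^']*)'|(?<hrf>[^\s>]+))[^>]*>(?<content>.*?)</a>"

    ' Hypothetical helper: returns (url, text) pairs for every anchor found.
    Function ExtractLinks(html As String) As List(Of KeyValuePair(Of String, String))
        Dim links As New List(Of KeyValuePair(Of String, String))()
        ' Singleline lets the link text span line breaks.
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline
        For Each m As Match In Regex.Matches(html, HrefPattern, options)
            Dim url As String = m.Groups("hrf").Value.Trim()
            Dim label As String = m.Groups("content").Value.Trim()
            links.Add(New KeyValuePair(Of String, String)(url, label))
        Next
        Return links
    End Function

    Sub Main()
        ' Made-up sample input with a trailing target attribute and a single-quoted href.
        Dim sample As String = "<p><a href=""http://example.com/a"" target=""_blank"">First</a> <a class='x' href='b.html'>Second</a></p>"
        For Each link As KeyValuePair(Of String, String) In ExtractLinks(sample)
            Console.WriteLine(link.Key & " -> " & link.Value)
        Next
    End Sub
End Module
```

Collecting the results in a `List` instead of the pre-sized `arr_href` and `arr_text` arrays also removes the need to know the number of links in advance.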