neodammer said:
Does anybody know of a good regex for extracting links from HTML code? I'm finding it hard because of all the different ways links can be written.
Here are a couple of good ones from
http://www.regular-expressions.info/, which is a great reference for new and experienced regex users alike.
Regex regexObj = new Regex("<" + tagName + @"\b[^>]*>(.*?)</" + tagName + ">", RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match matchObj = regexObj.Match(search);
You could then loop through the matches.
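Since the question is about links, here is a minimal sketch of that loop with the tag name set to "a". The class and method names (TagExtractor, InnerTexts) are hypothetical; Regex, Regex.Escape, Match.NextMatch and the RegexOptions flags are standard .NET. IgnoreCase lets <A HREF=...> match too, Singleline lets the link text span line breaks, and the \b after the tag name keeps "a" from also matching tags such as <abbr>. Note that it returns the link text, not the href value.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TagExtractor
{
    // Hypothetical helper: returns the inner text of every
    // <tagName ...>...</tagName> element found in html.
    public static List<string> InnerTexts(string html, string tagName)
    {
        // Escape the tag name in case it contains regex metacharacters.
        string t = Regex.Escape(tagName);
        var regexObj = new Regex("<" + t + @"\b[^>]*>(.*?)</" + t + ">",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        var results = new List<string>();
        // Loop through the matches: Match() gives the first, NextMatch() the rest.
        for (Match m = regexObj.Match(html); m.Success; m = m.NextMatch())
        {
            results.Add(m.Groups[1].Value); // group 1 = text between the tags
        }
        return results;
    }
}
```

For example, InnerTexts("<p>Go <a href=\"x.htm\">here</a> or <A HREF='y'>there</A></p>", "a") should return "here" and "there".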
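This next one matches any tag rather than a specific one: group 1 captures the tag name, the backreference \1 forces the closing tag to repeat that same name, and group 2 captures the text inside the tags.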
Regex regexObj = new Regex(@"<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>", RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match matchObj = regexObj.Match(search);
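A minimal sketch of how the two groups come out for a small input. AnyTagDemo and Describe are hypothetical names used only for illustration; IgnoreCase is needed because [A-Z] would otherwise only match upper-case tag names.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class AnyTagDemo
{
    // Group 1 = tag name, \1 = closing tag must repeat that name,
    // group 2 = text between the opening and closing tag.
    static readonly Regex AnyTag = new Regex(@"<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: renders each match as "name=innerText;".
    public static string Describe(string html)
    {
        var sb = new StringBuilder();
        foreach (Match m in AnyTag.Matches(html))
        {
            sb.Append(m.Groups[1].Value).Append('=').Append(m.Groups[2].Value).Append(';');
        }
        return sb.ToString();
    }
}
```

Describe("<b>bold</b> and <a href=\"/x\">link</a>") should give "b=bold;a=link;", while "<b>x</i>" gives no match at all, because the backreference rejects a closing tag with a different name.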
You could write a generic print method to show all the results, like this:
// Needs: using System; using System.Text.RegularExpressions;
private static void printMatch(Match matchObj)
{
    // Regex.Match constructs and returns a Match object.
    // You can query this object to get all possible information about the match.
    while (matchObj.Success)
    {
        // Console.WriteLine already ends the line, so no "\r\n" is needed.
        Console.WriteLine("Match offset: " + matchObj.Index);
        Console.WriteLine("Match length: " + matchObj.Length);
        Console.WriteLine("Matched text: " + matchObj.Value);
        if (matchObj.Groups.Count > 1)
        {
            // matchObj.Groups[0] holds the entire regex match, also held by
            // matchObj itself. The other Group objects hold the matches for
            // the capturing parentheses in the regex.
            for (int i = 1; i < matchObj.Groups.Count; i++)
            {
                Group g = matchObj.Groups[i];
                if (g.Success)
                {
                    Console.WriteLine("Group " + i + " offset: " + g.Index);
                    Console.WriteLine("Group " + i + " length: " + g.Length);
                    Console.WriteLine("Group " + i + " text: " + g.Value);
                }
                else
                {
                    Console.WriteLine("Group " + i + " did not participate in the overall match");
                }
            }
        }
        else
        {
            Console.WriteLine("no backreferences/groups");
        }
        // Get the next match
        matchObj = matchObj.NextMatch();
    }
}
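Neither of these finds tags nested within other tags, because matching resumes after the end of each outer match. To get those, you would need to run the regex again on the captured inner text of each match (group 2 in the second regex).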
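A minimal recursive sketch of that idea, reusing the second pattern and re-running it on each match's group 2. NestedTags and Collect are hypothetical names. One limitation worth knowing: when a tag is nested inside another tag with the same name (for example <div><div>...</div></div>), the lazy (.*?) pairs the outer opening tag with the inner closing tag, so regexes alone cannot handle that case.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NestedTags
{
    static readonly Regex AnyTag = new Regex(@"<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: appends each tag name, indented by nesting depth,
    // by re-running the regex on the inner text (group 2) of every match.
    public static void Collect(string html, int depth, List<string> output)
    {
        foreach (Match m in AnyTag.Matches(html))
        {
            output.Add(new string(' ', depth * 2) + m.Groups[1].Value);
            Collect(m.Groups[2].Value, depth + 1, output);
        }
    }
}
```

For "<div><p>Visit <a href=\"/a\">A</a></p><b>x</b></div>" starting at depth 0, the output list should be "div", "  p", "    a", "  b", so the link inside the paragraph is found as well.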