Parsing XML data using RegEx

mj0lnr

Member
Joined
Oct 25, 2004
Messages
7
I have some data that I need to have stripped of all excess XML data. When I first volunteered to do this, I thought to myself, it's just excess text, how hard can that be? :D Oh, how little I knew about what RegEx is. I'm beating my head against a wall, and would just like someone to help get me going.

I followed along in this thread. But I've had no luck figuring it out on my own without a little bit of coaxing.

So, here's my question: how do I strip out this excess data so that all I'm left with is the coach's name (it needs to be a variable for a combo box), so that the coach of the team can make changes to his roster below his name? Or, since the coach's name always stays the same, should I use Regex.IsMatch() instead of trying to strip the excess data, since it needs to be saved as XML anyway and sent back as a plain TXT file? So many questions, that's why I'm coming to the gurus...!

And here's a sample of what my XML looks like....

Code:
<team name="Sacramento" coach="Randy" teamid="1" picture="madison.gif" abbreviation="SAC" email="">
  <roster>
    <player name="Jose Fuentes" pos="QB1" />
    <player name="JoJo Jones" pos="RB1" />
    <player name="Johnnie Vee" pos="RB2" />
    <player name="Tom Waddle" pos="WR1" />
    <player name="Sherman Deary" pos="WR2" />
    <player name="Dan Graham" pos="TE1" />
    <player name="John Hall" pos="K" />
    <player name="CHA" pos="DEF" />
    <player name="Wheeler Chandells" />
    <player name="Carlin Patton" />
    <player name="Sidney Iverson" />
    <player name="Nicky Santoro" />
  </roster>
</team>
 
again...no luck :(
Any clue as to what I'm doing wrong...?

Code:
    Public Function ReturnValues(ByVal RegularExpression As String, ByVal mytext As String, ByVal item As String) As String()
        Dim myRegExp As New Regex(RegularExpression, RegexOptions.IgnoreCase)
        Dim Matchs As MatchCollection = myRegExp.Matches(mytext)
        Dim currentMatch As Match

        Dim matchedValues As New ArrayList()


        For Each currentMatch In Matchs
            Dim myCaptures As CaptureCollection = currentMatch.Groups(item).Captures
            Dim currentItem As Capture
            For Each currentItem In myCaptures
                matchedValues.Add(currentItem.Value)
            Next

        Next

        Return CType(matchedValues.ToArray(GetType(String)), String())
    End Function


    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim myPattern As String = "\<team name=""Sacramento""\(?i)<team[^>]+coach=""(?<coach>[^""]*)""[^>]*>"
        Dim myText As String = "<team name=""Sacramento"" coach=""Randy"" teamid=""1"" picture=""madison.gif"" abbreviation=""SAC"" email="""">"
        Dim oneValues() As String = ReturnValues(myPattern, myText, "coach1")
        Dim twoValues() As String = ReturnValues(myPattern, myText, "itemtwo")
    End Sub

TIA,
A.
 
Code:
Public Function ExplicitTest() As ArrayList
        ' Embedded quotes are doubled, which is how VB escapes them inside a string literal.
        Dim input As String = "<team name=""Sacramento"" coach=""Randy"" teamid=""1"" picture=""madison.gif"" abbreviation=""SAC"" email="""">"
        ' THIS IS NOT THE REGEX I GAVE YOU
        ' Dim pattern As String = "\<team name=""Sacramento""\(?i)<team[^>]+coach=""(?<coach>[^""]*)""[^>]*>"
        Dim pattern As String = "(?i)<team[^>]+coach=""(?<coach>[^""]*)""[^>]*>"
        Dim matches As MatchCollection = Regex.Matches(input, pattern)
        Dim captures As New ArrayList(matches.Count)

        ' Collect the named group "coach" from every <team> tag that matched.
        For Each m As Match In matches
            captures.Add(m.Result("${coach}"))
        Next

        Return captures
    End Function

I dunno VB, but that example should work. Here's a C# example:

Code:
ArrayList FooBar(string input)
{
	string pattern = "(?i)<team[^>]+coach=\"(?<coach>[^\"]*)\"[^>]*>";
	MatchCollection matches = Regex.Matches(input, pattern);
	ArrayList captures = new ArrayList(matches.Count);
	foreach(Match match in matches)
		captures.Add(match.Result("${coach}"));
	return captures;
}
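
On the IsMatch question: Regex.IsMatch() only reports whether the pattern matches, so it can confirm a coach is present but cannot hand back the name. Here is a rough VB sketch of the difference, reusing the same pattern. The module name CoachLookup and the function GetCoach are made up for illustration, and the sample string is a shortened copy of the team tag from the first post.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CoachLookup
    ' Hypothetical helper: returns the coach attribute of the first <team> tag,
    ' or Nothing when no <team> tag with a coach attribute is found.
    Function GetCoach(ByVal input As String) As String
        ' Same pattern as above; quotes inside a VB string literal are doubled.
        Dim pattern As String = "(?i)<team[^>]+coach=""(?<coach>[^""]*)""[^>]*>"
        Dim m As Match = Regex.Match(input, pattern)
        If m.Success Then
            Return m.Groups("coach").Value
        End If
        Return Nothing
    End Function

    Sub Main()
        ' Shortened sample of the <team> tag from the question.
        Dim sample As String = "<team name=""Sacramento"" coach=""Randy"" teamid=""1"">"

        ' IsMatch answers yes/no only.
        Console.WriteLine(Regex.IsMatch(sample, "coach=""Randy"""))

        ' Match plus a named group returns the actual value.
        Console.WriteLine(GetCoach(sample))
    End Sub
End Module
```

The value returned by GetCoach is a plain String, so it could be added straight to the combo box items.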
 
Why not create a class that loads that particular XML structure, then write it out as a different XML file with only the nodes and attributes that are required?
 
fenris said:
Why not create a class that loads that particular XML structure, then write it out as a different XML file with only the nodes and attributes that are required?


Fen, I'm completely open to suggestions..... I thought I knew VB, until I found out what regular expressions were. Whew!! They've thrown me for a loop.
 
I hear that!

Regular expressions are an entirely different language, designed to parse text very well. I don't think that you need to use them for your particular circumstances.

I would use a couple of classes like this:

Code:
Public Class Team
    Private _Name As String
    Private _CoachName As String
    Private _ID As String
    Private _Picture As String ' could be an Image object as well
    Private _NameAbbreviation As String
    Private _Email As String
    Private _Players As Collection

    Public Property Players() As Collection
        Get
            Return _Players
        End Get
        Set(ByVal Value As Collection)
            _Players = Value
        End Set
    End Property

    Public Property Email() As String
        Get
            Return _Email
        End Get
        Set(ByVal Value As String)
            _Email = Value
        End Set
    End Property

    Public Property NameAbbreviation() As String
        Get
            Return _NameAbbreviation
        End Get
        Set(ByVal Value As String)
            _NameAbbreviation = Value
        End Set
    End Property

    Public Property Picture() As String
        Get
            Return _Picture
        End Get
        Set(ByVal Value As String)
            _Picture = Value
        End Set
    End Property

    Public Property ID() As String
        Get
            Return _ID
        End Get
        Set(ByVal Value As String)
            _ID = Value
        End Set
    End Property

    Public Property CoachName() As String
        Get
            Return _CoachName
        End Get
        Set(ByVal Value As String)
            _CoachName = Value
        End Set
    End Property

    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal Value As String)
            _Name = Value
        End Set
    End Property

End Class

Public Class Player
    Private _Name As String
    Private _Position As String

    Public Property Position() As String
        Get
            Return _Position
        End Get
        Set(ByVal Value As String)
            _Position = Value
        End Set
    End Property
    Public Property Name() As String
        Get
            Return _Name
        End Get
        Set(ByVal Value As String)
            _Name = Value
        End Set
    End Property
End Class

Then I would load the xml into vb and create the collections from there. Once the classes are created, you can then create the new xml files any way you please. You can also output the data to text.
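
As a rough idea of what the loading step could look like, here is a minimal sketch that reads the coach and the players with XmlDocument from System.Xml. It is only one way to do it: the module name RosterLoader is made up, the XML is a shortened inline copy of the sample so the snippet runs on its own, and a real program would probably call doc.Load with a file path and assign the values to the Team and Player properties instead of printing them.

```vbnet
Imports System
Imports System.Xml

Module RosterLoader
    Sub Main()
        Dim doc As New XmlDocument()

        ' Assumption: a shortened inline copy of the sample XML.
        ' For a file on disk, doc.Load(path) would be used instead.
        doc.LoadXml( _
            "<team name=""Sacramento"" coach=""Randy"" teamid=""1"">" & _
            "<roster>" & _
            "<player name=""Jose Fuentes"" pos=""QB1"" />" & _
            "<player name=""Carlin Patton"" />" & _
            "</roster></team>")

        ' The root element is <team>; its attributes hold the team details.
        Dim teamNode As XmlElement = doc.DocumentElement
        Dim coach As String = teamNode.GetAttribute("coach")
        Console.WriteLine("Coach: " & coach)

        ' Walk every <player> under <roster>.
        For Each playerNode As XmlElement In teamNode.SelectNodes("roster/player")
            ' GetAttribute returns an empty string when the attribute is
            ' missing, which covers the bench players that have no pos.
            Dim playerName As String = playerNode.GetAttribute("name")
            Dim position As String = playerNode.GetAttribute("pos")
            Console.WriteLine(playerName & " [" & position & "]")
        Next
    End Sub
End Module
```

Because the document stays in memory, changed attributes can be written back with SetAttribute and the whole thing saved again with doc.Save.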

Here is an example to get you started.
 
THANK YOU!!! I'll definitely look at that when I get to the house tonight.... and btw, you might find this really funny.... this is for an online football league I'm in, my team name is The Fenris Wolves. :)

A.
 
:D


Regex is great for parsing text. I use it (well, at least I try to use it) for parsing HTML tables from downloaded HTML source.
 
Jeez, Fen... I've tried 10 or 11 different ways to set up what you gave me, but no dice..... I can't even get it started... :( I hope it's not too much to ask for a little more help.... (as my son would say) PEASE????? LOL

A.
 