Regular Expressions -vs- Instr, Mid, etc.

Threads · Apr 29, 2003

Does anyone know of any sites that compare Regular Expressions with the InStr and Mid$ functions for extracting data? I already know how to use InStr and Mid$ quite well, and I know Regular Expressions can extract data too (with more power, since they allow richer pattern matching), but I am curious what kind of performance hit I would take if I switched to Regular Expressions. Any help would be appreciated.

Thanks.

Robby · Apr 29, 2003

I won't address the performance difference between the two, but I can tell you... don't use Mid, Left, or InStr.

All the string functions can be done using the new .NET methods.

Mid ... myString.Substring()
Replace ... myString.Replace()
InStr ... myString.IndexOf()
and many more....
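
To make the mapping above concrete, here is a minimal sketch. The class name, helper method, and sample string are made up for illustration. One thing to watch when porting: the VB functions are 1-based and InStr returns 0 when nothing is found, while the .NET methods are 0-based and IndexOf returns -1.

```csharp
using System;

static class StringMethodDemo
{
    // Returns the text between the first two commas, or null if there are fewer than two.
    public static string SecondField(string s)
    {
        int first = s.IndexOf(',');              // roughly InStr(s, ",") - 1 in VB terms
        if (first < 0) return null;              // IndexOf returns -1 (not 0) when nothing is found
        int second = s.IndexOf(',', first + 1);  // search again, starting after the first comma
        if (second < 0) return null;
        // Equivalent to Mid$(s, first + 2, length), because Mid$ counts from 1.
        return s.Substring(first + 1, second - first - 1);
    }

    static void Main()
    {
        string sample = "Smith,John,42";                 // hypothetical sample data
        Console.WriteLine(SecondField(sample));          // John
        Console.WriteLine(sample.Replace(",", ";"));     // Smith;John;42
    }
}
```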

Nerseus · Apr 29, 2003

Out of curiosity, I created a small test program. The results were somewhat expected (Regular Expression matching is slower), but there are a number of factors to consider.

First, Regular Expressions are VERY powerful. Besides doing matching, they can do validation. Also, you can create very powerful expressions much more easily than you could with IndexOf and Substring.

I'll make two notes about the sample code. First, I wrote the regular expression code in about 5 minutes, while writing the IndexOf and Substring version took about 15 minutes. Second, my first regular expression is MUCH more robust than the IndexOf/Substring method. For instance, the expression will automatically trim off any spaces or other whitespace, along with weird characters.

Also, the code for regular expressions is MUCH more readable since each captured group is named. To get the last name, I simply use:

C#:

lastName = match.Groups["LastName"].Value;

Using IndexOf, I had to use:

C#:

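// pos, pos2, and delimiter are placeholder names; the originals were lost from the post.
pos = pos2;
pos2 = smallData.IndexOf(delimiter, pos + 1);
lastName = smallData.Substring(pos + 1, pos2 - pos - 1);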

Without comments, it's hard to say what's going on. Which code would you rather look at a year from now?
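
For anyone who wants to try the named-group approach without the test project, here is a minimal sketch. The record layout ("FirstName, LastName") and the pattern are assumptions for illustration, not the ones from the test program. It does show how the expression can absorb surrounding whitespace on its own.

```csharp
using System;
using System.Text.RegularExpressions;

static class NamedGroupDemo
{
    // Hypothetical layout: "FirstName, LastName" with optional whitespace around each field.
    static readonly Regex NamePattern =
        new Regex(@"^\s*(?<FirstName>[^,]+?)\s*,\s*(?<LastName>[^,]+?)\s*$");

    // Returns the LastName group, or null when the input does not match the pattern.
    public static string LastName(string record)
    {
        Match match = NamePattern.Match(record);
        return match.Success ? match.Groups["LastName"].Value : null;
    }

    static void Main()
    {
        Console.WriteLine(LastName("  Jane ,   Doe  "));   // Doe
    }
}
```

Because a failed match is reported through match.Success, the same expression doubles as a basic validity check on the record.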

Having said that, the speed really depends on what you need to do. If you need to parse through a 4 GB text file, I'd go with the fastest method possible and hard-code as many settings as possible. If you're parsing a string or two, I'd go with whatever is easier to maintain, since Regular Expressions and IndexOf will be perceptibly the same to the user.

Here are the results after running the project in Debug mode in the IDE on my machine:

Code:

short Data RegEx: 828
large Data RegEx: 2578
short Data Substring: 31
large Data Substring: 94

Press ENTER to close

Keep in mind this is for 100,000 iterations. For 1000 iterations, all 4 tests come in at 0ms on my machine.
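
A timing loop along these lines can be sketched as follows. The sample record, the pattern, and the helper names are assumptions, not the code from the test project. Note also that Stopwatch arrived in .NET 2.0, after this thread; on .NET 1.x, Environment.TickCount would be the substitute.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class ParseTiming
{
    // Hypothetical sample record and pattern, not the data from the original test.
    const string Sample = "Jane,Doe,1970";
    static readonly Regex Pattern = new Regex(@"^[^,]*,(?<LastName>[^,]*),");

    // Extracts the second field with a named group.
    public static string WithRegex(string s)
    {
        return Pattern.Match(s).Groups["LastName"].Value;
    }

    // Extracts the second field by locating the first two commas.
    public static string WithIndexOf(string s)
    {
        int first = s.IndexOf(',');
        int second = s.IndexOf(',', first + 1);
        return s.Substring(first + 1, second - first - 1);
    }

    // Runs the parser the given number of times and returns elapsed milliseconds.
    public static long Time(Func<string, string> parse, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            parse(Sample);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    static void Main()
    {
        const int iterations = 100000;
        Console.WriteLine("RegEx: " + Time(WithRegex, iterations));
        Console.WriteLine("Substring: " + Time(WithIndexOf, iterations));
    }
}
```

The Regex object is built once and reused, so the loop measures matching rather than pattern construction.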

-Nerseus

Threads · Apr 29, 2003

Thank you, Nerseus. That was very, very helpful. I'll also use your example to compare regular expressions against the other methods for speed in VB 6.

Since the purpose of this is usually to parse HTML pages that I've downloaded into the application, the regular expressions may be much easier to maintain. For time-critical parts, I may go back to my old methods, depending on what the tests show me.

Thanks again for all your help.

Derek Stone · Apr 29, 2003

Let it be noted that the RegEx times will decrease by almost half once the application is running without debug symbols. Obviously the raw string manipulation is still many times faster, however.

Threads · Apr 30, 2003

Using Regular Expressions sure is easier than the other methods I was using before. I'm now able to parse the page with ease, and I don't have to worry about little things changing on the page, like the background color of cells in a table, since I can use pattern matching. I don't notice a slowdown once it is compiled (Derek was definitely right about it being at least twice as fast). Anyway, thanks again.
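
The point about not caring when a cell's background color changes can be illustrated with a pattern that skips whatever attributes the tag carries. This is a minimal sketch with made-up markup and a hypothetical helper name. A pattern like this copes with simple, flat table cells, not nested tables or arbitrary HTML.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CellExtractor
{
    // [^>]* swallows any attributes (bgcolor, align, ...), so they can change freely.
    static readonly Regex CellPattern = new Regex(
        @"<td\b[^>]*>\s*(?<Cell>.*?)\s*</td>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the trimmed text of every <td> cell found in the markup.
    public static List<string> Cells(string html)
    {
        List<string> cells = new List<string>();
        foreach (Match match in CellPattern.Matches(html))
        {
            cells.Add(match.Groups["Cell"].Value);
        }
        return cells;
    }

    static void Main()
    {
        // Hypothetical markup; the bgcolor values are irrelevant to the match.
        string html = "<tr><td bgcolor=\"#FFFFFF\">Widget</td><TD bgcolor=\"#EEEEEE\"> 9.95 </TD></tr>";
        foreach (string cell in Cells(html))
        {
            Console.WriteLine(cell);
        }
    }
}
```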
