Is there a way to configure a Regular Expression to match multiple strings?

Denaes (Well-known member; joined Jun 10, 2003; 956 messages)
It's complicated to explain, but easy to show.

I created a little utility to move files around from my download directories.

It reads the following attributes from an XML file:

* Extension (.avi, .mp3, etc.)
* Action (Copy, Move, Delete, etc.)
* Destination (where to put the file if it is moved or copied)

This part is easy: one record in the XML document per extension. When the program starts, it loads this information into a user-defined data type and puts each record into a HashTable, using the extension as the key. Then I check the extension of each file in the directory. If it's ".avi", the program pulls up the ".avi" entry from the HashTable.
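Here is a minimal sketch of the exact-match approach just described, written in Python for brevity. The post doesn't show the real file, so the XML layout is hypothetical, including the `rule` element and its attribute names. The names `Rule`, `load_rules` and `lookup` are also illustrative. Lower-casing extensions is an assumption that suits case-insensitive Windows filenames.

```python
import os.path
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical schema: the real XML layout is not shown in the post.
SAMPLE_XML = """
<rules>
  <rule extension=".avi" action="Move" destination="D:/Videos" />
  <rule extension=".mp3" action="Copy" destination="D:/Music" />
  <rule extension=".tmp" action="Delete" destination="" />
</rules>
"""

@dataclass
class Rule:
    """One record from the XML file (the 'user-defined data type')."""
    extension: str
    action: str
    destination: str

def load_rules(xml_text: str) -> Dict[str, Rule]:
    """Build a hash table (dict) keyed by the lower-cased extension."""
    table: Dict[str, Rule] = {}
    for elem in ET.fromstring(xml_text.strip()).iter("rule"):
        ext = elem.get("extension", "").lower()
        table[ext] = Rule(ext, elem.get("action", ""), elem.get("destination", ""))
    return table

def lookup(table: Dict[str, Rule], filename: str) -> Optional[Rule]:
    """Exact-match lookup: a single dict access per file."""
    _, ext = os.path.splitext(filename)
    return table.get(ext.lower())
```

With this table, `lookup(load_rules(SAMPLE_XML), "movie.AVI")` returns the "Move" rule. A numbered extension such as "archive.r05" returns `None`, and that gap is what the rest of the thread is about.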

Now I've been getting files whose extensions are harder to handle: ".001", ".005", ".050", ".R05", ".P29", etc.

That is, they follow patterns, but each pattern has many different instances. For this reason I was thinking of storing regular expressions instead of plain strings.

I was thinking of somehow using the regular expression string as the key in the HashTable, something like this (I think this is right):

"[.][0-9][0-9][0-9]"

This should match a "." followed by any three digits, basically .000 through .999.

Something similar could be done with "[.]R[0-9][0-9]" for RAR archives.
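Patterns like these can be tested against just the extension. This Python sketch uses hypothetical names and demonstrates three points:

* `re.fullmatch` requires the whole extension to match, so no `^`/`$` anchors are needed.
* An unanchored search over the full filename can also hit a dot-and-digits run in the middle of a name.
* Writing `[.R]` is a pitfall. A character class matches one character, either "." or "R", not the two-character sequence ".R".

Case-insensitive matching for the RAR pattern is an assumption.

```python
import re

# Patterns applied to the extension only, e.g. ".R05".
NUMBERED = re.compile(r"[.][0-9][0-9][0-9]")               # ".000" through ".999"
RAR_PART = re.compile(r"[.]R[0-9][0-9]", re.IGNORECASE)    # ".R00" through ".R99" (case-insensitive: assumption)
# Pitfall: [.R] is ONE character that is either "." or "R".
CLASS_TYPO = re.compile(r"[.R][0-9][0-9]")

def matches(pattern: re.Pattern, extension: str) -> bool:
    """fullmatch: the entire extension must match the pattern."""
    return pattern.fullmatch(extension) is not None

print(matches(NUMBERED, ".001"))              # whole extension matches
print(NUMBERED.search("show.123.avi"))        # unanchored search hits ".123" mid-name
print(matches(CLASS_TYPO, ".R05"))            # the class typo rejects ".R05"
print(sum(matches(RAR_PART, f".R{n:02d}") for n in range(100)))  # one pattern, 100 extensions
```

The last line checks ".R00" through ".R99" and counts how many the single RAR pattern accepts, so one entry stands in for 100 fixed ones.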

If I can figure out how to use regular expressions as the keys and compare them against the filename (name + extension), I could get some real filtering going.

Can anyone help with this?

The best approach I can think of (I don't know the syntax) would be to loop through the HashTable for each file and compare each key against the filename. I'm sure this would be a fairly large performance hit compared to a direct key lookup, where the key either doesn't exist or does and has instructions attached.
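A minimal sketch of the "loop through every key" idea follows. It assumes the regex strings are compiled once at startup rather than once per file. The rule table, its actions, and `find_action` are hypothetical. The cost is one pattern test per rule per file. With a few dozen patterns that is probably small next to the cost of copying or moving the file itself, though it is worth measuring rather than assuming.

```python
import os.path
import re
from typing import Dict, Optional

# Hypothetical rule table: regex source string -> action.
# Dicts keep insertion order, so the first listed pattern wins if several match.
PATTERN_RULES: Dict[str, str] = {
    r"\.avi": "Move",
    r"\.[0-9]{3}": "Copy",
    r"\.r[0-9]{2}": "Move",
    r"\.tmp": "Delete",
}

# Compile each pattern once, up front, instead of once per file.
COMPILED = [(re.compile(src, re.IGNORECASE), action) for src, action in PATTERN_RULES.items()]

def find_action(filename: str) -> Optional[str]:
    """Linear scan: test the file's extension against every pattern in order."""
    _, ext = os.path.splitext(filename)
    for pattern, action in COMPILED:
        if pattern.fullmatch(ext):
            return action
    return None
```

For example, `find_action("part.R07")` scans past `\.avi` and `\.[0-9]{3}` before the RAR pattern matches and returns "Move".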
 
Wow

Wow... this is a really cool question, and since no one has commented on it yet, I'll take a stab at it. Perhaps my answer, or others' replies, will help out the regex community, because I think this is a good application of regexes.

Setup:
* Application needs to decide what to do with a file based on file extension
* Lots of different file extensions being encountered
* Some are fixed and most others fit some pattern, e.g., *.R* for RAR files
* Very cumbersome to try and handle each possible extension individually

My comments:
* I'll assume (and you know what that does) that using patterns will greatly reduce the number of extension entries to be tracked. For example, the pattern \.R[0-9][0-9]$ matches all file extensions .R00 through .R99, reducing the count for that file extension from 100 entries to 1 entry.
* Based on that assumption, I believe you will be able to either eliminate the need for a hash table or use it more judiciously. Perhaps you could store the fixed extensions in the hash table; by fixed I mean ".avi", ".txt", etc. Check the extension against the hash table, and if it's found, you're done. If it isn't, start comparing it against a sequential list of regex patterns.
* This is where it gets subjective. The list of regex patterns probably won't be that big, but even if it is, you will most likely still want to go through all the entries. Since a given extension can match more than one pattern, you may need a resolution method. You might check the extension against all the patterns, and if more than one matches, do some more extensive extension analysis (no pun intended).
* At this point it's worth thinking about how many different regex patterns you might need to handle. If it's a lot, you might want to hash the patterns into a separate table keyed on the first character of the extension. This and other decisions will depend on analyzing the actual application environment, both now and as it evolves.
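Here is a Python sketch of the hybrid approach from the list above. It does three things:

1. An exact dictionary lookup handles the fixed extensions.
2. On a miss, every pattern is checked.
3. When several patterns match, a tie-breaker picks one.

The `priority` field, the actions, and the catch-all pattern `\.[a-z][0-9]{2}` are all hypothetical. The catch-all exists only so that ".R05" matches two patterns and the resolution step has something to do.

```python
import os.path
import re
from typing import Dict, List, NamedTuple, Optional

class PatternRule(NamedTuple):
    pattern: re.Pattern
    action: str
    priority: int  # hypothetical tie-breaker: lower number wins

# Fixed extensions: one O(1) dictionary lookup.
FIXED: Dict[str, str] = {".avi": "Move", ".txt": "Copy"}

# Patterns: only consulted when the fixed lookup misses.
PATTERNS: List[PatternRule] = [
    PatternRule(re.compile(r"\.[0-9]{3}"), "Copy", priority=2),
    PatternRule(re.compile(r"\.r[0-9]{2}"), "Move", priority=1),
    PatternRule(re.compile(r"\.[a-z][0-9]{2}"), "Skip", priority=3),  # hypothetical catch-all
]

def resolve(filename: str) -> Optional[str]:
    ext = os.path.splitext(filename)[1].lower()
    if ext in FIXED:                                  # step 1: exact match, done
        return FIXED[ext]
    hits = [r for r in PATTERNS if r.pattern.fullmatch(ext)]  # step 2: check ALL patterns
    if not hits:
        return None
    # step 3: several hits -> pick the rule with the lowest priority number
    return min(hits, key=lambda r: r.priority).action
```

Here "archive.R05" matches both the RAR pattern and the catch-all, and the priority rule picks "Move". "archive.P29" only matches the catch-all.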
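Here is a sketch of the first-character bucketing idea. It assumes each pattern is registered together with the literal character that all of its matches start with (after the dot). Patterns that could start with anything go in a catch-all bucket that is always scanned. `BUCKETS`, `add_rule` and `bucketed_lookup` are hypothetical names. The sketch makes no attempt to derive the first character automatically from a regex.

```python
import os.path
import re
from collections import defaultdict
from typing import DefaultDict, List, Optional, Tuple

Rule = Tuple[re.Pattern, str]

# Buckets keyed by the first character after the dot; None = "could start with anything".
BUCKETS: DefaultDict[Optional[str], List[Rule]] = defaultdict(list)

def add_rule(first_char: Optional[str], regex: str, action: str) -> None:
    """Hypothetical helper: the caller must supply the literal first character
    every match of `regex` starts with, or None if it can vary."""
    BUCKETS[first_char].append((re.compile(regex, re.IGNORECASE), action))

add_rule("r", r"\.r[0-9]{2}", "Move")
add_rule("p", r"\.p[0-9]{2}", "Move")
add_rule(None, r"\.[0-9]{3}", "Copy")

def bucketed_lookup(filename: str) -> Optional[str]:
    ext = os.path.splitext(filename)[1].lower()
    if len(ext) < 2:          # no extension, or just "."
        return None
    # Only the matching bucket plus the catch-all bucket are scanned.
    for pattern, action in BUCKETS.get(ext[1], []) + BUCKETS.get(None, []):
        if pattern.fullmatch(ext):
            return action
    return None
```

Using `.get` rather than indexing keeps lookups from adding empty buckets to the `defaultdict` for every unseen first character.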

Those are my ideas. I'm sure others have suggestions that can help us all think and learn more. :)

Regular Expression Documentation
 