String Regex match count performance

EDN Admin

I need a method that takes in a string (a genomic sequence in this case) and counts the number of times each character occurs. Pretty straightforward, but because this function is called so many times, performance is critical. I tried something like this:

    using System.Text.RegularExpressions; // required for Regex

    public int baseCount(string inSeq, out int aCount, out int gCount, out int cCount, out int tCount,
                         out int otherCount)
    {
        // Normalize case and strip surrounding whitespace (each call allocates new strings).
        string sequence = inSeq.ToLower().Trim();

        // One Regex per base, constructed on every call.
        Regex matchA = new Regex(@"a");
        Regex matchG = new Regex(@"g");
        Regex matchC = new Regex(@"c");
        Regex matchT = new Regex(@"t");

        // Each Matches(...).Count scans the whole sequence again.
        aCount = matchA.Matches(sequence).Count;
        gCount = matchG.Matches(sequence).Count;
        cCount = matchC.Matches(sequence).Count;
        tCount = matchT.Matches(sequence).Count;

        // Everything that is not a/g/c/t counts as "other".
        int totalBaseCount = aCount + gCount + cCount + tCount;
        otherCount = sequence.Length - totalBaseCount;
        return totalBaseCount + otherCount;
    }

And while it works correctly, it is slooow. Is regex matching just inherently slow, so that I have to live with it, or should I take an indexed approach and simply iterate over the string?

Any thoughts on how to speed this up?
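A minimal sketch of the "just iterate over the string" approach, assuming the goal is the same as the code above: count a/g/c/t case-insensitively after ignoring leading and trailing whitespace (the characters `string.Trim()` removes, i.e. `char.IsWhiteSpace`). It keeps the `baseCount` signature so it could be swapped in; the wrapping class name `SequenceStats` is hypothetical. It makes a single pass with a `switch`, so it avoids the two string copies from `ToLower()`/`Trim()`, the four `Regex` constructions per call, the four separate scans, and the `Match` objects that `Matches(...).Count` creates. No timings are claimed here; measure it against your own data.

```csharp
using System;

public static class SequenceStats // hypothetical container class
{
    // Single-pass replacement for the Regex version: counts a/g/c/t in either case,
    // ignoring leading/trailing whitespace the way Trim() would, without allocating.
    public static int baseCount(string inSeq, out int aCount, out int gCount, out int cCount,
                                out int tCount, out int otherCount)
    {
        if (inSeq == null) throw new ArgumentNullException(nameof(inSeq));

        aCount = gCount = cCount = tCount = otherCount = 0;

        // Find the trimmed range by index instead of creating a trimmed copy.
        int start = 0, end = inSeq.Length - 1;
        while (start <= end && char.IsWhiteSpace(inSeq[start])) start++;
        while (end >= start && char.IsWhiteSpace(inSeq[end])) end--;

        // One pass; upper- and lower-case letters share a branch, so no ToLower() copy is needed.
        for (int i = start; i <= end; i++)
        {
            switch (inSeq[i])
            {
                case 'a': case 'A': aCount++; break;
                case 'g': case 'G': gCount++; break;
                case 'c': case 'C': cCount++; break;
                case 't': case 'T': tCount++; break;
                default: otherCount++; break; // N, gaps, inner whitespace, etc.
            }
        }

        // Same value as the original: the length of the trimmed sequence.
        return end - start + 1;
    }
}
```

Usage: `int a, g, c, t, other; int total = SequenceStats.baseCount(seq, out a, out g, out c, out t, out other);`. If you do want to keep regexes elsewhere, storing them in `static readonly Regex` fields (optionally with `RegexOptions.Compiled`) at least avoids rebuilding them on every call, but it still leaves one full scan per base.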
