EDN Admin
Well-known member
Hey everyone:
I did some profiling on code we have, and found that 26% of the time is spent here:
<pre class="prettyprint" style=" if (hs2[x].Contains(arr[j]))[/code]
hs2 is a hashset array which contains strings of 2-digit numbers separated by commas.
EDIT: it is defined as:
<pre class="prettyprint HashSet<string>[] hs2[/code]
Eg: hs2[0] might contain the values "01", "23", "34", "35", "38", "44", "45"
arr[j] is another string of numbers that has been split. Each "j" is the 2-digit string number. So it could look like:
arr[1][0] = "02"
arr[1][1] = "21"
arr[1][2] = "26"
The numbers are always guaranteed to be in ascending order in both data structures. And the combinations are always guaranteed to be in ascending order as well.
Eg, "01,02,03,04,05,06"
"01,03,06,07,08,09"
"01,03,07,08,10,12"
etc etc.
The strings in hs2 will be of equal or greater length than the strings in arr[j]. For example, hs2 might contain 7 2-digit combos whereas arr[j] will be made up of 6-digit combos.
What we need to do is count how many numbers from every arr[j] are in _each_ hs2[x].
Originally our code used String.Contains but after doing performance benchmarking found that
Hashset.Contains is *significantly* faster .
There could potentially be up to 3 million that have to be compared, so speed is of the essence.
Does anyone have any thoughts they could share on a faster way to accomplish this? Or if theres a faster way to do a string comparison than the hashset.contains?
We have the 2 digit numbers as strings because if the number is less than 10 we need it represented as a 2 digit numeral (unless theres another way to do it without it being a string?).
Thanks!
<br/>
View the full article
I did some profiling on code we have, and found that 26% of the time is spent here:
<pre class="prettyprint" style=" if (hs2[x].Contains(arr[j]))[/code]
hs2 is a hashset array which contains strings of 2-digit numbers separated by commas.
EDIT: it is defined as:
<pre class="prettyprint HashSet<string>[] hs2[/code]
Eg: hs2[0] might contain the values "01", "23", "34", "35", "38", "44", "45"
arr[j] is another string of numbers that has been split. Each "j" is the 2-digit string number. So it could look like:
arr[1][0] = "02"
arr[1][1] = "21"
arr[1][2] = "26"
The numbers are always guaranteed to be in ascending order in both data structures. And the combinations are always guaranteed to be in ascending order as well.
Eg, "01,02,03,04,05,06"
"01,03,06,07,08,09"
"01,03,07,08,10,12"
etc etc.
The strings in hs2 will be of equal or greater length than the strings in arr[j]. For example, hs2 might contain 7 2-digit combos whereas arr[j] will be made up of 6-digit combos.
What we need to do is count how many numbers from every arr[j] are in _each_ hs2[x].
Originally our code used String.Contains but after doing performance benchmarking found that
Hashset.Contains is *significantly* faster .
There could potentially be up to 3 million that have to be compared, so speed is of the essence.
Does anyone have any thoughts they could share on a faster way to accomplish this? Or if theres a faster way to do a string comparison than the hashset.contains?
We have the 2 digit numbers as strings because if the number is less than 10 we need it represented as a 2 digit numeral (unless theres another way to do it without it being a string?).
Thanks!
<br/>
View the full article