EDN Admin
Well-known member
I am using fast RAID solid-state drives to store certain types of program-critical data. I am using multi-core CPUs that support eight dedicated hardware threads. Each server has 16GB of RAM, expandable to 32GB.
I have one large data set (~30M records, each consisting of a unique integer ID and a string of at most 20 characters) that needs to reside in memory due to performance requirements. Any part of the large data set must be accessible at any time; incremental loading is NOT an option. Most of the time, access will be read-only, but writes will also occur at a lower frequency. When a write occurs, the change must be reflected in a write to disk prior to acknowledging the write to the calling process. The data set must be read-accessible by up to eight threads. A write can lock the data for a short time period (100-200 µs, not milliseconds).
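Roughly, what I have in mind for the write path looks like the sketch below (the class name, log path, and append-only record format are placeholder assumptions, not a finished design): each write is flushed to disk before the in-memory dictionary is updated and the call returns, while reads stay lock-free.

// Minimal sketch of the write-through requirement; names and log format are illustrative.
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Text;

public sealed class WriteThroughStore : IDisposable
{
    private readonly ConcurrentDictionary<int, string> _map = new ConcurrentDictionary<int, string>();
    private readonly FileStream _log;
    private readonly object _writeLock = new object();

    public WriteThroughStore(string logPath)
    {
        // WriteThrough asks the OS to push writes to the device rather than cache them.
        _log = new FileStream(logPath, FileMode.Append, FileAccess.Write,
                              FileShare.Read, 4096, FileOptions.WriteThrough);
    }

    // Reads are lock-free; ConcurrentDictionary supports concurrent readers.
    public bool TryGet(int id, out string value) => _map.TryGetValue(id, out value);

    // Writes take a short lock so the disk record and the in-memory update
    // are acknowledged together.
    public void Put(int id, string value)
    {
        lock (_writeLock)
        {
            byte[] record = Encoding.UTF8.GetBytes($"{id}\t{value}\n");
            _log.Write(record, 0, record.Length);
            _log.Flush(true);               // do not acknowledge until the data is on disk
            _map[id] = value;
        }
    }

    public void Dispose() => _log.Dispose();
}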
I also have several smaller data sets that must be cached in memory, but these are read-only in normal use. The smaller files are regenerated periodically, but the frequency is quite low. Read access will be locked while the in-memory copies are updated.
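For the smaller read-only sets, one pattern I am considering (again only a sketch, with made-up names) is to rebuild the replacement dictionary off to the side and then swap the reference, so readers never actually block during the refresh:

// Sketch: readers always see a complete, immutable snapshot; the refresh
// builds a fresh dictionary and replaces the reference atomically.
using System.Collections.Generic;

public sealed class SnapshotCache
{
    private volatile Dictionary<int, string> _current = new Dictionary<int, string>();

    public bool TryGet(int id, out string value) => _current.TryGetValue(id, out value);

    // Called at the (low) regeneration frequency.
    public void Reload(IEnumerable<KeyValuePair<int, string>> records)
    {
        var fresh = new Dictionary<int, string>();
        foreach (var kv in records)
            fresh[kv.Key] = kv.Value;

        _current = fresh;   // reference assignment is atomic; the old snapshot is garbage-collected
    }
}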
ConcurrentDictionary looks like it would provide the kind of read/write access that I need, but I am not sure how feasible it is to create a memory-mapped file for a ConcurrentDictionary. I can set aside up to 8GB of memory for the large data set, which looks reasonable considering the overhead for the dictionary and the actual data. I can also set the initial maximum size of the memory-mapped file so that it will not have to resize itself, and I can implement a lock for any writes to the dictionary so that a clean snapshot can be saved to disk.
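By pre-sizing, I mean something along these lines (the path and the 2GB capacity are placeholder values): the backing file is extended once, up front, so the mapping never has to grow.

// Sketch of creating a fixed-capacity memory-mapped file.
using System.IO;
using System.IO.MemoryMappedFiles;

static class MmfSetup
{
    public static MemoryMappedFile OpenFixedSizeMap(string path)
    {
        const long capacity = 2L * 1024 * 1024 * 1024;   // e.g. 2GB, fixed at creation

        // Third argument is the optional map name; an explicit capacity extends
        // the file once, up front, so it never has to be resized later.
        return MemoryMappedFile.CreateFromFile(
            path, FileMode.OpenOrCreate, null, capacity,
            MemoryMappedFileAccess.ReadWrite);
    }
}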
Looking at memory-mapped files, only a changed segment would actually be written to disk. Since I use SSDs for this purpose, it seems like this would be a very fast, efficient approach. If I understand the mapping correctly, each segment (page) is 4KB. Since only one segment would change per write, the write-and-flush cycle should not take much more than 100-200 µs.
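To make the "one page per write" idea concrete, the following sketch (the fixed 24-byte record layout and the helper name are my assumptions) maps only the page that contains the record, writes it, and flushes just that view:

// Sketch: update one record in place and flush only the page(s) it touches.
using System;
using System.IO.MemoryMappedFiles;
using System.Text;

static class PageWriter
{
    const int RecordSize = 24;        // 4-byte int ID + 20-byte UTF-8 string slot
    const int PageSize = 4096;        // typical OS page size

    public static void WriteRecord(MemoryMappedFile mmf, long recordIndex, int id, string text)
    {
        long offset = recordIndex * RecordSize;
        long pageStart = (offset / PageSize) * PageSize;

        // Map just the page(s) touched by this record, so the flush stays small.
        using (var view = mmf.CreateViewAccessor(pageStart, PageSize + RecordSize,
                                                 MemoryMappedFileAccess.ReadWrite))
        {
            long local = offset - pageStart;
            view.Write(local, id);

            byte[] bytes = Encoding.UTF8.GetBytes(text);
            if (bytes.Length > 20) throw new ArgumentException("string too long");
            view.WriteArray(local + 4, bytes, 0, bytes.Length);

            view.Flush();             // asks the OS to write the dirty page(s) back to the file
        }
    }
}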
First, are there reasons why the above approach would not be possible using a ConcurrentDictionary? If there are serious issues, what is a more workable alternative?
Second, does anyone have practical experience with doing something like this? I have not implemented this approach yet, and could definitely benefit from the experience of others who have already trodden this path.
Third, does anyone have any pointers, links, or suggestions that I should consider?
Thanks…
Warren