Decreasing throughput & increasing CPU when writing a huge file

CrashHunter (Guest)
I have a Windows 2003 Server x64 Enterprise Edition with SP2, 4 GB of RAM, and
an application writing a huge file.
The write throughput is quite good in the beginning (~33 MB/s) but it keeps
decreasing, while the CPU usage keeps increasing. In the beginning, kernel CPU
time (both in Task Manager and the Processor\% Privileged Time counter in
Performance Monitor) is pretty low, but it keeps increasing. Total CPU starts
at ~50%, with the kernel taking ~8%; later on, total CPU reaches 80-90% with
the kernel using almost all of that, and at that point the throughput is very
low.
Other numbers: the System Cache (in Task Manager) reaches very soon 3.5 GB
and it stays at that value, but the one that seems to be the problem is the
Paged Pool, which keeps increasing.
In poolmon, I can see that Mmst is the one that keeps increasing and it does
not free the memory unless I stop the write to file. Before starting the
process the Mmst uses 1.8MB, while after 1h:20min it uses 200 MB (at that
point, the Total CPU avarage is 74, with 48% in Privileged mode).
I read some information about the paged pool (including KB304101), but most
of it applies to the x86 version, which has a limited maximum pool size
(around 460 MB). On my computer (x64) the size should not be a problem (the
maximum is 128 GB), and I do not get errors, but the performance steadily
goes down, even with the Paged Pool under 100 MB!
I do not have anything else running on this computer and the behavior is
reproducible every time. As soon as I stop the process, the System Cache and
Paged Pool memory usage go down, so there is no memory leak.
My application writes data to disk using the regular WriteFile API, with an
OVERLAPPED structure to write asynchronously. It writes a buffer, processes
the next one, and then waits for the previous write to complete before
issuing another write request.
I also tried using the FILE_FLAG_WRITE_THROUGH flag when opening the file;
the general behavior is similar (increasing Paged Pool, increasing kernel-mode
CPU usage, and decreasing throughput), with some differences: the starting
throughput is much lower (~6 MB/s), and it goes down slightly more slowly
than in the other scenario.
 
RE: Decreasing throughput & increasing CPU when writing a huge file

I can reproduce the same behavior even with a simple tool just writing random
data to a file (256 GB). This tool only uses synchronous WriteFile. It takes
a little longer than with the async one, but the behavior follows the same
pattern:
After running it for ~3h, the speed is ~7.5 MB/s, the processor time is 77%,
of which 60% is in privileged mode, the total kernel memory is 384 MB (351 MB
of it paged), and the MmSt pool uses 317 MB.

Any suggestion would be greatly appreciated.
 
RE: Decreasing throughput & increasing CPU when writing a huge file

Another update (the simple app writing random data):
After 6 hours, the average speed is ~5.5 MB/s, the processor time is 81.8%,
of which 69% is in privileged mode, the total kernel memory is 510 MB (474 MB
of it paged), and the MmSt pool uses 443.5 MB.
 
Re: Decreasing throughput & increasing CPU when writing a huge file

I'm not really qualified to make assumptions - just guesses. If your HD(s)
and subsystem are relatively modern and properly configured (it is many
years since I've seen figures as low as 33 MB/s!) then I would suspect your
application.

I don't like all this buffer shuffling; you shouldn't have to do that. If
you know the size of the file, in my day you just created a file and filled
it up (the system serves you the buffers it needs); if you don't know the
size, you employ some iterative programming (do this - do that - do it all
again until you're finished). I'm sorry, but it looks like you have been
working hard to make a simple job infinitely more complicated, and
succeeded. ;0)

But, I really don't feel qualified to pass judgements.

(What is your Hardware?)

What figures do you get if you run a HD benchmark like HDTach or HD Tune?


Tony. . .
 
Re: Decreasing throughput & increasing CPU when writing a huge file

The hardware is pretty old; the specs are:
- one dynamic volume striped over 2 x 900 GB RAID5 disk subsystems
- HBA: QLogic QLA2340 PCI Fibre Channel Adapter
- SAN: Metastore (emulating IBM 3526 0401) with 2 x LSI (INF-01-00)
controllers, with a total of 30 x 72 GB Seagate 10K SCSI disks

The HDTach results (read-only, since it is the trial version):
- Random access: 10.5 ms
- CPU utilization: 1%
- Avg speed: 44.1 MB/s

The HD Tune results:
- Transfer rate (min/max/avg): 13.7/24.2/20.6 MB/s
- Burst speed: 51.2 MB/s
- Access time: 10.2 ms

The results were similar for each of the two disk subsystems.

Regarding making a simple job complicated: I started from the existing
implementation of our application and ended up with a simple application with
just a loop, generating random data and writing it to the disk (just simple
synchronous WriteFile calls, no extra buffering or anything else).

I restarted the test writing to the local 250 GB WDC WD2500KS-00MJB0 SATA
disk to eliminate a potentially un-optimized SAN configuration and, again,
the behavior follows the same pattern: the write started at ~22.5 MB/s and
~1% CPU in privileged mode; after writing ~50 GB, the speed dropped to 20.2
MB/s, with 7.9% CPU in privileged mode and a paged pool of 150 MB (~123 MB
in the MmSt pool).

Again, the tool does something like:

    while (size less than the target) {
        generate buffer with random bytes
        write buffer to file
    }
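The loop above can be reproduced as a small self-contained C program. This is an illustrative stand-in, not the actual test tool: it uses portable buffered fwrite instead of the synchronous WriteFile calls the tool used, and the function name and parameters are made up.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write `target` bytes of pseudo-random data to `path` using plain
 * synchronous, buffered writes, one `bufsize`-byte chunk at a time.
 * Returns 0 on success, -1 on any allocation or I/O error. */
static int write_random_file(const char *path, long long target, size_t bufsize)
{
    unsigned char *buf = malloc(bufsize);
    if (!buf)
        return -1;

    FILE *f = fopen(path, "wb");
    if (!f) {
        free(buf);
        return -1;
    }

    long long written = 0;
    while (written < target) {
        /* generate buffer with random bytes */
        for (size_t i = 0; i < bufsize; i++)
            buf[i] = (unsigned char)rand();

        /* clamp the final chunk so we hit the target size exactly */
        size_t chunk = bufsize;
        if ((long long)chunk > target - written)
            chunk = (size_t)(target - written);

        /* write buffer to file */
        if (fwrite(buf, 1, chunk, f) != chunk) {
            fclose(f);
            free(buf);
            return -1;
        }
        written += chunk;
    }

    fclose(f);
    free(buf);
    return 0;
}
```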

The only thing I can point to now is the OS; it either needs some fine
tuning or it has a problem...



 
Re: Decreasing throughput & increasing CPU when writing a huge file

I have some more news:
- I changed PagedPoolSize to 2 GB and PoolUsageMaximum to 5 (therefore
100 MB), hoping to see a difference. Although these changes took effect
(the MmSt pool was kept under 100 MB - about 98 MB), the trend was exactly
the same! After writing 160 GB in about 3h20m, the CPU utilization in
privileged mode is over 60% and the speed has been steadily going down.
The write is still going to the local SATA drive (an empty volume), with no
other applications running (except the monitoring ones).
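For anyone following along, those two values live under the Memory Management key and take effect after a reboot; a .reg fragment matching the settings described (2 GB paged pool, trimming at 5%) would look like this:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management]
"PagedPoolSize"=dword:80000000
"PoolUsageMaximum"=dword:00000005
```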

I am out of ideas here. Can anybody give me some (constructive) suggestions?

 
Re: Decreasing throughput & increasing CPU when writing a huge file

As I said, I am no good in a server environment, but I would certainly
expect an older HD system to be more of a bottleneck on a faster machine.
The phenomenon of the data throughput slowing down is pretty much standard,
I believe. I've never run a benchmark for the length of time you are
employing, but a 30-40% slowdown over a few minutes is what I would expect
on a standard IDE system. My own current SATA/RAID0 shows a nearly flat
curve over a few minutes' time, hovering around 100 MB/s.

You might consider tweaking your machine's use of resources depending on
whether you are running those tests in the foreground or background. I would
make sure I had plenty of swap, and I would check the signal cables to those
disks if they are of the same generation. Temperature might also be an
issue with HDs working hard over long periods.

I believe, too, that servers have an option to tweak the system cache far
more than the Pro editions I'm used to.

In short, what you are seeing may be quite natural - but you may be able to
beat more performance out of it.

(I suggest paying a visit to the Knowledge Base - go there and search for
"system cache"; quite a few hits there. Something might lead you further?)


Tony. . .
 