Andrew McLaren
Guest
Re: For the love of God.. Help me!
"Michichael" <Michichael@discussions.microsoft.com> wrote in message
news:542451A7-178A-4CEB-ADAA-37F64699CF27@microsoft.com...
> Anyway, little update. Got another PAGE_FAULT_IN_NONPAGED_AREA from
> dxgkrnl.sys, tried calling Microsoft. The rep effectively repeated a mantra
> This time it was trying to play Counterstrike: Source.
> So this isn't just limited to Halo... BSOD's so far today (got on it 3 hours
> ago...): 6.
I guess you're running latest ForceWare drivers (163.69 as of today). I'm
also assuming you've trawled the knowledgebase at Nvidia, read their online
help files, etc and not found any solutions yet. So we'll go straight to the
heavy-duty troubleshooting. You can take 2 approaches. They're not entirely
mutually exclusive.
Firstly, take the system back to a minimal config baseline that works. From
there, you incrementally add or adjust configuration items, one by one, to
bring the system back in line with its present configuration. After each
adjustment, exercise the system in a way which should reproduce the error
(ie play Halo2, I guess). The point at which you start to see the problem
re-appear, will give you a good clue where the cause of the problem may lie.
Here's what you'd do, in a serious industrial setting. Since it's a home
machine, you might choose to be a bit less disciplined; although each
departure diminishes the fidelity of the exercise (and possibly, turns it
into a waste of time if you get too cavalier). Much patience is required.
- first, back up your user data
- remove as many peripheral devices as you can - printers, cameras, sound
cards, scanners etc. We want CPU, memory, graphics card, and one hard disk;
that's all
- if the machine has been overclocked, take *everything* back to the default
factory settings - CPU, bus, graphics processor, the lot.
- re-install Vista from scratch, from original media, reformatting the hard
disk and avoiding any third party drivers during the installation process
(only use the Microsoft-supplied drivers).
- you now have a very plain, vanilla installation of Vista. Performance
might be less than what you'd like; but our goal here is stability, not
performance! (not yet, anyway).
- install Halo2, as the tool with which to exercise the system.
- reproduce the problem scenario; eg, play Halo2 for >30 minutes and verify
that it does not crash (this is the painful part: you need to play games for
at least 30 minutes! :-) )
If it still crashes in this very vanilla environment, then you have a
fundamental hardware problem with your machine. It needs to be examined by a
skilled computer technician. I mean someone with a certificate in
electronics engineering, or similar, who can use an oscilloscope, logic
probes etc - not just a PC enthusiast who reads Maximum PC (fine publication
that it is).
Assuming that Halo2 does run okay, start changing your config back to how it
was. It is very important you only change one thing at a time, and then test
after each change. For example:
- confirm the new, clean install of Vista runs okay.
- then run Windows Update to patch your machine to the current revision
level. Test again.
- run DXDiag and export a report of your settings ("Save all Information"),
for future reference and comparison.
- next, install the current Nvidia-supplied Forceware drivers. Test again.
- next, re-attach your peripheral devices, one by one. Exercise the system
in between each, to verify the system continues to run normally. You might
need to spread this over a few days.
- install any additional vendor-supplied drivers for your various devices.
Test system again.
- install your normal user applications eg Office, Photoshop, etc. Avoid
installing any apps which install kernel-mode drivers; we want to stick to
user mode stuff, for now.
- exercise the system. This is a fairly good baseline: a plain installation
of Vista, Nvidia-supplied graphics drivers and general user apps. Hopefully
the system is still stable, at this point.
- now install any apps which include kernel mode drivers. Test the system
again.
- assuming you want to return to an overclocked configuration, you can start
overclocking again, now. But, don't leap straight to the maximum overclock -
just ramp up the CPU a little bit, and then test. Then increase a little bit
more, and test the system. Then change your memory timing settings, if
that's what you wish ... but again, don't go straight to an aggressive
setting, just moderate - and test the system again.
At some point, the system may start to fail. Observe the last change you
made to the system. If possible, roll back that change (eg uninstall driver,
decrease OC setting, etc) and check that the system returns to stability. Be
aware that not all changes are reversible - in other words, they might be
one-way: even uninstalling the change won't return the system back to a
working state. If that's what you encounter, you may need to repeat the
whole loop, stopping at the point just short of where problems appeared last
time round.
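The loop described above can be sketched in a few lines of Python. This is only an illustration of the "one change at a time, test after each" discipline: the function name, the change names and the two callbacks are all hypothetical, and in real life "apply" and "test" are things you do by hand (install a driver, play Halo2 for 30 minutes).

```python
from typing import Callable, Iterable, Optional


def first_failing_change(
    changes: Iterable[str],
    apply_change: Callable[[str], None],
    is_stable: Callable[[], bool],
) -> Optional[str]:
    """Apply changes one at a time, testing the system after each one.

    Returns the name of the first change after which the stability test
    fails, or None if the system stays stable through every change.
    """
    # The clean baseline must pass first; if it does not, the problem
    # is not in any of the later changes (suspect hardware).
    if not is_stable():
        raise RuntimeError("baseline is not stable; suspect hardware")

    for change in changes:
        apply_change(change)      # exactly one change per iteration
        if not is_stable():       # re-run the reproduction scenario
            return change         # the last change made is the prime suspect
    return None
```

Stopping at the first failure is deliberate: once the system is unstable, any further changes only muddy the picture.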
This approach is empirical, and draws on the traditions of root cause
analysis (in the precise engineering sense, not the loose vernacular sense
of "root cause").
The second approach is to be analytically diagnostic: get a memory dump of
the crash, and analyse it.
For this, you need to install the Windows Debugging Tools. You can download
these from here:
http://www.microsoft.com/whdc/devtools/debugging/default.mspx
You'll also need a symbol path, so WinDBG can find the debug symbols from
Microsoft's public symbol server. In Control Panel, System, Advanced System
Settings, define an environment variable called "_NT_SYMBOL_PATH" (with an
initial underscore). Assign it the value of
"srv*C:\Symbols*http://msdl.microsoft.com/download/symbols". This will tell
debuggers on your system to download symbols from
http://msdl.microsoft.com/download/symbols, and store them in a directory on
your hard disk called "C:\Symbols". If you do a SET command at the prompt,
you should see this in the output:
_NT_SYMBOL_PATH=srv*C:\Symbols*http://msdl.microsoft.com/download/symbols
For more background on configuring the Debug Tools, see:
http://www.microsoft.com/whdc/devtools/debugging/debugstart.mspx
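To make the structure of that symbol path value a bit clearer, here is a small Python sketch that pulls it apart. It assumes the simple three-part "srv*cache*server" form shown above; the function and class names are my own invention, and real symbol paths can contain several entries and other forms which this does not attempt to handle.

```python
from typing import NamedTuple


class SymbolServerEntry(NamedTuple):
    cache_dir: str   # local directory where downloaded symbols are stored
    server: str      # symbol server the debugger downloads from


def parse_srv_entry(value: str) -> SymbolServerEntry:
    """Split a single 'srv*<cache>*<server>' entry into its two parts."""
    parts = value.split("*")
    # Only the three-part form used in this post is handled here.
    if len(parts) != 3 or parts[0].lower() != "srv":
        raise ValueError(f"expected srv*<cache>*<server>, got {value!r}")
    return SymbolServerEntry(cache_dir=parts[1], server=parts[2])
```

In practice you would feed it the value of os.environ["_NT_SYMBOL_PATH"] and check that the cache directory and server are what you intended.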
If you're lucky, the system will still have the mini-dumps from your
previous crashes. These are stored in a location like
"C:\Users\<username>\AppData\Local\Temp\WER90FE.tmp\minidump.mdmp", where
the "90FE" part of the path will vary, for each different crash. Note that
AppData is normally a hidden directory.
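If you'd rather not hunt through the hidden folders by hand, something like the following Python sketch would list the candidates. It assumes the "WER<something>.tmp\minidump.mdmp" layout described above; the function name is hypothetical, and you pass in the Temp directory yourself.

```python
from pathlib import Path
from typing import List


def find_wer_minidumps(temp_dir: Path) -> List[Path]:
    """Return minidump.mdmp files found in WER*.tmp folders, newest first."""
    # The part between "WER" and ".tmp" differs for each crash,
    # so match it with a wildcard.
    dumps = temp_dir.glob("WER*.tmp/minidump.mdmp")
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)
```

For example, find_wer_minidumps(Path.home() / "AppData" / "Local" / "Temp") would search the location mentioned above for the current user.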
If you have no minidump.mdmp files currently on your system, go to Control
Panel, System, Advanced System Settings, Startup and Recovery, Settings, and
configure a specific location for your memory dumps in the "System Failure"
box (eg C:\Dumps, or similar).
A full debug of a memory dump is a complex task, which requires extensive
specialised knowledge. Fortunately, some of this knowledge has been
automated in WinDBG's "analyze" command.
- Run WinDBG from the Windows Debugging Tools in Start menu;
- go to File menu, Open Crash Dump, to open one of the minidump.mdmp or
memory.dmp files on your machine.
- when the dump file is opened, WinDBG will display a message similar to the
following (you'll have a different exception code):
Microsoft (R) Windows Debugger Version 6.6.0007.5
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [C:\Users\Someguy\AppData\Local\Temp\WER90FE.tmp\minidump.mdmp]
User Mini Dump File: Only registers, stack and portions of memory are available
Windows Vista Version 6000 MP (2 procs) Free x64
Product: WinNt, suite: SingleUserTS
Debug session time: Sun Aug 5 15:55:28.000 2007 (GMT+10)
System Uptime: 0 days 18:45:04.965
Process Uptime: 0 days 0:00:07.000
Symbol search path is: srv*C:\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
.............
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(a84.1014): Access violation - code c0000005 (first/second chance not available)
00000000`002e4a30 c3 ret
Now, at the command line at the bottom of the WinDBG window, enter the
command "!analyze -v". That's an exclamation mark, followed by the word
"analyze" spelt in the American fashion with a "z", then a space, and a
dash, and a lower-case v.
WinDBG will chug away for a minute or two - you will also see some network
activity, as it downloads the debug symbols from the symbol server. It will
then display a diagnostic report, making a reasonable guess at the faulty
module. To give a big headstart to any troubleshooting, include this report
in any problem reports to Nvidia, Microsoft, newsgroup forums etc.
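If you have several dumps to get through, the same analysis can be run without the GUI, using the console debugger (cdb.exe) from the same Debugging Tools package. The Python sketch below only builds the command line; it assumes cdb.exe is on your PATH, and the function name and default values are mine. The switches used are -z (open a dump file), -y (symbol path) and -c (commands to run on startup).

```python
from typing import List, Optional


def build_analyze_command(
    dump_path: str,
    symbol_path: Optional[str] = None,
    debugger: str = "cdb.exe",   # assumption: cdb.exe is on the PATH
) -> List[str]:
    """Build a cdb command line that runs '!analyze -v' on a dump and quits."""
    cmd = [debugger, "-z", dump_path]
    if symbol_path:
        # Without -y, the debugger falls back to _NT_SYMBOL_PATH.
        cmd += ["-y", symbol_path]
    # Run the analysis, then "q" so the debugger exits when it is done.
    cmd += ["-c", "!analyze -v; q"]
    return cmd
```

You could hand the resulting list to subprocess.run(cmd, capture_output=True, text=True) and save the captured output to a text file, ready to paste into a problem report.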
Dxgkrnl.sys, which is referred to in several of your crashes, is one half of
a port/miniport driver pair. That's to say, the DirectX graphics have 2 main
components - a bunch
of functionality which is common to all drivers from all vendors, and so is
written one time for everyone by Microsoft (that's dxgkrnl.sys); and a
vendor-supplied driver, which contains the functionality specific to each
vendor's hardware (for Nvidia, this will be nvlddmkm.sys). Like all miniport
drivers, there is an unusually close symbiosis of the Microsoft-supplied and
vendor-supplied drivers - so a crash in one, can easily be caused by a
problem in the other. The fact you're seeing crashes in dxgkrnl.sys is
interesting, but ... this is some of the most heavily exercised code out
there. Every Vista machine is hammering this driver all day, every day.
There could certainly be many as-yet undiscovered bugs in this driver! But
if the cause of the crashes is a bug in dxgkrnl.sys, there would need to be
some fairly unusual condition on your machine which is exposing the bug,
when it does not seem to occur with anything like the same frequency on most
other machines. Isolating that unusual condition may also provide you with a
workaround solution, even if there isn't a hotfix (yet) for the bug.
PAGE_FAULT_IN_NONPAGED_AREA can often be caused by faulty hardware. But
since you're seeing a combination of STOP 0x50 and STOP 0x3B, I think it's
more likely to be a buggy driver, passing bad data as it makes the
transition from User Mode to Kernel Mode (graphics drivers are especially prone
to this). That's the "System Service" referred to in the STOP message (ie,
not a "service" as in a process controlled by the Windows Service Manager,
but a "service call" by the operating system, to request a kernel function).
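Since the mix of STOP codes is part of the evidence, it's worth keeping a tally as the crashes accumulate. A rough Python sketch, assuming you note down the bug check code from each blue screen: 0x50 is PAGE_FAULT_IN_NONPAGED_AREA and 0x3B is SYSTEM_SERVICE_EXCEPTION, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Names for the two bug check codes discussed in this thread.
BUGCHECK_NAMES = {
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x3B: "SYSTEM_SERVICE_EXCEPTION",
}


def tally_stop_codes(codes: Iterable[int]) -> Dict[str, int]:
    """Count how often each STOP code was seen, keyed by name where known."""
    counts = Counter(codes)
    # Codes not in the table are reported as plain hex, eg "0x7E".
    return {
        BUGCHECK_NAMES.get(code, f"0x{code:X}"): n
        for code, n in counts.items()
    }
```

A tally dominated by one code points in a different direction than an even spread of several, so it's useful to include alongside the debugger report.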
Obviously there's a lot of work here .. but if there's no ready-to-go answer
to your problem, this is the way I'd tackle it. Other folks may have
additional ideas.
Good luck,
--
Andrew McLaren
amclar (at) optusnet dot com dot au