March 17, 2012

TDR: Helpful. Next Stop: Video Card

After I disabled TDR, the crashes still happened, but more slowly.  Previously, the video feed shut off quickly and the last 1/2 second of audio looped as a buzz.  Now the video feed shut off while the audio kept playing, with some distortion, and the keyboard still responded.  I was unable to regain control, though: after about 10 seconds the keyboard became unresponsive and the audio eventually degraded into the same buzzing.

At least something about the crashing changed, and it means... something.  Previous changes only affected how often the crashes happened; this is the first time I was able to change how the crash plays out.  Another interesting change: I was able to get a single Windows Event Log entry:

Log Name: System
Source: nvlddmkm
Event ID: 14
Level: Error
Keywords: Classic
Task Category: None

Binary data:
In Words
0000: 00000000 00300002 00000000 C0AA000E
0008: 00000000 00000000 00000000 00000000
0010: 00000000 00000000

In Bytes
0000: 00 00 00 00 02 00 30 00   ......0.
0008: 00 00 00 00 0E 00 AA C0   ......ªÀ
0010: 00 00 00 00 00 00 00 00   ........
0018: 00 00 00 00 00 00 00 00   ........
0020: 00 00 00 00 00 00 00 00   ........

The description for Event ID 14 from source nvlddmkm cannot be found.
Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information
had to be saved with the event.

The following information was included with the event:

An uncorrectable double bit error (DBE) has been detected on GPU (03 03 03).
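As a side note, the "In Words" and "In Bytes" views in the entry above are the same 40 bytes shown two ways: each word is four bytes read in little-endian order. Here is a minimal sketch in Python that redoes that conversion from the bytes copied out of the entry. The function name `decode_words` is my own, and the observation about the low 16 bits is just arithmetic on the dump, not a documented description of how the NVIDIA driver lays out this data.

```python
import struct

# Bytes copied from the "In Bytes" view of the event above (40 bytes total).
RAW = bytes.fromhex(
    "00 00 00 00 02 00 30 00 "
    "00 00 00 00 0E 00 AA C0 "
    + "00 " * 24
)


def decode_words(raw: bytes) -> list[int]:
    """Interpret the buffer as little-endian 32-bit words (hypothetical helper)."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}I", raw[: count * 4]))


words = decode_words(RAW)

# Print in the same style as the "In Words" view, four words per row.
for i in range(0, len(words), 4):
    print(" ".join(f"{w:08X}" for w in words[i : i + 4]))

# Observation only: the low 16 bits of the fourth word equal the Event ID (14).
print("low 16 bits of word 3:", words[3] & 0xFFFF)
```

The two nonzero words come out as 00300002 and C0AA000E, matching the "In Words" view, so nothing extra is hiding in the byte dump.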

What this means to me is that the crash is centered on the video card, probably bad memory chips on the card.  I ran GPU-Z and it reported 68% for the ASIC quality.  I'm not clear on what that means, but the general consensus on the Internet is that the higher the percentage, the higher the quality of the video card build.

Shortly after this, with TDR still disabled, I saw the same crash in Grid.

Conclusion #1: Having TDR enabled is good, because it catches most crashes, preventing the system from going down.

Conclusion #2: Having TDR enabled is bad, because it hides these crashes from us, and only when a crash is not recoverable (e.g. in Skyrim) do we think there is a problem.

Obviously #2 sends us down some crazy rabbit holes that, while technically related, are really not something I can do anything about.

I've sent in an RMA request for the video card.  So we'll see what happens next...
