INTERACT FORUM

Author Topic: [Solved] Bluescreens after 20-30 mins video playback  (Read 5987 times)

InflatableMouse

« on: January 09, 2013, 02:26:02 pm »

Although the issue is resolved I wanted to report it nonetheless, hopefully helping others experiencing similar issues. I've actually seen several reports of bluescreens, one of which was due to an overheating CPU during playback.

After I rebuilt my HTPC I started getting bluescreens (STOP errors), 0x116 to be specific. This code points to a problem with the graphics card or its driver, and is often blamed on a defective card. I've read about people replacing their cards without properly investigating the issue first.

A bluescreen should create a minidump file on the boot volume in the \Windows\Minidump folder. This doesn't happen when the option to create minidumps is disabled, or when the pagefile on the boot volume (most often C:\) is less than 300MB. These minidump files can be analysed and give clues about the issue, the offending process and lots of other very technical details about the crash, most of which are beyond my understanding as well. At osronline.com you can upload a minidump file and it will analyse it for you.
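If you're not sure whether a minidump was written at all, a few lines of Python can list what's in that folder. This is only a sketch: the function name is made up, it assumes the Windows folder is wherever the SystemRoot environment variable points (falling back to C:\Windows), and it simply looks for files ending in .dmp.

```python
import os
from pathlib import Path


def list_minidumps(folder=None):
    """Return the .dmp files in `folder`, newest first."""
    if folder is None:
        # Assumption: Windows is installed where SystemRoot says it is.
        system_root = os.environ.get("SystemRoot", r"C:\Windows")
        folder = Path(system_root) / "Minidump"
    folder = Path(folder)
    if not folder.is_dir():
        return []  # folder is missing, so there is nothing to list
    dumps = [p for p in folder.iterdir() if p.suffix.lower() == ".dmp"]
    # Sort on modification time so the most recent crash comes first.
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for dump in list_minidumps():
        print(dump)
```

An empty result means either no crash was recorded or one of the conditions above prevented the dump from being written. Reading the folder may need an elevated prompt.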

Here's a part of my minidump analysis:
Code: [Select]
Crash Dump Analysis provided by OSR Open Systems Resources, Inc. (http://www.osr.com)
Online Crash Dump Analysis Service
See http://www.osronline.com for more information
Windows 7 Kernel Version 7601 (Service Pack 1) MP (4 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 7601.17944.amd64fre.win7sp1_gdr.120830-0333
Machine Name:
Kernel base = 0xfffff800`02c0d000 PsLoadedModuleList = 0xfffff800`02e51670
Debug session time: Wed Jan  9 09:47:53.047 2013 (UTC - 5:00)
System Uptime: 0 days 18:39:14.875
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: fffffa801037e290, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff880055b6adc, The pointer into responsible device driver module (e.g. owner tag).
Arg3: ffffffffc000009a, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 0000000000000004, Optional internal context dependent data.

Debugging Details:
------------------

TRIAGER: Could not open triage file : e:\dump_analysis\program\triage\modclass.ini, error 2

FAULTING_IP:
nvlddmkm+929adc
fffff880`055b6adc ??              ?

DEFAULT_BUCKET_ID:  GRAPHICS_DRIVER_TDR_FAULT

BUGCHECK_STR:  0x116

PROCESS_NAME:  System

CURRENT_IRQL:  0

STACK_TEXT: 
fffff880`06ddca48 fffff880`04546000 : 00000000`00000116 fffffa80`1037e290 fffff880`055b6adc ffffffff`c000009a : nt!KeBugCheckEx
fffff880`06ddca50 fffff880`04519867 : fffff880`055b6adc fffffa80`0e0c1000 00000000`00000000 ffffffff`c000009a : dxgkrnl!TdrBugcheckOnTimeout+0xec
fffff880`06ddca90 fffff880`04545e0f : fffffa80`ffffd84d fffff800`00b96080 fffffa80`1037e290 00000000`0000000f : dxgkrnl!DXGADAPTER::Reset+0x2a3
fffff880`06ddcb40 fffff880`04437ec1 : fffffa80`0f86d110 00000000`00000080 00000000`00000000 fffffa80`0d36a270 : dxgkrnl!TdrResetFromTimeout+0x23
fffff880`06ddcbc0 fffff800`02f22e5a : 00000000`0247fde8 fffffa80`0d1d9060 fffffa80`0c781740 fffffa80`0d1d9060 : dxgmms1!VidSchiWorkerThread+0x101
fffff880`06ddcc00 fffff800`02c7cd26 : fffff800`02dfee80 fffffa80`0d1d9060 fffff800`02e0ccc0 fffffa80`0cee9da0 : nt!PspSystemThreadStartup+0x5a
fffff880`06ddcc40 00000000`00000000 : fffff880`06ddd000 fffff880`06dd7000 fffff880`06ddc4d0 00000000`00000000 : nt!KiStartSystemThread+0x16


STACK_COMMAND:  .bugcheck ; kb

FOLLOWUP_IP:
nvlddmkm+929adc
fffff880`055b6adc ??              ?

SYMBOL_NAME:  nvlddmkm+929adc

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nvlddmkm

IMAGE_NAME:  nvlddmkm.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  50de9218

FAILURE_BUCKET_ID:  X64_0x116_IMAGE_nvlddmkm.sys

BUCKET_ID:  X64_0x116_IMAGE_nvlddmkm.sys

Followup: MachineOwner
---------

Explaining the minidump analysis in detail is well beyond the scope of this post, but everyone should be able to pick out some important parts from the above codeblock. In this case:
Specific Error code: VIDEO_TDR_FAILURE (116)
Description: Attempt to reset the display driver and recover from timeout failed.
General Error code: DEFAULT_BUCKET_ID:  GRAPHICS_DRIVER_TDR_FAULT
Bugcheck string (or bluescreen code): BUGCHECK_STR:  0x116
Offending driver name: IMAGE_NAME:  nvlddmkm.sys

Every minidump analysis will follow a similar structure and these parts are most often present and can be used to search the internet for more information. I believe almost anyone should be able to go to that website and upload their minidump, provided one has been created, and gather this same information.
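For anyone who ends up with several of these analyses, the same fields can be pulled out with a short script. A rough sketch, assuming the analysis was saved as plain text and the labels look like the ones in my output above; summarize_analysis and the BUGCHECK_NAME / BUGCHECK_CODE keys are names I made up for this example.

```python
import re

# Field labels as they appear in the analysis output quoted above.
FIELDS = ("DEFAULT_BUCKET_ID", "BUGCHECK_STR", "IMAGE_NAME")


def summarize_analysis(text):
    """Pull the few lines worth searching the internet for out of an analysis."""
    summary = {}
    for name in FIELDS:
        match = re.search(rf"^{name}:[ \t]*(\S+)", text, flags=re.MULTILINE)
        if match:
            summary[name] = match.group(1)
    # The bugcheck name sits on its own line, e.g. "VIDEO_TDR_FAILURE (116)".
    match = re.search(
        r"^([A-Z][A-Z0-9_]+) \(([0-9a-fA-F]+)\)[ \t]*$", text, flags=re.MULTILINE
    )
    if match:
        summary["BUGCHECK_NAME"] = match.group(1)
        summary["BUGCHECK_CODE"] = match.group(2)
    return summary


sample = """\
VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
DEFAULT_BUCKET_ID:  GRAPHICS_DRIVER_TDR_FAULT
BUGCHECK_STR:  0x116
IMAGE_NAME:  nvlddmkm.sys
"""
print(summarize_analysis(sample))
```

The driver name and the bugcheck string together make a decent search query.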

In my case the card could have been defective, but it came from a working machine and the problem only started after I reassembled everything in a new case. I suspected heat could be the problem, and searching the internet I found some reports of people with overheating issues getting this same error, so I started logging the GPU and CPU temperatures and fan speeds to a logfile with Speedfan.

Here's the temperature log before a crash. I've truncated it at the dots. The NUL line is where the file wasn't properly closed because of the bluescreen:
Code: [Select]
Seconds GPU Core 0 Core 1 Core 2 Core 3 CPU Fan
55486 70,0 48,0 38,0 36,0 36,0 805
55489 70,0 49,0 37,0 37,0 35,0 805
55492 70,0 47,0 39,0 36,0 37,0 805
55495 70,0 47,0 36,0 36,0 35,0 805
55498 70,0 48,0 37,0 39,0 35,0 805
55501 70,0 49,0 38,0 36,0 36,0 805
55504 71,0 48,0 36,0 37,0 37,0 804
...
56422 99,0 49,0 39,0 38,0 38,0 800
56425 99,0 50,0 39,0 37,0 37,0 800
56428 100,0 49,0 39,0 38,0 37,0 801
56431 99,0 50,0 39,0 39,0 38,0 801
56434 99,0 50,0 39,0 38,0 38,0 801
56437 100,0 49,0 39,0 39,0 37,0 801
56440 101,0 50,0 39,0 38,0 37,0 804
...
56865 104,0 49,0 39,0 39,0 38,0 802
56868 104,0 49,0 39,0 38,0 37,0 804
NULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNULNUL

104 is WAY TOO hot! I'm lucky it didn't crack itself and is still working.
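Scrolling through a long log by hand gets old quickly, so here's a rough sketch of how the peak could be found with a script. It assumes rows shaped like the ones above (seven whitespace-separated columns, decimal commas) and skips anything else, such as the header, the dots and the NUL line. The 100-degree limit is just a number I picked for the example, not an official threshold.

```python
def parse_log(lines):
    """Yield (seconds, gpu_temp) from rows like '55486 70,0 48,0 ... 805'."""
    for line in lines:
        parts = line.split()
        if len(parts) != 7:
            continue  # header, "..." truncation marker, or NUL padding
        try:
            seconds = int(parts[0])
            gpu = float(parts[1].replace(",", "."))  # decimal comma in the log
        except ValueError:
            continue  # seven columns but not numeric, so not a data row
        yield seconds, gpu


def first_crossing(samples, limit):
    """Return the timestamp of the first reading at or above `limit`, or None."""
    for seconds, gpu in samples:
        if gpu >= limit:
            return seconds
    return None


log = """\
Seconds GPU Core 0 Core 1 Core 2 Core 3 CPU Fan
55486 70,0 48,0 38,0 36,0 36,0 805
...
56422 99,0 49,0 39,0 38,0 38,0 800
56428 100,0 49,0 39,0 38,0 37,0 801
...
56868 104,0 49,0 39,0 38,0 37,0 804
NULNULNULNULNULNUL
"""

samples = list(parse_log(log.splitlines()))
peak = max(samples, key=lambda s: s[1])  # hottest GPU reading in the excerpt
print("peak:", peak)
print("first reading at or above 100:", first_crossing(samples, 100.0))
```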

I initially placed the graphics adapter in the lowest PCIe slot because it's farthest away from the CPU cooler (which radiates heat as well), but that put it right next to the Xonar audio card. I decided to move the graphics card to the upper PCIe slot and move the Xonar one slot down. The CPU doesn't get all that hot anyway and has a good, temperature-controlled fan, so in the unlikely event that the GPU makes the CPU hotter, the fan and sensor will take care of it. There is now more space below the graphics card and, what I didn't realize when I assembled it, the CPU fan actually creates a small draft of air over the top side of the GPU cooler. Hot air rises and, again, the CPU fan will take care of it.

A log after 1 hour of playback:
Code: [Select]
Seconds GPU Core 0 Core 1 Core 2 Core 3 CPU Fan
75967 78,0 50,0 37,0 36,0 37,0 797
75970 79,0 50,0 38,0 37,0 36,0 801
75973 78,0 49,0 38,0 37,0 37,0 801
75976 78,0 49,0 38,0 36,0 36,0 801
75979 78,0 49,0 39,0 37,0 37,0 801
75982 78,0 49,0 39,0 37,0 37,0 800
75985 78,0 50,0 39,0 37,0 37,0 800
75988 78,0 50,0 39,0 37,0 36,0 798
75991 79,0 49,0 37,0 38,0 36,0 798
75994 78,0 49,0 40,0 37,0 36,0 798

This is the end of the log. You can see the temperature goes up to 79 twice and drops back to 78 each time. It's been doing this for the past 15 minutes or so, so this looks very good: 26 degrees lower than before. I call success!
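The same check can be written down in a few lines, using the ten GPU readings from the log above and the 104 peak from the earlier crash log. The is_stable helper and its 1-degree tolerance are my own choices for this sketch, nothing Speedfan provides.

```python
from statistics import median


def is_stable(temps, tolerance=1.0):
    """True when all readings stay within `tolerance` degrees of each other."""
    return max(temps) - min(temps) <= tolerance


# GPU column of the log after moving the card.
after_fix = [78.0, 79.0, 78.0, 78.0, 78.0, 78.0, 78.0, 78.0, 79.0, 78.0]
peak_before = 104.0  # last reading before the bluescreen

drop = peak_before - median(after_fix)
print("stable:", is_stable(after_fix))
print("drop:", drop)
```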

The PC hasn't crashed anymore and from the looks of the video the GPU survived the overheating issues.