Playing games puts a tremendous amount of stress on the display adapter and the PSU in a PC.
Overheating is an all too common problem. Extreme gaming cards are even more prone to have problems.
We cannot overstress the value of regular cleaning of computer components. Modern parts run hot and blocked fans will fail.
If your GPU is over 85°C, read this page carefully. We frequently see forum posts from users reporting that their card runs at 94°C or worse. Cards will not last long at those temperatures.
We have accumulated several disparate video cards.
GPU COOLERS ARE NOT CREATED EQUAL
Our old GTX 260s have blower type single fan coolers and when they run hot the fan is rather noisy. Our EVGA GTX 660 Ti has a dual fan cooler that is much quieter. The EVGA GTX 260 could reach 90°C with Furmark while the EVGA GTX 660 Ti was much cooler. Our EVGA GTX 260 suffered a thermal interface material failure, likely due to the higher operating temperatures.
Given what we have observed with these different coolers, we believe dual fan coolers will be the better option going forward.
The GPU itself can run as warm as 75-80°C when loaded. When idle, the GPU can fall to the mid 40°C range. Low-end cards like our HD 5450 need only about 25W of power to operate, while gaming cards like our GTX 260s need over 200W each. Extreme cards like our HD 6970 can use 267W.
Cards like our old 8600 GT had a single sensor under the GPU. By comparison our BFG GTX 260 has thermal sensors for the GPU, the VRAM and two on the mainboard.
Thermal interface material is usually designed to tolerate temperatures below 100°C. Our GTX 260s have a thermal throttle point of 105°C, at which the card throttles back to reduce its heat output. Thermal throttling is handled in the VBIOS, making it operating system independent.
Playing games can load a GPU heavily. The amount of heat generated has to be managed. Modern gaming chassis are equipped with several fans to create a strong flow of air.
After disassembling our defunct 8600 GT we realized that excess heat degraded the thermal interface material. The large number of blown capacitors meant the actual temperature probably exceeded 100°C for a significant period of time. The VBIOS also lacked a thermal throttle to protect the card from thermal runaway.
We use the popular MSI Afterburner with a custom fan profile so that a defective driver never destroys one of our video cards again. The fan is set to 30% below 50°C and ramps in a straight line to 100% at 75°C, staying at 100% above that. This keeps the noise down when we are not playing games and maxes out the fan when a game is demanding.
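The fan profile described above can be written down as a simple piecewise-linear function. The following is only a minimal sketch of the curve itself, not of how MSI Afterburner stores or applies it; the function name fan_speed_percent is our own invention for illustration.

```python
def fan_speed_percent(temp_c: float) -> float:
    """Return the fan duty cycle (percent) for a GPU temperature in °C.

    Mirrors the profile in the text: 30% up to 50°C, a straight line
    from 30% at 50°C to 100% at 75°C, and 100% above that.
    """
    low_temp, low_speed = 50.0, 30.0      # at or below 50°C: 30%
    high_temp, high_speed = 75.0, 100.0   # at or above 75°C: 100%

    if temp_c <= low_temp:
        return low_speed
    if temp_c >= high_temp:
        return high_speed

    # Linear interpolation between the two points of the profile.
    fraction = (temp_c - low_temp) / (high_temp - low_temp)
    return low_speed + fraction * (high_speed - low_speed)


# Print the fan speed at a few sample temperatures.
for temp in (40, 50, 62.5, 75, 90):
    print(f"{temp}°C -> {fan_speed_percent(temp):.0f}%")
```

Halfway up the ramp, at 62.5°C, the fan sits halfway between 30% and 100%. A steeper ramp or a lower starting temperature would be more aggressive still, at the cost of more noise.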
“Display driver nvlddmkm stopped responding and has successfully recovered.” This message is only seen on Windows Vista and later.
Artifacts on the screen are common when the GPU or the VRAM is having problems. Generally the GPU tends not to fail as often as VRAM. When the GPU fails, the screen will go blank. Typically VRAM problems present as snow, anomalies or speckles on the screen. Often only a single VRAM chip is at fault, and that one chip is what causes the snow or streak-like effects.
The Zotac 8600 GT, Zotac 8500 GT and Zotac 8400 GT cards that we had to replace in the shop all had the same VRAM problem. We noticed that these lower end cards lack the thermal sensors found on more expensive models, judging by the readings shown in the GPU-Z revisions of the period. Thermal sensors are not expensive and should be universal.
We have also seen others with more recent cards like the GTX 200 series with the same VRAM problems. Given that VRAM runs hot, it's best to provide as much cooling as possible.
Sometimes it is possible to fix the problem, other times it's not. The first step is to try adjusting the clocks of the GPU and VRAM down somewhat to see if that helps. Other times the thermal grease has failed, and then the chances are the card is finished. The NVIDIA driver version 196.75 had a defective fan profile that quickly led to large numbers of dead video cards.
Programs such as MSI Afterburner allow a user to adjust GPU and VRAM clock speeds, and they are popular with the overclocking crowd. These programs also allow users to increase the fan speed or to create a custom fan profile. We use a custom fan profile that is much more aggressive than the factory configuration.
Try reducing the VRAM clock by 5% first and see if that clears up the artifacts. If not, reduce it by another 5% and repeat until they clear up. If the artifacts never clear up, then the VRAM chip has burned out and the video card is trash.
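The step-down procedure above amounts to a short list of clock speeds to try in order. The sketch below only works out that list. It assumes each step is 5% of the stock clock rather than 5% of the previous setting, and the 1000 MHz stock clock and the limit of four steps are made-up illustrative values, not figures for any particular card.

```python
def vram_stepdown_clocks(stock_mhz: float, step_pct: int = 5, max_steps: int = 4) -> list:
    """List the VRAM clocks to try, lowest last.

    Each entry is the stock clock reduced by a further `step_pct` percent
    of the stock value (an assumption; the text does not say whether the
    steps compound).
    """
    return [
        stock_mhz * (100 - step_pct * step) / 100
        for step in range(1, max_steps + 1)
    ]


# Hypothetical card with a 1000 MHz stock VRAM clock.
for clock in vram_stepdown_clocks(1000):
    print(f"Try {clock:.0f} MHz and check for artifacts")
```

Where to stop is a judgment call. If the artifacts persist after several steps, the rest of this section applies.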
HELL’S KITCHEN – THE OVEN
We have seen rare reports of success with baking the video card in an oven for several minutes. Given the card is already on its way to the trash, what can it hurt?
The idea is to reflow solder that has become damaged from oxidation or stresses.
Set the oven to 375°F (about 190°C). Make sure all the plastic parts are removed. Place the card on some aluminum foil and then bake it for about 7-8 minutes.
Some solders evidently seem to be problematic, and this approach seems able to recover some dead cards. The problem traces back to RoHS, which banned lead in solder. The newer lead-free solders have suffered from tin whiskers and other problems.
After the card cools down, apply some thermal grease to the GPU and RAM chips and reassemble the fan assembly. Then try it out. If it works, congratulations; otherwise it's back to the trash can with it.
OUR VIDEO CARDS
We generally use NVIDIA cards; however, we also use Radeon cards. Both have their respective enthusiasts.
ZOTAC 8600 GT
Our old 8600 GT was destroyed completely by the bad driver problem. The card has thermal sensors but the GPU lacks the ability to adjust the clock speed so the BIOS is powerless to prevent failure.
A bad driver (196.75) caused widespread damage to video cards. Zotac used mediocre thermal interface material and there is no thermal sensor so the card simply overheated until it failed catastrophically. We attempted repairs but capacitors kept failing. Our GTX 260s were also affected by the driver problems, but thermal sensors prevented these cards from failing.
Once a capacitor is popped it has to be replaced. The problem is that collateral damage may frustrate any repair. Given our experience, popped capacitors mean the card is garbage.
The capacitors on more expensive video cards are able to tolerate higher temperatures. The best ones are solid core. NVIDIA mentioned their GPU can tolerate 105°C before it will fail.
Given the degraded thermal interface material, we suspect the heatsink and fan assembly might have been inadequate. The 8600 GT is designed to use a maximum of 65W.
EVGA GTX 260 SC
Our EVGA GTX 260 has been a minor nuisance. The card was running far too hot, and upon inspection we noted that the thermal grease was compromised. After regreasing, the card now runs much cooler. We are also now watching the temperatures on both cards so that the BFG card can be regreased if it gets too warm. The card is simply a provisional solution while we await the upcoming 20nm lineup. Searching with Google, we found many threads about regreasing cards damaged by the 196.75 driver.
Evidently the thermal interface material had degraded. It resembled a gel-like material with some fibers left over from the original thermal pads. Curiously, our BFG GTX 260 card is just as old and it does not overheat even with Furmark running for extended periods of time.
We simply disassembled the GTX 260 by removing all of the large screws on the back. There are two small screws in the DVI bracket that also have to be removed.
For some reason the fan connector would not release, so we carefully opened the card to reveal the GPU and other parts. A cotton ball and some isopropyl alcohol work best for cleaning the heat sink and semiconductor surfaces. A small vacuum can clean up dust and debris quickly, which makes the repair more professional.
After cleaning up the mess, we applied a small dab of fresh thermal grease to all areas. It's important not to forget the regulator area, as the regulators also get warm.
Reassembling the card brought the idle temperature down significantly. Clearly the thermal interface pads used by EVGA are not suitable: they degrade when they should be able to tolerate 105°C.
The Arctic MX-4 claims an 8 year service life, which is more than the usual service life of a video card. In effect MX-4 doubles the service life of a video card, which the industry should consider more widely to reduce RMA costs.
The power on/off cycle can change the temperature of the thermal interface material considerably. Over time the expansion and contraction of the semiconductor and heatsink will act like a pump. In effect the TIM is squeezed thinner and thinner. This is why TIM is applied so very thinly: it only has to fill very small voids.
Thermal interface material can vary widely in capability. The best can manage 10W/mK thermal conductivity. The Arctic MX-4 we use is around 8.5W/mK. By comparison copper is around 385W/mK. This is why thermal interface material should be extremely thin. In practice most people use far too much.
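To see why thickness matters so much, the temperature drop across a uniform layer can be estimated with the standard conduction formula: drop = power x thickness / (conductivity x area). The sketch below is a rough illustration only. The 8.5W/mK and 200W figures come from the text, while the 4 cm² contact area and the two layer thicknesses are assumed values chosen for the example, not measurements from our cards.

```python
def tim_temperature_drop(power_w, thickness_m, conductivity_w_mk, area_m2):
    """Temperature drop (°C) across a uniform layer: dT = P * t / (k * A)."""
    return power_w * thickness_m / (conductivity_w_mk * area_m2)


POWER_W = 200.0           # the text cites "over 200W" for a GTX 260
MX4_W_MK = 8.5            # MX-4 conductivity quoted in the text
CONTACT_AREA_M2 = 4e-4    # ASSUMED 4 cm² contact area, illustrative only

# ASSUMED layer thicknesses: a thin 0.05 mm film versus a thick 0.5 mm blob.
thin = tim_temperature_drop(POWER_W, 0.05e-3, MX4_W_MK, CONTACT_AREA_M2)
thick = tim_temperature_drop(POWER_W, 0.5e-3, MX4_W_MK, CONTACT_AREA_M2)

print(f"Thin layer:  {thin:.1f}°C drop")
print(f"Thick layer: {thick:.1f}°C drop")
```

Under these assumptions the thin layer costs about 3°C and the thick one about 29°C. The drop scales directly with thickness, so ten times as much grease between the surfaces means ten times the temperature penalty.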
Epoxy thermal interface materials are used with motherboard mounted heatsinks. The better performance is an added feature. Generally there are 8 screws that secure the heatsink and fan assembly, as can be seen in the image of the NVIDIA GTX 980 Ti solder side.
The better performing MX-4 reduced the temperatures enough that throttling is no longer a problem. Obviously the GPU needs high-end thermal grease to be able to operate efficiently. Furmark no longer causes the card to overheat.
Using a custom fan profile with higher fan speeds is one way to keep the video card cool. The other is regular cleaning. Canned air can blow air into the video card fan assembly to remove dust that can block fans. Using an air purifier will reduce dust in the gaming room and extend the life of your valuable hardware.
More and more video cards are now being designed with dual fan coolers which reduce the operating temperatures considerably.