Showing posts with label reboot. Show all posts

February 21, 2012

Status of the Apparatus

Skyrim being the apparatus.

As of right now, Skyrim seems to be stable on my system.  Actually, not to put too fine a point on it, Skyrim is working with my current save file.  After all the testing I've done in the past several months, I've determined that my hardware is not the issue.  The problem that causes Skyrim to crash my system seems to be related to the game itself.  In interviews published last week, Bethesda's Skyrim director Todd Howard gave some insight into some of the issues that Skyrim developers have been aware of for some time now.  Specifically, it is NOT related to large save files:

"That’s the common misconception. It’s literally the things you’ve done in what order and what’s running. Some of the things are literally what spells do you have hot-keyed? Because, as you switch to them, they handle memory differently."

"We tried doing it through e-mail. We need to open the saved game comes up and look at it. We’ve got one guy who has seven dragons on the other side of the world, and a siege about to happen in this city and another 20 quests running. And, ok, this is what the game is trying to do and it’s having a hard time running that."

Not All Crashes Are Created Equal

After reading countless forums, websites and blogs, there seem to be several types of crashes.  Listed below are the ones I've categorized in my head as well as the theory I have and the suggested actions.  At the bottom of this post are some links to tools and some information on using them.  Note the information below (except the Skyrim specific ones) can be applied to any game or program that is giving you heat (sorry for the pun).

Crash to Desktop without Error Message - this happens while playing Skyrim and the TESV.EXE application just exits and dumps you to the desktop.
Theory: This is likely due to some kind of corruption with the game. 
What to do:
  • Stop overclocking CPU, RAM and Video Card (just to be able to test with a default setup)
  • Run RAM stress test to look for memory errors.  Replace defective RAM modules if necessary.
  • Run Video Card stress test to look for artifacts, crashing, etc. 
  • Upgrade Video Card drivers to latest STABLE NON-BETA release.
  • Replace defective video card if necessary.
  • Check for overheating

Crash to Desktop with Error Message - the error message is related to the video driver crashing and recovering
Theory: Something caused the video card driver to crash
What to do:
  • Stop overclocking CPU, RAM and Video Card (just to be able to test with a default setup)
  • Run Video Card stress test to test stability of video card and drivers
  • Upgrade Video Card drivers to latest STABLE NON-BETA release.
  • Check for overheating

Crash to Blue Screen - the standard Windows OS crash with a blue screen background with lots of data on the screen.  You all know this one, I'm sure.
Theory: Normally, a driver crashed, but the cause could be a bad driver or hardware the driver was talking to.
What to do:
  • Stop overclocking CPU, RAM and Video Card (just to be able to test with a default setup)
  • Identify the driver by looking at the Bluescreen or the dump files and test the driver and hardware associated with driver.
  • Definitely get help on this one.
  • Also, check to ensure that your Power Supply can keep up with your system.
  • Check for overheating

Crash to Black Screen - this is what I've seen - it's a crash where the monitor loses signal and the sound loops and sometimes reboots your whole system.  Sometimes you have to power off manually.
Theory: Something in the game engine causes the video card hardware and/or RAM to access a part of memory that does not exist (computers don't like that). 
What to do:
  • Stop overclocking CPU, RAM and Video Card (just to be able to test with a default setup)
  • First, rule out the hardware:
  • Run RAM Stress Test
  • Run Video Stress Test
  • Run CPU Stress Test
  • Run the above overnight
  • Check that your power supply can keep up with your system
  • Check for overheating
  • Fix any issues identified
If all the above are good, then do the following:
  • Turn off the system and let it sit for a minute.  Also, to be sure, try removing the power cable for a few minutes to ensure that any artifacts are removed from memory.
  • Power on the system.
  • Start Skyrim.
  • Turn off ALL Auto Saving.
  • Reload your last saved game.
  • Clean House!  This means remove any favorites that you don't use regularly, unbind unneeded hotkeys, get rid of ingredients and potions and anything else you are carrying around that you don't need (put them in a chest in your house or just drop them somewhere).
  • Create a NEW save game.
  • Exit Skyrim.
  • Start Skyrim.
  • Load that new save game.
Generally, if you suspect it's the last item above (i.e. a save file that is causing the crash), you'll want to keep that save file (make a copy) and then use that repeatedly.  For example, load it, make it crash so you know exactly what to do to make it crash and then restart, load it, make a change and see if it crashes at the same place.  If you get beyond it then do it again, don't do what you just did and see if it still crashes.  Then make the same change and see if it doesn't crash again.  In other words, make sure it's repeatable and demonstrable so you can pinpoint the change that makes it not crash.

Tools
 
Video card drivers - just go to your manufacturer's website and find the appropriate driver.  Be sure to do a "clean install" if possible - Nvidia driver installation has this option during the advanced install.

Hardware Testers:
Prime95 - Let this run at the highest settings overnight.  Be sure to test all of your RAM.  Any RAM issues should surface.
Linx - Another RAM tester.  Same as above - let it run against all your RAM overnight.
OCCT - http://www.ocbase.com/perestroika_en/index.php?Download - this tests multiple aspects of your system. Use it to test your GPU RAM and your CPU.  Again, let it run for a long time to encourage any hardware issues to surface.
Kombustor - this will test your GPU in DX9, DX10 and DX11 modes (test each one to be sure) - let this run overnight and any errors should surface.
Heaven - this one tests your game rendering.  Let this run overnight.  You can also choose DX9, DX10 and DX11 tests.  I suggest running them all, one after the other.

If you do the above tests, AT THE VERY LEAST you will feel good about your hardware being solid.  I've run the first four items AT THE SAME TIME for hours and hours and no issues surfaced for me and yet I experienced Black Screen crashes.

Monitoring for Overheating
I strongly recommend using HWiNFO (32 or 64 depending on your OS).  Start this and set it to log your stats to a file.   If you do crash, you can go back and check many, many things, like your CPU utilization, CPU temperature, System Temperature, GPU Temperature, GPU utilization, RAM usage, GPU RAM usage, etc, etc.  It's a very handy tool.  Feel free to send me your log files if you need help figuring out what it means.
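As a rough illustration of going back through such a log after a crash, here is a minimal Python sketch.  It assumes the log is a CSV file with a header row and that temperature columns have "°C" in their names; the column names and sample values below are hypothetical, so adjust them to whatever your HWiNFO version actually writes.

```python
import csv
import io


def max_temperatures(csv_text, marker="°C"):
    """Return the highest value seen in each column whose header contains marker."""
    reader = csv.DictReader(io.StringIO(csv_text))
    peaks = {}
    for row in reader:
        for name, value in row.items():
            if name is None or marker not in name:
                continue
            try:
                reading = float(value)
            except (TypeError, ValueError):
                continue  # skip blank or non-numeric cells
            if name not in peaks or reading > peaks[name]:
                peaks[name] = reading
    return peaks


# Hypothetical sample; a real log has many more columns and rows.
SAMPLE_LOG = """Time,CPU Core #0 [°C],GPU Temperature [°C],CPU Usage [%]
20:00:01,45,60,12
20:00:03,58,71,35
20:00:05,52,69,20
"""

if __name__ == "__main__":
    for column, peak in max_temperatures(SAMPLE_LOG).items():
        print(f"{column}: peak {peak}")
```

For a real log, read the file into a string first and pass it in.  The last few rows before the crash are usually the interesting ones, so it may be worth printing those as well as the peaks.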

Remember, this is an awesome game and there is a lot of mis-information out there.  When troubleshooting, you always want to get to the root cause of the issue.  Software developers will always blame your hardware first and hardware manufacturers will always blame the software.  Forget them - get to the truth yourself, learn something in the process and hopefully at the end you can enjoy your game.


February 15, 2012

Skyrim Update

Interesting thing happened when playing last night - at one point, just outside of Whiterun, I ran into an old Orc who wanted to "die a good death" and after I started to grant his wish, the screen froze for 1/2 second, the sound started to stutter and it seemed like the start of a black screen/reboot crash, but then it recovered and no issue for the next four hours.

So what happened?  I think the game started to change all kinds of variables based on that decision to engage the Orc and that caused the stutter as things were loaded and changed.

I think if *THAT* situation happens AND there is an auto-save at the same time, things go downhill really fast and the whole system crashes.  Since I disabled auto-saving, it didn't crash.  Sound reasonable?

I shouldn't say nothing else happened the rest of the session - after a period of time, I noticed a slight "heaviness" to the whole game.  Seemed a little jerky vs the normal super smooth.  It's barely noticeable.  But I saved the game, exited to desktop and restarted Skyrim and loaded the game and it was back to smoothness (this only took about 20 seconds for me).  I think this is just the engine having too much garbage (memory leak anyone?) in memory and reloading the program cleared it out.  Next time this happens, I will try just saving to a new save file and then loading it to see if the heaviness is lifted.

Anyway, that's my latest update.  And here's a puppy:


February 05, 2012

1.4 Patch doesn't fix Resetting/Black Screen/Rebooting

Tested today and found that the resetting/black screen/rebooting still occurs with the 1.4 patch.  Next test item - completely removing the Creative SB X-Fi audio card.  Here's the sequence of events:

  1. Encounter Reset/Reboot bug.
  2. Restart PC
  3. Do exact same thing (i.e. load the saved game), observe same result
  4. Restart PC
  5. Do exact same thing, observe same result
  6. Change one variable (in this case, removing the audio card)
  7. Do exact same thing.
    1. If result is the same, return the changed variable to its previous value
    2. If result has changed, note the change and continue testing until bug is encountered.
  8. If the change lasts a week of playing (i.e. over 10 hours), then it's considered to be a success.
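The sequence above is one-variable-at-a-time testing.  Here is a minimal Python sketch of the bookkeeping, where `crashes` is a hypothetical stand-in for "load the same save and see whether the reboot happens" and the setting names are only examples:

```python
def isolate_fix(baseline, candidate_changes, crashes):
    """Try each change alone against the baseline; keep it only if the crash stops.

    baseline: dict of setting -> value that reproduces the crash
    candidate_changes: list of (setting, new_value) pairs to try one at a time
    crashes: callable(config) -> bool, a stand-in for a manual play test
    """
    if not crashes(baseline):
        raise ValueError("baseline must reproduce the crash before testing changes")
    helpful = []
    for setting, new_value in candidate_changes:
        trial = dict(baseline)      # start from the known-bad setup each time
        trial[setting] = new_value  # change exactly one variable
        if not crashes(trial):
            helpful.append((setting, new_value))
        # otherwise the trial is discarded and the next candidate starts
        # again from the untouched baseline
    return helpful


if __name__ == "__main__":
    baseline = {"audio_card": "Creative X-Fi", "core_parking": "enabled"}
    changes = [("audio_card", "onboard"), ("core_parking", "disabled")]

    # Pretend result for illustration only, not a measured outcome.
    def fake_crashes(config):
        return config["core_parking"] == "enabled"

    print(isolate_fix(baseline, changes, fake_crashes))
```

In real life the `crashes` step is hours of playing, which is why the baseline has to reproduce the bug reliably before any change is worth testing.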

As of tonight, the current setup:
  • Speedstep enabled
  • Core Parking disabled
  • Memory set to 1333MHz
  • Turbo mode enabled
  • Hyperthreading enabled
  • Defaults in MB settings and GPU settings
  • Creative audio card removed
  • Using MB audio card
So far so good, it's at least running.  Testing continues on step 7-2...


January 20, 2012

New Fangled Tech

Naturally, I'm researching Core Parking.  It's been around for a number of years, but during those same years, I've essentially "dropped out" of keeping up with technology.  My previous rig was a Dell with a dual-core (I forget which) and it worked fine for minor gaming.  It kept on working with newer games, so I never dug into the "new-fangled" stuff.  That system lasted me over five years.  Also, since the prices for components and custom machines have dropped significantly, actually building my own rig (or building for others) had become unreasonable: it's cheaper and easier to just order something that has what you need.  "Back in my day" you needed to build your own system if you wanted cutting edge technology without paying an arm and a leg (*ahem*Alienware*ahem*).  But with a new "custom built" rig that someone else (CyberpowerPC) built for me, I'm back waist-deep in researching the new-fangled technologies.

Not much has changed.  Well, a lot has changed tech wise, but the community seems to have stayed the same: just a few "serious" sites (Anandtech, Tom's Hardware, Overclockers.net), riddled with different groups of increasing population:

Very Few: The hard core geeks who can differentiate between different gate technology materials (I just made that up), drill holes into their CPUs and even solder on wires to get that last .5 frames per second.  These guys will tear you a new one (by answering in detail or by dismissively giving a correct but unexpected answer) when you ask which processor is faster.

Minority: Somewhat serious folks who like to tinker, but are not drilling holes in CPUs (although when pushed, they will and then go into a depression).  They are up to speed with the latest marketing and specs and can generally work through tough issues, but also know when to stop and accept a non-optimal solution.

Majority: Those who just want things to work and wander into various communities looking for help.  They eagerly want to solve their problem, but sometimes throw everyone off by asking how to edit the registry.

Of course, there are the usual nut-jobs as well and the angry ones who will rant and rave for pages and pages about how much they don't care about something.  These people tend to campaign for boycotts and demand something unreasonable, like a game company to call in programmers to fix their specific issue.

I like to think I fall in the minority because I know enough, but not enough

As I started with, I've been reading about Core Parking quite a bit and strongly suspect that this is the problem I (and others) might be having with Skyrim rebooting at random.  Consider this:

Note: my system has 2600k, Gigabyte MB, Gigabyte GTX 570, 16GB RAM

Known
  • Intermittent (sometimes after a minute, sometimes after an hour) hardware reset/crash/freeze.
  • Only with Skyrim
  • Works with other games
  • Works with Benchmarking tools (DX9, DX10, DX11) (over 7 hours running)
  • Works with stress testing tools (over 7 hours running)
  • Works with Benchmarking + Stress testing AT THE SAME TIME (Kombustor DX9 and DX10, 8X MSAA, 1920x1080, POST-FX, FULLSCREEN for about an hour)
  • Replaced Power Supply
  • Reinstalled Skyrim
  • Did the other things recommended by others on the 'net.
  • None of the above made a difference
  • No overheating
  • No error logs are generated - hardware just gets zapped
  • Unable to replicate rebooting/freezing outside of Skyrim
  • Disabling Speedstep made the system stable (one reboot in a week of playing for hours and hours a day).
  • Re-enabling Speedstep caused the crashes again.
  • Re-enabling Speedstep and disabling Core Parking seems stable (not fully tested yet).
  • Doing the above two seems to have worked for others (unsubstantiated reports only).
  • CPU/GPU load when playing Skyrim is very low on my system (settings are Ultra/maxed out).
  • Disabling Core Parking seems to be universally good for gaming (haven't found a negative post about it yet)
CPU Loads when running Skyrim

Unknown
  • Skyrim Creation Engine: is it built on Gamebryo? Why does it not use more cores?
  • Is Core Parking kicking in while playing Skyrim since the load is low?
  • Is the Skyrim engine somehow allergic to cores being powered down?
    • Skyrim works with Speedstep/C1E/C3/C6 State Support turned off (this prevents Core Parking)
    • Skyrim (seems to) work with Core Parking turned off and the above turned on
  • Out of the millions of Skyrim players, only a minority are having issues.  Is this because:
    • Only a few have high end processors + Windows 7 that support Core Parking?
    • Only a few have 16GB of RAM?
    • Only a few have low CPU/GPU use with Skyrim (most posts indicate that Skyrim is CPU intensive)?  This would mean my system is TOO FAST
    • Most have disabled Core Parking at some point in the past (I have a brand new box, nothing else loaded but games)?
    • Most have enough background processes to prevent Core Parking when playing Skyrim?
Anyway, some Core Parking things I found:
  • Works best in a server environment where there are cores that are truly not busy for long period of time
  • Does not save power if the system is under heavy load
  • Only in Windows 7 and later, and Windows Server 2008 R2 and later
  • Impedes performance when enabled
  • Known to cause problems (Microsoft has a patch to disable it)

Testing to continue...

January 19, 2012

Skyrim Crash/Rebooting Fix: Disable Core Parking?

If you've read this blog before, you'll know that I've been having some issues with Skyrim on my new system.  Namely the problem was that Skyrim would cause the system to reboot intermittently.  It was very frustrating.  I had tried many, many things and the one that worked was disabling Speedstep.

However, this "solution" bugged me: it was not ideal and didn't explain the millions of other systems out there with Speedstep enabled that weren't crashing.  In other words, it was not elegant or clean.  Of course, there are enough Internet posts that talk about the crashing, but certainly it is not as widespread as it should be if the problem was Speedstep.  Another problem was that my system was using power like it was still the 1990s!  This certainly goes against my green tendencies.  And then two days ago, I got yet another reboot in the middle of Skyrim.

So, I kept on searching when I stumbled across a series of posts and articles related to "Core Parking."  Having either never heard of this or assuming it's the same as Speedstep, I was surprised to learn that this was something that was found only in the latest Intel processors (like mine) and is only found in the latest Windows OS (like mine).  I read through many many articles and it seemed this was indeed a likely culprit.

My theory goes like this: Skyrim's engine, the "Creation Engine" is not really new.  In fact, many others on the Internet postulate that it's simply built on the old engine, Gamebryo.  Gamebryo had some issues with Skyrim's predecessor, Oblivion.  Some research shows that the Creation Engine and Gamebryo share similar configuration files and in fact, the same behaviors.  Anyway, there's a lot to be said about Gamebryo vs. Creation that could fill pages and pages, but in short, I believe that "Creation" is a "new" engine that was built on the old one.  Gamebryo had problems with multi-core processors and it seems the Creation Engine also has at least some version of these problems.  Namely, not effectively using multi-cores and being single threaded.  In fact, some websites have shown that Skyrim does not run that much better with a quad-core vs a dual-core.  Something like 50% more performance from single-core to dual-core but only 3% from dual-core to quad-core.

Core Parking is essentially Windows deciding that if a core is not busy, it will power down that core to save energy.  If that core is needed for something, it wakes up in milliseconds.  My disabling Speedstep effectively prevented cores from being parked because they were running at full speed all the time anyway (although no load - it's quite possible that Core Parking relies on Speedstep to lower the core ratio/frequency and/or core voltage before deciding that the core is ready for 'parking').  At this point it's probably best to describe each:

Speedstep - this is a power-saving technology that reduces the frequency ratio of a core (mine goes from 34x to 16x @ 100MHz base, so 3.4GHz to 1.6GHz).  Additionally, it can reduce the voltage draw of a core as well.  The latest iteration also has a "Turbo" mode where the core can boost up to something like 42x (4.2GHz) if needed.  This is enabled/disabled in the BIOS.
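The numbers in that description are just the multiplier times the base clock.  A quick Python sketch of the arithmetic, using the 100MHz base and the ratios quoted above (the 42x Turbo figure is the approximate one from the text):

```python
BASE_CLOCK_MHZ = 100  # base clock quoted in the post


def core_frequency_ghz(multiplier, base_clock_mhz=BASE_CLOCK_MHZ):
    """Core frequency = multiplier x base clock, converted from MHz to GHz."""
    return multiplier * base_clock_mhz / 1000


if __name__ == "__main__":
    for label, multiplier in [("idle", 16), ("stock", 34), ("turbo (approx.)", 42)]:
        print(f"{label}: {multiplier}x -> {core_frequency_ghz(multiplier)} GHz")
```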

Core Parking - also reduces power use, this basically disables a core.  This is a Windows thing.

Since I have a quad-core 2600K with multi-threading, Windows sees 8 processors.   During my test, when idle, with Speedstep on, Windows parked CPU1, CPU3, CPU5 and CPU6 and sometimes CPU7.  When I kicked up the load, the CPUs would get unparked and would kick in.

When Skyrim runs, it's not very processor intensive, at least not on my system.  The loads go up and down depending on what's happening.  Perhaps while in game, some cores are parked.  Perhaps combined with my 16GB of RAM and virtually nothing else running on my system, save for Steam, I'm not really needing a lot of cores while playing Skyrim.  My current theory is that neither Gamebryo nor the Creation Engine handles this very well with 4 cores and 8 "virtual processors."

I tested this by re-enabling all the defaults (i.e. Speedstep) and then disabling the Core Parking feature of Windows 7.  Note that there are many ways to do this.  I went with enabling the menu options and then setting minimum to 100% and maximum to 0% (see below).

Be sure to set "min cores" to 100% and "max cores" to 0%.
I recommend checking out the Microsoft way.  Anyway, it worked - I verified the Core Parking and Core Un-Parking effects using the built-in "Resource Monitor" of Windows (scroll down for what it looks like).  Skyrim was played thoroughly for about an hour and DID NOT HAVE ANY CRASHES.  It's quite possible that this may be the "core issue" (pardon the pun).
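For anyone who prefers the command line to the menu options, the "min cores" half of the change can also be made with `powercfg`.  The Python sketch below only builds and prints the commands; it does not run them.  The setting GUID is my understanding of the Windows 7 "Processor performance core parking min cores" entry, so treat it as an assumption and confirm it with `powercfg -query` on your own system first.

```python
# GUID of the "Processor performance core parking min cores" setting as I
# understand it for Windows 7; an assumption to verify with `powercfg -query`.
CORE_PARKING_MIN_CORES = "0cc5b647-c1df-4637-891a-dec35c318583"


def core_parking_commands(min_cores_percent=100):
    """Build (but do not run) powercfg commands that set min unparked cores."""
    if not 0 <= min_cores_percent <= 100:
        raise ValueError("percentage must be between 0 and 100")
    value = str(min_cores_percent)
    return [
        # plugged-in (AC) value for the active power plan
        ["powercfg", "-setacvalueindex", "SCHEME_CURRENT", "SUB_PROCESSOR",
         CORE_PARKING_MIN_CORES, value],
        # re-apply the active plan so the new value takes effect
        ["powercfg", "-setactive", "SCHEME_CURRENT"],
    ]


if __name__ == "__main__":
    for command in core_parking_commands(100):
        print(" ".join(command))
```

The printed lines would need to be run from an elevated command prompt.  Setting the minimum to 100% asks Windows to keep every core unparked, which is the effect described above.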


It's still early in the testing, but at least I'm being somewhat green again, although not as green as when Core Parking is enabled.  But the power savings are minimal for me.  This is mostly intended for the huge server farms where some systems may stay idle for long stretches of time.

I'll give this at least a week of testing before I call it a win, but so far so good.  (I'll be out for a few weeks though, so no updates for a while.)   It's a nice discovery in any case and I learned way more about power savings than I intended to.  But now that I found it, I can't stop looking at it - kinda like the orb in Skyrim,

I found it, now I need to understand it before I can control it.

These are what parked cores look like in Resource Monitor

After disabling core parking
Final note: After enabling the menu item, I made the changes and it immediately took effect, no reboot was necessary.  But, you probably should reboot anyway, just to make sure.  This is still Windows after all.  :-)

January 18, 2012

Skyrim Crashes Revisited

UPDATE:  I'm having success re-enabling Speedstep and disabling CORE PARKING.
  • 2011 DEC:
    • Installed Skyrim, Steam patched it to the 1.3 version, supposedly fixing a bunch of things
    • Skyrim works for a few days and then the dreaded "Reboot/Blinking Out" issue started.  In short, the entire system reboots.
  • 2012 JAN:
    • From late December to mid-January, troubleshooting in earnest.  All tests with other games, benchmarks, stress test, etc showed no issues. Roundup of testing here.
    • Charted voltages, temperatures, etc and found no issues whatsoever outside of Skyrim rebooting the system.
    • Bought and installed a 1000W Silverstone SST-ST1000-P PSU replacing the no-name 700W PSU that came with the system.  No change.
    • Tried a series of changes, systematically testing each one.  The one that made a change was disabling Intel Speedstep.  Skyrim crashed no more.
    • For about a week, no crashes and Skyrim is splendid!
    • The other day, I got a desktop crash.  Restarted Skyrim and no further issues.
    • Last night, I got a reboot.  Not sure why, nothing had changed system-wise or application wise.  It's the same system that was running fine previously.  Reboots occurred twice more and then stopped.  I doubt it's related, but I was at the mead poisoning mission just outside of Whiterun, in the dungeon just before combating the crazy conjurer that lived with the Skeevers.  I restarted the saved game and got beyond that part and no other issues for the next several hours.  (????????)
I am fairly confident (92%) that this is a problem with Skyrim programming.  I'm not bitter about having to buy a new PSU, because that just means ALL the critical components of my system are now high end pieces.  Also, if I ever decide to go dual or triple SLI, then I'm all set.

Skyrim being awesome
Skyrim not being awesome


Skyrim is an awesome game for sure, but these crashes are very, very frustrating.  It takes so much away from the game.  I really, really hope this gets resolved soon, either through Bethesda identifying an issue and releasing a patch or by me figuring out WTF is going on.

January 12, 2012

Skyrim Troubleshooting Compilation

UPDATE: It's quite possible that disabling Speedstep is overkill.  I'm having success re-enabling Speedstep and disabling CORE PARKING.

For the past week, I've been troubleshooting a peculiar problem with my new system.  So far I have five posts.  I'm going to use this as a summary post with links to the different parts.

BUT FIRST, a quick summary: new system runs everything great - no issues with anything EXCEPT for Skyrim, which causes a system freeze/reboot sometimes within a minute, sometimes after an hour or two.  After some troubleshooting, the problem is hopefully fixed.


The current fix involves turning off Intel Speedstep by disabling three options in the BIOS:

CPU Enhanced Halt (C1E)
C3/C6 State Support
CPU EIST Function


I posted some charts and data from running with Intel Speedstep off in part 5.

Posts related to troubleshooting Skyrim Blinking Out/Crashing/Rebooting:
For the initial post describing the system and initial troubleshooting steps (no resolutions but with fancy graphs), click on Part 1.

For the second post with more troubleshooting steps at the suggestion of a friendly Nexus Forums user and with crazy load testing (but no resolutions), click on Part 2.

The third post describes the replacement of the Power Supply Unit (no-name 700W PSU) that came with the system with a 1000W Silverstone SST-ST1000-P, which resulted in no resolutions BUT with some "gut check" tests that produced some positive results.  To read this, click on Part 3.

The fourth post is what the system looks like with Intel Speedstep disabled.  Please universe, let it be this one; I just want to play Skyrim!  This can be found in Part 4.

The fifth (and hopefully last) post on this topic has some charts and data from running with Intel Speedstep off.  Go to part 5 to read this.

I promise at least one photo per post for your entertainment!

Here's one of my PC opened up, showing the water cooling unit on the CPU (bottom right) along with the old PSU:




Skyrim Blinking Out/Rebooting Part 5 [SOLVED]

UPDATE: It's quite possible that disabling Speedstep is overkill.  I'm having success re-enabling Speedstep and disabling CORE PARKING.

After putting in some hours last night, I'm fairly confident that the problem with Skyrim rebooting my system is solved by disabling Intel Speedstep.  No issues whatsoever running Skyrim.  Naturally, since the idea behind Speedstep is to lower the power consumption, it stands to reason that increased power consumption and increased heat would be a concern.  I can't do anything about the power consumption, but I should really worry about the heat.  To that end, here are some charts I put together that hopefully will help add to the discussion regarding Skyrim reboots and the effects of disabling Speedstep. 

To disable Speedstep, go into your BIOS and disable the following "Advanced CPU" Core Features:

CPU Enhanced Halt (C1E)
C3/C6 State Support
CPU EIST Function


The following charts demonstrate the effect on CPU frequency and voltage when Speedstep is enabled and disabled:
CPU Frequency and Voltage under load with Speedstep ENABLED

CPU Frequency and Voltage under load with Speedstep DISABLED
 The following charts illustrate the effect on temperature on each core with Speedstep enabled and disabled.  As expected, the GPU temperature was not affected:

CPU/GPU Temperatures under load with Speedstep ENABLED

 
CPU/GPU Temperatures under load with Speedstep DISABLED

The data show that with Speedstep enabled, we get slightly lower CPU temperatures, with the cores never going beyond 60C, and possibly slightly lower CPU loads (this probably needs a more controlled test to confirm).  After Speedstep is disabled, we see core temperatures getting closer to 70C, still below the danger zone, although getting close to it.  It's still not clear what the maximum temperature is for the 2600K, but Intel's specs show a "TCase" of 72.6C, which probably doesn't mean as much as the TJMax, which, if I read the Core MSRs on my CPU properly, comes out to 01100010 or 0x62 or 98C:

MSR 0x000001A2        0x00000000    0x00621200

I'm looking at this as the temperature at which the CPU will call it quits.  With a 20% buffer, I think I am comfortably outside the overheating temperature zone for this CPU.
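For anyone who wants to check my reading of that register, here is a small Python sketch of the decoding.  It assumes, as I understand Intel's documentation for MSR 0x1A2, that the temperature target sits in bits 23:16 of the value:

```python
MSR_TEMPERATURE_TARGET = 0x00621200  # low 32 bits of MSR 0x1A2 quoted above


def tjmax_celsius(msr_low32):
    """Extract bits 23:16, which hold the temperature target (TjMax) in C."""
    return (msr_low32 >> 16) & 0xFF


if __name__ == "__main__":
    tjmax = tjmax_celsius(MSR_TEMPERATURE_TARGET)
    print(f"TjMax: {tjmax} C (binary {tjmax:08b}, hex {tjmax:#x})")
    print(f"TjMax minus a 20% buffer: {tjmax * 0.8:.1f} C")
```

With a 98C limit, the 20% buffer works out to 78.4C, so core temperatures around 70C stay inside it.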

Presumably the power consumption goes up with Speedstep disabled, but I have no measuring tools for that.  HWInfo64 does report the +12V rail at a steady 12.025V - 12.074V.  Note: with my old no-name PSU, it varied between 11.436V and 11.976V.
So far so good.  I hope this helps someone out there.


January 11, 2012

Skyrim Blinking Out/Rebooting Part 4


 After the initial possible success of disabling Speedstep, I decided to re-enable Speedstep and sure enough, the reboots were seen again.  I also tried to adjust the Windows 7 power settings to 100%/Performance, but to no avail.  The informal causal relationship appears to be if the CPU frequency is changing, Skyrim does not like it.  Even under heavy load, with Speedstep enabled, I see frequency jumps from 1600MHz to 3700MHz varying for each core and voltage jumps from 0.966V to 1.281V.  It appears Skyrim just does not like this behavior.

Chart showing the CPU Frequency and Voltage fluctuations under load with Speedstep enabled.
With Speedstep disabled, we now have a steady 3700MHz for each core and voltage at a steady 1.276V (varying slightly, but only by a few fractions).

Chart showing the CPU Frequency and Voltage with Speedstep disabled.

HWInfo64 showing a steady clock on all four cores even in idle

CPU-Z CPU information
A few more days of no crash/reboots in Skyrim should be enough to show success.

If you want to try this, these are the items in the BIOS that I set to [Disabled] to turn off Speedstep:

CPU Enhanced Halt (C1E)
C3/C6 State Support
CPU EIST Function

 
Below is a screenshot of the BIOS screen where the settings are disabled marked in clumsy red arrows.  Note that this was stolen from a website so the clock ratio and frequency are not mine (WOW!):

Change the marked items to disabled to turn off Speedstep

Note: I also increased the memory clock from 666.7MHz to 800MHz, which gives us an effective 1600MHz (DDR3-1600).
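The doubling in that note comes from DDR memory transferring data twice per clock cycle.  A quick Python sketch of the arithmetic, using the two clock values from the note:

```python
def ddr_data_rate(memory_clock_mhz):
    """DDR transfers data twice per clock, so the effective rate is 2x the clock."""
    return 2 * memory_clock_mhz


if __name__ == "__main__":
    print(round(ddr_data_rate(666.7)))  # 1333, the out-of-the-box setting
    print(ddr_data_rate(800))           # 1600, the setting after the change
```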

January 10, 2012

Skyrim Blinking Out/Rebooting Part 3

I bought and installed a Silverstone SST-ST1000-P PSU in hopes that the un-named 700W PSU that came with my system was causing the problems with Skyrim.  Below is a picture of the Silverstone (top) and the unnamed PSU (bottom).  The SST-ST1000-P is a 1000W PSU with 80A on the +12V (single) rail, which should be more than enough for even three GTX 570s.  It can also peak at 1100W.

Silverstone SST-ST1000P 1000W PSU on top, unnamed 700W PSU below

It's also modular, which is really nice.

However, after carefully completing the PSU swap (took a little over an hour), I fired up Skyrim and within a minute saw the same reboot problem.  After cursing under my breath, I started to plot the next steps.  First, I am fine with the new PSU.  It replaces a PSU that is questionable at best, eliminating that as a potential problem.  I then wrote down some other "gut check" things to try (that wouldn't cost money) and started to go through them:

  • Update GPU BIOS - I was surprised to find that there is a more recent BIOS for the GPU, adding a ".01" - seems to be a minor update, but I tried it anyway.  No luck, same symptoms.
  • Re-install DirectX - I was not able to find a way to do this, since it's a core component of Windows 7.  Did DirectX validation and checks.  No change.
  • Underclock/Overclock the GPU.  No change either way, other than slower graphics when the settings are lowered.
  • Underclock/Overclock the CPU.  No change either way.
  • Examine the memory speed.  I have four 4GB Corsair memory chips, with a rating of 800MHz, DDR3.  But, out of the box, the system booted with 1333MHz memory.  I adjusted this to 1600MHz, but while it looked promising, the same problems (reboot) occurred.
  • Disable Intel SpeedStep and variable VCORE.  This took some research and I learned a lot about how these new features from Intel work.  I had to disable three BIOS entries in the GIGABYTE Z68MA-D2H-B3 INTEL Z68 motherboard:

Go to Advanced Frequency Settings->Advanced CPU Core Features and disable the following:
CPU Enhanced Halt (C1E)
C3/C6 State Support
CPU EIST Function


After obtaining a fixed frequency and VCORE for the i7 2600K, I played Skyrim for over an hour without a reboot!  I shut it down since I was at the end of my day.  This seemed to have made a difference, although I am at best cautiously optimistic about this being the fix.

After going back and looking at the stats from previous tests, I saw that the CPU VCORE and CPU frequencies were all over the place.  This is by design of course, but now I'm thinking that Skyrim's code may be hitting the hardware more directly than other games do.  It may be my tired state, but consider the following (note these may just be crazy thoughts):

  • Skyrim is a port from consoles.  Console games are written for specific hardware, giving programmers more access to low-level functions than, say, a DirectX game.
  • Skyrim load times are ridiculously short.  From hitting "Play" it takes about 5 seconds for the main screen to come up, then maybe another 5-10 seconds to load a saved game.  Compare this to other games and you know they are doing something different with Skyrim.  Why would other games take so much longer to load than Skyrim when Skyrim is clearly a larger game?  Perhaps they skipped a lot of the frameworks that other games are built on.  Even something like the benchmark tool Heaven DX11 takes a while just to load all the textures.
  • If the problem is the variable CPU frequency/VCORE, why would I not be able to replicate the Skyrim reboots with other games?

In any case, cautiously optimistic is how I describe this phase of troubleshooting.  Next, I'm going to re-enable the BIOS settings and try to limit the CPU SpeedStep via Windows 7 Power Settings instead.
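
A rough sketch of what limiting the CPU state from the Windows side could look like.  It assumes the stock `powercfg` aliases (`SCHEME_CURRENT`, `SUB_PROCESSOR`, `PROCTHROTTLEMIN`, `PROCTHROTTLEMAX`) are available, which `powercfg -aliases` should confirm.  The helper names are made up for illustration, and pinning both the minimum and maximum processor state to 100% is only my guess at the closest software equivalent of a fixed frequency; it does nothing about VCORE.

```python
import subprocess


def fixed_cpu_state_commands(percent=100):
    """Build powercfg commands that pin min and max processor state (on AC power)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    value = str(percent)
    return [
        ["powercfg", "-setacvalueindex", "SCHEME_CURRENT",
         "SUB_PROCESSOR", "PROCTHROTTLEMIN", value],
        ["powercfg", "-setacvalueindex", "SCHEME_CURRENT",
         "SUB_PROCESSOR", "PROCTHROTTLEMAX", value],
        # Re-activate the current scheme so the new values take effect.
        ["powercfg", "-setactive", "SCHEME_CURRENT"],
    ]


def apply(commands, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Running for real needs an elevated prompt on Windows.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    apply(fixed_cpu_state_commands(100))
```

By default this only prints the commands so they can be checked first; calling `apply(..., dry_run=False)` would actually change the active power plan.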

Just in case this is a red herring, I'm planning on hitting the Sound Blaster X-FI card next by completely removing it.  After that, I'm completely out of ideas.

If you haven't already, be sure to see part 1 and part 2 of the Skyrim troubleshooting saga.

January 09, 2012

Skyrim Blinking Out/Rebooting Part 2

Thanks to the suggestions of a very helpful user on The Nexus Forums (who successfully fixed his Skyrim problems by disabling his SLI config), I ran more tests and found the following:

  • Cleaned out drivers per these instructions.  Still rebooted. 
  • Running Prime95 for over 7 hours yielded no errors with RAM.
  • Running LinX for about an hour yielded no errors with RAM.
  • Running OCCT yielded no errors with GPU RAM.
  • Running OCCT "CPU:OCCT" test with large data sets for an hour showed no errors or issues.
  • Running Kombustor yielded no errors or crashes.
  • Running all of the above AT THE SAME TIME yielded no errors or crashes.
What this tells me is my system is stable - drivers and hardware all work well together.

I found some data from 03 JAN 2012 when I started earnestly troubleshooting the Skyrim crashes.  This chart shows what's happening with the GPU from idle (22:09:34), starting Skyrim (22:09:58) and then the reboot (22:17:01, end of the chart).  This involved loading the latest saved game and running through the area outside the College of Winterhold.  This session lasted 7 minutes.


As you can see, the GPU temperature and fan speeds are fairly consistent and way below the danger zone.  I did this a few more times and got similar results.  Basically, it was a good run with no indicators of any problems related to temperature or load.
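
For anyone lining up their own logs against a reboot, the timing arithmetic for the session above can be sketched like this.  The timestamps are the ones quoted for the chart; the helper names are just for illustration, and the code assumes all three times fall on the same day.

```python
from datetime import datetime

FMT = "%H:%M:%S"


def elapsed_seconds(start, end):
    """Seconds between two same-day HH:MM:SS timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


def mmss(seconds):
    """Format a number of seconds as minutes and seconds."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}m {secs:02d}s"


idle, launch, reboot = "22:09:34", "22:09:58", "22:17:01"
print(mmss(elapsed_seconds(idle, launch)))    # 0m 24s of idle before launching Skyrim
print(mmss(elapsed_seconds(launch, reboot)))  # 7m 03s from launch to the reboot
```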

So, it's still crashing.  Our friendly Nexus Forums user summed up my feelings on this:
Unlike others I do feel that the fix has to do with ones system configuration/settings and not the game itself.  I really strongly feel it has to do with video drivers and something a bit deeper.  Quite frankly this problem isn't as wide spread as it potentially could have been with the vast amount of people who've purchased & pirated the game on PC.  At one point Steam had 500,000 people playing the game at the same time so my feeling is if it was the game a whole lot more people would be complaining.
Today, I found a Silverstone ST1000-P 1000W PSU at a local store (Conrad's) for less than NewEgg(!) and will install it tonight.  Hopefully the problem is that Skyrim does *something* that causes a huge draw on the PSU and this fixes the problem.  If not, in addition to the other quality components I have that have been thoroughly tested, I will also have a quality PSU.  Then, I'll just wait for Skyrim to be patched.

More to come...

January 06, 2012

Skyrim Blinking Out/Rebooting

I ordered a new gaming system from CyberPowerPC right after Thanksgiving, taking advantage of their sale at the time.  It was a great deal and the folks at CyberPowerPC were great with the order and the subsequent follow-ups, informing me of a backordered CPU and offering good alternatives.  The system I ended up with came in before Christmas in great shape and it worked out of the box (the shipping, packing, etc was great, no issues whatsoever).

I should note upfront that I've contacted them and they've graciously offered to upgrade my power supply.

Here are the specs:

ITEM            DESCRIPTION                                           QTY 
=============== ===================================================== === 
KB-152-101      BLACK XTREME GEAR MULTIMEDIA/ INTERNET USB KEYBOARD     1 
MO-115-101      BLACK XTREME GEAR OPTICAL USB GAMING MOUSE              1 
NC-USB-117      EDIMAX EW-7811Un IEEE NANO 802.11n USB WIRELESS         1 
                ADAPTER
HD-403-308      2TB SATA III 6.0 GB/S 3.5" HDD                          1 
RM-317-802      4GB CORSAIR VENGEANCE 1600MHZ DDR3                      4 
FA-WATER-101    ASETEK 510LC 120MM WATERCOOLER                          1 
FA-104-116      CASE FAN 120 MM                                         1 
CS-157-519      BLACK THERMALTAKE COMMANDER MID TOWER NO POWER          1 
CD-128-101      BLACK LITE-ON 24X DVDRW                                 1 
MR-104-101      12-IN-ONE INTERNAL CARD READER                          1 
MB-374-101      GIGABYTE Z68MA-D2H-B3 INTEL Z68 CROSSFIRE DDR3 SATA3    1 
               USB3.0 MICRO ATX LGA 1155
PS-119-108      APEVIA 700WATT POWER SUPPLY                             1 
SC-102-144      CL SB X-FI XTREME AUDIO RETAIL                          1 
SW-170-113      WINDOWS 7 HOME PREMIUM SP1 64-BIT                       1 
BOX1            SYSTEM BOX AND FOAM                                     1 
SERVICE-201     SOUND ABSORBING FOAM ON SIDE, TOP AND BOTTOM PANELS     1 
SERVICE-202     POWER SUPPLY GASKET                                     1 
SERVICE-104     ANTI-VIBRATION FAN MOUNTS                               1 
CU-208-206      INTEL I7-2600K 3.40 GHZ 8M LGA 1155 RETAIL              1 
VC-207-108      GIGABYTE GEFORCE GTX 570 1.2GB DDR5 PCI-E               1 


It ran every recent game I had (HL2, Portal 2, GRID, Just Cause 2, Batman:Arkham City, Assassin's Creed 1/2/Brotherhood, etc) at the highest settings without complaint. But I really got this rig for Skyrim.
I wanted to finally see the Aurora Borealis.
Of course, as Murphy predicted, the problems started when I installed "Elder Scrolls V: Skyrim".  At first, it was great, and the adventures started.  But after a few hours, everything turned off and the system restarted.  It could have been anything at this point, and I didn't think much of it until it started happening over and over again (Skyrim is a looong game).  I was unable to replicate the problem even after hours and hours of Batman:AC and other games.

Let the troubleshooting begin!

Symptoms:
  • Skyrim causing system to reboot.
  • No other issues observed on system

Related Info:
  • Skyrim is a DX9 game and is designed to run on anything from XP to 64-bit Windows 7.
  • Batman:AC and other games utilize DX11 and are significantly more CPU and GPU intensive (IMO).
  • No issues with ANY other game.
  • OS is 64-bit Windows 7.
  • When system reboots, temperature and fans are not high.
  • Logging with HWINFO64 and other tools confirmed this.
  • Power supply that came with system is possibly APEVIA brand (boo!), but I haven't removed it from the case to confirm this.
  • Power Supply specs (from side panel pic):
    •   700W (Sticker says ATX-CV700W - can't find info on this PN)
    •   +3.3V = 38A
    •   +5V = 40A
    •   +12V1 = 23A
    •   +12V2 = 26A
  • System should need under 600W and the +12V rails seem to be sufficient (based on Internet research)
  • Running two 21" LCD monitors: One monitor at 1920x1080 and the other at 1680x1050
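
As a sanity check on the sticker numbers above, here is a small sketch that converts each rail's amps to watts.  The rail values are the ones from the side panel; everything else (the names, the layout) is illustrative.  Note that the per-rail maximums add up to more than the 700W label, so they cannot all be drawn at once, and the sticker values listed here do not include a combined +12V limit, so the +12V total is an upper bound at best.

```python
# (volts, amps) per rail, as read off the PSU sticker
RAILS = {
    "+3.3V": (3.3, 38),
    "+5V": (5.0, 40),
    "+12V1": (12.0, 23),
    "+12V2": (12.0, 26),
}


def rail_watts(volts, amps):
    """Maximum power a rail can deliver: P = V * I."""
    return volts * amps


per_rail = {name: rail_watts(v, a) for name, (v, a) in RAILS.items()}
twelve_volt_total = per_rail["+12V1"] + per_rail["+12V2"]

for name, watts in per_rail.items():
    print(f"{name}: {watts:.1f}W")
print(f"+12V combined (upper bound): {twelve_volt_total:.1f}W")
print(f"Sum of all rail maximums: {sum(per_rail.values()):.1f}W")
```

The two +12V rails come to 588W on paper, which is uncomfortably close to the "under 600W" estimate if most of the load sits on +12V.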

Troubleshooting Steps so far that had no effect on the symptoms:
  • Lowered the clock rates on the video card (GIGABYTE 570 is OC a little)
  • Lowered settings and resolution in SKYRIM
  • Installed latest BETA Nvidia drivers (290.53)
  • Re-installed latest STABLE Nvidia drivers (285.62)
  • Cleaned out old video drivers
  • Disabled the (unused) onboard video (Intel HD)
  • Updated Creative Labs drivers (SB X-FI XTREME)
  • Disabled the (unused) onboard audio
  • Uninstalled "extra" stuff that Nvidia loads (i.e. 3d vision drivers)
  • Verified no OC on the CPU
  • Removed 2nd monitor
  • Ran Skyrim in windowed mode
  • Reseated all components (memory, video card)
  • Verified power and cable connections
  • Reinstalled Skyrim

Last night I loaded the Heaven DX11 Benchmark to do some load testing.  It ran for over six hours without any issues!

I also logged the system stats using HWINFO64 and the system was under heavy load (much heavier than Skyrim and Batman).  Here's a summary of the results:

Test period: 01:30:45 - 06:42:35

Starting GPU Temp: 53C
Max GPU Temp: 72C
Ending GPU Temp: 65C

Starting GPU Fan: 2400RPM (48%)
Max GPU Fan: 3510RPM (70%)
Ending GPU Fan: 3330RPM (66%)

Starting CPU Temp: 41C
Max CPU Temp: 57C
Ending CPU Temp: 47C

Starting CPU Load: 0.8%
Max CPU Load: 31.7%
Ending CPU Load: 0.9%

The chassis temperature range for the test was: 31C - 34C.

GPU memory allocation went from 130MB to 836MB and then back to 137MB, tracking the load roughly linearly.
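
The start/max/end figures above were pulled out of the log by hand, but the same summary is easy to sketch in code.  This assumes the log has already been read into a list of (timestamp, value) pairs; the function name is hypothetical, and the sample readings below are illustrative only (the middle two are invented, not taken from the real HWINFO64 log).

```python
def summarize(samples):
    """Return the start, max, and end values for a list of (timestamp, value) samples."""
    if not samples:
        raise ValueError("no samples to summarize")
    values = [value for _, value in samples]
    return {"start": values[0], "max": max(values), "end": values[-1]}


# Illustrative GPU temperature readings in degrees C, not the real log
gpu_temp = [
    ("01:30:45", 53),
    ("03:00:00", 72),
    ("05:00:00", 71),
    ("06:42:35", 65),
]

print(summarize(gpu_temp))  # {'start': 53, 'max': 72, 'end': 65}
```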

Here are some charts from the data gathered:
GPU Temperature stats for the entire 6-hour run.  Note that it stabilized at 71-73C, well below the "danger zone"

GPU Fan Speed stats for the entire 6-hour run.  Note that it stabilized at around 69%, well below the "danger zone"

CPU temperature, CPU utilization and Chassis Temperature stats for the entire 6-hour run.  These are well below the "danger zones"
Since Skyrim is a DX9 game, I also ran a benchmark (15 minutes) using the DX9 settings and found the same stats as above.

It's interesting to note that when actually playing Skyrim, there is a CPU load on only one core (out of four).

I had originally been leaning towards the power supply being the issue, but the test results (and lack of issues with other games) cause me to lean more towards the Skyrim code doing *something* that causes my system to have issues.

Skyrim was released in November and it already has three (four?) patches.  Unsubstantiated reports from other gamers indicate that some developers on the Internet have made patches that bypass or optimize some poorly written routines in the game, improving performance.  This leads me to believe that something in Skyrim is causing the "blinking out/rebooting" issue that I (and a lot of other people) am experiencing.

UPDATE 07 JAN 2012: Tested with Kombustor DX9 and DX10, 8X MSAA, 1920x1080, POST-FX, FULLSCREEN for about an hour with no issues or crashes.


UPDATE 07 JAN 2012: I was able to play Skyrim without crashing by starting a new game.  I think my save file is corrupted somehow!  Will test this out thoroughly and will post findings later. This was a false lead - it still crashed, although it took a while longer.


See Part 2 Here.