
Workstation HELL!

System admin
Hi guys, I'm writing this from Windows in safe mode with networking. Lately I've been having some problems rendering with advanced lighting in scanline, and some with mental ray. Scenes are around 500k tris; halfway through the 2nd render the CPU temp warning light comes on and then the system freezes with no option but to reset. After many phone calls to the manufacturers and the suppliers testing for solutions, I was asked to send them an email.

It's a pretty long email so I apologise in advance :)


/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

18th August 2009

Recently, and again this morning, while working on a 3D project I noticed the red CPU temperature LED flashing, accompanied by an audible tone and then a freeze. Specifically, this problem occurred while performing advanced 3D rendering in a moderately complex scene (500k triangles). Every time this happens the computer has to be reset as it is completely unresponsive.

After reading up on "Vista freezing" on tech forums and blogs I decided to follow a tutorial and streamline the system as much as possible to further minimize memory usage, as I regularly perform maintenance anyhow. The tutorial involved turning off a lot of unrequired services, removing all unnecessary programs and repairing some corrupted Windows files using the CBS command prompt.

After a restart and a barely noticeable improvement in speed, the system was tested under the same conditions and the same freeze occurred. I then removed as many updates as Vista would allow and tested again; the results were identical.
Later I removed an internet settings update by accident and the LAN connection was lost, so I could no longer investigate any more solutions online. This left me with little option but to reinstall Windows via the desktop. During this time I thought I should seek some advice from the supplier and manufacturer of the components.

I contacted both companies to advise them of the situation. I also reported that upon pressing the delete key to enter the BIOS setup screen, the bottom half of the screen was pixellated with different coloured pixels, and the key had to be pressed again to load the BIOS setup. Once inside the BIOS, the screen had large random black and white blocks surrounding the lettering. I asked if I should flash a new BIOS but was advised not to, as a faulty motherboard/component could possibly cause this.

After an initial discussion I was advised to contact the component suppliers, as they may be able to exchange parts under warranty if they were faulty. At this time I was under the impression that the processors were in question, given the specific nature of my work combined with the overheating problem, which seemed to point towards this freezing issue.

It was recommended that I download several programs to pin down the problem. It was mentioned that it could also be a memory issue and that processors don't usually cause a system to freeze or hang, just crash. I explained again that the computer freezes every time after a few minutes, and described the nature of the usage, including the program used, which was "3ds max". At this time I asked about the operating system, as people online were complaining of systems freezing and this seemed to be an issue with Microsoft Windows Vista, but I was assured this was not the case.

The components purchased all carry a Vista compatible certification, so it was concluded the problem had to lie elsewhere. During a chat I was informed that the memory could be tested per 2GB stick in each slot, though at the time I presumed this individual testing was available through the program!

As suggested, I downloaded some of the test programs, as follows:
"prime95" (CPU/memory stress tester),
"coretemp" (CPU temperature monitoring),
"memtest86" (bootable memory testing program).

Testing;
Ran "prime95" and purely stress tested the CPUs several times for around 10 minutes using 8 threads at 100% load. Every time, within the first 3 minutes the CPU temperature rose above 89 degrees Celsius, which activated the temperature warning light and alarm. After about 5 minutes the CPU temperature stabilised at just under 70 degrees Celsius, the fans slowed down and the warning light/alarm deactivated. At no time during this phase of testing did the system freeze.

The rest of the tests involved stress testing the memory. Every time, after around 5 minutes the system froze and, as before, I had to reset the computer. After several tests I concluded it must be the memory/motherboard at fault.

After a brief call to update the situation, I was advised to make a boot CD with "memtest86" and run it from DOS. I did this and reset the PC to start the program, and the test began. It quickly rose to 3% complete then halted; the system was frozen again. More tests revealed exactly the same thing. After another reset I entered the BIOS and had a look at the memory logs for the DIMMs: there were over 30 entries recorded as ECC correctable memory errors. Only two DIMMs were listed, "3b" and "4a".

Contacted both companies and reported the results. I was told I should test each memory stick in the same slot and then, if required, in new slots, to reveal the problem as either a memory stick or a memory controller/motherboard fault. After the calls I removed all sticks except one in the first slot and started up the PC. The BIOS would not load; instead a longish regular alarm sound was heard.
After another phone call it was concluded that the BIOS would not accept anything other than the hardware previously connected/detected. The BIOS had to be reset through the motherboard's CMOS clear facility to allow for a new hardware config.

I tried doing this as instructed in the manual (turn off power supply, remove power lead and bridge the "jbt1" connections with a screwdriver) but after several attempts was unable to clear the BIOS for some unknown reason.

I then contacted the manufacturers and spoke to someone there, whom I quickly brought up to speed on the situation. I was informed that it would be better to speak to the man in charge. I then called the distributors again and briefed someone else on the situation, which is when I mentioned for the first time that the only thing I could think of to clear the memory was to remove the CMOS battery, but that I thought I should wait before doing that.

Later on, around 5.30pm, I got a call from the distributors. I quickly brought the person up to speed on the whole situation and we discussed removing the BIOS battery then bridging the connections to reset it. Interleaving the memory was also mentioned, as the motherboard supports this option.
After this call I first removed the battery and shorted "jbt1" to clear the CMOS, then interleaved the sticks, 1 in each bank, leaving a space between each stick. I repeated the battery removal and CMOS clear with 1 and then 2 memory sticks installed. The result was that the machine powered up, but before anything could be displayed it abruptly powered down and reset, then the alarm sound was heard and the BIOS would not load.

Update: The BIOS is now freezing intermittently even with default settings. Leaving the machine off for a short while allows enough activity to load the OS into safe mode with networking at the very most. If the BIOS is left to go on to load the full OS, it doesn't get far before the system freezes completely.
At present all the RAM sticks are back in place as no other configuration seems to work. My last test was to set the BIOS as per the hardware and disable all quiet boot modes. This revealed that 3/4 of the installed memory passed the test, around 6600MB, and that memory sparing was not available.

Currently I am at a loss as to what to do. I need to get on with my work but am unable to.

Sincerely,

Steven Bell


//Main system specs;
Motherboard = Supermicro X7DWA-N, Intel 5400 series (Seaburg) chipset
CPU = 2x Intel Xeon E5472 (Harpertown) quad core @ 3GHz, 1600MHz FSB
Installed memory = 4x 2GB Supermicro 800MHz ECC DDR2 RAM
GPU card = Radeon HD 4870 X2, 2GB GDDR5//

/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
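
For anyone following the stick-by-stick advice in the email above, here is a minimal sketch of how that test order and its interpretation could be laid out. The stick and slot labels and both function names are made up for illustration, and the sketch assumes the board will actually boot with a single stick installed, which the email suggests this one would not do without a CMOS clear.

```python
# Hypothetical labels: the real slot names come from the motherboard manual.
STICKS = ["stick1", "stick2", "stick3", "stick4"]
SLOTS = ["slot1", "slot2", "slot3", "slot4"]


def build_test_plan(sticks, slots):
    """Return (stick, slot) pairs: every stick alone in the first slot,
    then every stick alone in each remaining slot."""
    same_slot = [(stick, slots[0]) for stick in sticks]
    other_slots = [(stick, slot) for slot in slots[1:] for stick in sticks]
    return same_slot + other_slots


def _all_failed(outcomes):
    """True only if there was at least one test and none of them passed."""
    outcomes = list(outcomes)
    return bool(outcomes) and not any(outcomes)


def diagnose(results, sticks, slots):
    """results maps (stick, slot) -> True if that single-stick test passed.

    A stick is suspect if it failed in every slot it was tried in.
    A slot is suspect if every stick tried in it failed.
    """
    bad_sticks = [
        s for s in sticks
        if _all_failed(ok for (stick, _), ok in results.items() if stick == s)
    ]
    bad_slots = [
        sl for sl in slots
        if _all_failed(ok for (_, slot), ok in results.items() if slot == sl)
    ]
    return bad_sticks, bad_slots


if __name__ == "__main__":
    plan = build_test_plan(STICKS, SLOTS)
    # Simulated results only: pretend stick3 fails wherever it is fitted.
    simulated = {pair: pair[0] != "stick3" for pair in plan}
    print(diagnose(simulated, STICKS, SLOTS))
```

If one stick fails everywhere, the stick is the likely culprit; if every stick fails in one slot only, that points at the slot or memory controller instead. If everything fails in every combination, the sketch flags all sticks and all slots, which usually means the fault lies somewhere else.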


Just wondering, guys: I kind of hate Vista, not because of this but because it's so memory hungry and bloated with things that are mostly completely useless. I basically couldn't care less what it looks like. Performance is the most important requirement for me, along with compatibility with programs, which was a nightmare to get right, finding patches and whatnot. Seriously, when/if this problem is fixed I'm thinking about switching to XP x64. What do you think?

Replies

  • Ghostscape
    Ghostscape polycounter lvl 13
    that is some words
  • Ben Apuna
    Wow! Sorry to hear about all your troubles. Looks like some serious overheating problems + bad RAM and/or motherboard. Testing RAM sticks isn't fun. I initially had a bad stick when I put my current system together too, which also caused random crashes until I figured out it was the cause and replaced it.

    You say that your temps stabilized during the prime95 test, but I would still say it's really bad to trigger the temp warning at all. A better cooling setup might help you a lot. In fact I believe all of your problems probably stemmed from this issue.

    As far as Vista/XP goes, well, if you've got the money or already have it on hand go with XP64: less fuss, more performance. I'm using Vista64 Ultimate and it's working fine. I would've gone with XP64 but the price was double what I paid for Vista.

    Good luck getting everything back up and running.
  • System
    System admin
    Thanks Ben, got some good news: the company that supplied the parts offered to look at the machine. Parcelforce picked it up this morning, so those guys are gonna stick it on a bench and give it a good thrashing! Better that than me with my screwdriver!
    Hooked up my old system to get back online (haven't got round to selling the damn thing yet), at least I can be kind of nostalgic :\
  • System
    System admin
    Spoke to Scan Computers again today. They have had the machine on for over 3 days, 72 hours of testing processors and RAM at maximum settings, and the temp didn't go over 65 degrees Celsius. Here's why:

    IMG_0360_resize.jpg

    I thought this was an air filter but it turned out to be a big fluff buildup totally blocking airflow!

    Getting it back tomorrow sometime, here's what was done;

    Cleaned fluff out of the whole machine
    RAM placed correctly with help from the manufacturer's tech support (the motherboard manual doesn't explain this fully, and this, not overheating, was the reason the system was freezing)
    Tested for 72 hours + at full settings on RAM and CPUs
    Tested with 2 max files sent via email: heavy scanline and mental ray rendering
    Latest BIOS file flashed
    *Fitted an optional exhaust fan

    Cost = Carriage and price for the parts but everything else was free!

    The guy I dealt with was superb so if anybody buys from Scan and has any problems speak to Anthony in research and development, he knows his stuff and is very helpful :)

    ps: thanks to all the voters, it's neck and neck it seems but they say better the devil you know, gonna go with xp pro x64 and save the extra ram for something else.
  • Ben Apuna
    Glad to hear your PC got all fixed up GCMP. All without any real hardware failure too, that's quite fortunate. I find that keeping my PC raised off the ground keeps the dust/dirt buildup to a minimum, then I just clean out the vents and fan blades about every six months or so.
  • Bruno Afonseca
    Good thing you got that sorted out. But take my advice: xp64 sucks. I'm switching back to 32 because I can't take the slowness and instability anymore. Everything crashes.
  • michi.be
    michi.be polycounter lvl 17
    I'm fine with x64 Vista. Will try out Win7 some day, or do a work-only setup with nothing more than the tools I work with on it. So it stays slick and speedy.
  • System
    System admin
    @Ben Apuna cheers man, back with Vista now. Tweaked the hell out of it, turned off all that fancy crap including updates and installed the minimum required software, it's fast and furious :) Pity there are no decent firewalls about for it though. Stuck with Comodo and it's unstable with certain settings, miss Sygate badly :(

    @fonfa wish I had seen your comment there before installing XP, it's as buggy as hell on this new system! Took ages to get programs installed, not much support, plus the most important thing was it could only see 4, not 8 gigs of RAM :( Must be that my hardware is too new because my old EM64T system seems to love XP x64 = emulation!

    @michi.be had a look at the benchmarks for Vista vs Win 7 and it seems there isn't that much of a speed increase. I'm sure the market will catch up when Win 7 becomes trustworthy though.