Disclaimer: I’m not a hardware engineer. This is not a story about CPU soldering or whatever hardware engineers find sexy. But it is a story.
I’ve recently been trying to resurrect my old PC. I haven’t used it at all over the last 3 years and moved twice in the meantime (including moving from Denmark to Poland).
Obviously, I started by plugging it in, but it would not start. Nothing happened when I pressed the power button.
That brought back the memories. I already had problems with this PC not wanting to start in the past. Every couple of weeks it wouldn’t react to pressing the power button. I also had problems with disks freezing to death while the PC was working (you could move your mouse, use apps that you loaded previously, but you couldn’t open anything new).
In the past, while I was still using it, I suspected a damaged power supply. I replaced it. I changed the disks. RAM. Graphics card (because I wanted to play The Witcher and it was a good reason for buying a more powerful card :D). Nothing helped.
When investigating this problem I noticed that opening the case, unplugging disks and plugging them back would sometimes help. There was that green LED (SB_PWR) next to the SATA sockets on the motherboard. It was off right after I had experienced an issue. But it would turn on either when I did the replugging or just waited some time. It was a mystery.
So, here I was 3 years later, trying to resurrect this PC. It was a pretty good PC a couple of years ago so, obviously, I didn’t want to just throw it away. Plus, I abandoned my Ubuntu distro with all my old data there so I needed to access those disks anyway.
I opened the case and realised that the SB_PWR LED was off. “Yes… that again. So it must be the motherboard’s fault. I replaced all the other parts and that didn’t help. I need to look for a new one.”
I left the PC and moved on to other things. A couple of days later, I looked at it and yes, the green light was ON! It’s alive!
I booted it. Everything worked fine, although after I restarted it, I needed to repeat the old replugging of SATA drives. Just like in the past. However, if it had already booted, it worked fine, so I started backing up my old data (“Oh, Ubuntu… you’re my favourite Linux-based operating system”). In order to do that, I needed to plug in an external USB disk.
The PC was lying flat on my desk, so the easiest way to plug in an external drive was to use the back panel, the one directly on the motherboard. When I touched the USB port with a plug, fireworks! The PC switched off and the LED went dead.
Did I just burn it? And what the heck was that?!
Before I answer that, one more flashback. About 5 years ago I noticed that my MacBook Air’s charger (the metal plug) was slightly electrified. Back then I lived in Denmark where for some reason power sockets have no grounding. I thought, back then, that my charger had some sort of short circuit (you know how those Apple cables tended to look after a year of use) and that the charge stayed in the plug (which is aluminium) because the whole thing was not grounded. Changing the charger to a new one didn’t help, though, so I turned my attention to the power strip. It turned out that plugging the charger into some other socket helped, so I thought that the old power strip (which was grounded, because I bought it in Poland) was at fault and I bought a new one (which didn’t have grounding pins at all because we were in Denmark). That helped. No more shocks when using the MacBook charger.
Back to the fireworks. Guess which power strip I was using now? Yes, the one without the grounding. Guess what happened when I checked the PC’s case with an electrical tester? Yes, it was electrified. But why? That’s a new power strip I was using. The one which made the MacBook problems disappear!
That’s true, but it wasn’t the power strip’s fault. It wasn’t the old MacBook charger’s fault either. The problems that I had 4–5 years ago didn’t go away. They just focused solely on my PC… because I switched to a non-grounded power strip during my time in Denmark. As a result, my PC stopped affecting the other appliances that I had plugged into the same power strip.
But how did it happen that my PC’s case was electrified? Because I’m a lazy moron and I made a mistake when assembling a new PC 5 years ago. Here’s a link to a photo which explains what I did.
Those aluminium strips were touching the USB power pins. That made my PC’s case electrified. I’m guessing that they were touching ever so slightly, so the connection could change based on the temperature, me moving the PC’s case (to open it and replug the SATA drives), etc. Me touching the PC’s case (while doing the replugging) might’ve also affected what was happening (I was grounding it with my hands). Also, since the point of contact was so delicate, it wasn’t a big “leak”. Presumably, it sometimes made the PC’s power supply lose just enough power (or made the voltage unstable) that it wasn’t able to properly power the USB ports and… the SATA drives. Hence, the green LED switching off.
Why did I see fireworks when I tried to plug an external USB drive there? Because I touched the aluminium plate (and the strips) and made the contact area larger. Why didn’t I discover it for a couple of years while using this PC? Because I never used those rear USB ports in normal situations.
While “debugging” these two issues (PC, MacBook charger) I replaced my disks, an old power supply, RAM, the old power strip, my old MacBook charger. I thought I had resolved some of the problems, but it turned out that I had just isolated the real problem. It’s funny how many ways the problem manifested itself and what kind of dirty fixes I discovered. It’s also interesting how the environment (Danish non-grounded power sockets and later on a non-grounded power strip) affected the problem.
If I had been using a grounded power strip and a grounded power socket, my guess is that I would have either burned the power supply or an RCCB would have indicated that I had a big problem (though the “leak” was so small that perhaps nothing would’ve happened anyway). If I had replaced the old grounded power strip with a new, grounded one, I would have realised that the same problems still occurred, so the fault must have been somewhere else (not the charger, not the power strip, so there is a big chance that I’d have looked at my PC which, at that time, was working quite fine).
This story has many analogies to debugging software (and most likely, pretty much any other more complex mechanism).
- A single issue can manifest itself in many ways, in many places. There’s a great chance that if you focus on the visible part of the problem, you won’t notice the real cause.
- When debugging an issue, you need to consider how changes in the environment affect your issue at hand. A problem may not occur locally or under a specific load, locale, etc. Drawing the right conclusions from “this only happens when X” can often save you many hours.
- The trickiest issues to debug are those that seem to occur randomly. Finding patterns lets you tame them a bit, but may be really hard if you don’t know the scope of the problem.
- You can become more proficient in debugging (or “problem-solving” in general) by gaining more knowledge about the field and experience in resolving similar problems. If I were a hardware engineer, this would, most likely, have been easy-peasy for me.
- You may need tools and some spare parts. I wasn’t able to properly test what was happening with my PC which made validating my various hypotheses more of a guessing game.
- Sometimes the fix requires just 1 changed character in the code. It might take you a couple of days, if not months (or years :D), to find it, though.
- You need to understand why your patch fixes the problem. Sometimes developers are able to get rid of an error by tweaking the line in which it occurs. However, I tend to reject such patches if their authors can’t explain the whys and hows. Changing a line from which the error is thrown is often not the right fix if you can’t tell how you got to this state in the first place.
- Fixing harder bugs may require connecting many seemingly unrelated dots. Since evidence of the bug can appear only under certain conditions, you may notice it rarely and over a long period of time. Keep your eyes open and look for patterns.
- Debugging really complex issues is an art.
- You can’t just google some issues.
- Sometimes you just need to be lucky.
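To make the points about environment and about understanding your patch a bit more concrete, here is a minimal, made-up Python sketch. Everything in it is hypothetical (the function names, the price strings, the comma-as-decimal-separator scenario); it is not taken from any real project. It only illustrates how patching the line that throws can hide a problem that actually starts somewhere else, much like my new power strip did.

```python
from typing import List, Optional


def parse_price_lenient(text: str) -> Optional[float]:
    """Hypothetical loader: silently returns None for input it cannot parse."""
    try:
        return float(text)
    except ValueError:
        # The real cause lives here: "12,50" (comma decimal) is quietly dropped.
        return None


def total_symptom_patch(prices: List[Optional[float]]) -> float:
    """Patch at the line that used to raise TypeError: just skip the None values."""
    return sum(p for p in prices if p is not None)


def parse_price(text: str) -> float:
    """Fix at the source: accept a comma as the decimal separator.

    Assumption for this sketch: inputs never contain thousands separators.
    """
    return float(text.replace(",", "."))


# Only "happens" for users whose locale writes decimals with a comma.
raw = ["10.00", "12,50"]

# The error is gone, but the total is silently wrong.
patched_total = total_symptom_patch([parse_price_lenient(t) for t in raw])

# The input is understood, so the total is correct.
fixed_total = sum(parse_price(t) for t in raw)

print(patched_total, fixed_total)
```

The first total no longer crashes, which is exactly why it is dangerous: the symptom disappeared, the cause did not.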
PS. I’d like to use this occasion to thank my Dad, who made me curious about electronics and software when I was a kid. And thanks to Olek, who corrected this article because I apparently still don’t know much about electronics :D