I was going to call this post "Whaddya DO all day?!?"
Yesterday, nearly a decade after leaving the Navy, I realized that I still troubleshoot as though I were still in the Navy.
You know the old troubleshooting flowchart-- the one that starts with a box labeled "It's broken", leading to a diamond labeled "Were you messing with it when it broke? Yes/No". And if it wasn't your fault then it was either operator error or possibly, in a distant third/fourth place, caused by the environment or a design issue. And for those of you veterans who object to the term "operator error", I should point out that there's a reason the submarine force has done away with periscopes-- they're the only piece of equipment that officers were allowed to operate, and they broke more frequently than anything else. The periscopes, not the officers.
Anyway, two days ago at 8 PM I noticed that the pump on our solar water heater wasn't shutting down. (Yes, I still go around the house listening for signs of trouble. But at least I've pretty much stopped taking logs.) The pump controller is straightforward-- two separate wires leading to a hotter thermocouple (the collector on the roof) and a cooler thermocouple (the water tank). The controller turns on the pump when the roof thermocouple gets too hot (currently set at 12 degrees warmer than the tank) and runs until it cools to within four degrees (hard-coded into the controller's chip). At 8 PM I was pretty sure that the roof wasn't hotter than the tank, so I shut off the controller and tried to go to sleep.
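For readers who think better in code, here's a rough sketch of that on/off logic in Python. It's only an illustration of the behavior described above, not the controller's actual firmware: the class name, the method name, and the choice of ">=" and "<=" at the thresholds are all my assumptions. The 12-degree and four-degree figures are the ones from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class DifferentialController:
    """Hypothetical model of a two-sensor solar pump controller."""
    on_delta: float = 12.0   # adjustable turn-on differential (roof minus tank)
    off_delta: float = 4.0   # fixed turn-off differential (roof minus tank)
    pump_on: bool = False

    def update(self, roof_temp: float, tank_temp: float) -> bool:
        """Take one pair of readings and return the new pump state."""
        delta = roof_temp - tank_temp
        if not self.pump_on and delta >= self.on_delta:
            # Roof is enough hotter than the tank: start circulating.
            self.pump_on = True
        elif self.pump_on and delta <= self.off_delta:
            # Roof has cooled to within the turn-off band: stop.
            self.pump_on = False
        # Between the two thresholds the pump keeps doing whatever it was doing.
        return self.pump_on


controller = DifferentialController()
for roof, tank in [(130, 120), (135, 120), (128, 120), (123, 120)]:
    print(roof, tank, controller.update(roof, tank))
```

The gap between the two thresholds is what keeps the pump from chattering on and off when the roof is only a little warmer than the tank.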
11 hours later (seven hours of mental troubleshooting interrupted by four hours of actual sleep) it was light enough to go to work. Why did I wait until sunrise so that I could go on the roof? Because I knew that I hadn't been messing with the pump controller when it broke, and I was pretty sure that nobody else had been [-]operator error[/-] messing with it either. (We'll get back to this latter assumption in a couple paragraphs.) That left environment and design-- Hawaii is notorious for UV, water, wind, and dirt/debris so clearly the environment was the source of the fault. Hey, we built the darn thing and it's worked flawlessly for over five years. That usually means the environment finally ate through something.
Up on the roof I had to spend about 15 minutes clipping tie wraps, peeling back electrical tape, pulling off pipe insulation, and undoing wire nuts. Everything looked fine, so maybe the environment wasn't the problem-- but I decided to verify that. With my 25-year-old Radio Shack analog multimeter and my thermocouple spec sheet (yes, I save thermocouple spec sheets) I was able to verify electrical response (to a butane lighter and an ice cube). With the wire's loose ends carefully separated and capped off, I trotted back down to the garage to check resistance where the roof wire connected to the controller-- infinite. Great, no shorts. I trotted back up to the roof, shorted the loose ends together, and trotted back down to the garage to check continuity-- zero ohms. Great, no breaks or bad connections either. While I was there I did a couple tests from the controller tech manual (you know I save those too). Everything responded normally but the controller was still convinced that the roof was way hotter than the tank. Of course at 9 AM that's probably true.
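The two wire checks above boil down to a small decision rule, sketched here in Python. The function name and both threshold values are made up for illustration (an analog meter just shows "infinite" or "zero"), so treat them as assumptions rather than specs.

```python
import math


def classify_wire(open_end_ohms: float,
                  shorted_end_ohms: float,
                  min_open_ohms: float = 1e6,    # assumed: below this, suspect a short
                  max_loop_ohms: float = 5.0) -> str:  # assumed: above this, suspect a break
    """Interpret the two resistance readings taken at the controller end.

    open_end_ohms:    reading with the far ends separated and capped off
    shorted_end_ohms: reading with the far ends twisted together
    """
    if open_end_ohms < min_open_ohms:
        # Current is finding a path between conductors that should be isolated.
        return "short"
    if shorted_end_ohms > max_loop_ohms:
        # The loop should be nearly zero ohms once the far ends are joined.
        return "break or bad connection"
    return "ok"


# The roof wire readings from the paragraph above: infinite, then zero ohms.
print(classify_wire(math.inf, 0.0))
```

Both readings are needed: the open-end check alone can't tell a good wire from a broken one, since both read infinite.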
In the garage, I repeated all of the previous paragraph's steps on the tank's thermocouple & wire. Everything was OK until the final step (it's always the final step) when sonofagun, shorted continuity showed infinite resistance-- the wire had a break or a bad connection. Sure enough, when I went over the wire by hand I found that it was severed at the bottom of the tank. I replaced it with a new wire and the controller happily set about comparing temperatures again. I put back the wire nuts, pipe insulation, electrical tape, and tie wraps, had my spouse clear the tagout, and returned the hot water system to service. Hollywood showers for everyone.
But wait, we submariners know that troubleshooting isn't complete until the root cause has been identified and corrected. Luckily it only took one more day to complete that step-- see below for the results of the incident critique.
Next time something breaks without warning, I'm going to start troubleshooting with a different assumption...