In a previous issue of the VEXIS Voice, I discussed some of the tactics for effective troubleshooting that we successfully employ at VEXIS Systems. Much of the process we use for solving problems was derived from “The 10 Step Universal Troubleshooting Process” by Steve Litt. Here are a few of the steps we discussed in the previous article:
- Make a Damage Control Plan
- Get a Complete and Accurate Symptom Description
- Reproduce the Symptom
Now, here are the key remaining steps in the troubleshooting process to ensure timely and accurate resolution of the issue:
Do the Appropriate Corrective Maintenance
At one time or another, most of us have spent hours narrowing a problem down to something that could have been corrected by general maintenance. An action is appropriate corrective maintenance if:
- Company policy is that the action must be taken before returning the system to production mode.
- It’s likely to fix the problem, is easy to do, and is a maintenance item.
- The manufacturer of the hardware or software component has advised that the maintenance be performed (through a service bulletin, patch, or service pack release) and the maintenance does not conflict or interfere with other components of the system.
Note that corrective maintenance is one of the few weapons we have against that scourge of troubleshooters, the intermittent problem. Often the most economical solution is to do all corrective maintenance, then observe the component over time to determine whether the issue was corrected.
Narrow It Down to the Root Cause
Mathematics tells us the fastest way to find a single element in an ordered set is binary search. Binary search is the process of repeatedly ruling out half the remaining search area until the element is found. What makes the system you’re troubleshooting an ordered set is your knowledge of it, reinforced by manuals and documentation. It’s that knowledge that allows you to devise tests to split the search area in half.
[Diagram: a binary search finding the violet component with only six tests]
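The halving idea can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the components form an ordered chain, a single component is at fault, and you can devise a test that tells you whether everything up to a given point works. The names `find_faulty_component` and `works_through`, and the 64-component system, are hypothetical and not taken from any VEXIS tool. Sixty-four was chosen because six halvings are enough to isolate one item among 64 (2^6 = 64).

```python
from typing import Callable, Sequence, Tuple


def find_faulty_component(
    components: Sequence[str],
    works_through: Callable[[int], bool],
) -> Tuple[str, int]:
    """Return (first faulty component, number of tests used).

    Assumes an ordered chain of components with one fault, where
    works_through(i) reports whether everything up to and including
    components[i] behaves correctly.
    """
    low, high = 0, len(components) - 1  # the fault lies within [low, high]
    tests = 0
    while low < high:
        mid = (low + high) // 2
        tests += 1
        if works_through(mid):
            low = mid + 1  # everything through mid is fine; look downstream
        else:
            high = mid     # the fault is at mid or upstream of it
    return components[low], tests


# Hypothetical system: 64 components in a chain, with the fault at index 41.
components = [f"component-{i}" for i in range(64)]
faulty_index = 41


def works_through(i: int) -> bool:
    # Stand-in for a real test such as a measurement, a log check, or a swap.
    return i < faulty_index


culprit, tests_used = find_faulty_component(components, works_through)
print(culprit, tests_used)  # prints: component-41 6
```

Checking the components one at a time from the start would have taken 42 tests to reach the same answer. In practice, the hard part is not the loop but devising each test so that it cleanly rules out half of what remains.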
Test
Testing is the best predictor of customer satisfaction. If the symptom you described and then reproduced in the earlier steps is now gone, and no new problems have appeared, the customer will likely be happy. Most customer relations horror stories occur when testing was inadequate or nonexistent. When you test, ask these four quality questions:
- Did the symptom go away?
- Did the right symptom go away?
- Did I fix the right cause?
- Did I create any other problems?
Employing these steps when troubleshooting any issue can greatly reduce system downtime and keep your customers happy!
Brian Smith, Director of Strategic Initiatives