Reducing No Fault Found and Improving Operational Availability through Intermittent Fault Detection and Isolation

Author: Ken Anderson, Vice President, Universal Synaptics Corporation  

For those in the avionics repair and maintenance business, the acronyms NFF (No Fault Found) and CND (Cannot Duplicate) are, unfortunately, all too familiar. After several decades of frustration, this elusive phenomenon continues to consume an enormous amount of test and diagnostic effort and is the source of considerable cost and discomfort within the multi-level avionics repair model.

In this article we outline the problem of intermittence and its testing difficulties and, more importantly, describe the unique equipment and process that have produced overwhelming success in intermittence NFF resolution and MTBDR (Mean Time Between Depot Repair) extension for the U.S. Air Force. Universal Synaptics Corporation, working with Total Quality Systems (TQS) of Ogden, Utah, implemented a team-developed overhaul system called IFDIS (Intermittent Fault Detection and Isolation System), which incorporates all the necessary testing procedures and technological capabilities that are proving critical to resolving the chronic intermittent NFF problem.

 

Introduction

Today’s defense environment requires responsive and affordable solutions to global weapon system support challenges. U.S. Forces are simultaneously engaged in multiple humanitarian assistance and disaster recovery operations while rebuilding a nation in Iraq, drawing down major combat operations in Afghanistan, fighting terrorism around the globe, and maintaining a deterrent to strategic-level threats like cyber warfare and weapons of mass destruction. However, as operations have increased, sustaining them economically has become increasingly challenging for the DoD. The high, sustained operations tempo over the past decade in harsh environments has eroded weapon system readiness and reduced expected life span. In the current economic and political environment, recapitalization of systems has been complicated by sequestration and the cancellation of weapon systems modernization programs. In light of these realities, the Services are seeking ways to improve maintenance capabilities, reduce NFF, and increase operational readiness while simultaneously reducing life-cycle costs.

NFF test results in electronic boxes, primarily driven by intermittent faults, have become a significant concern, a huge maintenance and life-cycle cost driver, and an operational readiness degrader within the DoD. Of the many DoD weapon system electronic boxes sent to the depot for repair, fewer than half have the actual root cause of the problem identified and repaired. The rest test NFF. Unfortunately, conventional Automatic Test Equipment (ATE) was not designed to detect intermittent faults and is incapable of detecting and isolating the momentary intermittent failures that cause NFF. One of the reasons it is difficult for conventional ATE to detect intermittent faults is that circuits or functions are typically tested one at a time. Unless the circuit with the intermittent defect happens to be tested at the very time the intermittent occurs, it is missed. Because these faults are neither detected nor repaired at the depot, the electronic boxes continue to malfunction during operation. The boxes then cycle continuously between the field and the depot, consuming an enormous amount of resources and negatively impacting maintenance budgets, warfighter readiness and warfighter support. Currently, NFF is a $2 billion non-value-added annual expense for the DoD.

There are undoubtedly many causes of NFF and all of them should be addressed.  The question is: Where do we start and which solution will be the most beneficial?

Our particular efforts have focused on the literal or statistical analysis of NFF, recognizing that if the system’s MTBDR has decreased, or if the device’s NFF rate has increased with age and deterioration, a physical fault is most likely present. If it isn’t found during conventional testing, then it probably fails only intermittently. An intermittent failure, in turn, in all probability cannot be detected or diagnosed at test time because of known and demonstrated limitations in the conventional measurement equipment used to perform the tests.

 

The Testing Problem

Intermittence occurs randomly in time, place, amplitude and duration.  The very nature of this type of failure suggests that the ability to detect and further isolate the intermittence root cause is based on detection SENSITIVITY and PROBABILITY, rather than conventional methods concentrating on ohmic measurement accuracy. Simply put, you can’t detect an intermittent event until it occurs, and then you might have limited opportunities to catch it on the specific circuit when it does occur. Trying to measure fractions of a milliohm, scanning one circuit at a time, is ineffective for this particular failure mode.

Through extensive hands-on failure analysis and repair of Rogue / Bad Actor / NFF avionics and other aging electronics, our research and practical application revealed that nearly all NFF failures are caused by underlying intermittence in the circuit path interconnections, not the electrical components. The electrical components generally fail “hard” and are, by comparison, easy to troubleshoot and repair. In contrast, the interconnecting devices mostly fail intermittently. These types of “devices” are defined as the connectors, crimps, splices, circuit board traces and solder joints, bulkhead connectors, backplanes, switches, circuit breakers, fuse receptacles, etc. In short, it is all the electromechanical devices that mechanically tie the circuit components together.

Just like machinery, these particular devices wear gradually, or contamination builds up over a period of time. Rarely, unless damaged, will they be working perfectly one minute and the next become a repeatable, testable, hard failure. Instead, the electromechanical devices go into a long and frustrating period of low-level intermittency as their mechanical tolerances change depending on their age, wear and the current operational environmental conditions such as temperature, humidity, vibration and contamination exposure.

When a particular circuit device’s electromechanical intermittence reaches sufficient magnitude, the circuit’s overall electrical function begins to degrade, resulting in increasingly frequent intermittent-type system failures. When subsequently tested on the ground in a static environment, however, the system may perform well enough to avoid detection.

It is important to note here that an intermittence of sufficient amplitude and duration to cause a system malfunction during extremes in the operating environment is likely to manifest itself at a much smaller amplitude and duration during ground-based testing, unless environmental stimulus is applied.  The amount of stimulus required to expose an intermittence is inversely proportional to the sensitivity of the testing equipment used to detect the intermittence.

It’s at this point that NFF’s circular logic and confusion begin. When a malfunction is reported but is no longer evident or easily detectable with conventional scanning test equipment, the maintenance specialist has only two expedient diagnostic choices: the intermittence is either in the aircraft or it is in the Line Replaceable Unit (LRU) / Weapon Replaceable Assembly (WRA). It’s unlikely that the pilot imagined or fabricated the original in-flight malfunction. Consequently, line technicians are often left to simply take a “shotgun” approach to the repair in an attempt to address the original write-up in a timely manner. Unfortunately, when system elements are removed before the root of the intermittence has been located, the removed item may not have been the source of the problem. Suggestions that the maintenance specialist simply pulled the wrong item due to inadequate training, tech orders, inexperience, etc., somewhat ignore the original malfunction reported by the weapon system operator and ensure that the defect remains undetected somewhere in the system. If it positively is not in the LRU / WRA, then it’s more than likely still in the aircraft.

Since intermittence occurs primarily in electromechanical devices, when the “most likely” opportunity is calculated, the LRU / WRA becomes the most prominent suspect.  There are hundreds and in many cases thousands of potential failure points in a typical avionics box, whereas the aircraft circuits and connections leading into the box may be just a few hundred.

The Testing Solution

Once intermittent failure modes are clearly understood, it becomes quite evident why the vast array of conventional test equipment cannot efficiently or effectively test for or isolate the root-cause of this elusive problem.

In a typical avionics system, there are thousands of internal and external circuit paths moving electrons through thousands more physical interconnection points, all of which are aging to some degree and will fail intermittently long before they fail permanently. It only takes one of these devices reaching this condition to render the unit unreliable. It is virtually impossible to manually probe such a system, and even if it were attempted, the probability of measuring that specific path, at just the right moment, looking for the right signal, would be infinitesimal.

By any reasonable scientific explanation of the problem, to catch intermittents on the ground, you need to have phenomenal testing speed (sensitivity) and 100% bandwidth. In other words, the proper technology for the task must be able to test all of the failing system’s paths all of the time, in a simultaneous and continuous fashion. Conventional test equipment does just the opposite. Most testing devices employ digital sampling and averaging techniques to achieve higher levels of parametric accuracy, which will completely “average” a short-duration, ohmic, intermittent event right out of existence. Likewise, virtually all continuity testing devices employ scanning methodology, and while they may be physically connected to each circuit, they still only measure one circuit at a time and then only briefly. A continuity test ONLY verifies that the unit under test is wired correctly and is stable at that specific moment. These devices are typically limited to measurement speeds in the 100–200 millisecond range, which adds up to some rather massive holes in intermittence test coverage even when testing just a single line. Across all of the interconnections found in a typical avionics system, event detection is nearly impossible.

To address these testing limitations, the Intermittent Fault Detector (IFD) was developed specifically with intermittence requirements in mind. It uses highly sensitive analog detection technology on the front end and digital reporting and data processing technology on the back end, and it monitors all circuits in parallel. The IFD consistently detects intermittent circuit events on any and all circuits simultaneously, at ohmic glitch durations as short as 50 nanoseconds. The number of simultaneous test points is scalable from 256 up to 20,000.

What does this mean in the overall scope of intermittence detection probabilities?  It means everything!  It means success or failure, reliability or unreliability, integrity of a test or no integrity whatsoever.

Figure: The Intermittent Fault Detection & Isolation System™ (IFDIS™) delivered to the 523rd EMXS at Hill AFB to test the F-16 AN/APG-68 Radar System PSP

Figure: The Intermittent Fault Detection & Isolation System™ (IFDIS™) that will be delivered to the NAVAIR FRC SW to test the F/A-18 GCU

While we are certainly not comparing ourselves to Albert Einstein, his formula, E = mc², which explained the energy unleashed by the atomic bomb, is similar in form to the probability gains that IFD technology delivers in catching random intermittents. To explain and demonstrate this enhanced capability in a system of simultaneous circuit paths under test, we use a similar formula that we affectionately, with respect to Mr. Einstein, call:

Universal Synaptics’ Law of Intermittent Fault Detection Effectiveness:

E = SC²

In our formula, E is the Effectiveness that the IFD technology provides in detecting the most evasive of intermittent malfunctions (those causing NFF) in a given Unit Under Test (UUT) device versus any other comparable piece of test equipment (expressed as a ratio, E:1).

S is the single-circuit intermittence detection Speed advantage that the IFD has over the single-circuit intermittent detection speed capability of any comparable testing technology. For the IFD, use 50 ns (50 nanoseconds, or 0.00000005 seconds).

Simply stated, what is the ratio of the shortest glitch detectable by any two pieces of test equipment on just a single circuit?

Example: 100 µs divided by 50 ns = 2,000:1, or 100 ms divided by 50 ns = 2,000,000:1

C is the number of Circuits in the device that require simultaneous testing; this value is squared.
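
The single-circuit ratio S can be checked with a few lines of Python. This is a minimal sketch for illustration only: the function name speed_ratio and the choice to express durations as whole nanoseconds are assumptions made for this example, not part of the IFD or IFDIS.

```python
IFD_GLITCH_NS = 50  # shortest glitch duration quoted for the IFD in the text


def speed_ratio(other_glitch_ns: int, ifd_glitch_ns: int = IFD_GLITCH_NS) -> float:
    """Return S: the shortest glitch another tester can detect on a single
    circuit, divided by the shortest glitch the IFD can detect."""
    return other_glitch_ns / ifd_glitch_ns


# The two examples from the text, with durations converted to nanoseconds.
print(speed_ratio(100_000))      # tester limited to 100 microseconds
print(speed_ratio(100_000_000))  # tester limited to 100 milliseconds
```

Working in integer nanoseconds keeps the division exact for the values used in the text.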

Note:  The number one question that arises when explaining and using the Intermittent Fault Detection Probability formula is:

“Why do you square the number of circuits to be tested (C) in the comparison formula?”

Since this is the key to the entire solution, let’s take a moment to fully understand it.

The reason the number of circuits under test is squared is that while every other single-point or scanning-type tester measures one circuit at a time, the IFD simultaneously tests all of the circuits for the same duration. As the conventional scanning continuity tester moves on to test a new circuit, the IFD continues to test all the connected circuits at the same time, for the same period.

Intermittence by its very definition is random in time, place, amplitude and duration. Therefore, the detection of intermittence is a condition of probabilities and the ability to detect it is measured in test coverage.

The following is a simple explanation of the squaring effect of simultaneous and continuous testing for intermittence (see Table 1).

Using an easy example of a 3 by 3 matrix of circuits (9 total circuits to be tested), like a simple 9-pin cable, let’s compare. Conventional scanning test equipment, while physically connected to all the circuits, still only measures one circuit at a time. While this technology might measure test point 1 for one second, the IFD’s all-lines, all-the-time technology simultaneously and continuously tests all 9 of the circuits for that same one second, for 9 total seconds of intermittence test coverage. When conventional equipment then moves (scans) to measure test point 2, also for one second, the IFD tests all 9 circuits for another second, giving you 9 more seconds of intermittence test coverage. Conventional equipment then moves on to test point 3 for one second, and the IFD again tests all 9 circuits for that same one second. When conventional testers have finally completed testing each of the 9 circuits for one second each (9 seconds total), the IFD has simultaneously tested all 9 circuits for 9 seconds each (9 x 9), or 81 total seconds.

 

Table 1

Test Points       Duration of Tests    Conventional Scanning Test Coverage    IFD All-Lines Test Coverage
1                 1 s                  1 second                               9 seconds
2                 1 s                  1 second                               9 seconds
3 thru 8          1 s each             6 seconds                              54 seconds
9                 1 s                  1 second                               9 seconds
Total Coverage    9 s                  9 seconds                              81 seconds

 

It doesn’t matter if you have a 9-pin cable or a 10,000 test point avionics box; with the IFD’s simultaneous and continuous test technology, you effectively square the number of circuits in total test coverage.

In fact, the IFD’s test coverage is actually even better than this. Scanning continuity testers take valuable test time to switch to the next circuit, and in order to see ohmic glitches (rather than complete opens), they must also charge the new line sufficiently to see an ohmic change. Meanwhile, the IFD dutifully keeps watch for ohmic events on any line for the entire duration.
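
The Table 1 arithmetic can be reproduced for any number of circuits with a short Python sketch. The function names and the "circuit-seconds" unit below are assumptions introduced for illustration; the sketch also assumes zero switching time between test points and that the all-lines tester runs for the same total time as the scan.

```python
def scanning_coverage(circuits: int, dwell_s: float) -> float:
    """Circuit-seconds observed by a tester that watches one circuit at a
    time, spending dwell_s seconds on each circuit in turn."""
    return circuits * dwell_s


def all_lines_coverage(circuits: int, dwell_s: float) -> float:
    """Circuit-seconds observed when every circuit is watched continuously
    for the full length of the scan."""
    run_time_s = circuits * dwell_s  # same total run time as the scan
    return circuits * run_time_s


# 9-pin cable from Table 1, then a 10,000 test point box, 1 s per point.
for circuits in (9, 10_000):
    print(circuits,
          scanning_coverage(circuits, 1.0),
          all_lines_coverage(circuits, 1.0))
```

For the 9-circuit case this yields the 9 seconds and 81 seconds shown in the Total Coverage row of Table 1.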

 

Result:

Using this simple-to-calculate formula (E = SC²) for test coverage or probability gain of the IFD technology, you can begin to see why the IFD works and other technologies simply don’t.

For example, consider a state-of-the-art scanning continuity tester that claims to test continuity at the rate of 3,500 test points a minute. The single-circuit intermittent discontinuity detection speed could then be computed to be approximately 17 ms (60 seconds / 3,500 ≈ 0.017 seconds).

If you were testing just one wire or circuit, then the IFD at 50 ns (nanoseconds) calculates to be 340,000 times more sensitive at catching intermittence on a single circuit.

S = 0.017 divided by 0.00000005 = 340,000, i.e., 340,000 times more likely to detect NFF intermittence on a single circuit.
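
The step from a quoted scan rate to S can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes the tester spends the whole minute measuring, with no switching overhead. It also shows that the text's 340,000 figure comes from rounding the dwell time to 17 ms; the unrounded value is slightly higher.

```python
IFD_GLITCH_NS = 50  # shortest glitch duration quoted for the IFD


def dwell_time_ms(points_per_minute: float) -> float:
    """Average time per test point, in milliseconds, for a scanning tester."""
    return 60_000 / points_per_minute


def speed_advantage(dwell_ms: float, ifd_glitch_ns: int = IFD_GLITCH_NS) -> float:
    """S: the scanner's per-point dwell time divided by the IFD glitch duration."""
    return dwell_ms * 1_000_000 / ifd_glitch_ns  # ms -> ns, then ratio


exact_ms = dwell_time_ms(3500)      # roughly 17.14 ms per test point
rounded_ms = round(exact_ms)        # 17 ms, the figure used in the text

print(speed_advantage(rounded_ms))       # 340000.0, matching the text
print(round(speed_advantage(exact_ms)))  # 342857 without the rounding
```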

Now, take a 100-circuit chassis or cable.

Using the formula E = SC²:

E = 340,000 x 100 x 100 = 3,400,000,000

In this example, the IFD is 3.4 billion times more sensitive than the scanning continuity tester for detecting intermittent / NFF at 50 ns on a 100-circuit chassis or cable.

Next, take a 1,000 test point coverage requirement, such as the Modular Low Power Radio Frequency (MLPRF) LRU chassis in the AN/APG-68 radar used on the F-16 Fighting Falcon:

Using E = SC²:

E = 340,000 x 1,000 x 1,000 = 340,000,000,000

In this example, the IFD is 340 billion times more sensitive than the scanning continuity tester for detecting intermittent / NFF at 50 ns on a 1,000-circuit chassis or cable.

Similarly, take a 3,000 test point coverage requirement, such as the Radar Receiver (RR) WRA chassis in the AN/APG-73 Radar used on the F/A-18 Hornet:

Using E = SC²:

E = 340,000 x 3,000 x 3,000 = 3,060,000,000,000

In this example, the IFD is 3.06 trillion times more sensitive than the scanning continuity tester for detecting intermittent / NFF at 50 ns on a 3,000-circuit chassis.
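
The three worked examples above follow directly from the formula, and can be checked with a minimal Python sketch. The function name effectiveness is an assumption made for this example; S = 340,000 is taken from the scanning-tester comparison in the text.

```python
def effectiveness(speed_advantage: int, circuits: int) -> int:
    """E = S x C squared, as defined in the text."""
    return speed_advantage * circuits ** 2


S = 340_000  # single-circuit speed advantage from the 3,500 points/minute example

# 100-circuit chassis, 1,000 test point MLPRF, 3,000 test point Radar Receiver.
for circuits in (100, 1_000, 3_000):
    print(f"{circuits:>5} circuits: E = {effectiveness(S, circuits):,}")
```

Using integers throughout avoids any floating-point rounding in the very large results.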

The demonstrated advantages in detection probability are why IFD technology is actively reducing the intermittent / NFF problem down to a 5 minute test in a typical avionics system as outlined above.  The simple to compute metrics also show conclusively why IFD technology works so well for resolving the intermittent / NFF problem.  This technology sees real intermittent circuit occurrences that conventional test equipment cannot see and was not designed to detect.  Given this “explosion” in test coverage, it becomes crystal clear why the IFD is the only applicable technology designed specifically for, and capable of, detecting, resolving, and gauging the overall problem and levels of intermittent/NFF.

The U.S. Air Force’s understanding of the intermittent / NFF problem, and its use of the IFDIS to address it, has allowed it to achieve unprecedented results in a very short time. The IFDIS is being utilized on the F-16 radar system, where 60% of the LRUs tested have one or more intermittent circuits that conventional test equipment missed. The IFDIS has more than tripled the MTBDR, reduced squadron re-work by 50% and returned to service over $42 million worth of critical flight hardware previously considered “unrepairable”. All in, the U.S. Air Force has achieved a 28 times return on investment (ROI) to date.

With a whopping 72% of all Navy and Marine Corps maintenance actions being avionics related, one must ask what portion of that is NFF and CND, and how much intermittence is being missed by conventional test equipment.

 

IFDIS is the Cure for Intermittent / NFF

The Intermittent Fault Detection and Isolation System (IFDIS) can best be described as a three-pillared approach to resolving intermittent / NFF.

These pillars consist of:

  1. The implementation and use of serialized data tracking to identify bad actors and repeat offender problems by aircraft and LRUs / WRAs.
  2. The application of light environmental stimuli to duplicate the operational environment and rapidly expose even the “lowest amplitude and shortest duration” intermittent circuits during test time.
  3. The use of precise intermittence testing technology. Intermittent Fault Detectors (IFDs) developed by Universal Synaptics Corp. are specifically designed to detect and isolate the underlying intermittent causes at levels of sensitivity and probability never before possible. They are paired with form-and-fit Interface Test Adaptation (ITA) to ensure that all of the potentially failing circuit interconnects in the suspect devices are tested simultaneously and continuously while closely simulating the aircraft’s operational environment.