By Dan Friedlander
Retired following 44 years in components engineering
For decades, military/space electrical, electronic, and electromechanical (EEE) parts have proven suitable for use in military and space applications. [NASA’s Office of Safety & Mission Assurance (OSMA) evaluates newly available and advanced electronic parts for programs and projects under its EEE Parts program.] The traditional MIL-SPEC [U.S. defense standard/military specification] methodology is based on risk avoidance through testing of the finished parts. Yet global developments, such as declining parts availability and budget constraints, have created the need for an alternative solution.
The alternative solution was formalized in 1994 by U.S. Secretary of Defense William Perry’s directive mandating the use of commercial off-the-shelf (COTS) parts in military applications, while exempting space applications. After decades of successful use of COTS in military applications, this change has proven viable.
Many challenging space application requirements can be met only by using COTS. The time has come (better sooner than later) for policymakers to reach a consensus on applying the COTS philosophy to space applications. This policy change is critical for the space industry.
This paper identifies issues that must be tackled in order to meet technical, quality assurance, and cost requirements using COTS. The term “COTS” in this paper refers to commercial EEE parts, including plastic encapsulated active parts.
EEE parts control concepts
U.S. Secretary of Defense William Perry’s 1994 directive officially started the transition from military electrical, electronic, and electromechanical (EEE) parts to commercial EEE parts (COTS) in military applications. Space applications were exempted from the directive’s requirements; however, the same drivers of this cultural change (such as parts availability) also apply to the space industry.
MIL-SPEC versus COTS
The following table compares the main principles of the two concepts.
|MIL-SPEC|COTS|
|---|---|
|Risk Avoidance|Risk Management|
|Parts Testing|Statistical Process Control (SPC)|
|Small Volume Production|High Volume Production|
For COTS, the main ideas are:
– Risk management: managing the risk is the project’s responsibility. The risk cannot be reduced to zero, even by the most stringent measures.
– Process control: the process builds reliability into the part.
– Production level: high volume results in reliability.
Quality versus reliability
Quality and reliability are two distinct concepts:
Quality is conformance to requirements at the start of use.
Reliability is the probability that a part will meet its specification over time, under worst-case operating conditions. Reliability is quality changing over time.
In other words, quality is a snapshot at the start of life, and reliability is a motion picture over the entire life.
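To make the "motion picture" idea concrete, the sketch below evaluates reliability as a survival probability over time. It assumes a constant failure rate (the exponential model, R(t) = exp(-λt)) and a hypothetical failure rate expressed in FIT (failures per 10^9 device-hours); neither the model nor the numbers come from this article.

```python
import math

FIT_HOURS = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def reliability(fit_rate: float, hours: float) -> float:
    """Probability that a part survives `hours` of operation.

    Assumes a constant failure rate (exponential model):
    R(t) = exp(-lambda * t), with lambda = fit_rate / 1e9 failures per hour.
    """
    failure_rate_per_hour = fit_rate / FIT_HOURS
    return math.exp(-failure_rate_per_hour * hours)


if __name__ == "__main__":
    fit = 50.0  # hypothetical part failure rate, in FIT
    for years in (0, 5, 10, 15):
        hours = years * 8760
        print(f"{years:>2} years: R = {reliability(fit, hours):.6f}")
```

At t = 0 the value is 1 (the "snapshot"); the printed sequence then shows how the same part's probability of conformance decays over the mission.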
Part testing versus process control
Given the above quality/reliability clarification, it is clear that reliability cannot be tested into the part. Qualification and screening are not substitutes for manufacturing control, but rather risk mitigation measures. Consequently, the way to address reliability is Statistical Process Control (SPC). The process (design followed by manufacturing) builds reliability into the part.
In high-volume commercial EEE parts production, SPC works well, resulting in very low failure rates at the start of use. That does not mean that part testing is worthless. MIL-SPEC focuses on part testing, and much less on process control.
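As an illustration of what SPC means in practice, the sketch below implements a standard Shewhart individuals (I-MR) chart: control limits are the baseline mean plus or minus three estimated sigmas, where sigma is estimated from the average moving range divided by d2 = 1.128. The monitored parameter and all values are hypothetical; real fabs use many charts and additional run rules.

```python
from statistics import mean

D2_N2 = 1.128  # bias-correction constant d2 for moving ranges of size 2


def individuals_limits(baseline):
    """Return (lcl, center, ucl) for a Shewhart individuals (I-MR) chart."""
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = mean(moving_ranges) / D2_N2  # short-term sigma estimate
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat


def out_of_control(samples, limits):
    """Indices of samples falling outside the 3-sigma control limits."""
    lcl, _, ucl = limits
    return [i for i, x in enumerate(samples) if x < lcl or x > ucl]


if __name__ == "__main__":
    # Hypothetical in-line measurements of one process parameter (arbitrary units)
    baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
    limits = individuals_limits(baseline)
    print("limits:", limits)
    print("flagged:", out_of_control([10.0, 10.3, 10.7, 9.9], limits))
```

High production volume matters here because the limits are only as trustworthy as the baseline data behind them, which is the author's point about SPC and low-volume lines.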
Statistical Process Control does not work for low-volume production, such as space/military EEE parts production. Testing compensates for the lack of statistically meaningful data. Reliability is still built into the part by the process, while quality is addressed by testing. That is the positive side.
Usually, the risk of damaging parts during testing is ignored by policymakers. Any extra part-level testing (beyond the manufacturer’s necessary in-line testing) of the actual units to be flown carries a higher risk than not testing them. Testing a selected sample is acceptable, provided the tested sample is not used for flight.
The manufacturer of military-qualified parts has little incentive to improve the process, because any change costs time and money to meet the qualification requirements. Qualification addresses quality. Process improvements addressing reliability may be skipped, and the specification is met anyway at delivery.
This is not the case with COTS, where the focus is on process control. High-volume production justifies the efficient use of statistics, and consequently reliability is addressed. The COTS manufacturer, freed from MIL-SPEC restrictions, can continuously monitor and continuously improve the process.
The term “high-reliability” or “hi-rel” is used exclusively for space/military parts. It is wrong to conclude that all COTS are, by definition, not hi-rel.
Decades of COTS use in military applications have proved that COTS are reliable enough for the specific missions. Military mission conditions are often more severe than those in space, radiation excepted, and military mission durations are often longer than those of space missions.
For decades, the leading document for reliability prediction has been MIL-HDBK-217. This handbook on the reliability prediction of electronic equipment systematically suppressed the use of non-military parts in military applications. Despite its wide use, the handbook was declared unreliable after the 1994 Perry directive.
On 15 Feb. 1996, a memorandum signed by Gilbert Decker, Assistant Secretary of the Army (Research, Development and Acquisition), stated: “In particular, MIL-HDBK-217, Reliability Prediction of Electronic Equipment, is not to appear in an RFP (request for proposal) as it [has] been shown to be unreliable and its use can lead to erroneous and misleading reliability prediction.”
It must be remembered that the reliability prediction model is based on the Arrhenius equation, a formula for the temperature dependence of reaction rates that considers only one stress (temperature). Various experts claim there is ample evidence that a straightforward application of the Arrhenius equation, with activation energies determined from high-temperature accelerated stress testing, is not the right way to deal with the matter.
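The single-stress nature of the model is easy to see in the standard Arrhenius acceleration factor, AF = exp[(Ea/k)(1/T_use - 1/T_stress)], sketched below. The activation energies and temperatures are illustrative assumptions only; the point is how strongly the result depends on the assumed Ea, while humidity, voltage, and mechanical stresses do not appear at all.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Acceleration factor between a stress temperature and a use temperature.

    AF = exp[(Ea / k) * (1 / T_use - 1 / T_stress)], temperatures in kelvin.
    Only temperature is modelled; every other stress is ignored.
    """
    t_use = t_use_c + 273.15
    t_stress = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use - 1.0 / t_stress))


if __name__ == "__main__":
    # Hypothetical: 125 degC stress test extrapolated to 55 degC use
    for ea in (0.5, 0.7, 0.9):
        print(f"Ea = {ea} eV -> AF = {arrhenius_af(ea, 55.0, 125.0):.1f}")
```

A modest change in the assumed activation energy changes the extrapolated acceleration factor several-fold, which is one reason a prediction built this way should not be treated as field reliability.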
The term “space qualified” must be fully understood to avoid misuse. By definition, it applies to space systems, subsystems, and components that meet the specifications relevant to their use.
For example, parts qualified to MIL-SPEC quality level “S” are often called “space qualified.” Quality level “S” is the highest quality level and should not be automatically interpreted as “S” = “space.” Unless the specific part specification includes a radiation hardness assurance (RHA) requirement for space systems, level “S” fully or partially ignores radiation in space.
Military versus space application
The requirements exclusive to space are radiation, outgassing, vacuum, microgravity, and atomic oxygen. Properly selected COTS (including nonhermetic ones) can meet these requirements.
COTS for space applications/issues to be revisited
To progress toward the large-scale introduction of COTS in space applications, well-rooted requirements should be revisited within the risk management philosophy, in order to save time and money.
The following are some suggested targets for revisiting:
Too often, technical specifications/scopes of work (SOW) specify an absolute reliability limit to be met, penalizing the use of COTS. The use of such reliability figures should be revisited so that they are applied within their limitations. MIL-HDBK-217F explicitly states its limitations: “a reliability prediction should never be assumed to represent the expected field reliability.”
Misinterpretation may lead to unjustifiable extra cost. In any case, as MIL-HDBK-217F admits, “those who view the prediction only as a number which must exceed a specified value can usually find the way to achieve their goal without any impact on the system.”
Market driven availability
After more than two decades of official use of COTS in military applications, it is obvious that the apocalyptic prophecies did not materialize. Military- and space-qualified EEE parts account for less than 0.5 percent (in dollars) of the global EEE parts market.
Nobody can stop a manufacturer from leaving a market it finds unprofitable. Keeping the traditional approach means that we learn from history that we do not learn from history. Future EEE parts availability must be secured for both military and space applications. Business decisions are much stronger than any EEE parts policy.
Hermeticity requirement versus plastic taboo
The importance of package hermeticity is indisputable. The problems encountered in the early days (from the 1960s) with nonhermetic plastic encapsulated semiconductors led military and space parts policymakers to make their use in military and space applications taboo. Of course, the “hermeticity” of plastic parts cannot compete with that of hermetic parts. The goal should be “enough,” not “the best.”
COTS post procurement testing
The much-needed introduction of COTS into space applications is overshadowed by the high non-recurring engineering (NRE) cost involved. This high NRE is driven by post-procurement testing requirements derived from the well-rooted traditional approach of part testing.
It is the user’s responsibility to find a way to assess the total ionizing dose (TID) withstanding capability of any part and to use the part accordingly. The following issues should be revisited:
Radiation Design Margin (RDM)
The RDM requirement provides a systematic approach to managing the mission risk posed by uncertainties in both the radiation model and the hardware’s susceptibility to radiation. Traditionally, RDM accounts for historical lot-to-lot variation, which may be smaller today thanks to better manufacturing process control (SPC) and process improvements. Revisiting it may lead to a reduced applied RDM, so that more COTS can withstand the radiation requirements.
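A minimal sketch of the RDM check follows, assuming the common definition RDM = part TID capability / predicted mission TID at the part location. The required margins and dose values are hypothetical and are not taken from any particular program standard.

```python
def radiation_design_margin(part_tid_krad: float, mission_tid_krad: float) -> float:
    """RDM = part TID capability / predicted mission TID at the part location."""
    if mission_tid_krad <= 0:
        raise ValueError("mission TID must be positive")
    return part_tid_krad / mission_tid_krad


def meets_rdm(part_tid_krad: float, mission_tid_krad: float, required_rdm: float) -> bool:
    """True if the part's margin meets the (hypothetical) required RDM."""
    return radiation_design_margin(part_tid_krad, mission_tid_krad) >= required_rdm


if __name__ == "__main__":
    part_tid = 25.0     # hypothetical part capability, krad(Si)
    mission_tid = 10.0  # hypothetical mission dose behind shielding, krad(Si)
    for required in (3.0, 2.0):
        print(f"required RDM {required}: pass = {meets_rdm(part_tid, mission_tid, required)}")
```

With these illustrative numbers the same part fails a margin of 3 but passes a margin of 2. The same arithmetic shows why better shielding helps: lowering the mission dose seen by the part raises its margin without changing the part.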
Lot-by-Lot TID testing requirement
In view of improved process control (SPC) and the validity of high-volume statistics, the issue of lot-to-lot and within-lot variation is worth revisiting. From the point of view of radiation testing, military-level qualified parts and COTS share the same problem: they are not wafer traceable. The space industry turns a blind eye to the lack of wafer traceability for traditionally acceptable military-level parts, but penalizes COTS in space for the same traceability issue!
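One way to revisit the lot-to-lot question with data is a one-way analysis of variance (ANOVA) on TID failure levels grouped by lot, sketched below. The lot data are hypothetical, and the F ratio would still need to be compared with a critical value of the F distribution with (k - 1, n - k) degrees of freedom, which this sketch does not compute.

```python
from statistics import mean


def lot_variation(lots):
    """One-way ANOVA on TID failure levels grouped by lot.

    Returns (ms_between, ms_within, f_ratio). A large F suggests that
    lot-to-lot variation dominates within-lot variation.
    """
    all_values = [x for lot in lots for x in lot]
    grand = mean(all_values)
    k, n = len(lots), len(all_values)
    ss_between = sum(len(lot) * (mean(lot) - grand) ** 2 for lot in lots)
    ss_within = sum((x - mean(lot)) ** 2 for lot in lots for x in lot)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between, ms_within, ms_between / ms_within


if __name__ == "__main__":
    # Hypothetical TID failure levels, krad(Si), four samples from each of three lots
    lots = [[52, 55, 50, 53], [51, 54, 52, 53], [49, 53, 51, 52]]
    print(lot_variation(lots))
```

For this illustrative data set F is below 1, meaning the spread between lots is no larger than the spread inside a lot; a well-controlled high-volume process would be expected to behave this way.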
Part TID tolerance level
The part TID withstanding capability level is established by radiation testing and is defined as the TID level at which the part goes out of its specification limits for the first time. Those with a meaningful database apply another practice to relax this requirement. It is not good engineering practice, because it relies on out-of-spec part operation toward the end-of-life phase of the mission.
The resulting higher level is called the Design Part TID Withstanding Capability level. The design Worst Case Analysis (WCA) must validate this practice. Revisiting this widely used practice may lead to new ideas for easing the introduction of relatively TID-weak COTS into space applications. On the other hand, it may lead to the conclusion that the risk of operating outside the specification is too high.
EEE parts shielding
Using a more effective shielding material lowers the predicted mission TID seen by a part. Consequently, the required part TID tolerance decreases, and more parts become suitable for the space application (from the TID point of view). There are materials better than aluminum (such as tantalum), and there are more effective shielding techniques (such as multilayer shields). Active shields (such as magnetic ones) are in the research stage.
Radiation-Induced Latch-Up (SEL)
SEL, one of the single event effects (SEE), can be destructive. The mitigation consists of a latch-up protection circuit that quickly disconnects the damaging current. To implement such mitigation, the latch-up current must be known from SEL testing.
SEL testing is performed at heavy-ion accelerators and involves high cost and long lead times. To reduce cost and time, the following issues should be revisited:
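The protection circuit itself is hardware, but its decision logic can be sketched in a few lines. In this sketch the trip threshold is an assumed value placed between the normal operating current and the latch-up current measured in SEL testing, and the "two consecutive samples" filter against short transients is a hypothetical design choice, not a requirement from the article.

```python
def latchup_monitor(current_samples_ma, trip_ma, consecutive=2):
    """Return the sample index at which power should be removed, or None.

    Power is cut once `consecutive` successive samples exceed `trip_ma`;
    a single isolated spike resets the counter and does not trip.
    """
    run = 0
    for i, current in enumerate(current_samples_ma):
        run = run + 1 if current > trip_ma else 0
        if run >= consecutive:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical: normal supply current <= 40 mA, latch-up current ~180 mA
    trip = 100.0
    samples = [35, 38, 120, 36, 37, 185, 190, 188]
    print("trip at sample:", latchup_monitor(samples, trip))
```

The isolated 120 mA spike is ignored, while the sustained latch-up current trips the monitor on its second sample; knowing the latch-up current from testing is what allows the threshold to be placed with confidence.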
SEL test method
Comparing the traditional method with alternatives may lead to cost reduction and time savings. One alternative SEL test method worth considering is the less expensive pulsed-laser SEL (and other SEE) testing, which has proved efficient. Incidentally, the pulsed laser source may play an important role in bit mapping, helping to mitigate multiple bit upsets (MBU). Another known SEL test method uses californium-252, if applied within its limitations.
SEL rate prediction
One practice, worth revisiting for validity and wider use, is based on a comparison with the part’s reliability figure. It states that if the SEL probability is less than one tenth of the relevant reliability figure, the part may be used for flight as is.
To support the complicated and difficult task of COTS SEL assessment, basic data on the relevant part technology and process are strongly needed: technology, process, foundry, and die revision. Alliances in the space industry can contribute substantially to avoiding duplicate testing.
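One possible reading of this rule of thumb is sketched below: both the SEL rate and the part failure rate are converted into a probability of at least one event over the mission (assuming Poisson events), and the SEL probability must be below 10 percent of the failure probability. Treating the "reliability figure" as a FIT-based failure probability is an interpretation, and all rates are hypothetical.

```python
import math


def mission_probability(rate_per_day: float, mission_days: float) -> float:
    """Probability of at least one event over the mission (Poisson events)."""
    return 1.0 - math.exp(-rate_per_day * mission_days)


def sel_acceptable(sel_rate_per_day, part_fit, mission_days, ratio=0.1):
    """Apply the 'one tenth of the reliability figure' rule of thumb.

    Assumption: the reliability figure is the part's failure probability over
    the mission, derived from its FIT rate (failures per 1e9 device-hours).
    """
    p_sel = mission_probability(sel_rate_per_day, mission_days)
    p_fail = mission_probability(part_fit / 1e9 * 24.0, mission_days)
    return p_sel < ratio * p_fail


if __name__ == "__main__":
    mission = 5 * 365  # hypothetical 5-year mission, days
    for sel_rate in (1e-7, 1e-6):  # hypothetical predicted SEL rates, events/day
        print(sel_rate, sel_acceptable(sel_rate, part_fit=50.0, mission_days=mission))
```

Under these assumptions the lower SEL rate passes and the higher one does not, so only the second case would call for a latch-up protection circuit or further testing.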
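The four data items listed above naturally form the lookup key of a shared radiation database. The sketch below is a hypothetical in-memory structure, not an existing industry tool: a partner first checks whether results for the same technology, process, foundry, and die revision already exist before scheduling a new test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DieKey:
    """Identifies a die design closely enough to reuse radiation results."""
    technology: str
    process: str
    foundry: str
    die_revision: str


class SharedRadiationDb:
    """Hypothetical shared store of radiation test results, keyed by die."""

    def __init__(self):
        self._results = {}

    def add(self, key: DieKey, result: dict) -> None:
        self._results.setdefault(key, []).append(result)

    def needs_test(self, key: DieKey) -> bool:
        return key not in self._results

    def lookup(self, key: DieKey) -> list:
        return list(self._results.get(key, []))


if __name__ == "__main__":
    db = SharedRadiationDb()
    key = DieKey("bulk CMOS", "0.18 um", "Foundry A", "rev C")  # hypothetical
    db.add(key, {"test": "SEL", "method": "heavy ion", "latched": False})
    print(db.needs_test(key))                                   # reuse existing data
    print(db.needs_test(DieKey("bulk CMOS", "0.18 um", "Foundry A", "rev D")))
```

Keying on the die revision, not just the part number, reflects the point that a silent die change can invalidate earlier radiation data.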
Destructive Physical Analysis (DPA)
DPA, which focuses on process-related issues, is not a removable post-procurement activity. That said, the depth of DPA testing should still be revisited for optimization.
“Upscreening” means raising the confidence level for using the part in the given space application. Upscreening is traditionally performed on 100 percent of the parts, but it can also reach its goal through sample testing. The rationale behind testing a sample (not to be flown) is the conviction that any handling of flight parts may damage them.
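How large a sample is needed can be estimated with the standard zero-failure (success-run) relation C = 1 - R^n, i.e. n = ln(1 - C) / ln(R). This is a general statistical tool offered as an illustration, not a sampling plan taken from the article or from any standard; the reliability and confidence targets are example values.

```python
import math


def zero_failure_sample_size(reliability: float, confidence: float) -> int:
    """Smallest n such that n passes with 0 failures demonstrate R at confidence C.

    Success-run relation: C = 1 - R**n  =>  n = ln(1 - C) / ln(R), rounded up.
    """
    if not (0 < reliability < 1 and 0 < confidence < 1):
        raise ValueError("reliability and confidence must be in (0, 1)")
    return math.ceil(math.log(1 - confidence) / math.log(reliability))


if __name__ == "__main__":
    for r, c in ((0.90, 0.90), (0.95, 0.90), (0.99, 0.90)):
        print(f"R={r}, C={c}: n = {zero_failure_sample_size(r, c)}")
```

The tested units are consumed as evidence and are not flown, which keeps the flight lot free of the extra handling risk described above.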
It is worth paying attention to the NASA warning in the official document PEM-INST-001: “There are numerous data indicating that improper handling and testing of the parts can introduce more defects than are screened out.”
The traditional post-procurement 100 percent upscreening requirement at part level needs to be revisited. MIL-STD-883, Method 1015, Burn-in Screen, states that “burn-in is performed for the purpose of eliminating marginal devices, those with inherent defects or defects resulting from manufacturing aberrations which are evidenced as time and stress dependent failures. In the absence of burn-in, these defective devices would be expected to result in infant mortality or early lifetime failures under use conditions.”
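The "infant mortality" that burn-in targets is commonly modelled with a Weibull hazard rate h(t) = (β/η)(t/η)^(β-1): a shape parameter β < 1 gives a failure rate that falls with time (early-life defects), while β = 1 gives a constant rate. The sketch below uses hypothetical parameters to show that behaviour; it is an illustration of the concept, not a method from MIL-STD-883.

```python
def weibull_hazard(t_hours: float, beta: float, eta_hours: float) -> float:
    """Instantaneous failure rate h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    if t_hours <= 0:
        raise ValueError("t must be positive")
    return (beta / eta_hours) * (t_hours / eta_hours) ** (beta - 1)


if __name__ == "__main__":
    eta = 1e6  # hypothetical characteristic life, hours
    for beta in (0.5, 1.0):
        rates = [weibull_hazard(t, beta, eta) for t in (10, 100, 1000)]
        print(f"beta={beta}:", ["%.2e" % r for r in rates])
```

When the manufacturing process yields few such defects, the decreasing-hazard population is small, and the value that part-level burn-in adds shrinks accordingly.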
As seen above, everything starts and ends with the manufacturing process of the die. Manufacturing defects may result in failures. Present-day COTS are manufactured in a rigorous, statistically process-controlled, high-volume production regime, which results in substantially better outgoing part quality. Reliability cannot be addressed by upscreening.
Extensive design, production, and operational use of COTS (as procured) in military applications in harsh environments has been successful. As technologies advance and functional integration at part level increases, electrical post-procurement testing becomes more and more difficult, inefficient, and costly.
The successful use of COTS parts in military applications (often with very long operational lives) supports the viability of using COTS in space applications. NASA PEM-INST-001 states: “For all PEMs, qualification by flight history or similarity is not acceptable.” It is unclear why NASA still follows the traditional way of thinking, imposing such a general restriction on all PEMs. Qualification and reliability monitoring are performed routinely by best-in-class parts manufacturers.
Usually, most players in the space industry do not have all the skills and infrastructure to manage EEE parts procurement alone, and prefer to outsource these activities to specialized companies called central parts procurement agencies (CPPAs). The procurement methodology for COTS can follow the one practiced for space parts. Centralizing the relevant activities is even more beneficial for COTS (reduction of duplicate NRE).
The 1994 Perry directive reversed the EEE parts selection priority for military applications: first priority is given to COTS. The focus shifted to process control as the foundation of part reliability. The 1994 move to COTS in military applications has proven successful.
For space applications, there is a hesitant move to permit the use of COTS while keeping the traditional EEE parts selection priority: first priority to space parts. This methodology is a “no choice” permit to use COTS in space, within the 100 percent testing regime (moved to the post-procurement phase) of the traditional (pre-Perry directive) philosophy.
The lifetime of the minuscule military/space parts market is finite and unpredictable.
The viability of the present methodology depends on EEE parts availability, which is driven by business decisions. Consequently, this methodology is not a secure solution.
A revised, pragmatic, space-tailored COTS methodology has to be established and officially recognized. The methodology has to be based on the concept of process control rather than on 100 percent testing of finished parts.
There is a strong technical need for use of complex parts built using advanced technologies, available only as COTS.
There is a strong need for cost reduction and schedule shortening.
Informed decisions to eliminate non-value-added activities are a must for cost reduction. Under the present methodology, the COTS part ownership cost is often higher than the procurement cost of space parts.
There is no unsolvable technical issue to block the use of selected COTS in space applications.
Suggested action items
Absolute reliability figures shall not be specified as requirements to be met. Reliability prediction models shall be understood and not misused.
The EEE parts availability assurance shall be considered a high-priority, critical parameter in their selection process. A new approach is needed to better secure the parts availability.
COTS post-procurement testing shall be optimized, eliminating activities that do not add value.
The present methodology has to be revisited, optimized, and adapted to reality.
Shared parts radiation databases shall be established to avoid duplicate testing.
Focus on reducing part types and consolidating orders.
The time has come (better sooner than later) for policymakers to reach a consensus on applying a realistic COTS philosophy to space applications. There is no unsolvable technical issue to block the use of selected COTS in space applications. The resistance to change is the main obstacle to be overcome.
“There is one thing stronger than all the armies in the world, and that is an idea whose time has come.” — Victor Hugo
There is no elevator to success. You have to take the stairs.
The author graduated from the Engineering School at Tel Aviv University (physics, 1965-1969). He spent 44 years in components engineering at MBT/Israel Aerospace Industries (1969-2013), as Head of Components Engineering, where he was responsible for all aspects of EEE components (policymaking, corporate-level standardization, approval, etc.) for military and space applications. Retired/consultancy: 2013 to present. For further details of his experience, see https://www.linkedin.com/in/dan-friedlander-63620092?trk=nav_responsive_tab_profile