
Failure Analysis by Case-Based Reasoning
P.R. Roberge
Department of Chemistry and Chemical Engineering
Royal Military College
Kingston, Ontario, Canada, K7K 5L0
M.A.A. Tullmin
Materials and Metallurgy Engineering Department
Queen's University, Kingston, ON Canada, K7L 3N6
K.R. Trethewey
Department of Engineering Materials
University of Southampton
Southampton UK, SO17 1BJ
email: roberge-p@rmc.ca
ABSTRACT
Computer reasoning by analogy, a technique known as Case-Based Reasoning (CBR), has met with tangible success in such diverse human decision-making applications as banking, autoclave loading, tactical decision-making, and foreign trade negotiations. Failure analysts and corrosion engineers also reason by analogy when faced with new situations or problems. Many non-expert failure analyses reach the wrong conclusions simply because they fail to define the system adequately or consider a case in isolation from a history containing other related or similar cases. Consulting how similar cases were solved in the past is often the first lead a failure analyst will seek before refining the analysis process. Such a step can help to prevent many of the dead-end searches that can greatly complicate the life of a failure analyst dealing with a corrosion problem. The CBR approach is particularly valuable in cases containing ill-structured problems, uncertainty, ambiguity, and missing data. The large number of abstracts found by searching the modern engineering literature on the subject of failure and corrosion can provide a statistically solid foundation of specialized documents for further classification. This paper describes a method of indexing case histories for use in a case-based reasoning system in support of failure analysis. The keywords associated with each abstract stored in a commercial engineering literature abstracting system were fitted to a framework of materials degradation factors in order to generate indexing guidelines for failure analysis information. This approach has the merit of minimizing the risk of context sensitivity surrounding the description of degradation processes that can affect materials and systems.
An interesting artificial intelligence (AI) paradigm that crosses the line between scientific information and human cognition is the technology described as case-based reasoning (CBR). CBR is, in fact, based more on a psychological theory of human cognition than on any specific computer methodology [1]. The main power of CBR technology resides in the potential to develop a tool that learns as cases are resolved. Provided the original indexing system is rugged and stable, the ideal input of a CBR system is the case-solving process itself, an activity intrinsically very close to the activities of human experts.
A successful failure analysis organization has to provide the fastest possible turnaround time on incoming contract work. From time to time, in urgent failure situations, a client will expect a report on a failure investigation to be issued on the very same day that the damaged or failed component was delivered to the laboratory. Failure analysis consultants and their support staff commonly experience high stress levels because of time pressures, the handling of several ongoing failure analyses at the same time, and the major implications (financial and other) that often arise from the conclusions of a report, especially in litigation or dispute-related failure analysis work.
Even with a solid client base, it is relatively difficult to provide a failure analysis service on a profitable basis. The costs of employing the expert and support staff have to be covered, together with capital invested in sophisticated laboratory equipment (such as scanning electron microscopes), running costs and overheads. Any professional and equipment usage time saved during an investigation directly boosts profitability. It should also be borne in mind that a quote for a failure investigation is usually supplied (often conservatively) prior to the actual analysis, so any overruns on time and effort spent on the failure represent an immediate financial loss.
Considering the above scenario, a CBR system should be of considerable utility to a failure analysis operation. The "instant" availability of previous related cases on a computer should significantly reduce the time and analytical procedures invested in completing an investigation. With a CBR system, the critical elements of the failure should be readily apparent from previous cases (Figure 1). In fact, previous cases are often invoked in a conventional failure investigation, but a paper-based report filing system is laborious and time-consuming for such purposes. Hundreds of reports tend to be generated each year by a team of investigators, and a database of thousands of failure analysis reports may be stored in the filing cabinets of an organization with no convenient mechanism available for re-use of this valuable information. This represents the waste of a very important organizational asset.

The value of a computerized CBR system for knowledge re-use should also be apparent in the context of staff turnover, retirements or retrenchments of experienced senior personnel, the promotion of senior "technical" personnel to managerial positions, staff on leave and the considerable time spent by experts on site, away from the central laboratory facility. During site work, the knowledge of these experts is not available in the laboratory environment, but a CBR system could stand in for it. A common difficulty for less experienced failure analysis staff is identifying the actual cause(s) of a failure, rather than merely the mechanism of failure, and making recommendations for avoiding future failures. Such recommendations are often the ultimate client need and the most important component of an investigation. The proficiency and confidence levels of staff in meeting this challenge should improve considerably with the aid of "advice" gained from previous cases presented to them by the CBR system.
When performing a diagnosis, a properly designed CBR could support the human expert who, employed by a client with a corrosion failure, conducts an interview to determine the nature of the failure. During the interview - essentially a question-and-answer session - the human expert would naturally seek precise details of materials, environments, operating conditions etc., but usually widens these inquiries to include what would be the most standard considerations for the operation of a complex system:
The failure analysis process, be it related to a corrosion situation or any other problem, has to be well established before an investigation can proceed smoothly. A well-organized system for collecting, analyzing, disseminating, and following up reliability data is a fundamental requirement of any reliability work. On the one hand, too little or inadequate information is useless if not simply misleading; on the other, too much information, if not properly channeled or filtered, can lead to information overload and misinformation due to apparent contradictions. The cycle of data collection and analysis, represented in Fig. 1, is normally a closed loop in which the data collection both precedes and follows the design of the data collection strategy.
The use of fault and fault mode templates to rationalize an investigation is not new. In fact, there is a vast body of literature on reasoning by analogy but CBR and case-based planning seem to be the leading technologies [2,3]. A critical issue for the successful development of such systems is the creation of a solid indexing system since the success of a diagnosis depends heavily on the selection of the best stored case. Any misdirection can lead a query down a path of secondary symptoms and factors. It is therefore very important to establish an indexing system that will effectively indicate or counter-indicate the applicability of a stored case. Three issues are particularly important in deciding on the indices [4]:
Case indexing processes usually fall into one of three kinds: nearest neighbor, inductive, and knowledge-guided or a combination of these [5]. The following lists briefly describe the main points of each of these techniques:
Indexing by nearest neighbor
Inductive indexing
Knowledge-based indexing
The construction of a Knowledge Based System (KBS) to mimic the human expert's knowledge meets a number of obstacles. How does the expert decide which questions to ask when working on a case, and in what order? Answers to any given question frequently determine the one which follows. How do you cope with answers which contain errors or imprecise information? How do you ensure that the database of case histories upon which the corrosion knowledge would be based is sufficiently representative, i.e. that the 'experience' of the expert is sufficiently broad to encompass the failure in question? Given the broad range of parameters involved in most corrosion situations and the frequent absence of hard data, the indexing approach selected for this project combined a framework constructed on knowledge-based principles with mechanisms triggered by inductive indexing. This combination was adopted to reduce the complexity of ordinary corrosion problems while allowing the system to operate with a limited number of cases.
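For comparison with the combined approach chosen here, the nearest-neighbor technique listed above can be illustrated with a minimal sketch. It assumes that each stored case is described by a few categorical features and that each feature carries a fixed weight; the feature names, the weights, and the two stored cases are invented for illustration and are not taken from the project's case base.

```python
from dataclasses import dataclass


@dataclass
class Case:
    name: str
    features: dict  # feature name -> categorical value


# Hypothetical feature weights (assumption, not from the paper).
WEIGHTS = {"material": 3.0, "environment": 2.0, "stress_source": 1.0}


def similarity(query: dict, case: Case, weights: dict = WEIGHTS) -> float:
    """Weighted fraction of the query's features matched exactly by a stored case.

    Features missing from the query are ignored, so a partially
    described failure can still be compared with the case base.
    """
    total = sum(weights[f] for f in query if f in weights)
    if total == 0:
        return 0.0
    matched = sum(
        weights[f]
        for f, value in query.items()
        if f in weights and case.features.get(f) == value
    )
    return matched / total


def retrieve(query: dict, case_base: list, k: int = 1) -> list:
    """Return the k stored cases most similar to the query."""
    ranked = sorted(case_base, key=lambda c: similarity(query, c), reverse=True)
    return ranked[:k]


# Invented stored cases, for illustration only.
CASE_BASE = [
    Case("case_A", {"material": "brass", "environment": "ammonia",
                    "stress_source": "residual"}),
    Case("case_B", {"material": "stainless steel", "environment": "chloride",
                    "stress_source": "applied"}),
]

query = {"material": "stainless steel", "environment": "chloride"}
best = retrieve(query, CASE_BASE)[0]
print(best.name, similarity(query, best))
```

Exact matching on categorical values is the simplest possible local similarity; a working system would probably need graded similarities for numeric parameters such as temperature.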
Corrosion failure analysis has frequently been a bottom-up approach, starting with an investigation of primary mechanisms. This highly case-specific and scientific approach requires that a multitude of questions be asked through different tasks such as:
Even with great effort, this approach does not necessarily lead to the real cause of a corrosion failure and can be costly because it is both labor- and time-intensive. Conversely, the systems approach, in which an 'expert' applies experience and corrosion knowledge to a situation, can frequently provide a solution to a problem without recourse to an intensive investigation. In fact, there are many corrosion failures which are not amenable to solution by study of primary mechanisms and which are best tackled by an evaluation of secondary mechanisms by an 'expert'. The main problem faced by KBS developers is to structure computerized information in a fashion that will respect the functions carried out by human experts. Surprisingly, despite the vast amount of study of primary mechanisms, there have been few attempts to structure corrosion knowledge in logical formats amenable to algorithmic interpretation. With the explosion of effort in computing science and knowledge representation in general has come the concept of object-oriented models of knowledge. Such an approach was proposed in support of failure diagnosis in another paper [6].
Context sensitivity is a fundamental characteristic of failure analysis. For one, the criticality of a specific failure is very much system dependent. Pitting corrosion, for example, which is almost a common denominator of all types of localized corrosion attack, may assume different shapes. Pitting corrosion can produce pits with their mouth open (uncovered) or covered with a semi-permeable membrane of corrosion products. Pits can be either hemispherical or cup-shaped. In some cases they are flat-walled, revealing the crystal structure of the metal, or they may have a completely irregular shape [7]. But a definition to satisfy the top-down approach would describe a pit as the concentration of corrosion in a small area of the total metal surface, in a manner that can produce failures even though only a relatively small amount of metal has been lost and the overall corrosion rate is relatively low [8]. In some instances, pits can be very damaging even at the millimeter scale, while in other cases an inspector would only become concerned upon the appearance of pits at the centimeter scale.
The complexity that becomes apparent at this level of description can be reduced systematically by codifying many of the ideas which have been part of historical corrosion 'common sense.' According to Staehle, only four modes are required to describe the specific morphology of any corrosion situation, i.e. general corrosion, pitting, intergranular corrosion, and parting or stress corrosion cracking [9]. Each mode can thus be visualized in relation to other modes or in relation to its own sub-modes or controlling parameters. Another point of view proposed by Staehle is that all engineering materials are reactive chemicals and that the strength of materials depends totally upon the extent to which environments influence the reactivity and subsequent degradation of engineering materials. Thus, in order to define the strength of an engineering material, it is essential to define the nature of the environments affecting the material over time.
In order to assess the evolution of science in the field of lifetime prediction of environmental cracking (EC), Staehle elaborated a systematic framework of the factors controlling the incidence of EC. This framework has proven to be an excellent model for the organization of knowledge and information concerning corrosion failures in general [10]. Table 1 describes the six main factors controlling an EC situation according to Staehle. Once a model is firmly established, it can be used to organize the available information into a KBS compatible system. But in order to avoid context sensitivity and provide an indexing system for reasoning with real information it is also important to test and document the framework with cases and other descriptive material. This was achieved by searching a database of modern engineering literature with general keywords to capture all the publications related to the present theme, i.e. corrosion and failure.
| Factor | Sub-factors and contributing elements |
|---|---|
| Material | Chemical composition of alloy |
| | Structure |
| | Grain boundary (GB) composition |
| | Surface condition |
| Environment | Chemical definition: type, chemistry, concentration, phase, conductivity |
| | Circumstance: velocity, thin layer in equilibrium with relative humidity, wetting and drying, heat transfer boiling, wear and fretting, deposits |
| Stress | Stress definition: mean stress, maximum stress, minimum stress, constant load/constant strain, strain rate, plane stress/plane strain, modes I, II, III, biaxial, cyclic frequency, wave shape |
| | Sources of stress: intentional, residual, produced by reacted products, thermal cycling |
| Geometry | Discontinuities as stress intensifiers |
| | Creation of galvanic potentials |
| | Chemical crevices |
| | Gravitational settling of solids |
| | Restricted geometry with heat transfer leading to concentration effects |
| | Orientation vs. environment |
| Temperature | At metal surface exposed to environment |
| | Change with time |
| Time | Change in GB chemistry |
| | Change in structure |
| | Change in surface deposits, chemistry, or heat transfer resistance |
| | Development of surface defects, pitting, or erosion |
| | Development of occluded geometry |
| | Relaxation of stress |
The CompendexPlus system was searched and the results archived in a database for further analysis. The keywords used for this search were failur(?) AND corro(?), where (?) indicates truncation. The resulting number of abstracts and the corresponding number of keywords used to characterize each abstract are presented in Table 2. The ratio of keywords per abstract indicates the consistency of the abstracting scheme used during that period, and the number of unique keywords for each period reveals the level of compression achievable for each year of the study. Further compression was possible by combining all these unique keywords into a single database. The total number of unique keywords for the 1990-1995 period was found to be 1725, i.e. 81% compression of the total number of keywords for the same period. Validation of the framework presented earlier was attempted by classifying these unique keywords into the six factors controlling any corrosion situation. Classification consisted of attributing up to three factors to each unique descriptor. Table 3 lists the occurrence of each main factor of the framework of corrosion failure during the classification of the 1725 unique keywords that described the database of 1156 hits from five years of literature abstracts.
(a) Only one quarter of that year was available at the time of the study.
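The search and compression steps described above can be sketched as follows. This is only an illustration under stated assumptions: the (?) truncation is modeled as "the stem followed by any word characters", compression is taken to mean one minus the ratio of unique to total keywords, and the three records are invented placeholders rather than entries from the abstracting system.

```python
import re


def truncation_pattern(stem: str) -> re.Pattern:
    """Compile a stem such as 'failur' into a word-prefix pattern.

    Assumption: the (?) truncation accepts any word ending after the stem.
    """
    return re.compile(r"\b" + re.escape(stem) + r"\w*", re.IGNORECASE)


def matches_query(text: str, stems=("failur", "corro")) -> bool:
    """True when every stem matches somewhere in the text (logical AND)."""
    return all(truncation_pattern(stem).search(text) for stem in stems)


def compression(keyword_lists):
    """Return (total, unique, compression) for an iterable of keyword lists.

    Assumption: compression = 1 - unique / total.
    """
    all_keywords = [kw.strip().lower() for kws in keyword_lists for kw in kws]
    total = len(all_keywords)
    unique = len(set(all_keywords))
    ratio = 1 - unique / total if total else 0.0
    return total, unique, ratio


# Invented records, for illustration only.
records = [
    {"abstract": "Failure of a brass condenser tube by corrosion.",
     "keywords": ["Stress corrosion cracking", "Brass", "Residual stresses"]},
    {"abstract": "Corroded fasteners and their failures in service.",
     "keywords": ["Stress corrosion cracking", "Fasteners"]},
    {"abstract": "Fatigue failure of a shaft.",
     "keywords": ["Fatigue"]},
]

hits = [r for r in records if matches_query(r["abstract"])]
total, unique, ratio = compression(r["keywords"] for r in hits)
print(len(hits), total, unique, round(ratio, 2))
```

Keywords are lower-cased before counting so that differences in capitalization do not inflate the number of unique descriptors.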
During the factor assignment, many keywords (56%) were not attributed any of the six factors controlling the probability of a corrosion failure, while 33%, 9%, and 2% were attributed one, two, and three factors respectively. It can be seen in Table 3 that the most commonly recognized factors were Material and Environment. On the other hand, the time factor, which really describes the history of a system, is probably the most mysterious factor of the series. Only two occurrences were found for this factor, and these were not associated with any other factor. In comparison, the environment factor was associated with other factors in 36% of its occurrences, the geometry factor in 56%, the material factor in 29%, the temperature factor in 86%, and the stress factor in 30%.
| | Environment | Geometry | Material | Temperature | Time | Stress |
|---|---|---|---|---|---|---|
| Environment | 254 | 22 | 38 | 44 | 0 | 10 |
| Geometry | 22 | 66 | 17 | 2 | 0 | 3 |
| Material | 38 | 17 | 400 | 64 | 0 | 21 |
| Temperature | 44 | 2 | 64 | 111 | 0 | 3 |
| Time | 0 | 0 | 0 | 0 | 2 | 0 |
| Stress | 10 | 3 | 21 | 3 | 0 | 117 |
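A factor-by-factor table of this kind can be assembled from the keyword classification with a few lines of code. The sketch below is a hedged illustration: it assumes that a diagonal cell counts every keyword assigned a given factor and that an off-diagonal cell counts the keywords assigned both factors, which the text does not state explicitly. The four keyword assignments are invented examples.

```python
from collections import Counter
from itertools import combinations

FACTORS = ["Environment", "Geometry", "Material", "Temperature", "Time", "Stress"]


def cooccurrence(assignments: dict) -> list:
    """Build a symmetric factor-by-factor count matrix.

    assignments maps each unique keyword to a list of zero to three factors.
    """
    counts = Counter()
    for keyword, factors in assignments.items():
        unique = sorted(set(factors), key=FACTORS.index)
        if len(unique) > 3:
            raise ValueError(f"more than three factors for {keyword!r}")
        for factor in unique:
            counts[(factor, factor)] += 1
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return [[counts[(row, col)] for col in FACTORS] for row in FACTORS]


def factor_count_distribution(assignments: dict) -> Counter:
    """Count keywords by the number of factors attributed to them."""
    return Counter(len(set(factors)) for factors in assignments.values())


# Invented assignments, for illustration only.
assignments = {
    "stress corrosion cracking": ["Stress", "Environment"],
    "high temperature corrosion": ["Temperature", "Environment"],
    "stainless steel": ["Material"],
    "welding": [],
}

matrix = cooccurrence(assignments)
distribution = factor_count_distribution(assignments)
for name, row in zip(FACTORS, matrix):
    print(f"{name:12s}", row)
```

`factor_count_distribution` gives the split between keywords with zero, one, two, or three factors, the same kind of breakdown quoted in the text for the 1725 unique keywords.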
A further logical and necessary development in this CBR initiative is the indexing of real failure cases. An initial assessment of this task has been made by considering nine cases of stress corrosion cracking published in the ASM Handbook of Failure Analysis [11]. These nine cases were indexed by a failure analysis "expert" in terms of the stress factor (Table 1). The descriptive elements related to the stress factor in these examples are presented in Table 4, each marked according to how it matches the keywords that had been associated with the stress factor in the database containing the results of the search on failur(?) AND corro(?) of the CompendexPlus abstracting system.
Out of a total of 45 expressions describing the influence or nature of the stresses involved in the examples, 24 (or 53%), identified with *, had an exact match in the list of keywords extracted from the database, and 16 (or 36%), identified with < >, contained a variant of the word stress. Another 3 (or 7%), identified with |, contained one keyword from the list. The last 2 expressions (or 4%), "high torque" and "retubing", were in the global list of keywords but had inadvertently not been associated with the stress factor. While such a high success rate is very promising, some complications were revealed during that exercise, such as the requirement of dealing with "negative" index items. For instance, in Example 8 (Table 4), the item "no excessive deformation" is crucial to the correct interpretation of this failure. If the impact and meaning of "no" were lost in the computer processing of character strings, the failure would, in all likelihood, be misinterpreted and the processing of subsequent cases corrupted. A strategy to automatically assess the impact of such negation still has to be developed.
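The matching categories used in this comparison can be expressed as a small classifier. The sketch below is one possible reading of the procedure, not the method actually used for Table 4, where the matching was assessed by hand: the category names are invented labels, and the keyword set is a small sample inferred from the marked entries of Table 4 rather than the full stress-factor list.

```python
import re


def classify(expression: str, stress_keywords) -> str:
    """Classify an expression against the stress-factor keyword list.

    Returns one of the hypothetical labels:
      'exact'            the whole expression is a listed keyword (marker *)
      'stress_variant'   the expression contains a variant of 'stress' (< >)
      'contains_keyword' the expression contains a listed keyword (marker |)
      'unmatched'        none of the above
    """
    expr = expression.strip().lower()
    keywords = {k.strip().lower() for k in stress_keywords}
    if expr in keywords:
        return "exact"
    if re.search(r"\bstress", expr):
        return "stress_variant"
    if any(re.search(r"\b" + re.escape(k) + r"\b", expr) for k in keywords):
        return "contains_keyword"
    return "unmatched"


# Sample keyword set inferred from the markers in Table 4 (assumption).
STRESS_KEYWORDS = {
    "residual stresses", "stress corrosion cracking", "tightening",
    "loading", "deformation", "strength",
}

for item in ["residual stresses", "hoop stresses",
             "no evidence of deformation", "high torque"]:
    print(item, "->", classify(item, STRESS_KEYWORDS))
```

The order of the tests matters: an expression such as "stress corrosion cracking" is both a listed keyword and a variant of the word stress, and checking for an exact match first keeps it in the first category.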
| Cases | Stress definition |
|---|---|
| Example 1 | residual stresses * |
| | stress relieved * |
| | stress corrosion cracking * |
| | stress relief anneal * |
| | forming * |
| Example 2 | stress corrosion cracks * |
| | sustained tensile <stresses> |
| | tightening * |
| | hoop <stresses> |
| | high torque |
| | stress corrosion * |
| | direction of <stress> |
| | stress corrosion cracking * |
| | transverse tensile <stresses> |
| | shear failures * |
| Example 3 | residual stresses * |
| | residual internal <stresses> |
| | stress corrosion cracking * |
| | retubing |
| | stress relief * |
| Example 4 | unloading pressure * |
| | stress corrosion cracking * |
| | high <stress> levels from welding, unloading and traveling |
| | design for minimum <stress> levels |
| Example 5 | stress corrosion cracking * |
| | stress corrosion cracks * |
| Example 6 | <stress> concentration |
| | stress corrosion cracking * |
| | stress corrosion cracks * |
| | progressive <stress corrosion cracking> |
| | effective <stress> exceeded yield strength |
| Example 7 | hoop <stress> of sustained tension |
| | stress corrosion cracking * |
| | absence of paint in <stressed> area |
| | <stress> from interference fit |
| Example 8 | stress corrosion cracking * |
| | no indication of excessive |loading| |
| | no evidence of |deformation| |
| | <stress> raiser |
| Example 9 | externally applied <stress> |
| | bend tests * |
| | stress corrosion crack * |
| | stress corrosion cracking * |
| | applied <stresses> |
| | no loss of |strength| |
* Denotes item consistent with a unique descriptor identified from the computerized search of the Compendex*Plus™ system
The keywords associated with the stress factor were further subdivided into categories to facilitate the indexing process and eventually to allow the development of a system of weights for more precise indexing of cases. These categories can be used as text boxes with room for a relative weight. Negation would be handled by putting a negative weight on the listed keywords.
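The idea of negative weights can be sketched as follows. This is a minimal illustration, not the weighting strategy itself, which remains to be developed: the weights are invented, and negation is detected only by a leading "no", "not", or "without", a deliberately crude assumption.

```python
import re

# Crude negation cue at the start of an expression (assumption).
NEGATION = re.compile(r"^\s*(no|not|without)\b", re.IGNORECASE)


def index_entry(expression: str, keyword_weights: dict) -> dict:
    """Return {keyword: weight} for every listed keyword found in the expression.

    The sign of each weight is flipped when the expression starts with a
    negation cue, so that 'no evidence of deformation' counts against,
    rather than for, cases indexed under 'deformation'.
    """
    expr = expression.lower()
    sign = -1.0 if NEGATION.search(expr) else 1.0
    return {
        keyword: sign * weight
        for keyword, weight in keyword_weights.items()
        if re.search(r"\b" + re.escape(keyword.lower()) + r"\b", expr)
    }


# Hypothetical relative weights, for illustration only.
KEYWORD_WEIGHTS = {"deformation": 1.0, "loading": 0.5, "stress": 2.0}

print(index_entry("no evidence of deformation", KEYWORD_WEIGHTS))
print(index_entry("no indication of excessive loading", KEYWORD_WEIGHTS))
print(index_entry("externally applied stress", KEYWORD_WEIGHTS))
```

A leading-word test would mishandle an item such as "absence of paint in stressed area" if "absence of" were added to the cues, since the negation there applies to the paint and not to the stress. That is one reason a more careful strategy is needed.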
Many non-expert failure analyses reach the wrong conclusions simply because they fail to define the system adequately or consider a case in isolation from a history containing other related or similar cases. Consulting how similar cases were solved in the past is often the first lead a failure analyst will seek before refining the analysis process. Such a step can help to prevent many of the dead-end searches that can greatly complicate the life of a failure analyst dealing with a corrosion problem. The large number of abstracts found by searching the modern engineering literature on the subject of failure and corrosion provides a statistically solid foundation of specialized documents for further classification. In the present paper, the keywords associated with each abstract stored in a commercial engineering literature abstracting system were fitted to a framework of materials degradation factors in order to generate indexing guidelines for failure analysis information. This approach has the merit of minimizing the risk of context sensitivity surrounding the description of degradation processes that can affect materials and systems. Some examples of failures were examined for the recognized influence of stress, and the descriptive elements associated with this factor were compared to the keywords gathered from the literature. A strategy for creating weighted indices will have to be developed from such comparisons.