University of Bielefeld - Faculty of Technology
Networks and Distributed Systems
Research Group of Prof. Peter B. Ladkin, Ph.D.
The world has changed significantly for air travellers in the 1990s. New generation aircraft, such as the Airbus A319/320/321/330/340 series and the Boeing B777, are now in service. These aircraft are `fly-by-wire': their primary flight control is achieved through computers. The basic maneuvering commands which a pilot gives to one of these aircraft through the flight controls are transformed into a series of inputs to a computer, which calculates how the physical flight controls are to be displaced to achieve a maneuver, in the context of the current flight environment. While large commercial air transport aircraft have had hydraulically-aided controls for a long time, and military air forces have been flying their aircraft under computer control for just as long, many people, including myself, believe that the use of computer primary flight control in commercial transports heralds a new era, one in which scientists concerned with specification and verification of computer systems, as well as passengers concerned with personal safety, should renew their interest.
It would be pleasant to say there have been no accidents. Unfortunately, as with many new types of aircraft, the Airbus A320/A330/A340 series has had its share. There have been fatal accidents with A320 aircraft in Bangalore, India; in Habsheim, in Alsace in France; near Strasbourg, also in Alsace in France; and in Warsaw, Poland. An A330 crashed on a test flight in Toulouse, killing Airbus's chief test pilot as well as the other crew on board. An A340 suffered serious problems with its flight management computer system en route from Tokyo to Heathrow, and further significant problems on approach to Heathrow. In late 1995 and early 1996, the B757 (not a fly-by-wire aircraft) suffered its first two fatal accidents in a decade and a half of service.
Even transport aircraft which are not fly-by-wire nowadays have their share of safety-critical computer-based systems. New-generation flight management and guidance systems (the so-called `glass cockpit') and Full-Authority Digital Engine Controls (FADEC) are found on most newly-built aircraft.
I collect here a series of comments and papers referring to recent computer-related accidents from various experts and not-so-experts. The collection was originally restricted to fly-by-wire aircraft, but has since broadened to include other computer-related or maybe-computer-related incidents. This page will grow with time. I sincerely hope not by much.
The US National Transportation Safety Board (NTSB) is responsible for analysing mishaps and accidents.
The NASA Ames Research Center in Mountain View, California has for many years run a program called the Aviation Safety Reporting System (ASRS). Users of the National Aerospace System are encouraged to report incidents and events which they feel may affect the safety of flight, or knowledge of which may contribute to flight safety. The reports are guaranteed to remain anonymous, and immunity from punitive administrative action is granted under most circumstances to those who can demonstrate that they have reported the relevant safety-related incident to the ASRS. The result is an unparalleled accumulation of data on safety-related incidents. These are summarised in the ASRS monthly newsletter, Callback, and a journal, Directline. On-line copies of recent issues of Callback (since issue 187, December 1994) and Directline (since issue 5, March 1994) are available from the ASRS publications page.
RTCA is a private, not-for-profit organization that addresses requirements and technical concepts for aviation. Its products are recommended standards and guidance documents focusing on the application of electronics technology. RTCA was organized as the Radio Technical Commission for Aeronautics in 1935. Since then, it has provided a forum in which government and industry representatives gather to address aviation issues and to develop consensus-based recommendations. It was reorganized and incorporated in 1991. Members include approximately 145 United States government and business entities such as: the Federal Aviation Administration, Department of Commerce, U.S. Coast Guard, NASA; aviation-related associations; aviation service and equipment suppliers; and approximately 35 International Associates such as Transport Canada, the Chinese Aeronautical Radio Electronics Research Institute (CARERI), EUROCONTROL, the UK CAA, Smiths Industries, Sextant and Racal. The Web site includes a broader statement of what they do.
Recommended Standards Documents such as DO-178B Software Considerations in Airborne Systems and Equipment Certification, DO-197A (Active TCAS I for `commuter' aircraft), DO-185 (TCAS I and II, including algorithms in pseudocode), DO-184 (TCAS I Functional Guidelines), as well as the standards for navigation equipment, windshear detection and thunderstorm avoidance equipment are all available for purchase via the RTCA Web site. A full list of applicable standards is also available.
Frequency or rate of accidents can tell us what the likelihood of an accident may be, if all contributing factors remain the same. Likelihood of an accident should not be confused with risk. Risk is an engineering term which attempts to combine the likelihood of an accident with the severity of the consequences. For example, you could stub your toe while entering the aircraft from the jetway. The likelihood of this could be much greater than the likelihood of your sustaining severe injury on board, but the risk might be much lower, because the consequences (a sore toe) are not severe. Risk is explained by Leveson (Safeware, Addison-Wesley 1995, p179) as the hazard level combined with (1) the likelihood of the hazard leading to an accident [..] and (2) hazard exposure or duration [..]. A hazard is itself explained as a state or set of conditions of a system that, together with other conditions in the environment of the system (or object), will lead inevitably to an accident (op. cit., p177). Hazard severity is measured by assessing the severity of the worst possible accident that could result from the hazard, given the environment in its most unfavorable state (op. cit., p178, modified). Hazard level is a combination of severity with likelihood of occurrence (op. cit., p179). Everything clear?
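To make these relationships a little more concrete, here is a minimal sketch of how one might combine the quantities just defined. The numerical scales, the function names and the simple multiplicative combination are assumptions of this illustration, not Leveson's definitions.

```python
# Illustrative sketch only: the 0..1 scales and the multiplicative combination
# are assumptions of this example, not definitions from Leveson's Safeware.

def hazard_level(severity, likelihood_of_hazard):
    """Combine worst-case severity with the likelihood of the hazard occurring."""
    return severity * likelihood_of_hazard

def risk(severity, likelihood_of_hazard, p_hazard_leads_to_accident, exposure):
    """Hazard level combined with (1) likelihood that the hazard leads to an
    accident and (2) hazard exposure or duration."""
    return (hazard_level(severity, likelihood_of_hazard)
            * p_hazard_leads_to_accident * exposure)

# Stubbed toe while boarding: quite likely, but trivially severe -> low risk.
print(risk(severity=0.01, likelihood_of_hazard=0.5,
           p_hazard_leads_to_accident=0.9, exposure=0.1))

# Severe injury in flight: severe, but very unlikely -> also low risk, for a different reason.
print(risk(severity=0.9, likelihood_of_hazard=1e-6,
           p_hazard_leads_to_accident=0.5, exposure=1.0))
```

The point of the toy numbers is only that high likelihood with trivial severity and high severity with tiny likelihood can both yield low risk; the interesting engineering cases are the ones in between.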
Suppose you are the punk in the Dirty Harry movie, at the wrong end of Mr. Eastwood's pistol. `Do you feel lucky today, punk? Well, do yah?' Mr. Eastwood thus makes it clear that he constitutes a hazard. If you feel lucky (the `other conditions in the environment' are that you do, and you will go for it), this will lead inevitably to an accident (for you, not for Mr. Eastwood). Furthermore, the hazard severity is high (being shot is quite severe), and the likelihood? Very nearly that of your feeling lucky (Mr. E could in real life suffer a stroke at the very moment you move, but this is the movies, so the likelihood of that is zero).
See the section The Measurement of Risk, below, for some more comments on risk and perception of risk (which are known to differ, according to research by social psychologists).
There follows a short synopsis of 1996 accident statistics, with a list of significant fatal airline accidents. Boeing has for many years produced an annual statistical summary of aircraft accidents. Some excerpts from the 1959-1995 summaries show:
People seem to like to make comparisons between the risk to life of driving a car and the risk to life of flying on commercial carriers. But exactly what are these figures, where did they come from, and who did the comparisons? There were some articles in the journal Risk Analysis from 1989-91, using figures from the late 70's to late 80's, which I summarise in the essay To Drive or To Fly - Is That Really The Question?. The answer is that it depends on who you are. But one thing is pretty certain. If you're a drunken teenager, better take the bus.
Perrow is a pioneer in this area, and whether you agree with him in detail or not (and I have some reservations), it is pretty much required that one understand his work. A recent essay on an aviation accident by William Langewiesche, who writes responsibly and well, and whose prose is a joy to read, attempts to apply Perrow's ideas (and those of later sociologists Scott Sagan and Diane Vaughan), as well as describing, in terms which one rarely reads, exactly what it is like to be around a major accident. I must confess to being speechless, sad, a little frightened, and feeling much too close to things after reading his account. Even though The Lessons of ValuJet 592 deals with an accident that is not computer-related, as far as anyone knows, the immediacy which Langewiesche brings to his descriptions, and the consideration he gives to his thoughts, illuminate the horror and tragedy in our subject in a way I am unlikely to forget. Which is why I include it here.
In her article The Fall of TWA 800: The Possibility of Electromagnetic Interference, New York Review of Books, Special Supplement, April 9, 1998, pp59-76 (also available at http://jya.com/twa800-emi.htm), Elaine Scarry proposed that EMI might have been a causal factor in the crash of TWA800 in July 1996. I find this supposition highly implausible, and I wrote a critique of her argument, EMI and TWA800: Critique of a Proposal, Report RVS-J-98-03, on 10 April 1998.
National authorities supply navigational information to five industry data suppliers (Jeppesen, Racal Avionics, Aerad (British Airways), Swissair and GTE Government Services), which then supply this information to the almost twenty manufacturers of FMSs, many of which offer several different models. There is some concern about quality control in the implementation of this data in FMSs. Although there is a standard, ARINC 424 from the industry/user group ARINC, which is `loosely followed' by the industry, this standard has no regulatory force and is not connected with any regulatory process. Shawn Coyle, of Transport Canada's Safety and Security division, has written a working paper, Aircraft On-Board Navigation Data Integrity - A Serious Problem, assessing the situation. It is not good. Coyle's argument is that FMSs are proliferating; that soon they will be used as primary navigation devices as GPS approaches come into use (they are advisory devices only at the moment - other avionics are the primary navigational devices); that they will therefore be used for precision instrument approaches, in which integrity of data is vital; and that there is no regulatory oversight of the integrity of the data used by these devices, nor of the process by which the data is implemented or updated in the FMS. Coyle gives eight examples in which nav data implemented in an FMS leads an aircraft to fly a profile different from the published procedure. Coyle says that Transport Canada is the first organisation to have systematically identified the problem.
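By way of illustration only, here is a sketch of the kind of cross-check one might wish to see somewhere in this supply chain: comparing the same waypoint as coded by two different sources before it is loaded into an FMS database. The record layout, the identifier DEMO1, the coordinates and the tolerance are all invented for the example; they are not ARINC 424 and not real data.

```python
import math

# Hypothetical, simplified waypoint records from two different data sources;
# layout and coordinates are invented for illustration, not ARINC 424.
source_a = {"DEMO1": {"lat": 4.678, "lon": -75.975}}
source_b = {"DEMO1": {"lat": 4.678, "lon": -75.675}}   # a one-digit transcription slip

def positions_agree(p, q, tolerance_deg=0.01):
    """Crude consistency check: do two codings of the same waypoint agree?"""
    return (math.isclose(p["lat"], q["lat"], abs_tol=tolerance_deg) and
            math.isclose(p["lon"], q["lon"], abs_tol=tolerance_deg))

for ident, coded_a in source_a.items():
    coded_b = source_b.get(ident)
    if coded_b is None or not positions_agree(coded_a, coded_b):
        print(f"Integrity check failed for {ident}: do not load into the FMS database")
```

The check itself is trivial; Coyle's point is that no regulation currently obliges anyone in the chain to perform anything of the kind.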
The FAA's system modernisation effort was started in 1981. The Advanced Automation System (AAS) contract with Loral (formerly IBM Federal Systems Division) was cancelled in mid-1994 because of `schedule slips and cost overruns', and a much-reduced AAS design is being implemented. The NTSB produced Special Investigation Report NTSB/SIR-96-01 on January 23, 1996 in which they assessed the safety implications of the outages and the planned modernisation effort. They found that, despite sometimes severe degradation of service (delays to traffic), for the one-year period from September 12, 1994 to September 12, 1995, there was only one reported `operational error', a loss-of-separation incident, at Oakland Center on August 9, 1995, and that the modernisation efforts were appropriate in their new, evolutionary, form.
The U.S. General Accounting Office (GAO) has also kept a close watch on the AAS. Reports RCED-97-51, Air Traffic Control: Status of FAA's Standard Terminal Automation System Replacement Project, AIMD-97-30, Air Traffic Control: Complete and Enforced Architecture Needed for FAA Systems Modernization, AIMD-97-47, Air Traffic Control: Immature Software Acquisition Processes Increase FAA System Acquisition Risks are available on the WWW. Overviews of what the GAO calls High-Risk Projects, which it considers "at high risk for waste, fraud, abuse and mismanagement" (!), and which include the FAA AAS, are also available: HR-97-1, High-Risk Series: An Overview, HR-97-9, High-Risk Series: Information Management and Technology, and HR-97-2, High-Risk Series: Quick Reference Guide.
A recent perspective on the U.S. Air Traffic Control system, suggesting that the most worrying aspects lie on the human side, in the working conditions of air traffic controllers as air traffic increases, was proposed by pilot and journalist William Langewiesche in the October 1997 Atlantic Monthly article Slam and Jam. Whether one agrees with Langewiesche's perspective or not, he writes responsibly and well, and is a joy to read.
We include a PDF version of the full report, the Risk Assessment Study: Final Report.
The Boeing B737 and Airbus A320 are rival airplane series. The A320 is the subject of many reports in this compendium. The B737 has recently come under investigation for suspected and reported rudder-control anomalies. Some investigators suspect that such anomalies may have played a role in the unexplained crashes of United Airlines Flight 585 on 3 March 1991 near Colorado Springs, and USAir 427 on 8 September 1994 near Pittsburgh. The NTSB has prepared an extensive report on its investigations, released at a public meeting on 16 October 1996, which contains recommendations A-96-107 through A-96-120.
The Formal Methods and Dependable Systems Group in the Computer Science Laboratory at SRI International has been pioneering formal methods for aerospace for a quarter century. Much of their work concerning SIFT (the first attempt to develop a provably-correct digital flight control system) in the 70's, and the subsequent development of the logic specification and proof systems EHDM and PVS, and their application to problems in digital flight control and avionics, is accessible via the WWW.
Much of SRI's work on formal methods for aviation systems has been supported by the NASA Langley Formal Methods Team, who also have publications in this area.
Nancy Leveson and her group at the University of Washington are applying formal methods (using RSML, a language for describing state machines suitable for requirements specification) to the analysis of TCAS II, the Traffic Alert and Collision Avoidance System, Version II. These papers may be found under the page for the Safety Research Project. Nancy Leveson moved in 1999 to the Department of Aeronautics and Astronautics at MIT.
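For readers unfamiliar with the state-machine style of requirements specification, here is a toy sketch in the general spirit of such languages, though in no way equivalent to RSML: a made-up three-state advisory machine with guarded transitions. The states, events, guards and thresholds are invented for illustration and bear no relation to the actual TCAS II logic.

```python
# Toy state machine in the requirements-specification spirit of RSML.
# States, events, guards and numbers are invented; this is not TCAS II logic.

TRANSITIONS = {
    # (current state, event): (guard on monitored variables, next state)
    ("CLEAR", "intruder_detected"):
        (lambda v: v["range_nm"] < 6.0, "TRAFFIC_ADVISORY"),
    ("TRAFFIC_ADVISORY", "threat_confirmed"):
        (lambda v: v["closure_kt"] > 0, "RESOLUTION_ADVISORY"),
    ("TRAFFIC_ADVISORY", "intruder_lost"):
        (lambda v: True, "CLEAR"),
    ("RESOLUTION_ADVISORY", "conflict_resolved"):
        (lambda v: True, "CLEAR"),
}

def step(state, event, variables):
    """Take the transition if one is defined and its guard is true, else stay put."""
    entry = TRANSITIONS.get((state, event))
    if entry and entry[0](variables):
        return entry[1]
    return state

state = "CLEAR"
state = step(state, "intruder_detected", {"range_nm": 4.2, "closure_kt": 120})
print(state)   # TRAFFIC_ADVISORY
```

The attraction of this style for requirements work is that every state, event and guard is explicit, so questions such as "what happens if the intruder is lost during a resolution advisory?" become questions about whether a transition is defined, which can be checked mechanically.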
Other recent published academic research on applications of formal methods to aviation includes
While initially the accident seemed to have little to do with automated systems, it turned out that the Minimum Safe Altitude Warning (MSAW) system used by the Agana tower controllers, installed at nearby Andersen Air Force Base some 10 nautical miles beyond the departure end of Rwy 6L, had, unbeknownst to the controllers, not been operational, owing to software errors in a new software installation.
Furthermore, when the descent profile and CVR transcript became available, questions were raised about the crew's "resource management" that are also pertinent to dealing with more recent automation and procedures.
These two points were sufficient for us to include two documents on this accident in the compendium.
The first, The Crash of Flight KE801, a Boeing B747-300, Guam, Wednesday 6 August, 1997: What We Know So Far, puts together publicly-available information from the weeks after the crash; analyses this information with a view to determining what facts were available; and compiles and comments on the often confusing, unreliable and occasionally frankly false information distributed by news organisations as well as other organisations involved in the crash. Part of the purpose of this commentary is to establish a `social context' for the aftermath of a crash, as the author attempted also to do for the case of Aeroperu Flight 603 in 1996. We shall include the full set of documents provided by the US National Transportation Safety Board for the Public Hearings on the accident in March 1998, which is a local copy of the original Public Hearings documents on the NTSB Web site.
A draft of the final report (in German!) from the Dirección General de Aeronáutica Civil of the Dominican Republic, obtained from the Deutsche Luftfahrtbundesamt, was digitised from a copy sent by Karsten Munsky of EUCARE in Berlin, to whom we are very grateful. This draft includes only the report body; I understand there are also 100+ pages of attachments.
On February 7, the FAA issued a Press Release (Office of Public Affairs Press Releases) clarifying the roles played by the U.S. FAA (Federal Aviation Administration) and the NTSB (National Transportation Safety Board) in the investigation. On March 1, a short statement of Factual Information from a preliminary review of CVR and FDR data was made available by the NTSB on behalf of the Dominican Republic civil aviation authorities. On March 18, a longer Press Release, accompanied by the CVR transcript, explained further what the FDR and CVR data indicated. David Learmount in Flight International, 27 March - 2 April 1996, deduced from the CVR transcript four salient observations on the crew's behavior, and I provide a fifth from the B757 Operations Manual B757 Air Data System description and schematic diagram (JPEG, GIF). To paraphrase Learmount's points, although confusion about the operation of computer-assisted systems (autopilot, warning annunciations) played a role, this confusion would not have arisen but for inappropriate pilot decisions and actions. However, a blocked pitot tube and inappropriate pilot behavior are not the only potential factors under study. The NTSB has identified a potential improvement in the B757/767 operating manual as a result of further analysis (short note). A note on the Puerto Plata and Cali accidents, highlighting the human-computer interface (HCI) issues, appeared in RISKS-18.10, was rebroadcast on Phil Agre's RRE mailing list (May 7th), and became the subject of the what's happening column of the British Computer Society HCI interest group magazine Interactions, July/August 1996, p13.
The information here is a résumé of known information, a high-level analysis of the failure modes of a B757 which would lead to an accident, details of the B757 pitot-static system, and a brief history of the news reports and statements made about this accident, for those whose interests stretch to the sociological. I would still recommend against attributing any cause prematurely (as has occurred and might still be occurring).
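As background to the pitot-static discussion, here is a minimal sketch of how indicated airspeed is derived from the difference between pitot (total) and static pressure, and of what happens when the pitot reading is stuck while the aircraft climbs. The incompressible-flow approximation and all the numbers are illustrative; none of them are taken from the accident data.

```python
import math

RHO0 = 1.225  # sea-level standard air density, kg/m^3

def indicated_airspeed(pitot_pressure_pa, static_pressure_pa):
    """Airspeed (m/s) from dynamic pressure q = p_total - p_static
    (incompressible approximation, illustrative only)."""
    q = max(pitot_pressure_pa - static_pressure_pa, 0.0)
    return math.sqrt(2.0 * q / RHO0)

static = 101325.0                  # Pa at the runway, illustrative
pitot_blocked = static + 3000.0    # a blocked probe stays stuck at this value

print(indicated_airspeed(pitot_blocked, static))          # plausible reading at takeoff

# As the aircraft climbs, static pressure falls but the blocked pitot does not follow,
# so the derived airspeed rises even though the true airspeed may not:
static_at_altitude = 90000.0
print(indicated_airspeed(pitot_blocked, static_at_altitude))   # erroneously high reading
```

This is the generic mechanism by which a blocked pitot tube turns the airspeed indication into something that behaves more like an altimeter, which is why erroneous and mutually inconsistent airspeed indications figure so prominently in the analysis of this accident.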
The CVR transcript (in Spanish) is also available.
A shorter note on this accident appeared in RISKS-18.51, the new findings were announced by Peter Neumann in a short note in RISKS-18.57, and my note detailing the latest findings appeared in RISKS-18.59.
The problem was caused by an `Operand Error' in converting data in a subroutine from 64-bit floating point to 16-bit signed integer. One value was too large to be converted, creating the Operand Error. This error was not explicitly handled in the program (although others were) and so the computer, the Inertial Reference System (SRI), halted, as specified in other requirements. There are two SRIs, one `active' and one `hot back-up'; the active one halted just after the back-up, from the same problem. Since no inertial guidance was now available, and the control system depends on it, we can say that the destructive consequence was the result of `Garbage in, garbage out' (GIGO). The conversion error occurred in a routine which had been reused from the Ariane 4, whose early trajectory was different from that of the Ariane 5. The variable containing the calculation of Horizontal Bias (BH), a quantity related to the horizontal velocity, thus went out of `planned' bounds (`planned' for the Ariane 4) and caused the Operand Error. Lots of software engineering issues arise from this case history.
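A minimal sketch of the arithmetic at the heart of the failure, in Python rather than the Ada of the actual SRI software: a 64-bit floating-point value whose magnitude exceeds the range of a 16-bit signed integer cannot be converted, and if nothing handles that case the component simply stops. The function name, the variable name and the numerical values below are invented for illustration.

```python
# Illustrative only: the real code was Ada running in the SRI;
# names and values here are invented.

INT16_MIN, INT16_MAX = -32768, 32767

def to_int16(x: float) -> int:
    """Convert to a 16-bit signed integer, raising an exception
    (analogous to the Ada Operand Error) if the value does not fit."""
    if not INT16_MIN <= x <= INT16_MAX:
        raise OverflowError(f"{x} does not fit in a 16-bit signed integer")
    return int(x)

horizontal_bias = 12345.0      # within Ariane-4-style planned bounds: converts fine
print(to_int16(horizontal_bias))

horizontal_bias = 64000.0      # an Ariane-5-style trajectory value: out of range
try:
    to_int16(horizontal_bias)
except OverflowError as err:
    # Flight 501 had no handler at this point, so the SRI halted instead.
    print("unhandled on Flight 501:", err)
```

The software engineering point is not the conversion itself but the reuse: the bound that made the unprotected conversion acceptable for the Ariane 4 trajectory was silently invalidated by the Ariane 5 trajectory.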
Jean-Marc Jézéquel and Bertrand Meyer wrote a paper, Design by Contract: The Lessons of Ariane, IEEE Computer 30(2):129-130 January 1997, in which they argued that a different choice of programming language would have avoided the problem. Taken at face value, they are clearly right -- a language which forced explicit exception handling of all data type errors as well as other non-normal program states (whether expected or not) would have required an occurrence of an Operand Error in this conversion to be explicitly handled. To reproduce the problem, a programmer would have had to have written a handler which said `Do Nothing'. One can imagine that as part of the safety case for any new system, it would be required that such no-op handlers be tagged and inspected. An explicit inspection would have caught the problem before launch. As would, of course, other measures. Jézéquel and Meyer thus have to make the case that the programming language would have highlighted such mistakes in a more reliable manner than other measures. Ken Garlington argues in his Critique of "Put it in the contract: The lessons of Ariane" [sic] that they do not succeed in making this case.
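To illustrate the contrast Jézéquel and Meyer's argument turns on, here is a hedged sketch of the two styles: a conversion guarded by an explicit precondition in the design-by-contract manner, and a deliberately empty handler of the kind a safety-case inspection ought to flag. The function names and structure are mine, invented for illustration, not taken from their paper or from the Ariane software.

```python
# Sketch of the contrasting styles; names and structure are illustrative.

def convert_with_contract(x: float) -> int:
    """Design-by-contract style: the precondition is stated explicitly,
    and the caller must guarantee it."""
    assert -32768 <= x <= 32767, "precondition violated: value must fit in 16 bits"
    return int(x)

def convert_with_noop_handler(x: float) -> int:
    """The style an inspection should question: the exception is swallowed
    and an arbitrary fallback value is returned."""
    try:
        if not -32768 <= x <= 32767:
            raise OverflowError(x)
        return int(x)
    except OverflowError:
        pass            # `Do Nothing' -- exactly the tagged handler discussed above
    return 0            # arbitrary fallback, itself a potential hazard

print(convert_with_contract(1000.0))
print(convert_with_noop_handler(64000.0))
```

Whether making such handlers syntactically conspicuous is more reliable than other measures (inspections, simulation with real trajectory data, and so on) is precisely the point on which Garlington's critique takes issue with the paper.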
The paper The Ariane 5 Accident: A Programming Problem? by Peter Ladkin discusses the characterisation of the circumstances of the Ariane Flight 501 failure in the light of the extensive discussion amongst computer scientists of the failure. Gérard Le Lann has proposed in his article The Failure of Satellite Launcher Ariane 4.5 that the failure has little connection with software, but is a systems engineering failure, and his argument is compelling. Le Lann's analysis is also supported by inspection of the WB-Graph of the Ariane 501 Failure, prepared by Karsten Loer from the significant events and states mentioned in the ESA Accident Report.
This is not the first time that computers critical to flight control of an expensive, complex and carefully-engineered system have failed. See The 1981 Space Shuttle Incident.
It transpires that the aircraft was equipped with only one ADF (Automatic Direction Finder), a navigation device described as `primitive' by certain Air Force staff (see report). It seems the aircraft was not as well equipped as normal civilian standards would require, and it flew off course and hit a mountain while in the last stages of approach (a CFIT, Controlled Flight Into Terrain, accident). It was speculated almost immediately that more sophisticated navigation equipment would have helped avoid the accident; and immediately on publication of the report, US Defense Secretary William Perry ordered equipment changes.
One may conclude from Secretary Perry's executive order that this is an example of an accident in which lack of sophisticated avionics played a role. I conclude it is an example of the risk of not using up-to-date avionics - a lesson we may forget when thinking solely about the risks of using computers.
Early information on the incident is collected in a short report culled from reports in Flight International and Aviation Week. The investigation subsequently led to FAA Airworthiness Directive 96-19-10. The AD "is prompted by reports of interruptions of electrical power during flight due to improper installation of the main battery shunt and ground stud connection of the main battery. The actions specified in this AD are intended to prevent such electrical power interruptions, which could result in loss of battery power to the source of standby power to the airplane." Its effective date is October 2, 1996. I am grateful to Hiroshi Sogame of All Nippon Airways Safety Promotion Committee for advising me of this AD. A short note on the incident appeared in RISKS-18.19.
The Aircraft Accident Report (the final report) from the Colombian Aeronautica Civil was released by the NTSB on 27 September, 1996. It is included here in two parts, the text with Appendix A and the Appendices B-F. The NTSB Recommendations to the FAA were published on October 16, 1996. I thank Barry Strauch, Chief of the Human Performance Division of the NTSB, for sending me copies of the final report and the recommendations; and Marco Gröning for engineering the pictures from Appendices B-F.
The paper Analysing the Cali Accident with a WB-Graph contains a WB-Graph causal analysis of the events and states in the Cali Report, prepared by Thorsten Gerdsmeier, Peter Ladkin and Karsten Loer. WB-analyses determine the causal relations between the events of an accident according to a rigorous formal semantics, and may provide insight into the accident. These analyses are presented in the form of a graph whose nodes are critical events and states. The Cali WB analysis exposes some fundamental causal factors that were mentioned in the report, and also addressed in the NTSB's recommendations to the FAA, but not included in the report's list of probable causes and contributory factors.
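For readers who have not seen a WB-graph, here is a minimal sketch of the underlying data structure: a directed graph whose nodes are events and states and whose edges assert, roughly, that the target would not have occurred without the source. The node labels below loosely paraphrase factors mentioned in the text of this compendium; they are not the formal node set or semantics of the published analysis.

```python
# Minimal sketch of a WB-graph as a directed graph; node labels are informal
# paraphrases for illustration, not the nodes of the published Cali analysis.

wb_graph = {
    # node: nodes for which it is a (necessary) causal factor
    "crew selects beacon by chart identifier": ["aircraft turns away from cleared route"],
    "aircraft turns away from cleared route": ["aircraft flies toward high terrain"],
    "speedbrakes remain extended": ["escape manoeuvre loses climb performance"],
    "aircraft flies toward high terrain": ["controlled flight into terrain"],
    "escape manoeuvre loses climb performance": ["controlled flight into terrain"],
}

def causal_factors(graph, accident_node):
    """All nodes from which the accident node is reachable, i.e. its causal ancestors."""
    reverse = {}
    for source, targets in graph.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    seen, stack = set(), [accident_node]
    while stack:
        for parent in reverse.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(causal_factors(wb_graph, "controlled flight into terrain"))
```

The value of the formal version lies in the rigorous semantics attached to the edges; the graph structure itself, as the sketch shows, is simple, which is what makes it possible to check systematically whether every factor identified in the narrative is also reflected in the stated causes.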
Early News
The NTSB issued a
Press Release containing
factual data, whose text contains the press release signed by the
Head of the Colombian Oficina de Control y Seguridad Aerea.
The two relevant arrival and approach navigation plates are the
Cali VOR/DME/NDB Rwy 19 Instrument Approach
Procedure (http://www.jeppesen.com/cali-1.html)
and the
Cali Rozo One Arrival Procedure (http://www.jeppesen.com/cali.html).
The specialist weekly Flight International included
reports and comment in its
January editions. Computer-relevance appears
in the crew's handling of the FGMS in concert with other
procedures. However, they descended below
the cleared altitude, and there appear to be other procedural
inadequacies in their flying (see also Wally Roberts's
TERPS Page for
a further comment on this). The FAA is conducting a review of
training at AA.
The short paper Comments on Confusing
Conversation at Cali by Dafydd Gibbon and Peter Ladkin
points out some linguistic features of the ATC - AA965 radio
conversation immediately prior to the accident which might have
contributed to the crew's confusion.
For those whose patience or WWW-bandwidth is limited, there is a synopsis of contemporary news concerning the FMC memory readout, giving the probable reason for the left turn away from course (namely that the pilots selected the ROZO beacon based on its identifier, but there is a specified difference between the ROZO beacon identifier and its identifier in the FMC database), the probable causes as contained in the final report, and suggested probable causes contained in American Airlines' submission to the docket in August.
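To make the identifier problem concrete, here is an illustrative sketch of why selecting a waypoint by the identifier printed on the chart can retrieve a different point when the database codes that identifier differently. The database contents and coordinates below are invented for the example and are not the actual FMC or supplier data.

```python
# Invented example database: contents and coordinates are illustrative only.
fmc_database = {
    "R":    {"name": "ROMEO", "lat": 4.70, "lon": -74.13},   # another beacon stored under "R"
    "ROZO": {"name": "ROZO",  "lat": 4.68, "lon": -75.98},   # the intended beacon, stored under its full name
}

def select_waypoint(chart_identifier):
    """Naive lookup by the identifier printed on the approach chart."""
    return fmc_database.get(chart_identifier)

# The chart shows the beacon's identifier; the database codes the intended beacon
# under a different key, so the lookup returns a different point entirely.
chosen = select_waypoint("R")
print(chosen["name"])
```

The sketch shows only the lookup hazard; the NTSB recommendation to positively cross-check positional information on the FMS (see below) is aimed at exactly this kind of mismatch between what the crew believes they have selected and what the database has returned.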
As a result of the investigation of this accident, the NTSB issued a collection of safety recommendations on October 1, 1996, with the concurrence of the Aeronautica Civil of Colombia. These recommendations address various issues such as pilot and aircraft performance after the GPWS warning, specifically the feasibility of (retro-)fitting automatic speedbrake retractors (the pilots failed to retract the speedbrakes, and also pulled up too far, momentarily going "through" the stick-shaker warning - some investigators believe that the aircraft could have missed the terrain, had an optimal escape manoeuvre been executed: Flight International, 9-15 October, p9), modifications to FMS data presentation, evaluation of an Extended-GPWS system, a requirement to positively cross-check positional information on the FMS, certain enhancements to navigation charts, an ICAO review of navaid naming conventions, a program to enhance English fluency of controllers, and various other measures. (These last two measures address concerns also raised in the Confusing Conversation note.) A note on the Puerto Plata and Cali accidents, highlighting the human-computer interface (HCI) issues, appeared in RISKS-18.10, was rebroadcast on Phil Agre's RRE mailing list (May 7th), and became the subject of the what's happening column of ACM Interactions, July/August 1996, p13.
An edited abbreviation of AAIB Aircraft Accident Report 2/95 published in Aerospace, April 1995, the monthly of the Royal Aeronautical Society, London.
Computer-Related Factors in
Incident to A320 G-KMAM, Gatwick on 26 August 1993,
by Peter Mellor.
Questions arose not only as to what the pilots had been doing,
but also how they were aided or hindered by the design of the
systems, including the cockpit interface and the behavior of
the aircraft.
The
Rapport Préliminaire
of the Commission d'Enquête is in French.
The RISKS reports are:
A330 crash: Press Release by Peter Mellor;
Re: A330 crash by Curtis Jackson and
Peter Ladkin;
A Correction ... by Peter Ladkin;
A330 crash investigation .... by Erik Hollnagel;
Some comments .... by Peter Ladkin.
The original AAIB incident report, AAIB Bulletin No: 3/95.
A340 shenanigans by Les Hatton;
Re: A340 incident by Peter Ladkin and
John Rushby;
A slight change... by Ric Forrester via Dave Horsfall refers to the same incident.
The short paper WB-Graph of the A300 Accident at Nagoya contains the textual form of a WB-Graph causal analysis of the events and states in the Nagoya Report, prepared by Peter Ladkin and Karsten Loer. WB-analyses determine the causal relations between the events of an accident according to a rigorous formal semantics, and may provide insight into the accident.
For those without the desire to wade through the entire report, a synopsis and commentary on the final report is based on an article in Aviation Week and Space Technology, July 29th, 1996 issue. The final report contained no surprises, based on what was known shortly after the accident.
A note on the accident report appeared in RISKS-18.33.
Early discussion of this accident in RISKS in 1994
led to much discussion
about Airbus aircraft and accident statistics in general:
China Airlines A300 Crash by Mark Stalzer;
Re: China Air ... by David Wittenberg;
More on the A300 crash ... by Peter Ladkin;
Re: China Airlines ... by John Yesberg;
Re: China Airlines ... by Mark Terribile;
How to feel safer in an Airbus by Peter Ladkin;
Airbus A3(0?)0 deductions by Phil Overy;
Further Discussion by Mary Shafer,
Robert Dorsett, Phil Overy and Wesley Kaplow;
Further Discussion by Robert Dorsett,
Peter Ladkin, Wesley Kaplow,
Peter Mellor and Bob Niland;
Summary of Safety-Critical Computers in Transport Aircraft
by Peter Ladkin;
A320 Hull Losses by Peter Mellor.
The text of the
Accident Report from the Polish
authorities is reproduced here, along with
selected Appendices, namely
Section 4.2, CVR transcripts,
Section 5, Documentation
of the Braking System, and
Section 6, Performance
and Procedures Documentation.
The paper Analysing the 1993 Warsaw Accident with a WB-Graph contains a WB-Graph causal analysis of the events and states in the Warsaw Report, prepared by Michael Höhl and Peter Ladkin. WB-analyses determine the causal relations between the events of an accident according to a rigorous formal semantics, and may provide insight into the accident. These analyses are presented in the form of a graph whose nodes are critical events and states. The Warsaw WB analysis exposes some fundamental causal factors that were mentioned in the report, but not included in the report's list of probable causes and contributory factors.
Clive Leyman, formerly responsible for A320 landing-gear engineering at British Aerospace, and now a Visiting Professor at City University, London, has prepared an Engineering Analysis of the Landing Sequence which analyses the effects of all the factors on the stopping distance of DLH2904. Referenced in the analysis are graphs he plotted of Airspeed on Approach, Altitude and Windspeed in the Final Phases, Flare and Derotation Details, Calculated vs. Actual Distances, Stopping Distances, Ground Deceleration, and Runway Friction.
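A much-simplified sketch of the kind of arithmetic that underlies such an analysis: constant-deceleration stopping distance from touchdown speed, plus the extra distance rolled during any delay before the braking systems become effective. The function, the speeds, the deceleration and the delay below are illustrative assumptions only; they are not figures from Leyman's analysis or from the report.

```python
# Simplified constant-deceleration model; all numbers are illustrative,
# not taken from the DLH2904 analysis or the accident report.

def stopping_distance_m(ground_speed_mps, decel_mps2, braking_delay_s=0.0):
    """Distance rolled during the delay plus distance to stop under constant deceleration:
    d = v * t_delay + v^2 / (2 * a)."""
    delay_roll = ground_speed_mps * braking_delay_s
    braking_roll = ground_speed_mps ** 2 / (2.0 * decel_mps2)
    return delay_roll + braking_roll

v = 80.0   # m/s ground speed at touchdown (illustrative)
print(stopping_distance_m(v, decel_mps2=2.5))                     # braking effective at touchdown
print(stopping_distance_m(v, decel_mps2=2.5, braking_delay_s=9))  # braking delayed by several seconds
```

Even this crude model makes the qualitative point of the engineering analysis: at landing speeds, every second for which spoilers, reversers and wheel brakes are not yet effective adds a large, fixed increment to the distance required to stop.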
Questions from computer scientists and system experts focused on
why the braking systems didn't deploy as expected by the pilots. The
RISKS comments are:
Lufthansa in Warsaw by Peter Ladkin;
More News... by Peter Ladkin;
Re: Lufthansa Airbus ... by Udo Voges;
Lufthansa Warsaw Crash--A Clarification by Peter Ladkin.
More and more technical literature is discussing
this accident for one reason or another.
The X-31 and A320 Warsaw Crashes: Whodunnit?
by Peter Ladkin discusses causes and how to
try to ensure more complete coverage
of causal relations. The X-31 accident and the Warsaw A320
accident are analysed as examples.
A Model for a Causal Logic for Requirements Engineering, by
Jonathan Moffett, Jon Hall, Andrew Coombes and
John McDermid suggests a logical theory of causality for
engineering and applies that to analyse braking in the Warsaw
accident.
Reasons and Causes by Peter Ladkin discusses those notions
in general, and comments extensively on the
proposal of Moffett et al.
The investigation commission used the SHEL model, which provides a conceptual framework for understanding the interfaces between different `subsystems' in operation. SHEL focuses on the four basic subsystems: software, hardware, "environment" and "liveware" (people). No definitive story was determined as to how the extraordinary rate-of-descent was actually initiated. The commission analysed all of the possible alternative scenarios thoroughly and based their conclusions and recommendations on these alternatives.
This accident generated much discussion and controversy within the aviation community, focusing often on the design of the autopilot interface, specifically the mode change between HDG V/S (Heading and Vertical Speed mode) and TRK FPA (Track and Flight Path Angle) modes, which were set by a `toggle'-type switch.
We include the Report of the Commission of Inquiry (in French) in full.
Further testing showed that disintegration of an oil seal could physically block a valve essential for the functioning of the interlock, leading to a scenario in which the reverser could, in fact, reverse thrust in flight. It was not determined if such an event happened to the accident aircraft. Subsequent to the discovery of this potential interlock failure mode, the FAA issued in August 1991 an AD prohibiting use of thrust-reverse on late-model B767s. Similar mechanisms were also to be found on other aircraft, and after a solution to the problem was developed, Boeing retrofitted B737, B757 and B767 aircraft, 2,100 of them in all, with a third, mechanical, thrust-reverser interlock (which also required a hydraulic system modification on the B767).
There was a report by Bill Richards in the Seattle Post-Intelligencer of 14 December 1991 of the view of Darrell Smith, an ex-Boeing engineer, who had reported to Boeing that faults in the `proximity switch electronics unit' (PSEU) could have resulted in actual thrust-reverser deployment. Boeing passed on the report to the software writer, Eldec Corp (Boeing contracts out much of its software), but neither company had, at the time of reporting, studied Smith's argument in detail. I do not know the resolution of this issue. Thus, one may consider this accident to remain `computer-related' until one knows the resolution of Smith's reports. A synopsis as well as the final accident report from the Thai authorities has been prepared for the WWW by Hiroshi Sogame of All-Nippon Airways Safety Promotion Committee, to whom we are very grateful.
The official report on the crash determined
" [...] the probable cause of this accident to be uncommanded in-flight deployment of the left-engine thrust reverser, which resulted in loss of flightpath control. The specific cause of the thrust-reverser deployment has not been positively identified."(op.cit., Flight International, 1-7 September 1993, p5).
The report of The Times, 3 June 1991, was relayed to RISKS-11.78
and RISKS-11.82 by the articles
Lauda
Air Crash by Paul Leyland, and
Re: Lauda
Air Crash by Steve Philipson.
Hermann Kopetz reported to RISKS-11.82 what appeared in the Austrian
press in the article
Lauda
Air Boeing 767 Aircraft Crash.
Boeing's initial denials were reported in the Washington Post of
3 June, relayed to RISKS-11.82 in
Lauda
Air plane crash by Joe Morris.
The Wall Street Journal of
3 June 1991 reported that in order to obtain certification of the
B767, Boeing had had to demonstrate the effects of in-flight reversal
by flight test: also conveyed to RISKS-11.82 in
Re: Lauda
Air crash by Jeremy Grodberg.
Peter Neumann reported on some of the details of the FADEC design in
RISKS-11.84:
Lauda
767 crash by Peter G. Neumann.
The European, a weekly newspaper, carried an article by Mark Zeller
entitled Boeing skipped essential test on Lauda crash jet,
which clarified the situation over certification of the reverser
mechanism. According to the FAA administrator at the time, James Busey,
the interlock was demonstrated by attempted in-flight deployment, but
only at low airspeed and idle thrust. Boeing had argued to the
certification authority that `...sophisticated flight control
computers
made an accidental inflight deployment of the thrust reversers
impossible' (I think Zeller meant FADECs - the B767 has no
flight control computers in the strict sense). The report also
stated that examination of the wreckage and the CVR showed that
one reverser `...failed to lock in place...' and that the
pilots had been discussing what to do about the warning light when
the upset took place. The European's article was relayed to RISKS-11.95
and discussed by Peter Mellor:
Lauda
air crash by Peter Mellor.
An article in the Seattle Times of 23 August 1991, Flawed part
in 767 may be flying on other jets by Brian Acohido, reported
in detail the possible oil-seal disintegration problem and that
it didn't seem to be restricted to late-model B767 aircraft. This
commentary was relayed to RISKS-12.16 by Nancy Leveson:
More on the
Lauda air crash by Nancy Leveson.
Nancy also relayed Bill Richards' reporting of Darrell Smith's
concerns about the PSEU to RISKS-12.69 in
More on
Lauda crash and computers by Nancy Leveson.
Other comment on this accident may be found in RISKS-11.78, RISKS-11.79, RISKS-11.82, RISKS-11.84, RISKS-11.95.
Subsequently, there was some discussion whether measures taken by
other manufacturers in the wake of the Lauda Air crash to prevent
in-flight deployment of reversers had contributed to their lack
of deployment when required in the A320 Warsaw accident:
Lufthansa
in Warsaw by Peter B. Ladkin, and
Re: Lufthansa
Airbus Warsaw Crash 14 Sep 93 by Udo Voges.
Noted human-factors expert Erik Hollnagel cited some CVR material
from the crash whilst discussing the efficacy and design of alarms
in
Re: alarms
and alarm-silencing risks in medical equipment by Erik Hollnagel.
Should we consider the Shuttle to be a transport category airplane? (Civilians have travelled on it.) Whatever, the incident is instructive, as well as interesting history.
Here's wishing everyone safe flying.
Peter Ladkin
Copyright © 1999 Peter B. Ladkin, 1999-02-08
by Michael Blume