University of Bielefeld -  Faculty of technology
Networks and distributed Systems
Research group of Prof. Peter B. Ladkin, Ph.D.
This page was copied from: http://www.cs.york.ac.uk/~jdm/sclist/lelannariane.html

The Failure of the Satellite Launcher Ariane 4.5

by Gérard Le Lann
INRIA - Projet REFLECS - BP 105
78153 LE CHESNAY Cedex, France
Fax: +33 (0)1 39.63.58.92


This contribution is prompted by the fact that there is still no widespread agreement on the nature of the failure of Ariane 5 flight 501 (June 1996). It is also prompted by discussions I have had with Peter Ladkin, whom I thank for helping to improve the presentation of the arguments that follow.

An Inquiry Board (IB) was formed to identify the cause(s) of the 501 failure. The IB report concludes that the causes are software (S/W) design and S/W implementation errors; see [ESA, 1996] and [Le Lann, 1996], for example. (Of course, these analyses, as well as this contribution, assume that all causal factors appear in the IB report.) In fact, it is almost straightforward to show that the 501 failure has a unique cause, which is a system engineering (SE) fault.

This is so because this SE fault is the root of the causal graph that leads to the 501 failure. Stated differently, among the other causal factors (such as the BH overflow), none precedes this one. (I leave it to Peter Ladkin to give a more refined definition of "cause".)
Back to the facts. The alignment task was running after lift-off, even though realignment of the inertial platform after lift-off, needed with Ariane 4 (A4), is useless in the case of Ariane 5 (A5). This task contains the conversion procedure that computes integer BH from horizontal velocity. What if someone had had the idea of disallowing the execution of this task after lift-off? Simple. The scenario which led to the 501 failure could not have occurred.
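The effect of such an inhibition can be pictured with a few lines of Python. This is only a minimal sketch, not the flight software: the names `to_bh`, `alignment_task`, `lifted_off` and `inhibit_after_liftoff` are hypothetical, the velocity values are made up, and the 16-bit signed range for BH is assumed from the IB report's account of the conversion.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # assumed 16-bit signed range for BH


class OperandError(Exception):
    """Raised when a value does not fit the target integer type."""


def to_bh(horizontal_velocity: float) -> int:
    """Unprotected conversion: an out-of-range value raises, it is not saturated."""
    value = int(horizontal_velocity)
    if not INT16_MIN <= value <= INT16_MAX:
        raise OperandError(f"BH out of range: {value}")
    return value


def alignment_task(horizontal_velocity: float, lifted_off: bool,
                   inhibit_after_liftoff: bool):
    """Return BH, or None when the task is inhibited after lift-off."""
    if lifted_off and inhibit_after_liftoff:
        return None  # the conversion is never reached, so it cannot overflow
    return to_bh(horizontal_velocity)
```

With a (hypothetical) post-lift-off velocity of 40000.0, `alignment_task(40000.0, lifted_off=True, inhibit_after_liftoff=False)` raises `OperandError`, whereas the same call with `inhibit_after_liftoff=True` returns `None`. Whether the conversion itself is "correct" plays no role in the second case.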

Now the argument. How could this someone know that this was the right thing to do? Obviously, only by correctly capturing the problem to be solved by those engineers in charge of the A5 computer-based system, i.e. by correctly specifying the interface between this particular A5 subsystem and the A5 inertial platform subsystem.
Decomposition of a launcher into subsystems, and specification of appropriate interfaces (capture of requirements and assumptions) between these subsystems, are SE activities, which depend on which satellite launcher technologies are selected. Only the main architect of a launcher can conduct such SE activities correctly, because only the main architect is responsible for deciding how to decompose a launcher into subsystems, given the technological choices made.
Consequently, this someone can only be an Ariane 5 engineer. Indeed, only an engineer aware of the technology retained for the A5 program can tell: "Given A5 technology, there is no need to have the strap-down inertial platform aligned after lift-off".

That system engineering-dependent knowledge is totally independent of the fact that the alignment "thing" which, after lift-off, happens to be needed (A4), or not needed (A5), is implemented in hardware, in software, or in melloware, correctly or incorrectly. That knowledge is also totally independent of the fact that the "thing" is a reused "thing" or a newly developed "thing". It is also totally independent of the fact that inhibition of the "thing" after lift-off is instantiated via, e.g., a boolean set to false, or a mechanical switch activated after lift-off.
Hence, the 501 failure does not result from "how" the "what" (was needed or not needed) was instantiated. The 501 failure was caused by an oversight concerning the "what", which is a requirement capture fault. And given that the knowledge at stake is system engineering-dependent, the cause is a SE fault.
It has never been the intent of ESA, of CNES, of Aérospatiale, or of Arianespace, to plan, commission, build and operate a launcher that is based on A5's technology and yet needs inertial platform alignment after lift-off, a fictitious launcher that could be labelled Ariane 4.5, half-way between A4 and A5. End of the argument.

Therefore, stricto sensu, all the work that has been invested in "inspecting the code" and ironing out the "S/W errors" from the alignment task, all the contributions - including ours [Le Lann, 1996], [Le Lann, 1997] - to the "Is the 501 failure due to software or system engineering mistakes?" debate, apply to this fictitious Ariane 4.5 launcher, that will never be operated, and whose unique flight is labelled 501, not to the Ariane 5 program.
The real qualification flights of A5 have been the (successful) flights 502 and 503, which were conducted with the alignment task inhibited after lift-off. Consequently, success with these flights cannot result from having "inspected the code and corrected the bugs" of the alignment task, since this task was not in use after lift-off.

It is certainly interesting to keep discussing the 501 failure, until, maybe, our community reaches a consensus on one of the three prevailing views, namely:

1) The 501 failure could have been avoided by "inspecting the S/W" (group G1),
2) No way! The failure has been caused by a requirement fault, which is further split in two diagnoses:
2.1) The failure could have been avoided by resorting to a "good" S/W Engineering method (group G2),
2.2) Maybe, with luck. The failure would have been avoided for sure (it's easier to say, now that we know what happened) by resorting to a "good" System Engineering method (group G3).

Still, we should not forget that these discussions make sense only in the context of the fictitious Ariane 4.5 launcher. Neither should we ignore that the issue of "correcting the bugs" of the alignment task lost all practical relevance as early as 1996.
As a member of G3, I am interested in continuing to interact with representatives of G2 (the most populated group, it seems, at this time), and in discussing at greater length why I believe it does not make sense to shift responsibilities from System Engineering to S/W Engineering or to H/W Engineering.
In the particular case of flight 501, I have argued in [Le Lann, 1996] and [Le Lann, 1997] that those errors which have been identified in the IB report are causal consequences of System Engineering faults. They are not causes of the 501 failure, but manifestations of more "profound" causes.
Yes, maybe, with luck, following some "good" S/W Engineering method (some "good" H/W Engineering method if H/W implementation had been resorted to), someone could have been led to ask such questions as:

Q1: "Under which conditions should this function be available, be inhibited?",
Q2: "What's the range of possible values for horizontal velocity?".

It's much less likely that the following questions would have been raised:

Q3: "What's the failure model assumed for processors?",
Q4: "Can the assumption that there is no common mode failure (of the SRI module) be violated?".

But why take chances, anyway? This knowledge (questions and responses) is natural and obvious to Ariane 5 engineers (Q1 and Q2), natural and obvious to (system-level) designers of the Ariane computer-based system (Q3 and Q4). With a "good" System Engineering method at hand, it would have been normal practice for these engineers to spontaneously "propagate that knowledge", via specifications handed over to S/W (to H/W) engineers, relieving them from the burden of "not forgetting to ask (the right questions?, all of them?)".
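One way to picture this "propagation of knowledge" is a specification record that carries the answers to Q1 and Q2 along with the function itself. The sketch below is purely illustrative: the `Phase` and `FunctionSpec` types, their fields, and the numeric bounds are assumptions made for the example, not anything taken from A4 or A5 documentation.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Phase(Enum):
    PRE_LAUNCH = auto()
    FLIGHT = auto()  # any time after lift-off


@dataclass(frozen=True)
class FunctionSpec:
    """Hypothetical specification record handed from SE to S/W engineers."""
    name: str
    enabled_phases: frozenset  # answer to Q1: when the function may run
    input_min: float           # answer to Q2: lower bound of the input
    input_max: float           # answer to Q2: upper bound of the input

    def violations(self, phase: Phase, value: float) -> list:
        """List every way a proposed use of the function breaks the spec."""
        found = []
        if phase not in self.enabled_phases:
            found.append(f"{self.name} must be inhibited during {phase.name}")
        if not self.input_min <= value <= self.input_max:
            found.append(
                f"{self.name} input {value} outside "
                f"[{self.input_min}, {self.input_max}]"
            )
        return found


# Illustrative values only; the bounds are not taken from any A4 or A5 document.
ALIGNMENT = FunctionSpec("alignment", frozenset({Phase.PRE_LAUNCH}),
                         -32768.0, 32767.0)
```

Here `ALIGNMENT.violations(Phase.FLIGHT, 40000.0)` reports both a phase violation and a range violation, without anyone on the S/W side having had to think of asking Q1 or Q2.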
It seems there is a temptation to consider that a S/W (or a H/W) Engineering method is "good" not only if it guarantees correct implementations of specifications but, furthermore, if it also guarantees that the specifications under consideration are correct with respect to some higher-level problem. Why should a S/W (or a H/W) Engineering method compensate for lack of consideration for System Engineering issues? Where do these specifications meant to be S/W (or H/W) implemented come from? Is there not a boundary to the "universe" that is tractable with S/W (or H/W) concepts?
Besides this, concerning the Ariane 5 program, a really interesting question is as follows: Was the S/W used for flight 501 - with the exception of the alignment task - found to be "erroneous", and if so, have experts found fatal S/W errors, i.e., errors which, if not corrected, would have led to a failure of flight 502?
As of now, I have received only one response with any content (i.e., other than "it's secret", which might be understandable). I have been told by some experts - in group G1 - that they had found non-fatal S/W errors. This demonstrates that bug-free S/W is not a necessity, given that Ariane 4 has been operated for over 10 years very successfully, despite the existence of these S/W errors.


[ESA, 1996] European Space Agency, "Ariane 5 - Flight 501 Failure", Board of Inquiry Report, 19 July 1996, 18 p. [http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html].

[Ladkin, 1998] P. Ladkin, "The Ariane 5 Accident: A Programming Problem?", Article RVS-J-98-02, Bielefeld University, Germany, March 1998 [http://www.rvs.uni-bielefeld.de/publications/] (or see "Computer-Related Incidents with Commercial Aircraft").

[Le Lann, 1996] G. Le Lann, "The Ariane 5 Flight 501 Failure - A Case Study in System Engineering for Computing Systems", INRIA Research Report 3079, Dec. 1996, 26 p [http://www.inria.fr/RRRT/publications-fra.html].

[Le Lann, 1997] G. Le Lann, "An Analysis of the Ariane 5 Flight 501 Failure - A System Engineering Perspective", 10th IEEE Intl. ECBS Conference, March 1997, 339-346.

[RISKS] The RISKS Forum [http://catless.ncl.ac.uk/Risks].

[SCS] Safety Critical Systems Mailing List [ftp.cs.york.ac.uk, directory hise_reports/sc.list].



Last modification on 1999-06-15
by Michael Blume