University of Bielefeld - Faculty of Technology
Networks and Distributed Systems
Research group of Prof. Peter B. Ladkin, Ph.D.
This page was copied from: http://catless.ncl.ac.uk/Risks/12.16.html
ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator
[David's comments follow my synopsising of a San Fran Chron article. PGN]

Pacific Bell's Message Center answering service broke down for 21 hours around the San Francisco Bay Area, affecting thousands of customers. Two hardware cards converting voice to digital failed at the same time, just before noon on Thursday. This was the longest outage since service began last November. (There had been a four-hour outage last December.) No messages could be recorded, and no recorded messages could be retrieved. However, no recorded messages were lost. Previous problems had been in software, attributed to the "newness of the system". Some grumbling was quoted about how "They're finding all these bugs at the expense of the customers."

[Source: San Francisco Chronicle, Saturday, August 24, 1991, page A10, headline "Pac Bell Message Center Breaks Down; Electronic answering service out of whack for 21 hours", by Dan Levy, Chronicle Staff Writer]

My comments:

1. Pacific Bell has been touting its residential voice mail as a more reliable replacement for the answering machine. They stopped the promotion for a time after word got out that their system was losing about ten percent (!) of all messages.

2. Pacific Bell's current promotion points out that answering machines are an old technology, but voicemail is new. Apparently, the company expects us to believe that new == more reliable.

3. There are times when centralizing a function makes it more reliable. This doesn't appear to be one of them. When the voicemail system went down, customers could not even rush to a store to buy their own answering machine as a workaround, it would appear. And what voicemail customer would even know about the failure? Unlike an answering machine, which blinks a light rapidly when it detects a fault, residential voicemail does nothing, and since the service is pitched as being "more reliable," why would you suspect it?

David Schachter    uucp: ...!{decwrl,mips,sgi}!llustig!david
In Austin TX, Malcolm Graham received a water bill for $22,000, for using almost 10 million gallons of water in one month. The meter reading for the month was slightly LESS than that for the previous month, which the computer interpreted as wrap-around. (A new meter had been installed between readings, and not set properly.) A manual review of unusually large bills failed to spot that one. A utility company spokesman said ``We have about 275,000 accounts each month. We just missed this one. If we only miss one a month, that's a pretty good percentage.'' <Source: CITY SOAKS A CITIZEN FOR $22,000 BILL, Article by Scott W. Wright [who'S got Right on it!], 1991 Cox News Service, 22 August 1991>
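The wrap-around interpretation described above is easy to picture in code. The following is a minimal sketch, assuming a seven-digit meter register and hypothetical names; it is not the Austin utility's actual billing logic:

# Hypothetical sketch of meter-rollover billing; the register size and
# names are assumptions, not the utility's actual code.
METER_MAX = 10_000_000   # a seven-digit register rolls over at 10 million

def gallons_used(previous_reading, current_reading):
    """Compute consumption, treating a lower reading as register wrap-around."""
    if current_reading >= previous_reading:
        return current_reading - previous_reading
    # A replacement meter installed (and mis-set) between readings takes this
    # branch and is billed as if the old register had rolled all the way over.
    return (METER_MAX - previous_reading) + current_reading

# Example: the old meter last read 9,950,000; the new meter reads 9,940,000.
# The routine bills (10,000,000 - 9,950,000) + 9,940,000 = 9,990,000 gallons.
print(gallons_used(9_950_000, 9_940_000))

A range check as simple as "flag any account whose computed usage exceeds, say, fifty times its historical average" would catch such a bill automatically instead of relying on a manual review.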
In a lengthy `Los Angeles Times' article focusing on AIDS infection from doctor to patient (JUDGING THE RISKS OF INFECTION, 26 August 1991, page A1), writer Janny Scott begins by highlighting recent research findings about risk perception:

  * Unusual and unknown risks are more terrifying than familiar ones, even though everyday risks claim more lives.
  * Risks undertaken voluntarily seem more tolerable and controllable than lesser risks imposed from outside.
  * Many people have difficulty understanding probability.
  * Familiar accidents may go barely noticed, while unfamiliar ones may provoke panic, particularly if they seem to set a precedent.
  * Experts and lay people value risks differently: experts count lives lost, while the general public focuses on many other factors, including fairness and controllability.
  * Once people make a decision about the size of a risk, their minds are difficult to change.
  * "One thing people care a lot about is dread." (Peter Sandman, Rutgers)
I've interviewed the NASA experiment manager since then, and she described NASA's statements as overwrought. The shuttle was not mail bombed; applelink was, and this event was misportrayed to Josh Quittner. The flight director was upset because they didn't want anyone to know they were using applelink; but the atlantis account on applelink was created explicitly to facilitate the interest expressed by the network community. I suspect that the confusion stems from the confluence of divergent interests at work. [... which can also result in multimile piecemeal spacemail. PGN]
A short item from the August 22 `Los Angeles Times':

  PINKERTON WORKER PLEADS GUILTY TO COMPUTER FRAUD

Pinkerton Security & Investigation Services, the 141-year-old detective agency whose slogan is "the eye that never sleeps," was caught napping by an employee who embezzled more than $1 million from the firm. Marita Juse, 48, of Burbank[, California], pleaded guilty to computer fraud this week in U.S. District Court in Los Angeles. Between January, 1988, and December, 1990, Juse wired $1.1 million of Pinkerton funds to her own account and accounts of two fictitious companies. Pinkerton discovered the theft through a routine audit conducted after Juse left the firm Jan. 7, said Sally Phillips, assistant general counsel for Pinkerton. Juse also pleaded guilty to unrelated 1986 charges of conspiracy, theft of government property and false claims in connection with a scheme to submit false tax returns claiming refunds. She faces a maximum sentence of 30 years and millions of dollars in fines.
To add injury to insult, the 2/3 of a million Ohio telephone subscribers who had their records searched by P&G (or the prosecutors P&G suborned) will have to PAY for the computer time and other costs of the search in their "regulated" bills. I think that some of the subscribers ought to petition the Ohio PUC to disallow the charges, on the theory that the telephone company failed to carry out its duty to attempt to minimize regulated costs when it (the telco) did not try to have the subpoena quashed.
Organization: Helsinki University of Technology, Finland

The subject of this story is the unresponsiveness of CERT and vendors to security holes, and the risk that this creates when someone assumes that the holes will get fixed once they are reported to CERT.

The topic of how Unix system vendors, Unix system users and administrators, and organizations like CERT should react to security holes found on widespread Unix systems always seems to cause some controversy and lots of discussion. If you publish a security vulnerability widely, people will complain that you are giving information on how to break into systems to possible `crackers'. If you don't publish it, it perhaps will never get fixed.

I'd like to report one story concerning one particular security vulnerability which allows any ordinary user to gain unauthorized superuser privileges - perhaps it can be useful to people studying the problems of what to do in case of a surfacing security hole, and how to do it. This article probably has only a small fraction of the facts concerning this and related vulnerabilities, but it probably isn't very different from many stories of similar security holes. This article doesn't contain technical information - I'll post the details in a subsequent article in alt.security, comp.sys.sun & alt.sys.sun.

May 1989

I send a bug report to Sun about a SunOS vulnerability concerning the SunRPC service rpc.rwalld, the world-writability of /etc/utmp on SunOS, and the fact that tftpd is enabled and able to read and write the root filesystem on SunOS. This bug report concerns SunOS 4.0.1 and previous versions. The hole allows anyone to get in from the Internet as the superuser in a few seconds on an off-the-box Sun. As one suggested fix I recommend write-protecting /etc/utmp. I don't notify CERT - I think at the time I'm not aware of CERT. The hole is fixed in a subsequent OS release - I'm not sure, but I think a separate fix is also published later.

June 1989

I tell about the hole on the Sun-Spots mailing list (gatewayed as the Usenet newsgroup comp.sys.sun), with some details blanked out, and give suggested fixes. A fix for the hole is published by Sun - I don't have records on when this happened.

September 1989

In a security-related bug report also reporting a few other holes, I send the following to Sun and CERT (the Computer Emergency Response Team, an organization established by the Defense Advanced Research Projects Agency, DARPA, to address computer security concerns of research users of the Internet):

>5. /etc/utmp is world-writable. This was one of the original causes
>of the rwall / wall / tftp hole, and probably takes part in other not
>yet surfaced security holes.
>
>FIX : chmod og-w /etc/utmp

October 1989

I send a somewhat `details-blanked' version of the above-mentioned bug report to the Sun-Spots combined mailing list and newsgroup, including the note about utmp.

May 1990

A security hole in the program `comsat', which is used to report the arrival of new mail to users (enabled by the `biff' program), is discovered. The vulnerability gives unauthorized users root access. The hole is reported to Sun through JPL's Sun software representative. It is also reported to the Internet Computer Emergency Response Team (CERT) and the DDN Security Coordination Center (SCC). CERT and Sun publish no notice about the hole, and no fix is published. In the NASA internal notice the suggested fix is to just disable comsat.
March 1991

I independently find the hole in `comsat' and report it to Sun and CERT. They don't say it has been reported before, and seem somewhat unresponsive about it. At the same time, I publish a rough outline of the hole on the net, and I am told about the previous bug reports. Meanwhile, Sun talks about a non-disclosure agreement that I should sign so that I could get information on a product which will fix the hole. No notice to the net is made by Sun or CERT. No fix is made available.

April 1991

As nothing seems to happen, I get a bit frustrated and send more mail to Sun and CERT:

>If you can't come up with at least some kind of a solution to the
>problem, perhaps someone on the Usenet can. I'll post the detailed
>bug report & perhaps some additional suggestions of fixes to the
>Usenet newsgroup alt.security a month from now if a decent fix isn't
>available then.

There is some answer by email from CERT, some talk about what to do. No answer from Sun. No notice to the net is made. No fix is made available.

August 1991

Still nothing has happened - no notice about the vulnerability has been announced on the net. Someone brings up /etc/hosts.equiv containing '+' on comp.unix.admin. I remember the promise I made and write this article.

Conclusions

From CERT's press release of 12/13/88, the following paragraph quoted verbatim:

>It will also serve as a focal point for the research community for
>identification and repair of security vulnerabilities, informal
>assessment of existing systems in the research community, improvement
>to emergency response capability, and user security awareness. An
>important element of this function is the development of a network of
>key points of contact, including technical experts, site managers,
>government action officers, industry contacts, executive-level decision
>makers and investigative agencies, where appropriate.

In the light of this story (and some other experience with CERT) I don't think CERT is doing a good job on `identification and repair of security vulnerabilities'. It is a good thing to have a central point to contact when trouble arises or when you have a security hole to report, and apparently CERT is doing a good job in acting as this central point and distributing bug reports to the vendors. But I think that is not enough. We need something more to get the holes fixed - as with this bug, it seems that when the vendor does nothing to fix things, CERT also sits idle, promptly forwards the bug report to /dev/null, and does nothing.

Solutions?

I suggest we make it a policy that anyone who sends a security hole report to CERT and/or a vendor will send it to the Usenet some time (perhaps six months? a year?) after the acknowledgement from CERT or the vendor. Any more suggestions to solve the problem?
IEEE SPECTRUM, August 1991, page 58, Section "Faults & failures":

  TCAS sees ghosts

A system that warns pilots of impending midair collisions is finally, after 30 years in development, being installed in the U.S. airline fleet. The system, called TCAS for traffic alert and collision avoidance system, sends a stream of interrogation signals to the same equipment aboard nearby aircraft and from their responses determines the planes' altitude, distance, and approach rate. Plans call for all 4000 large aircraft in the United States to carry US$150,000 TCASs by the end of 1993.

But the phase-in suffered a short-lived -- and embarrassing -- setback on May 2, when the Federal Aviation Administration (FAA) ordered a shutdown of 200 of the 700 units that had been installed. The 200 systems were seeing phantom aircraft and instructing pilots to evade planes that simply were not there. The cause was quickly identified as a software glitch. More precisely, it was a software gap -- five lines of code missing from the faulty units.

Not subject to the problem were TCASs manufactured by the Bendix/King Division of Allied Signal Inc., Baltimore, Md., and Honeywell Inc., Phoenix, Ariz. These were allowed to continue in service. However, TCASs made by Collins Defense Communications Division of Rockwell International Corp., Dallas, were recalled so that the software could be fixed. The fix was simple: the units were reloaded with the correct program.

The problem arose in the course of testing, because Collins engineers had temporarily disabled the program's range correlation function -- a few brief lines that compare a transponder's current response with previous ones and discard any intended for other aircraft. Without this filter, the system can misinterpret a response as coming from a fast-approaching airplane. After testing the systems, Collins shipped them to airline customers without re-enabling the range correlation.

For the most part, the systems worked as intended. But in high-traffic areas where many airplanes are interrogating each other -- around Chicago, Dallas, and Los Angeles, particularly -- ghosts appeared frequently. Pilots were misled, and air traffic controllers were distracted from their routine tasks by the need to handle nonexistent situations. "A pilot would see the ghost image shoot across the screen because the on-board system was accepting all the replies as other TCAS airplanes in the vicinity interrogated the same TCAS transponder," Thomas Williamson, TCAS program manager with the FAA in Washington, D.C., told IEEE SPECTRUM.

TCAS II, the system currently being installed, tells pilots to climb, dive, or maintain the same altitude to avoid a collision. It also displays nearby planes on a small screen. The system was first demonstrated in the early 1970s, but making it work reliably was difficult because of interference by overlapping signals from multiple aircraft in crowded areas. The interference was eliminated by using directional antennas and variable-strength interrogation signals and by developing range-correlation software to eliminate multiple responses.

In the range correlation scheme, the system notes the distance at which it first receives a response from another aircraft -- say 10 miles. At the next interrogation, the distance may be 9.5 miles. The system would then expect the next response to be at approximately 9 miles, and would set a range gate so that it could look for a signal at that distance and calculate the closure rate. Without this correlation, the system becomes confused.
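The range-correlation scheme just described amounts to predicting the next reply's range from the observed closure rate and discarding replies that fall outside a gate around that prediction. A minimal sketch of the idea, with an assumed gate width and hypothetical names (this is not the certified TCAS II code):

# Illustrative sketch of a range-correlation gate as described in the article.
# Gate width, update rule, and names are assumptions, not the certified logic.
class RangeTrack:
    GATE_WIDTH_NMI = 0.5   # assumed tolerance around the predicted range

    def __init__(self, first_range_nmi):
        self.last_range = first_range_nmi
        self.closure_per_cycle = 0.0

    def accept(self, reply_range_nmi):
        """Keep a reply only if it falls near the range predicted for this track."""
        predicted = self.last_range - self.closure_per_cycle
        if abs(reply_range_nmi - predicted) > self.GATE_WIDTH_NMI:
            return False   # likely a reply meant for another interrogator: discard
        self.closure_per_cycle = self.last_range - reply_range_nmi
        self.last_range = reply_range_nmi
        return True

# First reply at 10 nmi, next at 9.5 nmi: the gate now expects roughly 9 nmi,
# so a stray reply at 4 nmi (a "ghost") is rejected instead of looking like a
# fast-approaching airplane.
track = RangeTrack(10.0)
print(track.accept(9.5))   # True: establishes a 0.5 nmi-per-cycle closure rate
print(track.accept(4.0))   # False: far outside the gate, discarded
print(track.accept(9.0))   # True: matches the predicted range

With the filter disabled, as in the shipped Collins units, replies intended for other interrogators are accepted as new, possibly fast-closing targets, which is the ghost behaviour described above.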
The FAA emphasized that the software fault did not pose a hazard. TCAS is a backup system; primary responsibility for avoiding midair collisions still remains with the ground-based air traffic control systems. Moreover, the FAA pointed out that TCAS has proved its worth in more than 1 million hours of operation. "Had the problem involved TCAS software on a generic basis, then we would really be concerned," Williamson said. "But it was a breakdown in the quality control procedures of a specific manufacturer."

For its part, Collins has promised customers that it will correct all 200 systems within 90 days after discovery of the problem. "We'll be fully operational across the board well within that time frame," said Charles Wahag, Collins' manager of TCAS products. Wahag defends Collins' quality control procedures, which were approved by a team of FAA software experts. "We had a simple human error where an engineer misclassified the changes in the software," he told SPECTRUM. "It didn't show up in our testing because one of the essential elements was absent: you have to have many, many TCAS-equipped airplanes in the sky," as in the high-traffic-density areas where the ghost problem appeared.

To prevent similar omissions, Collins now requires that a committee of software engineers review changes before a program is released. "More than one pair of eyes must review these things and make a decision," Wahag said.

COORDINATOR: George F. Watson
CONSULTANT: Robert Thomas, Rome Laboratory
[The following item was abridged by Nancy Leveson, and further by me. Also, today's paper indicates the FAA has backed off on some of its restrictions. PGN]

From the Seattle Times, Friday, August 23, 1991 (excerpts)

  Flawed part in 767 may be flying on other jets
  by Brian Acohido, Times Aerospace Reporter

More than 1,400 Boeing 747, 757, and 737 jetliners may be flying with the same type of flawed thrust-reverser system as the ill-fated Lauda Air 767 that crashed in Thailand last spring. A thrust reverser inexplicably deployed on that May 26 flight, possibly flipping the plane into an uncontrollable crash dive. All 223 passengers and crew members were killed.

Officials at Boeing and the Federal Aviation Administration say only that the matter is `under review' and that they are conferring about possible safety implications for Boeing models other than 767s. The use of thrust reversers on late-model 767s was banned last week by the FAA. Also last week, Boeing alerted airlines worldwide that it may, at some point, recommend that the reversers of these other models be inspected.

Industry sources say it appears a dangerously flawed safety device that is an integral part of the reversers in question may be the same one that is in widespread use on other Boeing models as well. The device is called an electronically actuated auto-restow mechanism. The flaw was discovered last week, and was considered potentially hazardous enough to prompt the FAA to order reversers deactivated on 168 late-model 767s. The ban is in effect until Boeing redesigns the device.

[... lots of stuff deleted about the use of it on other planes, etc.]

`In my estimation, the suggestion is very, very strong that there is the distinct possibility there could be further danger with these other aircraft,' said aviation safety analyst Hal Sproggis, a retired 747 pilot.

[... more stuff deleted about arguments between the NTSB and the FAA about what should be done.]

On Boeing jets, reversers work like this: A door on the engine cowling slides open, simultaneously extending panels called `blocker doors,' which deflect thrust up and out through the cowling opening. In flight, the cowling door is designed to remain closed, with the blocker doors retracted, stowed, and locked. Depending on the engine type, the reverser system is powered either pneumatically, using pressurized air, or, like the Lauda jet, hydraulically, using pressurized oil.

The flawed auto-restow device is designed to detect the system becoming unlocked in flight and to move quickly to restow and relock the system before any significant control problem can occur. According to industry sources, the NTSB, and the FAA, here's how the complex device works: An electronic sensor monitors the cowling and alerts a computer if the cowling door moves slightly in flight. The computer then automatically opens an `isolation valve,' which permits pressurized oil or air to flow into the reverser system. This actuates a very crucial, and -- as was revealed last week by the FAA -- dangerously flawed part called a `directional control valve,' or DCV. The DCV directs the pressurized oil or air to retract the blocker doors and shut the cowling door.

The DCV can sit in only two positions: extend or retract. In flight, it is supposed to always remain in the retract position, ready to do its part in auto restow. In older Boeing aircraft, a mechanical part physically prevented the directional control valve from moving off the retract position as long as the plane was airborne.
But in newer Boeing jets, the auto-restow mechanism is controlled and kept in the retract position by electronic means. `The reason they go for these electronic reversers is strictly economic,' safety expert Sproggis said. `It saves weight, and, in commercial aviation, weight is money.'

When Boeing certified its electronically controlled reverser system, the company assured the FAA that it was fail-safe. As a result, the FAA never required the company to calculate or test what might happen should a reverser deploy in flight at a high altitude and high speed, as happened on the Lauda flight.

After the Lauda crash, Boeing tested the system anew. An engineer wondered what would happen if a simple O-ring seal on the DCV deteriorated, with small bits getting into the hydraulic lines. A test was run. The result: the DCV clogged in such a way that when the auto restow was activated, the DCV moved off the retract to the extend position. Thus, the computer thought it was instructing the DCV to restow when, in fact, it was deploying the reverser.

`I think they (Boeing officials) expected bits of the O-ring to run right through the system and were shocked when they saw the reverser deploy,' said a source close to the Lauda investigation. After learning of the results of the O-ring test, the FAA, which to that point had rejected repeated exhortations from NTSB Chairman James Kolstad to ban reverser use on 767s, did just that.

Another revelation likely was a factor in the decision to ban reversers on 767s, sources said. After the Lauda crash, the FAA ordered reversers inspected on 55 767s powered by Pratt & Whitney PW4000 engines -- the same airframe/engine combination as the Lauda plane. (Later, Boeing revealed that a total of 168 767s actually use the same electronically controlled reverser system.) As 767 inspection reports came in, a disturbing pattern of chafed wires and out-of-adjustment auto-restow sensors emerged. In fact, nine out of every 10 planes checked had sensors out of adjustment, the FAA reported.

Moreover, a Seattle Times review of five years of `service-difficulty reports,' or SDRs, filed by U.S. airlines with the FAA shows a similar pattern of reverser troubles for 747s, 737s, and 757s. Airlines are required to file SDRs with the FAA showing how various problems are dealt with. Problems with reversers on Boeing planes are cited in 118 reports from Jan. 1, 1985, through June 25, 1991, including 44 reports on 737 reversers, 25 on 747s, four on 757s, and three on 767s. SDRs have been widely criticized for being something less than comprehensive because of the wide leeway airlines are granted in deciding what to report. Even so, the reports ranged from cockpit warning lights flickering inexplicably and sensors repeatedly turning up out of adjustment, to numerous instances of stuck or leaking reverser parts. One case involved a 747 aborting a flight after a reverser deployed and broke up with a loud bang. The plane landed safely.

A pattern of out-of-adjustment sensors suggests that maintenance instructions provided by Boeing to the airlines are not clear, or perhaps that the part is badly designed and susceptible to readily moving out of adjustment, said industry sources. More significantly, it suggests that the auto-restow system may be activating unnecessarily -- or more slowly than it's supposed to -- due to a sensor that's out of adjustment, sources said.

[... more discussion deleted about the risk on other Boeing planes]

During the ill-fated Lauda flight, pilot Thomas Welsh, formerly of Seattle, discussed with his Austrian co-pilot, Josef Thurner, the flickering of a cockpit warning signal indicating a possible problem with one of the reversers. Everything was being handled routinely until a second warning signal indicated the left reverser had somehow deployed. Two seconds later, a loud snap is heard on the cockpit recorder, followed by swearing and the sound of warning tones. Thirty-nine seconds after the snap, the tape ends with the sound of a bang.

The left engine was recovered from the wreckage with the reverser deployed, evidence that the DCV was improperly positioned, perhaps because it was contaminated, sources say. Sources said the valve could have become contaminated by something other than a bad O-ring, and that investigators also are exploring the possibility that a stray electrical current, vibration, or some other phenomenon moved the DCV to the deploy position. A key piece of evidence that could provide the answer -- the left DCV -- was missing from the wreckage.

This incident brings up some important issues:

  -- The role of the computer in this particular accident (a sketch of the auto-restow sequence described above follows this item)
  -- The role and procedures of the FAA in regulating aircraft
  -- The trend to removing mechanical safety interlocks in order to save weight, and the way that such cost/benefit decisions are being made.

Note that there will be a session at SIGSOFT '91 (Software in Critical Systems) in December on government standards and regulation, and that Mike Dewalt of the FAA (his title is "National Resource Specialist -- Software") will be discussing certification and standards for commercial avionics software.

Nancy
Prof. Nancy G. Leveson, University of California (on sabbatical at Univ. of Washington)
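To make the failure mode above concrete: the auto-restow computer commands the DCV but, as described, has no independent confirmation of which position the valve actually reached, so a contaminated valve turns a restow command into a deployment. A minimal open-loop sketch under those assumptions (all names are illustrative; this is not Boeing's control logic):

# Hypothetical sketch of the open-loop auto-restow sequence described above.
# Names and structure are illustrative assumptions, not the actual 767 software.
class DirectionalControlValve:
    """Two-position valve; contamination can leave it in the wrong position."""
    def __init__(self, contaminated=False):
        self.position = "retract"
        self.contaminated = contaminated

    def command(self, position):
        # A clogged valve can end up in "extend" regardless of the command sent.
        self.position = "extend" if self.contaminated else position

def auto_restow(cowling_moved, dcv):
    """On a cowling-movement alert, pressurize the system and command a restow."""
    if not cowling_moved:
        return "locked"
    # The isolation valve opens and pressurized oil or air reaches the reverser;
    # the computer then commands the DCV, believing this restows the system ...
    dcv.command("retract")
    # ... but the reverser follows the valve's actual position, not the command.
    return "stowed" if dcv.position == "retract" else "deployed in flight"

print(auto_restow(True, DirectionalControlValve()))                    # stowed
print(auto_restow(True, DirectionalControlValve(contaminated=True)))   # deployed in flight

The sketch also suggests why the older mechanical interlock mattered: it bounded the valve's behaviour physically, independently of what the computer believed it had commanded.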