University of Bielefeld - Faculty of Technology
Networks and Distributed Systems
Research group of Prof. Peter B. Ladkin, Ph.D.
This page was copied from: http://catless.ncl.ac.uk/Risks/16.96.html
ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator
An article, "Dismissal of Security Expert Adds Fuel to Internet Debate", by John Markoff in The New York Times, 22 Mar 1995, discusses last Monday's departure of Dan Farmer from Silicon Graphics. RISKS-16.90 noted that Dan and Wietse Venema have developed SATAN, a Security Administrator Tool for Analyzing Networks, which Dan plans to release openly to the world on 5 Apr 1995. RISKS has frequently seen discussions of the pros and cons of making such tools available. If the knowledge of vulnerabilities is not promulgated, the system flaws and configuration weaknesses do not seem to get fixed, and that knowledge seems to permeate the malicious-hacker community anyway. If the knowledge is promulgated, then the likelihood of exploitations tends to increase -- although it certainly provides an added incentive to clean things up in a greater hurry. Dan is at the center of this renewed controversy. Markoff's article includes this statement: "In the Internet community, Farmer's case is seen as symbolizing the conflict between a time-honored ideal -- the free flow of information in cyberspace -- and the harsh new reality that corporations and government agencies must protect their computer systems against intruders and vandals armed with increasingly sophisticated break-in software."
There has been some discussion in recent issues of cellular phone monitoring in Pakistan. Nobody has mentioned that this technology is currently available off the shelf in the US from Harris Communications (Melbourne FL, address on request). The Harris "Triggerfish", from a photo in an ad, looks like a laptop computer with an extra box alongside. "Everything you need fits into a suitcase" in the words of the ad. It will allow collecting and analysis of dialed number statistics, identifying the telephone number when it is registered under another person's name or an alias, and developing usage patterns. "Wiretap applications..... provide audio minimization, on/off hook logging, multiple tape recorder outputs and necessary intercept documentation. A headphone jack with volume control and alarm speaker allows the monitoring agent to intercept each and every communication." (From Harris brochure) The brochure goes on for several pages but I think you get the idea. This is a wiretap in a suitcase. As it is listening to radio waves, there is no way of anyone, including the gov't, knowing when it is being used. The Triggerfish is sold only to law enforcement agencies and is *supposed* to be used only with a court order permitting a wiretap. The risk is, who is to know if and how it is being used? The brochure was reproduced in a newsletter that I used to get. I don't know how old it is but at least 1, possibly 2 years. I found an undated copy in my desk drawer when I was cleaning house the other day. So what's the fuss with Pakistan and Motorola? Perhaps somebody should give them Harris' phone number. 76655.2677@compuserve.com (John Henry)
We have debated and called to attention the troubles involved with non-standard interfaces, such as the power switch on the keyboard of the Mac and the like. Here's a new one I heard on the radio last night. According to a study released yesterday (sorry, I didn't write down the name of the study), the post-admittance treatment lag for heart attack patients in hospitals could be reduced from an average of 70 minutes to 30 minutes if the interfaces on the equipment used to treat these patients were standard. This was based on (so the announcer said, I have not seen the study) a hospital where every unit of equipment of type "x" (say, a defibrillator) was purchased with a standard interface. The treatment time went down because the personnel didn't waste time looking for the CHARGE button; instead, they knew where it was. This is a truly amazing claim. If it is true, I would think the business case for purchasing standard equipment in a hospital would be quite strong, even if it was built only on liability reduction. Never mind the fact that 2 patients can be treated in the same window of time that one could before, or that your ROI for each unit of equipment would be higher due to the higher use. On a related tangent, I was recently in the hospital. The nurse hooked me to an IV, but the flow rate was too low; she went to get a pump. She hooked me into the pump, but couldn't get it started. When she did get it started, the rate was too low and the alarms were going off. She couldn't figure out the interface. I turned the IV pump to face me, and read the large amount of fine print on the side relating to the pump's operation. It turned out she needed to input the size of needle she was using to adjust the flow rate; she was using a small butterfly needle instead of the default needle, which is larger.
I heard on the radio news this morning (KGO 810 AM, about 11 AM) that a planeload of Taiwanese passengers thought their plane had been hijacked when it landed on a small island instead of at the capital Taipei during a domestic flight. The destination had been changed, and all the passengers knew it, but no one told the pilot. I would speculate that the scheduling system held the correct information, so that the displays in the airport directed all the passengers which plane to get onto, but that there is no direct computer link to the airplane, and either no one realized that the pilot needed to be told "manually", or the person responsible flaked. Mike Crawford
I just got a grant announcement from ARPA, discarded it in disgust, and then read Cliff Stoll's newest book, ``Silicon Snake Oil''. That led me to dredge the announcement out of the wastebasket, because it's such a perfect example of how much snake oil and snake-oil salesmen have come to permeate our field! DARPA SOL BAA95-21 DUE 052495 POC states that "Proposed research should investigate innovative approaches and techniques that lead to or enable revolutionary advances in the state-of-the-art. Specifically excluded is research which primarily results in evolutionary improvement to the existing state of practice ..." It seems to me that this wording is a specific invitation to snake oil salesmen and that it explicitly and openly promises to discriminate against an honest proposal that makes a realistic projection of the likely outcome of the proposed research. In general, my experience leads me to believe that research projects that actually produce revolutionary results rarely anticipate them, and research projects that promise revolutions rarely deliver! I don't deny that projects funded on the promise of revolution have occasionally delivered interesting results, but I greatly detest a system that demands such empty promises as a condition of obtaining funding. In any case, ``Silicon Snake Oil'' is a book worth reading. Computer professionals will find it alternately infuriating, when it attacks our specialties, and rewarding, when it attacks all that hokum everyone else in the field has been pushing. Doug Jones jones@cs.uiowa.edu
Some time ago, RISKS ran an article about a long distance company offering an 800-operator service for discounted collect calls. A competitor soon realized that *many* folks were dialing 800-operatEr. The competitor soon chartered this number for their own, strikingly similar (some would say indistinguishable to the naked ear) service and raked in the stray dialers. National Public Radio ran a piece last week telling of another similar incident. A small mom-and-pop company called "The Wooden Boat Store" was getting thousands of misdialed calls per month for an MCI 800 service. The small business contacted MCI about the problem and when MCI didn't offer any help, the Wooden Boat Store approached AT&T. Within 1 week, AT&T had set up a "voice mail" message that said something like "Welcome to the Wooden Boat Store. Press 1 for AT&T Operator assistance, or press 4 for the Wooden Boat Store." Note that AT&T did not offer to connect to the intended, originally misdialed MCI service! Matt Weatherford Atlantis Diagnostics Int'l (206) 487-7826 mattw@atl.com
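The near-collision between the two spellings is easy to see once the letters are mapped to keypad digits. The sketch below is illustrative only: it assumes the usual letter layout on a telephone keypad and that only the first seven digits after the 800 prefix are significant, and the function names are made up for this example.

```python
# Letter groups on a telephone keypad (assumed standard layout).
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def vanity_to_digits(word, significant=7):
    """Translate a vanity word into the digits that are actually dialed.

    Only the first `significant` digits are kept (assumption: a
    seven-digit number follows the 800 prefix, so extra letters are ignored).
    """
    digits = "".join(LETTER_TO_DIGIT[ch] for ch in word.upper())
    return digits[:significant]


def differing_positions(a, b):
    """Return the 1-based positions at which two digit strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


intended = vanity_to_digits("OPERATOR")
misspelled = vanity_to_digits("OPERATER")
print(intended, misspelled, differing_positions(intended, misspelled))
```

Under these assumptions the two spellings map to different numbers that differ in a single digit, which is why a common misspelling lands on a distinct, assignable number rather than on the intended service.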
Les Hatton reports in RISKS-16.92 on an incident involving an Airbus A340 on a trip from Japan to London Heathrow. The aircraft is one of the A320/330/340 (also A319/321) family of Airbuses, some of whose primary flight control systems are computer-controlled (that is, the pilot's control-stick movements are input to a computer that guides the control systems). The A340 is a very long-haul aircraft, capable of flying the very longest routes without refueling (it holds the world record for length of route flown without refuelling, for normally-equipped civil transport aircraft). The incident is of greatest significance for RISKS readers because it is the first time that an accident report on an A320/330/340 series aircraft specifically cites software and hardware reliability as the main problem. The incident concerned a Virgin A340 at Heathrow on 19 September 1994 (cf. the incorrect info reported by Hatton). A short article by Christian Wolmar appeared in The Independent newspaper, one of Britain's major dailies, on Wednesday March 15, 1995. After talking to Christian, I obtained a copy of AAIB (Britain's Air Accidents Investigation Branch) Bulletin No. 3/95, the report on the incident. I'll spoil the tale for everyone by giving the punchlines first: the description of the problem areas identified during the incident, and the report's conclusion, the `Safety Recommendation 95-1'. [during quotes, my editorial comments and elisions are contained within square brackets such as these. PBL.] [begin quote] Problem Areas The AAIB identified and investigated the following problem areas: RTF [radio communication] phraseology; ATC [air traffic control] vectors and ILS [instrument landing system] performance; fuel quantity indications; double Flight Management Guidance System (FMGS) failure and aircraft type certification. [end quote] The radio phraseology can be ignored by RISKS readers.
ATC vector problems had to do with capturing the `glideslope', the radio beam angled up into the sky from the end of the runway, down which an aircraft flies in order to land, under instrument conditions. The aircraft at one point encountered a `false glideslope' at about 9\deg at 5 miles from touchdown and 4,800ft altitude caused by a `shallow sidelobe' of the ILS. Such problems are known (glideslope is assured for between 1.35\deg and 5.25\deg in the UK) and the airplane wouldn't have got there had it not been vectored there by ATC - but one hastens to add that normally this is not a problem. Just in this case....see below. All the other problems concern the on-board computers, and the AAIB has written to the JAA (Joint Aviation Authorities, which does for Europe what the FAA does for the USA) to determine if the JAA was "aware of some of the more significant shortcomings of the A340's fuel and flight management systems before the type certificate was granted". [begin quote] Safety Recommendation. It is recommended that the reliability of the Airbus A340 FMGS and the fuel management system should be reviewed to ensure that modified software and hardware required to achieve a significant improvement in reliability should be introduced as quickly as possible and the subsequent system performance closely monitored. [end quote] Here is why they made this recommendation: [begin quote] Autopilot and Flight Director heading performance The reason for the wrong response of the autopilot and one flight director to the left turn demand was a software error [...] [This error] was known to Airbus Industrie and corrective measures for this and several other software deficiencies were contained in [...] standard L-5 that has been issued and incorporated in most A340s on the UK register. Fuel Quantity indications In July 1994 Airbus Industrie issued an Operations Engineering Bulletin on the subject of fuel quantity indication.
[The bulletin gives detailed descriptions of anomalies, when they occur, and how to take them into account.] Action pending by Airbus Industrie to correct fuel quantity errors involves the installation of five additional fuel probes in each inner tank and software standard 6.0. Action to correct CG control errors [the center of gravity is adjusted in cruise by moving fuel around, to give efficient cruise performance] is contained in a software only upgrade to standard 6.1. [...] FMGS double failures After landing the aircraft's Central Maintenance System had logged a fault in No 2 FMGEC. This was removed and sent to France for data extraction and fault analysis. No fault was found within the hardware and a comparable software fault could not be reproduced on the test bench. Nevertheless, the BITE data dump showed that at 1435 hours the No 2 FMGEC had detected a CLASS 1 HARD failure within itself and a simultaneous fault within FMGEC No 1. The investigation was complicated by the involvement of several sub-contractors in the manufacture of the FMGEC and its database. [...] Airbus Industrie were aware of the double FMGS failure mode that had first emerged on the A320 series aircraft. On A320/330/340 aircraft, each FMGEC is linked to its own set of peripherals and inertial reference system. Both FMGECs achieve their own computations and exchange data through a cross talk bus. One FMGEC is declared as master and the other as slave; the master FMGEC is related to the engaged autopilot. Some data in the slave FMGEC is synchronised to the master but all data inserted on any MCDU is transferred to both FMGECs and to all peripherals. According to Airbus Industrie, there are several ways in which the exchange of data and/or a problem in one computer can affect the other computer. Often the computers reset themselves after a few seconds but occasionally a fault results in repetitive resets or attempts to resynchronise. 
The fifth reset relatches the computer, which will not recover without a power interrupt. Reset breakers for manual power interrupts are on the flight deck overhead panel. Dual resets occur when both FMGECs encounter failures at the same time. They generally occur after a pilot entry that involves use of the navigation database or to an event synchronised between both flight management systems. Latched double failures usually occur if pilots successively perform three inputs that cause a reset, or if an `impossible' computation of predictions occurs. Airbus Industrie have succeeded in radically reducing the frequency of double FMGS failures on the A320 series aircraft; they are also addressing the problem on the A330 and A340 series. [...] [end quote] I point out that so-called Byzantine failure modes and algorithms for avoiding them in distributed systems were first identified and studied in the 70s by my former SRI colleagues Lamport, Shostak and Pease under the auspices of the SIFT project, and since then by many, many others. As for other such topics in computer science, this was regarded as `theory' for a few years. I could hazard a guess that many aerospace engineers still have not heard about this area. The account above of the problems with the A320/330/340 master/slave FMGECs may give RISKS readers reason to inform themselves about the `theory' of how all this anomalous and possibly dangerous behavior can be avoided in the first place. Many papers in the January 1994 issue of the Proceedings of the IEEE speak about the current state of the art. Finally, the story. On the ground in Tokyo before departure, one FCMC indicated numerous faults. It is accepted procedure to depart with only one FCMC operative - they did so, and followed the appropriate procedures for calculating fuel with only one FCMC. 
Early in cruise, the map symbology on the commander's EFIS (Electronic Flight Instrument System) disappeared and his MCDU (Multifunction Control and Display Unit) ceased calculating. They slaved both off the copilot's DMC (Display Management Computer). The EFIS is the pretty screen in front of the pilot that tells him/her which way is up, which way is forward, and which way, as well as how fast (in three dimensions), and where he/she is. The MCDU displays flight plan info, and a bunch of other things. About an hour later, they found that the commander's EFIS had restored. Logical indications from the No.2 FCMC were also restored later by `resetting the computer'. Now come a few things which one should really think about hard. Getting close to home (Heathrow), the copilot tuned in the Lambourne VOR (a radio beacon) manually, to ensure that the EFIS displays were still accurate. They were cleared to fly direct to the beacon, but a few miles east, "the commander's EFIS map display symbology froze and lost all computed data [..]. His MCDU displayed the message `PLEASE WAIT' together with a page normally seen only when loading in data before flight. He was unable to obtain any other display. At [roughly the same time], the [copilot's] EFIS and MCDU exhibited identical behavior." Notice that not all flight control info was lost from the EFIS - they could still fly the airplane. They `dialed in' the ILS using a "back-up method", and while doing so received an ECAM warning of low fuel state and instructions to open crossfeeds (airplanes like this have many tanks and fuel is pumped around between them). The warning reoccurred and readings indicated they had some 2 tonnes (2000 kg) of fuel less than expected. They discussed traffic density with ATC, and eventually declared an emergency in order to get priority for landing. They had the autopilot automatically capture the ILS, which is when they hit the sidelobe.
`The glidepath bar moved rapidly down the ILS display before moving rapidly up once again; the autopilot's attempt to follow the glidepath resulting in unusually high pitch rates and so the autopilot was disconnected.' The commander informed the tower they were having problems with the glideslope and requested an SRA (Surveillance Radar Approach). In an SRA, the controller talks the airplane down localiser and glideslope continuously, by giving an uninterrupted stream of position information relative to the glideslope/localiser pair. It's very impressive. Aircraft are on `final approach' when they're lined up with the runway centerline and heading down the glideslope to land. (This should not be confused with when the flight attendant says `final approach' to the passengers, which is usually when the aircraft is even before *initial* approach phase.) The approach was for Runway 09 Right at LHR (the `09' means that it's pointing roughly 90\deg to North). The crew were on the SRA, being vectored (given magnetic headings to fly) to intercept the final approach course. They were flying a heading of 180\deg and were commanded to change to 130\deg. When they turned the heading selector knob on the autopilot, both commander's and copilot's heading `bugs' moved correctly (that's an indication on the directional gyroscope of which heading you want the autopilot to fly to and hold - I had a lower-tech version on my Piper Archer), but the flight director bars went in opposite directions and the airplane followed the false movement of the copilot's bar, and turned right instead of left. At this stage, the copilot disconnected autopilot and flight directors and flew the plane `manually' (see first paragraph for why this is not quite an accurate expression for these aircraft). They landed without further incident; after taxiing in and shutting down, the fuel indications recovered; and thankfully everyone lived happily ever after. Peter Ladkin
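The approach geometry in this account can be checked with a little arithmetic. The sketch below is not taken from the AAIB bulletin. It assumes the "5 miles" are nautical miles, treats the beam as a straight line from the touchdown point over flat ground, ignores field elevation, and uses 3 degrees as an assumed typical glideslope angle. The function names are hypothetical.

```python
import math

# Assumption: distances in the account are nautical miles.
FEET_PER_NAUTICAL_MILE = 6076.12


def height_on_slope(angle_deg, distance_nm):
    """Height in feet above the touchdown point of a straight beam
    inclined at angle_deg, measured distance_nm from touchdown."""
    return math.tan(math.radians(angle_deg)) * distance_nm * FEET_PER_NAUTICAL_MILE


def shortest_turn(current_deg, target_deg):
    """Return (direction, degrees) for the shorter turn between two headings."""
    # Signed difference folded into the range [-180, 180).
    diff = (target_deg - current_deg + 180) % 360 - 180
    if diff == 0:
        return ("none", 0)
    return ("right", diff) if diff > 0 else ("left", -diff)


false_slope_ft = height_on_slope(9.0, 5.0)  # the reported false glideslope
nominal_ft = height_on_slope(3.0, 5.0)      # assumed typical 3-degree slope
print(round(false_slope_ft), round(nominal_ft))
print(shortest_turn(180, 130))
```

With these assumptions, a 9\deg beam at 5 miles sits close to the reported 4,800ft, roughly three times higher than a 3\deg slope would put the aircraft at the same distance. The heading change from 180\deg to 130\deg is a 50\deg left turn, which matches the "left turn demand" in the report and shows why a right turn was the wrong response.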
I'm not sure you need to invoke Byzantine failures to explain the problems reported with the double FMGS failures in the Airbus A340 and its relatives, though Byzantine-fault-tolerant architectures are simpler and more regular than others -- and might therefore be less prone to bugs. A Byzantine failure is usually interpreted as a hardware fault that causes the errant device (e.g., a sensor) to provide conflicting information to the systems that interrogate it. These faults can be masked by suitable Byzantine-fault-tolerant algorithms (invented, as Peter correctly points out, by Pease, Shostak and Lamport during the SIFT project at SRI. Incidentally, you can retrieve a picture of SIFT, and of Pease, Shostak, and Lamport via WWW at URL http://www.csl.sri.com/ft-history.html ). However, Byzantine hardware faults don't seem to be the problem with the A340 FMGS--rather, it seems to have been a plain bug in the redundancy management. And from the description, it seems that the reason there are bugs is that the design of the system is not amenable to comprehensive analysis and thorough comprehension. The great contribution of Lamport et al. was the "state-machine" approach to fault-tolerant system design (tutorial reference at bottom). The advantage of this approach is that it provides a relatively simple architecture that can provably tolerate ANY KIND of fault, up to some number. In contrast, the type of architecture used in most aircraft systems is based on FMEA, where you explicitly try to anticipate and counter each specific kind of fault. This leads to complexity, and thence to bugs, and also to the possibility of overlooked fault modes (and, more likely, overlooked COMBINATIONS of faults). The disadvantage of the state-machine approach is that it requires a lot of redundancy (3n+1 channels to withstand n simultaneous faults). This is overcome, to some extent, by the "hybrid" fault-models introduced by the people at Allied Signal who developed the MAFT architecture.
(There's a paper by them in the issue of the IEEE proceedings that Peter mentions). MAFT is the only architecture for primary flight control developed by a manufacturer of these things that uses the state-machine approach. It was proposed for the 7J7 and 767X, but Allied dropped out of the bidding after Boeing cancelled these and then invited new proposals for the 777. Systems above the PFC (primary flight control/computer) level usually seem to use dual, or dual-dual redundancy rather than the quad-and-above found in PFCs. The state-machine approach may not be appropriate here, but I'd hope that ideas from modern fault-tolerant design, and from formal state-exploration and verification could add something.

As an aside, the mechanisms of fault tolerance, distributed coordination, concurrency management, etc. employed in aircraft systems owe little to those studied by academic researchers. For example:

  "Not far from there (CNRS-LAAS a research center concerned with fault-tolerance), Airbus Industries builds the Airbus A320s. These are the first commercial aircraft controlled solely by a fault-tolerant, diverse computing system. Strangely enough this development owes little to academia." (IEEE Micro, April 1989, p.6)

Of course, there is little reason to suppose that academic researchers know more about fault-tolerant architectures for avionics systems than those who actually develop them, but it does mean that the architectures and mechanisms used in aircraft systems cannot draw on the extensive analyses and (in some cases mechanically checked) proofs that have been published and subjected to peer review in computer science journals.

John

Introduction to the state-machine approach:

  @article{Schneider:state,
    AUTHOR  = {Fred B. Schneider},
    TITLE   = {Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial},
    JOURNAL = {ACM Computing Surveys},
    YEAR    = 1990,
    VOLUME  = 22,
    NUMBER  = 4,
    PAGES   = {299--319},
    MONTH   = Dec
  }

We've done lots of work applying formal methods to algorithms for state-machine replication under hybrid fault models. [You can get the redundancy down to about n > 3a+2s+m where a, s, and m are the numbers of arbitrary (Byzantine), symmetric (wrong but consistent), and manifest (obvious, or crash) faults to be tolerated simultaneously.]

Examples, if you're interested:
  http://www.csl.sri.com/podc94.html
  http://www.csl.sri.com/tse93.html
  http://www.csl.sri.com/compass94.html
  http://www.csl.sri.com/cav93-hybrid.html
  http://www.csl.sri.com/ftcs93.html
  http://www.csl.sri.com/ftrtft92-jmr.html
  http://www.csl.sri.com/ftrtft92-ns.html
overview:
  http://www.csl.sri.com/tse95.html

See also NASA's overall program and their own work in this area:
  http://shemesh.larc.nasa.gov/fm-top.html

John Rushby                   Email: Rushby@csl.sri.com
Computer Science Laboratory   Tel: (415) 859-5456 (hit #0 to escape voice-mail)
SRI International             Fax: (415) 859-2844
333 Ravenswood Avenue         WWW: http://www.csl.sri.com/rushby/rushby.html
Menlo Park, CA 94025, USA     ftp: ftp.csl.sri.com/pub/{reports|pvs}
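The two redundancy figures quoted in this post, 3n+1 channels for n arbitrary faults and roughly n > 3a+2s+m under a hybrid fault model, can be compared with a short calculation. The sketch below only evaluates those two quoted expressions. It is not an implementation of any fault-tolerant algorithm, the function names are hypothetical, and the post itself gives the hybrid bound as approximate.

```python
def channels_classical(faults):
    """Channels needed to withstand `faults` simultaneous arbitrary
    (Byzantine) faults, using the 3n+1 figure quoted in the post."""
    if faults < 0:
        raise ValueError("fault count must be non-negative")
    return 3 * faults + 1


def channels_hybrid(arbitrary, symmetric, manifest):
    """Smallest integer n satisfying n > 3a + 2s + m, the hybrid-model
    bound quoted in the post (stated there as approximate)."""
    if min(arbitrary, symmetric, manifest) < 0:
        raise ValueError("fault counts must be non-negative")
    return 3 * arbitrary + 2 * symmetric + manifest + 1


# Compare: treating every fault as arbitrary versus classifying it.
for a, s, m in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]:
    worst_case = channels_classical(a + s + m)
    hybrid = channels_hybrid(a, s, m)
    print(f"a={a} s={s} m={m}: classical={worst_case} hybrid={hybrid}")
```

When every fault is counted as arbitrary the two expressions agree. The saving appears only when some faults can be classified as symmetric or manifest, which is the sense in which the hybrid model reduces the redundancy the state-machine approach would otherwise need.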
Regarding FBOI having an account on Netcom... While I can understand a startup company "renting cheap office space", one would think that a new financial institution wouldn't keep its valuables in the electronic equivalent of a public square. There are many, many ways to break Unix security - at the very least, any Netcom sysadmin could read and alter FBOI's files, and very likely there are many hackers who know about holes in Netcom's system. If the PGP decryption is to take place on Netcom's computers, rather than a private CPU within FBOI's "offices", then the cleartext passphrase will exist in memory where it can be accessed by anyone who can acquire root privileges. In addition, the encrypted secret key would have to be stored online, so it can be copied by an intruder, and one can use a program (trace? I don't recall) that displays all the system calls that a program makes in order to monitor the keystrokes that FBOI's operators make while entering the passphrase to decrypt the secret key. I consider it the height of irresponsibility to operate a financial service on a public access Unix system. Netcom might consider whether they would be liable in the event of a security breach. If you could make $100,000 by hacking an OS known to be as impregnable as Swiss cheese, and you didn't have the ethical sense to stop you, wouldn't you put some effort into making off with the goods? Mike Crawford
by Michael Blume