The shape of the Year 2000 problem around the world is becoming clearer, as many companies finish building their inventories of affected systems and processes, and are able to assess the time and resources they will need if they are to reduce their risks to the minimum. For two years, I led Year 2000 services for one of the world’s largest global management consultancies, seeing projects in most industries and in many of the world’s leading economies. This is a snapshot of what I have learnt.
Several problems come together in the next three years.
For hundreds of years, people have abbreviated dates by omitting the century, causing ambiguity and confusion for historians and archivists. In the 1950s and 1960s, as computers were used more and more for business data processing, it was inevitable that this convention would be carried forward. Storage space and processor cycles were scarce and expensive, and the cost of any potential ambiguity seemed insignificant. Few programs had to handle date ranges that spanned two centuries, and those that did (such as pension administration) were either written to cope, or they soon encountered problems and were corrected.
As we reach the end of this century, most programs will need to manipulate date intervals that cross the century boundary. When the year is represented by only two digits, files that are sorted by date will place "00" records at the front rather than after "99". Calculations that subtract an earlier date from a later one will get a negative result and fail. Comparisons of dates in different centuries will give the wrong answer, so that a credit card that expires in 01 seems 98 years out of date in 99, whereas one that expires in 99 may seem valid in 01 (and for a long time afterwards). Similar problems arise with the shelf lives of perishable foods and medicines.
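A minimal sketch, in Python, of the faulty comparison described above and of the "windowing" repair that many remediation projects adopted (the pivot year of 50 is purely illustrative and not taken from any particular system):

    # Faulty comparison: two-digit years compared directly.
    def card_expired(expiry_yy, today_yy):
        return expiry_yy < today_yy

    print(card_expired(1, 99))   # True  - a card valid until 2001 is rejected in 1999
    print(card_expired(99, 1))   # False - a card that expired in 1999 is accepted in 2001

    # One common repair, "windowing": interpret 00-49 as 20xx and 50-99 as 19xx.
    def to_four_digit_year(yy, pivot=50):
        return 2000 + yy if yy < pivot else 1900 + yy

    print(to_four_digit_year(1) > to_four_digit_year(99))   # True - the ordering is restored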
Throughout the 50-year history of computing, whenever there was the possibility of a serious problem, programmers have found many creative ways to make the problem worse. The "two-digit year" problem is no exception: year values of 99 and 00 have been used with special meanings or to mark invalid fields. Programmers designing user-friendly systems have assumed that if the year field is typed as 00 up to 09 then what was meant was 90 to 99, because the 9 key is next to the 0 key and these are common typing errors. Some programmers, knowing that century years need different leap year processing, have then made mistakes in the calculation and lost February 29th 2000 (1).
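As note 1 explains, the full Gregorian rule has a two-level exception for century years; a short illustration of the correct rule and of the variant that loses February 29th 2000:

    def is_leap(year):
        # Correct rule: divisible by 4, except century years, which must be divisible by 400.
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

    def is_leap_missing_400_rule(year):
        # Common mistake: the century exception is applied, but the 400-year rule is forgotten.
        return year % 4 == 0 and year % 100 != 0

    print(is_leap(2000), is_leap_missing_400_rule(2000))   # True False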
There is also a separate, coincidental problem with the real-time clock in PCs, which may reset to 1900, 1984, 1980 or some other date instead of ticking over into 2000 successfully at the end of the century. This will not usually cause a problem, as the BIOS in most recent PCs will detect the error and correct it. At worst, someone may have to reset the clock once manually. However, older PCs and those with a faulty BIOS may need the correct date set every time they are powered up, and if the PC is being used to control some process directly, with the time taken straight from the real-time clock and not through BIOS calls, any clock failure may have more serious consequences. PCs performing critical applications will need checking and may have to be replaced. (2)
Three quarters of all mainframe applications have year 2000 faults.
Finding the errors, making corrections, recompiling, re-linking, testing,
integrating and further testing will cost a great deal of money: somewhere
between 25p and £1 per line of software, depending on how professional
the IS department is. Unfortunately, most large companies cannot reliably
rebuild all their mainframe applications from program sources, even without
the added problem of needing to change 10% of the source lines. The latest
changes do not appear in the master source libraries, or parts of the system
have not been recompiled for so long that they need an obsolete version
of the compiler. (3)
The stories are depressingly common, and the lack of basic software engineering
disciplines will probably double the final cost of the Year 2000 problems.
Even so, an average of $1 US per line of software source may not seem
an enormous cost, but many companies have tens of millions of lines of
mainframe software source, and some have billions. An unforeseen expense
of over $1 billion, with no business benefit, may not fatally wound a Fortune
500 company - but it is certainly painful and represents a volume of work
that is unlikely to be funded, staffed, and completed successfully before
systems start to fail.
So perhaps it is surprising that mainframe applications are not
the biggest part of the Year 2000 problem.
Mainframe applications are usually managed by teams of programmers who
know their systems well and who are able to change them and rebuild them
competently. This may not be true for departmental systems (e.g. stock
control), desktop systems (e.g. spreadsheets, laboratory systems), factory
and warehouse automation, EDI, or communications systems. These present
greater difficulties, because they may have been acquired or developed
informally, the original vendor or developer may have disappeared, and
the system may not be well understood by anyone. (4)
Year 2000 problems also exist in security and access systems (5),
in air conditioning management and building control, in vital control systems,
such as those driving industrial gas valves or monitoring temperature in
power stations, and in engine management systems, alarms, and consumer
products (6).
The list of potential areas
of risk is almost endless. It is already far too late to find and correct
all the faults in these "embedded" systems, but some will be critical to
safety, the environment or the business, and must be given priority for
diagnosis, correction or replacement.
Even if the business has very modern systems, thoroughly checked and
warranted free of Year 2000 problems, there could still be trouble. Customers
may be unable to pay invoices for lengthy periods. Suppliers may fail,
perhaps several of them at once. Business partners may have to switch from
electronic data exchange to paper. Essential utilities may be interrupted.
Year 2000 is not just a mainframe problem, or even just an IT problem.
Myth 3: Year 2000 is not yet urgent
It is unfortunate that the 21st Century Date Problem was
not called the 1999 problem, or even the 1998 problem, since that is when
many systems will first fail. Too many companies are still saying "we know
that we have a Year 2000 problem, and next year we will put something in
our budgets to sort it out".
For most companies, systems will start to fail in 1998 or 1999 if they
are not failing already. The critical time, for every application, is the
first moment that it encounters dates in the 21st Century. From
that point forward, errors could occur at any time. They may cause application
failures, they may cause wrong results that are obvious, or the failures
may be much subtler. Wrong data may be calculated and stored or passed
to other systems. Records may be sorted into the wrong sequence and processed
twice or ignored (7).
It makes sense to talk about the failure horizon for each application
or item of plant or equipment. Some of these dates will be much closer
than you expect; some may even have passed.
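As a rough sketch of the idea (the 18-month look-ahead below is an invented example, not a figure from any survey): take the first 21st-century date an application must handle and subtract how far ahead it looks.

    from datetime import date, timedelta

    def failure_horizon(first_next_century_date, lookahead_days):
        # The first calendar date on which the application starts to meet 21st-century dates.
        return first_next_century_date - timedelta(days=lookahead_days)

    # A billing system that schedules payments 18 months in advance:
    print(failure_horizon(date(2000, 1, 1), 548))   # 1998-07-02, well before the century ends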
Myth 4: Year 2000 is an issue for the IT Department
Year 2000 affects the whole business, the deadline is immovable, and
resources are limited in every company. Inevitably, important business
investments will have to be delayed or abandoned if the year 2000 project
is to be given the resources it needs. In most companies, only the executive
committee or the Board can take such decisions. Auditors are already commenting
on year 2000 readiness in their reports to audit committees. Soon they
may have to start qualifying companies’ accounts. There may be issues affecting
legal regulation, Health and Safety legislation, and litigation risk. Insurance
cover for Year 2000 damage is limited and, in some cases, has even been
withdrawn completely, leaving companies and individual Directors exposed
to the possibility of crippling damages. Few IT Directors have the breadth
of knowledge and executive authority to make the necessary decisions on
behalf of the Board. Year 2000 is not an issue that can safely be left
to the IT department.
Myth 5: Year 2000 is the only date-related problem
Year 2000 is a very significant member of a family of date-related problems.
The Global Positioning System (GPS) overflows an internal clock field in
August 1999. Countries that use local calendars have similar problems on
other dates - for example, some Japanese systems used a calendar based
on the years of the emperor’s rule. The hardware clocks in most (perhaps
all) processors and the date fields in most operating systems overflow
at some time - one such problem occurred last Autumn. Then there is the
Year 10,000 problem - but that can wait for a later issue of Computer Journal.
Myth 6: There will be a magic technical solution
The problems that have been created by incorrect date programming are
very different from each other and embedded in almost every form of electronics
technology. The corrections that have to be made, and whether they can
be made at all, differ for each application. There will be no magic solution (8).
There are tools that can be very cost-effective
in helping with parts of the problem: preparing an inventory of software
on a particular hardware platform; scanning code for suspected date processing;
managing test data or controlling versions of source code. These tools
can save more than half the effort that would otherwise be spent in some
phases of the Year 2000 programme, and the cost estimates given earlier
assume that tools will be used. Nevertheless, most risks can only be identified,
prioritised and managed by people who understand the business and its processes.
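A minimal sketch of the kind of scanning such tools automate, assuming a directory of source files and a purely illustrative list of suspect keywords; real products used much richer, language-aware analysis:

    import re
    import sys
    from pathlib import Path

    # Crude heuristic: flag lines that mention date-like names for manual review.
    SUSPECT = re.compile(r"date|year|yymmdd|ddmmyy|mmddyy|expir", re.IGNORECASE)

    def scan(root):
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            for number, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
                if SUSPECT.search(line):
                    print(f"{path}:{number}: {line.strip()}")

    if __name__ == "__main__":
        scan(sys.argv[1] if len(sys.argv) > 1 else ".")

Such a scan over-reports heavily, which is one reason why the risks still have to be prioritised by people who understand the business.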
Myth 7: The problem is under control
It is very difficult to get accurate information about the scale and
nature of Year 2000 problems nation-wide or world-wide. At the end of 1997,
it seemed that most companies had not finished their inventory of Year
2000 risks, so they had insufficient information to be sure what the problem
would cost them, or whether they would get all the necessary work completed
in time. Not surprisingly, most companies initially underestimate the scope
and cost of the work needed, so budgets and timescales are constantly revised
upwards. Those surveys that have been published have all depended on data
from questionnaires filled in by companies themselves, without independent
audit. The surveys are inevitably based on incomplete information and optimistic
estimates.
Organisations are not good at delivering complex projects on time and
within budget. Estimates vary, but it seems that more than 75% of projects
are late or over budget and that many of the remaining 25% deliver less
than was originally intended. Year 2000 has fixed deadlines and scope;
it seems inevitable that a lot of the desirable work will not get finished,
that testing and other quality management activities will be skimped, and
that unplanned failures will occur.
Internationally, the level of awareness and action differs greatly from
country to country. My impression from my own international experience,
which is supported by the leaders of Year 2000 services in other major
consultancies, is that the USA and other English-speaking countries are
generally ahead of the rest of the world, but that even these countries
still have a large part of their economic activity at risk. In continental
Europe, preparations for European Monetary Union have taken priority over
Year 2000 work. In Asia, awareness of the issue is at an early stage, although
the problems exist in the same form as they do elsewhere. Central Europe
and Russia seem to have major problems, as do South American countries.
The evidence is weak, in that it is anecdotal, but it is quite consistent.
The problems are far from being under control.
Myth 8: There is nothing the individual can do
Computing professionals created this problem, and we have a responsibility to do what we can to ensure that the risks are well understood, that priority
is given to the most important areas (9),
and that nothing like this is ever allowed to happen again.
The individual can ensure that their employer is made aware of all the
Year 2000 risks and the actions that should be taken. It is particularly
important that companies that play an important role in the national infrastructure
keep functioning; if there is a risk that they may fail, their customers
need to know the worst feasible outcome so that they can be prepared. A
free flow of credible, auditable information is essential (10).
In another, very different way, individuals’ actions will play a major
part in what happens in 1999 and 2000. If there is long-term disruption
at the end of the century it could cause serious damage to the health,
wealth and security of individuals and families, so if there is a belief
that such damage is a real possibility, people will sensibly try to protect
themselves. Consumers will stockpile food, water, and fuel. Investors will
want to hold real assets of assured value. This is rational behaviour and
feeds on itself. As shelves empty and shortages occur, more people will
buy when they can, for fear that they may not have the opportunity later.
The same factors will influence professional forecasters. If a bank
is unsure that a company will survive undamaged, it will be more likely
to reduce the company’s overdraft limit than to agree to additional credit,
yet this will be at just the time when the company may need help to get
through a cash crisis if customers have system failures and cannot pay
their bills on the usual timetable. Each investment fund manager will have
to decide whether they want their fund to be invested in a potentially
falling market, or if they would rather sell and wait for buying opportunities.
If they decide to sell, they must act while other managers still want to
buy, so markets may become increasingly unstable.
It would be satisfying to be able to end with reassurance; a plan of
action perhaps, and some confident predictions. Unfortunately, over the
two years since I first became involved in Year 2000 issues, my fears have
grown alongside my understanding, and the reports in the press have become
steadily worse. The only solution that I can see is that we continue to
prioritise and to address a million critical problems with diligence and
urgency.
To quote Franklin Roosevelt in 1942: "Never before have we had so little
time to do so much."
© Martyn Thomas 1998. This article appeared in the
Computer Journal 41(2), 1998.
Martyn Thomas is Chairman Emeritus of Praxis Critical Systems Ltd.
He can be contacted at:
mct@hollylaw.demon.co.uk
(1) A year is a leap year if it divides by 4 with no remainder, unless
it is a century year, which must divide by 400 with no remainder. If a
programmer ignores the century rule, 1900 and 2100 will be incorrectly
identified as leap years (which they are not), but 2000 will be treated
correctly.
(2) It is important to prepare carefully before testing a PC by changing
the system date to see what happens, as there may be unexpected consequences.
A company that was using a PC-based email client tested its PCs in this
way last year, only to discover that the licence for the email software
expired. On resetting the date to 1997, the email software still would
not work because it maintained a secret record that it had already expired,
as security against a dishonest customer resetting the PC clock to an earlier
year so that they could continue to use the software after the expiration
of the licence. Even reinstalling the software did not clear the problem;
it was necessary to reformat the disks and reinstall Windows.
Some level of check can be made with one of the utilities that have
been designed to test the PC clock and BIOS. It is important to understand
what these utilities are actually testing, as the various utilities report
different results on many PCs.
(3) Ideally, a company will keep a register of all its systems, and
changes to these systems will be rigorously controlled, to ensure that
there is full and up-to-date design documentation, that all changes are
approved, documented, and made by modifying the design and the program
source, recompiling, rebuilding and thoroughly testing, and that the test
suite is kept up to date. These change control processes should also ensure
that all the files, compiler versions and other tools needed for successful
rebuild of every system are kept in a working state.
Unfortunately, some IS managers have decided that these disciplines
are not necessary or cost-effective (perhaps in response to time pressures
and lack of resources). Their companies are now paying the price for this
lack of professionalism.
(4) In one factory, the critical testing tools were calibrated daily
using a program that had been written some years before by an engineer
who had now left. The calibration program was essential to the manufacturing
process and the binaries existed on the C: drive of several PCs, but no
source code or documentation could be found. When the program was tested
with a Year 2000 system date it crashed.
(5) Security systems may use cards containing an expiration date that
only has a two-digit year. The first sign of trouble may be when new cards,
expiring in the next century, are rejected by the system. There is a more
insidious fault: when 2000 comes, all the expired cards may suddenly become
valid again.
(6) I have deliberately chosen examples of applications that have been
shown to contain serious faults.
For example, a UK power generator reported at an industry Year 2000
meeting that they had carried out some Year 2000 testing on a UK power
station during routine maintenance. One test involved setting system clocks
to the end of the century and watching what happened when they ticked through
midnight on December 31st 1999. Shortly after the start of the
new century the power station was shut down by a temperature monitor in
a cooling stack. The temperature monitor had been programmed to respond
to average temperature readings over a 20-second period, which it calculated
by taking readings every second and time-stamping them. When it attempted
to average readings that spanned two centuries the calculation went wrong
and the monitor tripped, shutting down the station.
As a single incident this would be embarrassing, but probably not disastrous.
If every power station using the same monitor, in the same time zone, were to trip simultaneously, the consequences could be more severe.
More complete lists may be found on the Internet, for example at http://www.compinfo.co.uk/y2k/examples.htm
and http://www.iee.org.uk/2000risk.
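The report did not say exactly how the monitor's arithmetic failed; the sketch below is a hypothetical reconstruction of the general mechanism, in which time stamps carrying only a two-digit year make an interval of one second look like roughly minus a hundred years, so nothing falls inside the 20-second averaging window:

    SECONDS_PER_YEAR = 365 * 24 * 3600

    def naive_interval(yy_start, second_of_year_start, yy_end, second_of_year_end):
        # Assumes, wrongly, that both time stamps fall in the same century.
        return (yy_end - yy_start) * SECONDS_PER_YEAR + (second_of_year_end - second_of_year_start)

    # From 23:59:59 on 31 December "99" to 00:00:00 on 1 January "00":
    print(naive_interval(99, SECONDS_PER_YEAR - 1, 0, 0))   # a huge negative number, not 1 second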
(7) There have already been a number of illustrative failures. A consignment
of tinned meat that arrived at Marks and Spencer in 1996 had a bar-coded
expiry date in 2000, represented as "00". The stock control system treated
this as 1900, making the meat 96 years past its sell-by date!
A labelling system for pharmaceutical products failed when it was first
required to generate Year 2000 expiry dates, bringing the production line
to a standstill.
A packing system was programmed to dispatch the products with the shortest
shelf lives first. It packed all the products that expired in "00" before
those expiring in "99".
An advisory system monitoring nuclear waste storage recommended the
release of radioactive waste several years too early, because the calculated
number of half-lives led to a release date in 2002 ("02"), which was then
treated as 1902.
(8) Every few weeks, someone publicises a "new idea" to solve the Year
2000 problem. These ideas have ranged from scanners that automatically modify software, to resetting all calendars back 28 years
at some agreed instant before 2000. It should be clear that no automatic
scanner could do enough of the task, in general. The suggestion that the
calendar should be set back a multiple of 28 years (to preserve the correct
day of the week) is ingenious but, on consideration, seems as difficult
to implement safely and consistently as other solutions, and creates other
difficulties. Nevertheless, this may be a viable tactic for isolated equipment,
such as some domestic video recorders.
(9) Any Government that understands the seriousness of the Year 2000
crisis will want to ensure that it does not make the problem worse by passing
legislation that requires big changes to computer systems in the next two
or three years. At the time of writing, in March 1998, no Government has
announced such a moratorium.
(10) The three-day week during the UK miners' strike in the early 1970s
showed how effective planning can be. Despite the absence of electricity
on two week days, industrial output actually rose.
The worst outcome of Year 2000 would be major, unexpected, long-lasting failures of power distribution, water, healthcare, communications, security, transport or emergency services. Short-term failures are sufficiently likely,
even under normal circumstances, that most organisations have adequate
contingency plans. Hospitals and air traffic control centres have stand-by
power generators, but they only store enough diesel for a few days, and
building new storage tanks will take time. In the extreme, if food distribution
and emergency services were to break down for an extended period, there
would be risk of serious public disorder.
There is not yet enough information available to rule out such a possibility.