The most monosyllabic way to summarize the job of software product management is: “cut ship risk.” Software projects all start as golden towers of dreams built atop silver mountains of abstractions. But then come the useless use cases, bikeshedding, and yak-shaving, and even later—users. Things crash. Schedules slip. The CIO comes back from a conference with a new security audit and no one can use git for two weeks. These are the risks. Your job is to cut those risks.
There are obviously a lot of good books about shipping software and managing the inherent risks: everyone knows The Mythical Man-Month or Code Complete. But one great resource that doesn’t get enough press is The Risks Digest, an online news summary of software and technology risks that goes back to August 1985. Risks is a catalog of failures and problems with software and hardware—30+ straight years of screwups, malicious hacks, dumb mistakes, mistaken good intentions, and corruption. (I find myself drifting back to it every few weeks, because it’s such a universal document of the technology industry—it’s contrarian and pessimistic, because it cares.)
At first, Risks looks like just another news summary. For example, from the most recent issue:
Cisco says in just one week in February they detected 1,127,818 different IP addresses being used to launch 744,361,093 login attempts on 220,758,340 different email addresses—and that 93% of those attacks were directed at two financial institutions in a massive Account Takeover (ATO) campaign.
Okay, fine. But Risks takes the long view. Look back to 1996:
Treasury Secretary Robert Rubin has admitted to a congressional committee that his department doesn’t have an overall master plan or blueprint for the multibillion-dollar modernization effort intended to replace the Sixties-era mainframes now in operation at the Internal Revenue Service and to link IRS offices across the nation. Congressman Jim Lightfoot characterized the project as “a $4-billion fiasco that is floundering because of inadequate planning.” Secretary Rubin says the only plan that exists (and which he has not read) is a highly technical 6,000-page document that “is not what we need.” (*Los Angeles Times*, 29 Mar 1996, D1)
It goes on and on: thousands of things going wrong across the decades. The goal is not to point fingers but to accept that failures happen, understand them, and learn from them.
The newsletter was first established by the Association for Computing Machinery in 1985, to be moderated (it still is) by Peter G. Neumann. Risks announced itself with a clear mission (emphasis added):
Its intent is to address issues involving risks to the public in the use of computers. As such, it is necessarily concerned with whether/how critical requirements for human safety, reliability, fault tolerance, security, privacy, integrity, and guaranteed service (among others) can be met (in some cases all at the same time), and how the attempted fulfillment or ignorance of those requirements may imply risks to the public.
That’s a really good list, especially when you consider it was made in 1985. Think about it in the context of 2016.
- Human safety. Should a self-driving car smash into a bus full of orphans in order to not kill a squirrel? What about an endangered squirrel?
- Reliability. Check out this new, unusable, but type-safe programming language. This one will make it possible to [signal cuts out].
- Fault tolerance. Hold on rebooting my car so I can update my phone.
- Security. They hijacked my SIM card so they could bypass the two-factor authentication, but otherwise things are fine.
- Privacy. Hi we’re the robots from Target and we’d like to talk to your uterus.
- Integrity. This one doesn’t get used as much as it used to—apparently a system has “integrity” when you have “complete assurance” that all of its components are correct and reliable. I imagine people just gave up hope.
- Guaranteed service. Could you get on the Wi-Fi? No? Me neither.
The whole industry is still wrestling with each of these, every day. A huge number of discussions in technology touch on these subjects. We live in a world of Risks.
There are specific things in the back issues of Risks that are very much of their era. Some of its first year was dedicated to discussing the Space Shuttle Challenger disaster, and there was also much concern that bad programming could trigger nuclear incidents (that one should probably still be in the mix). Later there was concern over Y2K bugs, and so forth. But an awful lot of our risks are universal and, well—unchanging. Here’s the NSA fighting against encryption standards in 1986, or crappy password requirements, or stupid airplane problems. In the early days you can actually watch as software eats the world:
It seems more merchandise I purchase is shoddy, and I am beginning to wonder what some of the consequences of “making the metal a bit thinner to save..” could be. I realize we are using CAD and simulation tools to make things more efficient, perhaps the case of the over efficient engine which flamed out when it flew through rain [as reposted in SEN, I believe] might be a case in point. What were our margins of safety in the over-engineering we did in the past? Any studies yet?
This is not to say that things are the same. We used to have a few hundred programming languages, and now we have a few hundred programming languages that compile to JavaScript. But…not people. People are the same. That’s what I love about looking back through this mailing list: Patterns emerge that let you think big, holistic thoughts about the technology industry.
The issue that came out after September 11th, 2001, wasn’t a round-up of stories but rather a brief statement written by the moderator. It includes two paragraphs that are worth reading and re-reading.
The Risks Forum has persistently considered risks associated with our technologies and their uses, but we often note that many of the crises and other risk-related problems have resulted from low-tech events, misguided human behavior, or malicious misbehavior. In short, the typical search for high-tech solutions to problems stemming from social, economic, and geopolitical causes has frequently ignored more basic issues. Over-endowing high-tech solutions is riskful in the absence of adequate understanding of the limitations of the technology and the frailties and perversities of human nature. Whereas there are high-tech solutions that might be effective if properly used, we should also be examining some low-tech and no-tech approaches.
One pervasive theme in the Risks Forum over the past 16 years has been the ubiquity of systemic vulnerabilities relating to security, reliability, availability, and overall survivability, with respect to human enterprises, society at large, and to systems, applications, and enterprises based on information technologies. Evidently, we still have much to learn.
We’re still living in the shadow of those events, and we’re still learning to manage huge, serious risks. And—we still have much to learn.
Paul Ford is a co-founder at Postlight.
Story published on Jun 27, 2016.