HW4: Reflections on software failures

Our readings this week began with a focus on several software engineering failures that resulted in devastating incidents such as plane crashes (Space Craft Accidents) and radiation overdoses (Therac-25 Accidents and 2010 Radiation Follies). My initial reaction to reading these articles was, frankly, fear of entering a career where I could unintentionally contribute to this kind of harm. However, examining these incidents in detail makes it easy to pick out several common issues that contributed to each. By studying these, we can devise practices that support security and resilience in our engineering.

First, these problems occur in part because identifying the vulnerabilities, potential faults, and other bugs that may lead to software failure is difficult. In particular, it’s nearly impossible to identify every vulnerability that exists within a system. Unfortunately, as mentioned in Chapter 13, Security Engineering, it’s often the people attacking a system, not the engineers, who experiment with it enough to find its most hidden vulnerabilities. They’ll try things outside of normal activity, such as entering an absurd number of characters into a field or toying with a vehicle’s wireless communication functions (FBI Auto Warning). To combat this, an engineer has to think like a hacker. Practically, that means incorporating several types of testing (experience-based, penetration, tool-based, and formal) to cover all the bases. The “Swiss Cheese Model of Failure” in Chapter 14, Resilience Engineering, sums this up perfectly. Imagine a system in layers, each with holes representing vulnerabilities; when the holes line up just right, an attacker can penetrate your defenses all the way to system failure. So, because every layer will inevitably have some holes, you should test to make sure they don’t line up in a perfect storm.
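To make the “think like a hacker” idea concrete, here is a minimal sketch of what probing a single input field might look like. Everything in it is my own illustration, not something from the readings: `validate_name`, `MAX_NAME_LENGTH`, and the list of hostile inputs are all hypothetical.

```python
MAX_NAME_LENGTH = 64  # assumed limit, chosen only for this illustration


def validate_name(raw: str) -> str:
    """Hypothetical field validator: reject empty or oversized input."""
    cleaned = raw.strip()
    if not cleaned:
        raise ValueError("name must not be empty")
    if len(cleaned) > MAX_NAME_LENGTH:
        raise ValueError("name is too long")
    return cleaned


def probe_with_hostile_input() -> list[str]:
    """Feed the validator inputs a normal user would never type.

    Returns the inputs that were accepted, so a test can check that
    only the legitimate boundary case got through.
    """
    hostile = [
        "",                           # nothing at all
        "   ",                        # whitespace only
        "A" * 10_000,                 # an absurd number of characters
        "A" * (MAX_NAME_LENGTH + 1),  # just past the limit
        "A" * MAX_NAME_LENGTH,        # exactly at the limit (should pass)
    ]
    accepted = []
    for candidate in hostile:
        try:
            accepted.append(validate_name(candidate))
        except ValueError:
            pass  # rejection is the behavior we want for hostile input
    return accepted
```

A handful of hand-picked inputs like this is only one slice of the cheese; it would sit alongside penetration testing and tool-based testing, not replace them.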

On the note of testing, another lesson found throughout these texts is that, as a software engineer, you should test with the real environment of your software in mind. The Therac-25 accidents occurred in large part because the software did not account for a specific “if-then” scenario. There was no “then” for “if” a user switches to electron mode while the machine is setting up for X-ray mode, which left the machine in an unknown state and led to overdose. Perhaps working with medical professionals who understand the high-stress environment where the machine is used could have surfaced the scenario during the testing phase. Similarly, in the 2010 Radiation Follies, GE’s failure to implement a (seemingly simple) mechanism that shut down operation at an unsafe dose devastated numerous lives.
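One way to picture a missing “then” is as a state machine with an unhandled transition. The sketch below is purely illustrative and is not how the Therac-25 software was written; the class, state, and method names are hypothetical. It shows one possible design choice: a mode change that arrives mid-setup throws away the partial configuration and restarts setup, so the machine is never left half-configured.

```python
from enum import Enum


class Mode(Enum):
    XRAY = "xray"
    ELECTRON = "electron"


class State(Enum):
    IDLE = "idle"
    SETTING_UP = "setting_up"
    READY = "ready"


class TreatmentController:
    """Hypothetical controller where every state has a defined response."""

    def __init__(self) -> None:
        self.state = State.IDLE
        self.mode = None

    def begin_setup(self, mode: Mode) -> None:
        if self.state is not State.IDLE:
            raise RuntimeError("setup can only begin from IDLE")
        self.mode = mode
        self.state = State.SETTING_UP

    def finish_setup(self) -> None:
        if self.state is not State.SETTING_UP:
            raise RuntimeError("no setup in progress")
        self.state = State.READY

    def request_mode_change(self, mode: Mode) -> None:
        # The explicit "then": whatever state we are in, discard any
        # partial configuration and start setup again from scratch.
        self.state = State.IDLE
        self.mode = None
        self.begin_setup(mode)

    def fire(self) -> Mode:
        # Refuse to operate unless setup fully completed for the current mode.
        if self.state is not State.READY:
            raise RuntimeError("machine is not in a known READY state")
        return self.mode
```

With this design, switching from X-ray to electron mode during setup leaves the controller in `SETTING_UP` for electron mode, and `fire()` refuses to run until `finish_setup()` has been called again.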

2010 Radiation Follies also serves as a perfect example of the “blame game” that ensues after accidents like these. GE, the creator of the software that overdosed hundreds of patients with radiation, laid blame on the medical professionals for failing to notice dosing levels on their treatment screens. The technologists, however, claimed that the GE trainers were at fault because they never fully explained the automatic radiation dosing feature during training. Ultimately, I agree with the textbook author’s assessment in Chapter 14, Resilience Engineering (pg. 405). The author supports the systems approach: good systems are responsible for including safeguards against possible human error, especially when the safety of other humans is at stake. GE should have included a requirement that would shut down the machine’s operation at a harmful level, regardless of the automatic feature’s conclusions or the operator’s attention to detail.
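A safeguard like that can be tiny. The following is a rough sketch under my own assumptions, not GE’s design: `MAX_SAFE_DOSE` is an arbitrary placeholder rather than a clinical value, and `deliver_dose` and `DoseLimitExceeded` are hypothetical names. The point is that the hard limit is checked independently of both the automatic calculation and the operator.

```python
MAX_SAFE_DOSE = 2.0  # hypothetical ceiling in arbitrary units, not a clinical value


class DoseLimitExceeded(Exception):
    """Raised when a requested dose is above the hard safety ceiling."""


def deliver_dose(requested_dose: float, operator_confirmed: bool) -> float:
    """Return the dose to deliver, or refuse to operate.

    The ceiling is enforced first, so neither an automatic dosing
    feature nor an inattentive operator can push a dose past it.
    """
    if requested_dose < 0:
        raise ValueError("dose cannot be negative")
    if requested_dose > MAX_SAFE_DOSE:
        # Operator confirmation is deliberately ignored here.
        raise DoseLimitExceeded(
            f"requested {requested_dose}, ceiling is {MAX_SAFE_DOSE}"
        )
    if not operator_confirmed:
        raise PermissionError("operator has not confirmed the dose")
    return requested_dose
```

Even if the operator confirms, a request above the ceiling is rejected, which is the systems approach in miniature: the safeguard does not depend on a human catching the error.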

The reasons listed in the aptly named article Why Software Projects Fail sum up most of the problems that led to the disasters we studied. In particular, they explain the FBI’s failures during the VCF and Sentinel projects, which were meant to transition the FBI’s paper files to a digital record storage system. Of course, this was a huge undertaking, as it combined records from all of the FBI’s distributed centers into one digital database. Just to scratch the surface, over the course of these two projects and the four FBI articles taken together, the organization wasted millions of dollars and many years due to quick turnover of (and too many) personnel, unclear requirements, and, most prominently, poor organization. The VCF project had 400 change requests within a year (2002-2003), and, to quote the article, “every time you write a line of code, you introduce bugs…and they had a bunch of people slinging code.” Best practice, though often impractical, is to keep your team as small as possible and your processes as centralized as possible.
