I took on a project not long ago that will sound familiar. It’s a huge, complex system that’s been running for a long time, and has had many, many sets of hands working on it over the years. There were some clear examples of last-minute hacks and tech debt that had never been paid off, but also evidence of good architectural decisions made early on that we could fall back on. Preliminary code review pointed out a number of cut corners and out-of-date practices, and we had a long list of issues to fix. I was not only a user of the system, but also responsible for bug fixes and for coordinating an upcoming swap of one heavy-traffic module for another, redirecting users to staging until the work was complete.
That sprint is starting in a few weeks, and we’ll be using the upstairs bathroom until it’s finished.
This new project is my house, which my wife and I bought in November. During the workday at Postlight, I help wrangle interesting software systems, and the more I work on this 1929 brick vernacular-bungalow-cottage-mishmash, the more I notice the similarities in dealing with the two. Here’s an example: the doorbell didn’t work.
Assign ticket to me
I started, as we often do, by checking the UI. The button at the door was an 80s-beige wireless thing with a broken battery door, hanging on by a piece of two-sided tape. I swapped in a new wee battery, but still no ring. So, one level deeper: the receiver/chime mounted in the dining room. I opened that to find not only dead batteries, but long-corroded batteries. I guess this had been a just-knock-real-loud house for a long time? Time to swap out the whole system, I assumed.
When I yanked off the broken push button, though, a treat: two cloth-wrapped wire ends hanging out in a hole in the frame. It was like finding hooks for the old API. Was there still something for them to talk to?
What a doorbell is
A doorbell’s a grade-school-level piece of electrical equipment with three parts: a transformer, a push button, and a chime. The transformer is wired to your breaker box and steps the 120V house power down to about 15V that’s safe to touch. The transformer’s then wired to a chime and a push button, and one wire connects the push button and the chime, making a big open circuit. When a visitor pushes the button, the circuit’s closed and the chime uses that 15V to wiggle two plungers back and forth, bumping them into two little xylophone bars (Ding dong!). If you have the optional back door button hooked up, pushing that one hits just one bar (Ding!). Unless you’re talking wireless and don’t mind a digital simulacrum ding dong, or an IoT camera gadget, the technology hasn’t changed in a long time.
Unlike the grep-test-repeat process of tracking down unused delegations in an app, tracking doorbell wires is pretty direct. In the basement, under the front door, I found a pair of small-gauge wires and followed them to the original transformer, tucked behind a joist and still wired to the panel. The other branch went up through the floor into… the kitchen? We’d just scoured the whole kitchen, and hadn’t seen any detached wires hanging out, but I climbed up on a ladder and sure enough, lying on top of a cabinet, without room for the cover to fit, was a Nutone door chime.
The chime was, in fact, still wired, but wired wrong. Instead of a wire running to the front door and another to the transformer, making that open loop, the chime was wired to the front door and… the back door. No loop, just a useless Y. I rewired it and touched the front door wires together, and: Ding-clunk! Bug: tracked! Well, one of the plungers was broken, but that’s just an implementation detail. File a separate ticket for that.
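If you'd rather read that bug as code, here's a minimal sketch in Python. It treats every terminal as a node and every wire as an edge, and says the chime rings only when current has a path from one side of the transformer to the other that passes through the chime. The terminal names (`xfmr_a`, `chime_b`, and so on), the `rings` and `connected` helpers, and the exact shape of the miswiring are my own assumptions for illustration; the real wiring may have differed in the details.

```python
from collections import defaultdict

def connected(edges, start, goal):
    """Depth-first search: is there any path from start to goal?"""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in graph[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return False

# The chime conducts between its own two terminals.
CHIME = ("chime_a", "chime_b")

def rings(wires, pressed_buttons):
    """True if the transformer's two terminals are joined through the chime."""
    # A pressed button joins its two terminals; an unpressed one is a gap.
    closed = [(f"{name}_a", f"{name}_b") for name in pressed_buttons]
    without_chime = list(wires) + closed
    with_chime = without_chime + [CHIME]
    # The loop must exist with the chime in it and must not bypass the chime.
    return (connected(with_chime, "xfmr_a", "xfmr_b")
            and not connected(without_chime, "xfmr_a", "xfmr_b"))

# Intended wiring: transformer -> chime -> front button -> transformer.
CORRECT = [("xfmr_a", "chime_a"), ("chime_b", "front_a"), ("front_b", "xfmr_b")]

# Assumed miswiring: the chime sits between the two buttons, so nothing
# ever comes back to the transformer's second terminal.
MISWIRED = [("xfmr_a", "front_a"), ("front_b", "chime_a"), ("chime_b", "back_a")]

print(rings(CORRECT, []))                    # False: button not pressed
print(rings(CORRECT, ["front"]))             # True: loop closed through the chime
print(rings(MISWIRED, ["front"]))            # False: no way back to the transformer
print(rings(MISWIRED, ["front", "back"]))    # False: still a useless Y
```

The second check in `rings` is there so that a wire shorting the transformer directly would not count as a working doorbell.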
Bug retrospective
So some previous homeowners had at some point noticed that the bell was broken and, rather than track down the problem, just bought an ugly wireless set and screwed it into the dining room wall? Time for some house forensics.
Think of a big software system you’ve worked on: when do small, confusing bugs creep in? During a big rewrite, or when you’re adding significant features. Well, the kitchen was remodeled in 1989, and one of the new features was new, taller cabinets. My guess is that the kitchen contractor took the chime off the wall, installed the new cabinets, and reinstalled it in the little space left on top of the cabinets, wired incorrectly.
The homeowner might not have noticed the missing doorbell for months, and when they did, it seemed like a mystery bug. How many times have you seen that? “Hey, how long has our favicon been missing?” “I think when we added the new shopping cart the slug generation library broke. That doesn’t make any sense.” Full regression tests are hard enough in software, but I can’t imagine trying to run through a “make sure everything in the house works” checklist in meatspace.
So, what, programming is architecture, or something?
No, this isn’t one of those Analogy Explainers. What’s interesting is that so many different domains share the same processes for working out kinks, replacing modules, and maintaining big weird complex systems. You could get similar stories from a restaurant manager, line cook, stage director, or factory foreman. Since software development’s still a newish discipline, it can seem like all its problems are new, too. What’s the name of this pattern? Which refactor sequence is this? How many points should I assign this ticket in scrum? Specialty domains seem like black boxes to newcomers, but if you look at a system in these terms (One of these interconnected pieces is broken / How do I understand it? / How do I keep this from happening in the future?) there’s a wealth of wisdom available to untangle causalities. Or, y’know, not miss the UPS guy.
Drew Bell is an engineer at Postlight.
Story published on Feb 8, 2017.