“I’ve always been fascinated by how much we humans run on autopilot,” Mia Bajić says in an email interview. So last month for PyCon US 2025, the EuroPython Society vice chair identified “the most bizarre software bugs in history.”
And sometimes they were even delightfully illustrated with stick figures…
But Bajić — who is also a software engineer at data-management platform Ataccama — says the larger goal wasn’t just to recount all the bugs, but also “to understand what they taught us” — about how we can build more resilient systems, and how we can write better code. It’s a gallery of horrors, filled with cascading failures, unforeseen edge cases, horribly flawed assumptions, and the disastrous perils of insufficient integration testing.
If the mantra for programmers is to always be learning — then what is there to learn from history’s most bizarre software bugs?
The Dangers of Complex Systems
Bajić started with the horrific story of the fatal crash of a Boeing 737 MAX 8 in Jakarta in 2018. The pilots weren’t fighting turbulence, but “a few lines of code,” as Bajić tells it. Boeing had written software to automatically push the plane’s nose down whenever sensors indicated it was pitching up too steeply, which was then erroneously triggered by a malfunctioning sensor — and the pilots didn’t know how to disable it. Among the lessons to be learned: Software shouldn’t do things users aren’t aware of…
But later in the talk, Bajić emphasized that while single points of failure are dangerous — and presumably tested for — what’s harder to spot are the chain reactions, a cascading series of failures.
“It’s never just one thing that causes failure in complex systems.” In risk management, this is known as the Swiss cheese model: a flaw in any one layer of defense isn’t dangerous by itself; the danger comes when the holes in multiple layers line up. And as the Boeing crash proves, “When all of them align, that’s what made it so deadly.”
It is difficult to test for every scenario. After all, the more inputs you have, the more possible outputs — and “this is all assuming that your system is deterministic.” Today’s codebases are massive, with many different contributors and entire stacks of infrastructure. “From writing a piece of code locally to running it on a production server, there are a thousand things that could go wrong.”
But having said that, history also shows us there are still situations where a lack of testing becomes glaringly obvious…
Tested in Production?
Bajić asked the audience to imagine they’re the Google engineer who, in 2009, updated the company’s Safe Browsing service (which warns users about unsafe websites before they visit them). “And then you realize that every single website on Google is marked as dangerous — including Google itself!”
Yes, regardless of what you searched for on Google, every result came with a warning that “This site might harm your computer.”
The problem? Google’s engineer had entered a forward slash — by itself — into the list of flagged sites, and that lone “/” matched every URL on the web. “What we can learn from this is that typos happen — and sometimes testing more is the answer.”
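It’s easy to see how a lone “/” poisons a URL blocklist. Here’s a minimal, purely hypothetical Python sketch of a substring-style blocklist check (nothing like Google’s real pipeline), where that single character matches everything:

# Hypothetical substring-matching blocklist -- an illustration, not Google's real code.
blocklist = ["badsite.example/phish/", "/"]   # the lone "/" is the fateful typo

def is_flagged(url):
    path = "/" + url.split("://", 1)[-1]      # drop the scheme, keep a leading slash
    return any(rule in path for rule in blocklist)

print(is_flagged("https://www.google.com"))    # True -- "/" matches every URL
print(is_flagged("https://example.org/page"))  # True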
But another bizarre bug drives the point home: That terrible things happen when you don’t test enough. In 1999, NASA lost a $125 million Mars orbiter that was “about the size of a small car” — packed with “lots of sensors and lots of electronics” to give Mars a kind of weather satellite.
After 10 months of interplanetary spaceflight, it came to its glorious arrival at Mars. But as mission controllers waited back on Earth for its signal, “Instead, there was silence… Minutes passed. Hours — and still nothing.”
Eventually, they realized what had happened. While the probe was meant to orbit 110 kilometers above the surface of Mars, “instead they found out that it had dropped to 57 kilometers… and at that altitude the Martian atmosphere tore it apart.” The orbiter never completed its circuit around the far side of Mars, and in fact, never orbited.
CNN reported that the bug happened “because a Lockheed Martin engineering team used English units of measurement while the agency’s team used the more conventional metric system…” One team measured the thrusters’ firing impulses in pound-force seconds, while the other expected the metric unit for the same quantity, newton-seconds.
And Bajić again illustrates this problem with stick figures.
It was obviously a communication failure, “because NASA’s navigation team assumed everything was in metric.” But you also need to check the communication that’s happening between the two systems. “If two systems interact, make sure they agree on formats, units, and overall assumptions!”
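The arithmetic behind the mismatch is small but unforgiving: one pound-force second is about 4.45 newton-seconds, so a number handed over without its unit is wrong by a factor of roughly 4.5. Here’s a hedged Python sketch (illustrative values, nothing like NASA’s actual software) of the kind of explicit conversion the two teams needed to agree on:

LBF_S_TO_N_S = 4.448222   # one pound-force second, expressed in newton-seconds

def impulse_in_newton_seconds(value, unit):
    # Refuse to guess: the caller must say which unit the number arrived in.
    if unit == "N*s":
        return value
    if unit == "lbf*s":
        return value * LBF_S_TO_N_S
    raise ValueError(f"unknown impulse unit: {unit!r}")

print(impulse_in_newton_seconds(100, "lbf*s"))  # ~444.8 N*s
print(impulse_in_newton_seconds(100, "N*s"))    # 100.0 -- read the same number as metric and you're off by ~4.45x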
But there’s another even more important lesson to be learned. “The data had shown inconsistencies weeks before the failure,” Bajić says. “NASA had seen small navigation errors, but they weren’t fully investigated.” This specifically points to the importance of integration tests. “If NASA had simulated the navigation with the actual data units in place, they might have caught a discrepancy before launch…”
On a positive note, this leads Bajić to the lessons from how NASA handled its post-incident review. The agency learned from the failure and improved its processes.
“After this disaster, NASA standardized units across the organization, enforced stricter checks, and improved inter-team communication protocols.”
What’s in a Name?
Bajić also recounts the legendary tale of the Sun Microsystems employee who kept mysteriously disappearing from the company’s databases back in the mid-1990s. “People started investigating, only to discover that the issue wasn’t a system failure. It was his name.” And the audience laughed when told that the employee’s name was…
Steve Null.
“And it turns out that some systems back then didn’t handle the string Null properly…”
Going the extra mile, Bajić even attempted to recreate the issue in Postgres here in 2025. “It works fine — but you can still find bugs like this in various systems.” (Bajić located a similar open/unresolved bug for the Apache Flex software development kit.)
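To see how an ordinary surname can simply vanish, here’s a minimal Python sketch (hypothetical code, not Sun’s) of an importer that conflates the string “Null” with a missing value:

# Hypothetical importer that treats the literal string "Null" as missing data.
def load_employee(last_name):
    if last_name.strip().lower() in ("null", "none", ""):
        return None                      # record silently dropped as "no data"
    return {"last_name": last_name}

print(load_employee("Nguyen"))           # {'last_name': 'Nguyen'}
print(load_employee("Null"))             # None -- and Steve Null disappears again

The fix is to keep “no value” and the four-character text “Null” distinct, for example by using a real database NULL rather than sentinel strings.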
And by the end Bajić is reenacting the famous XKCD comic about a public school student whose parents named him Robert'); DROP TABLE Students;--
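The classic defense against little Bobby Tables is to never splice user input into SQL strings. A short sketch using Python’s built-in sqlite3 module and a parameterized query (the table and the test data are just for illustration):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (name TEXT)")

dangerous_name = "Robert'); DROP TABLE Students;--"

# The placeholder passes the name to the driver as data, never as SQL.
conn.execute("INSERT INTO Students (name) VALUES (?)", (dangerous_name,))

print(conn.execute("SELECT name FROM Students").fetchall())
# [("Robert'); DROP TABLE Students;--",)] -- stored as plain text, table intact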
Of course, data-checking code can have its own bugs. “Some automated systems will delete all records that start with test or abcde because they assume they’re test data…”
But then Bajić puts up a slide noting that over 300 children were indeed named Abcde between 1990 and 2020. Business Insider notes that 32 children were named Abcde just in the year 2009, with other sites suggesting it’s popular with parents “who are considering unisex or non-gendered baby names.”
The lesson? When handling user inputs, consider edge cases — “because real-world data can surprise you.”
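One hedged sketch of the safer alternative: mark test data explicitly when it is created, instead of guessing from names (the field names here are hypothetical):

def looks_like_test_data(record):        # the risky heuristic
    return record["name"].lower().startswith(("test", "abcde"))

def is_test_fixture(record):             # an explicit flag, set when the fixture is created
    return record.get("is_test_fixture", False)

real_user = {"name": "Abcde Smith"}
fixture = {"name": "Abcde Smith", "is_test_fixture": True}
print(looks_like_test_data(real_user), is_test_fixture(real_user))  # True False
print(is_test_fixture(fixture))                                     # True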
Assumptions — and Spreadsheets
And what bizarre bug caused a textbook to be listed for $23.7 million on Amazon? Blame Amazon’s automatic price-setting tools, which inadvertently triggered an infinite loop of increases, Bajić explains.
“One seller set their rule to always be 0.07% cheaper than the next lowest price. Another seller had a rule to always be 27% more expensive than the lowest option.” But with only two listings for the book, each repricing cycle soon pushed the first seller’s price up by about 26.93%, dragging the second seller’s price up with it, “and their pricing algorithms got stuck in a loop… until the book was listed for $23.7 million.”
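The runaway feedback loop is easy to reproduce. A toy Python simulation using the percentages from the talk (the starting prices are made up):

# Two repricing bots feeding off each other, with no sanity cap.
price_a = 30.00        # seller A: always 0.07% cheaper than the other listing
price_b = 35.00        # seller B: always 27% more expensive than the cheapest copy

cycles = 0
while price_a < 23_700_000:
    price_a = round(price_b * 0.9993, 2)   # undercut B by 0.07%
    price_b = round(price_a * 1.27, 2)     # sit 27% above A
    cycles += 1

print(cycles, price_a, price_b)   # about 58 cycles to pass $23.7 million

A simple price ceiling, or a check that a listing hasn’t jumped by an implausible multiple overnight, would have broken the loop.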
The lesson? Be careful about your assumptions.
And did you know there are genes with the names SEP15 and MARCH5? Microsoft Excel didn’t. Its default settings autocorrected those text strings into dates, and a 2016 study found that in top genomics journals, “approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions.”
Excel was just trying to be helpful, Bajić suggests — joking that “Sometimes it’s hard to say what is a bug and what is a feature.” But other times, it’s pretty obvious. In 2012, a spreadsheet error ultimately cost J.P. Morgan $6 billion, Bajić says, “because someone added two numbers together instead of averaging them.”
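Here’s a hedged sketch of the kind of sanity check a genomics pipeline could run over its gene lists, using the third-party python-dateutil package as a stand-in for Excel’s lenient date parsing (the gene list is just an example):

from dateutil import parser as dateparser   # third-party: python-dateutil

GENES = ["SEP15", "MARCH5", "TP53", "BRCA1"]

def looks_like_a_date(symbol):
    # Flag any gene symbol a lenient date parser would happily swallow.
    try:
        dateparser.parse(symbol)
        return True
    except (ValueError, OverflowError):
        return False

for gene in GENES:
    if looks_like_a_date(gene):
        print(f"{gene}: would be silently coerced into a date")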
The lesson here? Spreadsheets need to be debugged too — just like any production-ready code…
The Worst Bug of All
Perhaps the ultimate bug happened in 2011, for Linux users installing the Bumblebee daemon to manage their Nvidia Optimus chipsets. The install script had meant to remove a specific directory, and the Linux command for removing files is rm. (And adding the flags -rf makes the removals happen recursively — removing files in all subdirectories — while doing it without prompting the user for confirmation.)
So — can you spot the difference between these lines of code?
rm -rf /usr/lib/nvidia-current/xorg/xorg
rm -rf /usr /lib/nvidia-current/xorg/xorg
Sure enough, back in 2011, one stray space (after /usr) led to angry issues on GitHub like “Totally uncool dude!!! The script deletes everything under /usr. I just had to reinstall Linux on my pc to recover.”
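The shell won’t save you from a stray space, but an install script can refuse to delete anything outside the directory it expects. A minimal Python sketch of that kind of guard (the paths and the allowed prefix are illustrative):

import shutil
from pathlib import Path

ALLOWED_PREFIX = Path("/usr/lib/nvidia-current")    # only ever delete inside here

def safe_rmtree(target):
    # Takes exactly one path and validates it, so shell word-splitting
    # can't quietly turn one argument into two.
    target = Path(target).resolve()
    inside = target == ALLOWED_PREFIX or ALLOWED_PREFIX in target.parents
    if not inside:
        raise ValueError(f"refusing to delete unexpected path: {target}")
    shutil.rmtree(target)

# safe_rmtree("/usr/lib/nvidia-current/xorg/xorg")   # allowed (if it exists)
# safe_rmtree("/usr")                                # raises ValueError instead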
Bajić enjoyed sharing some of the 884 sarcastic comments that greeted the maintainer’s apologies…
“no more lack of disk space now.”
“I didn’t like that folder anyway.”
But in the end, Bajić told me, her talk had a specific message for fellow Python developers. “The world is complex, and so are the systems we build.
“When something goes wrong, it’s rarely just one thing. It’s usually a chain of small things lining up in just the wrong way.”
“Sometimes, better testing can catch it. Sometimes it can’t.”