Are you firefighting all the time?

How to get out of chasing a never-ending stream of bugs

2025.02

A "never-ending stream of bugs" is a surprisingly common problem in software development shops.

It's not an intractable problem, but to solve it you have to tackle it on several fronts:

Application Specs
Code Architecture
QA Automation
Development Environment

Most bugs happen because:

Specs are not well understood
Architecture is confusing

Most bugs can't get fixed because:

Manual testing is near impossible
Unit testing is not nearly sufficient

(1) Application Specs

If no one knows what the program is supposed to do, forget it. Bugs will never get fixed, and QA will never get automated. This is what you should focus on first.

Specs should be clear but not verbose. No one wants to read half a page when a sentence would suffice.

Do not let business people write them! The chief architect or tech lead should be writing them.

Specification documents are an artifact that engineering is responsible for.

The specs will drive both the code architecture and QA automation.

(2) Code Architecture

The ground truth about computers is they can only do two things:

(a) Process data

(b) Move it around (I/O)

Code Architecture should reflect this. No crazy abstractions. No service providers. No business domain.

Do not model the problem. Model the solution.

Programmers need to understand how data is flowing through the system. It should be crystal clear. No obfuscation.

Every little thing you add to obfuscate data flow will be a time bomb waiting to explode into an intractable bug.
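To make the "process data / move it around" split concrete, here is a minimal sketch. The order format and the function names (read_orders, total_by_customer, write_report) are made up for illustration; the point is only that I/O sits at the edges, processing is plain data in and plain data out, and the whole flow is readable in one line of main.

```python
import json
from typing import Iterable, TextIO


def read_orders(source: TextIO) -> list[dict]:
    """I/O: move data in. Expects one JSON object per line (assumed format)."""
    return [json.loads(line) for line in source if line.strip()]


def total_by_customer(orders: Iterable[dict]) -> dict[str, float]:
    """Processing: plain data in, plain data out. No I/O, no hidden state."""
    totals: dict[str, float] = {}
    for order in orders:
        customer = order["customer"]
        totals[customer] = totals.get(customer, 0.0) + order["amount"]
    return totals


def write_report(totals: dict[str, float], sink: TextIO) -> None:
    """I/O: move data out, one tab-separated line per customer."""
    for customer in sorted(totals):
        sink.write(f"{customer}\t{totals[customer]:.2f}\n")


def main(source: TextIO, sink: TextIO) -> None:
    # The entire data flow in one place: read, process, write.
    write_report(total_by_customer(read_orders(source)), sink)
```

Calling main(sys.stdin, sys.stdout) would run it as a filter. Nothing here hides where the data comes from or where it goes.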

(3) QA Automation

Forget about unit tests. They are mostly a waste of time.

Allocate time and resources for automated end-to-end testing instead.

How do you know whether the tests are good? Here are some heuristics:

They simulate accurately how a user interacts with the program/system
They do not deal with system state: only inputs that a user can give, and outputs that a user can see.
No mocking! The test executes all the relevant code paths in the system in exactly the same way that would happen in production.
Refactoring system internals should not require changing the test.
Tests only change when you make changes to the application's behavior.
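As a rough sketch of what a test following these heuristics might look like: the test below starts the whole program as a real process, feeds it what a user would type, and checks only what a user would see. APP_SOURCE is a hypothetical stand-in (a tiny program that sums numbers from stdin) so the example runs on its own; in a real project APP_COMMAND would point at your actual application.

```python
import subprocess
import sys

# Hypothetical stand-in for the real application: reads one integer per line
# from stdin and prints their sum. Replace APP_COMMAND with your real program.
APP_SOURCE = """
import sys
print(sum(int(line) for line in sys.stdin if line.strip()))
"""
APP_COMMAND = [sys.executable, "-c", APP_SOURCE]


def run_app(user_input: str) -> subprocess.CompletedProcess:
    """Run the whole program the way a user would: real process, real stdin/stdout."""
    return subprocess.run(
        APP_COMMAND, input=user_input, capture_output=True, text=True, timeout=30
    )


def test_sums_the_numbers_the_user_types() -> None:
    result = run_app("1\n2\n3\n")
    assert result.returncode == 0
    assert result.stdout.strip() == "6"


def test_rejects_non_numeric_input() -> None:
    # Only user-visible behavior is checked: the program must fail, not how.
    result = run_app("abc\n")
    assert result.returncode != 0
```

Nothing in these tests knows how the program is built inside. You could rewrite its internals completely and the tests would stay the same, as long as the behavior does.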

Given the above, it's easy to fix bugs with confidence.

You can check the spec to see what the expected behavior is
You can run the end-to-end test suite to see if your fix broke anything

But there’s one last piece to the puzzle!

(4) Development Environment

All of the above is almost useless unless anyone on the team can run the QA test suite with one button (or one simple command).

Local development should be seamless. Every single programmer on the team should be able to run the entire system with one simple command. They should have a quick edit/run/debug cycle.
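One way to get "one simple command" is a small task script checked into the repository. The sketch below is an assumption, not a prescription: the script name (dev.py), the use of Docker Compose to start the system, and the tests/e2e path are all hypothetical placeholders for whatever your project actually uses.

```python
import subprocess
import sys

# Hypothetical steps; substitute whatever starts your system and runs your suite.
STEPS = {
    "up": [["docker", "compose", "up", "--detach"]],
    "test": [
        ["docker", "compose", "up", "--detach"],
        [sys.executable, "-m", "pytest", "tests/e2e"],
    ],
}


def run_steps(steps: list[list[str]]) -> int:
    """Run each command in order; stop at the first failure and return its exit code."""
    for command in steps:
        print("$", " ".join(command), flush=True)
        code = subprocess.run(command).returncode
        if code != 0:
            return code
    return 0


def main(argv: list[str]) -> int:
    """Dispatch on the task name; return an exit code for the shell."""
    if len(argv) != 2 or argv[1] not in STEPS:
        choices = "|".join(STEPS)
        print(f"usage: {argv[0]} <{choices}>", file=sys.stderr)
        return 2
    return run_steps(STEPS[argv[1]])

# When saved as a script, end the file with: sys.exit(main(sys.argv))
```

With that last line in place, "python dev.py test" brings the system up and runs the end-to-end suite, and the same command can run on staging or in CI.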

QA should be trivially easy to run on staging too.

When you don't have all these points taken care of, an endless stream of bugs is just par for the course. There's nothing you can do about it unless you fix the root cause.