Sunday, October 21, 2012

Process Bugs

Context: A production bug is reported by 55 customers. Most of them had to roll back the new release completely, while the rest halted their online services until a hot-fix arrived.
Adam (the PM), shouting: Who is responsible for this terrible bug?
Amr (the team leader): Let's concentrate on resolving it and get these customers up and running again.
Adam, frowning: OK
The team then spent more than 18 hours of continuous work fixing, testing, packaging, deploying, re-testing, and so on. Early the next morning, as they were finishing the last touches, Adam entered the team room:
Adam: Now, it was a horrible day yesterday, and I would like to know exactly what led us to this situation.
Mona (tester): Developers are always careless about their code.
Sarah (developer): No way, it was your bad testing. Even if we were careless, isn't it your job to be careful and catch such tiny bugs?
Adam is listening ...
Yousef (developer): Hey, are we going to keep blaming each other and forget about the tight schedules, late notices, and last-minute changes we always have to deal with?
A few seconds of silence passed, with everyone staring anywhere but at each other and nobody accepting any responsibility.
Amr: It was a process bug.
Adam: What?
Amr: A process bug. It has nothing to do with the team. Neither one person nor even the whole team can be blamed for such a recurring problem. We keep looking for someone to shout at and forget about the root cause of the problem, which is a process bug.
This is a typical discussion in software development houses. The last remark is interesting and worth attention. Process bugs are holes in the software development process that let bugs slip through to customer sites. Process bugs are what I always blame for bugs that show up in production. It is the development process that failed to help us prevent bugs from being injected into our code. It also failed to help us discover these bugs and resolve them before they reached customers.

It is bad enough that we keep thinking about which team member to blame. What is worse is believing that blaming actually resolves the root cause of the problem! If team leaders ignore these process bugs and keep the deadly habit of blaming the team, the following dynamic will definitely occur:

The deadly cycle of bug-blame-stress

If bugs occur and the team is blamed for them, the blame leads to higher levels of stress, which usually results in more bugs, more blaming, and more stress, in an endless loop. This is an example of a 'positive feedback loop', in which factors reinforce one another until, after a while, the system pushes people out. Examples of such expulsion are burn-out, being out-ranked, quitting development altogether, and similar outcomes.

I have seen excellent software engineers quit the software industry because they lost confidence in themselves. Why? Because of continuous public and personal blaming :(

For the bug case above (which actually happens every day), here is a very modest list of possible process bugs:

  • Code review is never planned. The team hardly finds any time to review.
  • Testing issues are reported by email and get lost every now and then, because the team leader's inbox receives hundreds of emails a day.
  • Bugs are assigned to module owners, who may be busy responding to customers or developing new features in another project.
  • The team had several burn-outs, which put even more stress on the remaining team members!
  • ...
The key takeaway of this post is to look for root causes in the process rather than for someone to blame. Blaming may relieve managers' stress and bad temper, but it will soon drive people out of the team!

Instead of blaming, a sound management message to the team should be similar to the one in this blog's title: Take your time to pay for this technical debt, but let me know how you would prevent it in the future! As I said in this blog, part of fixing a problem is preventing it from recurring; in other words, fixing the process hole that led to it.