Learn Debugging: History

How did we get into this mess?

The history of the system will tell you a lot about the state of it.

There are two types of state change:

the data within the system
the behaviour of the system

As these two interact, they produce a hellish combination of results. Data changes over time, and if the behaviour of the system fiddles with that data, you end up with a confusing array of behaviour and data.

So what do you do?

You need the history. Git, or any version control, helps you, though be certain that you're following the right history. Multiple branches and repositories can confuse. *How then?*

Change logs are useful if available, particularly for third-party software. This isn't just small libraries but major pieces of software such as the database. I've seen a change in the way a DB handles backups mess up a migration, and the history of that system (the database) was the best source for what happened.

User reports, properly filtered, can be used to build a behavioural history of the system. They are typically lacking in information and need to be fleshed out with logs and error messages, but they do provide you with extra history. The one massive gotcha they come with is a lack of precision: when you look at the history, the date of an error, as opposed to the date of the error report, can be skewed.

If the fix was deployed on Tuesday and the error report comes in on Wednesday but fails to mention that the error happened on Monday, the report is screwing up your understanding of the behaviour of the system. This is rarely malicious, but it means you need an excellent filter for user reports, and you need to place what happened on an accurate timeline.
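The Tuesday/Wednesday example can be sketched in a few lines of Python. This is only an illustration: the `UserReport` type, its fields and the `place_on_timeline` helper are hypothetical names, and the dates are made up. The point is that a report is placed by when the error happened, not by when the report arrived, and that a report with no known occurrence time is flagged rather than guessed at.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class UserReport:
    """Hypothetical shape of a user report, for illustration only."""
    summary: str
    reported_at: datetime            # when the report reached you
    occurred_at: Optional[datetime]  # when the error actually happened, if known


def place_on_timeline(report: UserReport, fix_deployed_at: datetime) -> str:
    """Classify a report relative to a deploy, using the occurrence time."""
    if report.occurred_at is None:
        # Do not fall back to reported_at: that is exactly the skew to avoid.
        return "unknown: ask the user or check the logs"
    if report.occurred_at < fix_deployed_at:
        return "before fix"
    return "after fix"


fix_deployed_at = datetime(2024, 1, 2, 9, 0)   # Tuesday
report = UserReport(
    summary="Export fails",
    reported_at=datetime(2024, 1, 3, 10, 0),   # Wednesday
    occurred_at=datetime(2024, 1, 1, 15, 0),   # Monday
)
print(place_on_timeline(report, fix_deployed_at))
```

Sorting by `reported_at` would have put this report after the fix and suggested the fix had failed, which is the misreading described above.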

The history of data can be harder to track, simply because the data is more likely to be at the mercy of your users: they make minor changes and use the system at a faster rate than the code changes.

How detailed a history you can write depends on whether you have change logs for the data, how good the user reports are, and whether you have backups of the data to take you through what happened.
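If you do have a change log for the data, one way to use it is to replay it up to a moment of interest and see what the data looked like then. The sketch below assumes a very simple, hypothetical log format (timestamp, record id, field, new value); real audit tables and backups will look different, so treat it as an outline of the idea, not a recipe.

```python
from datetime import datetime

# Hypothetical change log entries: (timestamp, record id, field, new value)
change_log = [
    (datetime(2024, 1, 1, 9, 0), "order-1", "status", "pending"),
    (datetime(2024, 1, 1, 12, 0), "order-1", "status", "paid"),
    (datetime(2024, 1, 2, 8, 0), "order-1", "status", "refunded"),
]


def state_at(log, moment):
    """Replay the log in time order and return the data as it stood at `moment`."""
    state = {}
    for timestamp, record_id, field, value in sorted(log, key=lambda entry: entry[0]):
        if timestamp > moment:
            break  # everything after this point had not happened yet
        state.setdefault(record_id, {})[field] = value
    return state


# What did the order look like on the afternoon of 1 January?
print(state_at(change_log, datetime(2024, 1, 1, 13, 0)))
```

Comparing the replayed state with what a user report describes is a quick check on whether the report and the data agree about when something happened.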

Using the history

What does all this tell you? The history will explain how the system behaved at particular points in time, allowing you to explain behaviour which might be impossible now.

It also tells you about conflicting behaviour: when the system was asked to do two conflicting things at different times, the history can show why the data is behaving inconsistently.

The history is a replacement for your experience. It substitutes for your lack of experience with the system by giving you a précis, which will hopefully help you understand what's going on.

This is why really good debuggers and coders seem to spend a little time reading and listening before jumping into fixing bugs. They get an understanding of what's happened on the system and how its history suggests it should be behaving, and they use this to understand how it's really behaving now.