Sunday, July 31, 2011

On the meaning of truth

I am by no means a guru when it comes to the practice of Configuration Item (CI) management; for the longest time I thought a CMDB was a splinter cell of the Berkeley DB project. (Thank you, thank you, I'll be here all week!) I have, however, had the poor fortune to be involved in more than one discussion about how hard it is to get a view of a CI that is mostly accurate.

In short: truth is relative, and you can't trust anyone who tells you otherwise.

Let's take a simple example of how difficult this is: a simple server CI. That server can pretty reliably describe itself. It knows its own hostname, it knows its IP addresses, it knows what filesystems it has, and if its software was installed in a remotely sane way, it can describe what software is installed on it.

But consider the next level of complexity, when you have to start making decisions about which data is primary. Let's say that same server has two active interfaces: one on the production network that is used for general access to the machine, and one on a backup network that is only used to pass traffic during backups. Which one is primary? How can a piece of software (a CI discovery tool) make a decision about something which, to an operating system, appears to be two equally valid (and important!) device configurations, but which are radically different to the users and administrators of that system? If you're like me, you care that both of those network interfaces are up, but what you really care about is that the production network interface is up and functional. Ergo, it is a more important device configuration than the backup interface.

But operating systems are dumb. CI discovery tools are dumb. There's a vendor I know (that shall remain nameless) that includes a recognition pattern in its agentless discovery mechanism that uses (in part) the lowest MAC address to determine which network interface is primary.

I'll leave it as an exercise for the reader to work out how badly this can confuse people who are interpreting the output of that system, and why childish name-calling and baseless accusations about the participants' heritage and lineage can result.
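For the impatient reader, here is a minimal sketch of what a lowest-MAC heuristic might look like. It is not the vendor's actual logic (which I haven't seen); the interface names, MAC addresses, roles, and function names are all invented for illustration.

```python
# Hypothetical discovered interfaces; every value here is made up.
interfaces = [
    {"name": "eth0", "mac": "00:1b:21:aa:10:02", "role": "production"},
    {"name": "eth1", "mac": "00:1b:21:aa:10:01", "role": "backup"},
]


def mac_to_int(mac):
    """Convert a colon-separated MAC string to an integer for comparison."""
    return int(mac.replace(":", ""), 16)


def guess_primary_by_lowest_mac(ifaces):
    """Pick the interface with the numerically lowest MAC address."""
    return min(ifaces, key=lambda iface: mac_to_int(iface["mac"]))


primary = guess_primary_by_lowest_mac(interfaces)
print(primary["name"], primary["role"])  # eth1 backup
```

The MAC address says nothing about what the interface is for, so in this made-up case the backup interface wins purely because of which port its NIC happens to be plugged into.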

In short: this is hard work. It gets easier when you have standards that you implement solidly and uniformly, say by automating the server build process and/or by a public flogging or two of people who Fail At Implementing Standards (pour encourager les autres, if you will). If you can get another data source that is systematically gathered (e.g., start to triangulate from agent and agentless discovery mechanisms), then your ratio of truth is going to go up. What's a CMDB to do? The agent and the agentless discovery agree on 75% of the data, therefore it's mostly right. BACK OFF, HUMAN!
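A rough sketch of that triangulation, assuming each discovery source hands back a flat dictionary of attributes for the same CI. The attribute names, values, and the `agreement` function are hypothetical, and a real CMDB reconciliation engine would be far more involved.

```python
def agreement(agent, agentless):
    """Return (ratio, conflicts) for two attribute dicts describing one CI.

    ratio is the fraction of attributes on which both sources agree;
    conflicts maps each disputed attribute to (agent value, agentless value).
    """
    keys = sorted(set(agent) | set(agentless))
    conflicts = {
        key: (agent.get(key), agentless.get(key))
        for key in keys
        if agent.get(key) != agentless.get(key)
    }
    ratio = (len(keys) - len(conflicts)) / len(keys) if keys else 1.0
    return ratio, conflicts


# Made-up views of the same server from two discovery sources.
agent_view = {"hostname": "web01", "os": "Linux",
              "primary_ip": "10.0.0.5", "serial": "ABC123"}
agentless_view = {"hostname": "web01", "os": "Linux",
                  "primary_ip": "192.168.50.5", "serial": "ABC123"}

ratio, conflicts = agreement(agent_view, agentless_view)
print(ratio)      # 0.75
print(conflicts)  # {'primary_ip': ('10.0.0.5', '192.168.50.5')}
```

Note that the 25% the two sources disagree on is exactly the part that needed a judgment call, which is the part nobody gave the tool the information to make.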

But it still takes humans to decide that something is more, less, or equally important. Anyone who tells you that their discovery mechanism can do that without human intervention is selling you something that doesn't exist.
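One way to picture the human's role: the tool only becomes useful once somebody writes down which networks matter and in what order. A minimal sketch, assuming a hand-maintained priority list; the subnets, role names, and `pick_primary` function are hypothetical.

```python
import ipaddress

# Human-maintained policy, in priority order. These subnets are invented.
NETWORK_PRIORITY = [
    ("production", ipaddress.ip_network("10.0.0.0/24")),
    ("backup", ipaddress.ip_network("192.168.50.0/24")),
]


def pick_primary(ifaces):
    """Return (interface, role) for the highest-priority matching network.

    Returns None when no interface falls inside any network in the policy,
    which is the cue to go and ask a human.
    """
    for role, network in NETWORK_PRIORITY:
        for iface in ifaces:
            if ipaddress.ip_address(iface["ip"]) in network:
                return iface, role
    return None


# Discovery order is arbitrary; the policy, not the order, decides.
discovered = [
    {"name": "eth1", "ip": "192.168.50.5"},
    {"name": "eth0", "ip": "10.0.0.5"},
]

result = pick_primary(discovered)
print(result)  # ({'name': 'eth0', 'ip': '10.0.0.5'}, 'production')
```

The code is trivial; the priority list is the hard part, and it came from a person.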
