Musings on Technology: July 2011

Sunday, July 31, 2011

On the meaning of truth

I am by no means a guru when it comes to the practice of Configuration Item (CI) management; for the longest time I thought a CMDB was a splinter cell of the Berkeley DB project. (Thank you, thank you, I'll be here all week!) I have, however, had the poor fortune to be involved in more than one discussion on how hard it is to get a view of a CI that is mostly accurate.

In short: truth is relative, and you can't trust anyone who tells you otherwise.

Let's take a simple example of how difficult this is: a simple server CI. That server can pretty reliably describe itself. It knows its own hostname, it knows its IP addresses, it knows what filesystems it has, and if software was installed in a remotely sane way, it can describe what software is installed on it.

But consider the next level of complexity when you start having to make decisions about what data is primary. Let's say that same server has two active interfaces: one on the production network that is used for general access to the machine, and one on a backup network that is only used to pass traffic during backups. Which one is primary? How can a piece of software (a CI discovery tool) make a decision about something which to an operating system appears to be two equally valid (and important!) device configurations, but which are radically different to the users/administrators of that system? If you're like me, you care that both of those network interfaces are up, but what you really care about is that the production network interface is up and functional. Ergo, it is a more important device configuration than the backup interface.

But operating systems are dumb. CI discovery tools are dumb. There's a vendor I know (that shall remain nameless) that includes a pattern of recognition in its agentless discovery mechanism that uses (in part) the lowest MAC address to determine which is the primary network interface.

It should remain an exercise left to the reader as to how badly this can confuse people who are interpreting the output of that system, and why childish namecalling and baseless accusations as to the participants' heritage and lineage can result.
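For the impatient reader, here is a minimal sketch of how a lowest-MAC rule can go wrong. Everything in it is hypothetical: the interface names, the MAC addresses, the `network` label, and the functions are invented for illustration and do not describe any vendor's actual logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interface:
    name: str
    mac: str
    network: str  # label a human assigned; discovery cannot infer this


def mac_to_int(mac: str) -> int:
    # Compare MACs numerically so separator style does not matter.
    return int(mac.replace(":", "").replace("-", ""), 16)


def primary_by_lowest_mac(interfaces: List[Interface]) -> Interface:
    # The heuristic described above: lowest MAC address wins.
    return min(interfaces, key=lambda nic: mac_to_int(nic.mac))


def primary_by_policy(
    interfaces: List[Interface], preferred_network: str = "production"
) -> Optional[Interface]:
    # What an administrator actually means by "primary".
    for nic in interfaces:
        if nic.network == preferred_network:
            return nic
    return None


# Hypothetical server: the backup NIC happens to have the lower MAC.
server = [
    Interface("eth0", "00:1b:21:aa:00:10", "production"),
    Interface("eth1", "00:1b:21:0a:00:01", "backup"),
]

print(primary_by_lowest_mac(server).name)  # picks the backup interface
print(primary_by_policy(server).name)      # picks the production interface
```

The two answers differ only because a human supplied the `network` label, which is exactly the knowledge the discovery tool does not have.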

In short: this is hard work. It gets easier when you have standards that you implement solidly and uniformly, say by automating the server build process and/or by a public flogging or two of people who Fail At Implementing Standards. (Pour encourager les autres, if you will.) If you can get another data source that is systematically gathered (e.g., start to triangulate from agent and agentless discovery mechanisms), then your ratio of truth is going to go up. What's a CMDB to do? The agent and the agentless discovery agree on 75% of the data, therefore it's mostly right. BACK OFF, HUMAN!

But it still takes humans to decide that something is more, less, or equally important. Anyone who tells you that their discovery mechanism can do that without human intervention is selling you something that doesn't exist.
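A rough sketch of that triangulation idea, assuming each discovery source can be boiled down to a flat dictionary of attributes. The attribute names, values, and the `compare_sources` function are all made up for illustration. Note that the code can only measure agreement; it hands the disputed attributes back for a person to rule on.

```python
def compare_sources(agent, agentless):
    """Return (agreement ratio, disputed attributes) for two CI views."""
    attributes = sorted(set(agent) | set(agentless))
    # An attribute is disputed if the sources differ or one lacks it.
    disputed = {
        attr: (agent.get(attr), agentless.get(attr))
        for attr in attributes
        if agent.get(attr) != agentless.get(attr)
    }
    if not attributes:
        return 1.0, disputed
    agreement = (len(attributes) - len(disputed)) / len(attributes)
    return agreement, disputed


# Hypothetical views of the same server from two discovery mechanisms.
agent_view = {
    "hostname": "db01",
    "os": "RHEL 5",
    "primary_ip": "10.0.0.12",
    "filesystems": 6,
}
agentless_view = {
    "hostname": "db01",
    "os": "RHEL 5",
    "primary_ip": "192.168.50.12",
    "filesystems": 6,
}

agreement, disputed = compare_sources(agent_view, agentless_view)
print(f"{agreement:.0%} agreement")
for attr, (a, b) in disputed.items():
    print(f"needs a human: {attr}: agent={a!r}, agentless={b!r}")
```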

Sunday, July 24, 2011

On why change management is a good thing

I'm firmly of the belief that change managers (in the ITIL sense, not the MBA fellatio that gets thrown around to mean "someone who changes everything while waiting for the music to stop and their promotion to be posted") are the unsung heroes of the IT world.

See, I cut my teeth in Big Operations under a draconian system called "Permit To Work". It was as close to the Spanish Inquisition as I hope to ever come, and involved tests of willpower, documentation, and in many cases, nerves, to get permission to touch something in a datacenter.

What that did for me was teach me to appreciate what Change Management really does. What the Change process really does is force you to examine yourself, your motives, your plans, and your willingness to accept risk based on those plans before you alter a configuration item in your data center. For people who are used to doing this off the cuff (or the wrist, in certain cases, but it might make you go blind), it seems like useless drudgery and paperwork that only adds to the pain of your existence and may or may not steal your soul.

I don't see it that way. I relish the prospect of sitting down and thinking through what I'm trying to achieve, how I'm going to go about it, how I would test to make sure I was successful, and what I would do when the balloon goes up and things start to go pear-shaped, and fast. That's value add, but only if you see the change process in the correct light.

My advice to you is: the next time your Change Manager hands you your plums on a platter for not filling out the right paperwork, instead of getting mad and planning revenge, thank them for the valuable criticism and promise to do better next time. And then do better next time! You'll save yourself a whole bunch of time, you'll be better prepared, and, I think, you'll be a better person all around.

A revelation about support tickets

I have a confession to make: I hate opening trouble tickets.

Really I do. Almost with a passion. But what I REALLY hate is when I hold my nose long enough to open a problem ticket and discover that the support person who receives said ticket ignores what I put into the trouble ticket, and asks me either for the same information or worse, to repeat the troubleshooting I have already done.

That torques me, and I'm not talking about Torx bolts or foot pounds at the differential.

See, I try to be different than your average bear when opening a trouble ticket. Mostly because I know how painful it is to experience a problem, swallow your pride and admit defeat, and open a trouble ticket to someone who you want to fix the problem. So when I get a bad response to a ticket, I take it almost personally. It's as if by your response you have attempted to roshambo me for my ignorance. And I'm really not ignorant, I'm just out of ideas on how to fix a problem.

So my request to the support people of the world is this: if you get an exceptionally well documented ticket (as I am wont to deliver you), please take that extra moment to read it, appreciate the effort I've put into attempting to clearly articulate the problem I'm experiencing and the things I have done to diagnose the problem and potential solutions, and then skip past the first part of the script where you ask me my name, rank and serial number so that we can get right onto fixing the problem.

Fair? I'll take silence as acknowledgement.

On attempting to understand the problem

For most of my professional career in technology, I have been a technology guy. For every problem someone brought me, there was a technological solution. That's mostly true: there's always a way to solve a problem with technology. But recently, I've found myself communicating quite differently about problems, using the following phrase more than once:

"${TECHNOLOGY} is a how, not a why".

This got me an over-breakfast high five from a colleague whom I greatly admire, and it was like a light bulb went on in my head. I have almost completely revolutionized my thinking to reframe discussions against the "how vs why" thought process.

For those who don't know, I'm leading a fairly sizeable data center automation (DCA) initiative at my employer. The concept of "how vs why" is a key element of work stream management. I have begun to drive the automation platform beyond the narrow implementation scope that I have been chartered with, and the conversation inevitably goes something like this:

PersonX: ... and that's why I want to automate task X.
Me: Ok, so you want to automate task X. Why?
PersonX: huh?
Me: Why do you want to automate task X?
PersonX: buh?
Me: I'm not getting through to you... what business value will you derive if I were to automate task X?
PersonX: snuh.
Me: I'll come back when you're feeling better.

(I kid. Mostly)

The truth of the matter is, technologists are trained to look at things as having intrinsic value. If you can save 5 minutes by automating something, why wouldn't you? But the point that is in the process of being missed is that not everything has intrinsic value. If you're going to automate a task that saves one person five minutes every year, then the cost of creating, testing, implementing and maintaining that task automation is going to exceed the business benefit by a large amount. At a $50 an hour going rate for labor, five minutes of savings for one person is about $4.17. Take that out three years and you're at $12.50 of savings. Contrast that with the cost of implementing the automation: for argument's sake, let's wrap up requirements gathering, code and testing into three hours, which equates to $150 to deliver that task automation. Then let's use a five percent overhead on maintenance per year ($7.50), taken out three years for $22.50, for a grand total of $172.50 of development and maintenance costs over three years, versus $12.50 of savings.

To put it in different terms, over those three years the automation costs nearly fourteen times what it saves.

Savings from automation = $12.50
Cost to implement & maintain = $172.50

Now, this brings me to the most important lesson of all: automation delivers benefits that vary with the number of people who perform that task.

If I have one person benefiting, then it's $12.50 of savings over three years.
If I have two people benefiting, then it's $25 of savings.
If I have ten people benefiting, then it's $125 of savings, and at fourteen people the $175 of savings finally covers the $172.50 cost.

That's an easy concept to grasp, but what's more important is that it refocuses the discussion into Why rather than How. Automating the task is trivial compared to measuring the benefit of automating the task to your business, and yet the value is not in the automation of the task, it's in making the people who execute that automation more efficient.
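A quick way to sanity-check this kind of arithmetic is to script it. The sketch below uses only the inputs stated in this post (five minutes saved per person per year, $50 an hour, three hours to build, five percent yearly maintenance, a three-year horizon). The function and constant names are mine and purely illustrative, and the model is deliberately naive: it ignores discounting and anything else a finance person would insist on.

```python
import math

HOURLY_RATE = 50.0            # dollars per hour of labor
MINUTES_SAVED_PER_YEAR = 5.0  # per person using the automation
BUILD_HOURS = 3.0             # requirements, code, and testing
MAINTENANCE_RATE = 0.05       # fraction of build cost, per year
YEARS = 3


def savings(people, years=YEARS):
    """Labor dollars saved by `people` users over `years` years."""
    return people * years * (MINUTES_SAVED_PER_YEAR / 60.0) * HOURLY_RATE


def cost(years=YEARS):
    """Build cost plus yearly maintenance over `years` years."""
    build = BUILD_HOURS * HOURLY_RATE
    return build + build * MAINTENANCE_RATE * years


def break_even_people(years=YEARS):
    """Smallest head count whose savings cover the cost."""
    return math.ceil(cost(years) / savings(1, years))


for people in (1, 2, 10):
    net = savings(people) - cost()
    print(f"{people:>2} people: saves ${savings(people):.2f}, net ${net:.2f}")

print(f"break-even head count over {YEARS} years: {break_even_people()}")
```

Changing any one constant (the hourly rate, the minutes saved, the horizon) and rerunning shows how quickly the answer moves, which is really the point: the inputs are a business conversation, not a technical one.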

In short, want to do yourself an enormous favor? The next time you're looking for technology to solve your problem, ask yourself that question:

What business value will I derive by implementing this technology?
