Musings on Technology

Sunday, September 30, 2018

Research project - composite metrics for patching

I wrote a blog post over on LinkedIn about reworking metrics, and that sent me down a path of research that I'm going to sketch out here.

1) Ansible playbook to auto-generate uptime and last patch status.

2) Greenbone scan data for CVE

3) Some kind of mashing of those two datasets together.

4) Publish results.
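Here's a rough idea of what the "mashing" step might look like. It's a minimal sketch with everything assumed: the record shapes (HostPatchStatus, CveFinding), the field names, the placeholder CVE IDs, and especially the scoring formula are all made up for illustration. Neither Ansible nor Greenbone emits data in exactly this form.

```python
from dataclasses import dataclass


@dataclass
class HostPatchStatus:
    """One row of the (assumed) Ansible-generated report."""
    host: str
    uptime_days: float
    days_since_patch: float


@dataclass
class CveFinding:
    """One row of the (assumed) Greenbone scan export."""
    host: str
    cve_id: str
    cvss: float  # 0.0 - 10.0


def composite_scores(patch_rows, findings):
    """Join the two datasets on host and return {host: score}.

    The score is a hypothetical illustration: the sum of CVSS scores for a
    host, multiplied by patch staleness in 30-day units (minimum 1).
    Uptime is carried in the record but not used in this toy score.
    """
    cvss_by_host = {}
    for f in findings:
        cvss_by_host[f.host] = cvss_by_host.get(f.host, 0.0) + f.cvss

    scores = {}
    for row in patch_rows:
        staleness = max(1.0, row.days_since_patch / 30.0)
        scores[row.host] = round(cvss_by_host.get(row.host, 0.0) * staleness, 2)
    return scores


# Made-up sample data; the CVE IDs are placeholders, not real entries.
patch_rows = [
    HostPatchStatus("web01", uptime_days=12, days_since_patch=15),
    HostPatchStatus("db01", uptime_days=200, days_since_patch=90),
]
findings = [
    CveFinding("web01", "CVE-0000-0001", 7.5),
    CveFinding("db01", "CVE-0000-0002", 5.0),
    CveFinding("db01", "CVE-0000-0003", 4.0),
]

# Print hosts from worst composite score to best.
ranked = sorted(composite_scores(patch_rows, findings).items(),
                key=lambda kv: kv[1], reverse=True)
for host, score in ranked:
    print(host, score)
```

The point of joining on host is that neither dataset alone says much: a long-unpatched box with no findings and a freshly patched box with a serious finding both look fine on one axis and bad on the other.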

Friday, September 14, 2018

Clocks

LinkedIn article I wrote.

A poor articulation of the clock metaphor for analyzing business processes, with an eye to focusing automation efforts.

Wednesday, May 23, 2018

Tidbit for the day

I saw this on Reddit the other day as I was keeping myself updated:

If I ever look busy, it's because you're having a bad day.

This was in response to someone's manager making the comment that they were "slacking off" because they didn't look busy.

That's the thing.

Operations should be visible and appreciated for not having a bad day, just as much as 'hero' culture disproportionately rewards those who are most visible during periods of high stress.

I propose adapting "This workplace has gone N days without an accident" into "This team has gone { hours || days } without a major incident" as a measure of how well things are run.
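The sign itself is trivial to compute once you know when the last major incident was. A minimal sketch, assuming you can get that timestamp from somewhere (the function name and the hours-versus-days cutoff at 24 hours are my own choices here):

```python
from datetime import datetime, timedelta


def time_without_major_incident(last_incident, now):
    """Render the sign, switching from hours to days after the first 24 hours."""
    delta = now - last_incident
    if delta < timedelta(days=1):
        hours = int(delta.total_seconds() // 3600)
        return f"This team has gone {hours} hours without a major incident"
    return f"This team has gone {delta.days} days without a major incident"


# Example with made-up timestamps.
print(time_without_major_incident(datetime(2018, 5, 20, 9, 0),
                                  datetime(2018, 5, 23, 9, 0)))
```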

Tuesday, May 22, 2018

Thought for the day

"Technology lifespans are now measured in years, not careers".

Futureproof your career by learning how to learn, not what to learn.

Sunday, April 8, 2018

Pondering

Automation is where it's at.

I do a fair amount of work in automating using existing frameworks, but now is the time to get serious about Ansible and the rest of the world. Network Automation, here I come!

I swear I had something to call this post...

It'll come back to me.

Today, I want to talk about containers. You know, actually, I don't: I want to get back to the topic of Why vs How.

At the office this week, there was quite the blow-up about containers. Sure, it was dressed up as 'evolving our infrastructure', but in my humble opinion, it all got very sideways into a vendor discussion with way more alacrity than I'm accustomed to, even though the company I work for is terribly adept at focusing on technology.

The real sin here is that the discussion forgot one really important thing: the technology doesn't exist for no reason. The technology doesn't exist in and of itself; it exists to solve a problem that you have, not to make one.

This really is the recurring theme of my life, and I'm really getting tired of having the same arguments over and over. So, in the spirit of having this argument over and over:

Containers are a how. They are not a why.

Determining the Why that leads you to implement containers is vital.

If you don't appreciate the Why that gets you to containers, you're basically magnifying your underlying problems by 100x when you finally have a container farm full of images that you can't manage. And if you think troubleshooting problems in today's infrastructure is a headache, wait until it's all containerized and therefore opaque to all the tools and techniques you have half-implemented now.

Without appreciating that the reason you have friction in deployments is a COMBINATION of problems that containers PARTIALLY address, you're flying blind.

Don't fly blind. Mountains don't move.
Don't fly only by radar. Mountains can hide behind systems failures.
Don't fly by sight alone. Humans are fallible.

The only safe thing to do is fly with radar, humans, and in daylight, with a plan that reflects the ground you're flying over. Anything else is a recipe for perishing on a mountainside.

Events vs incidents

There was quite the discussion last week in the office, and to be honest, it caused a bit of a row. See, there are three camps of people in this world:

1) people who believe you should fix something as early as possible - the Event People
2) people who believe you should fix something when there's an incident - the Incident People
3) people who want to know what an event or incident is - the Blissfully Unaware People

So the argument, if you can call it that, is around when you should try to fix something. Is it when the event is triggered and caught, or when the incident is created?

I'm firmly of the belief that the incident is the right time, but it took a bit of yelling to get that through. Here's my rationale:

1) an event exists only for a moment in time, and if you try and fix things from an event, you may be fixing something that doesn't exist any more;
2) an event has no record of it ever occurring;
3) once you've tried to fix it, you may well cause a different or allied problem to occur;
4) keeping state on events is Hard;
5) if you have to wait for the incident to be created to dispatch it to a fix agent, you're going to wait a longer amount of time;
6) the incident is the appropriate place to document the unfolding of the fix;
7) an incident gives you a longer-term record that can be used for trending and analysis.

[I wrote the above way back in 2016; everything from this point forward is new]

Even after two years of thinking, analysis, and real-life experience, I still take the same view. The point isn't really whether it's incident or event: the point is that there needs to be a way to track, over time, what happens in your infrastructure so that you can make it better - application or otherwise.
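To make "track, over time" a bit more concrete, here's a minimal sketch of what a durable incident record buys you. The Incident fields and both helper functions are hypothetical, not the schema of any particular ticketing tool:

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Incident:
    """A durable record, unlike an event that exists only for a moment."""
    opened: date
    component: str
    summary: str
    notes: list = field(default_factory=list)  # the unfolding of the fix


def incidents_per_month(incidents):
    """Count incidents by (year, month) so the trend over time is visible."""
    return Counter((i.opened.year, i.opened.month) for i in incidents)


def noisiest_components(incidents, top=3):
    """Return the components with the most incidents, most frequent first."""
    return Counter(i.component for i in incidents).most_common(top)


# Made-up sample history.
history = [
    Incident(date(2018, 3, 2), "dns", "resolver timeout"),
    Incident(date(2018, 3, 19), "dns", "zone transfer failed"),
    Incident(date(2018, 4, 5), "storage", "volume full"),
]
history[0].notes.append("restarted resolver; root cause still open")

print(incidents_per_month(history))
print(noisiest_components(history, top=1))
```

None of this works from raw events alone, because there's nothing left to count once the event has passed.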

Does any of this change in a [insert term du jour here] world? I don't believe so. I think the only thing that changes is whether a human looks at the incident or an AI does.