Sunday, May 2, 2021

I've written in the past somewhat opaquely about certain programming languages and my complaints about them. One that I'm not afraid to complain about by name is Python. You can look in enough of my old posts to see this pattern keeps coming up. It never fails to make my life more interesting than it has to be.

So, with that said, one of the things I decided we needed at $COMPANY was something that would let us handle SEVs (you know, outages, site events, whatever?) well. What they had already when I arrived was, to put it mildly, cute. It was basically a wrapper around the Jira category they already had to track these things, plus it would blast out mails to extra places when someone commented in the tool. Unfortunately, those mails also tended to start full-on reply-to-all spam fests due to their scattershot nature. *Every person* was getting *every update* to *every SEV*.

This did get people at the company to see that something new had in fact broken, but only for about half a second before they spam-canned the whole thread, so they never saw the follow-ups. It was annoying lots of people, and it was failing as a tool for actually getting a handle on what's broken now, what's broken lately, and what is going to happen about these things overall.

People agreed that it needed a per-user subscription model, and had already talked about moving it to a Postgres-type backend where it would know about individual people at the company as user accounts, and they could subscribe to or unsubscribe from updates for a SEV, and all of this good stuff. Some of the foundation work had even been laid in there, but there was a catch: the main person who had been doing this work had been doing it as a side amusement, it was not his nominally assigned project, and (worse still) his work on this thing was annoying his boss.

There was no way this would ever get a chance to blossom.

In talking with this person, he graciously agreed to step back and let me give it a shot. It was a case of truly egoless programming and supporting the case of the greater good, and I just wish I got to see it more in the world.

That left me with a bunch of Python running gunicorn, gevent, and Flask, running in VMs on some cloud vendor. It was basically everything I would never do for myself.

Did I turn it off? No.

Did I rewrite it in C++? Nope.

Did I go get HHVM and try to jam FB's infra into this company so I could use Hack and stuff like that because I had seen that before? Nuh uh.

So then, what did I do? I jumped in there and wrote more Python. That's right, I built the database schema, got a test db running locally, then started coding to it. There were SEVs, and SEVs had comments. Comments had authors and bodies and times and source IP addresses and all of this stuff. SEVs themselves had creation times and creators and owners and more. Then there were tags, and tags were M:N with SEVs. Then we had to have users as the authors and creators and owners, and so on and so forth.
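For a rough picture of that shape, here is a minimal sketch, not the real schema from the project. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for Postgres so that it runs anywhere. Every table name, column name, and helper (`make_db`, `tags_for_sev`) is an assumption made up for illustration; only the entities and their relationships come from the description above.

```python
import sqlite3

# Hypothetical schema: the names are illustrative, and SQLite stands in
# for Postgres. Only the entities and relationships come from the post.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE sevs (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    creator_id INTEGER NOT NULL REFERENCES users(id),
    owner_id INTEGER REFERENCES users(id)
);
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    sev_id INTEGER NOT NULL REFERENCES sevs(id),
    author_id INTEGER NOT NULL REFERENCES users(id),
    body TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    source_ip TEXT
);
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Join table: this is what makes tags M:N with SEVs.
CREATE TABLE sev_tags (
    sev_id INTEGER NOT NULL REFERENCES sevs(id),
    tag_id INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (sev_id, tag_id)
);
"""


def make_db():
    """Create an in-memory database with the sketch schema loaded."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when asked to.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def tags_for_sev(conn, sev_id):
    """Return the tag names attached to one SEV, sorted by name."""
    rows = conn.execute(
        "SELECT t.name FROM tags t "
        "JOIN sev_tags st ON st.tag_id = t.id "
        "WHERE st.sev_id = ? ORDER BY t.name",
        (sev_id,),
    )
    return [row[0] for row in rows]
```

The composite primary key on `sev_tags` keeps the same tag from being attached to the same SEV twice, which is about all the cleverness an M:N table needs at this scale.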

At some point during this work, the nascent "prod tools" team hired its first two software engineers and they joined in, and that stopped it from being just a wacky project driven by a single crazy person with a dream (me). Now it was really starting to shape up, and it was clear we were on to something.

All of this came together, and we "shipped" - we turned on the new Postgres-backed code for all to use, and left the old Jira-backed stuff in place just to make sure it'd hold up. Then we let it just run for a couple of days. When it was clear that nobody wanted to go back to the old way, we changed it to just use an HTTP redirect to fling you over to Jira itself when you tried to load one of the pre-Postgres SEVs.

Being one of those sticklers about the web who believes "cool URLs don't change", I insisted that we not break any old URLs in case anyone had bookmarked the_service/whatever/FOO-1234. We would honor that forever, redirecting to Jira so they could see their FOO-1234 ticket and not lose access to it just because we decided we wanted a better backend.
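The core of a redirect like that can be tiny. What follows is a sketch under stated assumptions, not the code we actually ran: `JIRA_BASE` is a placeholder host, the `/browse/` path is the usual Jira ticket URL form, and both `LEGACY_ID` and `legacy_redirect_target` are hypothetical names. The idea is just to recognize an old-style ID and compute where to send the browser.

```python
import re

# Placeholder Jira location; the real host is not given in this post.
JIRA_BASE = "https://jira.example.com/browse/"

# Assumed shape of a legacy ID: uppercase project key, dash, digits,
# as in FOO-1234.
LEGACY_ID = re.compile(r"[A-Z][A-Z0-9]*-[0-9]+")


def legacy_redirect_target(sev_id):
    """Return the Jira URL for a pre-Postgres SEV ID, or None otherwise."""
    # fullmatch rejects anything with extra characters around the ID.
    if LEGACY_ID.fullmatch(sev_id):
        return JIRA_BASE + sev_id
    return None
```

In a Flask app, the view for a SEV page would presumably call something like this first and hand any non-None result to `flask.redirect`, falling through to the Postgres lookup for everything else.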

With that done, we deleted every other part of the Jira-related code (connecting, fetching, rendering to a page, etc.) in our side of things and that was that.

The fact that this service might see at most 2000 distinct users over the next two years meant that it didn't really matter what it had been written in. If it could have been done safely with bash scripts and sed expressions to render templates, that probably would have been enough to handle the complete lack of load. The request rate would never be that high, and the things it did were not complicated.

Would I have liked to not write it that way? Well, sure. But people in hell also want ice water, and they aren't gonna get that, either.

I should mention that both of those engineers showed up at the company and were dropped directly into this project with no warning, and both delivered. I'm really happy about how that project went, and it was really good to work with them. I hope they are doing well with whatever they are up to now.

Also, I had the pleasure of hosting an intern that summer who created a whole set of things in that tool which made it possible to manage the entire lot of SEVs, book the weekly review meeting, and generally removed a ton of manual labor from my plate. I loved showing off his creation and comparing it to the pile of gunk I had to do to track, choose, schedule, book, invite, and then review everything. It was also an honor working with him, and I know he's also going to go on to do some really amazing things some day.


Random bit of trivia: the biggest initial challenge to the project wasn't the fact that I was going to stop using Jira, or switch to Postgres, or write a bunch more code, or this, or that, or whatever else. Oh no.

The biggest fuss that happened early on was that I *dared* change it from "INCIDENT" (as in the Jira ticket/project prefix, INCIDENT-1234) to "SEV". It was amazing. People came out of the woodwork and used all of their best rationalization techniques to try to explain what was a completely senseless reaction on their part.

I got crap from all kinds of people about this, but the best one was from someone who said "new people to the company won't know what it means". This was said by someone who had been there long enough (4+ years) that they had more tenure than something like 90+% of the company according to the internal profile tool.

Meanwhile, my own tenure at the company at the time was... about three weeks. I told them that I'd be the judge of what a new person at the company had to figure out, given that I was already up to my neck in people who used terms without explaining them for the audience at that company.

I'm talking about the same place where I had to stop the presenter and say "I'm sorry, what are bookings?" after they had used it two or three times with absolutely no explanation given, and not enough context to derive it from adjacent words.

This might be okay in a meeting with a bunch of senior folks who had been there for years, but this bad use of jargon happened in an on-boarding class. So yeah, I think I had a much better idea of what people needed to find out than someone who had been around since before most of us had even thought about working there.

The most amazing thing to me is that apparently none of these people had heard of the term "bikeshed" before. They were unintentionally really good at it, which is to say... incredibly irritating!