Automate everything aggressively
In any project, you do two kinds of work: you do the work, and you do the work that gets in the way of the work. The first kind is building the product, solving the hard problems, creating something new. The second is all the grunt work: deploying, testing, configuring, and pushing bits around. Most people treat this second kind of work as a necessary evil, a tax on the real work.
I’ve come to believe the opposite. I think this "work about the work" is the most important thing to focus on, because aggressively, almost obsessively, automating it is the key to unlocking your ability to do the real work.
The Tax on Focus
This realization hits you after you've been through the cycle a few times. A new project begins, and with it comes a small, excusable amount of manual work. To get the first version live, you SSH into a server, run docker build, push the image, and restart the service. It feels fast and simple. But the pattern is always the same. That one-time task becomes a weekly, then daily routine. Each repetition is a tiny drain on your focus, a small crack in your concentration.
Your first instinct, as an engineer, is to find the most direct solution to the immediate pain. The pain is repetitive typing, so you write a script. It's a natural and correct reflex, but naively applied. You've automated the action, but not the process.
This is a subtle but critical distinction. Automating the action gives you a script. Automating the process gives you a system. A script requires you to remember to run it, to manage it, to fix it. A system runs itself. The naive reflex leads to a folder full of tools you have to operate manually.
You eventually learn that the direct solution is rarely the right one. The right question to ask is not "How can I make this task faster?" but "How can I make this task disappear entirely?" In my case, this meant creating a single CI/CD pipeline that took over everything from the moment I typed git push.
This is a more mature understanding of automation. It frees me from the need to even think about the problem, which, I'm sure, is the ultimate goal of all automation.
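To make that concrete, here is a minimal sketch of the kind of job such a pipeline might run on every push. It is not my actual pipeline. The image name, the service name, and the use of systemctl for the restart are all assumptions for illustration. The point is only that the manual steps are written down once, in order, and the whole run stops at the first step that fails.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical names; substitute your own registry, image, and service.
IMAGE = "registry.example.com/myapp:latest"
SERVICE = "myapp"

# The manual steps, written down once, in the order they must happen.
# Restarting via systemctl is an assumption; use whatever your host uses.
DEPLOY_STEPS: list[list[str]] = [
    ["docker", "build", "-t", IMAGE, "."],
    ["docker", "push", IMAGE],
    ["systemctl", "restart", SERVICE],
]


def run_pipeline(
    steps: Sequence[Sequence[str]],
    run: Callable[[Sequence[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
) -> list[Sequence[str]]:
    """Run each step in order and stop at the first failure.

    Returns the steps that completed. Raises RuntimeError as soon as a
    step exits with a non-zero status, so later steps never run.
    """
    completed = []
    for cmd in steps:
        if run(cmd) != 0:
            raise RuntimeError(f"step failed: {' '.join(cmd)}")
        completed.append(cmd)
    return completed
```

A CI system would call run_pipeline(DEPLOY_STEPS) in response to the push, so nobody has to remember to run it. The run parameter exists only so the steps can be exercised without touching Docker.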
The Price of a Manual Failure
My own experience was about protecting focus, but at a larger scale, this principle is about survival. The cost of a manual process failing isn't just a loss of concentration; it can be total ruin. There is no greater example of this than the story of Knight Capital.
On August 1, 2012, Knight Capital, a massive financial services firm, went live with new trading software. A technician had been tasked with manually deploying the code to eight different servers, and one of them was missed. For 45 minutes, that one server, still running old code, executed millions of high-speed, erroneous trades. By the time they pulled the plug, the firm had lost about $440 million. It survived the week only through an emergency rescue by outside investors, and was acquired the following year.
The failure at Knight Capital was the failure of a manual execution process. The company's survival depended on a human performing a repetitive task perfectly every single time, under pressure. The risk was always there, hidden in the process itself, waiting for one small error to unleash chaos.
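The kind of check that takes the human out of this loop can be very small. The sketch below is not a reconstruction of Knight's systems. The host names and version strings are made up, and how each server reports its version is left out as an assumption. It only shows the idea: refuse to go live unless every server reports the same expected build.

```python
def find_drift(deployed: dict[str, str], expected: str) -> list[str]:
    """Return the hosts whose reported version differs from the expected one."""
    return sorted(host for host, version in deployed.items() if version != expected)


def verify_rollout(deployed: dict[str, str], expected: str) -> None:
    """Raise instead of proceeding if any host is not on the expected version."""
    stale = find_drift(deployed, expected)
    if stale:
        raise RuntimeError(f"rollout incomplete, stale hosts: {', '.join(stale)}")


# Illustrative data only: eight made-up hosts, one left on the old build.
reported = {f"server-{i}": "2.0" for i in range(1, 9)}
reported["server-8"] = "1.0"
print(find_drift(reported, expected="2.0"))  # → ['server-8']
```

Run as a gate at the end of a deployment, verify_rollout turns "someone forgot a server" from a silent condition into a loud one.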
Automating Discovery
So what is the opposite of this? The opposite of passively hoping a manual process doesn't fail is to aggressively and systematically hunt for failure before it matters. The opposite of being destroyed by hidden risk is to drag that risk out into the open. This brings us to Netflix.
When Netflix moved to the cloud, they knew that their new environment was inherently chaotic. Servers could fail at any time. The old way of handling this would be another manual process: a tired engineer gets paged at 3 AM and frantically tries to fix a problem they didn't expect. This is the same pattern as Knight Capital, just a different scenario.
Instead of accepting this, Netflix chose to automate the process of finding failure itself. They built a tool called Chaos Monkey. It runs through their live production environment and randomly shuts down servers. It replaces the unpredictable 3 AM failure with a predictable, constant state of chaos during business hours.
The purpose was to make failure normal. It forced engineers to build systems that could survive it automatically, without any human intervention. They didn't just automate a chore; they automated away an entire category of fear. While Knight Capital was destroyed by a risk they ignored, Netflix thrives by automating the very discovery of risk. One company was a victim of fragility; the other manufactures resilience.
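The core idea is simple enough to sketch in a few lines. This is not Netflix's implementation. The function names, the 9-to-5 weekday window, and the terminate callback are all hypothetical stand-ins for whatever actually stops a server in your environment. It only shows the shape of the idea: pick a random victim, but only when people are at their desks to see what breaks.

```python
import random
from datetime import datetime, time
from typing import Callable, Optional, Sequence


def in_business_hours(now: datetime, start: time = time(9), end: time = time(17)) -> bool:
    """True on weekdays between start and end. The window is an assumption."""
    return now.weekday() < 5 and start <= now.time() < end


def unleash(
    instances: Sequence[str],
    terminate: Callable[[str], None],
    now: datetime,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Terminate one randomly chosen instance, but only during business hours.

    Returns the chosen instance, or None if nothing was terminated.
    """
    if not instances or not in_business_hours(now):
        return None
    victim = (rng or random).choice(instances)
    terminate(victim)
    return victim
```

Scheduled to run repeatedly, something like unleash(running_instances, stop_instance, datetime.now()) makes failure a routine event rather than a 3 AM surprise. Both running_instances and stop_instance are placeholders you would supply.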
The Real Work
This leads to a different way of thinking about laziness. The common view is that a lazy developer cuts corners. But the truly lazy developer is the one who tolerates the drag of manual work because they tell themselves they do not have time to automate it.
That is a false economy. It is like saying you do not have time to sharpen an axe because you are too busy chopping down a tree. You must invest the effort upfront to build a system that saves you from thousands of future actions and countless small worries. You must build a machine to do the work for you. A good machine runs itself. And that is what gives you the freedom to do your best work.