April 27, 2026 · 8 min

Why Your CI/CD Pipeline Has 47 Steps and Nobody Knows Why

Patrick McClory

The pipeline doesn't have 47 steps because 47 things need to happen. It has 47 steps because trust eroded over time and every erosion event got a new step added on top. The steps aren't doing work. They're doing anxiety.

automation platform-engineering engineering-culture consulting


Nobody designed a 47-step pipeline.

It started with something reasonable: build, test, deploy. Maybe five steps. Maybe eight. Everything in there for a clear reason, owned by someone who could explain it.

Then things happened. An incident in production that could have been caught earlier. A compliance requirement from someone in legal. A new tool that the security team wanted integrated. A flaky test that kept failing, so someone added a retry wrapper. A deployment that went wrong, so someone added a validation step after it. A validation step that wasn’t trustworthy, so someone added another validation step to validate the validation step.

Each addition made sense in the moment. Nobody ever removed anything. Three years later you have 47 steps, two people who understand maybe half of them, and a pipeline that takes 40 minutes to run and breaks in ways that require tribal knowledge to debug.

This is not a technical problem. It’s an organizational one.

The Jenkins Job That Validated Jenkins

I once found a Jenkins job whose entire purpose was to validate that 35 other jobs had succeeded.

Not to do anything. Not to produce an artifact or run a test or deploy anything. Just to confirm that the 35 upstream jobs had passed before allowing the pipeline to proceed.

Jenkins already gates on individual job success. Jenkins already has cumulative workflow stage gating. The built-in tooling handles exactly this case. Someone didn’t trust it. They demonstrated that distrust by writing their own validation layer outside the process. Then they kept the built-in gates too, because removing them also felt risky.

The result was three layers of gating: the individual job success requirements, the workflow stage gates, and the custom validation job. All three had to pass. None of them could be removed because each removal felt like a risk someone wasn’t willing to own. The person who wrote the custom job had left. Nobody knew exactly what it was checking or why the built-in gates weren’t sufficient.
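A toy model makes the redundancy concrete. This is Python, not Jenkins configuration, and the stage names are hypothetical. If stages run in order and the pipeline stops at the first failure, a validation job placed after them is only ever reached when everything upstream has already passed, so it can only ever answer yes.

```python
from typing import Callable, Optional

# A stage is modeled as a (name, callable) pair; the callable returns True on success.
# Stage names used below are hypothetical, not taken from any real pipeline.
Stage = tuple[str, Callable[[], bool]]


def custom_validation_job(results: dict[str, bool]) -> bool:
    """The extra layer: re-check that every upstream stage passed."""
    return all(results.values())


def run_pipeline(stages: list[Stage]) -> tuple[dict[str, bool], Optional[bool]]:
    """Run stages in order with built-in gating (stop at the first failure).

    Returns the per-stage results and the custom validator's verdict,
    or None if gating stopped the pipeline before the validator ran.
    """
    results: dict[str, bool] = {}
    for name, stage in stages:
        results[name] = stage()
        if not results[name]:
            return results, None  # built-in gate: nothing downstream runs
    # Reaching this line already implies every stage returned True,
    # so the custom validator can only ever answer True.
    return results, custom_validation_job(results)


if __name__ == "__main__":
    passing = [("build", lambda: True), ("unit-tests", lambda: True)]
    failing = [("build", lambda: True), ("unit-tests", lambda: False), ("deploy", lambda: True)]
    print(run_pipeline(passing))  # ({'build': True, 'unit-tests': True}, True)
    print(run_pipeline(failing))  # ({'build': True, 'unit-tests': False}, None)
```

The only way a job like that adds information is by checking something the built-in gates don’t, and in this story nobody could say what that was.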

That job ran on every pipeline execution for three years after the person who wrote it left the company. It added time to every deployment. It occasionally failed in ways that required manual intervention to understand. It existed because someone, at some point, didn’t trust the system they were using, and demonstrated it by writing a parallel system, outside the process, that also couldn’t be trusted enough to replace the original.

Not trusting your tooling and proving it by writing your own gates outside the process just shows you’re not using the tooling correctly. The right answer to “I don’t trust Jenkins to gate on job success” is to understand why Jenkins isn’t gating correctly and fix it. Not to add a layer on top.

But here’s the more honest version of what happened: it probably wasn’t a deliberate decision not to trust Jenkins. It was a lack of understanding deep enough to trust it. Understanding your own code feels safer than understanding someone else’s tooling. So the engineer built something they understood, outside the system, because that felt more controllable than learning the system well enough to rely on it.

Then they left. The person who inherited the custom validation job didn’t understand it either. Now there’s no understanding of Jenkins and no understanding of the custom gate: just two black boxes that both have to pass, neither of which anyone can confidently explain or remove.

Lack of understanding breeds lack of trust. Lack of trust breeds more steps. More steps breed more complexity. More complexity breeds less understanding. The pipeline accumulates in a loop, and each turn of the loop makes the next turn more likely.

How Pipelines Accumulate Steps

Every step in a legacy pipeline has an origin story. Most of them follow one of a few patterns.

The incident step. Something went wrong in production. A post-mortem produced action items. One of the action items was “add a check for X before deployment.” The check got added. The underlying problem that made the check necessary either got fixed or got forgotten. The check stayed.

The compliance step. Someone from legal or security showed up with a requirement. The path of least resistance was to add a step that satisfied the requirement visibly, even if the requirement could have been met more elegantly by changing how existing steps worked. The compliance step runs. The compliance team sees it running. Nobody asks whether it’s actually doing the thing it’s supposed to do.

The distrust step. Someone didn’t trust a step that already existed. Instead of fixing the untrusted step, they added a new step to verify it. Now both steps run. Neither can be removed without the other feeling more exposed.

The legacy step. A tool or process that used to be necessary got replaced. The step that invoked the old tool or process didn’t get removed because the replacement was added alongside it rather than in place of it. The legacy step either does nothing useful or invokes something that still runs but produces output nobody reads.

The mystery step. Nobody knows. The person who added it left. The commit message says “fix pipeline.” It runs. It passes. Nobody touches it.

Most pipelines have all five. Some pipelines are majority mystery steps at this point.

The Real Cost

The obvious cost is time. A 40-minute pipeline that could be 12 minutes costs the engineering team real productivity. Every run, every day, across every engineer who triggers it.
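To put rough numbers on it: the 40 and 12 minutes come from the example above, while the run rate and team size below are hypothetical placeholders. The calculation also makes a simplifying assumption that every extra minute of every run is fully lost to the engineer who triggered it.

```python
def weekly_hours_lost(current_min: float, target_min: float,
                      runs_per_engineer_per_day: float, engineers: int,
                      workdays_per_week: int = 5) -> float:
    """Engineer-hours per week spent on the extra pipeline time.

    Simplifying assumption: every run's extra minutes are fully lost,
    i.e. the engineer who triggered it waits for it to finish.
    """
    extra_min = current_min - target_min
    runs_per_week = runs_per_engineer_per_day * engineers * workdays_per_week
    return extra_min * runs_per_week / 60


if __name__ == "__main__":
    # 40 and 12 minutes are from the article; 3 runs/day and 10 engineers are made up.
    print(weekly_hours_lost(40, 12, runs_per_engineer_per_day=3, engineers=10))  # 70.0
```

Under those made-up inputs, the 28 extra minutes per run add up to 70 engineer-hours a week.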

The less obvious cost is cognitive load. A pipeline with 47 steps that nobody fully understands is a system that engineers don’t trust and can’t confidently modify. When it breaks, and it will break, debugging it requires tribal knowledge that may not exist anymore. When it needs to change, and it will need to change, every modification feels risky because nobody knows what’s load-bearing.

The least obvious cost is the signal it sends about the organization. A pipeline that nobody understands is a pipeline that nobody owns. A pipeline that nobody owns is a platform that’s failing its users. The engineers who depend on it have learned that the deployment process is something that happens to them, not something they control. That learned helplessness compounds.

The 47-step pipeline isn’t just slow. It’s a symptom of an organization that stopped maintaining its own infrastructure.

The Archaeology

Before you touch anything, you have to understand what you have.

The right approach is pipeline archaeology: going through every step, in order, and answering three questions for each one. What does this step do? Why does it exist? What happens if we remove it?

The answers fall into categories. Some steps are doing real work that needs to happen. Some steps are doing work that used to need to happen and no longer does. Some steps are doing work that should be happening differently. Some steps are doing nothing but have never been verified to do nothing. Some steps are unknowable without more investigation.
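One way to keep the archaeology honest is to record the three answers for each step and refuse to classify a step until all three exist. A minimal sketch: the five findings mirror the categories above, while the enum names, field names, and example steps are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Finding(Enum):
    """The five outcomes the archaeology can produce for a step."""
    REAL_WORK = "doing real work that needs to happen"
    OBSOLETE = "work that used to be needed and no longer is"
    WRONG_SHAPE = "work that should be happening differently"
    UNVERIFIED_NOOP = "apparently doing nothing, never verified"
    UNKNOWN = "unknowable without more investigation"


@dataclass
class StepRecord:
    name: str
    what_it_does: Optional[str] = None   # question 1
    why_it_exists: Optional[str] = None  # question 2
    if_removed: Optional[str] = None     # question 3
    finding: Optional[Finding] = None    # the reviewer's classification


def triage(steps: list[StepRecord]) -> dict[Finding, list[str]]:
    """Group step names by finding.

    A step missing any of the three answers, or with no finding recorded,
    is filed as UNKNOWN instead of getting the benefit of the doubt.
    """
    groups: dict[Finding, list[str]] = {f: [] for f in Finding}
    for s in steps:
        answered = all([s.what_it_does, s.why_it_exists, s.if_removed])
        finding = s.finding if (answered and s.finding) else Finding.UNKNOWN
        groups[finding].append(s.name)
    return groups


if __name__ == "__main__":
    inventory = [
        StepRecord("unit-tests", "runs the test suite", "catches regressions",
                   "regressions reach production", Finding.REAL_WORK),
        StepRecord("validate-upstream", "re-checks upstream job status"),
    ]
    for finding, names in triage(inventory).items():
        if names:
            print(f"{finding.value}: {names}")
```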

The archaeology takes time. It’s not glamorous work. It requires reading old commit messages, talking to people who remember why things were built the way they were, and occasionally just running the pipeline without a step to see what breaks. It’s the kind of work that never makes it onto a roadmap because it doesn’t ship a feature. It just makes everything else faster and safer.
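The “run it without a step and see what breaks” experiment can be made systematic as a leave-one-out pass. This is a sketch under loud assumptions: each step is a hypothetical Python function that writes to a shared state dict, and “what breaks” is approximated as “the final state differs.” Real pipelines have side effects this model ignores.

```python
from typing import Callable

# Hypothetical model: a step is a (name, function) pair, and the function
# records whatever it produces (artifacts, test results) in a shared dict.
Step = tuple[str, Callable[[dict], None]]


def run(steps: list[Step]) -> dict:
    """Run each step against a fresh shared state dict and return the final state."""
    state: dict = {}
    for _, step in steps:
        step(state)
    return state


def leave_one_out(steps: list[Step]) -> dict[str, bool]:
    """For each step, report whether removing it changes the final state.

    False means the run looked identical without the step: a candidate
    for removal, not proof that removal is safe.
    """
    baseline = run(steps)
    changed = {}
    for i, (name, _) in enumerate(steps):
        without = steps[:i] + steps[i + 1:]
        changed[name] = run(without) != baseline
    return changed


if __name__ == "__main__":
    def build(state): state["artifact"] = "app.tar"
    def unit_tests(state): state["tests"] = "passed"
    def legacy_notify(state): pass  # calls something whose output nobody reads

    steps = [("build", build), ("unit-tests", unit_tests), ("legacy-notify", legacy_notify)]
    print(leave_one_out(steps))
    # {'build': True, 'unit-tests': True, 'legacy-notify': False}
```

One caution the model makes visible: a gate that only matters when something upstream fails looks like a no-op on a green run. That is exactly the “never been verified to do nothing” category, so a False here is evidence to follow up on, not permission to delete.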

It’s also the work that has to happen before anything can improve. You cannot optimize a pipeline you don’t understand. You cannot safely remove steps you haven’t verified. The archaeology is the prerequisite.

What Good Looks Like

A healthy pipeline is one where every step has a clear owner, a clear purpose, and a clear answer to “what happens if this fails.”

That doesn’t mean short. Some pipelines legitimately need many steps. It means every step is justified and understood by the people who operate the pipeline. The test suite runs because the tests catch real bugs. The security scan runs because it’s calibrated to flag real issues. The deployment validation runs because it’s checking something the deployment step itself can’t verify.

It also means the pipeline evolves. Steps get removed when the conditions that required them change. Tools get upgraded. Flaky tests get fixed rather than wrapped in retries. The compliance requirement that could be met more elegantly gets met more elegantly when someone has time to do it right.
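If a retry wrapper has to exist while a flaky test waits for its fix, it can at least keep the signal it would otherwise swallow. A sketch, with `run_with_retries` and `make_flaky` as hypothetical helpers: the wrapper reports how many failures preceded the pass, so a flake shows up as a number someone can act on instead of a green check.

```python
from typing import Callable


def run_with_retries(test: Callable[[], bool], attempts: int = 3) -> tuple[bool, int]:
    """Run `test` up to `attempts` times.

    Returns (passed, failures_before_result). A plain retry wrapper would
    return only `passed`, which is how a flaky test disappears from view.
    """
    failures = 0
    for _ in range(attempts):
        if test():
            return True, failures
        failures += 1
    return False, failures


def make_flaky(fail_times: int) -> Callable[[], bool]:
    """Hypothetical deterministic 'flaky' test: fails its first `fail_times` calls."""
    calls = {"n": 0}

    def test() -> bool:
        calls["n"] += 1
        return calls["n"] > fail_times

    return test


if __name__ == "__main__":
    print(run_with_retries(make_flaky(2)))  # (True, 2): passed, but flaked twice
    print(run_with_retries(make_flaky(5)))  # (False, 3): out of attempts
```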

A pipeline that nobody changes is a pipeline that’s accumulating the conditions for the next 47-step problem. The pipeline should be a living system, owned by the people who use it, maintained with the same discipline as the software it ships.
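The definition of healthy at the start of this section, where every step has an owner, a purpose, and an answer to “what happens if this fails,” can be checked mechanically. A sketch assuming a hypothetical manifest format where each step declares those three things; a check like this could block a new step from merging until none of them is blank.

```python
from dataclasses import dataclass


@dataclass
class StepSpec:
    """One entry in a hypothetical pipeline manifest."""
    name: str
    owner: str = ""
    purpose: str = ""
    on_failure: str = ""  # what happens, and who acts, when this step fails


def unjustified_steps(manifest: list[StepSpec]) -> dict[str, list[str]]:
    """Map each step missing an owner, purpose, or failure answer
    to the names of the fields it is missing."""
    problems: dict[str, list[str]] = {}
    for step in manifest:
        missing = [attr for attr in ("owner", "purpose", "on_failure")
                   if not getattr(step, attr).strip()]
        if missing:
            problems[step.name] = missing
    return problems


if __name__ == "__main__":
    manifest = [
        StepSpec("unit-tests", owner="app-team", purpose="catch regressions",
                 on_failure="block the merge; app-team triages"),
        StepSpec("mystery-step", purpose="fix pipeline"),
    ]
    print(unjustified_steps(manifest))  # {'mystery-step': ['owner', 'on_failure']}
```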

The Honest Question

If you run your pipeline archaeology and discover that you don’t know what half the steps do, the honest question isn’t “how do we clean this up.” It’s “how did we get here.”

The answer is almost always the same: the pipeline grew faster than the team’s ability to maintain it, and maintenance never got prioritized because it didn’t ship features. Every incident added steps. Nothing ever removed them. The people who understood it left. The people who inherited it learned to work around the parts they didn’t understand.

The cleanup is tractable. But the conditions that produced the 47-step pipeline are still present. Without addressing those conditions (ownership, maintenance culture, and the expectation that pipelines get cleaned up the same way code does), the cleaned-up pipeline will accumulate back to 47 steps within a year or two.

The pipeline is a mirror. What it reflects is how the organization treats its own infrastructure. The 47 steps didn’t appear by accident. They accreted, one at a time, because nobody was watching and nobody was removing. Fix the pipeline. Then fix the conditions that produced it.
