March 22, 2026 · 6 min

Bootstrapping Network Gear You've Never Touched Before

Patrick McClory

The on-site session was two and a half hours. Rack, cable, verify, leave. That wasn't luck. The bootstrap happened weeks earlier at a desk, not in the rack.

networking infrastructure automation operations

Bootstrapping Network Gear You’ve Never Touched Before

The on-site session took two and a half hours. Three switches, fifty-odd connections, rack it, cable it, power it up, verify admin access, leave. Clean. No drama.

That wasn’t luck. And it wasn’t because the hardware was familiar.

The bootstrap happened weeks earlier, at a desk, before I’d touched any of the physical equipment. By the time I walked into that datacenter the hard work was done. The on-site session wasn’t the bootstrap. It was the verification.

Buy to the Design, Not to the Minimum

The hardware selection process started with the network design, not the other way around. The topology was already defined, management plane separation, storage network isolation, cluster traffic, public ingress, before a single purchase order was placed. The hardware had to fit the design, not the other way around.

That produced some decisions that look overkill on paper. The admin switching layer needed its own physical path because that’s how real production environments model it, and modeling it correctly matters more than optimizing port count. The right switch for that role happened to be an Arista DCS-7050TX, which ships with 96 RJ45 ports and 40 QSFP ports. For ten machines. The extra ports are a bonus. The capability and remote configurability are why it’s there.

The routing role went to a MikroTik on a Dell R630. The decision criteria were the same: capability against the design requirements, clean remote management, automation-friendly API. The refurb enterprise market made both accessible at a fraction of original cost. The decisions were made against an abstracted design that existed entirely in config files and diagrams at the time.

This is how you buy network hardware for a platform you’re serious about. You define the roles first. You define what automation-first means for each of them. Then you find the hardware that fits. The alternative, buying to the minimum viable physical setup and then designing around what you have, produces environments that accumulate constraints you didn’t choose and can’t easily escape.

The Vendor Translation Problem

I’d worked with Junos and Cisco before. EOS was simple to pick up. The mental model maps cleanly from either, the CLI is sensible, and the Ansible arista.eos collection is well-documented. At some point core network capabilities are core capabilities across vendors. Routing is routing. VLANs are VLANs. What changes is how a specific platform expresses those concepts and how you automate against it.

MikroTik is a different animal. RouterOS has its own way of thinking about interface relationships, routing tables, and firewall rules. The shell is funky. Designed for a different kind of operator than I am, with colors and session behavior that doesn’t behave the way you’d expect coming from a Linux or EOS background. You can maintain a connection and keep a session while getting logged out. It’s a choice someone made and I’m still not sure why.

In practice, none of this mattered much. Ansible kept me away from the shell for most of the work. The community.routeros collection uses api_modify tasks that express desired state declaratively. I cared about network routes, port relationships, and the structure of the thing. I didn’t need to care about the CLI quirks as long as the automation was right. Get it right once, make it parameter-driven, and lather rinse repeat.

That’s the practical payoff of automation-first as a design principle. It’s not just about consistency and repeatability. It’s also about not having to deeply care about every vendor’s idiosyncratic interface choices. The shell becomes an implementation detail you rarely have to touch.

The Bootstrap Happened at a Desk

The weeks before the on-site session were the real work. Translating the abstract network design into vendor-specific Ansible roles. Understanding how RouterOS structures its data model and where it differs from what you’d expect coming from EOS. Getting the community.routeros api_modify tasks right. Parameterizing the interface bindings so physical variance could be handled at the variable layer rather than requiring config changes throughout.

The interface-not-where-the-map-said problem, documented in [that post], was solved during this phase, not on-site. The design assumed an interface named WAN. The hardware had ether8. The fix was a one-line variable change in the right place because the right place had been built to accept that change. That’s what weeks of desk work buys you.

By the time I drove to the datacenter, the config was written. The automation was tested. The definition of done was defined. The on-site session had one job: close the gap between the abstraction and the physical reality.

What Two and a Half Hours Actually Means

Rack, cable, inventory what’s connected where, power everything up, verify admin access. That’s most of it. The last half hour was port mapping cleanup and coordinating with the provider in chat to get public IPMI access configured for the router and verify IPv6 on the public range.

The temptation at the end was to leave early. The IPMI coordination was taking time. The provider was catching up on their end. I could have left, handled it remotely, and called it done enough. I didn’t.

The definition of minimum viable initial config was specific: all network devices up and available, VLAN 1 only, SSH to each device confirmed working, IPMI solid on the R630 router. Those were the criteria. Not a vague sense of “done enough”. a named, testable list that someone else could verify independently.

The temptation at the end was to leave before the IPMI coordination was complete. The provider was catching up on their end. I could have handled it remotely, called it done, and been back on the road earlier. I didn’t. Not because leaving would have been wrong technically. Because the criteria said IPMI solid, and it wasn’t solid yet.

That discipline is easy to describe and harder to maintain than it sounds. But I want to be clear about what the discipline is actually for. Because it isn’t heroism and it isn’t about toughing things out.

Heroics are a signal. When an on-site session requires improvisation, extended presence, or exceptional effort to recover from something unexpected, that’s information about what went wrong upstream. Maybe the preparation was insufficient. Maybe the design had assumptions that didn’t survive contact with reality. Maybe the definition of done was fuzzy enough that the end state was negotiable.

I’m capable of heroics. Most experienced engineers are. But I build and design and operate specifically to minimize the situations that require them. The goal isn’t to be impressive under pressure. The goal is to make pressure unnecessary. An unremarkable two-and-a-half-hour session is a better outcome than a dramatic six-hour recovery, even if the six-hour recovery makes a better story.

The specific criteria exist for exactly this reason. When the standard is named and testable, there’s no negotiation about whether you’re done. The IPMI wasn’t solid. I stayed. When it was solid, I left. No heroics required.

What “Never Touched Before” Actually Means

I hadn’t run RouterOS before this build. I hadn’t operated this specific Arista model. The on-site session was the first time I’d physically touched any of this hardware.

None of that was a problem, for reasons that are worth naming clearly.

Core network capabilities are core capabilities. The concepts transfer across vendors. What doesn’t transfer is the specific expression of those concepts in a given platform’s data model and CLI. That’s a translation problem, not a knowledge problem. Spending time with the documentation, building the automation against the API, and testing it before you’re on-site closes most of the gap.

The rest gets closed by the design being right before the hardware arrives. If the abstraction layer is built correctly, patterns stable above the seam and physical specifics contained below it, then hardware variance and vendor idiosyncrasy are pipe fitting problems, not design problems. The pattern holds. You’re just finding where the interfaces actually live and wiring them up.

Two and a half hours. Rack, cable, verify, leave. That’s what it looks like when the bootstrap happens before the on-site session instead of during it.

Boring installs. Boring deploys. Boring software updates. That’s the goal. If the on-site session is a story worth telling, something probably went wrong upstream.

← back to all writing