Managing 600 devices with a small team: standardization lessons for ISPs
Four engineers. Six hundred twenty network devices. Three provinces. A NOC that runs 24/7 with rotating shifts.
This is not hypothetical. It is the operational profile of several regional ISPs in Latin America we work with at Ayuda.LA. The question we hear most when we describe it is always the same: how is that possible with so few people?
The answer is not that those engineers work 80-hour weeks or are extraordinarily capable. The answer is that their network is standardized in a way most comparable-size networks are not.
The scale problem nobody names
When an ISP has 50 devices, heterogeneity is manageable. An engineer knows from memory each router’s firmware version, why that distribution switch in the north zone has a different configuration from the others, and the manual procedure to provision a new customer on each device type.
When that network grows to 300 or 600 devices, that heterogeneity becomes a serious operational problem:
Every device becomes a special case. No two devices are configured exactly alike. Every change requires reviewing individual configuration before running anything. Knowledge lives in people’s heads, not in systems.
Incident resolution scales poorly. An engineer resolving an incident on a known device may finish in 15 minutes. The same incident on a non-standard device can take two hours, because diagnosis first requires understanding how that specific box is configured.
Automation is nearly impossible. Writing a script that works across 600 devices with 40 configuration variants is a project in itself, more complex than managing devices by hand.
Standardization is not an optional improvement project. It is the precondition for operating at scale with a small team.
The five levels of standardization
Level 1: Vendor and model standardization
The first boundary a growing ISP should draw is how many vendors and hardware models it will support at once. Each additional vendor multiplies operational load:
- A separate skill set for certifications and support
- Different management tools (or adaptations so generic tools work with that vendor)
- Spare parts stock per model
- Separate firmware upgrade procedures
- A learning curve for every new team member
Practical recommendation: define a set of “chosen vendors” (sometimes called an approved hardware list) with no more than two vendors per network role (core, distribution, access, CPE). Anything outside that list is managed as an exception with a planned end-of-life date.
This does not mean never having multiple vendors — in legacy networks, that is inevitable. It means an active policy to reduce variety, not accumulate it.
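As a minimal sketch of how an approved hardware list could be enforced, the snippet below checks an inventory against a per-role vendor list and groups everything else as exceptions. The list contents, the record fields, and the non-approved vendor are illustrative assumptions, not a recommendation of specific vendors.

```python
from collections import defaultdict

# Hypothetical approved hardware list: role -> allowed vendors.
APPROVED_VENDORS = {
    "core": {"Huawei", "Juniper"},
    "distribution": {"Calix", "Huawei"},
    "access": {"TP-Link", "Huawei"},
}
MAX_VENDORS_PER_ROLE = 2  # the "no more than two vendors per role" rule


def check_policy(approved):
    """Return the roles whose approved list exceeds the vendor limit."""
    return [role for role, vendors in approved.items()
            if len(vendors) > MAX_VENDORS_PER_ROLE]


def find_exceptions(devices, approved):
    """Group device names by role when their vendor is not approved for it."""
    exceptions = defaultdict(list)
    for dev in devices:
        if dev["vendor"] not in approved.get(dev["role"], set()):
            exceptions[dev["role"]].append(dev["name"])
    return dict(exceptions)


# Illustrative inventory records (field names are assumptions).
devices = [
    {"name": "bzb-core-hw-01", "role": "core", "vendor": "Huawei"},
    {"name": "bzb-core-mt-02", "role": "core", "vendor": "MikroTik"},
    {"name": "bzb-acc-tp-47", "role": "access", "vendor": "TP-Link"},
]

policy_violations = check_policy(APPROVED_VENDORS)
exceptions = find_exceptions(devices, APPROVED_VENDORS)
print(policy_violations, exceptions)
```

Each entry in `exceptions` is a candidate for the planned end-of-life date mentioned above.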
Level 2: Software version standardization
Within each vendor and model, the number of simultaneous firmware/OS versions in production should be minimal. The goal is all devices of the same model on the same version (or one minor patch apart).
Why it matters: firmware bugs that cause odd behavior are common on network gear. With eight different firmware versions in production, diagnosing whether a problem is configuration or firmware means checking whether the bug exists in that specific version. With one or two active versions, that diagnosis is immediate.
How to implement it: run a periodic “version standardization” process (semiannual or annual) that sets the target version per platform and plans upgrade rollout. The process must include lab testing before production and a per-device rollback plan.
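The version review can start from a simple report like the sketch below, which counts how many firmware versions are live per model and lists the devices that are off target. Model names, version strings, and record fields are placeholders chosen for illustration.

```python
from collections import defaultdict

# Hypothetical target version per model, as set by the periodic review.
TARGET_VERSIONS = {"model-a": "7.2.1", "model-b": "3.0.4"}


def versions_per_model(devices):
    """Count the distinct firmware versions in production for each model."""
    seen = defaultdict(set)
    for dev in devices:
        seen[dev["model"]].add(dev["version"])
    return {model: len(versions) for model, versions in seen.items()}


def off_target(devices, targets):
    """List devices whose version differs from the target for their model.

    Devices whose model has no target defined are also reported.
    """
    return [d["name"] for d in devices if targets.get(d["model"]) != d["version"]]


# Illustrative inventory (names follow a hypothetical naming scheme).
devices = [
    {"name": "bzb-core-hw-01", "model": "model-a", "version": "7.2.1"},
    {"name": "bzb-core-hw-02", "model": "model-a", "version": "7.1.0"},
    {"name": "bzb-dist-ck-01", "model": "model-b", "version": "3.0.4"},
]

spread = versions_per_model(devices)
upgrade_queue = off_target(devices, TARGET_VERSIONS)
print(spread, upgrade_queue)
```

The `upgrade_queue` list is only the input to the rollout plan. Lab testing and the per-device rollback plan still have to happen outside a script like this.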
Level 3: Configuration standardization (golden config)
Each device type has a base configuration (golden config) that contains all organizational policies: enabled routing protocols, authentication policies, logging, QoS parameters, management lines.
Each device’s specific configuration (name, IPs, connected interfaces) is generated from that base template plus that device’s variable parameters.
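To make the "template plus variables" idea concrete, here is a small sketch that uses Python's standard-library `string.Template` as a stand-in for a Jinja2 template. The configuration syntax is generic rather than any vendor's real CLI, and the addresses and variable names are assumptions.

```python
from string import Template

# Golden config for one device type. Organization-wide policy and
# per-device values are both placeholders in the same template.
GOLDEN_ACCESS = Template(
    "hostname $hostname\n"
    "logging host $syslog_server\n"
    "ntp server $ntp_server\n"
    "interface $mgmt_interface\n"
    " ip address $mgmt_ip\n"
)

# Values shared by every device (documentation-range addresses).
ORG_POLICY = {"syslog_server": "192.0.2.10", "ntp_server": "192.0.2.20"}


def render_config(template, org_policy, device_vars):
    """Render a device config from the golden template.

    substitute() raises KeyError when a placeholder has no value, so an
    incomplete device record fails loudly instead of producing a partial config.
    """
    return template.substitute({**org_policy, **device_vars})


device_vars = {
    "hostname": "bzb-acc-tp-47",
    "mgmt_interface": "mgmt0",
    "mgmt_ip": "10.20.0.47/24",
}
rendered = render_config(GOLDEN_ACCESS, ORG_POLICY, device_vars)
print(rendered)
```

In a real deployment the template would normally live in Git next to the variables, so every change to organizational policy is reviewed once and then rendered for all devices of that type.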
Immediate operational benefit: when a deviation from standard configuration is found (undocumented manual change, incident, accumulated drift), the team has a clear reference to compare against. A compliance validation tool does not need to “understand” the network — it only needs to compare.
Tools: Ansible with Jinja2 templates is a common choice for generating configuration from templates. Nornir can run a comparison task across the whole inventory to detect drift automatically. Git is enough to store versioned configurations.
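The "it only needs to compare" point can be illustrated with nothing more than a text diff. The sketch below uses Python's standard `difflib`, not Nornir or Ansible, and the two configurations are invented examples. How the running configuration is collected from the device is out of scope here.

```python
import difflib


def config_drift(intended, running):
    """Return unified-diff lines between intended and running config.

    An empty list means the device matches its golden configuration.
    """
    return list(difflib.unified_diff(
        intended.splitlines(),
        running.splitlines(),
        fromfile="intended",
        tofile="running",
        lineterm="",
    ))


# Illustrative configs: someone changed the syslog server by hand.
intended = "hostname bzb-acc-tp-47\nlogging host 192.0.2.10\n"
running = "hostname bzb-acc-tp-47\nlogging host 198.51.100.5\n"

drift = config_drift(intended, running)
for line in drift:
    print(line)
```

Run on a schedule across the inventory, a check like this turns undocumented manual changes into a short list of diffs to review.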
Level 4: Naming and numbering standardization
How devices, interfaces, prefixes, and circuit IDs are named directly affects diagnosis speed during incidents.
A well-designed naming scheme lets an engineer who has never seen a device infer from its name: which city it is in, its role (core/distribution/access), which POPs it connects to, and which high-availability cluster it belongs to.
Example scheme:
[city]-[role]-[vendor]-[number]
bzb-core-hw-01 → Bariloche, core, Huawei, router 1
bzb-dist-ck-01 → Bariloche, distribution, Calix, switch 1
bzb-acc-tp-47 → Bariloche, access, TP-Link, OLT 47
The scheme must be documented, exhaustive, and applied to all new gear from the day it is defined. Legacy devices with inconsistent names migrate in cycles.
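A documented scheme is easiest to enforce when it is also machine-checkable. The sketch below expresses the example scheme as a regular expression. The exact field widths (three-letter city, two-letter vendor, two or three digits) and the set of role codes are assumptions inferred from the three sample names.

```python
import re

# Assumed encoding of the example scheme: city-role-vendor-number.
HOSTNAME_RE = re.compile(
    r"^(?P<city>[a-z]{3})-"
    r"(?P<role>core|dist|acc)-"
    r"(?P<vendor>[a-z]{2})-"
    r"(?P<number>\d{2,3})$"
)


def parse_hostname(name):
    """Return the name's fields as a dict, or None if it breaks the scheme."""
    match = HOSTNAME_RE.match(name)
    return match.groupdict() if match else None


def non_conforming(names):
    """List the hostnames that still need to be migrated to the scheme."""
    return [n for n in names if parse_hostname(n) is None]


names = ["bzb-core-hw-01", "bzb-acc-tp-47", "Router-Norte-2"]
legacy = non_conforming(names)
print(parse_hostname("bzb-core-hw-01"), legacy)
```

The `legacy` list is a natural work queue for the migration cycles mentioned above.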
Level 5: Operational procedure standardization
The most ignored level: procedures for frequent operational tasks must be written, versioned, and available to the whole team.
This includes:
- Provisioning new customers (step by step, by service type)
- Firmware updates (by vendor and model)
- Response to common incidents (BGP session down, interface down, upstream loss)
- Hardware replacement procedure
When procedures are documented, a junior engineer can run complex operations with minimal supervision. Critical knowledge stops being concentrated in one or two people.
The tool that makes it all work: NetBox as source of truth
None of the five standardization levels works without a centralized source of truth. NetBox is today the de facto standard in modern ISPs for that role.
NetBox stores:
- Device inventory (vendor, model, firmware version, location)
- IP numbering (prefixes, assigned IPs, management IPs)
- Network topology (device connections, circuits, providers)
- Provisioning data (configurations, active services per device)
With NetBox well populated, automation scripts consume inventory from the API instead of static files. When a device is replaced or an IP changes, updating NetBox is enough: every system that reads its data from the API picks up the change the next time it queries.
Critical requirement: NetBox must be the source of truth, not a catalog someone updates when they remember. That takes discipline: no device joins the network without being in NetBox first, and no IP change is applied without updating NetBox first.
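A rough sketch of what "consume inventory from the API" can look like, using only the standard library. It assumes NetBox's REST endpoint `/api/dcim/devices/`, its paginated response with `results` and `next`, and the classic `Token` authorization header. Field names such as `role` versus `device_role` differ between NetBox versions, so check them against your own instance. The base URL and token are whatever your deployment uses.

```python
import json
import urllib.request


def fetch_devices(base_url, token):
    """Fetch all devices from a NetBox REST API, following pagination."""
    url = f"{base_url}/api/dcim/devices/?limit=200"
    devices = []
    while url:
        req = urllib.request.Request(url, headers={
            "Authorization": f"Token {token}",  # assumed token scheme
            "Accept": "application/json",
        })
        with urllib.request.urlopen(req) as resp:
            page = json.load(resp)
        devices.extend(page["results"])
        url = page["next"]  # None on the last page
    return devices


def to_inventory(devices):
    """Map device name -> connection data for automation scripts."""
    inventory = {}
    for dev in devices:
        ip = dev.get("primary_ip4")
        if not dev.get("name") or not ip:
            continue  # no name or management IP: automation cannot reach it
        role = dev.get("role") or dev.get("device_role") or {}
        inventory[dev["name"]] = {
            "host": ip["address"].split("/")[0],  # strip the prefix length
            "role": role.get("slug"),
        }
    return inventory


def not_registered(hostnames, inventory):
    """Pre-deployment gate: hostnames that are not in the source of truth."""
    return sorted(set(hostnames) - set(inventory))


# Records shaped like API results (trimmed to the fields used here).
sample = [
    {"name": "bzb-core-hw-01", "primary_ip4": {"address": "10.0.0.1/32"},
     "role": {"slug": "core"}},
    {"name": "bzb-acc-tp-47", "primary_ip4": None, "role": {"slug": "access"}},
]
inventory = to_inventory(sample)
blocked = not_registered(["bzb-core-hw-01", "bzb-dist-ck-01"], inventory)
print(inventory, blocked)
```

The `not_registered` check is one way to turn the discipline described above into a gate: a deployment job can refuse to touch any hostname that NetBox does not know about.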
The outcome: scale with a small team
An ISP that runs with these five standardization levels in place has qualitatively different operational capabilities:
Faster incident diagnosis. An engineer can understand any device’s context in seconds, without prior access to that specific box.
New engineer onboarding in days, not months. A new team member can start handling incidents after a week of training on documented procedures, without years of accumulated experience on that specific network.
Effective automation. With standardized configurations and inventory in NetBox, building an agent that operates on 600 devices is weeks of work, not months.
Operational predictability. Maintenance changes have predictable outcomes because devices are in known states. Incidents have predictable durations because resolution procedures are known.
Scale is not solved with more people — it is solved with less variety.
Where to start
If your network has more variety than your team can comfortably handle, standardization starts with two actions:
- Real inventory audit. Before standardizing, you need to know exactly what you have: how many vendors, models, firmware versions, distinct naming schemes. This audit usually shows the situation is more complex than believed.
- Target state definition. Which vendors, models, versions, and naming scheme you want in 18 months. Without a clear target, standardization becomes a series of ad-hoc decisions that do not converge.
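Both actions can be reduced to numbers that are tracked over time. The sketch below summarizes the variety found by an audit and compares it with a target state. The record fields, the sample data, and the target figures are all invented for illustration.

```python
def variety_summary(devices):
    """Count the distinct vendors, models and firmware variants in an audit."""
    return {
        "devices": len(devices),
        "vendors": len({d["vendor"] for d in devices}),
        "models": len({(d["vendor"], d["model"]) for d in devices}),
        "firmware_variants": len(
            {(d["vendor"], d["model"], d["version"]) for d in devices}
        ),
    }


def gap_to_target(summary, target):
    """Return how far each metric exceeds its target (only where it does)."""
    return {k: summary[k] - target[k] for k in target if summary[k] > target[k]}


# Illustrative audit result.
audit = [
    {"vendor": "Huawei", "model": "model-a", "version": "7.2.1"},
    {"vendor": "Huawei", "model": "model-a", "version": "7.1.0"},
    {"vendor": "Calix", "model": "model-b", "version": "3.0.4"},
    {"vendor": "TP-Link", "model": "model-c", "version": "1.9.0"},
]
# Hypothetical 18-month target state.
target = {"vendors": 2, "models": 3, "firmware_variants": 3}

summary = variety_summary(audit)
gap = gap_to_target(summary, target)
print(summary, gap)
```

Re-running the same summary every few months shows whether the decisions being made are actually converging on the target.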
At Ayuda.LA we run these audits and define standardization plans for ISPs in Latin America. The deliverable is not a document — it is an executable roadmap with measurable impact on team load.
Standardization is not a constraint on operational flexibility. It is what makes scale possible.