The Day the Core Goes Down: What Every ISP Should Have Ready
The Core Is the Heart of the ISP
The core is not just “a pair of routers.” It’s the point where everything converges:
- the access network
- the transport links
- critical services
- the Internet exit
- authentication and management systems
When the core fails, everything fails.
And if there’s no plan, the outage quickly turns into chaos.
What Usually Happens When There’s No Preparation
When the core goes down without a prior plan, the scenario is almost always the same:
- nobody knows exactly what failed
- changes are tried blindly
- configurations are touched in production
- the documentation is missing or out of date
- customers call before the NOC understands what’s happening
Time passes, pressure increases, and every bad decision worsens the impact.
The Right Question Is Not “If,” But “When”
Many ISPs design their network for normal operation, but not for the day something goes wrong.
The right question is not:
“Can the core fail?”
The real question is:
“What happens when it fails?”
And the answer should be written, tested, and known by the team.
What Every ISP Should Have Ready
1. Real Redundancy, Not Theoretical
It’s not enough to have two devices if:
- they’re in the same rack
- they depend on the same switch
- they use the same power
- they run the same configuration, and nobody has validated it
Redundancy must be electrical, physical, and logical, and it must be tested.
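One way to make this checkable is to list what each core device depends on and look for anything two devices have in common. The sketch below is only an illustration: the device names, attribute names, and values are hypothetical, and a real inventory would come from your own records.

```python
from collections import defaultdict

# Hypothetical inventory: every name and value here is illustrative.
CORE_DEVICES = {
    "core-a": {"rack": "R1", "power_feed": "PDU-1", "uplink_switch": "SW-1"},
    "core-b": {"rack": "R1", "power_feed": "PDU-2", "uplink_switch": "SW-1"},
}


def shared_failure_domains(devices):
    """Return {(attribute, value): [device names]} for values shared by 2+ devices."""
    groups = defaultdict(list)
    for name, attrs in devices.items():
        for attr, value in attrs.items():
            groups[(attr, value)].append(name)
    # Keep only the dependencies that more than one device relies on.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    for (attr, value), names in sorted(shared_failure_domains(CORE_DEVICES).items()):
        print(f"{attr}={value} is shared by {', '.join(names)}")
```

In this made-up inventory the two routers have separate power feeds but share a rack and an uplink switch, so the pair is redundant on paper only. A check like this covers the physical and electrical side; it does not replace an actual failover test.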
2. Clear, Up-to-Date Documentation
In an emergency, there’s no time to go looking for answers.
The documentation must already exist and quickly answer:
- real core topology
- role of each device
- critical dependencies
- failover paths
- emergency access
If documentation only lives in someone’s head, it’s not documentation.
3. Backups That Work (and Are Tested)
It’s not enough to “have backups.”
You need to know:
- where they are
- how old they are
- how they’re restored
- how long it takes to be operational again
A backup that was never tested is just an illusion of security.
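The questions above can be turned into a small audit that runs on a schedule. This is a minimal sketch under stated assumptions: the `BackupRecord` fields, the 7-day age limit, and the 90-day restore-test limit are all hypothetical and should be replaced with whatever your own policy says.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class BackupRecord:
    """Hypothetical metadata kept for each device backup."""
    device: str
    location: str                          # where the backup is stored
    taken_at: datetime                     # when the backup was taken
    last_restore_test: Optional[datetime]  # None means it was never restored
    restore_minutes: Optional[int]         # time measured in the last restore test


def audit_backup(record: BackupRecord, now: datetime,
                 max_age: timedelta = timedelta(days=7),
                 max_test_age: timedelta = timedelta(days=90)) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if now - record.taken_at > max_age:
        problems.append("backup is stale")
    if record.last_restore_test is None:
        problems.append("restore never tested")
    elif now - record.last_restore_test > max_test_age:
        problems.append("restore test is out of date")
    if record.restore_minutes is None:
        problems.append("restore time unknown")
    return problems
```

A record that fails any of these checks is exactly the kind of backup that looks fine until the day it is needed.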
4. Emergency Procedures
In a serious outage, the team needs clear answers:
- who makes decisions
- what to touch and what not to
- in what order to act
- when to escalate
- when to communicate
Procedures reduce errors and lower stress in critical moments.
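Part of such a procedure can be written down as data, so nobody has to decide under pressure who to call or when. The sketch below is only an example: the roles, the minute thresholds, and the customer-notice delay are hypothetical, not a recommended policy.

```python
# Hypothetical escalation ladder: (minutes since the outage started, role to involve).
ESCALATION_LADDER = [
    (0, "NOC on-call engineer"),
    (15, "senior network engineer"),
    (30, "operations manager"),
]

# Assumed policy: tell customers once the outage has lasted this long.
CUSTOMER_NOTICE_AFTER_MIN = 10


def roles_to_notify(elapsed_min):
    """Return every role that should already be involved after elapsed_min minutes."""
    return [role for threshold, role in ESCALATION_LADDER if elapsed_min >= threshold]


def should_notify_customers(elapsed_min):
    """Return True once the outage is old enough to require a customer notice."""
    return elapsed_min >= CUSTOMER_NOTICE_AFTER_MIN


if __name__ == "__main__":
    for minute in (0, 12, 35):
        print(minute, roles_to_notify(minute), should_notify_customers(minute))
```

The value is less in the code than in having the thresholds agreed on in advance and written where the whole team can find them.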
5. Useful Monitoring and Alerts
Monitoring shouldn’t raise its first alert after the customer has already complained.
It should:
- detect degradations
- anticipate failures
- show real impact
- allow prioritization
Poorly designed alerts generate noise and delay reaction.
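As an illustration of alerts that detect degradation and allow prioritization, the sketch below classifies a link from recent latency and packet-loss samples and then orders alerts by severity and customer impact. The function names, the thresholds (1.5x baseline latency, 1% and 5% loss), and the input format are assumptions made for this example; it also assumes the sample lists are not empty.

```python
from statistics import median

# Lower number means higher priority.
SEVERITY_ORDER = {"critical": 0, "degraded": 1, "ok": 2}


def classify_link(name, latency_ms_samples, loss_pct_samples, baseline_latency_ms,
                  affected_customers, degrade_factor=1.5,
                  loss_warn_pct=1.0, loss_crit_pct=5.0):
    """Classify one link as ok, degraded, or critical from recent samples."""
    latency = median(latency_ms_samples)
    loss = sum(loss_pct_samples) / len(loss_pct_samples)
    if loss >= loss_crit_pct:
        state = "critical"
    elif loss >= loss_warn_pct or latency >= baseline_latency_ms * degrade_factor:
        # Still up, but measurably worse than normal: warn before customers call.
        state = "degraded"
    else:
        state = "ok"
    return {"link": name, "state": state, "latency_ms": latency,
            "loss_pct": loss, "affected_customers": affected_customers}


def prioritize(alerts):
    """Sort alerts by severity first, then by number of affected customers."""
    return sorted(alerts, key=lambda a: (SEVERITY_ORDER[a["state"]], -a["affected_customers"]))
```

Using the median for latency keeps a single slow probe from triggering an alert, which is one simple way to cut noise.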
The Worst Time to Think Is During the Outage
Many ISPs start designing their plan when the core is already down.
That’s the worst possible time.
Preparation is done cold, with time and technical judgment.
Execution is done hot, following what was planned.
Going Down Is Not the Problem
Every core goes down at some point.
The problem is:
- not knowing how to come back
- taking longer than necessary
- learning the lesson too late
A mature ISP is not measured by whether it fails or not, but by how it responds when it happens.
Being Prepared Is a Strategic Decision
Investing in preparation is not a technical expense; it’s a business decision:
- less downtime
- fewer lost customers
- less operational stress
- more internal confidence
At Ayuda.LA we help ISPs prepare before the critical day, not after.
If your core works today but you’re not sure what would happen if it failed, the countdown to that day has already begun.
Let’s talk before it arrives.