Time to leave spanning tree: MPLS and VXLAN overlay networks for small ISPs
There is a conversation that keeps repeating in network teams at small and medium ISPs in Latin America: the NOC detects a Layer 2 loop in the distribution backbone (one that Spanning Tree should have prevented but did not contain), someone has to trace the offending port by hand, and meanwhile a slice of subscribers is without service. The incident closes, a note goes on the wiki, and everyone knows it will happen again.
This pattern is not a configuration problem. It is an architecture problem. And it has a fix.
The root problem: switched networks at ISP scale
Switched networks with VLANs and Spanning Tree Protocol (STP/RSTP/MSTP) were designed for a specific context: campus LANs where Layer 2 must reach end-user devices and broadcast domains are relatively small.
When an ISP grows and uses the same architecture to interconnect distribution nodes, remote sites, or to transport customer services, problems emerge predictably:
Spanning Tree does not scale well with physical topology. In a network with multiple fiber rings and redundant links, STP blocks ports to break loops, so part of the installed capacity sits idle. Convergence after a topology change takes 30 to 50 seconds with classic STP. RSTP is usually much faster, but in large rings, or whenever it falls back to timers, the outage can still last several seconds.
The broadcast domain is a blast radius. A broadcast storm in one segment affects every device in the same VLAN domain. In an ISP network where that domain spans multiple nodes, a single misbehaving device can degrade service for hundreds of subscribers.
VLAN-based service separation has a ceiling. The 802.1Q VLAN ID space is 4094 tags. For a growing ISP that needs to isolate services, enterprise customers, and management segments, that ceiling arrives sooner than it seems.
Operations are manual and fragile. Each new VLAN is configured manually on every switch along the path. A configuration mistake at any point breaks connectivity for that VLAN segment. Without automation, error probability grows with network size.
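The idle-capacity point can be put in numbers. The following is a minimal sketch, assuming a single spanning tree instance over a connected topology; the four-switch ring with one cross-link is a hypothetical example, not a real network.

```python
def stp_blocked_links(num_switches: int, num_links: int) -> int:
    """Links that one spanning tree must block in a connected topology."""
    if num_links < num_switches - 1:
        raise ValueError("topology cannot be connected with so few links")
    # A spanning tree over N nodes always forwards on exactly N - 1 links.
    return num_links - (num_switches - 1)


def idle_fraction(num_switches: int, num_links: int) -> float:
    """Share of installed links that carry no traffic under one spanning tree."""
    return stp_blocked_links(num_switches, num_links) / num_links


# Hypothetical example: 4 switches in a ring (4 links) plus 1 cross-link.
print(stp_blocked_links(4, 5))        # 2
print(round(idle_fraction(4, 5), 2))  # 0.4
```

MSTP can spread VLANs across several tree instances, which softens this figure, but each instance still blocks the same number of links.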
The alternative: move services to the overlay
The right architecture for an ISP, regardless of size, is to separate the transport plane from the service plane:
- The underlay is a simple IP network: each node has a loopback IP, nodes interconnect with point-to-point links over IP, and a routing protocol (OSPF, IS-IS, or eBGP) distributes reachability to all loopbacks.
- The overlay is the services that run on top of that underlay: customer L2 circuits, Layer 3 VPNs, internal service segments. Those services use tunnels over IP (MPLS, VXLAN, or both) instead of depending on Layer 2 in the underlay.
The result is a network where:
- There is no spanning tree in the backbone
- Redundancy is active/active with ECMP (Equal-Cost Multipath)
- A loop in the underlay has the same impact as any routing failure: looping packets are dropped when their TTL expires and the IGP reconverges within seconds (faster with fast reroute), without broadcast storms
- Customer services are fully isolated from each other in the overlay
- Adding a new service or customer does not require touching transport switch configuration
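To illustrate how active/active redundancy with ECMP keeps every link in use, here is a minimal sketch of per-flow hashing. Real routers use their own hardware hash functions; the SHA-256 hash, the function name, and the addresses below are assumptions chosen only for illustration.

```python
import hashlib


def ecmp_next_hop(flow: tuple, next_hops: list[str]) -> str:
    """Pick one equal-cost next hop for a flow.

    flow is (src_ip, dst_ip, protocol, src_port, dst_port). Hashing the whole
    tuple keeps all packets of one flow on one path, which avoids reordering.
    """
    if not next_hops:
        raise ValueError("at least one next hop is required")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical uplinks toward the same destination loopback.
uplinks = ["10.0.12.2", "10.0.13.2"]
flow = ("198.51.100.10", "203.0.113.20", "tcp", 51000, 443)
print(ecmp_next_hop(flow, uplinks))
```

Different flows land on different uplinks, so both links carry traffic at the same time instead of one sitting blocked.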
MPLS: the proven option for ISPs
MPLS (Multiprotocol Label Switching) is the most mature overlay technology for provider networks. It transports L2 services (pseudowires, VPLS) and L3 services (L3VPN) over an IP underlay with efficiency and scale proven in large carrier networks for decades.
For a small ISP, MPLS enables two immediate use cases:
VPLS or EVPN-VPLS for customer L2 transport: instead of extending a VLAN across multiple physical switches, customer traffic enters at the ISP edge and is encapsulated in an MPLS pseudowire to the other end of the service. The backbone sees only MPLS labels, not customer traffic.
L3VPN for enterprise connectivity services: each enterprise customer has their own routing plane on the ISP, isolated from other customers and from the ISP infrastructure network. The PE (Provider Edge) keeps a per-customer routing table (VRF), and traffic is forwarded between PEs with MPLS.
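The per-customer routing table is the heart of L3VPN isolation. The sketch below models it with plain Python objects; the class name, customer names, and addresses are hypothetical, and a real PE also attaches route distinguishers and VPN labels that are left out here.

```python
import ipaddress


class VrfTable:
    """A toy per-customer routing table with longest-prefix match."""

    def __init__(self, name: str):
        self.name = name
        self.routes = {}  # ip_network -> next-hop PE loopback

    def add_route(self, prefix: str, next_hop: str) -> None:
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, address: str):
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        # The most specific (longest) matching prefix wins.
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


# Two customers reuse the same private prefix without conflict.
vrf_a = VrfTable("customer-a")
vrf_b = VrfTable("customer-b")
vrf_a.add_route("10.0.0.0/24", "192.0.2.1")
vrf_b.add_route("10.0.0.0/24", "192.0.2.2")

print(vrf_a.lookup("10.0.0.5"))  # 192.0.2.1
print(vrf_b.lookup("10.0.0.5"))  # 192.0.2.2
```

Because each lookup happens inside one VRF, overlapping customer address plans never collide.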
MPLS on Huawei VRP
Huawei CE (data center), NE (core routing), and AR (enterprise access) families support MPLS with a full stack. On Huawei VRP, enabling MPLS on the underlay requires:
mpls lsr-id <loopback-ip>
mpls
mpls ldp
And on each underlay interface:
interface GigabitEthernet1/0/0
mpls
mpls ldp
With LDP running over OSPF or IS-IS, devices automatically build label-switched paths among all LSRs on the network. The same stack then supports L2 services via Martini (pseudowires) or EVPN.
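Since the same stanza repeats on every node, it lends itself to templating. The following sketch renders the commands shown above from a loopback and an interface list; the function name is hypothetical, and the output only mirrors the snippet in this article, so whether it can be pasted as-is depends on the VRP version and on view navigation (for example `quit` between views).

```python
import ipaddress


def render_vrp_mpls(lsr_id: str, interfaces: list[str]) -> str:
    """Render the global and per-interface MPLS/LDP lines for one node."""
    ipaddress.IPv4Address(lsr_id)  # raises ValueError on a malformed LSR ID
    lines = [f"mpls lsr-id {lsr_id}", "mpls", "mpls ldp"]
    for name in interfaces:
        # Each backbone interface needs MPLS and LDP enabled.
        lines += [f"interface {name}", " mpls", " mpls ldp"]
    return "\n".join(lines)


# Hypothetical node with two backbone interfaces.
print(render_vrp_mpls("10.255.0.1", ["GigabitEthernet1/0/0", "GigabitEthernet1/0/1"]))
```

Generating the configuration from one inventory keeps the LSR ID and interface list consistent across nodes, which is where manual errors tend to creep in.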
MPLS on MikroTik RouterOS
MikroTik has supported MPLS since RouterOS v3 and it is available across RouterOS devices, including the CCR (Cloud Core Router) units widely used at ISPs in the region. Support includes LDP, VPLS (with LDP signaling), and basic pseudowires.
Configuring LDP in RouterOS is straightforward from Winbox or the terminal:
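# RouterOS v6 syntax; v7 reorganized the LDP menu, so check the form for your release
/mpls ldp
set enabled=yes lsr-id=<loopback-ip> transport-address=<loopback-ip>
# Enable LDP on every backbone-facing interface
/mpls ldp interface
add interface=ether1
add interface=ether2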
With OSPF on the underlay and LDP enabled on backbone interfaces, MikroTik routers establish LDP sessions automatically and can carry VPLS between sites.
Important MikroTik limitation: MPLS support in RouterOS is functional for basic use cases (VPLS, LDP pseudowires), but the stack does not include EVPN over MPLS in current versions. RSVP-TE tunnels do exist in RouterOS, although support has varied between major versions, so verify it on the release you run. For ISPs that only need customer L2 transport or simple VPNs, this is usually enough.
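One practical consequence of carrying VPLS over MPLS, on any vendor, is that backbone links must accept frames larger than 1500 bytes. The arithmetic below is a sketch under stated assumptions: a two-label stack (transport label plus pseudowire label), a 1500-byte customer IP MTU, and a hypothetical function name.

```python
MPLS_LABEL_BYTES = 4        # each label stack entry is 32 bits
ETHERNET_HEADER_BYTES = 14  # customer dst MAC + src MAC + EtherType (no FCS)
DOT1Q_TAG_BYTES = 4         # optional customer VLAN tag
CONTROL_WORD_BYTES = 4      # optional pseudowire control word


def vpls_required_mtu(customer_ip_mtu: int = 1500, labels: int = 2,
                      customer_vlan_tag: bool = False,
                      control_word: bool = False) -> int:
    """Bytes a backbone link must carry after its own Ethernet header."""
    if labels < 1:
        raise ValueError("a pseudowire needs at least one label")
    size = customer_ip_mtu + ETHERNET_HEADER_BYTES
    if customer_vlan_tag:
        size += DOT1Q_TAG_BYTES
    if control_word:
        size += CONTROL_WORD_BYTES
    return size + labels * MPLS_LABEL_BYTES


print(vpls_required_mtu())                                            # 1522
print(vpls_required_mtu(customer_vlan_tag=True, control_word=True))   # 1530
```

Checking this figure against the MTU of every backbone interface before the first pseudowire goes live avoids silent drops of full-size customer packets.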
VXLAN: the modern overlay for multi-vendor environments
VXLAN (Virtual Extensible LAN) uses UDP as transport instead of MPLS and was originally designed for data centers, but its adoption in ISP access and distribution networks has grown strongly in recent years. Advantages over MPLS for small ISP networks:
No label distribution protocol in the underlay. VXLAN runs over plain IP on UDP port 4789. It does not require LDP or RSVP. The underlay only needs IP connectivity between VTEPs (VXLAN Tunnel Endpoints).
Segment scale. The VNI (VXLAN Network Identifier) is 24 bits — more than 16 million network identifiers, versus 4094 VLANs.
Integration with EVPN for a distributed control plane. EVPN over VXLAN (RFC 7432 + RFC 8365) uses BGP to distribute MAC/IP reachability among VTEPs, so VTEPs no longer learn remote MACs by flooding and BUM (Broadcast, Unknown unicast, Multicast) traffic in the overlay drops substantially. This is key for scaling VXLAN in medium networks.
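The 24-bit VNI is easiest to see in the header itself. Here is a minimal sketch of the 8-byte VXLAN header as laid out in RFC 7348 (8 flag bits, 24 reserved bits, 24-bit VNI, 8 reserved bits); the function names are hypothetical.

```python
import struct

VXLAN_UDP_PORT = 4789
VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid
MAX_VNI = 2**24 - 1


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given VNI."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError("VNI must fit in 24 bits")
    # First word: flags in the top byte. Second word: VNI in the top 24 bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the I flag."""
    if len(header) < 8:
        raise ValueError("VXLAN header is 8 bytes")
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag not set; VNI is not valid")
    return vni_word >> 8


print(pack_vxlan_header(10).hex())  # 0800000000000a00
print(MAX_VNI + 1)                  # 16777216
```

The second figure is the count of possible identifiers, compared with 4094 usable VLAN IDs in 802.1Q.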
VXLAN on Huawei
Huawei CE6800, CE8800, and NE series devices support VXLAN with EVPN (some features may require specific licenses).
Part of a base VTEP configuration might look like:
interface Nve1
source <loopback-ip>
vni <id> head-end peer-list protocol bgp
Combined with BGP EVPN for the control plane, Huawei implements a VXLAN fabric with automatic MAC distribution and ARP suppression.
VXLAN on MikroTik
In RouterOS v7, MikroTik added VXLAN support, with flooding controlled through an explicit list of remote VTEPs, along with data-plane improvements.
Current limitation: EVPN over VXLAN (BGP EVPN as control plane) has partial support in RouterOS 7.x. For full multi-vendor EVPN production deployments, MikroTik works better as a VTEP in topologies where another device acts as the EVPN route reflector (for example Huawei gear).
For ISPs with only MikroTik, VXLAN with static flooding (explicit peer list) is a workable alternative for small networks:
/interface vxlan
add name=vxlan10 vni=10 port=4789
/interface vxlan vteps
add interface=vxlan10 remote-ip=<remote-vtep-ip>
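Static flooding needs every node to list every other node, which is why it only suits small networks. The sketch below generates the peer entries per node, using only the commands shown above; the function names and loopback addresses are hypothetical.

```python
def render_static_vteps(interface: str, local_ip: str,
                        all_vtep_ips: list[str]) -> list[str]:
    """RouterOS lines listing every remote VTEP for one node."""
    lines = ["/interface vxlan vteps"]
    for ip in all_vtep_ips:
        if ip != local_ip:  # a node never lists itself as a peer
            lines.append(f"add interface={interface} remote-ip={ip}")
    return lines


def total_vtep_entries(node_count: int) -> int:
    """Peer entries to maintain network-wide for a full mesh."""
    return node_count * (node_count - 1)


# Hypothetical three-node network.
vteps = ["10.255.0.1", "10.255.0.2", "10.255.0.3"]
print("\n".join(render_static_vteps("vxlan10", "10.255.0.1", vteps)))
print(total_vtep_entries(4))   # 12
print(total_vtep_entries(20))  # 380
```

The entry count grows quadratically with the number of nodes, and each added site means touching every existing router, which is the point where an EVPN control plane starts to pay off.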
Migration path: how to change without cutting service
Migrating from a switched network to an overlay does not require a big-bang cutover. The recommended approach for small ISPs:
Phase 1: Build the IP underlay (without touching services)
Enable OSPF or IS-IS on backbone interfaces, assign a loopback to each node, and verify IP connectivity works among all nodes. At this point the Layer 2 network is unchanged — the IP underlay runs in parallel.
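Verifying loopback reachability among all nodes is easy to script. This is a sketch only: `probe` is a hypothetical callable that would run a ping from one node toward a loopback (for instance over SSH) and return whether it succeeded; the node names and addresses are invented.

```python
from itertools import permutations
from typing import Callable


def unreachable_pairs(loopbacks: dict[str, str],
                      probe: Callable[[str, str], bool]) -> list[tuple[str, str]]:
    """Return every (source node, destination node) pair whose probe failed."""
    failures = []
    # Test both directions of each pair; routing problems can be asymmetric.
    for src, dst in permutations(sorted(loopbacks), 2):
        if not probe(src, loopbacks[dst]):
            failures.append((src, dst))
    return failures


# Stand-in probe for demonstration: pretend node "pe3" is not yet in the IGP.
nodes = {"pe1": "10.255.0.1", "pe2": "10.255.0.2", "pe3": "10.255.0.3"}


def fake_probe(src_node: str, dst_ip: str) -> bool:
    return src_node != "pe3" and dst_ip != nodes["pe3"]


print(unreachable_pairs(nodes, fake_probe))
```

An empty result is the gate for moving on to the next phase; any listed pair points at a missing adjacency or an unadvertised loopback.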
Phase 2: Bring up the overlay for new services
The first services on the overlay are new ones: new L2 customers, new VPNs. Configure VPLS or VXLAN for these from day one. Existing services stay on VLANs.
Phase 3: Migrate existing services one by one
With the overlay running and validated, existing services move from VLANs to overlay in maintenance windows. Each migration is atomic: configure the service on the overlay, verify, then remove the corresponding VLAN.
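The configure, verify, remove sequence can be captured as a small routine that refuses to delete the VLAN until the overlay is proven. This is a sketch: the four callables are hypothetical hooks into whatever tooling pushes configuration, and the rollback step is an assumption added so that a failed verification leaves the service on its original VLAN.

```python
from typing import Callable


def migrate_service(name: str,
                    configure_overlay: Callable[[str], None],
                    verify_overlay: Callable[[str], bool],
                    remove_vlan: Callable[[str], None],
                    rollback_overlay: Callable[[str], None]) -> bool:
    """Move one service to the overlay; return True only if it completed."""
    configure_overlay(name)
    if not verify_overlay(name):
        # Verification failed: undo the overlay and keep the VLAN in place.
        rollback_overlay(name)
        return False
    # Only a verified overlay service justifies removing the old VLAN.
    remove_vlan(name)
    return True


# Demonstration with stand-in hooks that just record what happened.
log = []
ok = migrate_service(
    "customer-a-l2",
    configure_overlay=lambda s: log.append(("configure", s)),
    verify_overlay=lambda s: True,
    remove_vlan=lambda s: log.append(("remove_vlan", s)),
    rollback_overlay=lambda s: log.append(("rollback", s)),
)
print(ok, log)
```

Running one service per call keeps each maintenance window small and makes it obvious which service to look at if something goes wrong.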
Phase 4: Remove spanning tree from the backbone
Once all services are on overlay, the backbone no longer needs spanning tree. STP can be disabled on backbone interfaces and distribution switches become overlay routers.
When this architecture makes sense
This migration is worth it when:
- The ISP has more than 2–3 distribution nodes interconnected
- There are recurring incidents related to STP or Layer 2 loops
- The network must scale customer services without proportionally increasing operational complexity
- You are considering adding enterprise services (MPLS L3VPN) to the portfolio
It is not worth it if the ISP has a single central node without redundant physical topology. In that case, a flat VLAN design is simpler to run.
How we help from Ayuda.LA
At Ayuda.LA we work with ISPs in Latin America, including operators with MikroTik and Huawei gear, on overlay architecture design and implementation:
- Current architecture audit: we identify operational risks in the existing network and improvement opportunities with overlay.
- Underlay and overlay design: we define numbering plans, routing topology, MPLS or VXLAN service model, and redundancy mechanisms.
- Migration without service outage: we design phased migration, service by service, with validation at each step.
- NOC team training: the team learns to operate an overlay network (how to diagnose, add services, scale) before the system is in production.
If your network has spanning tree in the backbone and you want to evaluate the alternative, the first step is an architecture audit.
Let’s talk about your network →
Want to know if your network is ready for an overlay? Write to us at [email protected] — we answer every message.