MC-LAG on Huawei routers: high availability at the access layer for ISPs
In most ISP networks in Latin America, the access layer remains the weakest link for availability. An aggregation device with a single uplink to the core, or bonding without chassis redundancy, means a hardware failure can leave hundreds or thousands of subscribers without service until a technician reaches the site.
MC-LAG (Multi-Chassis Link Aggregation Group) is the technology that addresses this problem, and Huawei implements it in a way worth understanding in detail before you design or review your access topology.
What problem does MC-LAG solve?
Conventional LAG (IEEE 802.3ad / LACP) aggregates multiple physical ports on one device into a single logical link. That improves bandwidth and adds some port redundancy, but it does not protect against failure of the entire chassis.
MC-LAG extends that model across two chassis: the downstream device (an access switch, an OLT, a CPE) sees a single logical LAG, but the members of that LAG are spread across two separate PE devices. If one PE fails, its member links go down and the other PE keeps forwarding on its own members, so the LAG stays up with reduced capacity.
The result: chassis redundancy without Spanning Tree, without 30–50 second STP convergence times, and with an active-active model that uses both devices’ capacity at once.
MC-LAG components on Huawei
Huawei builds MC-LAG from two components, only the second of which is proprietary:
Eth-Trunk: Huawei’s standard LAG mechanism, equivalent to a Cisco port-channel. It supports three modes:
- Manual: ports are added without a negotiation protocol.
- LACP (802.3ad): standard negotiation, more robust.
- LACP 1:1: one active link and one standby, for clean failover.
E-Trunk: the proprietary component that coordinates state between the two MC-LAG chassis. E-Trunk runs over UDP and maintains a peer-to-peer control session between the two PEs. Through that session, the two devices exchange their E-Trunk priority and the state of their local Eth-Trunks, and use that information to decide which side forwards. BFD can be tied to the session so that a peer failure is detected quickly.
The link between the two PEs that carries E-Trunk is called the Peer-Link. Sizing it correctly is critical: traffic that would normally go to the remote PE crosses this link when one chassis loses downstream connectivity.
Role model: Master and Backup
In Huawei MC-LAG, one PE is Master and the other is Backup. The Master has priority for control-plane decisions (LACP PDUs, BFD, etc.). The Backup follows the Master while the E-Trunk session is up.
If the Master fails, the Backup takes control automatically. Switchover time depends on timers configured on BFD and E-Trunk, and can reach sub-second in tuned setups.
Basic step-by-step configuration
The following example shows MC-LAG between two Huawei PE routers (PE1 as Master, PE2 as Backup) with a downstream access switch connected to both.
1. Configure the Peer-Link (link between PE1 and PE2)
On both PEs, create a dedicated Eth-Trunk for the Peer-Link:
# PE1 and PE2
interface Eth-Trunk10
description PEER-LINK-MCLAG
mode lacp
trunkport GigabitEthernet 0/0/10
trunkport GigabitEthernet 0/0/11
2. Configure E-Trunk on PE1 (Master)
e-trunk 1
peer-address 10.255.0.2 source-address 10.255.0.1
priority 100
preempt enable
peer-link Eth-Trunk10
3. Configure E-Trunk on PE2 (Backup)
e-trunk 1
peer-address 10.255.0.1 source-address 10.255.0.2
priority 120
peer-link Eth-Trunk10
Note: lower priority = higher Master preference. PE1 with priority 100 wins over PE2 with priority 120.
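The election rule in the note above can be illustrated with a small Python model. This is only a sketch of the "lower value wins" logic, not Huawei's implementation: the `Peer` class, the `elect_master` function, and the use of a system ID as a tie-breaker when priorities are equal are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    name: str
    priority: int   # E-Trunk priority value; lower is preferred
    system_id: str  # assumed tie-breaker (for example a MAC-derived ID)


def elect_master(a: Peer, b: Peer) -> Peer:
    """Return the peer that becomes Master under a lower-value-wins rule."""
    # Compare (priority, system_id) tuples; the smaller tuple wins.
    return min(a, b, key=lambda p: (p.priority, p.system_id))


pe1 = Peer("PE1", priority=100, system_id="00e0-fc00-0001")
pe2 = Peer("PE2", priority=120, system_id="00e0-fc00-0002")

print(elect_master(pe1, pe2).name)  # PE1
```

With the priorities from steps 2 and 3, PE1 is elected. If both PEs were left at the same priority, the outcome would depend on the tie-breaker, which is a good reason to set priorities explicitly.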
4. Create the customer-side Eth-Trunk on both PEs
# On PE1 and PE2 (same Eth-Trunk ID)
interface Eth-Trunk20
description ACCESO-SWITCH-A
mode lacp
e-trunk 1
5. Bind physical ports to Eth-Trunk on each PE
# On PE1
interface GigabitEthernet 0/0/1
eth-trunk 20
# On PE2
interface GigabitEthernet 0/0/1
eth-trunk 20
The downstream access switch sees a single LAG with LACP, without knowing members are split across two chassis.
Load balancing: which hash to use
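On many Huawei platforms the default hash is based on source/destination MAC (check the default for your model and VRP release). Between two routed hops every frame carries the same MAC pair, so a MAC-based hash tends to send all traffic over one member link. For ISP networks with heavy MPLS or IP traffic, it is more effective to hash on IP fields or MPLS labels: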
# Hash on source/destination IP (more effective on IP/MPLS networks)
interface Eth-Trunk20
load-balance src-dst-ip
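For MPLS traffic with multiple flows, the hash can use label fields instead. The outer (transport) label is shared by all traffic toward the same egress PE, so where the platform offers it, hashing on the inner (service) label spreads flows better and reduces polarization. The example below selects the outer label; the exact keywords vary by platform and VRP release: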
interface Eth-Trunk20
load-balance mpls-label-outer
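To see why the choice of hash fields matters, here is a small Python sketch. It uses a plain XOR of header bytes as the hash, which is an assumption chosen for readability and not the algorithm Huawei hardware runs. The MAC and IP addresses are illustrative values: 64 subscriber flows cross one routed hop, so they all share a single MAC pair.

```python
from collections import Counter
from functools import reduce
from ipaddress import ip_address


def xor_hash(data: bytes) -> int:
    """Fold all bytes together with XOR (a simplified stand-in for a hardware hash)."""
    return reduce(lambda acc, b: acc ^ b, data, 0)


def member_by_mac(src_mac: str, dst_mac: str, n_links: int) -> int:
    data = bytes.fromhex(src_mac.replace(":", "")) + bytes.fromhex(dst_mac.replace(":", ""))
    return xor_hash(data) % n_links


def member_by_ip(src_ip: str, dst_ip: str, n_links: int) -> int:
    data = ip_address(src_ip).packed + ip_address(dst_ip).packed
    return xor_hash(data) % n_links


# Same MAC pair for every flow, different source IP per flow.
SRC_MAC, DST_MAC = "00:e0:fc:00:00:01", "00:e0:fc:00:00:02"
flows = [(f"10.0.0.{i}", "203.0.113.10") for i in range(1, 65)]

mac_spread = Counter(member_by_mac(SRC_MAC, DST_MAC, 2) for _ in flows)
ip_spread = Counter(member_by_ip(src, dst, 2) for src, dst in flows)

print(dict(sorted(mac_spread.items())))  # {1: 64}
print(dict(sorted(ip_spread.items())))   # {0: 32, 1: 32}
```

With the MAC-based hash every flow lands on the same member link, while the IP-based hash splits the same flows evenly across both links.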
BFD for fast failure detection
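MC-LAG convergence depends heavily on how quickly a failure is detected. Without BFD, LACP running with the slow timeout (one LACPDU every 30 seconds, peer declared lost after three missed PDUs) can take up to 90 seconds to decide a link lost its peer. With BFD: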
# Enable BFD on the customer-side Eth-Trunk
interface Eth-Trunk20
bfd min-tx-interval 300 min-rx-interval 300 detect-multiplier 3
With these values (300 ms interval, multiplier 3), a failure is detected in about 900 ms (3 × 300 ms), just under one second.
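The arithmetic behind these detection times can be written as a short Python sketch. The function names are hypothetical. The BFD formula follows the asynchronous-mode rule of multiplying the peer's detect multiplier by the larger of the local receive interval and the peer's transmit interval; the LACP figures assume the standard slow (30 s) and fast (1 s) periods with a timeout of three missed PDUs.

```python
def bfd_detect_time_ms(local_min_rx_ms: int, remote_min_tx_ms: int, remote_multiplier: int) -> int:
    """Detection time = remote multiplier x negotiated interval."""
    # The negotiated interval is the larger of what we accept and what the peer wants to send.
    negotiated_ms = max(local_min_rx_ms, remote_min_tx_ms)
    return negotiated_ms * remote_multiplier


def lacp_timeout_s(period_s: int, missed_pdus: int = 3) -> int:
    """LACP declares the partner lost after a number of missed periodic PDUs."""
    return period_s * missed_pdus


print(bfd_detect_time_ms(300, 300, 3))  # 900
print(lacp_timeout_s(30))               # 90 (slow timeout)
print(lacp_timeout_s(1))                # 3 (fast timeout)
```

If the two ends are configured with different intervals, the slower value governs, so both PEs and the downstream device should be checked together.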
Production considerations
VLAN and service: All services (VPLS, L3VPN, internet access) that traverse the downstream Eth-Trunk must be configured identically on both PEs. E-Trunk does not synchronize service configuration automatically; that is the responsibility of the operator or the management system.
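Because that consistency has to be enforced outside E-Trunk, a simple drift check between the two PEs is worth automating. The Python sketch below compares two sets of service records and reports what exists on only one side. The record format is a simplified, hypothetical notation (not VRP syntax), and the function names are illustrative.

```python
def normalize(config_text: str) -> set:
    """Return non-empty, trimmed lines, ignoring lines that start with '#'."""
    lines = (line.strip() for line in config_text.splitlines())
    return {line for line in lines if line and not line.startswith("#")}


def service_drift(pe1_text: str, pe2_text: str):
    """Return (records only on PE1, records only on PE2), each sorted."""
    pe1, pe2 = normalize(pe1_text), normalize(pe2_text)
    return sorted(pe1 - pe2), sorted(pe2 - pe1)


# Hypothetical, simplified service records for illustration only.
PE1_SERVICES = """
# exported from PE1
service internet vlan 100 trunk 20
service l3vpn-acme vlan 200 trunk 20
"""
PE2_SERVICES = """
# exported from PE2
service internet vlan 100 trunk 20
service l3vpn-acme vlan 210 trunk 20
"""

only_pe1, only_pe2 = service_drift(PE1_SERVICES, PE2_SERVICES)
print(only_pe1)  # ['service l3vpn-acme vlan 200 trunk 20']
print(only_pe2)  # ['service l3vpn-acme vlan 210 trunk 20']
```

An empty result on both sides means the two PEs agree. Anything else is a service that would break or change behavior after a switchover.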
Peer-Link sizing: The Peer-Link must absorb the traffic that one PE can no longer deliver over its own downstream links during a failover. A practical rule is to size it to 100% of the capacity of the busiest MC-LAG Eth-Trunk.
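The sizing rule can be expressed as a quick calculation. This Python sketch is illustrative only: the function name, the capacities, and the second trunk (Eth-Trunk21) are hypothetical values, and `target_ratio` simply encodes the 100% rule of thumb.

```python
import math


def peer_link_plan(trunk_capacity_gbps: dict, member_speed_gbps: float, target_ratio: float = 1.0):
    """Return (required Peer-Link capacity in Gbps, number of member ports needed)."""
    # Size against the largest downstream Eth-Trunk, scaled by the chosen ratio.
    required = max(trunk_capacity_gbps.values()) * target_ratio
    members = math.ceil(required / member_speed_gbps)
    return required, members


# Hypothetical capacities per downstream Eth-Trunk, in Gbps.
trunks = {"Eth-Trunk20": 2.0, "Eth-Trunk21": 4.0}

required, members = peer_link_plan(trunks, member_speed_gbps=1.0)
print(required, members)  # 4.0 4
```

In this example a Peer-Link built from two GigabitEthernet ports, as in step 1, would be undersized for a 4 Gbps downstream trunk.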
Logs and alarms: Configuring SNMP traps for E-Trunk events (Master/Backup role change, peer-link loss) is critical to catch degradation before it becomes an incident.
Software version: E-Trunk behavior varies across VRP (Versatile Routing Platform) versions. Validating version compatibility between PE1 and PE2 before putting MC-LAG in production avoids interoperability surprises.
When MC-LAG makes sense — and when it does not
MC-LAG is the right choice when:
- You have access switches or OLTs homed to a single aggregation device and want to remove that SPOF.
- You are building an active-active topology without spanning tree at the distribution layer.
- Maintenance windows for that access gear have high operational cost (affected customers, night work).
MC-LAG can be overkill when:
- The downstream device already has its own redundancy (dual-homing to two independent switches with distinct paths).
- Traffic on that segment does not justify the extra configuration and monitoring overhead.
- Budget does not cover two PEs of the same model/version at that site.
Our field experience
At Ayuda.LA we have implemented MC-LAG on ISP networks with Huawei NE, CX, and CE series. The most common pattern we see in networks that come to us for audit: good gear, good capacity, but a flat topology without chassis redundancy in distribution.
MC-LAG usually does not require new hardware — it requires rewiring existing devices and configuring E-Trunk. ROI in availability terms is immediate.
Learn more about our networking and ISP support services.
Auditing or redesigning your access layer?
We can review your current topology and identify the failure points that hurt availability most. No hardware sales, no conflict of interest.
Specific questions about MC-LAG or high availability on Huawei? Write to us at [email protected] — we answer every message.