Network Congestion

Identifying and monitoring network congestion indicators, metrics, and common bottleneck locations

Signs of Network Congestion

Network congestion occurs when traffic demand exceeds available bandwidth, causing performance degradation across the network infrastructure. Understanding congestion indicators helps network engineers proactively identify and resolve bottlenecks before they impact end users.


Primary Congestion Indicators

  • High CPU utilization - Routers/switches exceeding 70-80% CPU usage consistently (not just spikes during convergence)
  • Buffer overflows - Device queues filling beyond capacity, forcing packet drops
  • Increased latency - Round-trip times significantly higher than baseline measurements
  • Packet loss - Dropped packets due to insufficient processing capacity or full buffers
  • Interface utilization - Links consistently operating above 70-80% capacity during normal operations
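As a rough illustration of how the interface utilization figure is usually derived, the sketch below computes average utilization from two samples of an octet counter. The function name and parameters are hypothetical, and it assumes the counter wraps at most once between samples.

```python
def utilization_percent(octets_start, octets_end, interval_s, link_bps, counter_bits=64):
    """Return average link utilization (%) between two octet-counter samples."""
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")
    # Counters wrap at 2**counter_bits; modulo arithmetic recovers the
    # delta across a single wrap (assumption: no more than one wrap).
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    bits_per_second = delta_octets * 8 / interval_s
    return 100.0 * bits_per_second / link_bps


# 3,000,000,000 octets in 300 seconds on a 100 Mbps link
print(utilization_percent(0, 3_000_000_000, 300, 100_000_000))  # 80.0
```

Run the calculation separately for the input and output counters, since a full-duplex link can be congested in one direction only.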

Performance Symptoms

  • Application timeouts - TCP sessions timing out due to delayed or lost packets
  • Slow file transfers - FTP, HTTP, or file sharing protocols experiencing reduced throughput
  • VoIP quality issues - Jitter, delay, or dropped calls (voice generally needs <150ms one-way latency and <1% packet loss)
  • Video streaming problems - Buffering, pixelation, or connection drops
  • Database query delays - Client-server applications experiencing response time increases

Monitoring Metrics and Thresholds

Metric                 Normal Range   Congestion Threshold   Critical Level
Interface Utilization  <70%           >80% sustained         >95%
CPU Usage              <60%           >75% sustained         >90%
Memory Usage           <70%           >85%                   >95%
Queue Depth            <50% buffer    >80% buffer            Buffer full
Packet Loss            0%             >0.1%                  >1%
Latency (LAN)          <10ms          >50ms                  >100ms
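The thresholds in this table can be applied mechanically. The following is a minimal sketch, assuming hypothetical metric names and treating anything at or below the congestion threshold as not congested. Queue depth is left out because its critical level ("buffer full") is not a simple percentage comparison.

```python
# (congestion threshold, critical threshold) per metric, taken from the table.
# The dictionary keys are illustrative names, not standard identifiers.
THRESHOLDS = {
    "interface_utilization_pct": (80.0, 95.0),
    "cpu_usage_pct": (75.0, 90.0),
    "memory_usage_pct": (85.0, 95.0),
    "packet_loss_pct": (0.1, 1.0),
    "lan_latency_ms": (50.0, 100.0),
}


def classify(metric, value):
    """Map a measured value to 'critical', 'congested', or 'below congestion threshold'."""
    congested, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > congested:
        return "congested"
    return "below congestion threshold"


print(classify("cpu_usage_pct", 92))     # critical
print(classify("packet_loss_pct", 0.5))  # congested
print(classify("lan_latency_ms", 8))     # below congestion threshold
```

The "sustained" qualifier in the table is not modeled here; in practice the check would be applied to several consecutive samples rather than a single reading.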

Traffic Analysis Indicators

  • Microbursts - Brief traffic spikes exceeding interface capacity (may not show in 5-minute averages)
  • Asymmetric flows - Uneven traffic patterns indicating potential bottlenecks
  • Protocol distribution changes - Unusual increases in specific traffic types
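  • Error counters increasing - CRC errors, input/output errors, or collisions (half-duplex or legacy Ethernet); CRC errors usually point to cabling or hardware faults rather than congestion itself, but they reduce usable throughput in the same way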
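To see why a microburst can disappear in an averaged counter, the sketch below compares the average and peak offered load for a made-up one-second sample. The function name and the traffic figures are hypothetical and chosen only for illustration.

```python
def average_and_peak_utilization(bits_per_ms, link_bps):
    """Return (average %, peak %) offered load for per-millisecond bit counts."""
    if not bits_per_ms or link_bps <= 0:
        raise ValueError("need at least one sample and a positive link speed")
    capacity_per_ms = link_bps / 1000  # bits the link can carry in one millisecond
    average = 100.0 * sum(bits_per_ms) / (capacity_per_ms * len(bits_per_ms))
    peak = 100.0 * max(bits_per_ms) / capacity_per_ms
    return average, peak


# Hypothetical 1 Gbps link: 990 ms at 10% load, then a 10 ms burst at 150% offered load
samples = [100_000] * 990 + [1_500_000] * 10
average, peak = average_and_peak_utilization(samples, 1_000_000_000)
print(f"average {average:.1f}%, peak {peak:.1f}%")  # average 11.4%, peak 150.0%
```

The one-second average looks healthy even though the link was oversubscribed for 10 ms, which is long enough to fill a shallow buffer and drop packets.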

Common Congestion Locations

  • WAN links - Often the narrowest bandwidth point in enterprise networks
  • Internet gateways - Shared internet connections during peak usage
  • Server farm connections - High-demand applications creating hotspots
  • Wireless access points - Shared medium with limited aggregate throughput
  • Core switch uplinks - Aggregation points where multiple access switches connect

Troubleshooting Commands (Cisco IOS-style)

  • show interfaces - Check utilization, errors, and drops
  • show processes cpu - Identify high CPU processes
  • show memory - Monitor memory usage and allocation
  • show ip route - Verify optimal path selection
  • show queueing interface <interface> - Examine queue statistics

Vocabulary

  • Microburst - Traffic burst lasting microseconds to milliseconds, often missed by standard monitoring
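  • Bufferbloat - Excessively large buffers that absorb traffic instead of dropping it, causing high latency and jitter while delaying the packet loss that would signal senders to slow down
  • Head-of-line blocking - A packet stuck at the front of a queue holds up the packets behind it, even those belonging to other flows or destined for uncongested ports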
  • Jitter - Variation in packet delay, critical for real-time applications
  • Throughput - Actual data transfer rate (differs from bandwidth capacity)
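Jitter can be quantified in several ways. One common approach is the smoothed interarrival jitter estimator described in RFC 3550 (RTP), sketched below. The function name is hypothetical, and the input is assumed to be a list of per-packet transit times in milliseconds.

```python
def interarrival_jitter(transit_times_ms):
    """Smoothed jitter estimate in the style of RFC 3550: J += (|D| - J) / 16."""
    jitter = 0.0
    # D is the change in transit time between consecutive packets.
    for previous, current in zip(transit_times_ms, transit_times_ms[1:]):
        d = abs(current - previous)
        jitter += (d - jitter) / 16
    return jitter


print(interarrival_jitter([20, 20, 20, 20]))  # 0.0 (constant delay, no jitter)
print(interarrival_jitter([20, 36]))          # 1.0
```

The 1/16 gain smooths out single outliers, so the estimate rises only when delay variation persists across many packets.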

Notes

  • Monitor trending data, not just snapshots - Congestion patterns often correlate with business hours or application schedules
  • Interface utilization above 70% doesn’t always indicate problems - consider traffic patterns and application requirements
  • Voice and video traffic are most sensitive to congestion - prioritize these flows using QoS when congestion occurs
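  • Wireless networks experience congestion differently - the medium is shared and half-duplex, so clients contend for airtime and retransmissions rise as more stations transmit
  • Modern switches use cut-through or store-and-forward switching - congestion affects these differently: a cut-through switch must still buffer a frame when the egress port is busy, so its latency advantage largely disappears under load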
  • SNMP polling every 5 minutes may miss microbursts - consider more granular monitoring for critical links
  • Remember that full-duplex links can experience congestion in one direction only - monitor both transmit and receive statistics