An abnormal drop in server network throughput and a surge in TCP retransmission rate are often more than a simple network fluctuation. Solving these problems requires a systematic debugging methodology, working from macro-level traffic statistics to micro-level packet tracing and from hardware queues down to the kernel protocol stack, dissecting layer by layer the complex ecosystem of network interface card (NIC) drivers, memory management, CPU scheduling, and network protocols.
Understanding the complete journey of a data packet within the kernel is the cornerstone of debugging. When an Ethernet frame arrives at the physical NIC, it first lands in the Ring Buffer, a DMA area managed by the NIC driver. The NIC then raises a hardware interrupt, and the kernel (in the interrupt handler or, with modern drivers, the NAPI polling routine running in softirq context) wraps the data in a Socket Buffer (SKB), the general-purpose packet container of the kernel's network subsystem. The packet then travels up the protocol stack: IP-layer routing, TCP/UDP protocol processing, and finally placement in the receive queue of the owning socket. The sending path is the reverse: application data is encapsulated by the protocol stack, enters the send queue, and is ultimately transmitted through the NIC. Along this path, insufficient resources, misconfiguration, or code defects at any stage can cause packets to be dropped. The art of debugging lies in pinpointing the "leak" within a process spanning more than ten stages.
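Most of these stages can be observed from user space. As a minimal sketch (assuming debugfs is mounted at the conventional /sys/kernel/debug path), the kernel exposes tracepoints that correspond to the hand-off points described above:
# List the tracepoints that mark the main receive/transmit hand-off points
ls /sys/kernel/debug/tracing/events/net/    # net_dev_queue, net_dev_xmit, netif_receive_skb, ...
ls /sys/kernel/debug/tracing/events/skb/    # kfree_skb (drops), consume_skb (normal frees)
# Sample receive-side events system-wide for a few seconds
perf record -e net:netif_receive_skb -a -- sleep 3 && perf script | head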
Systematic debugging begins with a macro-level understanding of overall traffic and packet loss statistics. Use combined commands to quickly establish a global perspective:
# View error counts for each network interface
ip -s link show
# Check critical TCP layer error statistics
netstat -s | grep -E "(segments retransmitted|packet receive errors|dropped)"
# View detailed statistics for the network card driver (requires ethtool support)
ethtool -S eth0 | grep -i drop
The output of these commands provides initial clues: a high `rx_missed_errors` count (the `missed` column of `ip -s link`, also exposed under `/sys/class/net/eth0/statistics/`) may indicate a FIFO overflow on the NIC's receive side; an increasing `rx_over_errors` points to an undersized DMA ring buffer; and `netstat` reporting "dropped because of missing route" suggests a routing table problem. The key is to establish a baseline: record the values of these counters under normal conditions, then compare the increments when an anomaly occurs to quickly narrow down the problematic time window.
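A simple way to work with increments rather than raw counters is to let the tools compute the deltas. A sketch, with the interface name and interval as placeholders:
# Highlight counters that changed between refreshes
watch -d -n 5 'ip -s link show eth0'
# Or let nstat keep history: reset it once, then later calls print only the deltas
nstat -n && sleep 60 && nstat | grep -E "RetransSegs|InErrs|InCsumErrors"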
Once macroscopic statistics point in a specific direction, it's necessary to delve into the real-time dynamics of the kernel. At this point, dynamic tracing tools become crucial. The `dropwatch` tool can report the exact location of packet loss in the kernel in real time:
dropwatch -l kas
# After startup, packet loss events will trigger stack trace printing, for example:
# drop at: __netif_receive_skb_core+0x140/0x300
This output directly tells you that packet loss occurred at a specific offset in the kernel function `__netif_receive_skb_core`. Combined with the kernel source code, you can precisely pinpoint the specific decision branch in the processing flow. For more complex performance analysis, the `perf` tool can record the entire system call stack for packet loss events:
perf record -e skb:kfree_skb -a -g -- sleep 10
# Count the most frequent drop locations reported by the tracepoint
perf script | grep -o 'location=[^ ]*' | sort | uniq -c | sort -nr
These commands record all SKB free events for 10 seconds and count the most frequent release locations; the dominant release paths often point to the main cause of packet loss (the `-g` flag also captures call stacks, which `perf report` can display). If the packet loss is tied to a specific CPU core, `perf` can also be combined with CPU affinity analysis to reveal uneven distribution of soft interrupts.
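A quick way to check whether receive processing is piling up on a single core is to compare the per-CPU softirq counters with the NIC's interrupt distribution (eth0 is a placeholder):
# Per-CPU counts of network softirqs; one column far ahead of the rest means uneven load
grep -E "NET_RX|NET_TX" /proc/softirqs
# Which CPUs are actually servicing the NIC's interrupts
grep eth0 /proc/interrupts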
When delving into specific points of suspicion, it is necessary to examine kernel parameters and queue states. The `netdev_max_backlog` queue in the receive path is a common bottleneck:
# View current backlog and packet loss statistics
cat /proc/net/softnet_stat | awk '{print $1, $2, $3}'
# Adjust queue size (temporarily)
sysctl -w net.core.netdev_max_backlog=5000
Each row of `softnet_stat` corresponds to one CPU, and the second column (the values are hexadecimal) is the number of packets that CPU dropped because its backlog queue was full. If this value keeps increasing, either raise the backlog limit or optimize soft-interrupt handling: check `/proc/irq/*/smp_affinity` to ensure interrupts are spread evenly, or consider enabling RSS multi-queue support, as sketched below.
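A sketch of spreading the receive load across cores; the queue count, IRQ number, and CPU masks below are illustrative, so check what the NIC actually supports with `ethtool -l eth0` before changing anything:
# Enable 4 combined RX/TX queue pairs (RSS) if the hardware supports it
ethtool -L eth0 combined 4
# Pin one NIC interrupt to CPU1 (bitmask 0x2); the IRQ number 45 is hypothetical
echo 2 > /proc/irq/45/smp_affinity
# On single-queue devices, RPS spreads processing in software (here to CPUs 0-3, mask "f")
echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus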
Debugging the sending path focuses on `txqueuelen` and congestion control:
# Check sending queue length
ip link show eth0 | grep qlen
# Monitor sending queue backlog
tc -s qdisc show dev eth0
When the sending queue backlog (the `backlog` in the `tc` command output) is consistently non-zero, it may indicate insufficient network interface card (NIC) bandwidth or a small receiver window on the other end. In this case, it's necessary to use `ss -it` to check the TCP sending window status or use `tcpdump` to capture actual interactive traffic for analysis.
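For example, to inspect the send-side TCP state toward a specific peer (the address below is a placeholder) and to grab a short capture for offline analysis:
# Per-connection TCP internals: congestion window, ssthresh, rtt, retransmit counts
ss -it dst 192.0.2.10
# Capture 200 packets on eth0 into a pcap file for later inspection
tcpdump -i eth0 -c 200 -w /tmp/sample.pcap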
Packet loss caused by memory pressure requires special attention. Allocation of each SKB can fail:
# Check socket-buffer and memory-pressure drop counters (nstat reads /proc/net/snmp and /proc/net/netstat)
nstat -az | grep -E "RcvbufErrors|SndbufErrors|MemoryPressures"
# Monitor system memory pressure
vmstat 1 5
When system memory is low, not only are application processes affected, but the kernel network stack also cannot allocate enough SKB structures. At this point, it's necessary to check if socket buffer parameters such as `net.core.rmem_default` and `net.core.rmem_max` are set appropriately, or consider reducing unnecessary memory caching.
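As a sketch, the relevant limits and the TCP memory-accounting thresholds can be inspected, and raised if needed (the value below is illustrative, not a recommendation):
# TCP memory accounting thresholds, in pages: low / pressure / high
cat /proc/sys/net/ipv4/tcp_mem
# Current default and maximum receive buffer sizes
sysctl net.core.rmem_default net.core.rmem_max
# Example only: raise the receive buffer ceiling to 8 MiB
sysctl -w net.core.rmem_max=8388608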
In cloud environments and virtualization scenarios, debugging requires considering additional layers. Virtual network interfaces (such as veth and virtio-net) have independent statistics interfaces:
# View container veth interface statistics on the host machine
ethtool -S vethabcd123
Simultaneously, check for interference from TC (traffic control) rules and eBPF programs:
tc filter show dev eth0
bpftool prog list | grep -i xdp
A misconfigured TC policy or a flawed XDP/eBPF program may unexpectedly drop legitimate packets.
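If the installed bpftool supports the `net` subcommand, it summarizes XDP and TC attachments per interface in a single view, which makes stray programs easy to spot:
# List XDP and tc BPF programs attached to eth0
bpftool net show dev eth0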
After systematic debugging, optimization directions will naturally emerge. This might involve adjusting kernel parameters: tuning `netdev_max_backlog`, `somaxconn`, and TCP buffer sizes based on actual load. It might also involve hardware configuration: enabling RSS, adjusting the ring buffer size, or even upgrading the network interface driver. It could also be an architectural improvement: deploying intelligent rate limiting at the traffic ingress point, or refactoring the application to reduce unnecessary network round trips.
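Once a set of tunables has proven itself under real load, it is worth persisting it so that it survives reboots; the file name and values below are illustrative:
# Persist tuned parameters (example values) and reload all sysctl configuration
cat <<'EOF' > /etc/sysctl.d/99-net-tuning.conf
net.core.netdev_max_backlog = 5000
net.core.somaxconn = 4096
EOF
sysctl --system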