gettys-bufferbloat

Jim Gettys Bufferbloat Style Guide


Overview

Jim Gettys, after his years on the One Laptop per Child project and while working at Bell Labs, discovered and named "bufferbloat"—the phenomenon where excessive buffering in network equipment causes massive latency spikes. Modern networks often have seconds of buffering, destroying interactive performance even when bandwidth is plentiful. Gettys' crusade to fix bufferbloat led to fq_codel and to the understanding that network latency under load is the true measure of network quality.

Core Philosophy

"Latency is the new bandwidth. We have plenty of bandwidth; what we lack is low latency."

"The buffer is full of lies. Every packet in that buffer is a broken promise about when it will arrive."

"Good networks feel fast. Bufferbloated networks feel like wading through molasses."

Gettys realized that optimizing for throughput while ignoring latency creates terrible user experience. A network with 100ms idle RTT that spikes to 2000ms under load is fundamentally broken, even if it achieves high throughput. The solution is to keep queues short and managed.
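In code, that criterion is a one-line check; a minimal sketch using the numbers from the paragraph above:

idle_rtt_ms, loaded_rtt_ms = 100, 2000     # the example above
bloat_ratio = loaded_rtt_ms / idle_rtt_ms  # 20x: broken, no matter the throughput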

Design Principles

Latency Under Load Matters: Measure RTT while the network is busy, not idle.

Buffers Lie About Bandwidth: Large buffers mask congestion, delaying feedback.

Queues Should Be Short: Aim for milliseconds of buffering, not seconds.

Flow Isolation: One greedy flow shouldn't destroy latency for others.

Active Queue Management: Don't just drop when full—manage proactively.

The Bufferbloat Problem

Without Bufferbloat (healthy network):
─────────────────────────────────────
Idle RTT:   20ms
Load RTT:   25ms (slight increase)
Difference: 5ms  ✓ Good!

With Bufferbloat (broken network):
──────────────────────────────────
Idle RTT:   20ms
Load RTT:   2000ms (100x increase!)
Difference: 1980ms  ✗ Terrible!

Why does this happen?

Sender                  Router                    Receiver
──────                  ──────                    ────────

100 Mbps ────────►  ┌─────────┐  ────────►  10 Mbps
                    │ BUFFER  │
                    │█████████│ ← 2 seconds of packets!
                    │█████████│
                    │█████████│
                    └─────────┘

Packets queue up waiting for the slow link.
TCP doesn't know—it sees ACKs arriving (eventually).
User sees lag, even with "good bandwidth."
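The lag is simple arithmetic: a FIFO buffer drains at the bottleneck rate, so its worst-case sojourn time is its size divided by that rate. A minimal sketch (the 2.5 MB buffer size is illustrative, chosen to reproduce the diagram's 2 seconds):

def queue_delay_ms(buffer_bytes: float, bottleneck_mbps: float) -> float:
    """Worst-case delay of a full FIFO buffer draining at line rate."""
    bytes_per_ms = bottleneck_mbps * 1_000_000 / 8 / 1000
    return buffer_bytes / bytes_per_ms

print(queue_delay_ms(2_500_000, 10))  # ≈ 2000.0 ms on the 10 Mbps link above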

When Engineering Low-Latency Networks

Always

  • Measure latency UNDER LOAD, not just idle

  • Use fq_codel or similar AQM on bottleneck queues

  • Size buffers based on BDP, not maximum possible

  • Test with realistic traffic patterns

  • Monitor queue depth, not just throughput

  • Prioritize latency for interactive traffic

Never

  • Assume more buffering is better

  • Measure only idle RTT as "ping time"

  • Optimize only for throughput benchmarks

  • Use deep buffers "just in case"

  • Ignore latency complaints with "bandwidth is fine"

  • Conflate bandwidth with network quality

Prefer

  • Shallow queues over deep buffers

  • Fair queuing over FIFO

  • AQM over tail-drop

  • Latency metrics over throughput

  • Per-flow isolation

  • Measuring under load

Code Patterns

Bufferbloat Detection

import re
import subprocess
import threading
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


# Result container inferred from usage below
@dataclass
class BufferbloatDiagnosis:
    status: str
    grade: str = ''
    baseline_rtt: float = 0.0
    loaded_rtt: float = 0.0
    bloat_ms: float = 0.0
    bloat_ratio: float = 0.0
    recommendation: str = ''


class BufferbloatDetector:
    """
    Detect bufferbloat by comparing idle vs loaded RTT.
    Gettys' insight: the difference tells you everything.
    """

    def __init__(self, target_host: str):
        self.target = target_host
        self.idle_samples: List[float] = []
        self.loaded_samples: List[float] = []

    def measure_idle_rtt(self, samples: int = 20) -> Optional[float]:
        """
        Measure RTT when the network is idle.
        """
        rtts = []
        for _ in range(samples):
            rtt = self._ping(self.target)
            if rtt is not None:
                rtts.append(rtt)
            time.sleep(0.1)

        self.idle_samples = rtts
        return min(rtts) if rtts else None

    def measure_loaded_rtt(self,
                           samples: int = 20,
                           load_generator: Optional[Callable] = None) -> Optional[float]:
        """
        Measure RTT while generating load.
        """
        # Start background load
        if load_generator:
            load_thread = threading.Thread(target=load_generator, daemon=True)
            load_thread.start()

        time.sleep(1)  # Let load stabilize

        rtts = []
        for _ in range(samples):
            rtt = self._ping(self.target)
            if rtt is not None:
                rtts.append(rtt)
            time.sleep(0.1)

        self.loaded_samples = rtts
        return sum(rtts) / len(rtts) if rtts else None

    def diagnose(self) -> BufferbloatDiagnosis:
        """
        Diagnose bufferbloat severity.
        """
        if not self.idle_samples or not self.loaded_samples:
            return BufferbloatDiagnosis(status='insufficient_data')

        baseline = min(self.idle_samples)
        loaded_avg = sum(self.loaded_samples) / len(self.loaded_samples)

        bloat = loaded_avg - baseline
        bloat_ratio = loaded_avg / baseline if baseline > 0 else float('inf')

        # Gettys-style thresholds (milliseconds of added delay under load)
        if bloat < 5:
            grade, status = 'A', 'excellent'
            recommendation = 'Network is well-tuned'
        elif bloat < 30:
            grade, status = 'B', 'good'
            recommendation = 'Minor bufferbloat, acceptable for most uses'
        elif bloat < 100:
            grade, status = 'C', 'moderate'
            recommendation = 'Noticeable lag under load, enable fq_codel'
        elif bloat < 300:
            grade, status = 'D', 'poor'
            recommendation = 'Significant bufferbloat, enable AQM immediately'
        else:
            grade, status = 'F', 'severe'
            recommendation = 'Severe bufferbloat, network unusable for interactive use'

        return BufferbloatDiagnosis(
            status=status,
            grade=grade,
            baseline_rtt=baseline,
            loaded_rtt=loaded_avg,
            bloat_ms=bloat,
            bloat_ratio=bloat_ratio,
            recommendation=recommendation,
        )

    def _ping(self, host: str) -> Optional[float]:
        """Send one ICMP ping and return its RTT in ms (None on failure)."""
        try:
            result = subprocess.run(
                ['ping', '-c', '1', '-W', '1', host],
                capture_output=True,
                text=True
            )
            # Parse RTT from ping output, e.g. "time=12.3 ms"
            match = re.search(r'time=(\d+\.?\d*)', result.stdout)
            if match:
                return float(match.group(1))
        except Exception:
            pass
        return None

def run_bufferbloat_test(target: str = '8.8.8.8') -> BufferbloatDiagnosis:
    """
    Run a complete bufferbloat test.
    """
    detector = BufferbloatDetector(target)

    print("Measuring idle RTT...")
    detector.measure_idle_rtt()

    print("Measuring RTT under load...")

    def generate_load():
        # Saturate the downlink with a large download
        try:
            subprocess.run(
                ['curl', '-o', '/dev/null', '-s',
                 'http://speedtest.tele2.net/100MB.zip'],
                timeout=30
            )
        except subprocess.TimeoutExpired:
            pass  # 30 seconds of load is plenty

    detector.measure_loaded_rtt(load_generator=generate_load)

    return detector.diagnose()
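A quick way to exercise the detector (the target host and output format here are illustrative, not prescribed by the guide):

if __name__ == '__main__':
    diagnosis = run_bufferbloat_test('8.8.8.8')
    print(f"Grade {diagnosis.grade} ({diagnosis.status}): "
          f"{diagnosis.bloat_ms:.0f}ms of bloat, "
          f"{diagnosis.bloat_ratio:.1f}x idle RTT")
    print(diagnosis.recommendation)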

fq_codel Implementation

from collections import deque
from dataclasses import dataclass
from typing import List, Optional


# Minimal packet shape inferred from usage in this sketch
@dataclass
class Packet:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    data: bytes = b''
    enqueue_time: float = 0.0


class FQCoDel:
    """
    Fair Queuing with Controlled Delay (fq_codel).
    The solution to bufferbloat: per-flow fair queuing + CoDel AQM.

    Key innovations:
    1. Flow isolation: one flow can't bloat another
    2. Per-flow AQM: CoDel applied to each flow
    3. Fair sharing: all flows get an equal share of bandwidth
    """

    def __init__(self,
                 num_queues: int = 1024,
                 target_ms: float = 5.0,
                 interval_ms: float = 100.0,
                 quantum: int = 1514):
        self.num_queues = num_queues
        self.target = target_ms
        self.interval = interval_ms
        self.quantum = quantum  # Bytes of credit granted per round

        self.queues = [FlowQueue(target_ms, interval_ms)
                       for _ in range(num_queues)]
        self.active_list: List[int] = []  # Indices of queues holding packets

    def hash_flow(self, packet: Packet) -> int:
        """
        Hash packet to a queue based on its flow (5-tuple).
        """
        flow_id = (
            packet.src_ip,
            packet.dst_ip,
            packet.src_port,
            packet.dst_port,
            packet.protocol
        )
        return hash(flow_id) % self.num_queues

    def enqueue(self, packet: Packet, now_ms: float) -> bool:
        """
        Enqueue a packet to its flow's queue.
        """
        queue_idx = self.hash_flow(packet)
        queue = self.queues[queue_idx]

        packet.enqueue_time = now_ms

        was_empty = queue.is_empty()
        success = queue.enqueue(packet)

        if success and was_empty:
            # Flow became active, add it to the round-robin rotation
            self.active_list.append(queue_idx)

        return success

    def dequeue(self, now_ms: float) -> Optional[Packet]:
        """
        Dequeue using deficit round-robin (DRR) with per-flow CoDel.
        """
        while self.active_list:
            queue_idx = self.active_list[0]
            queue = self.queues[queue_idx]

            if queue.deficit <= 0:
                # Out of credit: grant one quantum and rotate to the back,
                # so every active flow gets an equal share of bytes
                queue.deficit += self.quantum
                self.active_list.append(self.active_list.pop(0))
                continue

            # Apply CoDel to this flow's queue
            packet = queue.codel_dequeue(now_ms)

            if packet is not None:
                queue.deficit -= len(packet.data)
                return packet

            # Queue drained (or CoDel dropped everything): deactivate it
            self.active_list.pop(0)
            queue.deficit = 0

        return None

class FlowQueue:
    """
    Per-flow queue with CoDel.
    """

    def __init__(self, target_ms: float, interval_ms: float, max_size: int = 10240):
        self.packets: deque = deque()
        self.max_size = max_size
        self.deficit = 0

        # CoDel state
        self.target = target_ms      # Acceptable standing delay
        self.interval = interval_ms  # Window for confirming persistent delay
        self.first_above_time: Optional[float] = None
        self.drop_next = 0.0
        self.count = 0
        self.dropping = False

    def is_empty(self) -> bool:
        return len(self.packets) == 0

    def enqueue(self, packet: Packet) -> bool:
        if len(self.packets) >= self.max_size:
            return False
        self.packets.append(packet)
        return True

    def codel_dequeue(self, now_ms: float) -> Optional[Packet]:
        """
        Dequeue with (simplified) CoDel logic.
        """
        if not self.packets:
            self.dropping = False
            return None

        packet = self.packets[0]
        sojourn_time = now_ms - packet.enqueue_time

        if sojourn_time < self.target:
            # Good: below target, reset the persistence timer
            self.first_above_time = None
        elif self.first_above_time is None:
            # Delay just crossed target: give it one interval to recover
            self.first_above_time = now_ms + self.interval

        if self.dropping:
            if sojourn_time < self.target:
                # Delay recovered, stop dropping
                self.dropping = False
            elif now_ms >= self.drop_next:
                # Time to drop; successive drops come faster (interval/sqrt(count))
                self.packets.popleft()  # Drop
                self.count += 1
                self.drop_next = now_ms + self.interval / (self.count ** 0.5)
                return self.codel_dequeue(now_ms)  # Try next
        elif self.first_above_time and now_ms >= self.first_above_time:
            # Delay stayed above target for a full interval: start dropping
            self.dropping = True
            self.count = 1
            self.drop_next = now_ms + self.interval
            self.packets.popleft()  # Drop
            return self.codel_dequeue(now_ms)  # Try next

        return self.packets.popleft()
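A toy driver for the two classes above, showing flow isolation in action. The addresses, ports, and sizes are made up; the point is that a greedy bulk flow cannot starve a sparse VoIP flow, because DRR rotates between their queues:

fq = FQCoDel(num_queues=64)

# Greedy bulk flow: 50 full-size packets
for i in range(50):
    fq.enqueue(Packet('10.0.0.1', '10.0.0.2', 40000, 80, 6, data=b'x' * 1500),
               now_ms=i * 0.1)

# Sparse VoIP flow: one small packet, arriving last
fq.enqueue(Packet('10.0.0.3', '10.0.0.2', 5060, 5060, 17, data=b'x' * 200),
           now_ms=5.0)

for _ in range(4):
    p = fq.dequeue(now_ms=6.0)
    print(p.src_ip if p else None)
# typically: 10.0.0.1, 10.0.0.1, 10.0.0.3, 10.0.0.1
# (the two flows may, rarely, hash into the same queue)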

Network Quality Score

# Measurement and result containers inferred from usage (not defined upstream)
@dataclass
class NetworkMeasurements:
    baseline_rtt: float   # ms, idle
    loaded_rtt: float     # ms, under load
    jitter: float         # ms
    packet_loss: float    # fraction, 0.0 to 1.0


@dataclass
class QualityScore:
    score: int
    grade: str
    baseline_rtt: float
    bloat: float
    jitter: float
    loss: float
    bottleneck: str


class NetworkQualityScore:
    """
    Score network quality the Gettys way: latency under load.
    """

    @staticmethod
    def calculate_score(measurements: NetworkMeasurements) -> QualityScore:
        """
        Calculate a network quality score.

        Key insight: combine baseline latency, bloat, and jitter.
        """
        baseline = measurements.baseline_rtt
        loaded = measurements.loaded_rtt
        jitter = measurements.jitter
        loss = measurements.packet_loss

        # Bloat penalty (penalize heavily)
        bloat = loaded - baseline
        bloat_factor = 1.0 / (1.0 + bloat / 50.0)

        # Baseline penalty (prefer low idle latency)
        baseline_factor = 1.0 / (1.0 + baseline / 100.0)

        # Jitter penalty
        jitter_factor = 1.0 / (1.0 + jitter / 20.0)

        # Loss penalty (severe)
        loss_factor = (1.0 - loss) ** 2

        # Combined score (0-100); bloat carries half the weight
        raw_score = (bloat_factor * 0.5 +
                     baseline_factor * 0.2 +
                     jitter_factor * 0.2 +
                     loss_factor * 0.1)

        score = int(raw_score * 100)

        # Grade
        if score >= 90:
            grade = 'A'
        elif score >= 75:
            grade = 'B'
        elif score >= 60:
            grade = 'C'
        elif score >= 40:
            grade = 'D'
        else:
            grade = 'F'

        return QualityScore(
            score=score,
            grade=grade,
            baseline_rtt=baseline,
            bloat=bloat,
            jitter=jitter,
            loss=loss,
            bottleneck=identify_bottleneck(measurements),
        )

def identify_bottleneck(measurements: NetworkMeasurements) -> str:
    """
    Identify what's hurting network quality most.
    """
    bloat = measurements.loaded_rtt - measurements.baseline_rtt

    if bloat > 100:
        return 'bufferbloat'
    elif measurements.baseline_rtt > 100:
        return 'high_base_latency'
    elif measurements.jitter > 30:
        return 'jitter'
    elif measurements.packet_loss > 0.01:
        return 'packet_loss'
    else:
        return 'none'
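Feeding the scorer the "broken network" numbers from earlier (20ms idle, 2000ms loaded; the jitter and loss figures are made up for illustration):

m = NetworkMeasurements(baseline_rtt=20.0, loaded_rtt=2000.0,
                        jitter=15.0, packet_loss=0.0)
result = NetworkQualityScore.calculate_score(m)
print(result.score, result.grade, result.bottleneck)  # ≈ 39 F bufferbloat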

Buffer Sizing

# Result container inferred from usage below
@dataclass
class BufferRecommendation:
    bdp_bytes: int
    recommended_bytes: int
    recommended_packets: int
    recommended_ms: float
    explanation: str


class BufferSizing:
    """
    Size buffers correctly to avoid bloat while maintaining throughput.
    """

    @staticmethod
    def calculate_optimal_buffer(bandwidth_mbps: float,
                                 rtt_ms: float,
                                 num_flows: int = 1) -> BufferRecommendation:
        """
        Calculate optimal buffer size.

        Rule of thumb (for N flows):
            Buffer = BDP / sqrt(N)

        where BDP = Bandwidth × RTT.
        """
        # Bandwidth-Delay Product
        bandwidth_bytes_per_sec = bandwidth_mbps * 1_000_000 / 8
        rtt_sec = rtt_ms / 1000
        bdp_bytes = bandwidth_bytes_per_sec * rtt_sec

        # Buffer size
        if num_flows == 1:
            buffer_bytes = bdp_bytes
        else:
            # Appenzeller et al.: BDP / sqrt(N)
            buffer_bytes = bdp_bytes / (num_flows ** 0.5)

        # Convert to practical units (1500-byte MTU)
        buffer_packets = int(buffer_bytes / 1500)
        buffer_ms = rtt_ms / (num_flows ** 0.5) if num_flows > 1 else rtt_ms

        return BufferRecommendation(
            bdp_bytes=int(bdp_bytes),
            recommended_bytes=int(buffer_bytes),
            recommended_packets=buffer_packets,
            recommended_ms=buffer_ms,
            explanation=(
                f"For a {bandwidth_mbps} Mbps link with {rtt_ms}ms RTT "
                f"and ~{num_flows} flows, buffer {buffer_packets} packets "
                f"(~{buffer_ms:.1f}ms of data)"
            )
        )

    @staticmethod
    def linux_buffer_settings(buffer_bytes: int) -> dict:
        """
        Generate Linux sysctl settings for buffer sizes.
        """
        return {
            'net.core.rmem_max': buffer_bytes,
            'net.core.wmem_max': buffer_bytes,
            'net.ipv4.tcp_rmem': f'4096 87380 {buffer_bytes}',
            'net.ipv4.tcp_wmem': f'4096 65536 {buffer_bytes}',
            'net.core.netdev_max_backlog': 1000,  # Keep the backlog short
        }

    @staticmethod
    def enable_fq_codel(interface: str) -> str:
        """
        Generate commands to enable fq_codel on an interface.
        """
        return f"""
# Enable fq_codel on {interface}
tc qdisc del dev {interface} root 2>/dev/null
tc qdisc add dev {interface} root fq_codel

# Verify
tc -s qdisc show dev {interface}
"""
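A worked example of the BDP rule above: a 100 Mbps link with 20ms RTT has a BDP of 12.5 MB/s × 0.02s = 250 KB; with ~100 concurrent flows, BDP/√100 shrinks that to 25 KB, about 16 full-size packets and 2ms of queuing:

rec = BufferSizing.calculate_optimal_buffer(bandwidth_mbps=100,
                                            rtt_ms=20,
                                            num_flows=100)
print(rec.explanation)
# "For a 100 Mbps link with 20ms RTT and ~100 flows,
#  buffer 16 packets (~2.0ms of data)"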

Mental Model

Gettys approaches network performance by asking:

  • What's the RTT under load? That's the true latency

  • How deep are the buffers? Seconds of buffering = seconds of lag

  • Is there flow isolation? One flow shouldn't ruin others

  • Is AQM enabled? fq_codel should be everywhere

  • Would I notice lag? User experience is the metric

The Bufferbloat Checklist

□ Measure RTT under load, not idle
□ Compare loaded RTT to baseline (>10x = severe bloat)
□ Enable fq_codel on all bottleneck queues
□ Size buffers based on BDP, not maximum
□ Test with interactive + bulk traffic together
□ Monitor queue depth, not just throughput
□ Grade with dslreports.com/speedtest or similar
□ Check router, modem, AND ISP equipment

Signature Gettys Moves

  • Bufferbloat diagnosis (idle vs loaded RTT)

  • fq_codel as the universal solution

  • "Latency is the new bandwidth"

  • Flow isolation requirement

  • Queue depth monitoring

  • BDP-based buffer sizing

  • User experience as the metric

  • Crusading for AQM everywhere

