Latency calculation is off #131

@rileyhilliard

Description

Describe the bug

The rr monitor TUI displays a "Latency" metric that conflates actual network/SSH latency with the time required to collect metrics on the remote host. On macOS, network bandwidth is calculated with nettop -L N, which samples over roughly N seconds, so the reported "latency" shows ~5 seconds even when actual SSH latency is under 100ms.

This is misleading because users see "5173ms degraded" and assume there's a network problem when the connection is actually healthy.

To reproduce

  1. Configure two macOS hosts in ~/.rr/config.yaml
  2. Run rr monitor
  3. Observe "Latency" showing ~5000ms+ with "degraded" status
  4. Verify actual latency is fine:
    # SSH roundtrip is ~100ms
    time ssh <host> "echo ok"

    # rr status shows correct latency
    rr status --machine
    # Returns: "latency": "75.497083ms"


Expected behavior

The "Latency" metric should reflect actual network/SSH latency (what rr status reports), not the total metrics collection time. If metrics collection takes longer due to sampling windows, that duration should either:

  1. Not be included in the latency display, or
  2. Be shown as a separate "Collection Time" metric, or
  3. Be documented/labeled clearly (e.g., "Probe Time" instead of "Latency")

The "degraded" threshold should apply to actual network latency, not collection time.

Environment

  • OS: macOS 15.7.3 (also affects remote macOS hosts)
  • rr version: v0.14.1 (commit b2b1ac7)
  • Go version: go1.24.11
  • Architecture: darwin/arm64

Additional context

The root cause appears to be nettop usage for network bandwidth stats. To calculate bytes/sec, multiple samples are needed:

  # Single sample - fast (~150ms)
  time ssh host 'nettop -P -L 1 -J bytes_in,bytes_out'

  # 5 samples - slow (~4-5s)
  time ssh host 'nettop -P -L 5 -J bytes_in,bytes_out'

rr status --machine correctly reports SSH probe latency (~75-100ms), so the accurate measurement exists. It's just not what rr monitor displays.
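For reference, a standalone latency probe can be sketched by timing a trivial remote command on its own, like the time ssh <host> "echo ok" check in the reproduction steps. This is not rr's actual probe code; timeCommand is a hypothetical helper. Because each ssh invocation normally opens a fresh connection (unless connection multiplexing is configured), this measures handshake plus round trip and may read somewhat higher than rr status.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

// timeCommand is a hypothetical helper: it runs a command and returns how
// long it took end to end, along with any error from running it.
func timeCommand(name string, args ...string) (time.Duration, error) {
	start := time.Now()
	err := exec.Command(name, args...).Run()
	return time.Since(start), err
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: probe <ssh-host>")
		os.Exit(2)
	}
	host := os.Args[1]

	// Time only a no-op remote command; no metrics collection happens here.
	latency, err := timeCommand("ssh", host, "echo ok")
	if err != nil {
		fmt.Fprintln(os.Stderr, "probe failed:", err)
		os.Exit(1)
	}
	fmt.Println("probe latency:", latency)
}
```

Usage, with a host alias from the config below: go run probe.go mac-m4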

Relevant config:

  # ~/.rr/config.yaml
  hosts:
    congress-m2-ultra:
      ssh:
        - mac-m4
      dir: ~/rr/${PROJECT}
    lavaka-m2-ultra:
      ssh:
        - mac-m1
      dir: ~/rr/${PROJECT}

Both hosts show identical ~5170ms "latency" in monitor, but rr status shows 75ms and 100ms respectively.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
