gstack-jsonl-merge: equal-ts entries resolve non-deterministically across machines (append-only logs never converge) #1769

@jbetala7

Observed problem

bin/gstack-jsonl-merge is the registered git merge driver for append-only JSONL
state files (telemetry, learnings, timeline, etc., wired up by
gstack-artifacts-init / gstack-brain-restore). Its job is to resolve the
same-tail conflict two machines produce when both append between pushes, and the
header promises this is deterministic: "both appends survive, ordered by
wall-clock timestamp where available, content hash otherwise."

It is not deterministic when two different entries share the same ts.

Current behavior on origin/main (19770ea)

The Python sort key for a timestamped line is (0, ts) with no further
tiebreaker. Python's sort is stable, so entries with an equal ts keep their
insertion order, which is base, ours, theirs. But which physical entry is
"ours" vs "theirs" depends on which machine runs the merge, because git
assigns the local side to %A (ours). So two machines resolving the same
conflict emit the two equal-ts lines in opposite order:

# Machine A (its line = ours):     event:a, event:b
# Machine B (its line = ours):     event:b, event:a

The merged files differ, so the repos diverge and the next sync is another
conflict, which each machine again resolves differently. The logs never
converge.
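
The mechanism can be reproduced outside the driver. The following is a minimal sketch, not the driver's actual code: `sort_key` and `merge` are hypothetical stand-ins that assume the driver concatenates base, ours, theirs and sorts with a (0, ts) key, as described above.

```python
import json

def sort_key(line):
    # Assumed shape of the current key: (0, ts), no tiebreaker.
    return (0, json.loads(line)["ts"])

def merge(base, ours, theirs):
    # Hypothetical stand-in for the driver: concatenate in
    # base, ours, theirs order, then rely on Python's stable sort.
    return sorted(base + ours + theirs, key=sort_key)

a = '{"ts":"2026-05-28T10:00:00Z","event":"a"}'
b = '{"ts":"2026-05-28T10:00:00Z","event":"b"}'

machine_a = merge([], [a], [b])  # machine A holds line a as "ours"
machine_b = merge([], [b], [a])  # machine B holds line b as "ours"

print(machine_a == machine_b)  # False: same inputs, opposite order
```

Because the keys compare equal, the stable sort preserves input order, and input order is exactly what differs between the two machines.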

This is not a corner case for gstack-telemetry-log, which stamps
second-granularity timestamps (date -u +%Y-%m-%dT%H:%M:%SZ): two skill events
in the same second collide routinely. Non-JSON lines and lines without a ts
are unaffected because they already order by SHA-256 of the line content, which
is side-independent.
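
To illustrate why collisions are routine, here is a small sketch that formats two instants 800 ms apart with the same format string as the `date -u` call above. The two instants are made-up values chosen for illustration.

```python
from datetime import datetime, timezone

# Same format string as date -u +%Y-%m-%dT%H:%M:%SZ (second granularity).
FMT = "%Y-%m-%dT%H:%M:%SZ"

# Two hypothetical events inside the same wall-clock second.
first = datetime(2026, 5, 28, 10, 0, 0, 100_000, tzinfo=timezone.utc)
second = datetime(2026, 5, 28, 10, 0, 0, 900_000, tzinfo=timezone.utc)

ts_first = first.strftime(FMT)
ts_second = second.strftime(FMT)

# Sub-second detail is dropped, so both events carry the same ts.
print(ts_first, ts_second, ts_first == ts_second)
```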

Repro:

a='{"ts":"2026-05-28T10:00:00Z","event":"a"}'
b='{"ts":"2026-05-28T10:00:00Z","event":"b"}'
printf '%s\n' "$a" > ours; printf '%s\n' "$b" > theirs; : > base
bin/gstack-jsonl-merge base ours theirs && cat ours      # a then b
printf '%s\n' "$b" > ours; printf '%s\n' "$a" > theirs; : > base
bin/gstack-jsonl-merge base ours theirs && cat ours      # b then a  <-- diverges

Expected behavior

The same set of input lines resolves to the same file on every machine,
regardless of which side each line arrives on.

Duplicate searches performed

  • open PRs: jsonl merge, merge driver, jsonl-merge OR determinism OR convergence — none
  • scanned all 100 open PRs for any touching bin/gstack-jsonl-merge — none
  • open issues: jsonl merge driver, jsonl merge driver determinism — none
  • git log -- bin/gstack-jsonl-merge: last touched in v1.27.0.0, introduced v1.9.0.0; no in-flight work

Candidate fix shape

Make the sort a total order by adding the line content as the final tiebreaker:
(0, ts, line) for timestamped entries (and (1, h, line) for the hash path,
for symmetry). Equal-ts entries then order by content, identically on both
sides. Plus a regression test that runs the driver with the two sides swapped
and asserts identical output.
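
A rough sketch of that shape, under stated assumptions: `total_order_key` and `merge` are hypothetical names, the real driver's parsing and any deduplication may differ, and only string-valued ts fields are treated as timestamps here so that keys stay comparable.

```python
import hashlib
import json

def total_order_key(line):
    # Timestamped JSON lines: (0, ts, line). Everything else:
    # (1, sha256, line). The trailing line makes the order total.
    try:
        ts = json.loads(line).get("ts")
    except (ValueError, AttributeError):
        ts = None  # not JSON, or JSON that is not an object
    if isinstance(ts, str):
        return (0, ts, line)
    h = hashlib.sha256(line.encode("utf-8")).hexdigest()
    return (1, h, line)

def merge(base, ours, theirs):
    # Hypothetical stand-in for the driver's merge step.
    return sorted(base + ours + theirs, key=total_order_key)

a = '{"ts":"2026-05-28T10:00:00Z","event":"a"}'
b = '{"ts":"2026-05-28T10:00:00Z","event":"b"}'

# Regression check: swapping the sides must not change the output.
assert merge([], [a], [b]) == merge([], [b], [a])
```

With the line itself as the last key element, two distinct lines can never compare equal, so the result no longer depends on input order.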
