[design] Revise training trigger and sampling caps (R2.3.5, R2.3.6) by GilboaAWS · Pull Request #14 · ikolomi/valkey

GilboaAWS · 2026-05-31T15:05:10Z

Summary

Revises the training trigger and sampling model for S1.2 implementation, based on design discussion with @ikolomi.

Changes

Removed:

compression-dict-first-training-keys-count — replaced by compression-dict-min-training-keys

New/updated knobs (all configurable):

| Knob | Default | Role |
|---|---|---|
| `compression-dict-min-training-keys` | 1000 | Trigger: start a scan when the total DB key count reaches this value (O(1) check). After the scan, submit training only if the collected eligible samples meet this minimum. Below ~1000 samples, ZSTD lacks the volume to represent the data distribution. |
| `compression-dict-max-training-keys` | 10000 | Upper cap on samples collected per scan. Beyond ~10K samples, dictionary quality saturates (diminishing returns). |
| `compression-training-buffer-size` | 16 MiB | Upper cap on training buffer memory. Prevents memory explosion (without the cap: 10K × 128 KB = 1.28 GB). Transient: freed after bio returns. |
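The memory figure in the table can be checked with a few lines of arithmetic. This is a minimal sketch, not project code: the constant names are made up for illustration, and the 128 KB per-sample upper bound is inferred from the "10K × 128KB" figure above (read here as decimal kilobytes).

```python
# Worst-case training-buffer arithmetic (illustrative only).
MAX_TRAINING_KEYS = 10_000             # compression-dict-max-training-keys default
MAX_SAMPLE_BYTES = 128 * 1000          # ASSUMPTION: 128 KB per-sample upper bound
BUFFER_CAP_BYTES = 16 * 1024 * 1024    # compression-training-buffer-size default

# Without a buffer cap, every sample could be as large as the upper bound.
uncapped_bytes = MAX_TRAINING_KEYS * MAX_SAMPLE_BYTES

# With the cap, the buffer never grows past 16 MiB.
capped_bytes = min(uncapped_bytes, BUFFER_CAP_BYTES)

# How many maximum-size samples fit before the buffer cap ends the scan.
max_size_samples_that_fit = BUFFER_CAP_BYTES // MAX_SAMPLE_BYTES

print(f"uncapped worst case: {uncapped_bytes / 1e9:.2f} GB")
print(f"with buffer cap:     {capped_bytes / 2**20:.0f} MiB")
print(f"max-size samples that fit: {max_size_samples_that_fit}")
```

Under these assumptions the buffer cap, not the key cap, is what ends a scan over large values, which is why the design needs both limits.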

Cooldown:

30s hardcoded cooldown after a failed training attempt (scan didn't collect enough eligible samples). Avoids expensive repeated scans when the DB has many non-eligible keys (INT, EMBSTR, too small, etc.).
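The cooldown gate could look roughly like the sketch below. The class and method names are hypothetical and do not come from the design doc; only the 30-second value and the "set on failed attempt" rule are taken from the text above. Whether the boundary at exactly 30 seconds counts as in or out of cooldown is an assumption here.

```python
import time

TRAINING_COOLDOWN_SECONDS = 30.0  # hardcoded in the design, not a config knob


class TrainingCooldown:
    """Hypothetical helper tracking the last failed training attempt."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the gate can be tested
        self._last_failure = None    # None means no failed attempt yet

    def record_failure(self):
        """Call when a scan ends with fewer eligible samples than the minimum."""
        self._last_failure = self._clock()

    def active(self):
        """Return True while a new scan should be suppressed."""
        if self._last_failure is None:
            return False
        elapsed = self._clock() - self._last_failure
        return elapsed < TRAINING_COOLDOWN_SECONDS
```

A monotonic clock is used in this sketch so that wall-clock adjustments cannot shorten or extend the cooldown.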

Flow

compressionCron:
  total DB keys >= min (1000)?
  AND no training in progress?
  AND not in cooldown?
  → start scan

Scan collects eligible values (raw strings within size bounds).
Stops when:
  samples == max (10K)  OR  buffer == 16 MiB  OR  keyspace exhausted

After scan:
  eligible samples >= min (1000)?  → submit to bio for training
  eligible samples < min?          → abort, enter 30s cooldown
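The flow above can be sketched in Python as three small functions. This is an illustration under stated assumptions, not the implementation: all names are hypothetical, the eligibility size bounds are placeholders (the PR only says "within size bounds"), `bytes` values stand in for raw strings while other types stand in for INT/EMBSTR encodings, and the buffer cap is modeled as "stop when the next sample would not fit".

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    min_training_keys: int = 1000          # compression-dict-min-training-keys
    max_training_keys: int = 10_000        # compression-dict-max-training-keys
    buffer_size: int = 16 * 1024 * 1024    # compression-training-buffer-size
    # ASSUMPTION: placeholder eligibility bounds, not taken from the design doc.
    min_value_bytes: int = 64
    max_value_bytes: int = 128 * 1024


def should_start_scan(total_keys, training_in_progress, in_cooldown, cfg):
    """Cron-side trigger: a cheap check on the total key count only."""
    return (total_keys >= cfg.min_training_keys
            and not training_in_progress
            and not in_cooldown)


def collect_samples(values, cfg):
    """Collect eligible values until a cap is hit or the values run out."""
    samples, used = [], 0
    for value in values:
        if len(samples) >= cfg.max_training_keys:
            break                          # key cap reached
        if not isinstance(value, bytes):
            continue                       # stands in for non-raw encodings
        if not cfg.min_value_bytes <= len(value) <= cfg.max_value_bytes:
            continue                       # outside the size bounds
        if used + len(value) > cfg.buffer_size:
            break                          # buffer cap reached
        samples.append(value)
        used += len(value)
    return samples


def after_scan(samples, cfg):
    """Decide whether to submit for training or abort into cooldown."""
    return "submit" if len(samples) >= cfg.min_training_keys else "cooldown"
```

For example, a keyspace of 2500 keys of which only 500 are eligible passes the trigger (2500 >= 1000) but ends in `"cooldown"`, which is the case the 30s cooldown is meant to make cheap.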

Why

Trigger uses total DB keys (not eligible keys) because eligibility requires scanning — we can't know without doing the work. Total key count is O(1) via kvstoreSize.
Buffer cap (16 MiB) prevents memory explosion from large values.
Key cap (10K) bounds scan duration; beyond 10K ZSTD quality saturates.
Min (1000) ensures enough diversity for a useful dictionary.
Cooldown (30s) prevents wasting main-thread CPU on repeated failed scans.

Design doc changes

R2.3.5: trigger uses compression-dict-min-training-keys, 30s cooldown on failure
R2.3.6: dual-cap model (max keys OR buffer size), min floor for submission
Config table: removed old knob, added 2 new knobs with detailed descriptions, updated count to 12
§6.3: error handling updated to reference new knob + cooldown

No code changes — design doc only.

ikolomi

Extend config descriptions with ZSTD information

@ikolomi

Revise the training trigger and sampling model based on design discussion with @ikolomi:

- Remove compression-dict-first-training-keys-count (replaced by min)
- compression-dict-min-training-keys (default 1000): serves as BOTH the trigger to start a scan AND the minimum samples required to submit training. If scan doesn't reach the min, abort + 30s cooldown.
- compression-dict-max-training-keys (default 10000): upper key cap
- compression-training-buffer-size (default 16 MiB): upper buffer cap
- Scan stops on whichever cap comes first, or keyspace exhausted
- 30s hardcoded cooldown after failed attempt (avoids expensive repeated scans when not enough eligible keys exist)
- Added both new knobs to the Advanced knobs config table (now 12)

Reconcile the planning docs with the implementation-phase changes that landed after the original design walkthrough.

- idea-honing.md Q9: add a "Superseded during implementation" annotation pointing to the new training-knob model from PR #14 (compression-dict-min-training-keys / -max-training-keys / compression-training-buffer-size). The original walkthrough text is preserved below the annotation for historical record.
- detailed-design.md §7.1 transparency-mode example: replace --compression-dict-first-training-keys-count (removed in PR #14) with --compression-dict-min-training-keys.
- detailed-design.md §2.12 heading: "Advanced knobs (12)" → "(11)". Off-by-one count introduced when PR #14 reshuffled the table.
- summary.md: add a new "What changed during implementation" section between the walkthrough section and the plan summary, capturing the three substantive post-walkthrough refinements (PR #14 training rewrite, PR #15 R2.5.6 read-hot gap, PR #13 QSBR plumbing).

Doc-only; no code changes.

github-actions Bot assigned GilboaAWS May 31, 2026

GilboaAWS marked this pull request as ready for review May 31, 2026 15:07

GilboaAWS force-pushed the gilboa/design-training-trigger branch from 9014c20 to 8e0a806 Compare June 1, 2026 10:10

ikolomi approved these changes Jun 1, 2026


GilboaAWS force-pushed the gilboa/design-training-trigger branch 2 times, most recently from e0d0b13 to 506bf46 Compare June 1, 2026 12:31

GilboaAWS force-pushed the gilboa/design-training-trigger branch from 506bf46 to a86ebab Compare June 1, 2026 12:38

GilboaAWS merged commit 00bfb6b into unstable Jun 1, 2026
3 checks passed

GilboaAWS deleted the gilboa/design-training-trigger branch June 1, 2026 12:57

ikolomi mentioned this pull request Jun 2, 2026

docs(planning): sweep stale references after PR #14 / PR #15 #16

Merged

GilboaAWS merged 1 commit into unstable from gilboa/design-training-trigger
