[design] Revise training trigger and sampling caps (R2.3.5, R2.3.6)#14

Merged: GilboaAWS merged 1 commit into unstable from gilboa/design-training-trigger on Jun 1, 2026

@GilboaAWS GilboaAWS commented May 31, 2026

Summary

Revises the training trigger and sampling model for S1.2 implementation, based on design discussion with @ikolomi.

Changes

Removed:

  • compression-dict-first-training-keys-count — replaced by compression-dict-min-training-keys

New/updated knobs (all configurable):

| Knob | Default | Role |
| --- | --- | --- |
| compression-dict-min-training-keys | 1000 | Trigger: start scan when total DB keys reach this count (O(1) check). After scan: submit training only if collected eligible samples meet this minimum. Below ~1000 samples ZSTD lacks volume to represent the data distribution. |
| compression-dict-max-training-keys | 10000 | Upper cap on samples collected per scan. Beyond ~10K, dictionary quality saturates (diminishing returns). |
| compression-training-buffer-size | 16 MiB | Upper cap on training buffer memory. Prevents memory explosion (without cap: 10K × 128KB = 1.28 GB). Transient: freed after bio returns. |
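
The worst-case figure quoted for the buffer cap can be checked with a few lines of arithmetic. This is only a sketch: it assumes the uncapped estimate uses decimal units (128 KB = 128,000 bytes, 1 GB = 10^9 bytes) and that 128 KB is the largest eligible value size, neither of which the PR states explicitly.

```python
# Worst-case training buffer size, using the figures quoted in the knob table.
# Assumption: decimal units (1 KB = 1000 bytes) for the uncapped estimate,
# binary units (1 MiB = 2**20 bytes) for the configured cap.
MAX_TRAINING_KEYS = 10_000       # compression-dict-max-training-keys default
MAX_VALUE_BYTES = 128 * 1000     # assumed largest eligible value (128 KB)
BUFFER_CAP_BYTES = 16 * 2**20    # compression-training-buffer-size default

uncapped_bytes = MAX_TRAINING_KEYS * MAX_VALUE_BYTES
max_size_values_under_cap = BUFFER_CAP_BYTES // MAX_VALUE_BYTES

print(f"uncapped worst case: {uncapped_bytes / 1e9:.2f} GB")
print(f"configured cap: {BUFFER_CAP_BYTES / 2**20:.0f} MiB")
print(f"max-size values that fit under the cap: {max_size_values_under_cap}")
```

Under these assumptions the buffer cap, not the key cap, is what ends a scan over a keyspace dominated by large values.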

Cooldown:

  • 30s hardcoded cooldown after a failed training attempt (scan didn't collect enough eligible samples). Avoids expensive repeated scans when the DB has many non-eligible keys (INT, EMBSTR, too small, etc.).

Flow

compressionCron:
  total DB keys >= min (1000)?
  AND no training in progress?
  AND not in cooldown?
  → start scan

Scan collects eligible values (raw strings within size bounds).
Stops when:
  samples == max (10K)  OR  buffer == 16 MiB  OR  keyspace exhausted

After scan:
  eligible samples >= min (1000)?  → submit to bio for training
  eligible samples < min?          → abort, enter 30s cooldown
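
The flow above can be modelled in a few lines of Python. This is an illustrative sketch, not the server implementation: `TrainingState`, `should_start_scan`, `collect_samples`, `finish_scan`, and the `is_eligible` predicate are hypothetical names, and stopping just before a sample would overflow the buffer is one assumed reading of "buffer == 16 MiB".

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

MIN_TRAINING_KEYS = 1_000      # compression-dict-min-training-keys default
MAX_TRAINING_KEYS = 10_000     # compression-dict-max-training-keys default
BUFFER_CAP_BYTES = 16 * 2**20  # compression-training-buffer-size default
COOLDOWN_SECONDS = 30.0        # hardcoded cooldown after a failed attempt


@dataclass
class TrainingState:
    """Hypothetical bookkeeping for the cron check."""
    in_progress: bool = False
    cooldown_until: float = 0.0


def should_start_scan(total_keys: int, state: TrainingState, now: float) -> bool:
    # Cheap gate: total key count only, since eligibility needs a scan.
    return (total_keys >= MIN_TRAINING_KEYS
            and not state.in_progress
            and now >= state.cooldown_until)


def collect_samples(values: Iterable[bytes],
                    is_eligible: Callable[[bytes], bool]) -> List[bytes]:
    samples: List[bytes] = []
    used = 0
    for value in values:
        if not is_eligible(value):
            continue
        if used + len(value) > BUFFER_CAP_BYTES:
            break  # buffer cap reached (assumed: never exceed the cap)
        samples.append(value)
        used += len(value)
        if len(samples) >= MAX_TRAINING_KEYS:
            break  # key cap reached
    return samples  # loop end without break means the keyspace was exhausted


def finish_scan(samples: List[bytes], state: TrainingState, now: float) -> bool:
    """Return True if training is submitted, False if the attempt is aborted."""
    if len(samples) >= MIN_TRAINING_KEYS:
        state.in_progress = True
        return True
    state.cooldown_until = now + COOLDOWN_SECONDS
    return False


# Usage: 5500 keys pass the trigger, but only 500 values are eligible,
# so the attempt aborts and the next scan is blocked for 30 seconds.
state = TrainingState()
values = [b"x" * 10] * 5000 + [b"y" * 200] * 500
eligible = lambda v: 64 <= len(v) <= 128_000  # assumed size bounds
started = should_start_scan(len(values), state, now=0.0)
submitted = finish_scan(collect_samples(values, eligible), state, now=0.0)
print(started, submitted, state.cooldown_until)
```

The usage case shows why the cooldown exists: the O(1) trigger passes on total key count alone, so a keyspace full of non-eligible values would otherwise be rescanned on every cron tick.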

Why

  • Trigger uses total DB keys (not eligible keys) because eligibility requires scanning — we can't know without doing the work. Total key count is O(1) via kvstoreSize.
  • Buffer cap (16 MiB) prevents memory explosion from large values.
  • Key cap (10K) bounds scan duration; beyond 10K ZSTD quality saturates.
  • Min (1000) ensures enough diversity for a useful dictionary.
  • Cooldown (30s) prevents wasting main-thread CPU on repeated failed scans.

Design doc changes

  • R2.3.5: trigger uses compression-dict-min-training-keys, 30s cooldown on failure
  • R2.3.6: dual-cap model (max keys OR buffer size), min floor for submission
  • Config table: removed old knob, added 2 new knobs with detailed descriptions, updated count to 12
  • §6.3: error handling updated to reference new knob + cooldown

No code changes — design doc only.

@GilboaAWS GilboaAWS marked this pull request as ready for review May 31, 2026 15:07
@GilboaAWS GilboaAWS force-pushed the gilboa/design-training-trigger branch from 9014c20 to 8e0a806 on June 1, 2026 10:10

@ikolomi ikolomi left a comment

Extend config descriptions with ZSTD information

@GilboaAWS GilboaAWS force-pushed the gilboa/design-training-trigger branch 2 times, most recently from e0d0b13 to 506bf46 on June 1, 2026 12:31

Revise the training trigger and sampling model based on design discussion
with @ikolomi:

- Remove compression-dict-first-training-keys-count (replaced by min)
- compression-dict-min-training-keys (default 1000): serves as BOTH the
  trigger to start a scan AND the minimum samples required to submit
  training. If scan doesn't reach the min, abort + 30s cooldown.
- compression-dict-max-training-keys (default 10000): upper key cap
- compression-training-buffer-size (default 16 MiB): upper buffer cap
- Scan stops on whichever cap comes first, or keyspace exhausted
- 30s hardcoded cooldown after failed attempt (avoids expensive
  repeated scans when not enough eligible keys exist)
- Added both new knobs to the Advanced knobs config table (now 12)

@GilboaAWS GilboaAWS force-pushed the gilboa/design-training-trigger branch from 506bf46 to a86ebab on June 1, 2026 12:38
@GilboaAWS GilboaAWS merged commit 00bfb6b into unstable Jun 1, 2026
3 checks passed
@GilboaAWS GilboaAWS deleted the gilboa/design-training-trigger branch June 1, 2026 12:57
ikolomi added a commit that referenced this pull request Jun 2, 2026
Reconcile the planning docs with the implementation-phase changes
that landed after the original design walkthrough.

  - idea-honing.md Q9: add a "Superseded during implementation"
    annotation pointing to the new training-knob model from PR #14
    (compression-dict-min-training-keys / -max-training-keys /
    compression-training-buffer-size). The original walkthrough text
    is preserved below the annotation for historical record.

  - detailed-design.md §7.1 transparency-mode example: replace
    --compression-dict-first-training-keys-count (removed in PR #14)
    with --compression-dict-min-training-keys.

  - detailed-design.md §2.12 heading: "Advanced knobs (12)" → "(11)".
    Off-by-one count introduced when PR #14 reshuffled the table.

  - summary.md: add a new "What changed during implementation"
    section between the walkthrough section and the plan summary,
    capturing the three substantive post-walkthrough refinements
    (PR #14 training rewrite, PR #15 R2.5.6 read-hot gap, PR #13
    QSBR plumbing).

Doc-only; no code changes.