
feat(provide): +unique and +entities strategy modifiers #11245

Draft
lidel wants to merge 15 commits into master from feat/provide-entity-roots-with-dedup



lidel commented Mar 20, 2026

Warning

Not ready for review; this is a sandbox for running CI.

Summary

Adds two experimental Provide.Strategy modifiers (+unique and +entities) for large IPFS nodes that pin many overlapping DAGs, extends fast-provide to pin add/pin update, and adds a new --fast-provide-dag flag for full-DAG provide on write.

Motivation

Nodes hosting large, versioned datasets (e.g. dist.ipfs.tech) pin thousands of DAGs that share 90%+ of their blocks. Today, every reprovide cycle re-walks every shared subtree once per pin, wasting I/O and time. Meanwhile, announcing every internal file chunk to the DHT produces millions of provider records when only entity-level CIDs (files, directories) matter for discovery.

Separately, pin add and pin update never fast-provided the root CID, leaving a gap where newly pinned content was undiscoverable until the next reprovide cycle (up to 22h).

Changes

+unique strategy modifier

Appending +unique to a pinned, mfs, or pinned+mfs strategy enables bloom-filter deduplication across recursive pins within a single reprovide cycle.

  • A shared BloomTracker (from boxo/dag/walker) is created per cycle, sized from the previous cycle's persisted count (stored in the datastore at /reprovideLastUniqueCount).
  • When a CID is already in the bloom filter, its entire subtree is skipped -- reducing traversal from O(pins * total_blocks) to O(unique_blocks).
  • Memory: ~4 bytes/CID (vs ~75 bytes for an exact cid.Set), enabling dedup on repos with tens of millions of CIDs.

Example: Provide.Strategy = "pinned+mfs+unique"

+entities strategy modifier

Appending +entities announces only entity roots (file roots, directory roots, HAMT shard nodes), skipping internal file chunks. Implies +unique.

  • Uses walker.WalkEntityRoots from boxo, which inspects UnixFS node types and stops descending into file chunk subtrees.
  • Non-UnixFS content (e.g. dag-cbor) is still fully walked.
  • Drastically fewer DHT provider records for repos with large files.

Example: Provide.Strategy = "pinned+mfs+entities"

pin add / pin update fast-provide

Both commands now announce the pinned root CID immediately after pinning (matching ipfs add and ipfs dag import behavior). New flags:

  • --fast-provide-root (default: Import.FastProvideRoot, true)
  • --fast-provide-dag (default: Import.FastProvideDAG, false)
  • --fast-provide-wait (default: Import.FastProvideWait, false)

With Provide.Strategy=all (default), this is a no-op since the blockstore already provides every block on write.
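A sketch of the precedence the listed defaults suggest, assuming an explicitly passed CLI flag overrides the Import.* config value, which in turn overrides the built-in default. `resolveBool` and `shouldFastProvideRoot` are hypothetical helpers, not kubo functions; nil pointers model "not set".

```go
package main

import "fmt"

// resolveBool returns the effective setting: an explicit CLI flag wins,
// then the Import.* config value, then the built-in default.
// A nil pointer means "not set".
func resolveBool(flag, cfg *bool, def bool) bool {
	if flag != nil {
		return *flag
	}
	if cfg != nil {
		return *cfg
	}
	return def
}

// shouldFastProvideRoot decides whether pin add/update announces the root.
// With Provide.Strategy=all the blockstore already provides on write, so
// the extra announcement is skipped.
func shouldFastProvideRoot(strategyAll bool, flag, cfg *bool) bool {
	if strategyAll {
		return false
	}
	return resolveBool(flag, cfg, true) // Import.FastProvideRoot defaults to true
}

func main() {
	off := false
	fmt.Println(shouldFastProvideRoot(false, nil, nil))  // true: built-in default
	fmt.Println(shouldFastProvideRoot(false, &off, nil)) // false: CLI flag wins
	fmt.Println(shouldFastProvideRoot(false, nil, &off)) // false: config value
	fmt.Println(shouldFastProvideRoot(true, nil, nil))   // false: strategy "all"
}
```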

--fast-provide-dag flag

New flag on ipfs add, ipfs dag import, ipfs pin add, and ipfs pin update. When enabled, walks and provides the full DAG immediately after write using the active Provide.Strategy to determine scope. A single bloom tracker is shared across all roots for dedup. Configurable via Import.FastProvideDAG (default: false).

With --fast-provide-dag, the DAG walk emits the root first (DFS pre-order), so --fast-provide-root is redundant.

Gate providingDagService behind --fast-provide-dag

The providingDagService wrapper (which provides every block during ipfs add writes) is now only active when --fast-provide-dag=true. Previously it was always on for the pinned strategy, causing per-block DHT provides during add regardless of user intent. Now the default path only fast-provides the root CID after add completes, and the reprovide sweep handles the rest.
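A sketch of the gating, with a hypothetical one-method `adder` interface standing in for the DAG service and `providingAdder` standing in for providingDagService. The wrapper is installed only when fastProvideDAG is true; otherwise writes go straight to the underlying store and nothing is provided per block.

```go
package main

import "fmt"

// adder is a hypothetical, one-method stand-in for a DAG service.
type adder interface {
	Add(cid string) error
}

// store just records written blocks.
type store struct{ blocks []string }

func (s *store) Add(c string) error {
	s.blocks = append(s.blocks, c)
	return nil
}

// providingAdder plays the role of providingDagService: it provides each
// block right after writing it.
type providingAdder struct {
	adder
	provide func(string)
}

func (p providingAdder) Add(c string) error {
	if err := p.adder.Add(c); err != nil {
		return err
	}
	p.provide(c)
	return nil
}

// newAdder installs the per-block wrapper only when fastProvideDAG is true.
func newAdder(base adder, fastProvideDAG bool, provide func(string)) adder {
	if fastProvideDAG {
		return providingAdder{adder: base, provide: provide}
	}
	return base
}

// countProvides writes three blocks and returns how many were provided.
func countProvides(fastProvideDAG bool) int {
	n := 0
	a := newAdder(&store{}, fastProvideDAG, func(string) { n++ })
	for _, c := range []string{"root", "b1", "b2"} {
		if err := a.Add(c); err != nil {
			panic(err)
		}
	}
	return n
}

func main() {
	fmt.Println(countProvides(false), countProvides(true)) // 0 3
}
```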

Hardened strategy parsing

ParseProvideStrategy now returns an error instead of silently ignoring unknown tokens. Catches:

  • Unknown tokens (typos like "uniuqe")
  • Empty tokens from malformed delimiters ("pinned+", "+pinned", "pinned++mfs")
  • Invalid combinations ("all+pinned", "roots+unique")

Validated at startup via ValidateProvideConfig. Internal callers use MustParseProvideStrategy for already-validated strings.
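A sketch of strict parsing over a hypothetical bitmask, limited to the tokens named in this PR. The rules come from the description and commit notes: unknown and empty tokens fail, "all" cannot be mixed with other tokens, +entities implies +unique, and the modifiers need pinned or mfs and cannot be combined with roots. Constant names and bit values are assumptions, not the ones in config/provide.go.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// strategy is a hypothetical bitmask; constant names and bit values in the
// real config/provide.go may differ.
type strategy uint8

const (
	stratAll strategy = 1 << iota
	stratPinned
	stratRoots
	stratMFS
	stratUnique
	stratEntities
)

var tokenBits = map[string]strategy{
	"all": stratAll, "pinned": stratPinned, "roots": stratRoots,
	"mfs": stratMFS, "unique": stratUnique, "entities": stratEntities,
}

// parseStrategy rejects unknown tokens, empty tokens and invalid
// combinations instead of silently ignoring them.
func parseStrategy(s string) (strategy, error) {
	var st strategy
	for _, tok := range strings.Split(s, "+") {
		if tok == "" {
			return 0, fmt.Errorf("empty token in strategy %q", s)
		}
		bit, ok := tokenBits[tok]
		if !ok {
			return 0, fmt.Errorf("unknown token %q in strategy %q", tok, s)
		}
		st |= bit
	}
	if st&stratEntities != 0 {
		st |= stratUnique // +entities implies +unique
	}
	if st&stratAll != 0 && st != stratAll {
		return 0, errors.New(`"all" cannot be combined with other tokens`)
	}
	if st&(stratUnique|stratEntities) != 0 {
		if st&stratRoots != 0 {
			return 0, errors.New("+unique/+entities cannot be combined with roots")
		}
		if st&(stratPinned|stratMFS) == 0 {
			return 0, errors.New("+unique/+entities require pinned or mfs")
		}
	}
	return st, nil
}

// mustParseStrategy is for strings already validated at startup.
func mustParseStrategy(s string) strategy {
	st, err := parseStrategy(s)
	if err != nil {
		panic(err)
	}
	return st
}

func main() {
	for _, s := range []string{"pinned+mfs+entities", "uniuqe", "pinned++mfs", "all+pinned", "roots+unique"} {
		st, err := parseStrategy(s)
		fmt.Printf("%q -> %06b, err=%v\n", s, st, err)
	}
}
```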

Rename ExecuteFastProvide to ExecuteFastProvideRoot

Clarifies intent now that ExecuteFastProvideDAG exists alongside it.

Files changed

  • config/provide.go -- +unique/+entities strategy constants, error-returning ParseProvideStrategy, MustParseProvideStrategy, validation in ValidateProvideConfig
  • config/provide_test.go -- tests for valid strategies, unknown tokens, empty tokens, invalid combos, MustParseProvideStrategy, and config validation
  • config/import.go -- DefaultFastProvideDAG, FastProvideDAG config field
  • core/node/provider.go -- +unique/+entities reprovide cycle with bloom tracker, MFS entity-root walker, bloom count persistence
  • core/commands/cmdenv/env.go -- ExecuteFastProvideRoot (rename), new ExecuteFastProvideDAG
  • core/commands/add.go -- wire --fast-provide-dag flag
  • core/commands/dag/dag.go, core/commands/dag/import.go -- wire --fast-provide-dag flag
  • core/commands/pin/pin.go -- fast-provide flags on pin add and pin update
  • core/coreapi/unixfs.go -- gate providingDagService behind FastProvideDAG setting
  • core/coreiface/options/unixfs.go -- FastProvideDAG option
  • core/node/core.go, core/node/storage.go -- switch to MustParseProvideStrategy
  • docs/config.md -- document +unique, +entities modifiers and caveats
  • docs/changelogs/v0.41.md -- changelog entries

Compatibility

  • Default behavior (Provide.Strategy=all) is completely unchanged.
  • +unique and +entities are opt-in modifiers.
  • --fast-provide-dag defaults to false.
  • Strategy parsing is now stricter; previously-ignored typos in config will now produce an error at startup.

Depends on

  • boxo/dag/walker -- BloomTracker, WalkEntityRoots, WalkDAG, LinksFetcherFromBlockstore, NodeFetcherFromBlockstore
  • boxo/pinning/dspin -- NewUniquePinnedProvider, NewPinnedEntityRootsProvider

Context

- config: ParseProvideStrategy returns error, rejects "all" mixed with
  selective strategies, removes dead strategy==0 check
- config: add MustParseProvideStrategy for pre-validated call sites
- config: ValidateProvideConfig validates strategy at startup
- config: ShouldProvideForStrategy uses bitmask check for ProvideStrategyAll
- core/node: downstream callers use MustParseProvideStrategy
- core/node: fix Pinning() nil return that caused fx.Provide panic

lidel force-pushed the feat/provide-entity-roots-with-dedup branch from 420b111 to 4468527 on March 24, 2026 00:34

lidel added 8 commits March 24, 2026 01:47
- ProvideStrategyUnique: bloom filter cross-DAG deduplication
- ProvideStrategyEntities: entity-aware traversal (implies Unique)
- parser: "unique" and "entities" tokens recognized
- validation: modifiers must combine with pinned/mfs, incompatible
  with all/roots
- go.mod: update boxo to feat/provide-entity-roots-with-dedup
  (VisitedTracker, WalkDAG, WalkEntityRoots, NewConcatProvider,
  NewUniquePinnedProvider, NewPinnedEntityRootsProvider)

pure rename, no behavior change. prepares for ExecuteFastProvideDAG
which will walk the DAG according to Provide.Strategy.

adds ExecuteFastProvideRoot calls to pin add and pin update,
matching the behavior of ipfs add and ipfs dag import. respects
Import.FastProvideRoot and Import.FastProvideWait config options.

previously, pin add/update did not trigger any immediate providing,
leaving pinned content invisible to the DHT until the next reprovide
cycle (up to 22h).

when Provide.Strategy includes +unique, the reprovide cycle uses a
shared BloomTracker across all sub-walks (MFS, recursive pins, direct
pins). duplicate sub-DAG branches across recursive pins are detected
and skipped, reducing traversal from O(pins * total_blocks) to
O(unique_blocks).

- readLastUniqueCount / persistUniqueCount: persist bloom sizing count
  between cycles at /reprovideLastUniqueCount
- uniqueMFSProvider: MFS walker with shared tracker + locality check
- createKeyProvider restructured: +unique bit checked first, non-unique
  strategies fall through to existing switch unchanged
- per-cycle fresh BloomTracker sized from previous cycle's count
- channel wrapper persists count on successful cycle completion

when Provide.Strategy includes +entities (which implies +unique), the
reprovide cycle uses WalkEntityRoots instead of WalkDAG, emitting only
entity roots (files, directories, HAMT shards) and skipping internal
file chunks.

- mfsEntityRootsProvider: MFS walk with entity root detection
- createKeyProvider: select walker based on +entities flag via function
  references (makePinProv / makeMFSProv) to avoid duplicating the
  stream wiring logic
- all combinations: pinned+entities, mfs+entities, pinned+mfs+entities

- config.md: document +unique, +entities modifiers with caveats
  (range request limitation, roots vs entities distinction)
- changelog v0.41: add entries for strategy modifiers, pin add/update
  fast-provide, and hardened strategy parsing

per-block providing during ipfs add is now opt-in via
--fast-provide-dag (or Import.FastProvideDAG config, default: false).

without it, only the root CID is fast-provided after add, and the
reprovide cycle handles the rest. this changes the default for
Provide.Strategy=pinned: previously every block was provided during
write, now only the root is immediate.

use --fast-provide-dag=true to restore the previous behavior.
Provide.Strategy=all is unaffected (blockstore hook provides on Put).

pin add and pin update now accept the same --fast-provide-root and
--fast-provide-wait CLI flags as ipfs add and ipfs dag import,
with the same config fallbacks (Import.FastProvideRoot,
Import.FastProvideWait).

previously these were config-only with no CLI override.

lidel changed the title from "fix(config): harden provide strategy parsing" to "feat(provide): +unique and +entities strategy modifiers" on Mar 24, 2026

--fast-provide-dag now available on ipfs add, ipfs dag import,
ipfs pin add, and ipfs pin update (matching --fast-provide-root).

- ExecuteFastProvideDAG accepts []cid.Cid so multiple roots share
  one bloom tracker (cross-root dedup for dag import and pin add)
- --fast-provide-dag supersedes --fast-provide-root (the DAG walk
  emits the root CID first, via DFS pre-order)
- wait parameter: when true blocks until walk completes, when false
  runs in background goroutine
- Import.FastProvideDAG config option (default: false)

lidel force-pushed the feat/provide-entity-roots-with-dedup branch from 05f8870 to 07d7c66 on March 24, 2026 03:33

lidel added 4 commits March 25, 2026 23:38
- strategy section: clearer trade-offs, suggested configurations,
  memory comparison with concrete numbers
- Import.FastProvideDAG: new config option documentation
- Import.FastProvideRoot/Wait: updated to mention pin commands
- all three Import.FastProvide* options: consistent "Applies to" lists

lidel force-pushed the feat/provide-entity-roots-with-dedup branch from 800a1ef to a858eb1 on March 26, 2026 23:31

when TEST_DHT_STUB=1, the CLI test harness creates 20 in-process
libp2p hosts on loopback, each running a DHT server with a shared
in-memory ProviderStore. kubo daemons bootstrap to them over real
TCP, exercising the full DHT code path without public internet.

tests opt in via h.SetStubBootstrap(nodes) after Init().

on the daemon side, WAN DHT filters (AddressFilter, QueryFilter,
RoutingTableFilter, RoutingTablePeerDiversityFilter) are lifted
to accept loopback peers when TEST_DHT_STUB is set.

depends on: github.com/libp2p/go-libp2p-kad-dht#1241

lidel force-pushed the feat/provide-entity-roots-with-dedup branch from a858eb1 to 4a47439 on March 27, 2026 00:06

Development

Successfully merging this pull request may close these issues.

Improved Reprovider.Strategy for entity DAGs (HAMT/UnixFS dirs, big files)
