feat(provide): +unique and +entities strategy modifiers #11245
Draft
8d8d18c to 420b111
- config: ParseProvideStrategy returns error, rejects "all" mixed with selective strategies, removes dead strategy==0 check
- config: add MustParseProvideStrategy for pre-validated call sites
- config: ValidateProvideConfig validates strategy at startup
- config: ShouldProvideForStrategy uses bitmask check for ProvideStrategyAll
- core/node: downstream callers use MustParseProvideStrategy
- core/node: fix Pinning() nil return that caused fx.Provide panic
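For readers skimming the timeline, the error-returning parser described in this commit can be pictured roughly as follows. This is a minimal sketch, not the code in config/provide.go: the type name, constant names, bit values, and the lowercase function names are all hypothetical, and the sketch does not model how an empty string or defaults are handled.

```go
package main

import (
	"fmt"
	"strings"
)

// Strategy is a hypothetical bitmask; the real kubo constants may differ.
type Strategy uint8

const (
	StrategyAll Strategy = 1 << iota
	StrategyPinned
	StrategyRoots
	StrategyMFS
)

// strategyTokens maps config tokens to bits (illustrative subset).
var strategyTokens = map[string]Strategy{
	"all":    StrategyAll,
	"pinned": StrategyPinned,
	"roots":  StrategyRoots,
	"mfs":    StrategyMFS,
}

// parseStrategy splits on "+" and fails on empty tokens, unknown tokens,
// and "all" combined with any selective strategy.
func parseStrategy(s string) (Strategy, error) {
	var out Strategy
	for _, tok := range strings.Split(s, "+") {
		if tok == "" {
			return 0, fmt.Errorf("empty token in strategy %q", s)
		}
		bit, ok := strategyTokens[tok]
		if !ok {
			return 0, fmt.Errorf("unknown strategy token %q", tok)
		}
		out |= bit
	}
	if out&StrategyAll != 0 && out != StrategyAll {
		return 0, fmt.Errorf("%q: \"all\" cannot be combined with other strategies", s)
	}
	return out, nil
}

// mustParseStrategy panics on error; intended only for strings that
// were already validated (for example at startup).
func mustParseStrategy(s string) Strategy {
	st, err := parseStrategy(s)
	if err != nil {
		panic(err)
	}
	return st
}

func main() {
	for _, s := range []string{"pinned+mfs", "all+pinned", "pinned++mfs", "uniuqe"} {
		_, err := parseStrategy(s)
		fmt.Printf("%-14q err=%v\n", s, err)
	}
}
```

Splitting validation (error-returning) from the panicking wrapper keeps user-facing config errors recoverable while letting internal call sites stay terse.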
420b111 to 4468527
- ProvideStrategyUnique: bloom filter cross-DAG deduplication
- ProvideStrategyEntities: entity-aware traversal (implies Unique)
- parser: "unique" and "entities" tokens recognized
- validation: modifiers must combine with pinned/mfs, incompatible with all/roots
- go.mod: update boxo to feat/provide-entity-roots-with-dedup (VisitedTracker, WalkDAG, WalkEntityRoots, NewConcatProvider, NewUniquePinnedProvider, NewPinnedEntityRootsProvider)
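The validation rules in this commit (modifiers need pinned and/or mfs, are incompatible with all/roots, and entities implies unique) could be expressed on a bitmask roughly like this. It is a sketch under assumed names and bit positions, not the actual ValidateProvideConfig logic.

```go
package main

import (
	"errors"
	"fmt"
)

// Strategy is a hypothetical bitmask; names and bit order are assumptions.
type Strategy uint8

const (
	All Strategy = 1 << iota
	Pinned
	Roots
	MFS
	Unique
	Entities
)

// normalize applies the "entities implies unique" rule.
func normalize(s Strategy) Strategy {
	if s&Entities != 0 {
		s |= Unique
	}
	return s
}

// validateModifiers checks only the modifier rules listed in the commit.
func validateModifiers(s Strategy) error {
	s = normalize(s)
	if s&(Unique|Entities) == 0 {
		return nil // no modifiers present, nothing to check here
	}
	if s&(All|Roots) != 0 {
		return errors.New("unique/entities cannot be combined with all or roots")
	}
	if s&(Pinned|MFS) == 0 {
		return errors.New("unique/entities require pinned and/or mfs")
	}
	return nil
}

func main() {
	fmt.Println(validateModifiers(Pinned | MFS | Entities))
	fmt.Println(validateModifiers(Roots | Unique))
	fmt.Println(validateModifiers(Unique))
}
```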
pure rename, no behavior change. prepares for ExecuteFastProvideDAG which will walk the DAG according to Provide.Strategy.
adds ExecuteFastProvideRoot calls to pin add and pin update, matching the behavior of ipfs add and ipfs dag import. respects Import.FastProvideRoot and Import.FastProvideWait config options. previously, pin add/update did not trigger any immediate providing, leaving pinned content invisible to the DHT until the next reprovide cycle (up to 22h).
when Provide.Strategy includes +unique, the reprovide cycle uses a shared BloomTracker across all sub-walks (MFS, recursive pins, direct pins). duplicate sub-DAG branches across recursive pins are detected and skipped, reducing traversal from O(pins * total_blocks) to O(unique_blocks).
- readLastUniqueCount / persistUniqueCount: persist bloom sizing count between cycles at /reprovideLastUniqueCount
- uniqueMFSProvider: MFS walker with shared tracker + locality check
- createKeyProvider restructured: +unique bit checked first, non-unique strategies fall through to existing switch unchanged
- per-cycle fresh BloomTracker sized from previous cycle's count
- channel wrapper persists count on successful cycle completion
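The effect of sharing one tracker across sub-walks can be illustrated with a toy DAG. The sketch below is not boxo's BloomTracker: the bloomTracker type, its sizing parameters, and the string keys standing in for CIDs are all assumptions made for illustration. Note that a bloom false positive would cause a block to be skipped.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloomTracker is a toy bloom filter (hypothetical, not boxo's API).
type bloomTracker struct {
	bits []uint64
	m, k uint64 // m = number of bits, k = number of probes
}

func newBloomTracker(mBits, k uint64) *bloomTracker {
	return &bloomTracker{bits: make([]uint64, (mBits+63)/64), m: mBits, k: k}
}

// Visit marks key as seen and reports whether it looked new.
// A false positive makes a genuinely new key look already seen.
func (b *bloomTracker) Visit(key string) bool {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	h1, h2 := sum&0xffffffff, sum>>32|1 // double hashing with an odd stride
	fresh := false
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		word, mask := pos/64, uint64(1)<<(pos%64)
		if b.bits[word]&mask == 0 {
			fresh = true
			b.bits[word] |= mask
		}
	}
	return fresh
}

// walk does a DFS pre-order traversal, skipping subtrees already seen.
func walk(dag map[string][]string, root string, seen *bloomTracker, emit func(string)) {
	if !seen.Visit(root) {
		return
	}
	emit(root)
	for _, child := range dag[root] {
		walk(dag, child, seen, emit)
	}
}

// provideAll walks every root, with one shared tracker or one per root.
func provideAll(dag map[string][]string, roots []string, shared bool) []string {
	var out []string
	tracker := newBloomTracker(1<<20, 4)
	for _, r := range roots {
		if !shared {
			tracker = newBloomTracker(1<<20, 4)
		}
		walk(dag, r, tracker, func(k string) { out = append(out, k) })
	}
	return out
}

// toyDAG has two pins that share one subtree.
func toyDAG() map[string][]string {
	return map[string][]string{
		"pinA":   {"shared", "a1"},
		"pinB":   {"shared", "b1"},
		"shared": {"s1", "s2"},
	}
}

func main() {
	roots := []string{"pinA", "pinB"}
	fmt.Println("per-pin tracker:", len(provideAll(toyDAG(), roots, false)))
	fmt.Println("shared tracker: ", len(provideAll(toyDAG(), roots, true)))
}
```

Barring false positives, the shared walk visits the "shared" subtree once (7 keys emitted) while per-pin trackers visit it once per pin (10 keys emitted).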
when Provide.Strategy includes +entities (which implies +unique), the reprovide cycle uses WalkEntityRoots instead of WalkDAG, emitting only entity roots (files, directories, HAMT shards) and skipping internal file chunks.
- mfsEntityRootsProvider: MFS walk with entity root detection
- createKeyProvider: select walker based on +entities flag via function references (makePinProv / makeMFSProv) to avoid duplicating the stream wiring logic
- all combinations: pinned+entities, mfs+entities, pinned+mfs+entities
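The difference between a full DAG walk and an entity-root walk can be sketched on a toy tree. This is not boxo's WalkEntityRoots: the node and kind types are hypothetical, HAMT shards are omitted, and an exact map stands in for the bloom tracker to keep the example short.

```go
package main

import "fmt"

// kind is a simplified stand-in for UnixFS node types (assumption).
type kind int

const (
	kindDir kind = iota
	kindFile
	kindChunk
)

type node struct {
	kind  kind
	links []string
}

// walkEntityRoots emits directories and file roots, and never descends
// into a file's chunk subtree. seen is an exact set used for dedup.
func walkEntityRoots(dag map[string]node, root string, seen map[string]bool, emit func(string)) {
	if seen[root] {
		return
	}
	seen[root] = true
	n := dag[root]
	emit(root)
	if n.kind == kindFile {
		return // chunks below a file root are intentionally not visited
	}
	for _, child := range n.links {
		walkEntityRoots(dag, child, seen, emit)
	}
}

// toyTree: a directory with one file, plus a subdirectory that holds
// another file and a second link to the first file.
func toyTree() map[string]node {
	return map[string]node{
		"root":  {kindDir, []string{"fileA", "sub"}},
		"sub":   {kindDir, []string{"fileB", "fileA"}},
		"fileA": {kindFile, []string{"c1", "c2"}},
		"fileB": {kindFile, []string{"c3"}},
		"c1":    {kindChunk, nil},
		"c2":    {kindChunk, nil},
		"c3":    {kindChunk, nil},
	}
}

// entityRoots collects the keys emitted for a single root.
func entityRoots(dag map[string]node, root string) []string {
	var out []string
	walkEntityRoots(dag, root, map[string]bool{}, func(k string) { out = append(out, k) })
	return out
}

func main() {
	fmt.Println(entityRoots(toyTree(), "root"))
}
```

In this toy tree only 4 of the 7 nodes are emitted; the three chunk nodes are never announced, which is the trade-off the range request caveat in the docs refers to.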
- config.md: document +unique, +entities modifiers with caveats (range request limitation, roots vs entities distinction)
- changelog v0.41: add entries for strategy modifiers, pin add/update fast-provide, and hardened strategy parsing
per-block providing during ipfs add is now opt-in via --fast-provide-dag (or Import.FastProvideDAG config, default: false). without it, only the root CID is fast-provided after add, and the reprovide cycle handles the rest. this changes the default for Provide.Strategy=pinned: previously every block was provided during write, now only the root is immediate. use --fast-provide-dag=true to restore the previous behavior. Provide.Strategy=all is unaffected (blockstore hook provides on Put).
pin add and pin update now accept the same --fast-provide-root and --fast-provide-wait CLI flags as ipfs add and ipfs dag import, with the same config fallbacks (Import.FastProvideRoot, Import.FastProvideWait). previously these were config-only with no CLI override.
--fast-provide-dag is now available on ipfs add, ipfs dag import, ipfs pin add, and ipfs pin update (matching --fast-provide-root).
- ExecuteFastProvideDAG accepts []cid.Cid so multiple roots share one bloom tracker (cross-root dedup for dag import and pin add)
- --fast-provide-dag supersedes --fast-provide-root (the DAG walk emits the root CID first, via DFS pre-order)
- wait parameter: when true, blocks until the walk completes; when false, runs in a background goroutine
- Import.FastProvideDAG config option (default: false)
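The wait parameter semantics described above (block when true, background goroutine when false) follow a common Go pattern. The helper below is a hypothetical illustration, not the signature of ExecuteFastProvideDAG; the returned channel is an addition so callers and tests can observe completion.

```go
package main

import "fmt"

// runWalk is a hypothetical helper: it runs walk inline when wait is
// true, otherwise in a background goroutine. The returned channel is
// closed once walk has finished, in both modes.
func runWalk(wait bool, walk func()) <-chan struct{} {
	done := make(chan struct{})
	run := func() {
		defer close(done)
		walk()
	}
	if wait {
		run() // caller is blocked until the walk completes
	} else {
		go run() // caller returns immediately
	}
	return done
}

func main() {
	provided := 0
	walk := func() { provided++ }

	<-runWalk(true, walk) // already closed by the time runWalk returns
	done := runWalk(false, walk)
	<-done // receiving on the closed channel orders the read after the write
	fmt.Println("walks completed:", provided)
}
```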
05f8870 to 07d7c66
- strategy section: clearer trade-offs, suggested configurations, memory comparison with concrete numbers
- Import.FastProvideDAG: new config option documentation
- Import.FastProvideRoot/Wait: updated to mention pin commands
- all three Import.FastProvide* options: consistent "Applies to" lists
…-roots-with-dedup
800a1ef to a858eb1
when TEST_DHT_STUB=1, the CLI test harness creates 20 in-process libp2p hosts on loopback, each running a DHT server with a shared in-memory ProviderStore. kubo daemons bootstrap to them over real TCP, exercising the full DHT code path without public internet. tests opt in via h.SetStubBootstrap(nodes) after Init(). on the daemon side, WAN DHT filters (AddressFilter, QueryFilter, RoutingTableFilter, RoutingTablePeerDiversityFilter) are lifted to accept loopback peers when TEST_DHT_STUB is set. depends on: github.com/libp2p/go-libp2p-kad-dht#1241
a858eb1 to 4a47439
Warning
not ready for review, this is a sandbox for running CI
Summary
Adds two experimental Provide.Strategy modifiers (+unique and +entities) for large IPFS nodes that pin many overlapping DAGs, extends fast-provide to pin add / pin update, and adds a new --fast-provide-dag flag for full-DAG provide on write.

Motivation
Nodes hosting large, versioned datasets (e.g. dist.ipfs.tech) pin thousands of DAGs that share 90%+ of their blocks. Today, every reprovide cycle re-walks every shared subtree once per pin, wasting I/O and time. Meanwhile, announcing every internal file chunk to the DHT produces millions of provider records when only entity-level CIDs (files, directories) matter for discovery.
Separately, pin add and pin update never fast-provided the root CID, leaving a gap where newly pinned content was undiscoverable until the next reprovide cycle (up to 22h).

Changes
+unique strategy modifier

Appending +unique to a pinned, mfs, or pinned+mfs strategy enables bloom-filter deduplication across recursive pins within a single reprovide cycle.

- A BloomTracker (from boxo/dag/walker) is created per cycle, sized from the previous cycle's persisted count (stored in the datastore at /reprovideLastUniqueCount).
- … cid.Set), enabling dedup on repos with tens of millions of CIDs.

Example:
Provide.Strategy = "pinned+mfs+unique"

+entities strategy modifier

Appending +entities announces only entity roots (file roots, directory roots, HAMT shard nodes), skipping internal file chunks. Implies +unique.

- Uses walker.WalkEntityRoots from boxo, which inspects UnixFS node types and stops descending into file chunk subtrees.

Example:
Provide.Strategy = "pinned+mfs+entities"

pin add / pin update fast-provide

Both commands now announce the pinned root CID immediately after pinning (matching ipfs add and ipfs dag import behavior). New flags:

- --fast-provide-root (default: Import.FastProvideRoot, true)
- --fast-provide-dag (default: Import.FastProvideDAG, false)
- --fast-provide-wait (default: Import.FastProvideWait, false)

With Provide.Strategy=all (default), this is a no-op since the blockstore already provides every block on write.

--fast-provide-dag flag

New flag on ipfs add, ipfs dag import, ipfs pin add, and ipfs pin update. When enabled, walks and provides the full DAG immediately after write, using the active Provide.Strategy to determine scope. A single bloom tracker is shared across all roots for dedup. Configurable via Import.FastProvideDAG (default: false).

With --fast-provide-dag, the DAG walk emits the root first (DFS pre-order), so --fast-provide-root is redundant.

Gate providingDagService behind --fast-provide-dag

The providingDagService wrapper (which provides every block during ipfs add writes) is now only active when --fast-provide-dag=true. Previously it was always on for the pinned strategy, causing per-block DHT provides during add regardless of user intent. Now the default path only fast-provides the root CID after add completes, and the reprovide sweep handles the rest.

Hardened strategy parsing

ParseProvideStrategy now returns an error instead of silently ignoring unknown tokens. Catches:

- unknown tokens ("uniuqe")
- empty tokens ("pinned+", "+pinned", "pinned++mfs")
- invalid combinations ("all+pinned", "roots+unique")

Validated at startup via ValidateProvideConfig. Internal callers use MustParseProvideStrategy for already-validated strings.

Rename ExecuteFastProvide to ExecuteFastProvideRoot

Clarifies intent now that ExecuteFastProvideDAG exists alongside it.

Files changed
- config/provide.go -- +unique / +entities strategy constants, error-returning ParseProvideStrategy, MustParseProvideStrategy, validation in ValidateProvideConfig
- config/provide_test.go -- tests for valid strategies, unknown tokens, empty tokens, invalid combos, MustParseProvideStrategy, and config validation
- config/import.go -- DefaultFastProvideDAG, FastProvideDAG config field
- core/node/provider.go -- +unique / +entities reprovide cycle with bloom tracker, MFS entity-root walker, bloom count persistence
- core/commands/cmdenv/env.go -- ExecuteFastProvideRoot (rename), new ExecuteFastProvideDAG
- core/commands/add.go -- wire --fast-provide-dag flag
- core/commands/dag/dag.go, core/commands/dag/import.go -- wire --fast-provide-dag flag
- core/commands/pin/pin.go -- fast-provide flags on pin add and pin update
- core/coreapi/unixfs.go -- gate providingDagService behind FastProvideDAG setting
- core/coreiface/options/unixfs.go -- FastProvideDAG option
- core/node/core.go, core/node/storage.go -- switch to MustParseProvideStrategy
- docs/config.md -- document +unique, +entities modifiers and caveats
- docs/changelogs/v0.41.md -- changelog entries

Compatibility
- Default behavior (Provide.Strategy=all) is completely unchanged.
- +unique and +entities are opt-in modifiers.
- --fast-provide-dag defaults to false.

Depends on
- boxo/dag/walker -- BloomTracker, WalkEntityRoots, WalkDAG, LinksFetcherFromBlockstore, NodeFetcherFromBlockstore
- boxo/pinning/dspin -- NewUniquePinnedProvider, NewPinnedEntityRootsProvider

Context