Skip to content
This repository was archived by the owner on May 14, 2026. It is now read-only.
This repository was archived by the owner on May 14, 2026. It is now read-only.

Refactor and document the dependency injection pattern for tests #339

@KSXGitHub

Description

@KSXGitHub

Background

PR #332 introduced a dependency injection pattern in pacquet-modules-yaml so that filesystem-touching code paths can be exercised by tests without setting up real fixtures on disk. The pattern was synthesized in PR #332 comment 4345054524. A follow-up correction appeared in comment 4345689798. PR #333 extended the pattern to cover the bin-linking IO surface and prompted the Fs/RealFs to Api/RealApi rename described in comment 4345803572.

Two problems remain. The first is that the pattern lives only in those PR comments. New pull requests keep introducing coverage holes around side-effecting code such as filesystem access, environment variables, network calls, and time. Pull requests generated by AI agents are particularly prone to this. Without a canonical reference in the repository, each new pull request rediscovers the pattern from scratch and often deviates in subtle ways. Common drifts include lumped super-traits, multiple generic parameters per function, &self capability methods, and mismatched provider names.

The second is that the names chosen in PRs #332 and #333 are not the cleanest expressions of the pattern. The Naming considerations section below catalogues the known nits. The pattern itself is sound, but the identifiers carrying it could be clearer.

Goal

This issue is both a refactor and a documentation task. The refactor brings the existing dependency injection implementations in crates/modules-yaml and crates/cmd-shim into a single coherent shape, including the chosen identifiers. The documentation captures that shape in CODE_STYLE_GUIDE.md so future pull requests follow it. The refactor and the documentation ship together so that the document and the code agree from the day the document lands.

Extending the same pattern to the environment-variable lookups in crates/npmrc is tracked separately in #343 and depends on this issue.

The pattern

The principles below are normative. The identifiers in the example code (Api, RealApi, FsReadToString, and so on) are the names that shipped in PRs #332 and #333. The implementer of this issue chooses the final identifiers in the Naming considerations step below, then applies the choice across the example, the documentation, and the existing crates.

Principles

  1. One trait per capability. Each side-effecting operation gets its own trait. Functions bind only what they consume, and test fakes implement only what gets called. Avoid lumped traits such as FsApi { read, mkdir, write, ... } that force unreachable!() to fill the unused methods.

  2. One generic parameter satisfying multiple bounds. When a function needs several capabilities, compose them as <Api: FsCreateDirAll + FsWrite>. Resist the multi-parameter form such as <Disk: DiskApi, Fs: FsApi>. The generic is a capability provider rather than a capability domain, so it should never be named after one domain.

  3. Capability methods take no &self. Use statics for stateful fakes. Capability methods are associated functions, never &self methods. Production implementations are unit structs such as struct RealApi;. Fakes are unit structs declared inside the test function, alongside any per-test scenario data held in statics such as static CALLS: Mutex<Vec<String>> = Mutex::new(Vec::new()) or static SEQUENCE_INDEX: AtomicUsize = AtomicUsize::new(0).

    Declare each interior-mutable static inside the body of the #[test] function that owns it. Never at module scope, and never inside a helper that several tests call. The test runner invokes each #[test] function exactly once per process, so a static declared inside a single test body is only ever touched from one test thread, which keeps the fake stateless from the trait's perspective and free of data races. An interior-mutable static shared across tests is no longer per-test: multiple test threads can hit it in parallel, which brings back the same data race that pushed env::set_var into unsafe and motivated EnvGuard in the first place.

    Immutable statics are exempt from this restriction. Items such as static FIXTURE_BYTES: &[u8] = b"..."; declared at module scope and shared across tests are fine, because reads do not race.

  4. Associated types instead of &self when a capability operates over a data type. If a capability would otherwise force callers to pass an instance just to extract data, lift the data type to an associated type. The capability stays a static-method namespace, and the associated type lets one provider describe how to operate over a chosen data shape.

  5. Capability traits live on the implementor, not the data. The capability provider RealApi implements the trait FsReadToString. The value type Path does not. This keeps the capability implementation swappable without changing call sites.

  6. Provider names are domain-neutral, and trait names are domain-scoped. The generic and its production implementation are Api and RealApi regardless of how many domains they cover. The traits keep a domain prefix such as Fs* or GetDisk* so a reader can see which domain a bound belongs to without chasing definitions.

  7. Production callers turbofish the real implementation explicitly. For example, read_modules_manifest::<RealApi>(dir). Defaults on free-function type parameters are unstable on stable Rust, so an explicit turbofish is the price of zero-cost dependency injection.

  8. Capabilities are primitives, not algorithms. A capability trait names a leaf-level effect, typically a single std function or a single syscall. Its method signature and its name promise only what the underlying primitive promises, never more. When a feature needs a stronger guarantee, such as atomic file replacement, a transactional state machine, or a retried network request, implement it as a free function or a higher-level type that consumes the primitive capabilities. Do not promote the algorithm into a capability trait of its own. Naming a trait FsWriteAtomic and backing it with std::fs::write is the canonical failure mode. The trait name lies to every caller, the lie spreads through the public surface of the crate, and downstream code starts relying on a guarantee the runtime never offered. The right shape is FsWrite matching std::fs::write, plus a free function such as fn atomic_write<Api: FsWrite>(...) once atomicity is actually needed. Reducing an algorithm to a capability trait conflates what the runtime can do with what the program builds on top, and erases the seam dependency injection is meant to expose.

Filesystem capabilities (shipped in PR #332)

pub trait FsReadToString {
    fn read_to_string(path: &Path) -> io::Result<String>;
}

pub trait FsCreateDirAll {
    fn create_dir_all(path: &Path) -> io::Result<()>;
}

pub trait FsWrite {
    fn write(path: &Path, contents: &[u8]) -> io::Result<()>;
}

// One provider for the whole codebase. Today it carries only the three
// filesystem capabilities. Future PRs add disk, network, time, and other
// capability implementations to the same struct so callers never juggle
// multiple providers.
pub struct RealApi;
impl FsReadToString for RealApi { /* delegates to std::fs::read_to_string */ }
impl FsCreateDirAll for RealApi { /* delegates to std::fs::create_dir_all */ }
impl FsWrite        for RealApi { /* delegates to std::fs::write           */ }

Public APIs bind minimal capabilities:

pub fn read_modules_manifest<Api: FsReadToString>(...) -> Result<...>
pub fn write_modules_manifest<Api: FsCreateDirAll + FsWrite>(...) -> Result<...>

Test fakes describe the behaviour they fake rather than their provider role:

struct BadYamlFs;
impl FsReadToString for BadYamlFs {
    fn read_to_string(_: &Path) -> io::Result<String> {
        Ok("{ this is not valid yaml or json".to_string())
    }
}
let err = read_modules_manifest::<BadYamlFs>(modules_dir).expect_err("expected");
assert!(matches!(err, ReadModulesManifestError::ParseYaml { .. }));

The fake declares only FsReadToString because the bound on read_modules_manifest is Api: FsReadToString. No FsWrite or FsCreateDirAll implementations are needed, and no unreachable!() guards are required.

Combining capabilities of different domains (hypothetical)

The single-Api-and-single-RealApi shape only proves itself once capabilities from more than one domain hang off the same provider. The example below extends the shipped filesystem capabilities with two further families that pacquet does not currently need but might plausibly grow into. Disk inspection sits in GetDisk* traits with an associated Disk type, which puts principle 4 to work. Network access sits in HttpGet and ResolveHost. Every new trait implements on the same RealApi that already carries FsReadToString, FsCreateDirAll, and FsWrite, so production code never imports a second provider type regardless of how many domains it touches.

pub trait GetDiskKind {
    type Disk;
    fn get_disk_kind(disk: &Self::Disk) -> DiskKind;
}

pub trait GetDiskName {
    type Disk;
    fn get_disk_name(disk: &Self::Disk) -> &OsStr;
}

pub trait GetMountPoint {
    type Disk;
    fn get_mount_point(disk: &Self::Disk) -> &Path;
}

pub trait HttpGet {
    fn http_get(url: &str) -> http::Result<Vec<u8>>;
}

pub trait ResolveHost {
    fn resolve_host(name: &str) -> io::Result<Vec<IpAddr>>;
}

// The same `RealApi` from the filesystem section gains every new impl. One
// provider type carries every capability the workspace needs.
impl GetDiskKind   for RealApi { type Disk = sysinfo::Disk; fn get_disk_kind(d: &Self::Disk)   -> DiskKind { d.kind() } }
impl GetDiskName   for RealApi { type Disk = sysinfo::Disk; fn get_disk_name(d: &Self::Disk)   -> &OsStr   { d.name() } }
impl GetMountPoint for RealApi { type Disk = sysinfo::Disk; fn get_mount_point(d: &Self::Disk) -> &Path    { d.mount_point() } }
impl HttpGet       for RealApi { fn http_get(url: &str) -> http::Result<Vec<u8>> { /* delegates to the chosen HTTP client */ } }
impl ResolveHost   for RealApi { fn resolve_host(name: &str) -> io::Result<Vec<IpAddr>> { /* delegates to std::net or hickory-dns */ } }

A function that crosses domains binds every capability it needs under one generic parameter. When several traits share an associated type, the equality bound <Disk = <Api as GetDiskKind>::Disk> locks them to the same data type so the function operates on a single slice without re-introducing a lumped super-trait:

fn warm_cache_for_local_disks<Api>(
    disks: &[<Api as GetDiskKind>::Disk],
    seed_url: &str,
) -> Result<(), CacheError>
where
    Api: GetDiskKind
       + GetDiskName<Disk = <Api as GetDiskKind>::Disk>
       + GetMountPoint<Disk = <Api as GetDiskKind>::Disk>
       + HttpGet
       + FsCreateDirAll
       + FsWrite,
{ /* ... */ }

Tests provide their own fake Api that implements every capability the function needs, sets type Disk = FakeDisk;, and defines FakeDisk inside the test function. There is no &self, no per-test instance plumbing, and no second generic parameter for any of the three domains. The same shape works regardless of how many domains the function reaches into.

When dependency injection is not needed

Some tests should not use dependency injection because a real fixture is simpler. The two main categories are tests where the real filesystem can produce the input cheaply through fixture files in tests/fixtures/, and tests that exercise pure logic such as is_string_present or derive_hoisted_dependencies.

Dependency injection is for error paths that real filesystem operations cannot reach, and for behaviour whose triggering conditions are awkward to set up on disk.

Naming considerations

The names that shipped in PRs #332 and #333 are not sacred. The implementer of this issue has freedom to keep them, swap them for any of the alternatives below, or invent new names that fit better. The principles are what matter; the identifiers are negotiable. Whatever set is chosen will be applied uniformly across the example, the new style guide section, and the existing crates as part of the refactor.

The known nits are:

  1. Api is overloaded. In Rust idiom an API is the public interface of a crate, which is a different concept from a capability provider. Alternatives that read more naturally in this role include Env, Sys, Effects, and Cap. Env matches how cargo and several Rust dependency injection experiments name capability containers, and it parses as "the environment the function runs against." Note that picking Env for the provider conflicts with using Env* as a domain prefix for the new environment-variable trait, so the two decisions need to be made together.

  2. RealApi carries a Platonic-prefix anti-pattern. When the production type needs a Real, Default, or Standard prefix, it implies that the unqualified name belongs to something else. The convention runs the other way: production is unqualified, and fakes carry the qualifier. With the rename above, the production type could simply be Host or Pacquet, while fakes stay BadYamlFs, FakeReader, and so on.

  3. Api collides with itself in the bad example. Principle 1's bad example writes FsApi { read, mkdir, write, ... } to illustrate a lumped trait, then Api reappears as the provider type a few lines later. Renaming the bad example to FsCapabilities or just Fs removes the collision and leaves Api, or whatever replaces it, free to mean only one thing.

  4. FsCreateDirAll jams three words into one PascalCase token. It mirrors std::fs::create_dir_all exactly, which is defensible. The alternative is to scope the domain by module rather than by name prefix, giving fs::CreateDirAll, fs::ReadToString, and fs::Write. Principle 6 then becomes "providers are domain-neutral, and trait modules are domain-scoped," which aligns with how the standard library organizes the same names.

  5. BadYamlFs violates principle 6. The fake is a provider, so by the rule it should carry a domain-neutral name. The Fs suffix labels the domain. BadYamlReader or MalformedYamlSource describes the behaviour the fake stubs without the domain marker, and matches the principle the document itself states.

Tasks

Pick the names. Decide on a coherent set of identifiers for the capability provider type, the generic parameter, the existing filesystem trait family, and the test fakes. The Naming considerations section above lists the known nits and several alternatives. Record the decision so the rest of the work has a single reference point. The follow-up in #343 will reuse the same identifiers when adding the environment-variable capability.

Refactor crates/modules-yaml to use the chosen names. Audit the implementation against the eight principles and fix any drift such as lumped traits, extra generic parameters, &self capability methods, domain-named providers, or capability traits that promise more than the underlying primitive delivers. Update tests, callers, and any other site that referred to the old identifiers.

Refactor crates/cmd-shim the same way. Apply the chosen names, audit against the eight principles, and update callers and tests.

Add a ### Dependency Injection for Tests section to CODE_STYLE_GUIDE.md, placed near ### Unit test file layout. The section contains the eight principles, the filesystem worked example, and the guidance on when dependency injection is not needed. Anchor the worked example to the post-refactor code in crates/modules-yaml so the document and the code agree.

Update AGENTS.md with a pointer to the new section in CODE_STYLE_GUIDE.md.

Once #332, #333, and #337 have all landed and the rename above has been applied across every affected crate, evaluate whether the three RealApi declarations in crates/modules-yaml, crates/cmd-shim, and crates/npmrc can be consolidated into a single shared crate that exports the provider type and all capability traits. The duplication is acceptable today only because each PR ships a self-contained slice of the pattern. The long-term shape implied by principle 6 and by the comment "one provider for the whole codebase" in the worked example is a single provider type for the whole workspace, not one per crate. If the three structures are similar enough after the rename, move RealApi and every capability trait it implements into a new shared crate such as crates/api, and update crates/modules-yaml, crates/cmd-shim, and crates/npmrc to depend on it instead of defining their own. Mind the orphan rule: a capability trait whose only production impl is on RealApi belongs in the same crate as RealApi. If the structures diverge enough that consolidation would require contortions, leave them separate and record the reason in the commit or PR description so the next reader does not retread the same path.

Metadata

Metadata

Assignees

Labels

documentationImprovements or additions to documentationrustPull requests that update Rust code

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions