fix: PMAT-517 — publish safety gate prevents broken crates.io releases (19 incidents) #701

@noahgift

Summary

cargo install apr-cli has broken 19 times across the paiml org because crates were published without post-publish verification. The latest incident: entrenar 0.7.12 was published without the contract_pre_with_resident_weights!() macro stub, so every cargo install apr-cli failed with a compile error.

This issue documents the systemic fix (PMAT-517) already shipped in apr-cli 0.4.17.

Incident Catalog (19 broken publishes across 5 repos)

aprender (9 incidents)

| # | Date | Version | What Broke | Fix Commit |
|---|------|---------|------------|------------|
| 1 | 2026-02-26 | 0.4.4 | Tracked symlinks (target/mnt/nvme-raid0/..., models/mnt/nvme-raid0/...) published to crates.io tarball → "Permission denied" for all external users | b2b9638c |
| 2 | 2026-02-26 | 0.4.5 | with_file_name("tokenizer.json") anti-pattern → misses hash-prefixed companion files → garbage output "< ?< ?" on all safetensors models | 17aa7d5d |
| 3 | 2026-03-14 | | include_str!("../../../../contracts/") points outside crate directory → YAML files missing in crates.io tarball | 075c3781 |
| 4 | 2026-03-04 | | 8+ unpublished APIs (TokenizeBrick, LoraForwardBrick, etc.) called from apr-cli → compile errors on crates.io | ed8fc241 |
| 5 | 2026-02-16 | | Diamond dep: multiple trueno versions → Vector<f32> type mismatches between aprender & renacer | 9c0c5e91 |
| 6 | 2026-03-31 | | Undefined xa_t variable in entrenar debug logging + missing #[cfg(feature = "cuda")] gate | 552f8e68 |
| 7 | 2026-04-05 | 0.4.13 | apr --version outputs (unknown) sentinel: build.rs single-source git hash fails on crates.io builds | cdb5a1c3 |
| 8 | 2026-04-06 | 0.4.14 | load_model_tensors is pub(crate) in published aprender 0.27.7 but apr-cli expects pub; [patch] masks this locally | 52e90c70 |
| 9 | 2026-04-06 | 0.4.14 | [patch.crates-io] in .cargo/config.toml creates diamond deps during cargo publish, so every publish requires manual config editing | 392bc944 |

entrenar (10 incidents)

| # | Date | Version | What Broke | Fix Commit |
|---|------|---------|------------|------------|
| 1 | 2025-11-29 | 0.2.2 | Missing sub-crate versions: crates.io can't resolve path-only deps | cef1cfc1 |
| 2 | 2025-12-02 | 0.2.2→0.2.3 | [patch.crates-io] in Cargo.toml with hardcoded paths published to crates.io | 85ce7979 |
| 3 | 2026-02-27 | 0.7.2 | Same: [patch.crates-io] in Cargo.toml leaks to published tarball | 022771088 |
| 4 | 2026-03-09 | | Nightly CI fails: GH runners don't have sibling repos for path deps | 0077f5b0 |
| 5 | 2026-03-09 | | trueno-gpu declared path-only (no version), so it can't resolve after path stripping | 05c1e750 |
| 6 | 2026-02-28 | 0.7.2→0.7.3 | CudaContext::sm_target() doesn't exist in published trueno-gpu 0.4.18 | b7d3e95d |
| 7 | 2026-03-04 | | Hard-coded [patch.crates-io] path overrides in .cargo/config.toml prevent clean-room CI | 759d7cfe |
| 8 | 2026-03-29 | | Stale generated_contracts.rs: placeholder assertions from old codegen | e7662bb8 |
| 9 | 2026-03-09 | | add_tensor_f32_owned() and write_into() don't exist in published aprender 0.27.4 | 30b47e93 |
| 10 | 2026-04-05 | | presentar-core path dep points to restructured/deleted repo | 12fdf0ae |

trueno (1 incident)

  • generated_contracts.rs was gitignored → E0583 (file not found) on crates.io builds (fix: 5690e316)

realizar (1 incident)

  • Same: generated_contracts.rs gitignored + [patch.crates-io] conflict (fix: 23d03694, ed2f18f1)
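
Two causes recur throughout the catalog: [patch.crates-io] tables left in a manifest, and path dependencies declared without a version. A minimal sketch of how both could be detected with grep follows. The function names are hypothetical and are not taken from the shipped check_publish_safety.sh; the heuristic only handles single-line inline tables and does not distinguish [dependencies] from [dev-dependencies].

```shell
# Hypothetical helpers (illustrative names, not from the actual PMAT-517 script).

# Succeeds (exit 0) when the manifest contains a [patch] or [patch.<registry>] table.
has_patch_section() {
    grep -Eq '^[[:space:]]*\[patch[].]' "$1"
}

# Prints the names of dependencies written as single-line inline tables
# that set `path = ...` but no `version = ...`.
list_versionless_path_deps() {
    grep -E '^[[:space:]]*[A-Za-z0-9_-]+[[:space:]]*=[[:space:]]*\{.*path[[:space:]]*=' "$1" \
        | grep -Ev 'version[[:space:]]*=' \
        | sed -E 's/^[[:space:]]*([A-Za-z0-9_-]+).*/\1/'
}
```

A gate could call these as `has_patch_section Cargo.toml` and `list_versionless_path_deps Cargo.toml`, and refuse to publish if the first succeeds or the second prints anything.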

Five-Whys Root Cause Analysis

Starting from the latest incident (entrenar 0.7.12 → apr-cli broken):

  1. Why did cargo install apr-cli fail? → entrenar 0.7.12 calls contract_pre_with_resident_weights!() in wgpu_block.rs:121 but the macro has no definition in the published crate.

  2. Why is the macro missing? → Fallback stubs existed in local lib.rs (lines 56-59) but the published crate's generated_contracts.rs was regenerated without this macro, and the stubs were committed AFTER the previous publish — the published 0.7.12 tarball has an older lib.rs without the stubs.

  3. Why wasn't the missing macro caught before upload? → make publish CRATE=apr-cli in the Makefile uses cargo publish --allow-dirty --no-verify. The --no-verify flag skips the build check entirely, so cargo uploads the tarball without compiling it.

  4. Why is there no post-publish verification? → Nobody runs cargo install apr-cli from a clean environment after publishing. Success = "uploaded to registry", not "builds for users". The publish is fire-and-forget.

  5. Why hasn't this been permanently fixed despite 19 incidents? → Each incident was treated as a one-off fix (add a stub, bump a version, strip a patch) instead of implementing a systemic gate that makes broken publishes impossible.

Fix Shipped (PMAT-517): 4 Deliverables

1. entrenar scripts/check_publish_safety.sh

Pre-publish gate that verifies all 1171 contract macros used in source have definitions — either in generated_contracts.rs or as fallback stubs in lib.rs:

$ cd entrenar && bash scripts/check_publish_safety.sh
Entrenar publish safety gate (PMAT-517)...
  Contract macro completeness... OK (1171 macros, all defined)
  No patches in Cargo.toml... OK
  No version-less path deps... OK

OK: All 3 publish safety checks passed

The script uses comm -23 on sorted lists of used vs defined macros. If ANY macro is used but not defined, the publish fails with the exact macro name and fix instructions.
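
The comm -23 comparison can be sketched as below. This is an illustration under assumptions, not the shipped script: the function names, the regexes, and the choice to scan every .rs file under one directory are all hypothetical. Macro names that appear only in comments would also count as "used".

```shell
# Sketch only: regexes and layout are assumptions, not the real check_publish_safety.sh.

# Contract macros invoked (name followed by `!`) under directory $1, sorted and unique.
used_contract_macros() {
    grep -rhoE --include='*.rs' 'contract_[a-z0-9_]+!' "$1" | tr -d '!' | LC_ALL=C sort -u
}

# Contract macros defined via `macro_rules! contract_...` under directory $1.
defined_contract_macros() {
    grep -rhoE --include='*.rs' 'macro_rules![[:space:]]+contract_[a-z0-9_]+' "$1" \
        | sed -E 's/^macro_rules![[:space:]]+//' | LC_ALL=C sort -u
}

# Prints macros that are used but never defined (lines unique to the first list).
missing_contract_macros() {
    local used defined
    used=$(mktemp)
    defined=$(mktemp)
    used_contract_macros "$1" > "$used"
    defined_contract_macros "$1" > "$defined"
    # comm -23 suppresses column 2 (only in `defined`) and column 3 (in both).
    LC_ALL=C comm -23 "$used" "$defined"
    rm -f "$used" "$defined"
}
```

A gate built on this would fail the publish whenever `missing_contract_macros src` prints at least one name.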

2. Removed --no-verify from make publish

Before (line 847):

cargo publish -p $$CRATE --allow-dirty --no-verify;

After:

cargo publish -p $$CRATE --allow-dirty;

cargo now builds the crate in a clean-room sandbox before uploading. This catches missing macros, missing deps, and API incompatibilities that [patch.crates-io] masks during development.
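
The same verification build can be exercised without uploading anything by using cargo publish --dry-run. The wrapper below is a hypothetical convenience, not part of the shipped Makefile; the CARGO override exists only so the command can be swapped out, and a dirty working tree would still need --allow-dirty as in the Makefile line above.

```shell
# Hypothetical helper; CARGO defaults to `cargo` and can be overridden.
preflight_publish() {
    local crate="$1"
    # --dry-run packages the crate and runs the verification build, but skips the upload.
    "${CARGO:-cargo}" publish -p "$crate" --dry-run
}
```

For example, `preflight_publish entrenar` would surface a missing macro or unresolved dependency before any version number is consumed on crates.io.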

3. Post-publish cargo install smoke test

Added to both aprender AND entrenar Makefiles:

# After successful cargo publish (requires bash for PIPESTATUS):
echo "=== POST-PUBLISH VERIFICATION (PMAT-517) ==="
echo "Waiting for crates.io index to update..."
sleep 15
echo "Verifying: cargo install apr-cli --force ..."
cargo install apr-cli --force 2>&1 | tee "/tmp/publish-verify-$CRATE.log"
# Check cargo's exit status, not tee's: $? after a pipeline reports the last command (tee).
if [ "${PIPESTATUS[0]}" -ne 0 ]; then
    echo "FATAL: cargo install apr-cli FAILED after publish!"
    echo "The published crate is BROKEN. You must fix and republish."
    exit 1
fi
echo "POST-PUBLISH VERIFICATION: PASSED"

This is the definitive gate: if the published crate doesn't build from crates.io, make publish fails loudly.
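
A fixed sleep 15 assumes the index has caught up by then. One possible variation, sketched here with hypothetical names (verify_published_install, VERIFY_ATTEMPTS, VERIFY_DELAY), retries the install command a bounded number of times instead. It is not what the Makefiles ship.

```shell
# Hypothetical retry wrapper: runs the given command until it succeeds
# or VERIFY_ATTEMPTS (default 5) tries are exhausted, sleeping
# VERIFY_DELAY seconds (default 15) between tries.
verify_published_install() {
    local attempts="${VERIFY_ATTEMPTS:-5}"
    local delay="${VERIFY_DELAY:-15}"
    local i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            echo "POST-PUBLISH VERIFICATION: PASSED (attempt $i)"
            return 0
        fi
        echo "attempt $i/$attempts failed" >&2
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "FATAL: install check failed after $attempts attempts" >&2
    return 1
}
```

It could be invoked as `verify_published_install cargo install apr-cli --version "$VERSION" --force`, where VERSION is an assumed variable holding the version just published. Pinning --version means a stale index produces a failure and a retry, where an unpinned install could pass by picking up the previous release. The trade-off is that a genuine compile failure is also retried before the gate reports it.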

4. Missing contract macro stubs added to entrenar lib.rs

// PMAT-517: Fallback stubs for contract macros not in generated_contracts.rs.
// Run `bash scripts/check_publish_safety.sh` to verify completeness.
macro_rules! contract_pre_data_read { () => {{}}; ($($x:expr),+ $(,)?) => {{ $(let _ = &$x;)+ }}; }
macro_rules! contract_pre_data_mut { () => {{}}; ($($x:expr),+ $(,)?) => {{ $(let _ = &$x;)+ }}; }
macro_rules! contract_pre_transpose_tracked { () => {{}}; ($($x:expr),+ $(,)?) => {{ $(let _ = &$x;)+ }}; }
macro_rules! contract_pre_with_resident_weights { () => {{}}; ($($x:expr),+ $(,)?) => {{ $(let _ = &$x;)+ }}; }
macro_rules! contract_pre_alignment_enforcement { () => {{}}; ($($x:expr),+ $(,)?) => {{ $(let _ = &$x;)+ }}; }
macro_rules! contract_pre_geometric_mean { () => {{}}; ($($x:expr),+ $(,)?) => {{ $(let _ = &$x;)+ }}; }
macro_rules! contract_pre_layer_composition { () => {{}}; ($($x:expr),+ $(,)?) => {{ $(let _ = &$x;)+ }}; }
macro_rules! contract_pre_mqs_pass_rate { () => {{}}; ($($x:expr),+ $(,)?) => {{ $(let _ = &$x;)+ }}; }

Published Versions

Crate Version Status
entrenar 0.7.13 Published, verified
apr-cli 0.4.17 Published, cargo install PASSED

Verification

$ cargo install apr-cli --list | grep apr
apr-cli v0.4.17:
    apr

$ cargo install apr-cli --force
   Compiling entrenar v0.7.13
   Compiling apr-cli v0.4.17
    Finished `release` profile [optimized]
   Replacing /home/noah/.cargo/bin/apr
    Replaced package `apr-cli v0.4.15` with `apr-cli v0.4.17`

Falsification Condition

F-PUBLISH-01: If cargo install apr-cli fails after ANY crate publish (entrenar, aprender, trueno, etc.), the publish pipeline is broken. The make publish post-publish verification gate must catch this before the developer walks away.

Upstream Commits

  • entrenar 6f86e8d0: PMAT-512 loss emission fix
  • entrenar aa016837: version bump 0.7.12
  • entrenar aa43456c: PMAT-517 publish safety gate + stubs
  • entrenar (latest): version bump 0.7.13
  • aprender ae491cbf: PMAT-512 stderr loss emission
  • aprender 16b6338b: version bump 0.4.16 + entrenar dep
  • aprender e4742feb: PMAT-517 Makefile publish verification
  • aprender e396b657: version bump 0.4.17 + entrenar 0.7.13

Action Items for This Repo

  1. Always use make publish CRATE=<name> — never raw cargo publish
  2. Run bash scripts/check_publish_safety.sh before publishing entrenar (called automatically via make publish)
  3. If adding new contract macros in entrenar: add corresponding fallback stubs in lib.rs — the safety script will catch missing ones
  4. If the post-publish verification fails: DO NOT walk away. The published crate is broken. Fix and republish immediately.

Refs: PMAT-517, qwen-train-canary spec v6.29.0
