Migrate Tenstorrent support to upstream luwen crates (0.8.x) and retire the all-smi-luwen-* republished forks #265

@inureyes

Summary

The Tenstorrent NPU reader currently depends on four all-smi-* crates that we republished to crates.io from Tenstorrent's luwen repository:

[target.'cfg(target_os = "linux")'.dependencies]
all-smi-luwen-core = "0.2.0"
all-smi-luwen-if   = "0.7.9"
all-smi-luwen-ref  = "0.7.9"
all-smi-ttkmd-if   = "0.2.2"

These are produced by scripts/vendor_all_luwen.sh, which clones luwen, renames its crates under an all-smi- prefix, strips publish = false, and (manually) publishes them. This issue proposes migrating to the upstream-published luwen crates (0.8.x) and retiring the fork plus the vendoring script.

Why the forks exist (context)

all-smi is published to crates.io, and crates.io forbids git/path dependencies in published crates. The Tenstorrent reader needs APIs that, at the time, lived only on luwen's main branch. The dependencies were originally declared as git = "https://github.com/tenstorrent/luwen.git", branch = "main". The crates.io releases of luwen-if / luwen-ref were (and still are) frozen at 0.4.4 / 0.3.1 from March 2024 and lack those APIs. To keep all-smi publishable with the newer code, we vendored main, renamed it all-smi-luwen-*, and published it under our own ownership.

Why migrate now

Upstream restructured the workspace and now publishes a current crate set to crates.io that tracks main:

| Upstream crate | Latest on crates.io | Role |
|---|---|---|
| luwen (umbrella) | 0.8.5 (2026-03-30) | re-exports api, def, pci, kmd |
| luwen-api | 0.8.5 | successor to luwen-if + luwen-ref (detection lives here too) |
| luwen-def | 0.8.5 | successor to luwen-core (common definitions, Arch) |
| luwen-pci | 0.8.5 | PCIe layer; provides the low-level detect_chips_silent |
| luwen-kmd | 0.8.5 | successor to ttkmd-if (pulled transitively) |

The old crate directories (luwen-core, luwen-if, luwen-ref, ttkmd-if) no longer exist on luwen main, so scripts/vendor_all_luwen.sh cannot re-vendor current upstream in any case. The original reason for the fork is gone: we can depend on upstream's published crates directly.

Target dependencies

Replace the four fork crates inside the existing Linux-only target block.

Recommended (granular, mirrors the old split):

[target.'cfg(target_os = "linux")'.dependencies]
luwen-api = "0.8.5"   # Chip, ChipImpl, Telemetry, ChipDetectOptions, DeviceInfo, UninitChip
luwen-pci = "0.8.5"   # detect_chips_silent(options) -> Result<Vec<UninitChip>, LuwenError>
luwen-def = "0.8.5"   # Arch
# luwen-kmd is pulled in transitively; no direct dependency needed.

Alternative (single umbrella dep): luwen = "0.8.5", then reference luwen::api::*, luwen::def::Arch, luwen::pci::detect_chips_silent. Either is acceptable; the granular form keeps the dependency set explicit and minimal.

API mapping (verified)

All call sites live in src/device/readers/tenstorrent.rs.

| Current (fork) | New (upstream) |
|---|---|
| all_smi_luwen_core::Arch | luwen_def::Arch |
| all_smi_luwen_if::ChipDetectOptions | luwen_api::ChipDetectOptions |
| all_smi_luwen_if::chip::{Chip, ChipImpl, Telemetry} | luwen_api::chip::{Chip, ChipImpl, Telemetry} |
| all_smi_luwen_ref::detect_chips_silent(opts) | luwen_pci::detect_chips_silent(opts) |

The entire Telemetry surface the reader uses is unchanged in name and signature: arch, try_board_type(), board_serial_number_hex(), arc_fw_version(), eth_fw_version(), firmware_date(), ddr_fw_version, spibootrom_fw_version, voltage(), current(), power(), asic_temperature(), vreg_temperature(), inlet_temperature(), board_temperature, ai_clk(), arc_clk(), axi_clk(), telemetry_heartbeat(). DeviceInfo (domain / bus / slot / function / vendor / device_id + pcie_current_link_width() / pcie_current_link_gen()) is likewise unchanged. ChipDetectOptions still has a local_only field and a Default impl, so ChipDetectOptions { local_only: true, ..Default::default() } continues to work as-is.

Breaking changes / required code changes

  1. detect_chips_silent moved, and its name is now shared by two different functions. The old all_smi_luwen_ref::detect_chips_silent(options) maps to luwen_pci::detect_chips_silent(options) -> Result<Vec<UninitChip>, LuwenError>, which has the same shape and is a drop-in replacement. luwen_api::detect_chips_silent is a different function with signature (root_chips: Vec<Chip>, options) -> Result<Vec<Chip>, _>; do not use that one. Use the luwen-pci entry point so the existing UninitChip::init() loop is preserved.

  2. Arch::Grayskull is now #[deprecated] ("legacy architecture that is no longer supported"). The exhaustive match in extract_static_info emits warning: use of deprecated unit variant ... Arch::Grayskull, which CI (cargo clippy -- -D warnings) turns into a hard error. Fix it either by adding #[allow(deprecated)] to the match statement or its enclosing function, or by replacing the Grayskull arm with a catch-all (keep Wormhole / Blackhole named and map the rest to "Unknown"/"Grayskull") so the deprecated variant is never named. Arch is a plain (non-#[non_exhaustive]) enum with exactly Grayskull, Wormhole, and Blackhole, so simply deleting the Grayskull arm without a catch-all would leave the match non-exhaustive and fail to compile.

  3. UninitChip::init error semantics. It returns Result<Chip, InitError<E>>; even with an Infallible callback, InitError::PlatformError can occur on a real init failure. The current Err(_) => None arm with the comment "This should never happen with Infallible" is now misleading. Keep dropping the chip on error, but update the comment and consider logging the platform error.

  4. Drop the now-unnecessary bare use all_smi_luwen_core; / use all_smi_luwen_ref; lines; reference the crates through their items or paths directly.
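The two options for change 2 can be sketched without pulling in the real crate. In this minimal example Arch is a local stand-in that mirrors the three variants and the deprecation note quoted above; it is not luwen_def::Arch, and arch_name_allow / arch_name_catch_all are hypothetical helpers, not the actual body of extract_static_info.

```rust
// Local stand-in for luwen_def::Arch: a plain (not #[non_exhaustive]) enum
// with exactly three variants, where Grayskull carries a deprecation note.
pub enum Arch {
    #[deprecated(note = "legacy architecture that is no longer supported")]
    Grayskull,
    Wormhole,
    Blackhole,
}

// Option A: keep naming every variant and silence the lint for this
// function only, so -D warnings no longer fails on Arch::Grayskull.
#[allow(deprecated)]
pub fn arch_name_allow(arch: Arch) -> &'static str {
    match arch {
        Arch::Grayskull => "Grayskull",
        Arch::Wormhole => "Wormhole",
        Arch::Blackhole => "Blackhole",
    }
}

// Option B: never name the deprecated variant. The wildcard arm keeps the
// match exhaustive; today it can only ever match Grayskull.
pub fn arch_name_catch_all(arch: Arch) -> &'static str {
    match arch {
        Arch::Wormhole => "Wormhole",
        Arch::Blackhole => "Blackhole",
        _ => "Grayskull",
    }
}
```

Option A leaves the existing match untouched apart from the attribute. Option B needs no attribute, but its wildcard would also absorb any variant added in a later major release instead of forcing a compile error at this call site.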
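Changes 1 and 3 together amount to a detect-then-init loop. Every type and function below is a hypothetical stand-in modeled only on the shapes this issue describes (ChipDetectOptions with local_only and a Default impl, detection returning Result<Vec<UninitChip>, _>, and init returning Result<Chip, InitError<E>> with a PlatformError variant). The real luwen init callback takes an argument and the real types carry far more state; the extra_option field and the CallbackError variant are invented for illustration.

```rust
use std::convert::Infallible;

// ---- Hypothetical stand-ins for the luwen types (not the real API) ----

#[derive(Default)]
pub struct ChipDetectOptions {
    pub local_only: bool,
    // Invented field: shows why the reader uses ..Default::default().
    pub extra_option: bool,
}

pub struct Chip {
    pub id: u32,
}

#[derive(Debug)]
pub enum InitError<E> {
    // Invented variant for errors returned by the caller's callback.
    CallbackError(E),
    // Named in the issue: a real init failure, possible even with Infallible.
    PlatformError(String),
}

pub struct UninitChip {
    id: u32,
    healthy: bool, // stand-in for "the hardware initializes successfully"
}

impl UninitChip {
    pub fn init<E>(
        self,
        mut callback: impl FnMut() -> Result<(), E>,
    ) -> Result<Chip, InitError<E>> {
        callback().map_err(InitError::CallbackError)?;
        if self.healthy {
            Ok(Chip { id: self.id })
        } else {
            Err(InitError::PlatformError(format!("chip {} did not come up", self.id)))
        }
    }
}

// Stand-in for luwen_pci::detect_chips_silent: one good chip, one failing chip.
pub fn detect_chips_silent(options: ChipDetectOptions) -> Result<Vec<UninitChip>, String> {
    let _ = (options.local_only, options.extra_option);
    Ok(vec![
        UninitChip { id: 0, healthy: true },
        UninitChip { id: 1, healthy: false },
    ])
}

// ---- Reader-side pattern ----

pub fn init_local_chips() -> Vec<Chip> {
    let options = ChipDetectOptions { local_only: true, ..Default::default() };
    let uninit = match detect_chips_silent(options) {
        Ok(chips) => chips,
        Err(err) => {
            eprintln!("Tenstorrent detection failed: {err}");
            return Vec::new();
        }
    };
    uninit
        .into_iter()
        .filter_map(|chip| match chip.init(|| Ok::<(), Infallible>(())) {
            Ok(chip) => Some(chip),
            // Infallible rules out CallbackError, but PlatformError can still
            // occur on real hardware: log it and drop the chip.
            Err(err) => {
                eprintln!("Skipping Tenstorrent chip that failed to initialize: {err:?}");
                None
            }
        })
        .collect()
}
```

With the real crates only the stand-in definitions go away; the shape of init_local_chips is what the reader keeps, with the Err arm logging the failure instead of claiming it cannot happen. The reader may prefer its own logging facility over eprintln!.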

Build / CI considerations

  • luwen-api has a build script that compiles three protobuf files via prost-build 0.13, so it needs protoc at build time. all-smi's own build.rs already requires protoc (via tonic-prost-build 0.14 for the TPU gRPC protos on Linux), so this adds no new system requirement on the Linux build paths. Confirm the musl-static, aarch64-gnu cross, Debian, and PPA build paths still succeed.
  • MSRV: luwen targets Rust 1.75 (edition 2021); all-smi is 1.95 / edition 2024 (no conflict).
  • luwen-kmd depends on nix + memmap2 (pure Rust, no C deps), so musl static linking remains viable.

Implementation steps

  1. Cargo.toml: replace the four all-smi-* deps with luwen-api / luwen-pci / luwen-def = "0.8.5" (or the umbrella luwen). Update the stale # Tenstorrent dependencies from GitHub comment (they are no longer git deps).
  2. src/device/readers/tenstorrent.rs: update imports per the mapping table; switch detection to luwen_pci::detect_chips_silent; handle the deprecated Arch::Grayskull (allow or catch-all); fix the init error-handling comment.
  3. Remove scripts/vendor_all_luwen.sh. The references/luwen and /vendor/ entries in .gitignore can stay or be cleaned up.
  4. (Owner action, separate from this PR) Optionally deprecate/yank the all-smi-luwen-* crates on crates.io once no released all-smi version depends on them. Do not delete older published all-smi releases: they must remain buildable.

Acceptance criteria

  • all-smi builds on Linux (glibc) against upstream luwen 0.8.x, with no remaining all-smi-* fork references anywhere in the tree.
  • cargo clippy --all-targets -- -D warnings is clean (in particular the Arch::Grayskull deprecation is handled).
  • musl static build and aarch64-gnu cross build succeed.
  • Tenstorrent reader output is unchanged on real hardware: device name, board type, telemetry detail fields, PCIe details, and the power / temperature / clock / utilization metrics. If hardware is unavailable, at minimum confirm the detection path compiles and the mock server path is unaffected.
  • scripts/vendor_all_luwen.sh is removed.
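The "no remaining fork references" criterion can be checked mechanically. This is a minimal sketch of a hypothetical helper (find_fork_references is not part of all-smi) that walks a directory tree and reports files mentioning the fork crate names in either their hyphenated (Cargo.toml, Cargo.lock) or underscored (Rust path) form. It assumes UTF-8 text files, skips target/ and .git/, and does not follow symlinks. From a shell, grep -rnE 'all[-_]smi[-_](luwen|ttkmd)' --exclude-dir=target --exclude-dir=.git . does the same job.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Identifiers that only the retired forks use. Matching on a bare "all-smi-"
// prefix would also hit the all-smi crate itself, so the patterns are specific.
const FORK_PATTERNS: [&str; 4] = [
    "all-smi-luwen",
    "all_smi_luwen",
    "all-smi-ttkmd",
    "all_smi_ttkmd",
];

// Recursively collects files under `dir` whose text contains a fork pattern.
// Skips target/ and .git/, ignores symlinks, and silently skips files that
// are not valid UTF-8.
pub fn find_fork_references(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut hits = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            let name = entry.file_name();
            if name == "target" || name == ".git" {
                continue;
            }
            hits.extend(find_fork_references(&path)?);
        } else if file_type.is_file() {
            if let Ok(text) = fs::read_to_string(&path) {
                if FORK_PATTERNS.iter().any(|p| text.contains(p)) {
                    hits.push(path);
                }
            }
        }
    }
    hits.sort();
    Ok(hits)
}
```

Cargo.lock is deliberately included in the scan, since stale lockfile entries are the easiest reference to miss.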

Verification evidence

A throwaway probe crate depending on luwen-api / luwen-pci / luwen-def = "0.8.5" and replicating the reader's exact call sites compiles cleanly (cargo build finished; protoc present in the build environment). Removing #[allow(deprecated)] reproduces error: use of deprecated unit variant luwen_def::Arch::Grayskull under -D warnings, confirming change #2.

Out of scope (possible follow-ups)

  • Surfacing the richer 0.8.x Telemetry fields (per-GDDR temperatures, fan RPM, throttler state, board power limit, etc.) as new metrics. This issue is a like-for-like migration.
  • Memory utilization (used_memory) remains a TODO in the reader and is not addressed here.
