Skip to content

ci: pre-built release binaries for linux, macos and windows#22

Merged
mudler merged 5 commits into
masterfrom
feat/release-binaries
Jun 11, 2026
Merged

ci: pre-built release binaries for linux, macos and windows#22
mudler merged 5 commits into
masterfrom
feat/release-binaries

Conversation

@localai-bot

Copy link
Copy Markdown
Collaborator

Fixes #21

Adds a release workflow that builds self-contained parakeet-cli bundles for every v* tag and attaches them to the GitHub release for that tag (creating a draft release if none exists yet, so creating the release before or after pushing the tag both work). workflow_dispatch builds the same bundles as plain workflow artifacts, useful for testing.

Bundles

Platform Variants
Linux x64 cpu, vulkan, cuda
Linux arm64 cpu
macOS arm64 metal (metallib embedded in the binary)
macOS x64 cpu (cross-compiled; GitHub is retiring the Intel runners)
Windows x64 cpu, vulkan, cuda

Plus a separate cudart-parakeet-bin-win-cuda-x64.zip with the CUDA runtime DLLs (llama.cpp convention), so Windows users who already have the toolkit skip a large download. Each bundle is one static binary (BUILD_SHARED_LIBS=OFF) packaged with the LICENSE and README; tar.gz on Linux/macOS, zip on Windows.

It reuses the docker pipeline conventions: CUDA 13 with the Blackwell archs on Linux, GGML_CUDA_NO_VMM=ON since the build runners have no GPU driver, GGML_NATIVE=OFF for portable binaries, native arm64 runners. The Linux cuda tarball carries cudart/cublas next to the binary with an $ORIGIN rpath. Windows builds use Ninja with the MSVC environment, which also lets nvcc work without the Visual Studio CUDA integration; the ggml patches are applied explicitly under Git Bash first because CMake's find_program(bash) can pick up the WSL stub.

The README gets a short Pre-built binaries section pointing at the releases page.

Validation

The full matrix ran green on this branch via a temporary push trigger (removed in the last commit): run 27364238396, all 9 jobs, including the first Windows builds of this repo. I downloaded the linux-cpu-x64 artifact and ran it locally: exec bit intact, usage banner prints. Each bundle also gets a smoke test in CI.

Notes: the CUDA jobs take 65 to 90 minutes on the free runners, so a release fully populates roughly 1.5 hours after the tag push. The CUDA assets are large (~530 MB Linux, ~550 MB Windows cudart zip), almost all of it cublasLt.

🤖 Generated with Claude Code

mudler and others added 5 commits June 11, 2026 17:01
Adds a release workflow that builds self-contained parakeet-cli bundles
for every v* tag: linux x64 (cpu, vulkan, cuda) and arm64 (cpu), macos
arm64 (metal) and x64 (cpu), windows x64 (cpu, vulkan, cuda) plus a
separate cudart runtime zip. Assets attach to the GitHub release for
the tag, creating a draft release when none exists yet.

Fixes #21

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
parakeet-cli exits 2 when invoked bare; under the runner's bash -e -o
pipefail that exit code fails the pipeline even though grep matched.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Dropping the hand-rolled CMAKE_CUDA_ARCHITECTURES lists lets ggml's
curated non-native default apply: PTX for the datacenter generations
(75, 80, 90), real code for the common consumer cards (86, 89, 120a),
and 121a-real for GB10 on CUDA 13. Smaller binaries, faster builds,
and the list stays current with submodule bumps.

Temporarily re-adds the branch trigger to validate the CUDA builds.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@mudler mudler merged commit 9db92be into master Jun 11, 2026
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pre-built binaries / release bundles

2 participants