ci: pre-built release binaries for linux, macos and windows #22
Merged
Adds a release workflow that builds self-contained parakeet-cli bundles for every v* tag: Linux x64 (cpu, vulkan, cuda) and arm64 (cpu), macOS arm64 (metal) and x64 (cpu), Windows x64 (cpu, vulkan, cuda), plus a separate cudart runtime zip. Assets attach to the GitHub release for the tag, creating a draft release when none exists yet.
Fixes #21
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
parakeet-cli exits 2 when invoked bare; under the runner's bash -e -o pipefail that exit code fails the pipeline even though grep matched.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
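The failure mode described in this commit can be reproduced in isolation. The sketch below uses a hypothetical `fake_cli` function in place of the real binary and is only an illustration of the `pipefail` behavior, not the smoke test from the workflow itself.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a CLI that prints its usage banner and exits 2 when run bare.
fake_cli() {
  echo "usage: fake-cli [options]" >&2
  return 2
}

set -o pipefail

# Naive smoke test: grep matches, but pipefail reports the CLI's non-zero exit code.
fake_cli 2>&1 | grep usage >/dev/null
naive_status=$?

# Tolerant smoke test: swallow the CLI's exit code so that grep alone decides.
{ fake_cli 2>&1 || true; } | grep usage >/dev/null
tolerant_status=$?

echo "naive=$naive_status tolerant=$tolerant_status"
```

Under `bash -e`, the first pipeline would abort the step; the second form keeps the check meaningful because a missing banner still makes `grep` fail.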
Dropping the hand-rolled CMAKE_CUDA_ARCHITECTURES lists lets ggml's curated non-native default apply: PTX for the datacenter generations (75, 80, 90), real code for the common consumer cards (86, 89, 120a), and 121a-real for GB10 on CUDA 13. Smaller binaries, faster builds, and the list stays current with submodule bumps. Temporarily re-adds the branch trigger to validate the CUDA builds.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Fixes #21
Adds a release workflow that builds self-contained parakeet-cli bundles for every v* tag and attaches them to the GitHub release for that tag (creating a draft release if none exists yet, so creating the release before or after pushing the tag both work). workflow_dispatch builds the same bundles as plain workflow artifacts, useful for testing.
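The "attach, or create a draft first" behavior can be sketched with the GitHub CLI. This is an assumption about one way to get that behavior, not necessarily how the workflow implements it; `publish_assets` and the `GH_BIN` override are hypothetical names introduced here so the logic can be exercised without a real `gh` binary.

```shell
#!/usr/bin/env bash
# GH_BIN is a hypothetical override; by default the real GitHub CLI is used.
GH_BIN="${GH_BIN:-gh}"

# Ensure a release exists for the tag (creating a draft if not), then upload assets.
# Usage: publish_assets <tag> <file>...
publish_assets() {
  local tag="$1"
  shift
  # `gh release view` exits non-zero when no release exists for the tag.
  if ! "$GH_BIN" release view "$tag" >/dev/null 2>&1; then
    "$GH_BIN" release create "$tag" --draft --title "$tag"
  fi
  # --clobber overwrites same-named assets, so re-running a job stays idempotent.
  "$GH_BIN" release upload "$tag" "$@" --clobber
}
```

Because the existence check comes first, the order of "create the release" and "push the tag" does not matter: whichever happens second finds the release already there.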
Bundles
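The matrix produces nine bundles: Linux x64 (cpu, vulkan, cuda) and arm64 (cpu), macOS arm64 (metal) and x64 (cpu), and Windows x64 (cpu, vulkan, cuda). A separate cudart-parakeet-bin-win-cuda-x64.zip carries the CUDA runtime DLLs (llama.cpp convention), so Windows users who already have the toolkit skip a large download. Each bundle is one static binary (BUILD_SHARED_LIBS=OFF) packaged with the LICENSE and README, as tar.gz on Linux/macOS and zip on Windows.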
The workflow reuses the docker pipeline conventions: CUDA 13 with the Blackwell archs on Linux, GGML_CUDA_NO_VMM=ON because the build runners have no GPU driver, GGML_NATIVE=OFF for portable binaries, and native arm64 runners. The Linux cuda tarball carries cudart/cublas next to the binary with an $ORIGIN rpath. Windows builds use Ninja with the MSVC environment, which also lets nvcc work without the Visual Studio CUDA integration. The ggml patches are applied explicitly under Git Bash first, because CMake's find_program(bash) can pick up the WSL stub.
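As a rough illustration of how these conventions map onto a CMake configure step for the Linux cuda bundle, the sketch below prints a plausible flag set. BUILD_SHARED_LIBS, GGML_NATIVE and GGML_CUDA_NO_VMM come from the description above; GGML_CUDA=ON and the two RPATH variables are assumptions about how the build might be wired, and `cuda_configure_flags` is a hypothetical helper. CMAKE_CUDA_ARCHITECTURES is deliberately left unset, matching the commit that drops the hand-rolled lists.

```shell
#!/usr/bin/env bash
# Sketch only: prints configure flags, one per line, for a driverless CUDA build.
# GGML_CUDA=ON and the two RPATH variables are assumptions, not quoted from the PR.
cuda_configure_flags() {
  # The quoted heredoc keeps $ORIGIN literal; the dynamic loader expands it at run time.
  cat <<'EOF'
-DBUILD_SHARED_LIBS=OFF
-DGGML_NATIVE=OFF
-DGGML_CUDA=ON
-DGGML_CUDA_NO_VMM=ON
-DCMAKE_BUILD_WITH_INSTALL_RPATH=ON
-DCMAKE_INSTALL_RPATH=$ORIGIN
EOF
}

# Print the flags instead of invoking cmake, so the sketch runs anywhere.
cuda_configure_flags
```

With an `$ORIGIN` rpath the binary looks for shared libraries in its own directory, which is what lets cudart/cublas ship next to it in the tarball.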
The README gets a short Pre-built binaries section pointing at the releases page.
Validation
The full matrix ran green on this branch via a temporary push trigger (removed in the last commit): run 27364238396, all 9 jobs, including the first Windows builds of this repo. I downloaded the linux-cpu-x64 artifact and ran it locally: exec bit intact, usage banner prints. Each bundle also gets a smoke test in CI.
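The local check (exec bit intact, usage banner prints) can be approximated with a round trip through tar. Everything in this sketch is a placeholder: the `fake-cli` script stands in for the real binary, and the directory and archive names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pack a stand-in bundle, unpack it elsewhere, and confirm the exec bit survived.
# All file and directory names below are hypothetical placeholders.
work="$(mktemp -d)"
mkdir -p "$work/stage/bundle" "$work/unpacked"

# Stand-in for the real binary: a script that prints a usage banner.
printf '#!/bin/sh\necho "usage: fake-cli [options]"\n' > "$work/stage/bundle/fake-cli"
chmod +x "$work/stage/bundle/fake-cli"
echo "license text" > "$work/stage/bundle/LICENSE"
echo "readme text" > "$work/stage/bundle/README.md"

# tar records file modes, so the exec bit travels with the archive.
tar -czf "$work/bundle.tar.gz" -C "$work/stage" bundle
tar -xzf "$work/bundle.tar.gz" -C "$work/unpacked"

extracted="$work/unpacked/bundle/fake-cli"
if [ -x "$extracted" ]; then exec_bit=ok; else exec_bit=missing; fi
banner="$("$extracted")"
echo "exec bit: $exec_bit, banner: $banner"
```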
Notes: the CUDA jobs take 65 to 90 minutes on the free runners, so a release fully populates roughly 1.5 hours after the tag push. The CUDA assets are large (~530 MB Linux, ~550 MB Windows cudart zip), almost all of it cublasLt.
🤖 Generated with Claude Code