This alpha release requires macOS 15. We plan to add support for older OS versions in a future update.
If you're running on a high-performance Apple chip (Ultra or Pro variants), we strongly recommend running the dual model benchmark to properly evaluate the enhanced ANE capabilities:
```bash
# Update meta.yalm file and download new models
python examples/sync_models.py --update

# For Ultra models (M1/M2/M3 Ultra) and M4 Pro/Max models
python examples/benchmark_dual_models.py --runs 300
```

The tool will automatically detect your CPU model and provide testing recommendations:
- If running on M1/M2/M3 Ultra: Dual model testing is essential to evaluate the dual ANE clusters
- If running on M4 Pro/Max: Dual model testing is recommended to evaluate enhanced ANE performance
- For other models: Standard benchmarking is sufficient, but dual testing provides additional insights
When you run benchmark_all_models.py, you'll see a recommendation to run the dual test if your system would benefit from it:
ANEMLL-Bench (pronounced like "animal-bench") is a benchmarking tool specifically designed to measure and evaluate the performance of machine learning models on Apple's Neural Engine (ANE). It provides comprehensive metrics including inference time and memory bandwidth utilization (GB/s) to help researchers and developers optimize their models for Apple Silicon.
Currently, only memory bandwidth (GB/s) is benchmarked in this release.
ANEMLL-Bench is part of the ANEMLL open-source project: anemll.com
📊 View Benchmark Results
Check out our latest benchmark results comparing performance across different Apple Silicon chips (M1, M2, M3, M4, M5 series).
| Chip | ANE BW (GB/s) | System Mem BW (GB/s) | ANE Utilization |
|---|---|---|---|
| M5 Max | 148 | 614 | 24% |
| M4 Pro | 126 | 273 | 46% |
| M3 Max | 120 | 400 | 30% |
| M4 Max | 119 | 546 | 22% |
| M5 | 70 | 154 | 46% |
| M4 | 64 | 120 | 53% |
| M3 | 63 | 100 | 63% |
| M2 Max | 62 | 400 | 16% |
| M2 Ultra | 62 | 800 | 8% |
| M1 | 61 | 68 | 89% |
| M2 | 60 | 100 | 60% |
| M1 Pro | 55 | 200 | 27% |
| M1 Ultra | 55 | 800 | 7% |
| M1 Max | 55 | 400 | 14% |
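The ANE Utilization column above appears to be roughly the measured ANE bandwidth divided by total system memory bandwidth, rounded to a whole percent (individual rows may differ by a point due to rounding of the underlying measurements). A minimal sketch of that assumed calculation; the helper name `ane_utilization_pct` is ours, not part of the tool:

```python
def ane_utilization_pct(ane_bw_gbs: float, system_bw_gbs: float) -> int:
    """Share of total system memory bandwidth reached by the ANE, in percent."""
    return round(ane_bw_gbs / system_bw_gbs * 100)

# Spot-check against two rows of the table
print(ane_utilization_pct(120, 400))  # M3 Max → 30
print(ane_utilization_pct(126, 273))  # M4 Pro → 46
```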
| M1 Series | M2 Series | M3 Series | M4 Series |
|---|---|---|---|
| M1 | M2 | M3 | M4 |
| M1 Pro | M2 Pro | M3 Pro | M4 Pro |
| M1 Max | M2 Max | M3 Max | M4 Max |
| M1 Ultra | M2 Ultra | M3 Ultra | |
📧 Submit results to: realanemll@gmail.com or open an issue
- Recommended: Python 3.9.x
- Compatible: Python 3.10-3.12 (may have minor issues)
- Not Compatible: Python 3.13+ (has significant compatibility issues with PyTorch 2.5.0)
- Required for ANEMLL: PyTorch 2.5.0
- Issue with Python 3.13+: PyTorch 2.5.0 is not available for Python 3.13+
- Workaround for Python 3.13+: Use PyTorch 2.6.0, but expect potential compatibility issues with coremltools
- macOS with Apple Silicon (ARM64) - ANE is not available on Intel Macs
- Xcode Command Line Tools installed
- Native ARM64 Homebrew (for installing Python 3.9)
ANE (Apple Neural Engine) is ONLY available on Apple Silicon (ARM64) Macs. You MUST use native ARM64 Python to access ANE hardware.
```bash
# Install native ARM64 Python via Homebrew
/opt/homebrew/bin/brew install python@3.9

# Create virtual environment with native Python
/opt/homebrew/opt/python@3.9/bin/python3.9 -m venv env-anemll-bench
source env-anemll-bench/bin/activate
pip install -r requirements.txt
pip install -e .
```

```bash
# Use system Python (already native ARM64)
/usr/bin/python3 -m venv env-anemll-bench
source env-anemll-bench/bin/activate
pip install -r requirements.txt
pip install -e .
```

```bash
# Run the ARM64-optimized setup script
./create_python39_env.sh

# Then activate and install dependencies
source env-anemll-bench/bin/activate
cd env-anemll-bench
./install_dependencies.sh
```

Note: The setup script now prioritizes native ARM64 Homebrew and will warn you if it detects x86_64 Python on Apple Silicon.
If you see "ANE Hardware Available: False" in diagnostics, you're using x86_64 Python under Rosetta.
Symptoms:
- `python -c "import platform; print(platform.machine())"` returns `x86_64`
- ANE diagnostic shows "Not running on Apple Silicon (arm64)"
- Models run slowly (CPU-only performance)
- Benchmark times are much slower than expected
Solution: Rebuild your environment using one of the ARM64 options above.
Why this happens: Even on Apple Silicon Macs, if you install Python via the default Homebrew (which may be x86_64), Python runs under Rosetta translation and cannot access ANE hardware.
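One way to confirm the Rosetta case from Python is macOS's `sysctl.proc_translated` flag, which reports 1 when the current process runs under translation. A minimal sketch (the helper names are ours; the `sysctl` call itself only exists on macOS, so the output parsing is factored out):

```python
import platform
import subprocess

def parse_translated_flag(raw: str) -> bool:
    """Interpret `sysctl -n sysctl.proc_translated` output: '1' means Rosetta."""
    return raw.strip() == "1"

def running_under_rosetta() -> bool:
    """True if this Python process is x86_64 code translated on Apple Silicon."""
    try:
        out = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return False  # flag absent: not macOS, or the oid is unknown
    return parse_translated_flag(out)

if __name__ == "__main__":
    print(f"machine: {platform.machine()}, rosetta: {running_under_rosetta()}")
```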
Important: ANEMLL-Bench only works with platform-specific ANEMLL models, not generic Hugging Face models.
After setting up with native ARM64 Python:
```bash
# Download all optimized ANEMLL models for your macOS version
python examples/sync_models.py

# Update meta.yalm file and download any missing/new models
python examples/sync_models.py --update

# Benchmark all available ANEMLL models (default behavior)
python examples/benchmark_all_models.py --use-local --no-sync

# Benchmark a specific ANEMLL model (optional)
python examples/benchmark_all_models.py --model llama_lm_head --use-local --no-sync
```

Available ANEMLL Models:
- `llama_lm_head` - Llama language model head (ANE-optimized)
- `llama_lm_head_lut6` - Llama model with 6-bit LUT quantization (smaller, faster)
- `DeepHermes_lm_head` - DeepHermes language model head (ANE-optimized)
❌ Don't use: Generic Hugging Face models (like microsoft/DialoGPT-small) - these are not ANEMLL-optimized and won't work properly with ANE.
ANEMLL Models (✅ Recommended):
- ANEMLL-optimized models hosted on Hugging Face
- Pre-converted CoreML models optimized for Apple Neural Engine
- ML Programs with ANE-specific optimizations
- Fast inference times (7-20ms)
- High throughput (50+ GB/s)
Generic Hugging Face Models (❌ Not Recommended):
- Raw PyTorch models that need conversion
- Not optimized for ANE hardware
- Slow inference times (100-500ms)
- Low throughput (CPU-only performance)
This will automatically download and prepare all the optimized models for your specific macOS version. The models are stored in ~/.cache/anemll-bench/ and are ready to use immediately.
After running benchmarks, check out the benchmark results to see how your device compares to other Apple Silicon chips.
Problem: Models are running slowly or ANE diagnostic shows "ANE Hardware Available: False"
Root Cause: You're likely using x86_64 Python under Rosetta translation, which cannot access ANE hardware.
Quick Diagnosis: Check your Python architecture:
```bash
python -c "import platform; print(f'Architecture: {platform.machine()}')"
```

Expected Result: `arm64` (on Apple Silicon Macs)
If you see x86_64 on Apple Silicon:
- ❌ You're using Python under Rosetta translation
- ❌ ANE hardware is not accessible
- ❌ Models will run on CPU only (much slower)
Solution: Rebuild your environment with native ARM64 Python:
```bash
# Remove old environment
rm -rf env-anemll-bench

# Create new environment with native ARM64 Python
/usr/bin/python3 -m venv env-anemll-bench

# OR install native Python first
/opt/homebrew/bin/brew install python@3.9
/opt/homebrew/opt/python@3.9/bin/python3.9 -m venv env-anemll-bench

source env-anemll-bench/bin/activate
pip install -r requirements.txt
pip install -e .
```

Performance Difference:
- ARM64 Python + ANE: ~7-20ms inference time
- x86_64 Python (CPU only): ~100-500ms inference time
Run the setup verification script:

```bash
python check_setup.py
```

Or run the detailed ANE diagnostic:

```bash
python debug_ane.py
```

You should see:
- Architecture: `arm64`
- ANE Hardware Available: `True`
- Overall Status: `ANE should be available`
The benchmark tool now automatically validates ARM64 setup and will fail early if you're using x86_64 Python.
When you run any benchmark, it will:
- ✅ Pass: If using ARM64 Python on Apple Silicon
- ❌ Fail: If using x86_64 Python under Rosetta (with clear instructions)
- ⚠️ Warn: If running on Intel Mac (asks if you want to continue)
Example error message:

```
❌ ERROR: ANE ACCESS BLOCKED

You're running Python under Rosetta (x86_64 emulation) on Apple Silicon.
This prevents access to the Apple Neural Engine (ANE) hardware.

SOLUTION: Rebuild your environment with native ARM64 Python:
1. Remove current environment: rm -rf env-anemll-bench
2. Create new environment with native Python: /usr/bin/python3 -m venv env-anemll-bench
3. Activate and reinstall: source env-anemll-bench/bin/activate && pip install -r requirements.txt && pip install -e .
4. Verify ARM64 setup: python check_setup.py
```
Skip validation (for testing only):

```bash
python -m anemll_bench --skip-arm64-check --model your_model
```

- Standard Benchmarking: Measure individual model performance on Apple Neural Engine (ANE)
- Dual Model Benchmarking: Run two models simultaneously to test bandwidth utilization and parallel processing efficiency
- Comprehensive Metrics: Inference time, memory bandwidth utilization (GB/s), and more
- Platform-specific Models: Pre-configured models optimized for different Apple Silicon chips
- Report Generation: Generate detailed HTML reports with comparative visualizations
- Automatically collect system information (Mac model, CPU details, memory)
- Generate comprehensive HTML reports with visualizations
- Upload and share reports via multiple services (GitHub Gist, JSONBin, Pastebin)
- (future) Easy-to-use API for integrating new models
- Automatic downloading of platform-specific optimized models (macOS 15.x+)
- Robust model size detection for accurate throughput calculation
We provide a script to create a Python 3.9 virtual environment:
```bash
# Make the script executable
chmod +x create_python39_env.sh

# Run the script
./create_python39_env.sh

# Activate the environment
source env-anemll-bench/bin/activate

# Install dependencies
cd env-anemll-bench
./install_dependencies.sh
cd ..
pip install -e .

# Download models
python examples/sync_models.py --update

# Run benchmarks
python examples/benchmark_all_models.py

# For Ultra models, please also run/share a dual-model run to profile the 2x ANE clusters
python examples/benchmark_dual_models.py
```
If you want to use your current Python version:
```bash
# Make the script executable
chmod +x install_dependencies.sh

# Run the script
./install_dependencies.sh
```

Note: This may result in compatibility issues if you're using Python 3.13+. See the Troubleshooting section for common issues and solutions.
ANEMLL-Bench requires Xcode Command Line Tools to be installed on macOS, as they provide essential development libraries and compilers needed for the build process.
To check if Xcode Command Line Tools are installed:
```bash
xcode-select -p
```

If the command returns a path (e.g., `/Library/Developer/CommandLineTools` or `/Applications/Xcode.app/Contents/Developer`), then the tools are installed.
If not installed, you can install them by running:
```bash
xcode-select --install
```

To verify your installation, run the system info command:

```bash
python -m anemll_bench --system-info
```

This should display information about your system, including whether you have Apple Silicon and the Neural Engine available.
You can easily benchmark all available platform-specific models with a single command:
```bash
# Benchmark all models with default settings (300 iterations)
python examples/benchmark_all_models.py

# Customize the benchmarking process
python examples/benchmark_all_models.py --runs 500 --sequence-length 1 --output my_report.html

# Skip model synchronization and use only local models
python examples/benchmark_all_models.py --no-sync --use-local

# Generate a report without charts
python examples/benchmark_all_models.py --no-charts
```

This will automatically:
- Download any missing models (unless `--no-sync` and `--use-local` are used)
- Benchmark each available model for your macOS version
- Generate a comprehensive report with comparison metrics
The dual model benchmarking feature allows you to run two models simultaneously to measure potential bandwidth improvements:
```bash
# First time setup: ensure you have all required models
python examples/sync_models.py --update

# Run the dual model benchmark with default settings
python examples/benchmark_dual_models.py

# Customize the benchmark run
python examples/benchmark_dual_models.py --runs 500 --backend ANE
```

This benchmark will:
- Run each model individually as a baseline
- Run both models simultaneously in separate threads
- Calculate the bandwidth utilization factor to determine if parallel execution is efficient
- Show detailed performance analysis for individual and combined runs
For detailed documentation and troubleshooting tips, see Dual Model Benchmarking Guide.
The bandwidth utilization factor indicates how efficiently the system can run multiple models in parallel compared to running them individually.
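As an illustration, assuming the factor is defined as combined aggregate throughput divided by the sum of the single-model baselines (our reading of the description above, not a confirmed formula from the tool):

```python
def bandwidth_utilization_factor(individual_gbs: list, combined_total_gbs: float) -> float:
    """Combined aggregate throughput relative to the sum of solo baselines.

    1.0 means parallel execution scales perfectly; below 1.0 means the models
    contend for shared memory bandwidth / ANE resources.
    """
    return combined_total_gbs / sum(individual_gbs)

# Hypothetical numbers: two models at 50 GB/s alone, 80 GB/s total together
factor = bandwidth_utilization_factor([50.0, 50.0], 80.0)
print(f"{factor:.2f}")  # → 0.80
```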
The package can check Hugging Face for updated model definitions:
```python
from anemll_bench.models import check_and_update_platform_models

# Check for updated model definitions online
check_and_update_platform_models()
```

You can also use the example scripts provided:

```bash
# Standard benchmarking (uses local models)
python examples/load_platform_models.py

# Check for updates online, then benchmark
python examples/load_platform_models.py --check-online

# Benchmark a specific model with online check
python examples/load_platform_models.py --model llama_lm_head --check-online --num-runs 50

# Check and update model definitions from Hugging Face
python examples/check_online_models.py
```

The easiest way to get all required models is to run the sync script:
```bash
# Sync all platform models for your macOS version
python examples/sync_models.py
```

This single command will:
- Download the latest model definitions from Hugging Face
- Identify which models are available for your macOS version
- Download and unzip any missing models
- Skip models that are already in your cache
After running this command, all optimized models will be ready to use without additional setup.
Additional sync options:
```bash
# Update meta.yalm file and download any missing/new models (recommended)
python examples/sync_models.py --update

# Download models in parallel for faster synchronization
python examples/sync_models.py --parallel

# Customize parallel download workers (default: 4)
python examples/sync_models.py --parallel --workers 8

# Force update of meta.yalm before syncing
python examples/sync_models.py --force

# Quiet mode (less output)
python examples/sync_models.py -q
```

You can also synchronize models programmatically:

```python
from anemll_bench.models import sync_platform_models

# Sync all platform models (download what's missing)
results = sync_platform_models()

# Force update of meta.yalm before syncing
results = sync_platform_models(force_update=True)

print(f"Downloaded {results['models_downloaded']} models")
```

For advanced users, the cache management tool provides additional options:
```bash
# Sync all platform models
python examples/manage_cache.py sync

# Force meta.yalm update before syncing
python examples/manage_cache.py sync --force

# Output results in JSON format
python examples/manage_cache.py sync --json
```

All downloaded models and metadata are stored in `~/.cache/anemll-bench/`. The cache can be managed using the provided utility:
```bash
# Display cache information
python examples/manage_cache.py info

# Display cache information in JSON format
python examples/manage_cache.py info --json

# Clear all models from the cache
python examples/manage_cache.py clear

# Clear a specific model from the cache
python examples/manage_cache.py clear --model llama_lm_head

# Clear the entire cache including metadata
python examples/manage_cache.py clear --all

# Update model definitions from Hugging Face
python examples/manage_cache.py update
```

You can also manage the cache programmatically:
```python
from anemll_bench.models import get_cache_info, clear_cache, CACHE_DIR

# Get information about the cache
cache_info = get_cache_info()
print(f"Cache directory: {CACHE_DIR}")
print(f"Total cache size: {cache_info['total_size_mb']:.2f} MB")

# Clear specific models
clear_cache(model_name="llama_lm_head")

# Clear the entire cache
clear_cache(include_meta=True)
```

ANEMLL-Bench provides several key performance metrics to help you evaluate your models:
The time it takes to perform a single forward pass of the model, measured in milliseconds (ms). This is calculated by averaging the time across multiple iterations (default: 300) to get a stable measurement.
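The averaging itself is straightforward; here is a sketch of how a per-run timer reduces to a mean latency in milliseconds (the workload being timed is a stand-in, not the real CoreML predict call):

```python
import time

def mean_latency_ms(fn, runs: int = 300) -> float:
    """Average wall-clock time of fn() over `runs` calls, in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs * 1000.0

# Stand-in workload instead of a model forward pass
print(f"{mean_latency_ms(lambda: sum(range(1000)), runs=50):.3f} ms")
```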
This metric measures how efficiently your model uses the available memory bandwidth. It is calculated by:
```
Throughput (GB/s) = Model Size (GB) / Inference Time (seconds)
```
The throughput calculation uses the actual model weights size to provide a more accurate representation of memory bandwidth utilization, especially on the Apple Neural Engine (ANE).
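Plugging in numbers: a model with 1.2 GB of weights completing a forward pass in 10 ms implies its weights are streamed at 120 GB/s. A one-line sketch of the formula above:

```python
def throughput_gbs(model_size_gb: float, inference_time_s: float) -> float:
    """Memory bandwidth implied by streaming the full weights once per pass."""
    return model_size_gb / inference_time_s

print(f"{throughput_gbs(1.2, 0.010):.1f}")  # → 120.0
```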
The TFLOPS metric (Tera Floating Point Operations per Second) is temporarily disabled in reports as we work on implementing more accurate calculation methods for various model architectures. Future versions will re-enable this metric with improved precision.
ANEMLL-Bench automatically detects model size by examining the weight files in both .mlmodelc and .mlpackage formats. This size is used when calculating memory bandwidth utilization.
If you encounter errors like:

```
ERROR: Could not find a version that satisfies the requirement torch==2.5.0
ERROR: No matching distribution found for torch==2.5.0
```
This is likely due to using Python 3.13+, which doesn't have PyTorch 2.5.0 available. Solutions:
- Use Python 3.9 as recommended
- Accept the installation of PyTorch 2.6.0 instead, but be aware of potential compatibility issues with coremltools
If you encounter errors related to missing modules like:

```
Failed to load _MLModelProxy: No module named 'coremltools.libcoremlpython'
```
This could be due to:
- Incompatibility between PyTorch 2.6.0 and coremltools
- Incorrect installation order (PyTorch should be installed before coremltools)
Try reinstalling with Python 3.9 using the provided script.
If you encounter issues with browser functionality when generating reports:
- Multiple Browser Windows: If the same report opens in multiple browser windows, this could be due to both Safari and the system `open` command being used simultaneously. Recent versions of the tool fix this so only one browser window is opened.
- Browser Not Opening: If reports are generated successfully but don't open in the browser, check:
  - The file exists in the expected location (typically `~/.cache/anemll-bench/reports/`)
  - Your default browser settings
  - File permissions for the generated HTML report
You can manually open generated reports using:

```bash
open $(find ~/.cache/anemll-bench/reports -name "*.html" | sort -r | head -1)
```

For more detailed documentation, please refer to the docs directory.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- This project is part of the ANEMLL (Artificial Neural Engine Machine Learning Library) initiative
- Special thanks to Apple for developing the CoreML toolchain
- 🌐 Website: anemll.com
- 🤗 Models: huggingface.co/anemll
- 📱 X: @anemll
- 💻 GitHub: github.com/anemll
For any questions or support, reach out to us at realanemll@gmail.com

