pixi install
pixi run post_install

Download pipeline inputs:
pixi run download_models # download pre-trained audio models
pixi run download_fleurs # download FLEURS-R audio dataset
pixi run download_glottolog # extract lineages from FLEURS-R
pixi run download_reference_trees # extract and process reference trees
pixi run download_geojson # download language polygon data (Glottography)

Download external data from Zenodo (see ZENODO.md for full manifest):
# From the repo root:
tar -xzf phylaudio_zenodo.tar.gz

This unpacks BEAST2 posteriors, XLS-R embeddings, and regression outputs into
data/.
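
Before extracting, it can help to see what the archive will write. The sketch below is not part of the repository: `top_level_counts` is a hypothetical helper, and it makes no assumption about the archive's layout beyond it being a gzip-compressed tar file. It counts regular files by their first two path components.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def top_level_counts(archive_path):
    """Count regular files in a .tar.gz by their first two path components."""
    counts = Counter()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                # PurePosixPath drops a leading "./" so names group consistently.
                parts = PurePosixPath(member.name).parts
                counts["/".join(parts[:2])] += 1
    return counts
```

Calling `top_level_counts("phylaudio_zenodo.tar.gz")` from the repo root returns a `Counter` that can be printed and compared against ZENODO.md.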
You will need to set up a user and a project in Weights & Biases. See the Quickstart for more information.
pixi run lid --dataset fleurs-r --model_id NeMo_ambernet --project phylaudio
pixi run sentence_distance --dataset fleurs-r --model_id NeMo_ambernet --ebs 1
pixi run sentence_discrete --dataset fleurs-r --model_id NeMo_ambernet
pixi run sentence_astral pdist
pixi run sentence_summary pdist
pixi run beast2 -beagle_SSE -threads 8 -seed 889 data/trees/beast/speech/0.01_brsupport/input.xml
pixi run beast2 -sampleFromPrior -beagle_SSE -threads 8 -seed 889 data/trees/beast/speech/0.01_brsupport/prior.xml
scripts/beast_combine_logs.sh data/trees/beast/speech/0.01_brsupport input_v12
pixi run treeannotator -topology CCD0 data/trees/beast/speech/0.01_brsupport/input_combined_resampled.trees input_combined_resampled.ccd0
pixi run network_analysis data/trees/beast/speech/0.01_brsupport/input.xml

Install the regression environment:
pixi install -e regression

Before running regression or plotting, the following files must be present:
| File | Source |
|---|---|
| data/trees/beast/speech/0.01_brsupport/input_combined_resampled.mcc | Zenodo (speech MCC tree) |
| data/trees/beast/speech/0.01_brsupport/input_combined_resampled.log | Zenodo (speech BEAST log) |
| data/trees/beast/speech/0.01_brsupport/input_combined_resampled.trees | Zenodo (speech posterior trees) |
| data/trees/beast/speech/0.01_brsupport/prior_1.log | Zenodo (speech prior log) |
| data/trees/references/raw/iecor.nex | pixi run download_reference_trees (IECoR MCC tree) |
| data/trees/beast/iecor/raw.trees | pixi run download_reference_trees (IECoR posterior) |
| data/trees/beast/iecor/raw.log | pixi run download_reference_trees (IECoR posterior log) |
| data/trees/beast/iecor/prior/raw.log | pixi run download_reference_trees (IECoR prior log) |
| data/trees/beast/iecor/prunedtomodern.trees | pixi run download_reference_trees (auto-pruned) |

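
Because the regression and plotting steps depend on every file listed above, a quick pre-flight check can catch a missing download early. The following is a minimal sketch and not part of the repository: `missing_files` is a hypothetical helper, and the path list is copied from the table, relative to the repo root.

```python
from pathlib import Path

# Paths copied from the table above, relative to the repo root.
REQUIRED_FILES = [
    "data/trees/beast/speech/0.01_brsupport/input_combined_resampled.mcc",
    "data/trees/beast/speech/0.01_brsupport/input_combined_resampled.log",
    "data/trees/beast/speech/0.01_brsupport/input_combined_resampled.trees",
    "data/trees/beast/speech/0.01_brsupport/prior_1.log",
    "data/trees/references/raw/iecor.nex",
    "data/trees/beast/iecor/raw.trees",
    "data/trees/beast/iecor/raw.log",
    "data/trees/beast/iecor/prior/raw.log",
    "data/trees/beast/iecor/prunedtomodern.trees",
]


def missing_files(root=".", required=REQUIRED_FILES):
    """Return the required paths that do not exist as files under `root`."""
    base = Path(root)
    return [path for path in required if not (base / path).is_file()]
```

Run from the repo root, `missing_files()` returns an empty list when everything is in place; any path it does return points back to the matching Source column entry.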
This step generates metadata CSVs (with and without phoneme inventory) for both the speech and
cognate trees. It reads MCC trees from data/trees/beast/:
pixi run -e regression prepare_regression_data

This writes four files to data/phyloregression/.
pixi run -e regression beast_phylolm -- --model_type linear_geo --tree input_v12_combined_resampled --variant with_inventory
pixi run -e regression beast_phylolm -- --model_type linear_geo --tree heggarty2024_raw --variant with_inventory
pixi run -e regression beast_phylolm -- --model_type gp_geo --tree input_v12_combined_resampled --variant with_inventory
pixi run -e regression beast_phylolm -- --model_type gp_geo --tree heggarty2024_raw --variant with_inventory

Results are written to data/phyloregression/<variant>/.
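
The four regression runs above cover every combination of two model types and two trees. As a sketch of how that grid could be generated rather than typed out, the snippet below builds the same command lines in the same order. It is an illustration only: `regression_commands` is a hypothetical helper, and the flag names and values are copied from the commands above, not from the repository's code.

```python
from itertools import product

# Values copied from the commands above.
MODEL_TYPES = ["linear_geo", "gp_geo"]
TREES = ["input_v12_combined_resampled", "heggarty2024_raw"]


def regression_commands(variant="with_inventory"):
    """Build one argv list per (model_type, tree) pair, model type varying slowest."""
    return [
        ["pixi", "run", "-e", "regression", "beast_phylolm", "--",
         "--model_type", model_type, "--tree", tree, "--variant", variant]
        for model_type, tree in product(MODEL_TYPES, TREES)
    ]


# Print the commands for inspection; nothing is executed here.
for argv in regression_commands():
    print(" ".join(argv))
```

To actually launch the runs, each list could be passed to `subprocess.run(argv, check=True)` from the repo root, which stops at the first failing run.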
Install visualization dependencies:
pixi install -e viz

# Figure 1
pixi run -e viz fig1_acc_vs_brsupport # Panel A: LID accuracy vs. bootstrap support
pixi run -e viz fig1_nmf # Panel B: NMF structure plot
pixi run -e viz fig1_delta # Panel D: per-language delta scores
pixi run -e viz fig1_pca # Extended: PCA of XLS-R embeddings
pixi run -e viz fig1_sqa # Extended: silhouette vs. SI-SDR + correlation
# Figures 2–3
pixi run -e viz fig2_rates # Figure 2 panel B: speech rate over time
pixi run -e viz fig2_rates_cognate # Cognate rate over time
pixi run -e viz fig3_geo # Figure 3: regression panels
# Extended
pixi run -e viz ext_rates_and_maps # rate scatter, GP maps, root age, rate-over-time

pixi run python -m src.tasks.phylo.compute_paper_stats