Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR aims to speed up the newick string construction by reducing string insert operations and minimizing unnecessary string clones.
- Introduces a new helper function prepare_cache to pre-initialize string caches
- Refactors build_newick to build the newick string by appending precomputed parts
- Updates sample vector generation in profile_main and isolates the to_newick operation in benchmarks
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| phylo2vec/src/tree_vec/ops/newick/mod.rs | Adds prepare_cache and updates build_newick to improve performance and clarity |
| phylo2vec/src/profile_main.rs | Replaces randomized sample_vector with a fixed vector for improved stability |
| phylo2vec/benches/benchmarks/core.rs | Adjusts benchmarking to focus solely on the to_newick operation |
There was a problem hiding this comment.
Pull Request Overview
This PR optimizes Newick string construction by eliminating intermediate string insertions and reducing clones during subtree concatenation, and updates benchmarks and profiling code to reflect the changes.
- Introduce
prepare_cacheto pre-build leaf prefixes and avoid repeated allocations - Refactor
build_newickandbuild_newick_with_blsto build in-place viaString::push - Update test, benchmark, and profiling code to use fixed vectors and isolate
to_newickperformance
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/tree_vec/ops/newick/mod.rs | Added prepare_cache, refactored build_newick* to in-place |
| src/tree_vec/ops/mod.rs | Added a debug println! in a test |
| src/profile_main.rs | Switched profiling to use a fixed vector |
| benches/benchmarks/core.rs | Simplified benchmarks to isolate to_newick |
Comments suppressed due to low confidence (1)
phylo2vec/src/tree_vec/ops/newick/mod.rs:260
- [nitpick] This comment is vague; either expand it with a concrete benchmark result or remove it to avoid confusion.
// Faster than map+collect for some reason
|
|
||
| for (i, (&(c1, c2), &[bl1, bl2])) in pairs.iter().zip(branch_lengths.iter()).enumerate() { | ||
| let s1 = std::mem::take(&mut cache[c1]); | ||
| // let s1 = std::mem::take(&mut cache[c1]); |
There was a problem hiding this comment.
Remove this commented-out code since it is no longer needed and clutters the function.
| // let s1 = std::mem::take(&mut cache[c1]); |
| sub_newick.push(')'); | ||
| sub_newick.push_str(&sp); | ||
| cache[c1] = sub_newick; | ||
| // let capacity = s1.len() + s2.len() + sp.len() + sb1.len() + sb2.len() + 5; |
There was a problem hiding this comment.
There are multiple lines of commented-out legacy capacity calculations and push logic—consider removing them to improve readability.
phylo2vec/src/tree_vec/ops/mod.rs
Outdated
| ], "((0:0.1,2:0.2)5:0.5,(1:1,3:3)4:0.7)6;")] | ||
| fn test_to_newick_from_matrix(#[case] m: Vec<Vec<f32>>, #[case] expected: &str) { | ||
| let newick = to_newick_from_matrix(&m); | ||
| println!("Newick: {}", newick); |
There was a problem hiding this comment.
This debug print inside a test will clutter test output; consider removing it or using dbg! only when actively debugging.
| println!("Newick: {}", newick); | |
| dbg!(&newick); // Use dbg! for debugging purposes |
Supersedes #26
This avoids string inserts of left parens + minimises string clones during sub-newick formation (i.e., string concatenation)