Feature/oswpq reader by singjc · Pull Request #11 · singjc/arycal

singjc · 2025-11-16T19:40:00Z

This pull request adds support for the PyProphet split parquet format (oswpqd) to ARYCAL, enabling fully parallelized reading and alignment workflows. The changes introduce a unified feature accessor abstraction for both OSW (SQLite) and OSWPQ (parquet) formats, update documentation and configuration templates, and simplify the input configuration structure. These improvements make the tool more flexible, easier to use, and better documented for multiple data formats.

Major feature additions:

Added support for PyProphet split parquet format (oswpqd) in both the CLI and configuration, including a parallelized reader and unified feature accessor (FeatureAccessor) for both OSW and OSWPQ formats. [1] [2]
Updated input validation and configuration to support both sqMass and parquet for XIC files, and both osw and oswpqd for feature files. [1] [2]

Documentation and usability improvements:

Expanded the README.md with clear documentation for multiple input formats, quick start instructions, configuration field explanations, and example configurations for both OSW and OSWPQ workflows. [1] [2]

Codebase simplification:

Removed unused GUI-related fields from the input configuration structure to streamline CLI usage.
Refactored the codebase to use the new FeatureAccessor abstraction throughout the alignment and results writing logic, ensuring compatibility with both OSW and OSWPQ feature types. [1] [2] [3] [4] [5] [6]

These changes collectively improve ARYCAL's flexibility, performance, and ease of use for users working with different chromatogram alignment workflows.

Summary by CodeRabbit

New Features
- Support for PyProphet split-parquet (OSWPQ), sqMass and Parquet XIC inputs.
- Automatic config template generation and Quick Start flow when no parameters provided.
- New top-level config options: threads, log_level, compute_scores, scores_output_file, retain_alignment_path.
Documentation
- Expanded README with Quick Start, template usage, parallelization notes, and detailed Configuration Fields.
Improvements
- Stronger config validation and file-path inference; clearer config field naming (decoy → include_decoys) and updated alignment defaults.

…essors - Updated Input validation to accept 'parquet' as a valid xic type alongside 'sqMass'. - Enhanced FeaturesFileType enum to include OSWPQ and updated deserialization logic. - Introduced FeatureAccessor enum to handle both OSW and OSWPQ feature accessors. - Implemented OswpqAccess struct for reading PyProphet's split parquet format. - Updated Runner to support loading features from OSWPQ files and adjusted related methods. - Added necessary methods in OswpqAccess for fetching precursor features and writing alignment data.

…n names

…dd peak mapping conversion for PyProphet format

…d batch insertions in OSWPQ access

… configuration template generation, and implement output file backup

coderabbitai · 2025-11-16T19:40:09Z

Caution

Review failed

The pull request is closed.

Walkthrough

Adds PyProphet Split Parquet (OSWPQ) feature-format support, expands XIC/feature input formats, refactors feature access via a new FeatureAccessor, adds CLI config-template generation, and tightens input validation and file-type inference.

Changes

Cohort / File(s)	Summary
Documentation & Configuration Schema `README.md`, `crates/arycal-common/src/config.rs`	README updated with OSWPQ/parquet support, parallelization notes, quick-start and template guidance. `FeaturesFileType::OSWPQ` added; `FiltersConfig.decoy` replaced by `include_decoys` (serde alias preserved); defaults updated.
Input Validation & Loading `crates/arycal-cli/src/input.rs`	Removed GUI and several CLI-specific public fields; added `from_arguments`, `load`, `save_to_file`; added file-existence checks, file-type inference from paths, and support for Parquet XIC and OSWPQ features; adjusted logging and overrides handling.
Core Alignment Refactor `crates/arycal-cli/src/lib.rs`	Introduced `FeatureAccessor` enum (Osw / Oswpq) and delegation methods; Runner and alignment write functions refactored to operate on `FeatureAccessor`; added feature-type logging and early failure for Unknown types; updated call sites and signatures.
CLI Template Generation & Control Flow `crates/arycal-cli/src/main.rs`	`parameters` arg made optional; added `generate_config_template` to emit `arycal_config_template.json` + docs when no config provided; CLI now prints guidance and exits when template is generated.
OSWPQ Module & Integration `crates/arycal-cloudpath/src/oswpq.rs`, `crates/arycal-cloudpath/src/lib.rs`	New `oswpq` module: `OswpqAccess`, error type, data models (`PrecursorFeature`, `AlignmentFeature`), DuckDB-backed parquet read/write, run discovery, parallel feature fetching, batch alignment writing, and helpers; `pub mod oswpq` exported.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI Main
    participant Input as Input Loader
    participant Runner as Alignment Runner
    participant FeatureAccessor as FeatureAccessor
    participant OSW as OswAccess
    participant OSWPQ as OswpqAccess

    CLI->>CLI: Check for config file
    alt No config provided
        CLI->>CLI: generate_config_template()
        CLI->>CLI: Write arycal_config_template.json + docs
        CLI->>CLI: Exit with guidance
    else Config provided
        CLI->>Input: from_arguments()
        Input->>Input: infer_types() from paths
        Input->>Input: validate() existence & types
        Input-->>CLI: Return Input
        
        CLI->>Runner: Initialize with Input
        alt FeaturesFileType::OSW
            Runner->>OSW: Create OswAccess
            Runner->>FeatureAccessor: Wrap as Osw
        else FeaturesFileType::OSWPQ
            Runner->>OSWPQ: Create OswpqAccess
            Runner->>FeatureAccessor: Wrap as Oswpq
        else FeaturesFileType::Unknown
            Runner->>Runner: Fail early with error
        end
        
        Runner->>FeatureAccessor: fetch_transition_ids()
        FeatureAccessor-->>FeatureAccessor: Delegate to inner impl
        Runner->>Runner: Run alignment & scoring
        Runner->>FeatureAccessor: write_ms2_alignment_batch()
        FeatureAccessor-->>FeatureAccessor: Route to OSW or OSWPQ writer
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Substantial refactor touching core Runner and many call sites (FeatureAccessor delegation).
Large new module (oswpq.rs) with DuckDB/parquet logic and rayon parallelism.
Config semantics changed (decoy → include_decoys) and new defaults; verify backward compatibility.
Areas to inspect closely:
- DuckDB in-memory queries and parquet export paths
- Concurrency (rayon) safety and batching logic in OSWPQ writes
- Correct delegation and parity between OswAccess and OswpqAccess methods
- File-type inference and file-existence validation paths

Possibly related PRs

Feature/oswpq reader #11 — Implements OSWPQ support and FeatureAccessor abstraction; likely overlaps strongly with this PR's OSWPQ and access-layer changes.
Update/gui #2 — Modifies core CLI/config types (Input, config enums); changes are directly related to this PR's input validation and config schema updates.

Poem

🐰 I dug through parquet fields and rows,

where OSW and OSWPQ both grow.
A template sprouted, threads awake,
FeatureAccessor hops in stride to make
alignments sing as outputs flow. 🎋

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title 'Feature/oswpq reader' refers to a key feature added in the PR, but is somewhat generic and doesn't clearly communicate the primary scope of the change—which involves adding OSWPQ support and introducing a unified feature accessor abstraction for multiple file formats.	Consider a more descriptive title such as 'Add PyProphet parquet (OSWPQ) feature reader and unified FeatureAccessor' to clarify that this adds both a new file format reader and an important abstraction for feature access.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 88.10% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01dc35b and 97861d4.

📒 Files selected for processing (4)

README.md (2 hunks)
crates/arycal-cli/src/main.rs (3 hunks)
crates/arycal-cloudpath/src/oswpq.rs (1 hunks)
crates/arycal-common/src/config.rs (6 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

crates/arycal-cli/src/input.rs (2)
115-137: Error message for features type mentions oswpqd instead of oswpq

In validate:
if self.features.file_type != Some(FeaturesFileType::OSW) && self.features.file_type != Some(FeaturesFileType::OSWPQ) {
    return Err(io::Error::new(
        io::ErrorKind::InvalidData,
        format!(
            "Invalid features type {:?}; expected 'osw' or 'oswpqd'",
            self.features.file_type
        ),
    ).into());
}
You correctly accept FeaturesFileType::OSWPQ, whose string representation is "oswpq", but the error message says "oswpqd". That’s slightly misleading and could confuse users when they copy/paste from the message into their config.

I’d suggest updating the message to:
-            "Invalid features type {:?}; expected 'osw' or 'oswpqd'",
+            "Invalid features type {:?}; expected 'osw' or 'oswpq'",
to match the actual accepted values and the template documentation.

92-113: Fix extension-based type inference to use extension() method instead of ends_with()

In crates/arycal-cli/src/input.rs lines 92-113, PathBuf::ends_with compares whole path components, not filename suffixes. For typical file paths like "/path/to/merged.sqMass", the condition ends_with(".sqMass") will fail since ".sqMass" is not a complete path component. This causes the inference to default to Unknown, which validation then rejects unless the file type is explicitly set in JSON.

Use the extension() method instead:
if first_file.extension() == Some(OsStr::new("sqMass")) {
    self.xic.file_type = Some(XicFileType::SqMass);
}
Apply the same fix to the features file type check (use "osw" instead of ".osw").
crates/arycal-common/src/config.rs (1)

113-137: Alias mapping does not invert legacy decoy semantics as documented; old configs will have inverted behavior

The review comment is accurate. With only #[serde(alias = "decoy")], the mapping is direct (decoy: X → include_decoys: X), which inverts the intended behavior:

Old decoy: true (exclude decoys) → include_decoys: true → filter_decoys = false (include) — wrong

Old decoy: false (include decoys) → include_decoys: false → filter_decoys = true (exclude) — wrong

The contradictory comments at lines 118-120 and 121 also create confusion: one suggests semantics are preserved, the other admits they won't be ("interpreted with new semantics").

To resolve this:

Option A: Add a custom Deserialize impl for FiltersConfig that inverts the deserialized decoy value

Option B: Remove the misleading documentation and clarify this is an intentional breaking change (update lines 118-120)
crates/arycal-cli/src/lib.rs (1)
371-419: Update MPI path to wrap OswAccess in FeatureAccessor enum and ensure Clone is derived

The MPI run() method at lines 408-418 has a type mismatch that will fail compilation:

Line 412 creates vec![osw_access] as Vec<OswAccess>

Line 414 tries to assign self.feature_access.clone() as Vec<FeatureAccessor>

These incompatible types cannot coexist in the same variable

The non-MPI path (lines 268-278) correctly wraps variants: FeatureAccessor::Osw(osw_access) and FeatureAccessor::Oswpq(oswpq_access).

The MPI path must do the same:
let osw_access = OswAccess::new(&scores_output_file, false)?;
vec![FeatureAccessor::Osw(osw_access)]  // Wrap in enum variant
Also verify that FeatureAccessor derives Clone (not visible in the enum definition at line 41), since line 414 calls .clone().

🧹 Nitpick comments (9)

README.md (1)
178-179: Missing documentation for top-level config fields.

The Configuration Fields section documents nested config sections (XIC, Features, Filters, Alignment) but omits documentation for the top-level threads and log_level fields that appear in the OSW example config (lines 178-179). Add these to the Configuration Fields section for completeness.

Consider adding:
**Top-level Section:**
- `threads`: Number of threads for parallel processing (default/auto if omitted)
- `log_level`: Logging verbosity ("trace", "debug", "info", "warn", "error")
Also applies to: 215-215
crates/arycal-cli/src/main.rs (1)

135-187: Optional parameters flow and template error handling look correct

Making parameters optional and explicitly handling the “no config provided” case (generating arycal_config_template.json, printing guidance, then exiting with code 1) cleanly avoids panics in Input::from_arguments. Just be aware that Input::from_arguments still calls .expect("required parameters"), so if it’s ever used outside this guarded path it will need a more graceful handling of missing configs.

crates/arycal-cli/src/lib.rs (2)

163-203: Feature type dispatch in Runner::new looks correct, but still assumes a feature file exists

The new feature_file_type switch correctly routes to OswAccess::new vs OswpqAccess::new based on FeaturesFileType, with an explicit error for Unknown. That matches the updated config schema.

One thing to consider: parameters.features.file_paths[0] will panic if features.file_paths is empty. Today that’s only prevented by config discipline; a small explicit check with a clear error (e.g., “no feature file configured”) would make this more robust.

1180-1205: write_aligned_score_results_to_db still OSW-specific and currently unused

write_aligned_score_results_to_db still accepts &[OswAccess] and isn’t called anymore (the call is commented out). That’s fine for now, but if you plan to resurrect full-trace score writes you’ll likely want to:

Switch its parameter to &[FeatureAccessor], and

Add corresponding methods to FeatureAccessor so it can write to OSW and OSWPQ backends consistently.

Otherwise, consider removing or clearly marking it as legacy to avoid confusion.
crates/arycal-cli/src/input.rs (2)
36-73: from_arguments flow is reasonable; note hard expect on parameters

The from_arguments pipeline (load → infer_types → validate → apply xic_paths override → final consistency checks) is straightforward and integrates well with the new CLI behavior.

Just be aware that it still calls:
let path = matches
    .get_one::<String>("parameters")
    .expect("required parameters");
While main now guards the “no parameters supplied” case, this will panic if from_arguments is ever used elsewhere without that guard. If you want the library API to be more robust on its own, you could return a Result error instead of panicking when the parameter is missing.

138-159: Path existence checks are good; note that CLI overrides aren’t revalidated

The validate loops that ensure every path in xic.file_paths and features.file_paths exists are helpful, especially with OSWPQ directories allowed.

One subtle point: you apply the xic_paths CLI override after calling validate in from_arguments, so additional paths supplied on the command line aren’t checked for existence here. If you want full parity, you could either re-run validate() after applying overrides or explicitly check those extra paths.
crates/arycal-cloudpath/src/oswpq.rs (3)

568-568: Address TODO: Strip modifications from modified sequence.

The comment indicates that unmodified sequences should be extracted by stripping modifications, but currently the modified sequence is used as-is. This may cause issues if downstream code expects true unmodified sequences.

Would you like me to help implement a modification-stripping function, or should this be tracked as a separate issue? Common formats use brackets [] or parentheses () for modifications.

747-859: Consider extracting common write logic to reduce duplication.

The write_ms2_alignment_detailed method duplicates the batch insert and append pattern from write_alignment_features. Consider extracting a generic write_to_parquet_with_schema helper method.

This would reduce maintenance burden and ensure consistent behavior across write methods. However, since this method is marked #[allow(dead_code)], it may be lower priority.

962-973: Consider expanding test coverage.

The current test only covers the error case for a nonexistent path. Consider adding tests for:

Successful creation with a valid test directory

Backup behavior when output files exist

Path handling with/without .oswpq extension

Query building and result parsing with mock data

These tests would increase confidence in the implementation, especially for the SQL generation logic.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 952c22e and 01dc35b.

📒 Files selected for processing (7)

README.md (2 hunks)
crates/arycal-cli/src/input.rs (2 hunks)
crates/arycal-cli/src/lib.rs (10 hunks)
crates/arycal-cli/src/main.rs (3 hunks)
crates/arycal-cloudpath/src/lib.rs (1 hunks)
crates/arycal-cloudpath/src/oswpq.rs (1 hunks)
crates/arycal-common/src/config.rs (5 hunks)

🧰 Additional context used

🧬 Code graph analysis (4)

crates/arycal-cli/src/lib.rs (2)

crates/arycal-cloudpath/src/oswpq.rs (6)

fetch_transition_ids (497-581)

fetch_feature_data_for_precursor_batch (684-702)

fetch_full_precursor_feature_data_for_runs (584-680)

create_feature_ms2_alignment_table (706-711)

create_feature_transition_alignment_table (714-717)

new (116-140)

crates/arycal-cloudpath/src/osw.rs (10)

fetch_transition_ids (884-1050)

fetch_feature_data_for_precursor_batch (1763-1958)

fetch_full_precursor_feature_data_for_runs (1364-1570)

create_feature_ms2_alignment_table (2070-2139)

create_feature_transition_alignment_table (2212-2267)

insert_feature_ms2_alignment_batch (2142-2208)

insert_feature_transition_alignment_batch (2270-2323)

new (171-200)

new (220-236)

new (314-337)

crates/arycal-cli/src/main.rs (2)

crates/arycal-cli/src/input.rs (1)

default (24-33)

crates/arycal-common/src/config.rs (8)

default (13-15)

default (50-52)

default (130-137)

default (180-198)

default (281-305)

default (372-427)

default (494-532)

default (644-722)

crates/arycal-cloudpath/src/oswpq.rs (1)

crates/arycal-cli/src/lib.rs (6)

new (160-240)

fetch_transition_ids (47-63)

fetch_full_precursor_feature_data_for_runs (81-96)

fetch_feature_data_for_precursor_batch (65-79)

create_feature_ms2_alignment_table (98-109)

create_feature_transition_alignment_table (111-122)

crates/arycal-cli/src/input.rs (2)

crates/arycal-cloudpath/src/oswpq.rs (1)

new (116-140)

crates/arycal-cli/src/lib.rs (1)

new (160-240)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build

🔇 Additional comments (8)

README.md (3)

24-24: Feature list accurately documents parallelization and multi-format support.

The additions clearly communicate the new parallelization capability and expanded input format support. The documentation is clear and actionable for users evaluating whether ARYCAL meets their needs.

Also applies to: 30-34

183-214: OSWPQ example is valid but inconsistently detailed compared to OSW example.

The OSWPQ configuration correctly shows the new parquet file types and .oswpqd directory path. However, the example is noticeably sparse compared to the OSW example—it omits many fields present in the OSW config (e.g., use_tic, smoothing, compute_scores, threads, log_level).

This risks confusing users about whether missing fields are required, optional, or have defaults. Either:

Include the same level of detail in both examples, or

Add a note explaining that the OSWPQ example shows only essential fields and that all other fields from the OSW example are also supported.

Please verify that all omitted fields have sensible defaults or are truly optional by checking the config schema.

135-181: Based on the verification results from the shell script execution, the filters section field names have been confirmed to match the codebase exactly:

include_decoys: bool ✓

include_identifying_transitions: Option ✓

max_score_ms2_qvalue: Option ✓

precursor_ids: Option ✓

All default values in the example also match the FiltersConfig implementation. The field names use proper Rust naming conventions with serde renames handled correctly.

However, the shell script output was truncated before showing the complete AlignmentConfig struct definition and the top-level Config struct fields (threads, log_level). While the filters section is verified as correct, a complete verification of all fields in the example would require viewing the full struct definitions.

No issues found in the filters section; field naming is consistent with the schema.

crates/arycal-cloudpath/src/lib.rs (1)

11-18: OSWPQ module export looks good

Adding pub mod oswpq; is consistent with the new OswpqAccess usage in arycal-cli and is required for those imports to compile. No further issues here.

crates/arycal-common/src/config.rs (1)

42-77: OSWPQ feature type wiring is consistent

The new FeaturesFileType::OSWPQ variant, deserialization from "oswpq" | "parquet", and as_str returning "oswpq" are consistent with the CLI and template documentation and should work as expected for OSWPQ feature sources.
crates/arycal-cli/src/lib.rs (2)
40-149: FeatureAccessor abstraction is a good unification of OSW/OSWPQ backends

Wrapping OswAccess and OswpqAccess in FeatureAccessor and forwarding the key methods (fetch_transition_ids, feature fetch, and alignment-table/insert methods) gives you a clean, uniform surface for the rest of the pipeline. The error mapping into anyhow::Result also keeps the call sites tidy.

267-285: Remove immutable variable declaration and add mut keyword

The code declares separate_output_accessor immutable, then assigns to it in both conditional branches, which is a Rust compilation error. This must be mutable:
-        let separate_output_accessor: Option<FeatureAccessor>;
+        let mut separate_output_accessor: Option<FeatureAccessor>;
With this change, the borrow via from_ref is valid because separate_output_accessor stays in scope and is not mutated after the borrow.

Likely an incorrect or invalid review comment.
crates/arycal-cloudpath/src/oswpq.rs (1)

682-702: LGTM: Effective parallelization pattern.

The parallel processing of precursor batches is well-implemented. Each thread creates its own DuckDB connection (via fetch_full_precursor_feature_data_for_runs), avoiding concurrency issues.

coderabbitai · 2025-11-16T19:48:02Z

crates/arycal-cli/src/main.rs

+fn generate_config_template(path: &str) -> Result<()> {
+    // Create a default Input with only core alignment fields
+    let template_config = Input {
+        xic: XicConfig::default(),
+        features: FeaturesConfig::default(),
+        filters: FiltersConfig::default(),
+        alignment: AlignmentConfig::default(),
+        threads: std::thread::available_parallelism().unwrap().get().saturating_sub(1).max(1),
+        log_level: "info".to_string(),
+    };
+
+    // Serialize to pretty JSON
+    let json = serde_json::to_string_pretty(&template_config)?;
+
+    // Create documentation to append
+    let docs = r#"
+
+# Configuration Guide:
+#
+# NOTE: Remove comments (lines starting with #) before using this file as a configuration.
+# 
+# XIC Section:
+#   - include-precursor: Include precursor chromatograms (true/false)
+#   - num-isotopes: Number of isotopic peaks to include (typically 3)
+#   - file-type: "sqMass" for SQLite-based XICs or "parquet" for Parquet-based XICs
+#   - file-paths: List of paths to your XIC files
+#
+# Features Section:
+#   - file-type: "osw" for SQLite features or "oswpq" for PyProphet split parquet format
+#   - file-paths: Path to your features file (merged.osw or merged_runs.oswpqd directory)
+#
+# Filters Section:
+#   - include_decoys: false = only align targets, true = align both targets and decoys
+#   - include_identifying_transitions: Include non-quantifying transitions (default: false)
+#   - max_score_ms2_qvalue: Maximum q-value threshold for filtering (1.0 = no filtering)
+#   - precursor_ids: Optional path to TSV file with specific precursor IDs to process
+#
+# Alignment Section:
+#   - batch_size: Number of precursors to process before writing results (10000 recommended)
+#   - method: Alignment method - "FFT", "DTW", or "FFTDTW" (FFTDTW recommended)
+#   - reference_type: "star" (align to one reference) or "mst" (minimum spanning tree)
+#   - reference_run: Specific run to use as reference (null = auto-select)
+#   - use_tic: Use total ion chromatogram for alignment (always true)
+#   - smoothing: Savitzky-Golay filter parameters
+#     * sgolay_window: Window size (must be odd, typically 11)
+#     * sgolay_order: Polynomial order (typically 3)
+#   - rt_mapping_tolerance: Retention time tolerance in seconds for peak mapping (10.0 recommended)
+#   - decoy_peak_mapping_method: "shuffle" or "random_region"
+#   - decoy_window_size: Window size for random_region method
+#   - compute_scores: Calculate alignment scores (true recommended)
+#   - scores_output_file: Write scores to separate file (null = write to input file)
+#   - retain_alignment_path: Keep alignment path data (needed for identifying transitions)
+#
+# Threads & Logging:
+#   - threads: Number of threads to use (default = # of CPUs - 1)
+#   - log_level: Log verbosity ("error", "warn", "info", "debug", "trace")
+#
+# Example configurations:
+#
+# For PyProphet parquet format:
+#   "features": {
+#     "file-type": "oswpq",
+#     "file-paths": ["/path/to/merged_runs.oswpqd"]
+#   }
+#
+# For parquet XICs:
+#   "xic": {
+#     "file-type": "parquet",
+#     "file-paths": ["/path/to/file1.parquet", "/path/to/file2.parquet"]
+#   }
+#
+# To filter for specific precursors:
+#   "filters": {
+#     "precursor_ids": "/path/to/precursor_list.tsv"
+#   }
+#
+# To use a specific run as reference:
+#   "alignment": {
+#     "reference_type": "star",
+#     "reference_run": "your_run_name"
+#   }
+"#;
+
+    let mut file = File::create(path)?;
+    file.write_all(json.as_bytes())?;
+    file.write_all(docs.as_bytes())?;
+    Ok(())


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Template generation is fine; fix decoy_peak_mapping_method name mismatch

The template generation logic (default Input, pretty-printed JSON, appended docs) is sound.

However, the docs block advertises:

decoy_peak_mapping_method: "shuffle" or "random_region"

while the implementation in crates/arycal-cli/src/lib.rs checks for "shuffle" and "random_regions" (plural). With the current code, users following the docs and setting "random_region" will silently get no decoy peak generation for the “random region” mode.

Consider either:

Updating the docs here and in config.rs to "random_regions", or

Accepting both spellings in the implementation for robustness.

As a minor clean-up, you could also build template_config via Input::default() to avoid duplicating the default field initialization.

I'll verify the core claim about the decoy_peak_mapping_method spelling mismatch by examining the implementation files.

🏁 Script executed:

#!/bin/bash # Search for decoy_peak_mapping_method usage in lib.rs and config.rs echo "=== Searching for decoy_peak_mapping_method in lib.rs ===" rg -n "decoy_peak_mapping_method|random_region" crates/arycal-cli/src/lib.rs -C 3 echo -e "\n=== Searching for decoy_peak_mapping_method in config.rs ===" rg -n "decoy_peak_mapping_method|random_region" crates/arycal-cli/src/ -C 2 --type rust echo -e "\n=== Checking AlignmentConfig struct definition ===" ast-grep --pattern 'struct AlignmentConfig { $$$ }'

Length of output: 6217

I need to locate the AlignmentConfig struct definition to verify the field name:

🏁 Script executed:

#!/bin/bash # Find AlignmentConfig struct definition echo "=== Searching for AlignmentConfig struct definition ===" rg -n "struct AlignmentConfig" crates/arycal-cli/src/ -A 20 --type rust echo -e "\n=== Also check for serde rename attributes ===" rg -n "AlignmentConfig|#\[serde" crates/arycal-cli/src/ -B 2 -A 8 --type rust | head -100

Length of output: 3410

The AlignmentConfig is defined in arycal_common::config. Let me search for it there:

🏁 Script executed:

#!/bin/bash # Find AlignmentConfig struct in arycal_common echo "=== Searching for AlignmentConfig struct in arycal_common ===" find . -name "*.rs" -path "*/arycal_common/*" -exec grep -l "struct AlignmentConfig" {} \; echo -e "\n=== Get the struct definition and fields ===" find . -name "*.rs" -path "*/arycal_common/*" -exec grep -l "AlignmentConfig" {} \; | head -5 | xargs -I {} sh -c 'echo "File: {}"; grep -A 30 "struct AlignmentConfig" {}'

Length of output: 107

Let me search more broadly for the struct definition:

🏁 Script executed:

#!/bin/bash # Broader search for AlignmentConfig definition echo "=== Searching entire codebase for AlignmentConfig struct ===" rg -n "struct AlignmentConfig" . -A 25 --type rust echo -e "\n=== Check if it's in a config.rs file ===" fd "config.rs" . --type f

Length of output: 2974

Fix decoy_peak_mapping_method spelling: docs advertise "random_region" but code expects "random_regions"

The implementation in crates/arycal-cli/src/lib.rs (lines 712, 1131) checks for "random_regions" (plural), while the documentation in crates/arycal-cli/src/main.rs (lines 62–63) and the comment in crates/arycal-common/src/config.rs (line 165) advertise "random_region" (singular). Users following the docs will silently receive no decoy peak generation for that mode.

Update crates/arycal-cli/src/main.rs lines 62–63 and crates/arycal-common/src/config.rs line 165 to match the implementation: change "random_region" to "random_regions".

As a minor refactor, also simplify template_config initialization at crates/arycal-cli/src/main.rs line 17 by using Input::default() and only override non-default fields.