tech-debt: Rename 499 _part_XX files to semantic names #305

@noahgift

Summary

aprender has 499 files (≈33% of src/) with meaningless _part_XX mechanical split names. These names carry zero semantic signal and make the codebase unnavigable.

Cross-ref: paiml/paiml-mcp-agent-toolkit#233 (semantic-assisted rename tooling for pmat query)

Scale

Total _part_ files: 499
Double-nested (_part_XX_part_YY): 70+
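The two counts above can be reproduced from a plain list of file names. The following is a minimal sketch, assuming split files always follow the `_part_` plus digits pattern shown in this issue; `part_depth` and `summarize` are hypothetical helper names, not part of aprender or pmat.

```rust
/// Counts how many `_part_NN` segments a file name contains.
/// 0 = not a split file, 1 = single split, 2 or more = double-nested.
fn part_depth(file_name: &str) -> usize {
    // Drop the extension so "x_part_02.rs" is analysed as "x_part_02".
    let stem = file_name.strip_suffix(".rs").unwrap_or(file_name);
    let marker = "_part_";
    let mut depth = 0;
    let mut rest = stem;
    while let Some(pos) = rest.find(marker) {
        let after = &rest[pos + marker.len()..];
        // Only count the marker when at least one digit follows it,
        // so a name like "counter_part_time.rs" is not treated as a split.
        let digits = after.chars().take_while(|c| c.is_ascii_digit()).count();
        if digits > 0 {
            depth += 1;
        }
        rest = after;
    }
    depth
}

/// Returns (total split files, double-nested split files) for a list of names.
fn summarize(names: &[&str]) -> (usize, usize) {
    let depths: Vec<usize> = names.iter().map(|n| part_depth(n)).collect();
    let total = depths.iter().filter(|&&d| d >= 1).count();
    let nested = depths.iter().filter(|&&d| d >= 2).count();
    (total, nested)
}
```

Feeding `summarize` the file names found under src/ should give the totals listed above, provided the naming assumption holds.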

Worst Directories

| Directory | _part_ Files |
| --- | --- |
| format/ | 44 |
| format/converter/tests/ | 25 |
| format/converter/ | 24 |
| online/ | 13 |
| nn/ | 13 |
| voice/ | 12 |
| synthetic/ | 12 |
| format/rosetta/ | 12 |
| metaheuristics/ | 10 |
| format/v2/ | 10 |
| format/gguf/ | 10 |
| tree/ | 9 |
| stats/ | 8 |
| optim/tests/ | 8 |
| linear_model/ | 8 |

Concrete Rename Examples

format/converter/ — 24 files

Counting its tests/ subdirectory (25 more files), format/converter/ has the largest concentration of _part_ files in aprender. These are splits of the APR/GGUF/SafeTensors conversion pipeline:

| Current Pattern | Count | Likely Semantic Split |
| --- | --- | --- |
| export_part_02.rs through export_part_05.rs | 4 | Split by format: export_gguf.rs, export_safetensors.rs, export_apr.rs, export_helpers.rs |
| import_part_02.rs through import_part_06.rs | 5 | Split by format: import_gguf.rs, import_safetensors.rs, etc. |
| mod_part_02.rs through mod_part_04.rs | 3 | Core converter types/dispatch |
| merge_part_02.rs | 1 | Model merge continuation |
| write_part_02.rs, write_part_03.rs | 2 | Writer splits |

Test files (format/converter/tests/): 25 _part_ files including:

  • coverage_functions_part_03_part_02.rs (double-nested)
  • coverage_types_tests_part_05.rs
  • pure_functions_part_05.rs

These should be named by what they test: test_q4k_roundtrip.rs, test_metadata_parsing.rs, etc.

autograd/ — 5 files

| Current Name | Likely Content | Suggested Name |
| --- | --- | --- |
| grad_fn_part_02.rs | Additional gradient function impls | grad_fn_binary.rs or grad_fn_matmul.rs |
| grad_fn_part_03.rs | More gradient functions | grad_fn_loss.rs or grad_fn_activation.rs |
| grad_fn_part_03_part_02.rs | Double-split overflow | Name by actual gradient ops inside |
| grad_fn_part_03_part_03.rs | Double-split overflow | Name by actual gradient ops inside |
| ops/mod_part_02.rs | Additional autograd ops | Name by op category |

format/gguf/ — 10 files

| Current Pattern | Count | Likely Semantic Split |
| --- | --- | --- |
| reader_part_02.rs through reader_part_03.rs | 4 (with double nesting) | reader_header.rs, reader_tensors.rs, reader_metadata.rs |
| dequant_part_02.rs, dequant_part_03.rs | 2 | dequant_q4.rs, dequant_q8.rs (by quant type) |
| reader_part_02_part_02.rs, reader_part_02_part_03.rs | 2 (already counted in the reader row) | Double-nested reader splits |
| Various test parts | 4 | Name by test subject |

nn/ — 13 files

| Current Pattern | Likely Semantic Split |
| --- | --- |
| quantization_part_02.rs, quantization_part_03.rs | quantization_calibration.rs, quantization_schemes.rs |
| quantization_part_03_part_02.rs, quantization_part_03_part_03.rs | Double-nested quantization splits |
| vae_part_02.rs through vae_part_03_part_03.rs | vae_encoder.rs, vae_decoder.rs, vae_loss.rs |
| transformer/mod_part_02.rs, mod_part_03.rs | transformer_attention.rs, transformer_ffn.rs |

Other Directories

| Directory | Files | Notes |
| --- | --- | --- |
| citl/ | 16 | Compiler, encoder, neural, pattern (each with _part_ splits) |
| online/ | 13 | Corpus, curriculum, distillation splits |
| voice/ | 12 | Isolation, style splits |
| synthetic/ | 12 | Code EDA, shell generation splits |
| tree/ | 9 | Helpers, random forest splits |
| stats/ | 8 | Statistics module splits |
| metaheuristics/ | 10 | NAS, constructive search splits |
| embed/ | 7 | Tiny embedding model splits |
| text/tokenize/ | 7 | Tokenizer splits |
| graph/ | 7 | Graph module splits (double-nested) |
| cluster/tests/ | 6 | Clustering test splits |
| classification/ | 6 | Classifier splits |

Execution Strategy

Phase 1: format/ (69 files — highest concentration)

The format module (converter, gguf, rosetta, v2) has 69 _part_ files. These are the most navigated files (model I/O) and benefit most from semantic names.

Phase 2: nn/ + autograd/ (18 files — core ML)

Neural network and autograd modules are high-traffic for anyone doing ML work.

Phase 3: Domain Modules (remaining ~412 files)

Everything else: citl, online, voice, synthetic, tree, stats, etc.

Rename Mechanics

Each rename requires:

  1. git mv old_part_name.rs new_semantic_name.rs
  2. Update the #[path = "old_part_name.rs"] mod name; declaration in the parent module so it points at new_semantic_name.rs
  3. Verify cargo test --lib still passes
  4. Batch renames per directory to keep commits reviewable
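Steps 1 and 2 above are mechanical enough to script. Here is a minimal sketch, assuming the parent module writes the attribute exactly as `#[path = "file.rs"]` (same spacing, double quotes) and that the module identifier itself is left unchanged so no `use` paths need updating. `retarget_path_attr` and `git_mv_command` are hypothetical helpers, and the target file name is only one of the candidate names suggested in this issue.

```rust
/// Rewrites `#[path = "<old_file>"]` to `#[path = "<new_file>"]` in the
/// source text of a parent module.
/// Returns the updated source and the number of attributes that were changed.
fn retarget_path_attr(parent_src: &str, old_file: &str, new_file: &str) -> (String, usize) {
    // Assumption: the attribute is formatted exactly like this in the source.
    let old_attr = format!("#[path = \"{}\"]", old_file);
    let new_attr = format!("#[path = \"{}\"]", new_file);
    let hits = parent_src.matches(old_attr.as_str()).count();
    (parent_src.replace(old_attr.as_str(), &new_attr), hits)
}

/// Formats the step-1 shell command for one rename.
/// The command is only built as a string here; nothing is executed.
fn git_mv_command(dir: &str, old_file: &str, new_file: &str) -> String {
    format!("git mv {0}/{1} {0}/{2}", dir, old_file, new_file)
}
```

A hit count of 0 means the parent module declares the file some other way and needs a manual look. Running cargo test --lib after each directory batch remains the real check.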

Tooling Support

When paiml/paiml-mcp-agent-toolkit#233 (pmat query --suggest-rename) ships, use it to auto-generate the rename plan:

pmat query --suggest-rename --path src/format/converter/
pmat query --suggest-rename --path src/nn/
pmat query --suggest-rename --path src/autograd/

Until then, manual inspection of function signatures per file is required (as done for the examples above).
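The manual inspection can be sped up by listing the functions each _part_ file declares. This is a rough, line-based sketch and not a Rust parser: it assumes `fn` and the function name sit on the same line, and it can be fooled by string literals or block comments. `declared_fn_names` is a hypothetical helper and is unrelated to the pmat tooling.

```rust
/// Extracts the names of functions declared in a Rust source string.
/// Heuristic only: looks for the `fn` keyword as a whitespace-separated
/// token and takes the identifier that follows it.
fn declared_fn_names(src: &str) -> Vec<String> {
    let mut names = Vec::new();
    for line in src.lines() {
        let trimmed = line.trim_start();
        if trimmed.starts_with("//") {
            continue; // ignore line comments and doc comments
        }
        let mut tokens = trimmed.split_whitespace();
        // Advance the iterator past the `fn` keyword, if this line has one.
        if tokens.any(|t| t == "fn") {
            if let Some(next) = tokens.next() {
                // Keep only the identifier, dropping `(`, `<`, and what follows.
                let name: String = next
                    .chars()
                    .take_while(|c| c.is_alphanumeric() || *c == '_')
                    .collect();
                if !name.is_empty() {
                    names.push(name);
                }
            }
        }
    }
    names
}
```

If most names in a file share a theme (all dequantization routines, all header readers), that theme is a reasonable candidate for the new file name.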

Impact

  • 499 files renamed to meaningful names
  • Less developer time spent opening files just to find out what they contain
  • IDE go-to-file and fuzzy-find become useful (searching "loader" finds the loader, not mod_part_02_part_02.rs)
  • pmat query results show readable file paths
  • New contributors can understand the codebase structure without reading every file

Metadata

Labels: enhancement (New feature or request)
