feat: Add CodeEDA SyntheticGenerator for code augmentation #76

@noahgift

Summary

Add a code-specific EDA (Easy Data Augmentation) implementation of the SyntheticGenerator trait.

Requirements

  • Implement CodeEDA struct with operations:
    • Synonym Replacement (variable renaming)
    • Random Insertion (comments/asserts)
    • Random Swap (reorder independent statements)
    • Random Deletion (dead code removal)
  • Quality scoring via AST parse validity + token overlap
  • Diversity scoring to detect mode collapse
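
The two scoring requirements could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names (`tokenize`, `token_overlap`, `quality_score`, `diversity_score`, `brackets_balanced`) are hypothetical, Jaccard overlap is one possible reading of "token overlap", and the bracket-balance check is a cheap stand-in for a real AST parse of Python or Rust source.

```rust
use std::collections::HashSet;

/// Split source text into identifier/number tokens plus single-character
/// punctuation tokens. A real implementation would use a language lexer.
fn tokenize(src: &str) -> HashSet<String> {
    let mut tokens = HashSet::new();
    let mut current = String::new();
    for ch in src.chars() {
        if ch.is_alphanumeric() || ch == '_' {
            current.push(ch);
        } else {
            if !current.is_empty() {
                tokens.insert(std::mem::take(&mut current));
            }
            if !ch.is_whitespace() {
                tokens.insert(ch.to_string());
            }
        }
    }
    if !current.is_empty() {
        tokens.insert(current);
    }
    tokens
}

/// Jaccard overlap between the token sets of two snippets, in [0, 1].
fn token_overlap(a: &str, b: &str) -> f64 {
    let (ta, tb) = (tokenize(a), tokenize(b));
    let union = ta.union(&tb).count();
    if union == 0 {
        return 1.0; // two empty snippets are trivially identical
    }
    ta.intersection(&tb).count() as f64 / union as f64
}

/// Assumption: stand-in for "AST parse validity". It only checks that
/// brackets are balanced and properly nested.
fn brackets_balanced(src: &str) -> bool {
    let mut stack = Vec::new();
    for ch in src.chars() {
        match ch {
            '(' | '[' | '{' => stack.push(ch),
            ')' | ']' | '}' => {
                let expected = match ch {
                    ')' => '(',
                    ']' => '[',
                    _ => '{',
                };
                if stack.pop() != Some(expected) {
                    return false;
                }
            }
            _ => {}
        }
    }
    stack.is_empty()
}

/// Quality: 0.0 if the candidate fails the parse check, otherwise its
/// token overlap with the original snippet.
fn quality_score(original: &str, candidate: &str, parses: impl Fn(&str) -> bool) -> f64 {
    if parses(candidate) {
        token_overlap(original, candidate)
    } else {
        0.0
    }
}

/// Diversity: one minus the mean pairwise token overlap of a batch.
/// A value near 0.0 suggests mode collapse (all samples nearly identical).
fn diversity_score(batch: &[String]) -> f64 {
    let n = batch.len();
    if n < 2 {
        return 0.0;
    }
    let mut total = 0.0;
    let mut pairs = 0usize;
    for i in 0..n {
        for j in (i + 1)..n {
            total += token_overlap(&batch[i], &batch[j]);
            pairs += 1;
        }
    }
    1.0 - total / pairs as f64
}
```

With this sketch, `quality_score("f(x)", "f(x", brackets_balanced)` yields 0.0 because the candidate fails the parse check, and a batch of identical snippets yields a diversity of 0.0. How the two quality signals should be combined (gating, as here, or a weighted sum) is not specified in this issue.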

References

  • Wei & Zou (2019) EDA paper
  • verificar automl-synthetic-data-codex-spec.md

Acceptance Criteria

  • CodeEDA implements SyntheticGenerator<Input=String, Output=String>
  • Quality threshold filtering (default 0.75)
  • Integration tests with Python/Rust code samples
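
As a rough sketch of how the first two criteria might fit together, the following defines a stand-in trait and a `CodeEda` struct that applies only the variable-renaming operation and filters by the default threshold. The trait shape shown here (`generate`, `quality_score`) is an assumption for illustration and may not match the actual SyntheticGenerator trait; `CodeEda`, `with_renames`, `rename_identifier`, and the whitespace-token quality measure are likewise hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the SyntheticGenerator trait; the real
/// trait's methods and signatures may differ.
trait SyntheticGenerator {
    type Input;
    type Output;
    fn generate(&self, seeds: &[Self::Input]) -> Vec<Self::Output>;
    fn quality_score(&self, generated: &Self::Output, seed: &Self::Input) -> f32;
}

struct CodeEda {
    quality_threshold: f32,
    renames: Vec<(String, String)>, // (old identifier, new identifier)
}

impl Default for CodeEda {
    fn default() -> Self {
        // 0.75 is the default threshold named in the acceptance criteria.
        Self { quality_threshold: 0.75, renames: Vec::new() }
    }
}

impl CodeEda {
    fn with_renames(pairs: &[(&str, &str)]) -> Self {
        let renames = pairs.iter().map(|(a, b)| (a.to_string(), b.to_string())).collect();
        Self { renames, ..Self::default() }
    }
}

/// Append the pending identifier to `out`, renaming it if it equals `from`.
fn flush(out: &mut String, ident: &mut String, from: &str, to: &str) {
    if !ident.is_empty() && ident.as_str() == from {
        out.push_str(to);
    } else {
        out.push_str(ident.as_str());
    }
    ident.clear();
}

/// Rename whole-identifier occurrences of `from` to `to`. Caveat: this
/// is purely lexical, so it also renames inside strings and comments.
fn rename_identifier(src: &str, from: &str, to: &str) -> String {
    let mut out = String::with_capacity(src.len());
    let mut ident = String::new();
    for ch in src.chars() {
        if ch.is_alphanumeric() || ch == '_' {
            ident.push(ch);
        } else {
            flush(&mut out, &mut ident, from, to);
            out.push(ch);
        }
    }
    flush(&mut out, &mut ident, from, to);
    out
}

impl SyntheticGenerator for CodeEda {
    type Input = String;
    type Output = String;

    /// Apply each rename to each seed; keep candidates that changed and
    /// that score at or above the quality threshold.
    fn generate(&self, seeds: &[String]) -> Vec<String> {
        let mut kept = Vec::new();
        for seed in seeds {
            for (from, to) in &self.renames {
                let candidate = rename_identifier(seed, from, to);
                if candidate != *seed
                    && self.quality_score(&candidate, seed) >= self.quality_threshold
                {
                    kept.push(candidate);
                }
            }
        }
        kept
    }

    /// Fraction of whitespace-separated tokens in `generated` that also
    /// occur in `seed` (a simplified token-overlap measure).
    fn quality_score(&self, generated: &String, seed: &String) -> f32 {
        let seed_tokens: HashSet<&str> = seed.split_whitespace().collect();
        let tokens: Vec<&str> = generated.split_whitespace().collect();
        if tokens.is_empty() {
            return 0.0;
        }
        let kept = tokens.iter().filter(|t| seed_tokens.contains(*t)).count();
        kept as f32 / tokens.len() as f32
    }
}
```

For the seed `total = total + x`, renaming `x` to `value` keeps 4 of 5 tokens (0.8) and passes the 0.75 threshold, while renaming `total` to `acc` keeps 3 of 5 (0.6) and is filtered out. A fuller implementation would presumably add the remaining three operations and a parse-validity check for both Python and Rust samples.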
