aprender-shell v2.0: World-class AI shell completion #102

@noahgift

Summary

Elevate aprender-shell from a solid baseline to world-class AI shell completion.

Current State (v0.2.0)

  • N-gram language model (trigram default)
  • Sub-10ms latency, 2.7MB binary
  • Local-only, privacy-preserving
  • Basic augmentation and AutoML tuning
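
For readers unfamiliar with the baseline, a trigram completer can be sketched in a few lines. This is an illustration of the technique only, not aprender-shell's code: the class name `TrigramCompleter`, whitespace tokenization, and the `<s>`/`</s>` padding tokens are all assumptions made for the example.

```python
from collections import Counter, defaultdict


class TrigramCompleter:
    """Toy trigram model over whitespace-separated command tokens (hypothetical)."""

    def __init__(self):
        # Maps a two-token context to counts of the token that followed it.
        self.counts = defaultdict(Counter)

    def train(self, history):
        for line in history:
            # Pad so the first real token also has a two-token context.
            tokens = ["<s>", "<s>"] + line.split() + ["</s>"]
            for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
                self.counts[(a, b)][c] += 1

    def suggest(self, prefix, k=3):
        """Return up to k of the most frequent next tokens for a typed prefix."""
        tokens = ["<s>", "<s>"] + prefix.split()
        context = (tokens[-2], tokens[-1])
        ranked = self.counts[context].most_common(k)
        # Drop the end-of-command marker; it is not a useful suggestion.
        return [tok for tok, _ in ranked if tok != "</s>"]


model = TrigramCompleter()
model.train(["git status", "git commit -m fix", "git commit -m wip", "git push"])
print(model.suggest("git"))         # most frequent continuation comes first
print(model.suggest("git commit"))
```

Lookup is a single dictionary access, which is consistent with the latency figure above, but the model only sees the last two tokens. That limitation is what the gap analysis below targets.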

Gap Analysis

| Feature | Current | World Class |
| --- | --- | --- |
| Model | N-gram | Transformer (CodeT5) |
| Understanding | Pattern matching | Semantic |
| Context | Command prefix only | Dir, git, env, history |
| Learning | Per-session | Cross-session, federated |
| NL support | None | "list large files" → find . -size +100M |

Roadmap to World Class

Phase 1: Context Awareness

  • Current directory context
  • Git repository state (branch, status)
  • Environment variables
  • Recent command success/failure
  • Time-of-day patterns
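
The Phase 1 signals could be gathered into one feature record per completion request, roughly as below. The function names, the dictionary keys, and the choice of `VIRTUAL_ENV` and `KUBECONFIG` as interesting variables are assumptions for illustration. Only the branch is read here; working-tree status would need a second git call.

```python
import os
import subprocess
from datetime import datetime


def git_branch(cwd):
    """Return the current branch name, or None outside a repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            cwd=cwd, capture_output=True, text=True, timeout=1,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # git is not installed, or it took too long
    return result.stdout.strip() if result.returncode == 0 else None


def collect_context(last_exit_status, env_keys=("VIRTUAL_ENV", "KUBECONFIG")):
    """Bundle the Phase 1 signals into one dictionary (hypothetical schema)."""
    cwd = os.getcwd()
    return {
        "cwd": cwd,
        "git_branch": git_branch(cwd),
        # Only report variables that are actually set.
        "env": {k: os.environ[k] for k in env_keys if k in os.environ},
        "last_command_ok": last_exit_status == 0,
        "hour_of_day": datetime.now().hour,
    }


ctx = collect_context(last_exit_status=0)
print(ctx)
```

Spawning git on every keystroke would likely exceed a sub-10ms budget, so a real implementation would probably cache the branch per directory.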

Phase 2: Semantic Understanding

  • Command embedding space (Code2Vec style)
  • Intent classification (file ops, git, docker, etc.)
  • Argument type inference
  • Error prediction (will this command fail?)
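
As a starting point for intent classification, a lookup on the command name gives a baseline that a learned classifier would later have to beat. This is a rule-based stand-in, not the embedding approach listed above; the table `INTENT_BY_COMMAND` and the label names are hypothetical.

```python
import os
import shlex

# Hypothetical label set, loosely following the categories named in Phase 2.
INTENT_BY_COMMAND = {
    "ls": "file_ops", "cp": "file_ops", "mv": "file_ops",
    "rm": "file_ops", "find": "file_ops",
    "git": "git",
    "docker": "docker",
}


def classify_intent(command_line):
    """Map a command line to a coarse intent label using its command name."""
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes while the user is still typing: fall back to split().
        tokens = command_line.split()
    for tok in tokens:
        # Skip sudo and leading VAR=value assignments to reach the real command.
        if tok == "sudo" or ("=" in tok and not tok.startswith("-")):
            continue
        return INTENT_BY_COMMAND.get(os.path.basename(tok), "other")
    return "other"


for line in ["git status", "sudo docker ps", "FOO=1 rm -rf build", "htop"]:
    print(line, "->", classify_intent(line))
```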

Phase 3: Transformer Integration

  • Small transformer encoder (distilled)
  • Pretrain on shell corpus (bash/zsh scripts)
  • Fine-tune on user history
  • Quantized for <50ms latency

Phase 4: Natural Language

  • NL → command translation
  • "find large files" → find . -size +100M
  • Explanation mode ("what does this do?")
  • Safety warnings for destructive commands
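
The safety-warning item could begin as a small set of pattern checks run on a suggestion before it is shown. The three rules and the function name `destructive_warnings` are illustrative assumptions; a usable list would be much longer and would still miss cases.

```python
import shlex


def destructive_warnings(command_line):
    """Return human-readable warnings for a few well-known risky patterns."""
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        return ["could not parse command; review it manually"]
    if not tokens:
        return []

    cmd, args = tokens[0], tokens[1:]
    if cmd == "sudo" and args:
        cmd, args = args[0], args[1:]

    # Collect single-letter flags, so "-rf" contributes both "r" and "f".
    short_flags = {
        ch for a in args
        if a.startswith("-") and not a.startswith("--")
        for ch in a[1:]
    }

    warnings = []
    if cmd == "rm" and ({"r", "R"} & short_flags or "--recursive" in args):
        warnings.append("rm with a recursive flag deletes directory trees")
    if cmd == "dd" and any(a.startswith("of=/dev/") for a in args):
        warnings.append("dd writing to a device overwrites its contents")
    if cmd == "git" and "push" in args and ("--force" in args or "f" in short_flags):
        warnings.append("git push --force can overwrite remote history")
    return warnings


print(destructive_warnings("rm -rf build"))
print(destructive_warnings("ls -la"))
```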

Phase 5: Cross-Session Learning

  • Persistent learning across sessions
  • Workspace-specific models
  • Team model sharing (opt-in)
  • Federated learning (privacy-preserving)
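
For count-based models, persistence across sessions and workspace-specific models can both be approximated by summing counts into one file per workspace. The sketch below assumes a JSON file of command counts; the file layout and the helper names `load_counts` and `save_session` are hypothetical. Summing plain counts is not federated learning in the privacy-preserving sense, only the merge step such a scheme would build on.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def load_counts(path):
    """Load previously saved counts, or start empty on first use."""
    p = Path(path)
    if not p.exists():
        return Counter()
    return Counter(json.loads(p.read_text()))


def save_session(path, session_counts):
    """Merge this session's counts into the stored ones and write them back."""
    merged = load_counts(path) + Counter(session_counts)
    Path(path).write_text(json.dumps(merged))
    return merged


# Two sessions in the same (temporary) workspace accumulate into one model.
with tempfile.TemporaryDirectory() as workspace:
    store = Path(workspace) / "counts.json"
    save_session(store, {"git status": 3, "make test": 1})
    total = save_session(store, {"git status": 2, "git push": 1})

print(total)
```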

Technical Requirements

  • Latency: <50ms (transformer), <10ms (fallback n-gram)
  • Binary size: <20MB (with embedded model)
  • Memory: <100MB runtime
  • Privacy: Local-first, opt-in cloud features
  • Platforms: Linux, macOS, Windows
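
The two latency targets imply a deadline-based fallback: try the slower model, and answer from the n-gram model if the budget is exceeded. A minimal sketch under that assumption follows. `complete_with_budget` and the stand-in model functions are hypothetical names, and a thread pool is only one way to enforce the deadline.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Small shared pool; a timed-out call keeps running and occupies a worker
# until it finishes, so sustained slowness would need extra handling.
_POOL = ThreadPoolExecutor(max_workers=2)


def complete_with_budget(prefix, primary, fallback, budget_s=0.050):
    """Return primary(prefix) if it finishes within budget_s, else fallback(prefix)."""
    future = _POOL.submit(primary, prefix)
    try:
        return future.result(timeout=budget_s)
    except FutureTimeout:
        future.cancel()  # only has an effect if the call has not started yet
        return fallback(prefix)


def slow_transformer(prefix):
    time.sleep(0.5)  # stand-in for a model that misses the budget
    return ["transformer:" + prefix]


def ngram(prefix):
    return ["ngram:" + prefix]  # stand-in for the fast fallback


print(complete_with_budget("git", slow_transformer, ngram))
```

An alternative design runs both models concurrently and takes the better answer available at the deadline, at the cost of always paying for the n-gram lookup.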

Dependencies from aprender

From more-learning-specs.md:

  • §3: Contrastive learning (command embeddings)
  • §31: Code-specific ML (AST tokenizer)
  • §32: Embedding & retrieval (semantic search)
  • §33: Transfer learning (cross-user knowledge)

Success Metrics

| Metric | Current | Target |
| --- | --- | --- |
| Top-1 accuracy | ~40% | >70% |
| Top-5 accuracy | ~70% | >90% |
| User satisfaction | Baseline | 4.5/5 stars |
| Daily active usage | N/A | >1000 users |
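
Top-k accuracy here is taken to mean the fraction of completion requests where the command the user actually ran appears among the first k suggestions. A small sketch of that computation follows; the sample data is made up for illustration and is not a measurement.

```python
def top_k_accuracy(ranked_suggestions, accepted, k):
    """Fraction of requests whose accepted command is in the top k suggestions."""
    if not accepted:
        raise ValueError("need at least one evaluation example")
    hits = sum(
        1
        for suggestions, truth in zip(ranked_suggestions, accepted)
        if truth in suggestions[:k]
    )
    return hits / len(accepted)


# Illustrative data only: each inner list is a ranked suggestion list.
suggestions = [
    ["git status", "git stash"],
    ["ls -la", "ls"],
    ["make", "make test"],
    ["docker ps", "docker images"],
]
accepted = ["git status", "ls", "make test", "docker run"]

print(top_k_accuracy(suggestions, accepted, k=1))  # 0.25
print(top_k_accuracy(suggestions, accepted, k=5))  # 0.75
```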

Labels: enhancement
