Skip to content

Inference effect: configurable max_tokens / temperature #370

@aallan

Description

@aallan

The Inference.complete host function hardcodes max_tokens: 1024 for the Anthropic provider (OpenAI and Moonshot use provider defaults). Users have no way to override this from Vera code or via environment variable.

Scope

  • Add VERA_INFERENCE_MAX_TOKENS env var (integer, applied to all providers)
  • Add VERA_INFERENCE_TEMPERATURE env var (float, applied to all providers)
  • Longer term: a richer InferenceOptions struct or additional effect operations

Current workaround

None — responses are capped at 1024 tokens for Anthropic calls regardless of prompt length or use case.

Related

Introduced in v0.0.101 as part of #61.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestlimitationKnown compilation limitation

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions