API Automation Agent

An open-source AI Agent that automatically generates an automation framework from your OpenAPI/Swagger specification or Postman collection, based on the api-framework-ts-mocha template (https://github.com/damianpereira86/api-framework-ts-mocha).

Features

Generates type-safe service and data models
Generates test suites for every endpoint
Reviews and fixes code issues to ensure code quality and adherence to best practices
Includes code formatting and linting
Runs tests with detailed reporting and assertions
Migrates Postman collections to an open-source automation framework, maintaining test structure and run order

Usage

Standalone Installer

Download the standalone executable:

Prerequisites

Windows 7+ or macOS 10.14+
API key (OpenAI or Anthropic)
Node.js 18+

Windows Users

Go to Releases
Download api-agent-windows.zip
Extract and follow the included USAGE-GUIDE.txt

Mac Users

Go to Releases
Download api-agent-macos.tar.gz
Extract and follow the included USAGE-GUIDE.txt
Make the executable runnable: chmod +x api-agent

Manual Installation (for development)

Prerequisites

Node.js 18 or higher
Python 3.8 or higher
OpenAI API key or Anthropic API key (Anthropic API key required by default)

Installation Steps

Clone the repository:

git clone https://github.com/TestCraft-App/api-automation-agent.git
cd api-automation-agent

Install Python dependencies:
```
pip install -r requirements.txt
```
Set up environment variables:
```
cp example.env .env
```

Edit the .env file with your API keys:

OPENAI_API_KEY=your_openai_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
GOOGLE_API_KEY=your_gemini_api_key_here
AWS_ACCESS_KEY_ID=your_aws_access_key_id_here
AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key_here
AWS_REGION=us-east-1
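
Before the first run, it can help to confirm that the variables for your chosen provider are actually set. The sketch below is only an illustration: the `REQUIRED_KEYS` mapping and the `missing_keys` helper are hypothetical names built from the variable names listed above, and they are not part of the agent.

```python
import os

# Assumption: mapping derived from the variable names in example.env above.
# For Bedrock, credentials may come from the AWS CLI instead of .env,
# so this sketch only checks the region.
REQUIRED_KEYS = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "google": ["GOOGLE_API_KEY"],
    "bedrock": ["AWS_REGION"],
}


def missing_keys(provider, env=None):
    """Return the variables for `provider` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS[provider] if not env.get(name)]
```

Calling `missing_keys("anthropic")` returns an empty list when the key is present in the current environment.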

Large Language Models

This project supports Anthropic, OpenAI, Google Generative AI, and AWS Bedrock language models.

Security Note: Claude models are significantly more resistant to prompt injection attacks compared to OpenAI or Google models when processing API definitions. This is particularly important when working with API specifications from untrusted sources. See the prompt injection evaluation dataset for details.

Supported Models

Anthropic

Claude Sonnet 4.6 (claude-sonnet-4-6) - Default: Best balance of quality and cost
Claude Opus 4.6 (claude-opus-4-6) - Highest quality for complex tasks
Claude Sonnet 4.5 (claude-sonnet-4-5)
Claude Haiku 4.5 (claude-haiku-4-5) - Fast + low cost
Claude Opus 4.5 (claude-opus-4-5)
Claude Sonnet 4 (claude-sonnet-4)

OpenAI

GPT-5.4 (gpt-5.4) - Recommended
GPT-5.3 Codex (gpt-5.3-codex) - Optimized for code
GPT-5.4 Mini (gpt-5.4-mini) - Fast + low cost
GPT-5.4 Nano (gpt-5.4-nano) - Cheapest
GPT-5.2 (gpt-5.2)
GPT-5.1 (gpt-5.1)
GPT-5 (gpt-5)
GPT-4.1 (gpt-4.1)
GPT-5 Mini (gpt-5-mini)

Google

Gemini 3.1 Pro Preview (gemini-3.1-pro-preview) - Recommended: Most capable
Gemini 3 Flash (gemini-3-flash) - Fast + low cost
Gemini 3 Pro Preview (gemini-3-pro-preview) - Deprecated: shut down March 9, 2026

AWS Bedrock

AWS Bedrock provides access to multiple model families through a unified API. Use the actual Bedrock model IDs:

Claude models: anthropic.claude-sonnet-4-6-v1:0, anthropic.claude-opus-4-6-v1:0, anthropic.claude-sonnet-4-5-v1:0, anthropic.claude-haiku-4-5-v1:0, anthropic.claude-opus-4-5-v1:0, anthropic.claude-sonnet-4-v1:0
OpenAI models: openai.gpt-5.4, openai.gpt-5.3-codex, openai.gpt-5.4-mini, openai.gpt-5.4-nano, openai.gpt-5.2, openai.gpt-5.1, openai.gpt-5, openai.gpt-4.1, openai.gpt-5-mini
Google models: google.gemini-3.1-pro-preview, google.gemini-3-flash, google.gemini-3-pro-preview

Authentication Options:

Option 1: AWS CLI (Recommended)

# One-time setup
aws configure
# Enter your AWS Access Key, Secret Key, Region, and Output format

# Then in your .env file:
MODEL=anthropic.claude-sonnet-4-5-v1:0
AWS_REGION=us-east-1

Option 2: Environment Variables

MODEL=anthropic.claude-sonnet-4-5-v1:0
AWS_ACCESS_KEY_ID=your_access_key_id
AWS_SECRET_ACCESS_KEY=your_secret_access_key
AWS_REGION=us-east-1

The agent will automatically use your AWS CLI configuration if credentials are not explicitly provided in the .env file. This approach is more secure and supports IAM roles, AWS SSO, and other AWS authentication methods.

You can configure your preferred model in the .env file:

MODEL=gpt-5.1
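
The model IDs listed above follow a visible naming pattern: Bedrock IDs carry a vendor prefix ending in a dot, while direct IDs start with `claude-`, `gpt-`, or `gemini-`. As a rough sketch of how a MODEL value could be mapped to a provider, assuming only that pattern (the `infer_provider` helper is hypothetical and may not match how the agent resolves models internally):

```python
def infer_provider(model_id):
    """Guess the provider from a MODEL value (hypothetical helper)."""
    # Bedrock IDs in the list above start with a vendor name and a dot.
    if model_id.startswith(("anthropic.", "openai.", "google.")):
        return "bedrock"
    if model_id.startswith("claude-"):
        return "anthropic"
    if model_id.startswith("gpt-"):
        return "openai"
    if model_id.startswith("gemini-"):
        return "google"
    raise ValueError(f"Unrecognized model ID: {model_id}")
```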

Important: Before using any model, check the current pricing on the respective provider's website (Anthropic, OpenAI, Google, or AWS). Costs vary significantly between models and can affect your usage budget.

Running the Agent

Standalone Executable

If you downloaded the standalone executable:

./api-agent <path_or_url_to_openapi_definition>

Manual Installation

If you installed manually for development:

python ./main.py <path_or_url_to_openapi_definition>

The agent accepts either:

A local file path to your OpenAPI/Swagger specification or Postman collection
A URL to a JSON or YAML OpenAPI/Swagger specification (URL not supported for Postman collections)
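
If you script around the agent, you may want to tell the two input kinds apart before invoking it. The following is a minimal sketch using only the standard library; `is_remote_definition` is a hypothetical helper, and the agent's own detection logic may differ.

```python
from urllib.parse import urlparse


def is_remote_definition(source):
    """Return True when `source` looks like an http(s) URL."""
    # Checking the scheme explicitly keeps Windows paths such as
    # C:\specs\api.yaml from being mistaken for URLs.
    return urlparse(source).scheme in ("http", "https")
```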

Options

--destination-folder: Specify output directory (default: ./generated-framework_[timestamp])
--use-existing-framework: Use an existing framework instead of creating a new one
--endpoints: Generate framework for specific endpoints (can specify multiple)
--prefixes: Specify one or more API path prefixes to remove from endpoints (can specify multiple)
--generate: Specify what to generate (default: models_and_tests)
- models: Generate only the data models
- models_and_first_test: Generate data models and the first test for each endpoint
- models_and_tests: Generate data models and complete test suites
--list-endpoints: List the endpoints that can be used with the --endpoints flag

Note: The --endpoints, --generate, --list-endpoints, and --use-existing-framework options are only available when using Swagger/OpenAPI specifications. When using Postman collections, only the --destination-folder and --prefixes parameters are fully supported.
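
To make the shape of these options concrete, here is a sketch of an equivalent `argparse` definition. It mirrors only the flags and defaults documented above; it is an assumption about how such a parser could look, not the agent's actual argument handling.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="api-agent")
    parser.add_argument("api_definition", help="Path or URL to the API definition")
    parser.add_argument("--destination-folder")
    parser.add_argument("--use-existing-framework", action="store_true")
    # nargs="+" lets a flag take several values, e.g. --endpoints /user /store
    parser.add_argument("--endpoints", nargs="+")
    parser.add_argument("--prefixes", nargs="+")
    parser.add_argument(
        "--generate",
        choices=["models", "models_and_first_test", "models_and_tests"],
        default="models_and_tests",
    )
    parser.add_argument("--list-endpoints", action="store_true")
    return parser
```

For example, `build_parser().parse_args(["api-spec.yaml", "--endpoints", "/user", "/store"])` yields `endpoints == ["/user", "/store"]` and the default `generate == "models_and_tests"`.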

Examples

# Generate framework from a local file
./api-agent api-spec.yaml
python ./main.py api-spec.yaml

# Generate framework from a URL
./api-agent https://api.example.com/swagger.json
python ./main.py https://api.example.com/swagger.json

# List the root endpoints
./api-agent api-spec.yaml --list-endpoints
python ./main.py api-spec.yaml --list-endpoints

# Generate complete framework with all endpoints
./api-agent api-spec.yaml
python ./main.py api-spec.yaml

# Generate models and tests for specific endpoints using an existing framework
./api-agent api-spec.yaml --use-existing-framework --destination-folder ./my-api-framework --endpoints /user /store
python ./main.py api-spec.yaml --use-existing-framework --destination-folder ./my-api-framework --endpoints /user /store

# Generate only data and service models for all endpoints
./api-agent api-spec.yaml --generate models
python ./main.py api-spec.yaml --generate models

# Generate models and first test for each endpoint in a custom folder
./api-agent api-spec.yaml --generate models_and_first_test --destination-folder ./quick-tests
python ./main.py api-spec.yaml --generate models_and_first_test --destination-folder ./quick-tests

# Combine options to generate specific endpoints with first test only
./api-agent api-spec.yaml --endpoints /store --generate models_and_first_test
python ./main.py api-spec.yaml --endpoints /store --generate models_and_first_test

The generated framework will follow the structure:

generated-framework_[timestamp]/    # Or the Destination Folder selected
├── src/
│   ├── base/                       # Framework base classes
│   ├── models/                     # Generated TypeScript interfaces and API service classes
│   └── tests/                      # Generated test suites
├── package.json
├── (...)
└── tsconfig.json

Framework State & Incremental Generation

The framework state management feature enables incremental generation of endpoints one at a time, while preserving all previously generated models as context for subsequent generations. This allows you to generate your test framework in stages, edit tests and models manually, and continue generation later without losing context or regenerating existing artifacts.

Every framework contains a framework-state.json file at its root. This file tracks each generated endpoint, the verbs that were processed, the TypeScript model files (path + summary), and the associated test specs.
When you pass --use-existing-framework, the agent loads this state file from --destination-folder and re-reads the referenced model files from disk so that manually edited models are still used as LLM context.
If you request an endpoint that is already part of the loaded state, the agent asks whether to override it, skip it, or exit.
Tests are recorded in the state file as soon as they are generated, so you can run the agent in multiple stages (e.g., generate models first, return later to add tests) without losing track of your progress.
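
The exact schema of framework-state.json is not documented here, so the sketch below assumes a hypothetical layout with a top-level `endpoints` object keyed by path. Under that assumption, it shows how a script could check which requested endpoints already exist before invoking the agent; `split_requested_endpoints` is an illustrative name, not an agent function.

```python
import json
from pathlib import Path


def split_requested_endpoints(state_path, requested):
    """Partition requested endpoints into already-generated and new ones."""
    path = Path(state_path)
    state = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    # Hypothetical layout: {"endpoints": {"/user": {...}, "/store": {...}}}
    known = set(state.get("endpoints", {}))
    existing = [endpoint for endpoint in requested if endpoint in known]
    new = [endpoint for endpoint in requested if endpoint not in known]
    return existing, new
```

Inspect a real framework-state.json and adjust the key names before relying on this.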

Postman Collection Migration

The API Automation Agent can now convert your Postman collections into TypeScript automated test frameworks, preserving the structure and test logic of your collections.

Supported Features

Converts Postman Collection v2.0 format JSON files into a TypeScript test framework
Maintains the original folder structure of your Postman collection
Preserves test run order for consistent test execution
Creates service files by grouping API routes by path
Migrates test scripts and assertions

Limitations

Only supports local Postman collection files (no HTTP download support yet)
Currently only supports Postman Collection v2.0 format
Scripts contained in folders (rather than requests) are not processed
Limited CLI support: only the --destination-folder and --prefixes parameters are fully supported with Postman collections
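
Because only the v2.0 format is supported, it can be worth checking a collection's declared version before migrating. Exported collections normally carry a schema URL under `info.schema`; the sketch below assumes that field is present and looks for the v2.0.0 marker in it. The `is_v2_0_collection` helper is hypothetical and is not how the agent itself validates input.

```python
def is_v2_0_collection(collection):
    """Return True when a parsed collection declares the v2.0.0 schema."""
    schema = collection.get("info", {}).get("schema", "")
    # The schema URL embeds the format version as a path segment.
    return "/collection/v2.0.0/" in schema
```

Load the file with `json.load` first and pass the resulting dictionary to the helper.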

Best Practices

The migration works best with well-structured APIs where:

Endpoints are organized logically by resource
Similar endpoints (e.g., /users, /users/{id}) are grouped together
HTTP methods follow REST conventions
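
To see why a resource-oriented layout helps, consider one plausible way of grouping routes by path: strip any configured prefix, then group by the first remaining path segment. This is a simplified sketch under that assumption; the agent's real grouping rules may differ, and both function names are hypothetical.

```python
from collections import defaultdict


def strip_prefix(path, prefixes):
    """Remove the first matching prefix from `path`, if any."""
    for prefix in prefixes:
        base = prefix.rstrip("/")
        # Match whole segments only, so "/api/v1" does not match "/api/v10".
        if path == base or path.startswith(base + "/"):
            return path[len(base):] or "/"
    return path


def group_by_resource(paths, prefixes=()):
    """Group paths by their first segment after prefix removal."""
    groups = defaultdict(list)
    for path in paths:
        trimmed = strip_prefix(path, prefixes)
        resource = trimmed.strip("/").split("/")[0]
        groups["/" + resource].append(trimmed)
    return dict(groups)
```

With the prefix `/api/v1`, the paths `/api/v1/users` and `/api/v1/users/{id}` both land in the `/users` group, which would map naturally to a single service file.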

Usage

# Migrate a Postman collection to TypeScript test framework
./api-agent path/to/postman_collection.json --destination-folder ./my-api-tests
python ./main.py path/to/postman_collection.json --destination-folder ./my-api-tests

Testing the agent

The project uses a comprehensive testing strategy with three complementary approaches:

Traditional Tests: Unit and integration tests to ensure code quality and functionality
Evaluations: LLM-based model-graded evaluations to assess generated code quality
Benchmarks: Performance metrics for different LLM models generating API test frameworks

Traditional Tests

The project includes a comprehensive test suite to ensure code quality and functionality. Here's how to run and work with the tests:

Test Structure

Unit tests are located in tests/unit/
Integration tests are in tests/integration/
Test fixtures and mocks are in tests/fixtures/

Running the Test Suite

Install test dependencies:

pip install -r requirements.txt
pip install -r requirements-test.txt  # Additional test dependencies

Run all tests:
```
pytest
```

Run specific test categories:

pytest tests/unit/  # Run only unit tests
pytest tests/integration/  # Run only integration tests

Run tests with coverage report:

pytest --cov=src --cov-report=term --cov-config=.coveragerc

Test Best Practices

All external LLM calls are mocked to keep tests fast and free of API costs
Use the @pytest.mark.asyncio decorator for async tests
Follow the naming convention: test_<function_name>_<scenario>
Keep tests focused and isolated
Use fixtures for common setup and teardown

Writing New Tests

When adding new tests:

Place them in the appropriate directory based on test type
Use descriptive names that explain the test scenario
Mock external dependencies using pytest-mock
Add appropriate assertions to verify behavior
Consider edge cases and error scenarios
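
As an illustration of these guidelines, here is a small self-contained test that follows the `test_<function_name>_<scenario>` naming convention and mocks the LLM call. The function under test, `summarize_endpoint`, is invented for this example, and the sketch uses the standard library's `unittest.mock` so it runs without extra dependencies; in the project itself, the `mocker` fixture from pytest-mock offers the same mock objects.

```python
from unittest.mock import Mock


def summarize_endpoint(llm, path):
    """Hypothetical function under test: ask an LLM client for a summary."""
    return llm.invoke(f"Summarize {path}").strip()


def test_summarize_endpoint_strips_whitespace():
    # Mock the external LLM client so no real API call is made.
    llm = Mock()
    llm.invoke.return_value = "  Lists pets  \n"

    assert summarize_endpoint(llm, "/pet") == "Lists pets"
    llm.invoke.assert_called_once_with("Summarize /pet")
```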

Evaluations

This project includes an evaluation suite designed to assess the quality of code generated by LLM services using model-graded evaluations. The evaluation infrastructure allows you to:

Define test cases with API definitions and evaluation criteria
Run evaluations for generate_first_test, generate_models, and generate_additional_tests
Automatically grade generated files using LLM-based evaluation
Generate detailed reports of evaluation results

The evaluations use LLM-based grading to assess whether generated files meet specified criteria, providing structured results with scores, detailed criterion-by-criterion evaluation, and reasoning.

The evaluation suite includes security evaluations, such as the prompt injection dataset, which evaluates the agent's resistance to prompt injection attacks embedded in API specifications.

View the latest results on the Evaluation Dashboard.

For detailed instructions on how to set up, run, and interpret the evaluation results, please refer to Evaluations.

Benchmarks

This project includes a benchmark tool designed to evaluate the performance of different Large Language Models (LLMs) in generating API test frameworks using this agent. It automates running the agent against an OpenAPI specification for various LLMs and collects quantifiable metrics.

For detailed instructions on how to set up, run, and interpret the benchmark results, please refer to Benchmarks.

Checkpoints

The checkpoints feature allows you to save and restore the state of the framework generation process. This is useful if you need to interrupt the process and resume it later without losing progress.

Checkpoints vs. Framework State: Checkpoints are still used for crash/interrupt recovery during a single run. The new framework-state.json file (described above) captures long-term metadata so you can return to an existing framework and continue generation later.

Purpose

The purpose of the checkpoints feature is to provide a way to save the current state of the framework generation process and restore it later.

How to Use

Saving State: The state is automatically saved at various points during the framework generation process. You don't need to manually save the state.
Restoring State: If a previous run was interrupted, you will be prompted to resume the process when you run the agent again. The agent will restore the last saved state and continue from where it left off.
Clearing Checkpoints: After the framework generation process is completed successfully, the checkpoints are automatically cleared.

Implementation

The checkpoints feature is implemented in the src/utils/checkpoint.py file. It uses the shelve module to store the state in a persistent dictionary.
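
For readers unfamiliar with `shelve`, the sketch below shows the basic pattern of writing to and reading from a persistent dictionary. It is a generic standard-library example, not the contents of src/utils/checkpoint.py; the helper names and the key are assumptions.

```python
import shelve


def save_progress(db_path, key, value):
    """Store a picklable value under `key` in a shelve database."""
    with shelve.open(db_path) as db:
        db[key] = value


def load_progress(db_path, key, default=None):
    """Read a value back, returning `default` if the key is absent."""
    with shelve.open(db_path) as db:
        return db.get(key, default)
```

Values survive across process restarts because `shelve` pickles them to disk, which is what makes it suitable for resuming an interrupted run.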

Decorator for Functions

The Checkpoint class provides a checkpoint decorator that can be used to automatically save and restore the state of a function. This decorator can be applied to any function that you want to checkpoint.

Example:

from src.utils.checkpoint import Checkpoint

class MyClass:
    def __init__(self):
        self.checkpoint = Checkpoint(self)

    @Checkpoint.checkpoint()
    def my_function(self, arg1, arg2):
        # Function logic here
        pass

Wrapping Loops

The checkpoint_iter method of the Checkpoint class can be used to wrap a for-loop and automatically save and restore progress. This is useful for long-running loops where you want to ensure progress is not lost.

Example:

from src.utils.checkpoint import Checkpoint

class MyClass:
    def __init__(self):
        self.checkpoint = Checkpoint(self)
        self.state = {"info": []}

    def my_loop_function(self, items):
        for item in self.checkpoint.checkpoint_iter(items, "my_loop", self.state):
            self.state["info"].append(item)
        print(self.state)

In the example above, checkpoint_iter wraps the for-loop, and the self.state dictionary is passed as its third argument. This argument must be a dictionary holding the state you want to preserve. After an interruption, the iteration resumes from the index where it left off, and the dictionary is restored to its last saved contents.
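
To illustrate the idea of a resumable loop, here is a simplified stand-alone sketch. It is not the implementation in src/utils/checkpoint.py: `resumable_iter` is a hypothetical function, and a plain dictionary stands in for the persistent store.

```python
import copy


def resumable_iter(items, key, state, store):
    """Yield items, recording progress in `store` after each one completes."""
    saved = store.get(key)
    start = 0
    if saved is not None:
        # Resume: restore the saved index and the saved state contents.
        start = saved["index"]
        state.clear()
        state.update(copy.deepcopy(saved["state"]))
    for index in range(start, len(items)):
        yield items[index]
        # Reached only once the loop body has finished for this item.
        store[key] = {"index": index + 1, "state": copy.deepcopy(state)}
```

Because progress is saved only after the loop body finishes, an item that was interrupted midway is retried on the next run rather than skipped.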

Contribution Guidelines

Contributions are welcome! Here's how you can help:

Finding Tasks to Work On

We maintain a project board to track features, enhancements, and bugs. Each task in the board includes:

Task descriptions
Priority
Complexity
Size

New contributors can check out our "Good First Issues" view for beginner-friendly tasks to get started with.

Contribution Process

Fork the repository
Create a new branch (git checkout -b feature/amazing-feature)
Make your changes
Run tests and linting
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request to the original repo

Reporting Issues

Found a bug or have a suggestion? Please open an issue on GitHub with:

A clear description of the problem
Steps to reproduce
Expected vs actual behavior
Your environment details (OS, Python version, etc.)

Code Formatting

This project uses strict code formatting rules to maintain consistency:

Black is used as the Python code formatter
- Line length is set to 88 characters
- Python 3.7+ compatibility is enforced
VS Code is configured for automatic formatting on save
Editor settings and recommended extensions are provided in the .vscode directory

All Python files will be automatically formatted when you save them in VS Code with the recommended extensions installed. To manually format code, you can run:

black .

Logging

The project implements a dual logging strategy:

Console Output: By default shows INFO level messages in a user-friendly format

Generated service class for Pet endpoints
Creating test suite for /pet/findByStatus

File Logging: Detailed DEBUG level logging with timestamps and metadata in logs/[framework-name].log

2024-03-21 14:30:22,531 - generator.services - DEBUG - Initializing service class generator for Pet endpoints
2024-03-21 14:30:22,531 - generator.services - INFO - Generated service class for Pet endpoints
2024-03-21 14:30:23,128 - generator.tests - DEBUG - Loading OpenAPI spec for /pet/findByStatus
2024-03-21 14:30:23,128 - generator.tests - INFO - Creating test suite for /pet/findByStatus
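
A dual setup like this can be built with two handlers on one logger using Python's standard `logging` module. The sketch below is an approximation: the `configure_logging` function and its parameters are assumptions, and the logger name `generator` is taken from the sample output above rather than from the project's code.

```python
import logging


def configure_logging(log_file, debug=False):
    """Attach a console handler and a file handler to one logger."""
    logger = logging.getLogger("generator")
    logger.setLevel(logging.DEBUG)  # let the handlers do the filtering
    logger.handlers.clear()

    # Console: message only, INFO unless debug output is requested.
    console = logging.StreamHandler()
    console.setLevel(logging.DEBUG if debug else logging.INFO)
    console.setFormatter(logging.Formatter("%(message)s"))

    # File: everything from DEBUG up, with timestamp and logger name.
    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )

    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger
```

Child loggers such as `logging.getLogger("generator.services")` propagate to these handlers, so each module can log under its own name.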

Debug Options

You can control debug levels through environment variables:

Application Debug: Set DEBUG=True in your .env file to enable debug-level logging in the console output
LangChain Debug: Set LANGCHAIN_DEBUG=True to enable detailed logging of LangChain operations

Example .env configuration:

DEBUG=False          # Default: False (INFO level console output)
LANGCHAIN_DEBUG=False  # Default: False (disabled)
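
Flags like these arrive as strings, so they need to be parsed before use. A minimal sketch of one way to do that follows; `env_flag` and `console_level` are hypothetical helpers, and the agent may interpret the values differently (for example, regarding case sensitivity).

```python
import logging
import os


def env_flag(name, env=None):
    """Interpret an environment variable as a boolean (default: False)."""
    env = os.environ if env is None else env
    return env.get(name, "False").strip().lower() == "true"


def console_level(env=None):
    """Pick the console log level from the DEBUG flag."""
    return logging.DEBUG if env_flag("DEBUG", env) else logging.INFO
```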

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

OpenAI and Anthropic for their AI models
All contributors who have helped build and improve this project
