# Using the node package
## Installation

promptfoo is available as a node package on npm:

```sh
npm install promptfoo
```
## Usage

Use promptfoo as a library in your project by importing the `evaluate` function and other utilities:

```js
import promptfoo from 'promptfoo';

const results = await promptfoo.evaluate(testSuite, options);
```
The `evaluate` function takes the following parameters:

- `testSuite`: the JavaScript equivalent of `promptfooconfig.yaml`, as a `TestSuiteConfiguration` object.
- `options`: misc options related to how the test harness runs, as an `EvaluateOptions` object.

The results of the evaluation are returned as an `EvaluateSummary` object.
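Because the summary is a plain object, you can post-process it directly. A minimal sketch that computes a pass rate from the `stats` field (whose shape matches the example output later on this page); the helper function is our own, not part of the promptfoo API:

```js
// Sketch: compute a pass rate from an EvaluateSummary-like object.
// `stats.successes` and `stats.failures` match the example output below.
function passRate(summary) {
  const { successes, failures } = summary.stats;
  const total = successes + failures;
  return total === 0 ? 0 : successes / total;
}

console.log(passRate({ stats: { successes: 4, failures: 0 } })); // 1
```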
## Provider functions
A `ProviderFunction` is a JavaScript function that implements an LLM API call. It takes a prompt string and a context, and returns the LLM response or an error. See the `ProviderFunction` type.
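A hand-rolled provider can be as small as a function that returns an object with an `output` field. A minimal sketch with a canned reply; the echo behavior is illustrative only, a real implementation would call an LLM API:

```js
// Sketch of a ProviderFunction: receives the rendered prompt (and an
// optional context) and resolves to a response object with `output`.
const echoProvider = async (prompt, context) => {
  // A real provider would make an LLM API call here.
  return { output: `echo: ${prompt}` };
};
```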
You can load providers using the `loadApiProvider` function:

```js
import { loadApiProvider } from 'promptfoo';

// Load a provider with default options
const provider = await loadApiProvider('openai:o3-mini');

// Load a provider with custom options
const providerWithOptions = await loadApiProvider('azure:chat:test', {
  options: {
    apiHost: 'test-host',
    apiKey: 'test-key',
  },
});
```
## Assertion functions
An `Assertion` can take an `AssertionValueFunction` as its value. The function receives:

- `output`: the LLM output string
- `context`: execution context, including `prompt`, `vars`, `test`, `logProbs`, `config`, `provider`, `providerResponse`, and optional `trace` data for debugging
### Type definition

```ts
type AssertionValueFunction = (
  output: string,
  context: AssertionValueFunctionContext,
) => AssertionValueFunctionResult | Promise<AssertionValueFunctionResult>;

interface AssertionValueFunctionContext {
  prompt: string | undefined;
  vars: Record<string, unknown>;
  test: AtomicTestCase;
  logProbs: number[] | undefined;
  config?: Record<string, any>;
  provider: ApiProvider | undefined;
  providerResponse: ProviderResponse | undefined;
  trace?: TraceData;
}

type AssertionValueFunctionResult = boolean | number | GradingResult;

interface GradingResult {
  // Whether the test passed or failed
  pass: boolean;

  // Test score, typically between 0 and 1
  score: number;

  // Plain text reason for the result
  reason: string;

  // Map of labeled metrics to values
  namedScores?: Record<string, number>;

  // Weighted denominator for namedScores when assertion weights are used
  namedScoreWeights?: Record<string, number>;

  // Record of tokens usage for this assertion
  tokensUsed?: Partial<{
    total: number;
    prompt: number;
    completion: number;
    cached?: number;
  }>;

  // Additional matcher/provider metadata
  metadata?: Record<string, unknown>;

  // List of results for each component of the assertion
  componentResults?: GradingResult[];

  // The assertion that was evaluated
  assertion?: Assertion;
}
```
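As a concrete illustration of these types, an assertion function can return a full `GradingResult` rather than a bare boolean. A minimal sketch that grades output brevity; the 100-character threshold and the `brevity` metric name are invented for the example:

```js
// Sketch of an AssertionValueFunction returning a GradingResult.
// The length limit is arbitrary and used only for illustration.
const maxLengthAssertion = (output, context) => {
  const pass = output.length <= 100;
  return {
    pass,
    score: pass ? 1 : 0,
    reason: pass
      ? `Output is ${output.length} chars (limit 100)`
      : `Output is too long: ${output.length} chars`,
    namedScores: { brevity: pass ? 1 : 0 },
  };
};
```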
For more info on different assertion types, see assertions & metrics.
## Transform functions
When using the node package, you can pass JavaScript functions directly as `transform`, `transformVars`, or `contextTransform` values instead of string expressions or `file://` references.
This enables better IDE support, type checking, and debugging:
```js
import promptfoo from 'promptfoo';

const results = await promptfoo.evaluate({
  prompts: ['What tools did you use to answer: {{question}}'],
  providers: ['openai:gpt-5-mini'],
  tests: [
    {
      vars: { question: 'What is 2+2?' },
      options: {
        // Transform the output before assertions
        transform: (output, context) => {
          return output.toUpperCase();
        },
      },
      assert: [
        {
          type: 'contains',
          value: 'calculator',
          // Transform just for this assertion
          transform: (output, context) => {
            const tools = context.metadata?.toolCalls ?? [];
            return tools.map((t) => t.name).join(', ');
          },
        },
      ],
    },
  ],
});
```
Transform functions receive:

- `output`: the LLM output (string or object)
- `context`: an object containing `vars`, `prompt`, and optionally `metadata` from the provider response
Function transforms are not serializable. If you use `writeLatestResults: true`, function transforms will not be persisted in the stored config. Use string expressions or `file://` references if you need results to be fully reproducible from the stored eval.
For more on transforms, see Transforming Outputs.
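Since a transform is just a plain function, it can be unit-tested on its own before being wired into a test suite. A minimal sketch that pulls the first number out of an output string so numeric assertions can run against it; the parsing logic is illustrative, not part of promptfoo:

```js
// Sketch of a transform function: extracts the first number found in the
// LLM output, or null if there is none.
const extractNumber = (output, context) => {
  const match = String(output).match(/-?\d+(\.\d+)?/);
  return match ? Number(match[0]) : null;
};

// extractNumber('The answer is 4.', {}) -> 4
```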
## Example
promptfoo exports an `evaluate` function that you can use to run prompt evaluations.

```js
import promptfoo from 'promptfoo';

const results = await promptfoo.evaluate(
  {
    prompts: ['Rephrase this in French: {{body}}', 'Rephrase this like a pirate: {{body}}'],
    providers: ['openai:gpt-5-mini'],
    tests: [
      {
        vars: {
          body: 'Hello world',
        },
      },
      {
        vars: {
          body: "I'm hungry",
        },
      },
    ],
    writeLatestResults: true, // write results to disk so they can be viewed in web viewer
  },
  {
    maxConcurrency: 2,
  },
);

console.log(results);
```
This code imports the promptfoo library, defines the evaluation options, and then calls the evaluate function with these options.
You can also supply functions as prompts, providers, or asserts:
```js
import promptfoo from 'promptfoo';

(async () => {
  const results = await promptfoo.evaluate({
    prompts: [
      'Rephrase this in French: {{body}}',
      (vars) => {
        return `Rephrase this like a pirate: ${vars.body}`;
      },
    ],
    providers: [
      'openai:gpt-5-mini',
      (prompt, context) => {
        // Call LLM here...
        console.log(`Prompt: ${prompt}, vars: ${JSON.stringify(context.vars)}`);
        return {
          output: '<LLM output>',
        };
      },
    ],
    tests: [
      {
        vars: {
          body: 'Hello world',
        },
      },
      {
        vars: {
          body: "I'm hungry",
        },
        assert: [
          {
            type: 'javascript',
            value: (output) => {
              const pass = output.includes("J'ai faim");
              return {
                pass,
                score: pass ? 1.0 : 0.0,
                reason: pass ? 'Output contained substring' : 'Output did not contain substring',
              };
            },
          },
        ],
      },
    ],
  });

  console.log('RESULTS:');
  console.log(results);
})();
```
There's a full example on GitHub here.
Here's the example output in JSON format:
```json
{
  "results": [
    {
      "prompt": {
        "raw": "Rephrase this in French: Hello world",
        "display": "Rephrase this in French: {{body}}"
      },
      "vars": {
        "body": "Hello world"
      },
      "response": {
        "output": "Bonjour le monde",
        "tokenUsage": {
          "total": 19,
          "prompt": 16,
          "completion": 3
        }
      }
    },
    {
      "prompt": {
        "raw": "Rephrase this in French: I'm hungry",
        "display": "Rephrase this in French: {{body}}"
      },
      "vars": {
        "body": "I'm hungry"
      },
      "response": {
        "output": "J'ai faim.",
        "tokenUsage": {
          "total": 24,
          "prompt": 19,
          "completion": 5
        }
      }
    }
    // ...
  ],
  "stats": {
    "successes": 4,
    "failures": 0,
    "tokenUsage": {
      "total": 120,
      "prompt": 72,
      "completion": 48
    }
  },
  "table": [
    ["Rephrase this in French: {{body}}", "Rephrase this like a pirate: {{body}}", "body"],
    ["Bonjour le monde", "Ahoy thar, me hearties! Avast ye, world!", "Hello world"],
    [
      "J'ai faim.",
      "Arrr, me belly be empty and me throat be parched! I be needin' some grub, matey!",
      "I'm hungry"
    ]
  ]
}
```
## Sharing Results
To get a shareable URL, set `sharing: true` along with `writeLatestResults: true`:

```js
const results = await promptfoo.evaluate({
  prompts: ['Your prompt here'],
  providers: ['openai:gpt-5-mini'],
  tests: [{ vars: { input: 'test' } }],
  writeLatestResults: true,
  sharing: true,
});

console.log(results.shareableUrl); // https://app.promptfoo.dev/eval/abc123
```
Requires a Promptfoo Cloud account or self-hosted server. For self-hosted, pass `sharing: { apiBaseUrl, appBaseUrl }` instead of `true`.
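For a self-hosted server, the object form described above might look like the following sketch; the URLs are placeholders for your own deployment:

```js
// Sketch: self-hosted sharing configuration. Replace the URLs with your
// own server's API and app endpoints.
const results = await promptfoo.evaluate({
  prompts: ['Your prompt here'],
  providers: ['openai:gpt-5-mini'],
  tests: [{ vars: { input: 'test' } }],
  writeLatestResults: true,
  sharing: {
    apiBaseUrl: 'https://promptfoo-api.example.com',
    appBaseUrl: 'https://promptfoo.example.com',
  },
});
```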