Embeds the provided input text with ZeroEntropy embedding models.
The results will be returned in the same order as the text provided. Embeddings are constructed so that a query has high cosine similarity with the documents relevant to it.
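To make the relevance claim concrete, a minimal sketch of cosine similarity between two embedding vectors (pure Python, no external dependencies):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(a, b) = dot(a, b) / (||a|| * ||b||); closer to 1 means more similar.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

A relevant query/document pair will score near 1, an unrelated pair near 0.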
Organizations will, by default, have a rate limit of 2,500,000 bytes-per-minute and 1,000 QPM. Rate limits are refreshed every 15 seconds. If this is exceeded, requests will be throttled into latency "slow" mode, up to 20,000,000 bytes-per-minute. If even that is exceeded, you will receive a 429 error. To request higher rate limits, please contact founders@zeroentropy.dev or message us on Discord or Slack!
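A sketch of a call using the parameters described below. The endpoint URL, header names, and JSON field names here are assumptions for illustration; check the official ZeroEntropy API reference for the exact schema.

```python
import json
import urllib.request

API_URL = "https://api.zeroentropy.dev/embed"  # hypothetical endpoint path
API_KEY = "ze-..."                             # placeholder for your API key

# Field names below are illustrative, mirroring the documented parameters.
payload = {
    "model": "zembed-1",         # the only model option today
    "input_type": "query",       # "query" or "document"
    "text": ["best pizza in SoHo"],
    "dimensions": 640,           # one of 2560, 1280, 640, 320, 160, 80, 40
    "encoding_format": "float",  # "float" or "base64"
    "latency": "slow",           # "fast" or "slow"
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(request) would send the call; the response embeddings
# come back in the same order as the input text.
```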
The model ID to use for embedding. Options are: ["zembed-1"]
The input type. For retrieval tasks, either query or document. Options are: ["query", "document"]
The string, or list of strings, to embed.
The output dimensionality of the embedding model. For zembed-1, the available options are: [2560, 1280, 640, 320, 160, 80, 40].
The output format of the embedding. If float, an array of floats will be returned for each embedding. If base64, an f32 little-endian byte array will be returned, encoded as a base64 string. base64 is significantly more efficient than float. Options are: ["float", "base64"]. The default is float.
Whether the call will be inferenced "fast" or "slow". Rate limits for slow API calls are orders of magnitude higher, but you can expect 2-20 second latency. Fast inferences are guaranteed sub-second, but rate limits are lower. If not specified, a "fast" call will be attempted first; if you have exceeded your fast rate limit, a slow call will be executed instead. If explicitly set to "fast", a 429 will be returned if the request cannot be executed fast. Options are: ["fast", "slow"]
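Decoding the base64 output format back into floats can be done with the standard library, since it is an f32 little-endian byte array encoded as a base64 string. A minimal sketch (the function name is ours, not part of the API):

```python
import base64
import struct

def decode_embedding(b64: str) -> list[float]:
    """Decode a base64-encoded f32 little-endian byte array into floats."""
    raw = base64.b64decode(b64)
    count = len(raw) // 4                      # 4 bytes per f32
    return list(struct.unpack(f"<{count}f", raw))  # "<" = little-endian
```

The round trip is exact for values representable as f32, e.g. `decode_embedding` applied to `base64.b64encode(struct.pack("<3f", 1.0, -2.5, 0.0))` recovers `[1.0, -2.5, 0.0]`.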