
Add experimental AI functions llmClassify/Extract/GenerateSQL/Translate/GenerateContent and generateEmbedding[OrNull]#99579

Closed
melvynator wants to merge 27 commits into ClickHouse:master from melvynator:feature/llm-functions

Conversation


@melvynator melvynator commented Mar 16, 2026

Introduces a set of experimental AI functions that call external LLM/embedding providers (OpenAI, Anthropic) directly from SQL queries without leaving ClickHouse.

Uses Named Collections for storing provider credentials and query-level settings for operational parameters (like temperature and error handling).

All functions are gated behind the allow_experimental_ai_functions setting.

Changelog category (leave one):

  • Experimental Feature

Changelog entry (a user-readable short description):

Introduces experimental AI functions llmClassify, llmExtract, llmGenerateSQL, llmTranslate, llmGenerateContent, generateEmbedding, generateEmbeddingOrNull that call external LLM and embedding providers (OpenAI, Anthropic) directly from SQL queries without leaving ClickHouse.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Documentation is in progress.

The functions work as follows:

LLMClassify

SELECT
    LLMClassify(message, ['Fatal', 'Critical', 'Error', 'Warning', 'Notice', 'Information', 'Debug', 'Trace', 'Test']),
    level,
    left(message, 30)
FROM text_log_no_index
WHERE level IN ('Fatal', 'Critical', 'Error', 'Warning')
LIMIT 240, 10

LLMTranslate

SELECT LLMTranslate('Here are some new ClickHouse functions to ease the use of LLM directly within the database!', 'ru', 'Do not translate technical content')

LLMExtract

SELECT LLMExtract(text, '{"company": "company name", "location": "city"}') AS info
FROM default.hackernews
LIMIT 10

LLMGenerateSQL

SELECT LLMGenerateSQL('Show me the top 5 categories')

LLMGenerateContent

SELECT LLMGenerateContent('Summarize the following log entry')

generateEmbedding / generateEmbeddingOrNull

SELECT generateEmbedding('ClickHouse is fast', 256) AS emb
SELECT generateEmbeddingOrNull('test_embedding', text, 4) AS emb FROM test


clickhouse-gh bot commented Mar 16, 2026

Workflow [PR], commit [e35f8d2]

Summary:

job_name test_name status info comment
Style check failure
various failure cidb
aspell failure cidb
Fast test failure
Build ClickHouse failure
Docs check dropped
Build (amd_debug) dropped
Build (amd_asan_ubsan) dropped
Build (amd_tsan) dropped
Build (amd_msan) dropped
Build (amd_binary) dropped
Build (arm_asan_ubsan) dropped
Build (arm_binary) dropped

AI Review

Summary

This PR introduces experimental LLM/embedding SQL functions, provider integrations, quota/caching logic, and docs. The feature scope is valuable, but there are multiple correctness/safety issues (including out-of-bounds argument access, quota/accounting inconsistencies, potential metadata leakage, and one setting wiring bug) that make current behavior unreliable in production-like environments. I cannot approve until the blockers/majors below are addressed.

Findings

❌ Blockers

  • [src/Functions/FunctionBaseLLM.cpp:70] llm_max_rows_per_query is initialized from llm_cache_ttl_sec, so row quota ignores the intended setting and follows cache TTL instead.
  • [src/Functions/LLM/LLMGenerateSQL.cpp] Named-collection arity ambiguity can make arguments[1] out-of-bounds for single-argument calls.
  • [src/Functions/llmGenerateContent.cpp:34] Same named-collection arity out-of-bounds risk.
  • [src/Functions/llmClassify.cpp:42] Named-collection path can make arguments[idx + 1] out-of-bounds.
  • [src/Functions/llmExtract.cpp:36] Named-collection path can make arguments[idx + 1] out-of-bounds.
  • [src/Functions/llmTranslate.cpp:33] Named-collection path can make arguments[idx + 1] out-of-bounds.
  • [src/Functions/generateEmbedding.cpp:355] Empty provider response can dereference resp.embeddings.back() (undefined behavior).
  • [src/Functions/LLM/OpenAIProvider.cpp:26] extractProviderError duplicate non-inline definition across providers risks ODR/link issues.

⚠️ Majors

  • [src/Functions/llmGenerateSQL.cpp:95] Schema introspection enumerates databases/tables without explicit access filtering, risking metadata disclosure.
  • [src/Functions/generateEmbedding.cpp:162] Allowing arbitrary URL-like first argument enables user-controlled outbound endpoints (SSRF risk) unless additionally gated.
  • [src/Functions/LLM/OpenAIProvider.cpp, src/Functions/LLM/AnthropicProvider.cpp] Raw provider response bodies are surfaced in exceptions; can leak sensitive prompt/context data.
  • [src/Functions/FunctionBaseLLM.cpp:207, src/Functions/generateEmbedding.cpp:250] Cache key omits resource identity; cross-collection/provider cache collisions are possible.
  • [src/Functions/FunctionBaseLLM.cpp:274] API-call quota is reserved once per dedup key, but retries issue extra calls; effective API-call cap can be bypassed.
  • [src/Functions/FunctionBaseLLM.cpp:248, src/Functions/FunctionBaseLLM.cpp:276] Input-token cap is not effectively enforced pre-dispatch (always passed as 0), so configured limit can be exceeded before control.
  • [src/Functions/generateEmbedding.cpp:332] Embedding flow does not call recordResponse, so token quota accounting is inconsistent vs text LLM functions.
  • [src/Functions/LLM/LLMQuotaTracker.cpp:77] Output-token checks are post-fetch_add; concurrent requests can significantly overshoot quota before first rejection.
  • [src/Functions/FunctionBaseLLM.cpp:305] Exponential backoff expression may overflow/shift unsafely for large retry counts.
  • [src/Functions/FunctionBaseLLM.cpp:349] Nulling decision uses global isQuotaExceeded and can misclassify non-quota failures as quota-null rows.
  • [src/Core/Settings.cpp:7809 vs src/Functions/LLM/LLMQuotaTracker.cpp] llm_on_quota_exceeded docs/runtime accepted values are inconsistent.
  • [src/Core/SettingsChangesHistory.cpp] Same new settings appear in two version blocks, making compatibility history inconsistent.
  • [src/Functions/llmGenerateSQL.cpp] Signature/docs mention optional tables/database controls, but implementation ignores them (contract mismatch).
  • [src/Functions/generateEmbedding.cpp:354] When fewer embeddings are returned than inputs, reusing back() silently assigns incorrect vectors to unrelated rows.
  • [src/Functions/generateEmbedding.cpp:156] Non-constant collection argument silently falls back to default resource instead of deterministic rejection.
  • [src/Functions/llmTranslate.cpp:46, src/Functions/llmExtract.cpp:54, src/Functions/llmGenerateContent.cpp:55] Non-constant prompt-control args are read from row 0 and reused for all rows (silent wrong results).

Tests

⚠️ Add argument-safety tests for each function with named-collection-first forms to ensure deterministic NUMBER_OF_ARGUMENTS_DOESNT_MATCH instead of out-of-bounds access.

⚠️ Add quota-behavior tests covering retries, embedding token accounting, and concurrent output-token limits to verify settings are actually enforced.

⚠️ Add access-control test for llmGenerateSQL schema prompt construction to confirm unauthorized databases/tables/columns are excluded.

ClickHouse Rules

Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny Multiple correctness/safety issues in new core execution path (FunctionBaseLLM, quotas, cache).
No test removal
Experimental gate
No magic constants
Backward compatibility ⚠️ SettingsChangesHistory has duplicate introduction points for new settings.
SettingsChangesHistory.cpp Same settings added in more than one compatibility block.
PR metadata quality
Safe rollout Out-of-bounds paths and quota/accounting mismatches require fixes before rollout.
Compilation time

Performance & Safety

Quota/call accounting under concurrency and retry paths is currently inconsistent, allowing quota overshoot and API-call cap bypass in realistic workloads. Caching currently lacks resource identity in keys, which can return cross-resource results and violate isolation assumptions.

User-Lens

Several argument-handling paths can silently ignore user inputs (row-0 reuse, default-resource fallback), producing plausible but incorrect results without explicit exception, which is particularly risky for AI-assisted SQL/content generation workflows.

Final Verdict

  • Status: ❌ Block
  • Minimum required actions:
    • Fix all named-collection arity out-of-bounds paths and add regression tests.
    • Fix quota/accounting consistency (llm_max_rows_per_query initializer, retries/API-call accounting, embedding token accounting).
    • Eliminate sensitive raw provider body leakage and address cache-key resource isolation.


CLAassistant commented Mar 16, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ rschu1ze
✅ melvynator
❌ Ubuntu


Ubuntu seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@alexey-milovidov

This comment was marked as resolved.

@melvynator

This comment was marked as resolved.

@alexey-milovidov

This comment was marked as resolved.

@melvynator

This comment was marked as resolved.

@melvynator

This comment was marked as resolved.

@melvynator melvynator added the pr-experimental Experimental Feature label Mar 21, 2026
@melvynator

This comment was marked as resolved.

@melvynator melvynator marked this pull request as ready for review March 21, 2026 17:51
@melvynator melvynator force-pushed the feature/llm-functions branch 2 times, most recently from 1bd3b70 to 2fa75e6 on March 21, 2026 at 23:00
@melvynator melvynator changed the title Add LLM functions: LLMClassify, LLMExtract, LLMGenerateSQL, LLMTranslate Add experimental AI functions: LLMClassify, LLMExtract, LLMGenerateSQL, LLMTranslate, LLMGenerateContent, generateEmbedding, generateEmbeddingOrNull Mar 21, 2026

DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
if (arguments.empty() || arguments.size() > 5)

getReturnTypeImpl accepts one argument, but when that argument matches an existing named collection, hasNamedCollectionArg makes getFirstDataArgIndex return 1. Later executeImpl reads arguments[1] (for the prompt), which is out-of-bounds for a single-argument call.

That means LLMGenerateSQL('existing_collection') can hit undefined behavior instead of returning a clean arity exception. Please add an explicit guard for this ambiguous case (e.g. if first arg resolves to a named collection, require at least 2 arguments).
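A deterministic guard along the lines the comment suggests could be sketched as follows (a standalone simplification; `getFirstDataArgIndexChecked` and the use of `std::invalid_argument` are illustrative stand-ins, not ClickHouse's actual API):

```cpp
#include <cstddef>
#include <stdexcept>

// If the first argument resolves to a named collection, at least one more
// argument must follow to serve as the prompt; otherwise reading arguments[1]
// would be out-of-bounds.
size_t getFirstDataArgIndexChecked(bool first_arg_is_named_collection, size_t num_arguments)
{
    const size_t first_data_arg = first_arg_is_named_collection ? 1 : 0;
    if (first_data_arg >= num_arguments)
        throw std::invalid_argument(
            "When the first argument is a named collection, a prompt argument is required");
    return first_data_arg;
}
```

With this guard, `LLMGenerateSQL('existing_collection')` fails with a deterministic arity error instead of undefined behavior.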



{
factory.registerFunction<FunctionLLMGenerateSQL>(FunctionDocumentation{
.description = "Generates a ClickHouse SQL query from a natural language description using an LLM.",
.syntax = "LLMGenerateSQL([collection,] query[, tables][, database][, temperature])",

The function signature/docs expose optional tables and database arguments, but this implementation never reads them: buildSystemPrompt always introspects the catalog and buildUserMessage only uses the first data argument.

This creates a user-visible contract mismatch (queries pass extra arguments that are silently ignored). Please either wire these arguments into prompt construction or remove them from the signature and documentation.


DataTypePtr getReturnTypeImpl(const ColumnsWithTypeAndName & arguments) const override
{
if (arguments.empty() || arguments.size() > 4)

Same arity bug as in LLMGenerateSQL: a single argument is accepted here, but if that argument matches an existing named collection, getFirstDataArgIndex becomes 1 and executeImpl reads arguments[1] for the prompt.

LLMGenerateContent('existing_collection') should fail with a deterministic arity error, not access out-of-bounds memory. Please add an explicit check that when first arg is a named collection, at least one prompt argument is present.

for (size_t i = 0; i < batch_items.size(); ++i)
{
auto & embedding = (i < resp.embeddings.size()) ? resp.embeddings[i] : resp.embeddings.back();


When resp.embeddings is empty, this fallback dereferences resp.embeddings.back(), which is undefined behavior.

If the provider returns HTTP 200 with malformed payload (or no data entries), resp.embeddings.size() is 0, so the first iteration goes to the fallback and can throw an exception or corrupt memory.

Please guard the empty-response case before calling .back() (for example, fail the batch with a clear exception or map all rows in the batch to empty embeddings in or_null mode).
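The guarded mapping could be sketched like this (a standalone simplification; `mapEmbeddingsToRows` is a hypothetical helper, and `or_null` models generateEmbeddingOrNull's behavior of returning empty arrays for affected rows):

```cpp
#include <stdexcept>
#include <vector>

// Map provider embeddings onto batch rows without ever calling .back() on an
// empty vector. Missing rows either fail the batch or become empty vectors.
std::vector<std::vector<float>> mapEmbeddingsToRows(
    const std::vector<std::vector<float>> & embeddings, size_t batch_size, bool or_null)
{
    std::vector<std::vector<float>> result(batch_size);
    for (size_t i = 0; i < batch_size; ++i)
    {
        if (i < embeddings.size())
            result[i] = embeddings[i];
        else if (!or_null)
            throw std::runtime_error("Provider returned fewer embeddings than inputs");
        // else: leave result[i] empty (an empty array for generateEmbeddingOrNull)
    }
    return result;
}
```

This also addresses the size-mismatch case raised later in the review: fewer embeddings than inputs never silently duplicates an unrelated vector.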

@melvynator melvynator force-pushed the feature/llm-functions branch from 64aa872 to 27b8bec on March 21, 2026 at 23:58
{
throw Exception(
ErrorCodes::RECEIVED_ERROR_FROM_REMOTE_IO_SERVER,
"LLM provider returned HTTP {}: {}", static_cast<int>(status), response_body);

This exception includes the full remote response_body in the query error text. Provider error payloads often echo request details (prompt content and sometimes credential/context details), so this can leak sensitive data into user-visible exceptions and logs.

Please avoid returning the raw body verbatim. A safer pattern is to include only HTTP status + provider error code/message extracted from JSON, or a truncated/sanitized snippet.
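A sanitized variant might look like the following sketch (the helper name and the 256-byte cap are illustrative assumptions, not the PR's actual values):

```cpp
#include <string>

// Surface only the HTTP status plus a truncated snippet instead of the raw
// provider body, so prompts or credentials echoed by the provider do not
// reach user-visible exceptions and logs in full.
std::string sanitizedProviderError(int status, const std::string & body)
{
    constexpr size_t max_snippet = 256;
    std::string snippet = body.substr(0, max_snippet);
    if (body.size() > max_snippet)
        snippet += "... (truncated)";
    return "LLM provider returned HTTP " + std::to_string(status) + ": " + snippet;
}
```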

text_arg_idx = 1;
dim_arg_idx = 2;

const auto * first_const = checkAndGetColumn<ColumnConst>(arguments[0].column.get());

When the first argument is not constant, checkAndGetColumn<ColumnConst> returns null and first_arg becomes empty, so the function silently falls back to default_llm_resource instead of using (or rejecting) the provided argument.

That makes calls like generateEmbedding(concat('my_', 'nc'), text, 256) behave incorrectly and unexpectedly. Since endpoint/model selection is query-level, this argument should be required to be a constant string.

Please throw BAD_ARGUMENTS when arguments[0] is not constant in the 3-argument form.
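The constant-argument check could be sketched as follows (a hypothetical helper; ClickHouse's real code would inspect `ColumnConst` rather than `std::optional`, and would throw with the BAD_ARGUMENTS error code):

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// The 3-argument form's first argument must be a constant string. A
// non-constant column yields std::nullopt here and is rejected instead of
// silently falling back to default_llm_resource.
std::string resolveCollectionName(const std::optional<std::string> & constant_first_arg)
{
    if (!constant_first_arg)
        throw std::invalid_argument(
            "The collection argument of generateEmbedding must be a constant string");
    return *constant_first_arg;
}
```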

Maximum number of texts to include in a single HTTP request for embedding functions (generateEmbedding, generateEmbeddingOrNull). Texts are grouped into batches of this size to reduce API call overhead. For example, 500 unique texts with batch size 100 results in 5 HTTP requests.
)", EXPERIMENTAL) \
DECLARE(String, llm_on_quota_exceeded, "throw", R"(
Behavior when an AI function quota limit (llm_max_rows_per_query, llm_max_input_tokens_per_query, llm_max_output_tokens_per_query, or llm_max_api_calls_per_query) is exceeded. Possible values: 'throw' (default) aborts the query with an exception; 'null' stops making API calls and returns NULL (for LLM functions) or an empty array (for embedding functions) for remaining rows. Ignored by generateEmbeddingOrNull which always uses 'null'.

The documented allowed values for llm_on_quota_exceeded here are 'throw' and 'null', but runtime behavior in LLMQuotaTracker::isGracefulQuotaMode also accepts 'break' and 'partial', and quota exceptions explicitly instruct users to set 'break'.

This mismatch makes troubleshooting hard because users get guidance that contradicts the setting docs.

Please align these surfaces (setting docs, validation/accepted values, and exception text) to one canonical set of values.

for (auto & [dispatch_key, representative_row] : to_dispatch)
{
UInt64 rows_for_key = dedup_map[dispatch_key].size();
if (!quota->checkBeforeDispatch(0, rows_for_key))

llm_max_input_tokens_per_query is currently not enforced in practice. checkBeforeDispatch only checks input-token quota when estimated_input_tokens > 0, but all call sites pass 0 (here and in LLMGenerateEmbedding). recordResponse updates input_tokens after the response but never checks the max, so queries can exceed the configured input-token budget without exception.

Please either estimate and pass non-zero input tokens before dispatch, or add an input-token limit check inside recordResponse (with consistent throw/null behavior).
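The first option could be sketched like this (a standalone simplification; the ~4-characters-per-token heuristic and both function names are assumptions, and the real tracker is concurrent rather than a plain counter):

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Crude pre-dispatch estimate: roughly one token per 4 characters.
size_t estimateInputTokens(const std::vector<std::string> & messages)
{
    size_t chars = 0;
    for (const auto & m : messages)
        chars += m.size();
    return chars / 4 + 1;
}

// Check the estimate against the cap *before* the HTTP call; 0 disables the cap.
void checkBeforeDispatch(size_t & used_input_tokens, size_t estimated, size_t max_input_tokens)
{
    if (max_input_tokens != 0 && used_input_tokens + estimated > max_input_tokens)
        throw std::runtime_error("llm_max_input_tokens_per_query exceeded");
    used_input_tokens += estimated;
}
```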


for (size_t i = 0; i < batch_items.size(); ++i)
{
const auto & embedding = (i < resp.embeddings.size()) ? resp.embeddings[i] : resp.embeddings.back();

When the provider returns fewer embeddings than requested, this fallback silently reuses the last embedding for all missing rows. That can produce incorrect results by assigning unrelated vectors to inputs without any error.

It is safer to treat size mismatch as an exception (or, for generateEmbeddingOrNull, return empty arrays for affected rows) instead of duplicating resp.embeddings.back().


String user_message = buildUserMessage(arguments, i);
std::vector<String> cache_args = {user_message, system_prompt};
UInt128 key = LLMResultCache::buildKey(functionName(), config.model, temperature, cache_args);

LLMResultCache::buildKey currently excludes the selected provider/resource identity (e.g. collection name, provider, endpoint).

That means two queries using different named collections can collide in cache when functionName, model, temperature, and prompt text are equal, and then one query can reuse another resource's response. This is incorrect and can leak cross-resource results.

Please include resource identity in the cache key (for example provider + normalized endpoint, or collection identifier) for all LLM text functions.
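Folding resource identity into the key could be sketched like this (`std::hash` stands in for the real 128-bit hash, and all names are illustrative):

```cpp
#include <functional>
#include <string>
#include <vector>

// Cache key that includes provider + endpoint + collection alongside
// function name, model, temperature, and prompt arguments, so two named
// collections with identical prompts cannot collide.
size_t buildCacheKey(
    const std::string & provider, const std::string & endpoint, const std::string & collection,
    const std::string & function_name, const std::string & model, double temperature,
    const std::vector<std::string> & args)
{
    size_t h = 0;
    auto mix = [&h](size_t v) { h ^= v + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2); };
    for (const std::string * s : {&provider, &endpoint, &collection, &function_name, &model})
        mix(std::hash<std::string>{}(*s));
    mix(std::hash<double>{}(temperature));
    for (const auto & a : args)
        mix(std::hash<std::string>{}(a));
    return h;
}
```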

}

std::vector<String> cache_args = {text, std::to_string(dimensions)};
UInt128 key = LLMResultCache::buildKey(name, model, 0, cache_args);

Same cache-key issue in embeddings: LLMResultCache::buildKey is computed from function name, model, dimensions, and text, but not from selected resource identity (provider/endpoint/collection).

If two named collections use the same model and input text but different endpoints/accounts, cached vectors can be reused across resources. For embedding workloads this is a correctness and isolation issue.

Please include resource identity in the embedding cache key as well.

{
if (attempt < max_retries && e.code() == ErrorCodes::RECEIVED_ERROR_FROM_REMOTE_IO_SERVER)
{
std::this_thread::sleep_for(std::chrono::milliseconds(retry_delay_ms * (1ULL << attempt)));

retry_delay_ms * (1ULL << attempt) can trigger undefined behavior when attempt >= 64 (left-shift past bit width), and the multiplication can overflow before conversion to std::chrono::milliseconds.

Because llm_max_retries is user-configurable and currently unbounded, this path is reachable and can lead to incorrect sleep durations or undefined behavior.

Please clamp the exponent / delay before shifting (or use saturating arithmetic), e.g. cap attempt to a safe max and bound the computed delay to a reasonable upper limit.
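A clamped version could be sketched as follows (the exponent cap of 16 and the helper name are illustrative choices):

```cpp
#include <algorithm>
#include <cstdint>

// Compute a retry delay with a capped exponent and a bounded upper limit,
// avoiding UB from shifting past 63 bits and overflow in the multiplication.
uint64_t clampedBackoffMs(uint64_t base_delay_ms, uint64_t attempt, uint64_t max_delay_ms)
{
    // Cap the exponent so 1ULL << attempt is always well-defined.
    const uint64_t capped_attempt = std::min<uint64_t>(attempt, 16);
    const uint64_t factor = 1ULL << capped_attempt;
    // Detect multiplication overflow before it happens; saturate to the bound.
    if (base_delay_ms != 0 && factor > max_delay_ms / base_delay_ms)
        return max_delay_ms;
    return std::min(base_delay_ms * factor, max_delay_ms);
}
```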

LLM functions require external HTTP access and named collection
configuration that is not available in the unit test environment.
@melvynator melvynator force-pushed the feature/llm-functions branch from 3e7acf5 to 82cecf1 on March 22, 2026 at 15:44

clickhouse-gh bot commented Mar 22, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 83.90% 83.80% -0.10%
Functions 24.50% 24.80% +0.30%
Branches 76.50% 76.40% -0.10%

PR changed-lines coverage: 37.35% (583/1561, 0 noise lines excluded)
Diff coverage report
Uncovered code

extern const int RECEIVED_ERROR_FROM_REMOTE_IO_SERVER;
}

String extractProviderError(const std::string & response_body, int status_code)

extractProviderError is defined with external linkage in both OpenAIProvider.cpp and AnthropicProvider.cpp in namespace DB. This creates duplicate non-inline definitions of the same symbol and can fail at link time (or violate ODR depending on toolchain behavior).

Please move this helper into an anonymous namespace / mark it static, or share a single implementation in one .cpp file and reuse it.
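A minimal illustration of the internal-linkage fix (the helper body here is a placeholder, not the PR's actual implementation):

```cpp
#include <string>

namespace
{
    // Internal linkage: each translation unit gets its own private copy, so
    // the same helper body in OpenAIProvider.cpp and AnthropicProvider.cpp
    // cannot produce duplicate external symbols or an ODR violation.
    std::string extractProviderErrorSketch(const std::string & body, int status)
    {
        return "HTTP " + std::to_string(status) + ": " + body;
    }
}
```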

for (const auto * item : batch_items)
req.inputs.push_back(item->text);

auto resp = provider->embed(req, timeouts);

generateEmbedding does not call quota->recordResponse after provider->embed, so embedding requests never contribute to llm_max_input_tokens_per_query / llm_max_output_tokens_per_query enforcement.

As a result, token quotas can be bypassed via embedding functions even when limits are configured. Please record response token usage here (similar to LLMFunctionBase::executeImpl) so quota behavior is consistent across all AI functions.

WHERE database = currentDatabase() AND table = '_03300_ret_embedding';
DROP TABLE IF EXISTS _03300_ret_embedding;

-- generateEmbeddingOrNull returns Nullable(Array(Float32))

Typo in this comment: generateEmbeddingOrNull returns Array(Float32), not Nullable(Array(Float32)).

generateEmbedding and generateEmbeddingOrNull share GenerateEmbeddingFunction::getReturnTypeImpl, which returns Array(Float32), and the .reference file also expects Array(Float32).

Please update this comment to avoid confusion.

req.temperature = temperature;
req.max_tokens = config.max_tokens;

auto resp = provider->call(req, timeouts);

llm_max_api_calls_per_query is effectively bypassed by retries here.

checkBeforeDispatch increments api_calls once before entering the retry loop, but each retry iteration still sends a new HTTP request via provider->call without any additional quota check. With llm_max_api_calls_per_query = 1 and llm_max_retries = 3, a single row can still emit up to 4 remote calls.

Please account API-call quota per actual HTTP attempt (including retries), e.g. by moving/adding the API-call reservation check inside the retry loop before provider->call.
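Per-attempt accounting could be sketched like this standalone simplification (`QuotaTracker` here is a plain single-threaded stand-in for the real tracker):

```cpp
#include <stdexcept>

// Hypothetical, single-threaded stand-in for the real quota tracker.
struct QuotaTracker
{
    unsigned long used = 0;
    unsigned long max_calls = 0;
    explicit QuotaTracker(unsigned long m) : max_calls(m) {}

    // Reserve one API call; false when the per-query cap is reached.
    bool tryReserveApiCall()
    {
        if (used >= max_calls)
            return false;
        ++used;
        return true;
    }
};

// Reserve quota before *every* HTTP attempt, retries included, so
// llm_max_api_calls_per_query bounds actual outbound requests.
unsigned long callWithRetries(QuotaTracker & quota, unsigned failing_attempts, unsigned max_retries)
{
    unsigned long http_requests = 0;
    for (unsigned attempt = 0; attempt <= max_retries; ++attempt)
    {
        if (!quota.tryReserveApiCall())
            throw std::runtime_error("llm_max_api_calls_per_query exceeded");
        ++http_requests; // stands in for provider->call(req, timeouts)
        if (attempt >= failing_attempts)
            return http_requests; // request succeeded
    }
    throw std::runtime_error("retries exhausted");
}
```

With a cap of 1 and retries configured, the second attempt is rejected by the quota instead of emitting an uncounted HTTP request.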

@rschu1ze rschu1ze force-pushed the feature/llm-functions branch from a17b70d to e35f8d2 on March 23, 2026 at 12:50
@rschu1ze rschu1ze changed the title Add experimental AI functions: LLMClassify, LLMExtract, LLMGenerateSQL, LLMTranslate, LLMGenerateContent, generateEmbedding, generateEmbeddingOrNull Add experimental AI functions llmClassify/Extract/GenerateSQL/Translate/GenerateContent and generateEmbedding[OrNull] Mar 23, 2026
@rschu1ze rschu1ze self-assigned this Mar 23, 2026
Member

@pjhampton pjhampton left a comment


I have a couple of initial comments on this, skimming through the PR, though it looks okay directionally.

  1. llm<Func> is probably erroneous naming; ai<Func> probably makes more sense, as most/all of these functions shouldn't actually need an LLM. We will be using SLMs, encoders, and possibly even simpler classifiers/transformers in the cloud to support these. If you are creating these functions with the sole purpose of calling OpenAI/Anthropic, that is fine, but we will likely need a separate set of functions. I'm not a big fan of the generateEmbedding naming either; it seems inconsistent.
  2. The functions need to take the underlying model into consideration, and it should be selectable via the function's arity, i.e. embedFunc(model, text); in some cases we could be providing 3+ models for users to use.
  3. I don't think we will want to support all of these functions (looking at GenerateSQL), but maybe I'm missing context.
  4. I'm confused how the caching works here, and how it benefits query performance, especially around embeddings. It would be good if you could talk me through the code at some point.
  5. The system prompts you hardcoded would be good to make overridable from a user's PoV, and could be developed further. If we use this in our AI cloud, we will likely own the prompts on our side.
  6. These calls can get very expensive when hooked up to an external provider. We noticed recently that Snowflake provides a way to grant users access to similar functions; it is probably a good time to implement that here. We might be able to do something in the control plane, but ideally in the core.

@melvynator
Member Author

I'm confused how the caching works here, and how it benefits the query performance - especially around embeddings. It would be good if you could talk me through the code at some point.

It works in a similar fashion to the query cache.

The system prompts you hardcoded would be good to override from a users PoV and could be developed further. In the case of using this in our ai cloud we will likely own the prompts on our side.

Agree that's a good idea.

These calls can get very expensive when hooked up to an external provider - we noticed recently that snowflake provided a way for users to be granted to use similar functions - it is probably a good time to implement this here. We might be able to do something here in the control plane but ideally in the core.

Yeah, I've added a lot of settings to limit how many calls a single query can make to a provider, but we should have a longer-term plan for it. I have a few ideas.

Let's discuss all of that on Wednesday and define what goes in a V1 and V2.

@rschu1ze
Member

Somehow, I can't merge master again to this branch.

Let's continue in #100735
