
Supports AI-powered functions using LLM. #85442

Closed
fastio wants to merge 6 commits into ClickHouse:master from fastio:llm


@fastio
Contributor

@fastio fastio commented Aug 12, 2025

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Invoking LLM capabilities through SQL lets ClickHouse users analyze unstructured text data, improving both efficiency and cost-effectiveness. For example, LLMs can perform tasks such as text classification, semantic recognition, text generation, and summarization.

llmComplete: Generates a completion for given text using LLMs.
llmEmbed: Generates an embedding vector for text or other types of data (work in progress).

Run the llmComplete example:

SELECT
    product_name,
    price,
    llmComplete('{"model_name":"llama3", "batch_size": 10, "parameters":{"temperature":0.2}}', '{"prompt":"Analyze the potential use cases for each product."}', product_name) AS use_cases
FROM products
WHERE product_name != 'Laptop'

Query id: c32bc9e8-80e9-4ff8-af32-3eacb4d38114

   ┌─product_name─┬─price─┬─use_cases─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ Camera       │  4599 │ The potential use case for Camera is capturing life's moments, taking photos and videos.                                                                              │
2. │ Headphones   │   699 │ The potential use case for Headphones is listening to music or podcasts, watching videos, gaming, or making hands-free phone calls.                                   │
3. │ Keyboard     │   299 │ The potential use case for Keyboard is typing documents, emails, and messages; data entry, coding, or writing reports.                                                │
4. │ Monitor      │  1499 │ The potential use case for Monitor is displaying visual information such as images, videos, web pages, and apps.                                                      │
5. │ Mouse        │   199 │ The potential use case for Mouse is navigating through menus, clicking buttons, selecting options, or manipulating digital content.                                   │
6. │ Printer      │   899 │ The potential use case for Printer is printing documents, photos, or physical copies of digital files.                                                                │
7. │ Smartphone   │  4999 │ The potential use case for Smartphone is making phone calls, sending texts, taking photos, accessing apps, playing games, or browsing the internet.                   │
8. │ Smartwatch   │  1999 │ The potential use case for Smartwatch is tracking fitness and health metrics, receiving notifications, controlling music playback, or viewing smart home information. │
9. │ Tablet       │  3299 │ The potential use case for Tablet is reading digital books, watching movies, playing games, browsing the internet, or creating art and designs.                       │
   └──────────────┴───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
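The first two arguments in the example above are JSON documents embedded in SQL string literals, which is easy to get wrong by hand. The following is a minimal client-side sketch, assuming the argument layout shown in the example (model configuration, prompt, then the input column). The helper names `sql_literal` and `build_llm_complete_call` are hypothetical and not part of this PR; the column name is inserted as-is and is assumed to come from trusted code.

```python
import json


def sql_literal(value):
    """Serialize a Python object to JSON and wrap it in a single-quoted SQL string literal."""
    text = json.dumps(value)
    # ClickHouse string literals treat backslash as an escape character,
    # so double the backslashes first, then escape single quotes.
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_llm_complete_call(model_name, prompt, column, batch_size=10, temperature=0.2):
    """Build an llmComplete(...) expression mirroring the example in this PR (hypothetical helper)."""
    model_config = {
        "model_name": model_name,
        "batch_size": batch_size,
        "parameters": {"temperature": temperature},
    }
    prompt_config = {"prompt": prompt}
    return f"llmComplete({sql_literal(model_config)}, {sql_literal(prompt_config)}, {column})"


call = build_llm_complete_call(
    "llama3",
    "Analyze the potential use cases for each product.",
    "product_name",
)
query = (
    "SELECT product_name, price, "
    f"{call} AS use_cases "
    "FROM products WHERE product_name != 'Laptop'"
)
print(query)
```

Generating the JSON with `json.dumps` keeps quoting consistent when a prompt contains quotes or backslashes, which hand-written literals tend to break on.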

Run the llmEmbedding example:

SELECT
    product_name,
    llmEmbedding('{"model_name":"embed", "batch_size": 10, "parameters":{"temperature":0.2}, "dimensions":16}', product_description) AS use_cases
FROM products
WHERE product_name != 'Laptop'

Query id: 1d8d3c1a-8fe3-4f45-9516-dcf2c3fae8e8

   ┌─product_name─┬─use_cases────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ Camera       │ [0.0067377584,0.06458207,-0.4162463,-0.2510257,-0.19899493,-0.03644437,-0.13226773,0.3917827,-0.10269234,-0.018336289,0.3815591,-0.06453643,-0.4743017,-0.04429463,0.37425652,0.14130466]    │
2. │ Headphones   │ [0.031977385,0.3291715,-0.11023181,-0.44702303,-0.21985404,-0.18023156,0.11073979,0.25886697,-0.020878008,-0.107183926,-0.26516593,0.087321885,-0.46409118,0.31941828,0.30072457,0.14436811] │
3. │ Keyboard     │ [0.35101733,0.18461668,-0.1438864,-0.16619596,0.37373623,-0.08396784,-0.19761354,0.54034156,-0.16507024,-0.25911832,-0.06871957,0.2194114,-0.14480744,0.22063945,0.309673,-0.061862964]      │
4. │ Monitor      │ [-0.28191116,-0.25697553,-0.19983137,-0.03887534,-0.14459202,0.04775,-0.052728467,0.0135068,-0.100694925,0.0036662086,0.19861922,0.20190933,-0.29212782,0.2999202,0.52434087,-0.4962883]     │
5. │ Mouse        │ [0.38233718,0.016063813,-0.36136925,-0.11933704,-0.105740644,-0.363335,-0.33368504,0.4232902,0.09804148,-0.0056412756,-0.04291874,0.3397461,-0.19378959,0.22753486,0.19100478,0.14931463]    │
6. │ Printer      │ [0.055755656,0.26455268,0.18583152,-0.33397606,-0.06675802,-0.07946498,-0.24620508,0.44381377,-0.17517008,-0.14231794,-0.03492864,0.01529484,-0.48050898,0.24260993,-0.0630699,-0.4078624]   │
7. │ Smartphone   │ [0.2393374,-0.1567317,-0.22009131,-0.02128264,0.2993623,-0.11223704,0.30336395,0.16168614,0.08489236,-0.2848801,-0.2989812,-0.41960266,-0.37329775,-0.38396883,-0.06974322,-0.058357544]     │
8. │ Smartwatch   │ [0.42518342,-0.059206404,0.22182368,0.21400215,0.025355857,0.30952737,0.18887067,0.23272252,0.4044115,-0.23490229,-0.08090794,0.14873727,-0.4533922,0.017774744,0.19105044,0.2113095]        │
9. │ Tablet       │ [0.013582323,-0.1017702,-0.3507887,-0.1564328,0.23094577,-0.4174504,-0.067217216,0.49951836,-0.07321677,-0.19731864,-0.206355,0.11228794,-0.3496036,0.25568467,0.2490185,-0.105843976]       │
   └──────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

9 rows in set. Elapsed: 0.657 sec. 
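One common use of embedding vectors like these is comparing rows by cosine similarity. The sketch below is illustrative only and is not part of this PR: it copies the Camera and Smartphone vectors from the output above and computes their similarity on the client side. Inside ClickHouse, the existing `cosineDistance` function could presumably be applied to the array column instead.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Vectors copied from the llmEmbedding output above (16 dimensions each).
camera = [0.0067377584, 0.06458207, -0.4162463, -0.2510257, -0.19899493,
          -0.03644437, -0.13226773, 0.3917827, -0.10269234, -0.018336289,
          0.3815591, -0.06453643, -0.4743017, -0.04429463, 0.37425652,
          0.14130466]
smartphone = [0.2393374, -0.1567317, -0.22009131, -0.02128264, 0.2993623,
              -0.11223704, 0.30336395, 0.16168614, 0.08489236, -0.2848801,
              -0.2989812, -0.41960266, -0.37329775, -0.38396883, -0.06974322,
              -0.058357544]

similarity = cosine_similarity(camera, smartphone)
print(f"cosine similarity (Camera, Smartphone): {similarity:.4f}")
```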

llmEmbedding and llmFilter are on the way.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@CLAassistant

CLAassistant commented Aug 12, 2025

CLA assistant check
All committers have signed the CLA.

@fastio fastio changed the title from "Impl semantic analysis using SQL" to "Supports semantic analysis using LLMs." Aug 12, 2025
@alexey-milovidov alexey-milovidov added the "can be tested" (Allows running workflows for external contributors) label Aug 12, 2025
@clickhouse-gh
Contributor

clickhouse-gh bot commented Aug 12, 2025

Workflow [PR], commit [13b66b9]

Summary:

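| job_name          | test_name        | status  | info | comment |
|-------------------|------------------|---------|------|---------|
| Style check       |                  | failure |      |         |
|                   | cpp              | failure | cidb |         |
| PR formatter      |                  | dropped |      |         |
| Docs check        |                  | dropped |      |         |
| Fast test         |                  | failure |      |         |
|                   | Build ClickHouse | failure |      |         |
| Build (arm_tidy)  |                  | failure |      |         |
|                   | Build ClickHouse | failure | cidb |         |
| Build (amd_debug) |                  | dropped |      |         |
| Build (amd_asan)  |                  | dropped |      |         |
| Build (amd_tsan)  |                  | dropped |      |         |
| Build (amd_msan)  |                  | dropped |      |         |
| Build (amd_ubsan) |                  | dropped |      |         |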

@clickhouse-gh clickhouse-gh bot added the "pr-feature" (Pull request with new product feature) and "submodule changed" (At least one submodule changed in this PR.) labels Aug 12, 2025
@iskakaushik
Contributor

I like this feature overall. Would you consider moving some of the remote LLM and prompt models to https://github.com/ClickHouse/ai-sdk-cpp? It is already pulled in as a dependency and we use it.

@iskakaushik iskakaushik self-requested a review August 14, 2025 22:16
@fastio
Contributor Author

fastio commented Aug 14, 2025

> I like this feature overall. Would you consider moving some of the remote LLM and prompt models to https://github.com/ClickHouse/ai-sdk-cpp? It is already pulled in as a dependency and we use it.

Thanks for the suggestion — I agree it makes sense to reimplement this based on ai-sdk-cpp and migrate the remote LLM and prompt-related code there as well, especially since it’s already a dependency. I’ll look into making this change.

@fastio fastio changed the title from "Supports semantic analysis using LLMs." to "Supports AI-powered functions using LLM." Aug 15, 2025
@fastio
Contributor Author

fastio commented Aug 18, 2025

Hi @iskakaushik, as mentioned earlier, I have submitted two PRs to the ai-sdk-cpp project to support AI functions. I would really appreciate your review.
ClickHouse/ai-sdk-cpp#24
ClickHouse/ai-sdk-cpp#23

@fastio fastio force-pushed the llm branch 2 times, most recently from da79ef6 to 7c8972c Compare August 19, 2025 02:48
@iskakaushik
Contributor

@fastio could you please take a look at ClickHouse/ai-sdk-cpp#28? It fixes some issues with ClickHouse/ai-sdk-cpp#24 and also adds comprehensive integration tests.

@iskakaushik
Contributor

@fastio could you please rebase onto the latest main and use the new embeddings API support in ai-sdk-cpp? :)

@fastio
Contributor Author

fastio commented Nov 27, 2025

@iskakaushik Hi, from what I’ve seen, AI functions can offer substantial value in various business scenarios.
Could you review the implementation in this PR and share whether you think any refinements or adjustments are needed?

@rschu1ze
Member

We'll go forward with #99579 (which provides a superset of the functionality in this PR). Thanks anyways @fastio

@rschu1ze rschu1ze closed this Mar 23, 2026

Labels

can be tested Allows running workflows for external contributors pr-feature Pull request with new product feature submodule changed At least one submodule changed in this PR.


5 participants