
Nodes: Machine Learning #257

Merged
felix-schultz merged 29 commits into dev from node/machine-learning
Sep 2, 2025


@simonjanssen
Collaborator

@simonjanssen commented Aug 31, 2025

Introducing Machine Learning Nodes

This PR implements #213, adding a core set of advanced machine learning nodes and related utilities:

  • Train Clustering Node fitting a KMeans model
  • Train Classification Node fitting Support Vector Machines (one for every class)
  • Train Regression Node fitting a Linear Regression
  • A generic Predict Node to predict with any of the above models
  • Generic Save and Load Nodes to serialize / deserialize machine learning models as JSON checkpoints, so models persist across runs and can be shipped with apps.
  • Train nodes expect an existing database previously populated with a vector column; the supervised algorithms (classification and regression) additionally require a target column for fitting.
  • Prediction can be done on database columns (expanding the table in place with a prediction column) or directly on vectors (e.g. for online prediction in chat).
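The generic Predict node has to handle whichever of the three model types it receives. A minimal, dependency-free Rust sketch of that dispatch follows. It is illustrative only: the `Model` and `Prediction` enums are hypothetical and not the PR's types, and the classifier arm simplifies each per-class SVM to a linear scorer `w·x + b` (a kernel SVM computes its decision value differently). Classification uses the one-vs-rest rule: pick the class whose scorer returns the highest value.

```rust
/// Hypothetical model representation (not the PR's actual types).
#[derive(Debug, Clone)]
pub enum Model {
    /// KMeans: one centroid per cluster; predict the nearest centroid's index.
    KMeans { centroids: Vec<Vec<f64>> },
    /// One-vs-rest: one linear scorer (label, weights, bias) per class.
    OneVsRest { classes: Vec<(String, Vec<f64>, f64)> },
    /// Linear regression: y = w·x + b.
    Linear { weights: Vec<f64>, intercept: f64 },
}

#[derive(Debug, Clone, PartialEq)]
pub enum Prediction {
    Cluster(usize),
    Class(String),
    Value(f64),
}

/// Dot product of two equally long vectors.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Squared Euclidean distance (enough for nearest-centroid comparisons).
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

impl Model {
    /// Predict for a single feature vector; None if the model is empty.
    pub fn predict(&self, x: &[f64]) -> Option<Prediction> {
        match self {
            Model::KMeans { centroids } => centroids
                .iter()
                .enumerate()
                .min_by(|(_, a), (_, b)| sq_dist(a, x).total_cmp(&sq_dist(b, x)))
                .map(|(i, _)| Prediction::Cluster(i)),
            Model::OneVsRest { classes } => classes
                .iter()
                .max_by(|(_, wa, ba), (_, wb, bb)| {
                    (dot(wa, x) + ba).total_cmp(&(dot(wb, x) + bb))
                })
                .map(|(label, _, _)| Prediction::Class(label.clone())),
            Model::Linear { weights, intercept } => {
                Some(Prediction::Value(dot(weights, x) + intercept))
            }
        }
    }
}
```

Keeping all models behind one enum lets a single node accept any trained model and return a typed result, which is the property the generic Predict node relies on.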

Use Cases

Apart from typical machine learning use cases on tabular data, we can use the multi-modal Embedding Models available in the Model Catalog as powerful feature extractors for images, documents, emails, etc. Combined with conventional machine learning algorithms, these features let us build reliable predictors for our input data, natively in Flow-Like.

Training Workflow Example

A typical training workflow would look like this:
[Screenshot: training workflow example]
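The core of the Train Clustering node is fitting KMeans on the rows of a vector column, after which the Save node writes a JSON checkpoint. Below is a hypothetical, std-only stand-in for those two steps using Lloyd's algorithm. The PR itself builds on Linfa, whose KMeans initialization and checkpoint format differ; `fit_kmeans`, `centroids_to_json`, the "first k rows" initialization and the JSON schema are all assumptions made for this sketch.

```rust
/// Squared Euclidean distance between two vectors of equal length.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

/// Hypothetical KMeans trainer (Lloyd's algorithm) over the rows of a vector column.
/// Initial centroids are simply the first `k` rows, a simplification for this sketch.
pub fn fit_kmeans(rows: &[Vec<f64>], k: usize, max_iter: usize) -> Result<Vec<Vec<f64>>, String> {
    if k == 0 {
        return Err("k must be at least 1".to_string());
    }
    if rows.len() < k {
        return Err(format!("need at least {k} rows, got {}", rows.len()));
    }
    let dim = rows[0].len();
    if rows.iter().any(|r| r.len() != dim) {
        return Err("all vectors must have the same dimension".to_string());
    }
    let mut centroids: Vec<Vec<f64>> = rows[..k].to_vec();
    let mut assignment = vec![usize::MAX; rows.len()];
    for _ in 0..max_iter {
        // Assignment step: attach every row to its nearest centroid.
        let mut changed = false;
        for (i, row) in rows.iter().enumerate() {
            let nearest = (0..k)
                .min_by(|&a, &b| sq_dist(row, &centroids[a]).total_cmp(&sq_dist(row, &centroids[b])))
                .expect("k >= 1");
            if assignment[i] != nearest {
                assignment[i] = nearest;
                changed = true;
            }
        }
        // Converged: no row switched clusters.
        if !changed {
            break;
        }
        // Update step: move each centroid to the mean of its assigned rows.
        let mut sums = vec![vec![0.0; dim]; k];
        let mut counts = vec![0usize; k];
        for (row, &c) in rows.iter().zip(&assignment) {
            counts[c] += 1;
            for (s, v) in sums[c].iter_mut().zip(row) {
                *s += v;
            }
        }
        for c in 0..k {
            // An empty cluster keeps its previous centroid.
            if counts[c] > 0 {
                centroids[c] = sums[c].iter().map(|s| s / counts[c] as f64).collect();
            }
        }
    }
    Ok(centroids)
}

/// Hand-rolled JSON checkpoint (hypothetical schema). Non-finite values
/// would produce invalid JSON; a real node would more likely use serde_json.
pub fn centroids_to_json(centroids: &[Vec<f64>]) -> String {
    let rows: Vec<String> = centroids
        .iter()
        .map(|c| format!("[{}]", c.iter().map(|v| v.to_string()).collect::<Vec<_>>().join(",")))
        .collect();
    format!("{{\"model\":\"kmeans\",\"centroids\":[{}]}}", rows.join(","))
}
```

For example, the rows `[0,0]`, `[10,10]`, `[0,1]`, `[10,11]` with `k = 2` converge to the centroids `[0,0.5]` and `[10,10.5]` after one update.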

Prediction Workflow Example (Offline Prediction on Database Records)

[Screenshot: offline prediction workflow on database records]
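Offline prediction expands the table in place with a prediction column. The sketch below models that with a hypothetical in-memory `Table` (column name to one value per row); the real nodes operate on a database, so the type, its fields and `add_prediction_column` are assumptions. The model is passed in as a closure, mirroring how the generic Predict node works with any trained model.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for a database table:
/// column name -> one value per row.
#[derive(Debug, Default)]
pub struct Table {
    pub vector_columns: BTreeMap<String, Vec<Vec<f64>>>,
    pub scalar_columns: BTreeMap<String, Vec<f64>>,
}

impl Table {
    /// Apply `predict` to every vector in column `input` and store the
    /// results as a new column `output` (one prediction per row).
    pub fn add_prediction_column<F>(&mut self, input: &str, output: &str, predict: F) -> Result<(), String>
    where
        F: Fn(&[f64]) -> f64,
    {
        // Refuse to silently overwrite an existing column.
        if self.scalar_columns.contains_key(output) {
            return Err(format!("column '{output}' already exists"));
        }
        let vectors = self
            .vector_columns
            .get(input)
            .ok_or_else(|| format!("no vector column named '{input}'"))?;
        let predictions: Vec<f64> = vectors.iter().map(|v| predict(v.as_slice())).collect();
        self.scalar_columns.insert(output.to_string(), predictions);
        Ok(())
    }
}
```

Because the prediction column has exactly one entry per input row, downstream nodes can read predictions and original records side by side.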

Prediction Workflow Example (Online Prediction in Chat)

Here we use a previously trained classifier to predict the class of every image attached by the user:
[Screenshot: online prediction workflow in chat]
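In the in-chat case, each attachment is first turned into a vector by an embedding model and then classified. In this sketch the embedding step is an injected closure standing in for a Model Catalog embedding model, and `OneVsRest` is a hypothetical linear one-vs-rest classifier; neither reflects the PR's actual API.

```rust
/// Hypothetical one-vs-rest classifier: one linear scorer (label, weights, bias) per class.
pub struct OneVsRest {
    pub scorers: Vec<(String, Vec<f64>, f64)>,
}

impl OneVsRest {
    /// Return the label whose scorer gives the highest value, or None if there are no classes.
    pub fn predict(&self, x: &[f64]) -> Option<&str> {
        self.scorers
            .iter()
            .map(|(label, w, b)| {
                let score = w.iter().zip(x).map(|(wi, xi)| wi * xi).sum::<f64>() + b;
                (label, score)
            })
            .max_by(|a, b| a.1.total_cmp(&b.1))
            .map(|(label, _)| label.as_str())
    }
}

/// Classify every chat attachment: embed its raw bytes, then predict a label.
/// `embed` stands in for an embedding model (an assumption of this sketch).
pub fn classify_attachments<E>(attachments: &[Vec<u8>], embed: E, model: &OneVsRest) -> Vec<Option<String>>
where
    E: Fn(&[u8]) -> Vec<f64>,
{
    attachments
        .iter()
        .map(|bytes| {
            let features = embed(bytes.as_slice());
            model.predict(&features).map(String::from)
        })
        .collect()
}
```

Injecting the embedder keeps the classifier independent of which embedding model produced the vectors, as long as training and prediction use the same one.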


Props to the Linfa crate for providing unified APIs for machine learning models and dataset transforms! I'd also like to give a push to the Linfa PR that upgrades it to the latest ndarray version: rust-ml/linfa#371

@simonjanssen linked an issue Aug 31, 2025 that may be closed by this pull request
@simonjanssen changed the base branch from main to dev August 31, 2025 12:12
@simonjanssen requested a review from Copilot September 1, 2025 17:32
Contributor

Copilot AI left a comment


Pull Request Overview

This pull request adds comprehensive machine learning capabilities to the node-based flow system by implementing various ML algorithms and data processing utilities based on the Linfa crate. The changes restructure the AI module organization and add support for clustering, classification, regression, and model persistence operations.

Key changes include:

  • Reorganization of ONNX-related modules from ai::ml::onnx to ai::onnx
  • Addition of traditional ML algorithms including KMeans clustering, SVM classification, and linear regression
  • Implementation of model save/load functionality for ML model persistence
  • Centralized utility functions for pin management and data transformation

Reviewed Changes

Copilot reviewed 24 out of 31 changed files in this pull request and generated 3 comments.

File | Description
packages/core/src/flow/node.rs | Adds utility functions for pin removal operations
packages/catalog/src/ai/onnx/*.rs | Updates import paths following module reorganization
packages/catalog/src/ai/ml.rs | Complete rewrite adding ML model types, data utilities, and node registration
packages/catalog/src/ai/ml/*.rs | New ML algorithm implementations (clustering, classification, regression)
packages/catalog/src/ai.rs | Updates module organization and registration logic
packages/catalog/Cargo.toml | Adds dependencies for Linfa ML algorithms


@simonjanssen marked this pull request as ready for review September 2, 2025 05:05
@simonjanssen mentioned this pull request Sep 2, 2025
Member

@felix-schultz left a comment


Awesome!

@felix-schultz merged commit 7cf8d72 into dev Sep 2, 2025
1 of 4 checks passed
@simonjanssen deleted the node/machine-learning branch September 4, 2025 18:23


Development

Successfully merging this pull request may close these issues.

Machine Learning Nodes

3 participants