Conversation
DataFusion Nodes WASM implementation
- Removed unused route types from settings routes.
- Enhanced event form to support UI-capable event types with unique route paths.
- Added validation for route paths in event forms, ensuring uniqueness and proper formatting.
- Updated mobile header to improve context management.
- Introduced new sink state management for active event sinks.
- Simplified route state management by consolidating route-related interfaces and methods.
- Updated IndexedDB route state implementation to reflect the new route mapping structure.
- Added optional metadata for notifications in the backend state.
- Cleaned up unused code in Rust templates for better readability.
- Introduced `execution_mode` to the `IBoard` interface to support Hybrid, Remote, and Local execution modes.
- Added `only_offline` property to `INode` to indicate nodes that can only run locally.
- Enhanced `IRunPayload` and `IRunContext` to include `runtime_variables` for better runtime management.
- Updated `IVariable` to include `runtime_configured` to indicate per-user configuration at runtime.
- Implemented `prerunBoard` and `prerunEvent` methods in `IBoardState` and `IEventState` for pre-execution analysis.
- Created `ExecutionServiceContext` to manage execution logic and runtime variable prompts.
- Enhanced run execution state management to track execution metrics and node updates.
- Added runtime variables context for managing variable values and configurations.
…ents

- Reorganized import statements in various files for consistency and clarity.
- Simplified function parameters and destructuring for better readability.
- Enhanced formatting of JSX elements for improved visual structure.
- Removed unused imports and variables to streamline code.
- Adjusted function definitions and callbacks for better maintainability.
- Ensured consistent use of optional chaining and nullish coalescing where applicable.
- Removed unused timing variables in `markitdown.rs` to clean up the code.
- Updated the `next` dependency version in `package.json` from 16.0.10 to 16.1.3 for improved features and fixes.
- Simplified error handling in `engine.rs` by using a more concise error mapping for directory reading.
- Moved the `OAuthTokenInput` struct definition from executor types to `flow_like_types` for better modularity.
- Updated imports in the executor's `lib.rs` to reflect the new location of `OAuthTokenInput`.
- Cleaned up the executor's `lib.rs` by removing redundant struct definitions.
- Added Docker Compose configuration for integration tests with PostgreSQL, MySQL, ClickHouse, Oracle, and MinIO.
- Created initialization scripts for each database to set up test data and schemas.
- Implemented health checks for database services in Docker Compose.
- Added scoped user policies and bucket setup for MinIO to facilitate credential testing.
…ality

- Updated `dot_product.rs` and `normalize.rs` to conditionally compile execution logic based on the "execute" feature.
- Modified `Cargo.toml` in the web package to include optional dependencies for web scraping and email functionality under the "execute" feature.
- Enhanced IMAP and SMTP mail handling in the web package to require the "execute" feature for running mail-related nodes.
- Introduced error handling for cases where the "execute" feature is not enabled.
- Updated dependencies in various `Cargo.toml` files to ensure compatibility with the new feature structure.
- Improved vector store optimization in `lancedb.rs` to use a new optimization method.
- Added iceberg support and federation features in the storage package.
…e with specific client types

- Update client builder methods to use the new API key pattern.
- Adjust the provider method to return boxed clients.
- Enhance the `VectorStore` trait with `select` and `fields` parameters for search queries.
- Improve the `LanceDBVectorStore` implementation to support the new search features.
- Update UI components for better responsiveness.
…port

- Added Swedish and Chinese translations for the Business Intelligence preview section.
- Integrated the Business Intelligence preview component into multiple language pages (DE, ES, FR, IT, JA, KO, NL, PT, SV, ZH).
- Implemented the Business Intelligence preview section in the Astro framework, including dynamic text rendering based on localization.
- Enhanced the `AddModelHeadersNode` to support custom HTTP headers for model API calls.
- Refactored various model provider implementations (Anthropic, Deepseek, Gemini, Groq, Ollama, OpenAI) to utilize custom headers.
- Improved code readability and structure in several Rust test files and modules.
```tsx
src={sanitizeImageUrl(
	localMetadata?.thumbnail ?? undefined,
	"/placeholder-thumbnail.webp",
)}
```
Check failure — Code scanning / CodeQL: DOM text reinterpreted as HTML (High)
Copilot Autofix (AI, 2 months ago):
In general, the safest way to handle this pattern is:

- Treat image URLs as data, not HTML, and strictly validate which schemes and origins are allowed before assigning them to `src`.
- For user uploads, prefer browser-generated `blob:` URLs or server-generated HTTPS URLs on trusted origins.
- Avoid using `new URL(url, window.location.origin)` on arbitrary strings, because it can normalize/interpret unexpected inputs; instead, parse and validate the URL more defensively.

For this specific case, we can harden `sanitizeImageUrl` in `packages/ui/lib/utils.ts` without changing how `Id` uses it:

- Extend the sanitizer to:
  - Explicitly allow `blob:` URLs only if they look like real blob URLs (`blob:` followed by an origin plus an opaque reference).
  - Treat bare paths and relative URLs as safe only if they resolve to the current origin and (optionally) look like image paths.
  - Continue to validate `data:` URLs using the existing regex.
- Replace the `new URL(url, window.location.origin)` usage with logic that:
  - Checks the protocol for absolute URLs containing `://` or that start with `blob:` or `data:`.
  - For relative URLs (no scheme), resolves against `window.location.origin` and confirms the result uses an allowed protocol and origin.
- Keep the current API (`sanitizeImageUrl(url, fallback): string`) so callers such as `<img src={sanitizeImageUrl(localMetadata?.thumbnail, "/placeholder-thumbnail.webp")} />` keep working unchanged.

All changes are confined to `packages/ui/lib/utils.ts`; the `apps/desktop/app/library/config/page.tsx` file does not need modification, because it already calls the sanitizer.
```diff
@@ -39,21 +39,49 @@
 	fallback: string,
 ): string {
 	if (!url) return fallback;
 
+	// Basic trimming to avoid control characters or whitespace tricks
+	const trimmed = url.trim();
+	if (!trimmed) return fallback;
+
+	// Allow browser-generated blob URLs directly (e.g. from URL.createObjectURL)
+	if (trimmed.startsWith("blob:")) {
+		// Very simple shape check: "blob:" followed by something non-empty
+		return trimmed.length > "blob:".length ? trimmed : fallback;
+	}
+
+	// Validate data URLs for images using a strict allowlist
+	if (trimmed.startsWith("data:")) {
+		const safeImageDataUrlPattern =
+			/^data:image\/(png|jpeg|jpg|webp);base64,[a-zA-Z0-9+/=]+$/;
+		return safeImageDataUrlPattern.test(trimmed) ? trimmed : fallback;
+	}
+
 	try {
-		const parsed = new URL(url, window.location.origin);
+		let parsed: URL;
+
+		// If the string already looks like an absolute URL with a scheme,
+		// parse it as-is. Otherwise, treat it as a path relative to the current origin.
+		const hasScheme = /^[a-zA-Z][a-zA-Z0-9+.-]*:/.test(trimmed);
+		if (hasScheme) {
+			parsed = new URL(trimmed);
+		} else {
+			parsed = new URL(trimmed, window.location.origin);
+		}
 
 		if (!SAFE_URL_PROTOCOLS.includes(parsed.protocol)) {
 			return fallback;
 		}
 
-		if (parsed.protocol === "data:") {
-			const safeImageDataUrlPattern =
-				/^data:image\/(png|jpeg|jpg|webp);base64,[a-zA-Z0-9+/=]+$/;
-			if (!safeImageDataUrlPattern.test(url)) {
-				return fallback;
-			}
-		}
+		// For http/https URLs, optionally ensure they stay on the same origin.
+		if (
+			(parsed.protocol === "http:" || parsed.protocol === "https:") &&
+			parsed.origin !== window.location.origin
+		) {
+			return fallback;
+		}
 
-		return url;
+		return parsed.toString();
 	} catch {
 		return fallback;
 	}
```
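To sanity-check the hardened behavior, the patched logic can be mirrored in a self-contained sketch; here `window.location.origin` is stubbed with an invented constant (`https://app.example`) so the cases can run outside a browser. This is a test harness for the approach above, not the repository's code:

```typescript
// Self-contained mirror of the patched sanitizeImageUrl from the diff above.
// ORIGIN stands in for window.location.origin (stub value, for illustration).
const ORIGIN = "https://app.example";
const SAFE_URL_PROTOCOLS = ["http:", "https:", "blob:", "data:"];

function sanitizeImageUrl(url: string | undefined, fallback: string): string {
  if (!url) return fallback;
  const trimmed = url.trim();
  if (!trimmed) return fallback;
  // blob: URLs pass through with a minimal shape check.
  if (trimmed.startsWith("blob:")) {
    return trimmed.length > "blob:".length ? trimmed : fallback;
  }
  // data: URLs must match a strict image-only allowlist.
  if (trimmed.startsWith("data:")) {
    const safe = /^data:image\/(png|jpeg|jpg|webp);base64,[a-zA-Z0-9+/=]+$/;
    return safe.test(trimmed) ? trimmed : fallback;
  }
  try {
    const hasScheme = /^[a-zA-Z][a-zA-Z0-9+.-]*:/.test(trimmed);
    const parsed = hasScheme ? new URL(trimmed) : new URL(trimmed, ORIGIN);
    if (!SAFE_URL_PROTOCOLS.includes(parsed.protocol)) return fallback;
    // http/https must stay on the trusted origin.
    if (
      (parsed.protocol === "http:" || parsed.protocol === "https:") &&
      parsed.origin !== ORIGIN
    ) {
      return fallback;
    }
    return parsed.toString();
  } catch {
    return fallback;
  }
}

// Relative paths resolve against the trusted origin:
console.log(sanitizeImageUrl("/placeholder-thumbnail.webp", "/fb.webp"));
// javascript: URLs fall back:
console.log(sanitizeImageUrl("javascript:alert(1)", "/fb.webp"));
```

Running a few adversarial inputs like this against the sanitizer is a quick way to confirm each branch of the autofix behaves as described.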
```tsx
src={sanitizeImageUrl(
	localMetadata?.icon ?? undefined,
	"/app-logo.webp",
)}
```
Check failure — Code scanning / CodeQL: DOM text reinterpreted as HTML (High)
Copilot Autofix (AI, 2 months ago):
In general, to fix this type of issue you ensure that any URL derived from untrusted or semi-trusted sources is strictly validated and normalized before being bound into attributes like `src`, especially when the same sanitizer might be reused in other contexts. The sanitizer should either return a clearly safe, normalized URL string or fall back to a trusted default, and it should avoid passing through arbitrary text that the browser could reinterpret in surprising ways.

For this codebase, the best fix is to tighten `sanitizeImageUrl` in `packages/ui/lib/utils.ts` so that:

- All returned URLs are fully normalized via the `URL` API; it never returns the raw, unparsed `url` string.
- `data:` URLs are validated against a strict image-only pattern and, if valid, are returned as-is (since `new URL` cannot parse them easily with `window.location.origin`).
- Other protocols (`http:`, `https:`, `blob:`) are returned via `parsed.href`, ensuring consistent encoding and removing any chance of stray whitespace or characters being interpreted oddly.
- Any parsing error or protocol mismatch causes the sanitizer to return the provided `fallback`.

No changes are required in `apps/desktop/app/library/config/page.tsx`, because that file already uses `sanitizeImageUrl` correctly; our fix will be entirely within `packages/ui/lib/utils.ts`.

Concretely, update `sanitizeImageUrl` to:

- Keep the strict `data:` URL regex check.
- For `data:` URLs, return the original URL only if it matches the regex; otherwise use the fallback.
- For non-`data:` URLs, parse using `new URL(url, window.location.origin)`, verify the protocol against `SAFE_URL_PROTOCOLS`, and return `parsed.href` instead of `url`.
- Fall back on any error or protocol mismatch.

This keeps existing behavior for valid URLs but removes the direct passthrough of tainted text that CodeQL is warning about.
```diff
@@ -39,21 +39,26 @@
 	fallback: string,
 ): string {
 	if (!url) return fallback;
 
+	// Handle data: URLs separately, as `new URL` with a base origin does not
+	// parse them reliably. Only allow well-formed image data URLs.
+	if (url.startsWith("data:")) {
+		const safeImageDataUrlPattern =
+			/^data:image\/(png|jpeg|jpg|webp);base64,[a-zA-Z0-9+/=]+$/;
+		return safeImageDataUrlPattern.test(url) ? url : fallback;
+	}
+
 	try {
 		const parsed = new URL(url, window.location.origin);
 
+		// Only allow explicitly safe protocols.
 		if (!SAFE_URL_PROTOCOLS.includes(parsed.protocol)) {
 			return fallback;
 		}
 
-		if (parsed.protocol === "data:") {
-			const safeImageDataUrlPattern =
-				/^data:image\/(png|jpeg|jpg|webp);base64,[a-zA-Z0-9+/=]+$/;
-			if (!safeImageDataUrlPattern.test(url)) {
-				return fallback;
-			}
-		}
-
-		return url;
+		// Return the normalized href rather than the original potentially-tainted
+		// string, to avoid reinterpreting raw DOM text as a URL/HTML fragment.
+		return parsed.href;
 	} catch {
 		return fallback;
 	}
```
```tsx
src={sanitizeImageUrl(
	URL.createObjectURL(file),
	"",
)}
```
Check failure — Code scanning / CodeQL: DOM text reinterpreted as HTML (High)
Copilot Autofix (AI, 2 months ago):
In general, the problem is that `sanitizeImageUrl` currently considers any `http:`, `https:`, `blob:`, or `data:` URL as safe and returns it unchanged. This is too permissive: `data:` URLs can encode HTML or script in some browsers, and the function does not enforce that data URLs are image-only, nor does it ensure that blob URLs originated from trusted sources. To harden this, we should strictly constrain which URL patterns are allowed and return a safe fallback in all other cases, without relying on "DOM text" being treated safely by consumers.

The best targeted fix without changing existing functionality more than necessary is to:

- Restrict `SAFE_URL_PROTOCOLS` to `["http:", "https:", "blob:"]`.
- Keep the special case for `data:` URLs but only allow data URLs that match a strict, image-only pattern (`data:image/...;base64,...`).
- Avoid returning the original `url` string directly; instead, return a normalized, trusted string derived from the parsed URL, such as `parsed.href`. This ensures we do not propagate unexpected control characters or malformed URLs derived from tainted input.

Concretely, in `packages/ui/lib/utils.ts`, we will:

- Update `SAFE_URL_PROTOCOLS` to exclude `"data:"`.
- Update the logic in `sanitizeImageUrl` so that:
  - If the protocol is not in `SAFE_URL_PROTOCOLS`, but the original string is a valid, safe image data URL according to the strict regex, we allow it as-is.
  - Otherwise, return `fallback`.
  - For non-`data:` URLs, return `parsed.href` instead of `url`.

The call site in `chatbox.tsx` does not need changes; it will now receive a hardened URL from `sanitizeImageUrl`. This keeps the component's behavior the same (valid `blob:` URLs still work; safe image data URLs are still allowed) while tightening the sanitizer to satisfy CodeQL and improve security.
```diff
@@ -31,7 +31,7 @@
 	return `${value.toFixed(dp)} ${units[u]}`;
 }
 
-const SAFE_URL_PROTOCOLS = ["http:", "https:", "blob:", "data:"];
+const SAFE_URL_PROTOCOLS = ["http:", "https:", "blob:"];
 
 /** Sanitize image URL to prevent XSS - only allow safe protocols */
 export function sanitizeImageUrl(
@@ -39,21 +39,24 @@
 	fallback: string,
 ): string {
 	if (!url) return fallback;
 
+	// Allow only strict, image-only data URLs when explicitly provided.
+	const safeImageDataUrlPattern =
+		/^data:image\/(png|jpeg|jpg|webp);base64,[a-zA-Z0-9+/=]+$/;
+	if (safeImageDataUrlPattern.test(url)) {
+		return url;
+	}
+
 	try {
 		const parsed = new URL(url, window.location.origin);
 
+		// Only allow explicitly whitelisted protocols.
 		if (!SAFE_URL_PROTOCOLS.includes(parsed.protocol)) {
 			return fallback;
 		}
 
-		if (parsed.protocol === "data:") {
-			const safeImageDataUrlPattern =
-				/^data:image\/(png|jpeg|jpg|webp);base64,[a-zA-Z0-9+/=]+$/;
-			if (!safeImageDataUrlPattern.test(url)) {
-				return fallback;
-			}
-		}
-
-		return url;
+		// Return a normalized URL string to avoid propagating unexpected data.
+		return parsed.href;
 	} catch {
 		return fallback;
 	}
```
````ts
	dataModel: unknown[];
} | null {
	try {
		const jsonMatch = response.match(/```json\s*([\s\S]*?)```/);
````
Check failure — Code scanning / CodeQL: Polynomial regular expression used on uncontrolled data (High)
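One common remedy for this class of finding is to replace the lazily quantified regex with plain string scanning. The sketch below is illustrative (not the PR's code): it extracts a fenced `json` block using `indexOf`, which scans linearly and cannot backtrack on adversarial input:

````typescript
// Illustrative linear-time alternative to /```json\s*([\s\S]*?)```/:
// locate the fence delimiters with indexOf, which never backtracks.
function extractJsonBlock(response: string): string | null {
  const openFence = "```json";
  const open = response.indexOf(openFence);
  if (open === -1) return null;
  const start = open + openFence.length;
  const close = response.indexOf("```", start);
  if (close === -1) return null;
  // Everything between the fences, with surrounding whitespace trimmed.
  return response.slice(start, close).trim();
}

console.log(extractJsonBlock('text ```json\n{"a": 1}\n``` more'));
````

The behavior matches the regex for well-formed input, but worst-case cost stays O(n) regardless of how the input is crafted.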
Summary of Changes

Hello @felix-schultz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the project's capabilities by introducing robust UI development tools, a comprehensive WASM package management system, and critical backend improvements for stability and extensibility. The changes aim to streamline the development workflow, improve operational safety, and provide a richer user experience through new features like runtime variable configuration and system tray integration.

Highlights
…event unsafe key access
…or safer style injection
@gemini-code-assist review
Code Review
This is a massive and impressive pull request that introduces several major features, including a comprehensive package management system for WASM nodes, a visual page and widget builder (A2UI), app routing, runtime variable management, and system tray integration. The backend changes, particularly the refactoring of the AWS file tracker to prevent Lambda recursion, are critical improvements for stability and scalability. The frontend has been significantly refactored to use a centralized API state and shared UI components, which greatly improves maintainability. The documentation has also been thoroughly updated to reflect these new features, which is excellent for contributors and users alike. My review focuses on a few minor points regarding concurrency handling and configuration, but overall, this is a fantastic set of additions to the project.
```rust
let db = match manager.try_lock() {
    Ok(manager_guard) => manager_guard.db(),
    Err(_) => {
        tracing::warn!(
            "EventSinkManager busy while resolving deeplink route for app '{}'",
            app_id
        );
        return Ok(false);
    }
};
```
Using try_lock here is a good way to avoid blocking, but silently returning Ok(false) if the lock is busy could lead to a confusing user experience where a deeplink does nothing. It might be better to return an error that can be surfaced to the user, or perhaps implement a short retry with a timeout to handle transient contention.
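The "short retry with a timeout" idea can be sketched generically. The helper below is hypothetical TypeScript (names invented for illustration, not code from this PR): it polls a try-lock-style operation that signals contention by returning `null`, and surfaces an error once a deadline passes instead of silently dropping the request:

```typescript
// Hypothetical retry-with-timeout helper for transient lock contention.
// `op` returns null when the resource is busy (analogous to try_lock failing).
async function retryUntil<T>(
  op: () => T | null,
  timeoutMs: number,
  intervalMs = 10,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = op();
    if (result !== null) return result; // acquired the resource
    if (Date.now() >= deadline) {
      // Fail loudly so the caller can surface the error to the user,
      // rather than silently returning "nothing happened".
      throw new Error("resource still busy after timeout");
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Applied to the deeplink handler, this would turn a transiently contended `EventSinkManager` into a short wait rather than a no-op, while still bounding the worst case.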
```diff
@@ -1,3 +1,4 @@
+todo
```

```rust
// NOTE: We intentionally do NOT delete the S3 object here.
// Doing so would trigger another S3 event notification, causing Lambda recursion.
// Instead, we only revert the DynamoDB state and let the SQS retry mechanism
// handle the failure. The S3 object remains, and will be reconciled on retry.
```
This is an excellent change. Removing the S3 object deletion on failure is a crucial fix to prevent recursive Lambda invocations. The detailed comment explaining the reasoning is also very clear and will be helpful for future maintenance. Great work on identifying and resolving this potential infinite loop.
```diff
 Err(_) => {
-    let manager = manager_state.0.blocking_lock();
-    manager.fire_event(app_handle, event_id, None, None)
+    tracing::warn!("EventSinkManager busy while firing cron event {}", event_id);
+    Ok(false)
 }
```
This change from blocking_lock to try_lock is a good improvement for responsiveness. However, the fallback is to silently skip the cron job execution if the lock is contended. While this prevents blocking, it might lead to missed executions. Is this the desired behavior, or should there be a retry mechanism or a more prominent error log for these cases?
```rust
_ = cancellation_token.cancelled() => {
    println!("Board execution cancelled for run: {}", run_id);
    match tokio::time::timeout(Duration::from_secs(30), internal_run.flush_logs_cancelled()).await {
        Ok(Ok(Some(meta))) => {
            Some(meta)
        },
        Ok(Ok(None)) => {
            println!("No meta flushing early");
            None
        },
        Ok(Err(e)) => {
            println!("Error flushing logs early for run: {}, {:?}", run_id, e);
            abort_handle.abort();
            None
        }
```
The new cancellation logic using abort_handle is much cleaner and more robust. I noticed the previous implementation had a flush_logs_cancelled call to ensure logs were saved even on cancellation. Is that behavior still preserved with this new approach, perhaps handled within the internal_run.execute task when it's aborted? If not, we might lose valuable debugging information for cancelled runs.
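The timeout-bounded flush that the removed branch performed can be expressed generically. This is a sketch in TypeScript (not the PR's Rust), mirroring the `tokio::time::timeout` pattern: race the best-effort flush against a timer, and treat a timeout as "no metadata" rather than blocking cancellation forever:

```typescript
// Sketch: bound a best-effort log flush by a timeout during cancellation,
// mirroring tokio::time::timeout in the removed branch. Resolves to null
// if the flush does not finish in time.
async function flushWithTimeout<T>(
  flush: Promise<T>,
  timeoutMs: number,
): Promise<T | null> {
  const timeout = new Promise<null>((resolve) =>
    setTimeout(() => resolve(null), timeoutMs),
  );
  return Promise.race([flush, timeout]);
}
```

Whichever task wins the race decides the outcome: a completed flush yields its metadata, while a slow flush yields `null` and lets cancellation proceed.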
Code Review
This pull request introduces a comprehensive feature-gating mechanism for heavy dependencies in nodes, along with new documentation detailing its implementation in Cargo.toml and Rust source files. It also includes significant refactoring across the desktop application's frontend, migrating numerous API calls from direct fetcher and put utilities to a new backend.apiState service, thereby centralizing API interaction logic and removing useAuth dependencies in several admin and library pages. New administrative pages for WASM packages and a dedicated page for managing runtime variables have been added, alongside new UI components for pages and widgets, including a visual page builder and widget editor with versioning capabilities. The core application layout has been enhanced with new providers for execution services, notifications, runtime variables, and system tray integration. Backend changes support these new features, including updates to the event bus, event sink handling, and new Tauri commands for A2UI (pages, widgets, routes) and WASM package registry management. Dependency updates include rig-core, rmcp, markitdown, and next.js, and the .gitignore file now excludes todo and docling_cache. A review comment highlights the importance of including the user's ID in react-query keys to prevent stale data issues, while another emphasizes the need for a Dead-Letter Queue for SQS to ensure robustness against persistent failures in the file-tracker service.
```diff
 );
 const query = useQuery<T, Error>({
-	queryKey: [method, path, data, auth?.user?.profile?.sub ?? "anon"],
+	queryKey: [method, path, data],
```
The queryKey for useQuery should uniquely identify the data being fetched. By removing the user's ID from the key, there's a risk of showing stale, cached data from a previous user if one user logs out and another logs in. Although the enabled flag will trigger a refetch when the profile changes, react-query might still serve the stale data from the cache for a moment before the new data arrives. To prevent this, the user's ID should be part of the query key.
```diff
-	queryKey: [method, path, data],
+	queryKey: [method, path, data, profile.data?.hub_profile.id],
```
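The stale-cache risk described in this comment is easy to reproduce with any keyed cache. The sketch below uses a toy cache (not react-query itself; names are invented for illustration) to show how an unscoped key serves the previous user's data, while a user-scoped key keeps entries separate:

```typescript
// Toy keyed cache demonstrating why the user id belongs in the query key.
const cache = new Map<string, string>();

function cachedFetch(key: (string | undefined)[], fetcher: () => string): string {
  const k = JSON.stringify(key);
  if (!cache.has(k)) cache.set(k, fetcher()); // stands in for a network fetch
  return cache.get(k)!;
}

// Unscoped key: the second user receives the first user's cached response.
const aliceUnscoped = cachedFetch(["GET", "/profile"], () => "alice-data");
const bobUnscoped = cachedFetch(["GET", "/profile"], () => "bob-data");

// User-scoped key: each user gets their own cache entry.
const aliceScoped = cachedFetch(["GET", "/profile", "alice"], () => "alice-data");
const bobScoped = cachedFetch(["GET", "/profile", "bob"], () => "bob-data");

console.log(bobUnscoped); // "alice-data" – the stale-cache bug
console.log(bobScoped);   // "bob-data"
```

The same reasoning applies to react-query: the query key is the cache identity, so any variable that changes the meaning of the response (here, the logged-in user) must appear in it.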
```rust
// NOTE: We intentionally do NOT delete the S3 object here.
// Doing so would trigger another S3 event notification, causing Lambda recursion.
// Instead, we only revert the DynamoDB state and let the SQS retry mechanism
// handle the failure. The S3 object remains, and will be reconciled on retry.
```
The logic to avoid deleting the S3 object and rely on SQS retries is a great way to prevent Lambda recursion. However, for this to be fully robust against persistent failures (e.g., a database issue that outlasts the retry attempts), it's crucial that the SQS queue is configured with a Dead-Letter Queue (DLQ). This will ensure that messages that repeatedly fail processing are captured for manual inspection instead of being lost after the retention period expires. It might be worth adding a note about this requirement in the infrastructure-as-code or deployment documentation.
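For context on that suggestion, an SQS dead-letter queue is configured via the source queue's `RedrivePolicy` attribute; a value like the following (the ARN and receive count here are placeholder values, not taken from this repository) routes messages to the DLQ after repeated failed deliveries:

```json
{
  "deadLetterTargetArn": "arn:aws:sqs:eu-central-1:123456789012:file-tracker-dlq",
  "maxReceiveCount": 5
}
```

With this in place, a message that fails processing `maxReceiveCount` times lands in the DLQ for inspection instead of expiring silently.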
…n run cancellation
This pull request introduces several dependency updates, new workspace packages, and important changes to backend components, particularly around AWS Lambda execution and file tracking. The most significant updates include adding new packages to the workspace, updating core dependencies, refactoring the file tracker to remove direct S3 deletions (preventing Lambda recursion), and extending execution payloads to support runtime variables. There are also improvements to documentation and licensing clarity in the README.md.

Workspace and Dependency Updates:

- Added `packages/catalog/geo` and `packages/wasm` to the workspace and as dependencies, expanding catalog and WASM support.
- Updated `rig-core` to 0.29.0, `rmcp` to 0.13.0, and the `markitdown` git revision, ensuring the project uses the latest features and fixes.

Backend AWS Executor and Runtime Enhancements:

- Updated the `flow-like-executor` dependency to include the `all-execute` feature in all AWS executor/runtime and Docker Compose runtime crates, enabling broader execution capabilities.
- Extended the `DispatchPayload` struct in both `executor-async` and `runtime-async` to include an optional `runtime_variables` field, and updated the execution logic to pass this field, allowing more flexible runtime configuration.

AWS File Tracker Refactor:

- Removed `aws-sdk-s3` and related S3 client code from the file tracker Lambda, including S3 object deletion logic. Now, on failure, only the DynamoDB state is reverted, preventing recursive Lambda invocations caused by S3 event triggers.

Documentation and License Improvements:

- Updated the README.md to add a "Buy Me a Coffee" badge and a new section detailing the Business Source License (BSL 1.1) and its rationale, and clarified the use of AI-assisted coding and its review requirements.

Minor Codebase Cleanups:

- Removed `#[allow(dead_code)]` attributes and unused imports in several backend files, and made minor enum attribute cleanups.

These changes collectively improve the maintainability, extensibility, and operational safety of the project, while also clarifying licensing and contribution guidelines.