feat: Add Python bindings for ArcadeDB embedded database #2737

Merged: robfrank merged 4 commits into main from python-embedded on Nov 5, 2025

Conversation

@robfrank
Collaborator

@robfrank robfrank commented Nov 5, 2025

Introduce comprehensive Python bindings that enable embedded ArcadeDB usage directly from Python applications, leveraging JPype for seamless JVM integration.

Core Features:

  • Embedded database operations with full CRUD support
  • Document, vertex, and edge models for graph databases
  • Transaction management (read, write, batch operations)
  • Server mode with HTTP API support
  • Vector search capabilities for AI/ML applications
  • Data import from CSV/JSONL with automatic type inference
  • Export to GraphML, GraphSON, JSONL, and CSV formats
  • Gremlin query language support
  • Async execution and batch processing utilities
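
As a rough illustration of what "automatic type inference" can mean for CSV import, the sketch below guesses a column type by trying progressively looser parsers and widening when rows disagree. It is not the bindings' importer (the changelog in this PR says that work is delegated to ArcadeDB's Java importer); the function names, the type labels, and the sample data are all assumptions made for this example.

```python
import csv
import io


def infer_value_type(text: str) -> str:
    """Guess a type label for one CSV cell (labels are hypothetical)."""
    if text == "":
        return "NULL"
    if text.lower() in ("true", "false"):
        return "BOOLEAN"
    try:
        int(text)
        return "LONG"
    except ValueError:
        pass
    try:
        float(text)
        return "DOUBLE"
    except ValueError:
        return "STRING"


def merge_types(current, new):
    """Widen the column type when two rows disagree."""
    if current is None or current == "NULL":
        return new
    if new == "NULL" or new == current:
        return current
    if {current, new} == {"LONG", "DOUBLE"}:
        return "DOUBLE"
    return "STRING"


def infer_column_types(csv_text: str) -> dict:
    """Scan every row and return {column name: inferred type label}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    types = {}
    for row in reader:
        for name, cell in row.items():
            types[name] = merge_types(types.get(name), infer_value_type(cell or ""))
    return types


# Hypothetical sample data, loosely shaped like a movie ratings file.
SAMPLE = (
    "movieId,title,rating,seen\n"
    "1,Toy Story,4.5,true\n"
    "2,Jumanji,3,false\n"
    "3,Heat,,true\n"
)

if __name__ == "__main__":
    print(infer_column_types(SAMPLE))
```

Empty cells are treated as NULL and do not change a column's type, which is why `rating` stays numeric even though one row leaves it blank.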

Development Infrastructure:

  • Multi-platform build system (Linux, macOS, Windows on x64/ARM64)
  • Native build scripts with JRE bundling
  • Docker-based build environment
  • Comprehensive test suite with 100+ tests covering:
    • Core database operations
    • Concurrency and transactions
    • Import/export functionality
    • Server patterns and API
    • Type conversions and result handling
  • CI/CD workflows for automated testing across all platforms
  • Testing for examples 01-03 (verified working)

Examples:

  • Simple document store with CRUD operations
  • Social network graph modeling and traversal
  • Vector similarity search
  • CSV import with MovieLens dataset (examples 04-05 included but not CI-tested yet)

Build System:

  • Platform-specific wheel generation
  • JAR exclusion filtering for minimal distributions
  • Version extraction from parent pom.xml
  • Setup utilities for streamlined installation
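
The JAR exclusion filtering above can be pictured as glob matching over file names. The sketch below is only a guess at the mechanism: the exclusion file format (one glob per line, `#` for comments), the helper names, and the JAR file names are assumptions; the only detail taken from this PR is the `arcadedb-grpcw-*.jar` pattern mentioned in the changelog.

```python
from fnmatch import fnmatchcase


def load_patterns(text: str) -> list[str]:
    """Parse an exclusion file: one glob per line, '#' starts a comment."""
    patterns = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            patterns.append(line)
    return patterns


def filter_jars(jar_names, patterns):
    """Return the JAR names that match none of the exclusion patterns."""
    return [
        name
        for name in jar_names
        if not any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Hypothetical exclusion file contents and JAR names.
EXCLUSIONS = """
# JARs that the embedded package does not need
arcadedb-grpcw-*.jar
"""

JARS = [
    "arcadedb-engine-25.10.1.jar",
    "arcadedb-grpcw-25.10.1.jar",
    "arcadedb-server-25.10.1.jar",
]

if __name__ == "__main__":
    print(filter_jars(JARS, load_patterns(EXCLUSIONS)))
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every platform the wheels are built for.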

This implementation provides a Pythonic interface to ArcadeDB while maintaining compatibility with the Java API and supporting all major platforms.

What does this PR do?

A brief description of the change being made with this pull request.

Motivation

What inspired you to submit this pull request?

Related issues

A list of issues either fixed, containing architectural discussions, otherwise relevant
for this Pull Request.

Additional Notes

Anything else we should know when reviewing?

Checklist

  • I have run the build using mvn clean package command
  • My unit tests cover both failure and success scenarios

tae898 and others added 2 commits November 3, 2025 11:44
@mergify
Contributor

mergify bot commented Nov 5, 2025

🧪 CI Insights

Here's what we observed from your CI run for 20ca31d.

🟢 All jobs passed!

But CI Insights is watching 👀

@gemini-code-assist
Contributor

Warning

Gemini encountered an error creating the summary. You can try again by commenting /gemini summary.

@robfrank robfrank linked an issue Nov 5, 2025 that may be closed by this pull request
@robfrank robfrank added this to the 25.10.1 milestone Nov 5, 2025
@robfrank robfrank self-assigned this Nov 5, 2025
robfrank and others added 2 commits November 5, 2025 02:17
…ain permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
…ain permissions

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@robfrank
Collaborator Author

robfrank commented Nov 5, 2025

/gemini summary

@gemini-code-assist
Contributor

Summary of Changes

This pull request significantly expands ArcadeDB's ecosystem by introducing a robust set of Python bindings. These bindings allow Python developers to leverage ArcadeDB's multi-model capabilities, including document, graph, and vector database features, directly within their applications without requiring a separate Java installation. The changes encompass not only the core API but also a comprehensive build system, extensive testing, and practical examples to facilitate adoption and development across various platforms.

Highlights

  • Comprehensive Python Bindings: Introduced full Python bindings for ArcadeDB, enabling embedded database usage directly from Python applications through JPype for seamless JVM integration.
  • Multi-Model Database Features: The bindings support core ArcadeDB functionalities including CRUD operations, document, vertex, and edge models for graph databases, transaction management, server mode with HTTP API, and vector search capabilities for AI/ML applications.
  • Robust Development Infrastructure: Implemented a multi-platform build system (Linux, macOS, Windows on x64/ARM64), native build scripts with JRE bundling, a Docker-based build environment, and a comprehensive test suite with over 100 tests covering core operations, concurrency, import/export, server patterns, and type conversions.
  • Enhanced Data Handling: Added support for data import from CSV/JSONL with automatic type inference, export to various formats (GraphML, GraphSON, JSONL, CSV), and Gremlin query language support, alongside async execution and batch processing utilities.
  • New Examples and Documentation: Provided several new examples including a simple document store, social network graph modeling, vector similarity search, and CSV import with the MovieLens dataset. Extensive documentation for the Python bindings has also been added and updated.
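
The transaction management listed in the highlights is described in the changelog as a context manager (`TransactionContext`). The sketch below shows the general commit-on-success, rollback-on-error pattern such a context manager usually follows. It does not use the real bindings: `FakeDatabase`, `TransactionSketch`, and the `begin`/`commit`/`rollback` method names are stand-ins invented for this example, and the actual class may behave differently.

```python
class FakeDatabase:
    """Stand-in for a database handle; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def begin(self):
        self.calls.append("begin")

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")


class TransactionSketch:
    """Hypothetical context manager: commit on success, roll back on error."""

    def __init__(self, db):
        self._db = db

    def __enter__(self):
        self._db.begin()
        return self._db

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self._db.commit()
        else:
            self._db.rollback()
        return False  # never swallow the exception


db = FakeDatabase()

# Successful block: the transaction is committed.
with TransactionSketch(db):
    pass

# Failing block: the transaction is rolled back and the error propagates.
try:
    with TransactionSketch(db):
        raise ValueError("simulated failure inside the transaction")
except ValueError:
    pass

if __name__ == "__main__":
    print(db.calls)
```

Returning `False` from `__exit__` keeps the original exception visible to the caller, so a failed write is never silently ignored.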

Changelog

  • .github/dependabot.yml
    • Updated dependabot schedule for '/studio' from Monday to Sunday.
    • Added new dependabot configuration for 'bindings/python' using the 'uv' package ecosystem.
  • .pre-commit-config.yaml
    • Modified 'check-yaml' exclusion to include 'bindings/python/mkdocs.yml'.
    • Added 'black', 'isort', 'pretty-format-yaml', and 'shfmt' pre-commit hooks specifically for Python binding files.
  • bindings/python/.gitignore
    • Added a new .gitignore file for the Python bindings, covering build artifacts, virtual environments, testing files, IDE configurations, logs, MkDocs output, and Jupyter notebooks.
  • bindings/python/Dockerfile.build
    • Added a multi-stage Dockerfile for building the ArcadeDB Python package, including stages for obtaining JARs, building a minimal JRE with jlink, building the Python wheel, and testing the built wheel. This Dockerfile bundles a JRE and excludes unnecessary JARs to optimize size.
  • bindings/python/README.md
    • Added a comprehensive README for the Python bindings, detailing installation, quick start, features, package contents, testing, building from source, development notes, package structure, contributing guidelines, license, and links.
  • bindings/python/build-native.sh
    • Added a shell script for native building of the ArcadeDB Python package on macOS and Windows, handling JAR downloading (via Docker if needed), minimal JRE creation with jlink, versioning, and wheel building with platform-specific tags.
  • bindings/python/build.sh
    • Added the main build script for the ArcadeDB Python package, which auto-detects the platform and uses either Docker for Linux or native build scripts for macOS/Windows. It handles version detection, package configuration, and wheel building.
  • bindings/python/ensure-build-tools.sh
    • Added a shell script to ensure Python build tools are installed across various Python versions.
  • bindings/python/examples/.gitignore
    • Added a .gitignore file for the examples directory, ignoring example databases, logs, downloaded datasets, and benchmark outputs.
  • bindings/python/examples/01_simple_document_store.py
    • Added an example demonstrating basic CRUD operations with document types, rich data types, transactions, and ArcadeDB SQL.
  • bindings/python/examples/02_social_network_graph.py
    • Added an example demonstrating graph database features with vertex/edge types, graph traversal using SQL MATCH and Cypher, and handling NULL values.
  • bindings/python/examples/03_vector_search.py
    • Added an experimental example for vector search using HNSW index, mock embeddings, and semantic similarity search.
  • bindings/python/examples/04_csv_import_documents.py
    • Added an example for importing CSV data into documents with automatic type inference, index creation, full-text search, and performance comparison.
  • bindings/python/examples/EXAMPLES_PLAN.md
    • Added a planning document outlining the strategy for developing ArcadeDB Python examples, including target personas, planned examples, implementation phases, and documentation structure.
  • bindings/python/examples/README.md
    • Added a README for the examples directory, providing a quick start guide, descriptions of available examples, and tips.
  • bindings/python/examples/download_sample_data.py
    • Added a script to download MovieLens datasets (small/large) and introduce NULL values for testing import scenarios.
  • bindings/python/examples/run_benchmark_04_csv_import_documents.sh
    • Added a benchmark script for CSV import of documents, allowing configuration of dataset size, parallel threads, and batch size, and saving logs.
  • bindings/python/examples/run_benchmark_05_csv_import_graph.sh
    • Added a benchmark script for graph creation from CSV, supporting various methods (Java API, SQL) and configurations, with logging and export options.
  • bindings/python/extract_version.py
    • Added a Python script to extract the ArcadeDB version from pom.xml and convert it to PEP 440 compliant Python versions, supporting development and release modes, and Python-specific patches.
  • bindings/python/jar_exclusions.txt
    • Added a file listing JAR patterns to exclude from the Python package, specifically arcadedb-grpcw-*.jar to optimize size.
  • bindings/python/mkdocs.yml
    • Added the MkDocs configuration file for the Python bindings documentation, defining site structure, theme, features, plugins, and navigation.
  • bindings/python/pyproject.toml
    • Added the pyproject.toml file for package metadata, build system configuration, dependencies, and pytest options.
  • bindings/python/setup.py
    • Added a setup.py file to force platform-specific wheel generation for the Python package, as it bundles platform-specific JRE binaries.
  • bindings/python/setup_jars.py
    • Added a script to copy ArcadeDB JAR files and the bundled JRE into the Python package during the build process.
  • bindings/python/src/arcadedb_embedded/__init__.py
    • Added the __init__.py file for the arcadedb_embedded package, defining its public API and importing core modules.
  • bindings/python/src/arcadedb_embedded/async_executor.py
    • Added the AsyncExecutor class, a Pythonic wrapper for ArcadeDB's Java DatabaseAsyncExecutor, enabling parallel processing, batching, and WAL optimization for bulk operations.
  • bindings/python/src/arcadedb_embedded/batch.py
    • Added the BatchContext class, a high-level context manager for batch processing, simplifying bulk operations with async execution, progress tracking, and error handling.
  • bindings/python/src/arcadedb_embedded/core.py
    • Added core database classes (Database, DatabaseFactory) and convenience functions (create_database, open_database, database_exists) for embedded database access, including create_vector_index and count_type.
  • bindings/python/src/arcadedb_embedded/exceptions.py
    • Added custom exception classes for ArcadeDB Python bindings, with ArcadeDBError as the base exception.
  • bindings/python/src/arcadedb_embedded/exporter.py
    • Added functions for exporting database data to various formats (JSONL, GraphML, GraphSON) and query results to CSV.
  • bindings/python/src/arcadedb_embedded/importer.py
    • Added the Importer class and convenience functions (import_json, import_csv, import_neo4j) for importing data from JSON, CSV, and Neo4j formats, leveraging Java's importer with automatic type inference and batching.
  • bindings/python/src/arcadedb_embedded/jvm.py
    • Added utilities for managing the Java Virtual Machine (JVM), including starting the JVM with ArcadeDB JARs and bundled JRE, and handling JVM arguments.
  • bindings/python/src/arcadedb_embedded/results.py
    • Added ResultSet and Result classes for wrapping query results, providing methods like to_list, to_dataframe, iter_chunks, count, first, one, and automatic type conversion.
  • bindings/python/src/arcadedb_embedded/server.py
    • Added the ArcadeDBServer class and create_server function for managing the embedded ArcadeDB server, enabling HTTP API and Studio access.
  • bindings/python/src/arcadedb_embedded/transactions.py
    • Added the TransactionContext class, a context manager for ArcadeDB transactions.
  • bindings/python/src/arcadedb_embedded/type_conversion.py
    • Added utilities for converting Java objects to native Python types and vice-versa, handling primitives, numerics, temporals, and collections.
  • bindings/python/src/arcadedb_embedded/vector.py
    • Added the VectorIndex class for HNSW vector search and helper functions (to_java_float_array, to_python_array) for converting between Python and Java array types.
  • bindings/python/tests/README.md
    • Added a README for the tests directory, providing quick stats, instructions for running tests, and links to detailed documentation.
  • bindings/python/tests/__init__.py
    • Added an empty __init__.py for the tests package.
  • bindings/python/tests/conftest.py
    • Added pytest fixtures for temporary database paths and server roots, and helper functions to check for server/Gremlin support. Includes a pytest_unconfigure hook to force exit due to JPype JVM thread hang.
  • bindings/python/tests/test_async_executor.py
    • Added tests for the AsyncExecutor class, covering basic creation, auto-commits, parallel execution, method chaining, pending status, callbacks, and performance comparison with synchronous operations.
  • bindings/python/tests/test_batch_context.py
    • Added tests for the BatchContext class, covering basic usage with vertices/documents/edges, callbacks, success counting, is_pending, wait_completion, performance comparison, different batch sizes, and mixed operations.
  • bindings/python/tests/test_concurrency.py
    • Added tests for ArcadeDB's concurrency behavior, including file locking, thread safety, sequential access, and multi-process limitations.
  • bindings/python/tests/test_core.py
    • Added core functionality tests for ArcadeDB Python bindings, covering database creation, basic CRUD, rich data types, SQL features, transactions, graph operations, error handling, Result methods, Cypher queries, vector search, Unicode support, schema queries, large result set handling, and property type conversions.
  • bindings/python/tests/test_database_utils.py
    • Added tests for new Database utility methods like count_type, is_transaction_active, and drop_database, and their integration.
  • bindings/python/tests/test_exporter.py
    • Added tests for database export functionality, covering JSONL, GraphML, GraphSON, and CSV exports, including filters, overwrite protection, invalid formats, empty databases, and round-trip verification.
  • bindings/python/tests/test_gremlin.py
    • Added tests for Gremlin query language support, including basic queries and graph traversal.
  • bindings/python/tests/test_resultset.py
    • Added tests for enhanced ResultSet and Result functionality, including to_list, to_dataframe, iter_chunks, count, first, one, iteration patterns, representation, complex queries, empty result handling, and reusability.
  • bindings/python/tests/test_server.py
    • Added tests for server and Studio management via the ArcadeDBServer class, covering server creation, database operations through the server, custom configuration, and context manager usage.
  • bindings/python/tests/test_server_patterns.py
    • Added tests for ArcadeDB server access patterns, comparing Java API standalone, Java API server-managed, and HTTP API access, including performance comparisons and thread safety.
  • bindings/python/tests/test_type_conversion.py
    • Added tests for type conversion between Java and Python, covering basic types, Decimal, Date/DateTime, collections (List, Map), nested collections, property names, to_dict, to_json, Python to Java conversion, and array conversion.
  • bindings/python/write_version.py
    • Added a script to write the extracted version to _version.py during the build process.
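
The version handling described for extract_version.py and write_version.py comes down to reading the Maven version and mapping it to a PEP 440 string. The sketch below is one plausible way to do that, not the script from this PR: the function names, the `-SNAPSHOT` to `.devN` mapping, and the sample POM are assumptions, and the development/release modes and Python-specific patch numbers mentioned in the changelog are not reproduced.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def read_pom_version(pom_xml: str) -> str:
    """Return the text of the top-level <version> element of a POM."""
    root = ET.fromstring(pom_xml)
    node = root.find("m:version", POM_NS)
    if node is None or not node.text:
        raise ValueError("no <version> element found at the top level of the POM")
    return node.text.strip()


def to_pep440(maven_version: str, dev_number: int = 0) -> str:
    """Map 'X.Y.Z' to 'X.Y.Z' and 'X.Y.Z-SNAPSHOT' to 'X.Y.Z.devN' (assumed rule)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(-SNAPSHOT)?", maven_version)
    if match is None:
        raise ValueError(f"unsupported version format: {maven_version!r}")
    release, snapshot = match.groups()
    return f"{release}.dev{dev_number}" if snapshot else release


# Hypothetical, heavily trimmed POM used only for this example.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <artifactId>example-parent</artifactId>
  <version>25.10.1-SNAPSHOT</version>
</project>
"""

if __name__ == "__main__":
    print(to_pep440(read_pom_version(SAMPLE_POM)))
```

Snapshot builds map to a `.devN` suffix here because PEP 440 has no direct equivalent of Maven's `-SNAPSHOT` qualifier; other mappings are equally valid.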

Ignored Files

  • Ignored by pattern: .github/workflows/** (5)
    • .github/workflows/mvn-deploy.yml
    • .github/workflows/mvn-test.yml
    • .github/workflows/studio-security-audit.yml
    • .github/workflows/test-python-bindings.yml
    • .github/workflows/test-python-examples.yml

Activity

  • mergify[bot] reported that all CI jobs passed for commit c6fa80b.
  • gemini-code-assist[bot] encountered an error while trying to create a summary.
  • robfrank requested a new summary from gemini-code-assist[bot].
  • github-advanced-security[bot] posted multiple comments regarding missing permissions in various workflows, suggesting explicit permissions blocks.

@robfrank robfrank merged commit 30017d1 into main Nov 5, 2025
32 of 36 checks passed
@robfrank robfrank deleted the python-embedded branch November 5, 2025 10:55

Development

Successfully merging this pull request may close these issues.

Support for python embedded package

2 participants