Skip to content

feat: Setup Testcontainers infrastructure for vector search tests (#528)#533

Closed
robfrank wants to merge 2 commits intomainfrom
feature/528-testcontainers-infrastructure
Closed

feat: Setup Testcontainers infrastructure for vector search tests (#528)#533
robfrank wants to merge 2 commits intomainfrom
feature/528-testcontainers-infrastructure

Conversation

@robfrank
Copy link
Copy Markdown
Owner

Implemented Phase 1 of test refactoring plan to replace Mockito mocks with real production code using Testcontainers + ArcadeDB.

Changes:

  • Added Awaitility 4.2.0 dependency for async test verification
  • Created ArcadeDbContainer: Custom Testcontainer for ArcadeDB 25.11.1
    • Configures database with vector index (LSM_VECTOR, COSINE, 384 dimensions)
    • Provides clean database creation and schema initialization
  • Created ArcadeDbTestBase: Abstract base class for integration tests
    • Uses shared container for performance
    • Provides real ContentPersistenceAdapter and ArcadeContentRepository
  • Created FakeEmbeddingGenerator: Deterministic fake for EmbeddingGenerator
    • Generates consistent 384-dimensional embeddings
    • Provides test helpers for caching and error injection
  • Added verification tests to ensure infrastructure works correctly

Benefits achieved:

  • Reuses production code (ContentPersistenceAdapter, ArcadeContentRepository)
  • Catches real SQL errors, schema issues, and vector index configuration problems
  • Lower maintenance (~50 lines container code vs ~200 lines stub code)
  • All 424 existing tests pass (0 failures, 0 errors)

Discovered issues for Phase 2:

  • ContentMapper null field handling needs improvement
  • Embedding type conversion (List vs float[])

🤖 Generated with Claude Code

Implemented Phase 1 of test refactoring plan to replace Mockito mocks with real production code using Testcontainers + ArcadeDB.

Changes:
- Added Awaitility 4.2.0 dependency for async test verification
- Created ArcadeDbContainer: Custom Testcontainer for ArcadeDB 25.11.1
  - Configures database with vector index (LSM_VECTOR, COSINE, 384 dimensions)
  - Provides clean database creation and schema initialization
- Created ArcadeDbTestBase: Abstract base class for integration tests
  - Uses shared container for performance
  - Provides real ContentPersistenceAdapter and ArcadeContentRepository
- Created FakeEmbeddingGenerator: Deterministic fake for EmbeddingGenerator
  - Generates consistent 384-dimensional embeddings
  - Provides test helpers for caching and error injection
- Added verification tests to ensure infrastructure works correctly

Benefits achieved:
- Reuses production code (ContentPersistenceAdapter, ArcadeContentRepository)
- Catches real SQL errors, schema issues, and vector index configuration problems
- Lower maintenance (~50 lines container code vs ~200 lines stub code)
- All 424 existing tests pass (0 failures, 0 errors)

Discovered issues for Phase 2:
- ContentMapper null field handling needs improvement
- Embedding type conversion (List<Float> vs float[])

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello @robfrank, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request marks the completion of Phase 1 of a test refactoring plan, focusing on enhancing the reliability and realism of vector search tests. By integrating Testcontainers with ArcadeDB, the project moves from mock-based testing to using actual production code and a real database environment. This change not only provides a more robust testing foundation but has also proactively identified critical mapping issues that will be addressed in subsequent phases.

Highlights

  • Test Infrastructure Setup: Established Testcontainers infrastructure for vector search tests, replacing Mockito mocks with real production code and an ArcadeDB instance.
  • New Components: Introduced ArcadeDbContainer for managing ArcadeDB instances with vector index configuration, ArcadeDbTestBase as an abstract base for integration tests, and FakeEmbeddingGenerator for deterministic embedding generation.
  • Dependency Update: Added Awaitility 4.2.0 to pom.xml for asynchronous test verification.
  • Early Bug Detection: The new infrastructure successfully identified potential ContentMapper issues related to null field handling and embedding type conversion, validating the approach.
  • Test Stability: All 424 existing tests continue to pass without regression, indicating a stable transition to the new testing paradigm.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Dec 23, 2025

🧪 CI Insights

Here's what we observed from your CI run for 084a67f.

🟢 All jobs passed!

But CI Insights is watching 👀

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request does a great job of setting up the Testcontainers infrastructure for integration tests, which is a solid step towards more reliable testing. The new components like ArcadeDbContainer and ArcadeDbTestBase are well-structured. I've left a few comments with suggestions for improvement, mainly focusing on thread-safety in the FakeEmbeddingGenerator and ensuring proper resource management in the ArcadeDbContainer. There's also a minor point about a redundant cleanup step in the test base class. Overall, this is a strong contribution that will improve the project's test quality.

Comment on lines +19 to +21
private final Map<String, List<Float>> embeddingCache = new HashMap<>();
private final Function<String, List<Float>> embeddingFunction;
private @Nullable RuntimeException exceptionToThrow = null;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This class has a couple of thread-safety issues that could lead to flaky tests in concurrent scenarios:

  1. embeddingCache uses HashMap, which is not thread-safe. Operations like computeIfAbsent can behave unexpectedly under concurrent load. It's better to use java.util.concurrent.ConcurrentHashMap.
  2. The exceptionToThrow field is accessed from multiple threads without any memory synchronization (volatile or synchronized). This can lead to visibility problems where one thread doesn't see the update from another.

I'd recommend making the field volatile to ensure visibility.

Suggested change
private final Map<String, List<Float>> embeddingCache = new HashMap<>();
private final Function<String, List<Float>> embeddingFunction;
private @Nullable RuntimeException exceptionToThrow = null;
private final Map<String, List<Float>> embeddingCache = new java.util.concurrent.ConcurrentHashMap<>();
private final Function<String, List<Float>> embeddingFunction;
private volatile @Nullable RuntimeException exceptionToThrow = null;

Comment on lines +61 to +78
public @NonNull RemoteDatabase createDatabase() {
// Create database if it doesn't exist
RemoteServer server = new RemoteServer(getHost(), getMappedPort(ARCADE_PORT), "root", getRootPassword());

// Always drop and recreate database for clean state
if (server.exists(getDatabaseName())) {
server.drop(getDatabaseName());
}
server.create(getDatabaseName());

// Connect to database
RemoteDatabase db = new RemoteDatabase(getHost(), getMappedPort(ARCADE_PORT), getDatabaseName(), "root", getRootPassword());

// Initialize schema with vector index
initializeSchema(db);

return db;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The RemoteServer instance created here is not being closed after use. RemoteServer manages network connections and should be closed to release resources and prevent potential leaks, which can be problematic in CI environments. A try-finally block would ensure it's always closed.

  public @NonNull RemoteDatabase createDatabase() {
    // Create database if it doesn't exist
    RemoteServer server = new RemoteServer(getHost(), getMappedPort(ARCADE_PORT), "root", getRootPassword());
    try {
      // Always drop and recreate database for clean state
      if (server.exists(getDatabaseName())) {
        server.drop(getDatabaseName());
      }
      server.create(getDatabaseName());
    } finally {
      server.close();
    }

    // Connect to database
    RemoteDatabase db = new RemoteDatabase(getHost(), getMappedPort(ARCADE_PORT), getDatabaseName(), "root", getRootPassword());

    // Initialize schema with vector index
    initializeSchema(db);

    return db;
  }

@AfterEach
void tearDownDatabase() {
// Clean up test data
arcadeDb.cleanDatabase(database);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The call to arcadeDb.cleanDatabase(database) is redundant. The @BeforeEach method setUpDatabase already ensures a clean state by dropping and recreating the database via arcadeDb.createDatabase(). You can safely remove this line to avoid unnecessary work after each test.

@claude
Copy link
Copy Markdown

claude bot commented Dec 23, 2025

Pull Request Review: Testcontainers Infrastructure (#533)

Great work on implementing Phase 1 of the test refactoring plan! This PR introduces solid infrastructure for integration testing with real ArcadeDB instances. Here's my detailed review:


Strengths

  1. Well-Aligned with Architecture

    • Follows the Hexagonal Architecture principles documented in .agent/rules/code-style.md
    • Reuses production code (ContentPersistenceAdapter, ArcadeContentRepository) which is excellent for catching real integration issues
    • Clear separation between test infrastructure and test cases
  2. Good Test Infrastructure Design

    • ArcadeDbContainer properly encapsulates container lifecycle and schema initialization
    • ArcadeDbTestBase provides clean abstractions for test classes
    • Static container approach (@Container static) is smart for performance optimization
    • The vector index configuration (LSM_VECTOR, COSINE, 384 dimensions) matches production requirements
  3. Deterministic Test Fake

    • FakeEmbeddingGenerator is well-designed with deterministic behavior
    • Good test helpers (clearCache(), getCacheSize(), throwOnNextCall())
    • Uses Math.sin for consistent embedding generation - clever approach
  4. Excellent Documentation

    • Clear PR description explaining goals and benefits
    • Good tracking document (528-testcontainers-infrastructure.md)
    • Javadoc comments are comprehensive and helpful

🔍 Code Quality Issues

1. Null Handling in ContentMapper (Critical)

The PR description correctly identifies this issue. In ContentMapper.java:60-77, the mapToVertex method doesn't check for null on required fields before calling .set():

vertex.set("htmlContent", content.htmlContent());  // Could be null
vertex.set("textContent", content.textContent());  // Could be null
vertex.set("contentLength", content.contentLength());  // Could be null
vertex.set("mimeType", content.mimeType());  // Could be null

Recommendation: Either:

  • Follow the pattern used for optional fields (lines 71-77) and add null checks for these nullable fields
  • Or ensure ArcadeDB handles null values correctly and document this assumption

2. Thread.sleep() Anti-pattern in Tests

While BackfillEmbeddingsServiceTest (not in this PR but referenced) uses Thread.sleep() for async verification, I see you've added Awaitility 4.2.0 as a dependency. However, the verification test in ArcadeDbContainerTest doesn't need async handling since it's synchronous.

Good news: The Awaitility dependency is correctly added for future Phase 2 work.

3. Inconsistent Exception Handling

In ArcadeDbContainer.java:135-139, the cleanDatabase() method silently swallows all exceptions:

} catch (Exception e) {
  // Ignore errors during cleanup (e.g., if Content type doesn't exist yet)
}

Recommendation:

  • Consider logging these exceptions or being more specific about which exceptions to ignore
  • At minimum, document why swallowing exceptions is safe here

🐛 Potential Bugs

1. Database Recreation on Every Test Method

In ArcadeDbTestBase.java:44-50, @BeforeEach calls arcadeDb.createDatabase() which drops and recreates the entire database for each test method:

if (server.exists(getDatabaseName())) {
  server.drop(getDatabaseName());
}
server.create(getDatabaseName());

Issues:

  • This is expensive and may slow down test suites significantly
  • The cleanDatabase() method in @AfterEach becomes redundant since @BeforeEach drops everything anyway
  • Multiple tests in the same class will repeatedly drop/create the same database

Recommendation:

  • Use cleanDatabase() in @BeforeEach instead of recreating
  • Only create database once (perhaps in @BeforeAll or when container starts)
  • This aligns better with the shared container pattern you're using

2. Resource Leak Risk

In ArcadeDbContainer.java:61, createDatabase() creates a RemoteServer but never closes it:

RemoteServer server = new RemoteServer(...);
// ... uses server ...
// No server.close() call

Recommendation: Use try-with-resources or ensure the server connection is properly closed.


Performance Considerations

  1. Shared Container Strategy

  2. Vector Index Initialization

    • Creating vector index on every database recreation could be slow
    • Once you fix the database recreation issue, this becomes a non-issue

🔒 Security Concerns

  1. Hardcoded Credentials ⚠️

    In ArcadeDbContainer.java:17:

    private static final String ROOT_PASSWORD = "playwithdata";

    While this is acceptable for test containers, consider:

    • Adding a comment explaining this is test-only
    • The password is already documented in ArcadeDB examples, so it's not a secret
    • No action required but good to be aware
  2. No SQL Injection Risks

    • All queries properly use parameterized statements (e.g., db.command("sql", "...", linkId))

🧪 Test Coverage

Current Coverage: Minimal but appropriate for infrastructure verification

  • ArcadeDbContainerTest has 3 basic tests verifying container starts and components initialize
  • This is reasonable for Phase 1

Recommendations for Phase 2:

  • Add tests for actual Content CRUD operations
  • Test vector search functionality once ContentMapper issues are resolved
  • Test edge cases: null embeddings, empty text, large embeddings
  • Consider testing the container cleanup behavior

📋 Best Practices

Excellent Practices:

  1. ✅ Using real production code instead of mocks
  2. ✅ Clear javadoc on all public methods
  3. ✅ Proper use of @NonNull and @Nullable annotations
  4. ✅ Deterministic test fakes
  5. ✅ Comprehensive PR description and tracking document

Could Improve:

  1. ⚠️ Add more inline comments explaining why certain decisions were made (e.g., why drop database on each test)
  2. ⚠️ Consider extracting magic numbers to constants (e.g., vector dimensions: 384, beamWidth: 100, maxConnections: 16)

📝 Minor Issues

  1. Redundant Comment in ArcadeDbContainerTest.java:10-12:

    // Note: Full integration tests with Content persistence will be added in Phase 2
    // after ensuring the ContentMapper correctly handles all field types.

    This is good context, but could be in a class-level Javadoc instead of a comment.

  2. Unused Import Risk: Verify no unused imports exist (didn't spot any, but worth checking)

  3. Consistency: FakeEmbeddingGenerator uses 384 dimensions while BackfillEmbeddingsServiceTest uses arbitrary sizes. Consider documenting the expected dimension as a constant.


🎯 Recommendations Summary

Must Fix Before Merge:

  1. ❗ Fix database recreation issue in @BeforeEach - use cleanDatabase() instead
  2. ❗ Close RemoteServer properly in createDatabase() method

Should Fix Before Merge:

  1. ⚠️ Add null checks in ContentMapper.mapToVertex() for nullable fields
  2. ⚠️ Improve exception handling in cleanDatabase() (at least log)

Consider for Phase 2:

  1. 💡 Extract vector dimensions and index params to constants
  2. 💡 Add comprehensive integration tests for Content CRUD and vector search
  3. 💡 Consider using Awaitility for async test verification in BackfillEmbeddingsServiceTest

Overall Assessment

This is a high-quality PR that successfully achieves Phase 1 goals. The infrastructure is well-designed and follows project conventions. The main concerns are around database lifecycle management and null handling, which should be addressed before merging.

Recommendation: Request Changes - Fix the critical database recreation issue and resource leak, then this will be ready to merge.

Great work on identifying the ContentMapper issues through this infrastructure - that's exactly the kind of benefit integration tests provide! 🎉


🤖 Generated with Claude Code

@claude
Copy link
Copy Markdown

claude bot commented Dec 23, 2025

PR Review: Testcontainers Infrastructure for Vector Search Tests

Summary

This PR implements Phase 1 of a test refactoring plan, introducing Testcontainers infrastructure to replace Mockito mocks with real production code. The changes are well-implemented and demonstrate a solid testing architecture improvement.

✅ Strengths

Architecture & Design

  1. Excellent Test Infrastructure Design: The three-layer approach (ArcadeDbContainerArcadeDbTestBase → test classes) provides good separation of concerns and reusability.
  2. Production Code Reuse: Using real ContentPersistenceAdapter and ArcadeContentRepository instead of mocks will catch real integration issues.
  3. Deterministic Fakes: FakeEmbeddingGenerator is well-designed with deterministic behavior, proper caching, and error injection capabilities.

Code Quality

  1. Clear Documentation: Comprehensive Javadoc comments explain the purpose and usage of each component.
  2. Proper Annotations: Consistent use of @NonNull and @Nullable from jspecify for null safety.
  3. Good Error Handling: Exception handling with meaningful error messages in cleanup code.

🔍 Issues & Recommendations

🔴 Critical Issues

1. Resource Leak in ArcadeDbContainer.createDatabase() (ArcadeDbContainer.java:63)

The RemoteServer instance is never closed, potentially causing resource leaks:

RemoteServer server = new RemoteServer(getHost(), getMappedPort(ARCADE_PORT), "root", getRootPassword());

Recommendation: Use try-with-resources:

try (RemoteServer server = new RemoteServer(getHost(), getMappedPort(ARCADE_PORT), "root", getRootPassword())) {
    if (server.exists(getDatabaseName())) {
        server.drop(getDatabaseName());
    }
    server.create(getDatabaseName());
}

2. Hardcoded Password in Version Control (ArcadeDbContainer.java:18)

private static final String ROOT_PASSWORD = "playwithdata";

While this is test code, hardcoding passwords sets a bad precedent.
Recommendation: Use environment variables or make it configurable, even for tests.

⚠️ Potential Issues

3. Database Recreation Performance Impact (ArcadeDbContainer.java:65-68)

The createDatabase() method always drops and recreates the database for every test. While this ensures isolation, it may impact test performance significantly.

if (server.exists(getDatabaseName())) {
    server.drop(getDatabaseName());
}
server.create(getDatabaseName());

Current approach: Drop/recreate database in @BeforeEach
Alternative: Use cleanDatabase() in @AfterEach (currently implemented but not used)

Recommendation: Monitor test suite performance. If it becomes an issue, consider switching to the cleanup-based approach.

4. Silent Exception Swallowing (ArcadeDbContainer.java:138-140)

catch (Exception e) {
    // Ignore errors during cleanup (e.g., if Content type doesn't exist yet)
}

While the comment explains the intent, silently swallowing exceptions can hide real issues during cleanup.
Recommendation: At minimum, log the exception:

catch (Exception e) {
    // Ignore errors during cleanup (e.g., if Content type doesn't exist yet)
    // But log for debugging if needed
    System.err.println("Cleanup warning: " + e.getMessage());
}

5. Embedding Normalization Issue (FakeEmbeddingGenerator.java:35)

float value = (float) Math.sin(hash + i) * 0.5f;

The comment says "values between -1 and 1" but the code produces values between -0.5 and 0.5. While this might be intentional, the comment is misleading.
Recommendation: Update comment to match reality or adjust the formula if the full range is needed.

💡 Improvements

6. Test Coverage Gap (ArcadeDbContainerTest.java)

The verification test only checks basic infrastructure:

@Test
void containerStarts_shouldInitializeSchema() {
    assertThat(arcadeDb.isRunning()).isTrue();
}

Recommendation: Add at least one integration test that performs actual CRUD operations to validate the full stack works before Phase 2.

7. Magic Numbers (Multiple locations)

  • 384 dimensions (FakeEmbeddingGenerator.java:32, ArcadeDbContainer.java:117)
  • 16 maxConnections (ArcadeDbContainer.java:118)
  • 100 beamWidth (ArcadeDbContainer.java:119)

Recommendation: Extract to named constants:

private static final int EMBEDDING_DIMENSIONS = 384;
private static final int VECTOR_INDEX_MAX_CONNECTIONS = 16;
private static final int VECTOR_INDEX_BEAM_WIDTH = 100;

8. Inconsistent Formatting (LinkControllerTest.java:40, NewLinkServiceTest.java:55)

The automated formatting removed line breaks from Link constructor calls, making them harder to read. While this follows the formatter rules, consider if the original multi-line format was more maintainable.

🔒 Security Considerations

⚡ Performance Considerations

📊 Test Coverage

  • ✅ All 424 existing tests pass
  • ⚠️ New infrastructure has minimal tests (only basic smoke tests)
  • 📝 Phase 2 will add comprehensive tests using this infrastructure

🎯 Overall Assessment

Verdict: ✅ Approve with Minor Revisions

This is a solid foundation for the test refactoring effort. The architecture is sound and the code quality is high. The critical resource leak issue should be fixed before merge, but the other issues are minor and could be addressed in follow-up work.

Priority Fixes Before Merge:

  1. Fix RemoteServer resource leak (Critical)
  2. Fix embedding normalization comment (Minor)

Consider for Phase 2:
3. Add comprehensive CRUD integration test
4. Extract magic numbers to constants
5. Evaluate database recreation performance impact

Great work on this refactoring! The approach of using real production code with Testcontainers is much better than maintaining mock stubs.

@codacy-production
Copy link
Copy Markdown

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
Report missing for 0848b1a1
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (0848b1a) Report Missing Report Missing Report Missing
Head commit (084a67f) 2799 1633 58.34%

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#533) 0 0 ∅ (not applicable)

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

Footnotes

  1. Codacy didn't receive coverage data for the commit, or there was an error processing the received data. Check your integration for errors and validate that your coverage setup is correct.

@robfrank robfrank closed this Jan 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant