
#3971 fix(postgres): return RowDescription with schema columns for empty SELECT results #4118

Merged
robfrank merged 2 commits into main from fix/3971-postgres-empty-result-schema on May 6, 2026

Conversation

@robfrank
Collaborator

@robfrank robfrank commented May 6, 2026

Summary

  • Fixes #3971 ([BUG] Postgres Protocol Plugin does not return schema for empty query results): the PostgreSQL wire protocol returned a RowDescription with 0 columns when a SELECT query returned 0 rows (e.g. Spark/PySpark's WHERE 1=0 schema probe)
  • Adds a schema-discovery fallback via getColumnsFromQuerySchema() in three protocol paths: describeCommand (type='P'), executeCommand (when RowDescription has not yet been sent), and queryCommand (simple query)
  • Restores compatibility with Apache Spark / PySpark JDBC schema probing
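
For context, the probe this restores looks roughly like this from a JDBC client (a minimal sketch: the URL, credentials and type name are illustrative placeholders, not taken from the PR):

import java.sql.*;

// Minimal sketch of a Spark-style schema probe over JDBC against the Postgres plugin.
public class SchemaProbeExample {
  public static void main(final String[] args) throws SQLException {
    try (final Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost:5432/mydb", "root", "password");
         final Statement st = conn.createStatement();
         final ResultSet rs = st.executeQuery("SELECT * FROM MyType WHERE 1=0")) {
      final ResultSetMetaData meta = rs.getMetaData();
      // Before this fix the server sent a 0-column RowDescription here, so getColumnCount() was 0
      // and the client cached an empty schema; with the fix the type's columns are reported even with 0 rows.
      for (int i = 1; i <= meta.getColumnCount(); i++)
        System.out.println(meta.getColumnName(i) + " (" + meta.getColumnTypeName(i) + ")");
    }
  }
}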

Root cause

When getColumns() is called on an empty result set it correctly returns an empty map (no rows = no property names to inspect), but no fallback to type schema was attempted. PySpark caches the 0-column schema and later renders N empty rows even though data exists.

The primary path triggered by PgJDBC's default extended protocol is describeCommand type='P' (Describe Portal): it executes the query, gets 0 rows, then getColumns(emptyList) returns empty and a 0-column RowDescription is sent.

Test plan

  • New tests in PostgresProtocolIT:
    • selectWhere1Eq0ReturnsSchemaColumns (simple Statement.executeQuery)
    • selectStarWhere1Eq0ReturnsSchemaColumns (Spark's exact SELECT * probe)
    • preparedSelectWhere1Eq0ReturnsSchemaColumns (PreparedStatement path)
  • Each test verifies ResultSetMetaData.getColumnCount() > 0 and that projected columns appear in metadata even with 0 rows returned
  • Full PostgresProtocolIT suite (69 tests) passes with no regressions

🤖 Generated with Claude Code

…pty SELECT results

Spark/PySpark probes schema by sending SELECT ... WHERE 1=0 and expects
RowDescription with column metadata. ArcadeDB's Postgres wire protocol
returned 0 columns when results were empty, causing clients to create
schemas with no columns and display empty rows.

Fix adds a schema-discovery fallback via getColumnsFromQuerySchema() in
all three protocol paths that build column metadata from result sets:
Describe Portal, Execute when RowDescription not yet sent, and the
simple query path.
@codacy-production

codacy-production Bot commented May 6, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 17 complexity

Metric Results
Complexity 17

View in Codacy

🟢 Coverage 88.24% diff coverage · -7.81% coverage variation

Metric Results
Coverage variation -7.81% coverage variation
Diff coverage 88.24% diff coverage

View coverage diff in Codacy

Coverage variation details
                                   Coverable lines    Covered lines      Coverage
Common ancestor commit (c485040)   123784             90948              73.47%
Head commit (1fd3a8a)              155200 (+31416)    101916 (+10968)    65.67% (-7.81%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>
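With the figures above: 65.67% (head) - 73.47% (common ancestor) ≈ -7.81% (computed from the exact line counts, 101916/155200 and 90948/123784, before rounding).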

Diff coverage details
                       Coverable lines    Covered lines    Diff coverage
Pull request (#4118)   17                 15               88.24%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%
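With the figures above: 15 / 17 × 100% ≈ 88.24%.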


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements a fallback mechanism to provide column metadata for empty result sets, which is necessary for schema discovery in tools like Apache Spark. The changes affect the PostgreSQL network executor and include comprehensive integration tests. Review feedback recommends restricting the fallback logic to SELECT queries to ensure protocol compliance, eliminating redundant conditional checks, and refactoring the repeated logic into a shared helper method.

Comment on lines +294 to +298
if (portal.columns.isEmpty() && portal.cachedResultSet.isEmpty()) {
  final Map<String, PostgresType> schemaColumns = getColumnsFromQuerySchema(portal.query);
  if (schemaColumns != null && !schemaColumns.isEmpty())
    portal.columns = schemaColumns;
}
Contributor


medium

The schema discovery fallback should be restricted to SELECT queries. Other commands like DELETE FROM or UPDATE might contain the " FROM " substring used by getColumnsFromQuerySchema for parsing, but they should not return a RowDescription message in the PostgreSQL protocol unless they have a RETURNING clause. Adding a check for the query type ensures protocol compliance.

Suggested change
-  if (portal.columns.isEmpty() && portal.cachedResultSet.isEmpty()) {
-    final Map<String, PostgresType> schemaColumns = getColumnsFromQuerySchema(portal.query);
-    if (schemaColumns != null && !schemaColumns.isEmpty())
-      portal.columns = schemaColumns;
-  }
+  if (portal.columns.isEmpty() && portal.cachedResultSet.isEmpty() && portal.query.toUpperCase(Locale.ENGLISH).startsWith("SELECT")) {
+    final Map<String, PostgresType> schemaColumns = getColumnsFromQuerySchema(portal.query);
+    if (schemaColumns != null && !schemaColumns.isEmpty())
+      portal.columns = schemaColumns;
+  }

Collaborator Author


Fixed in 1fd3a8a. The SELECT guard now lives inside the new resolveEmptyResultSchemaColumns helper so all three call sites share it.
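
For readers who don't want to open the diff, a sketch of what such a helper and one call site could look like (an approximation reconstructed from this thread, not the exact merged code in 1fd3a8a; assumes java.util.Locale, java.util.Collections and java.util.Map are imported):

// Sketch only: centralizes the SELECT guard and the null/empty handling for all three call sites.
private Map<String, PostgresType> resolveEmptyResultSchemaColumns(final String query) {
  if (!query.toUpperCase(Locale.ENGLISH).startsWith("SELECT"))
    return Collections.emptyMap();
  final Map<String, PostgresType> schemaColumns = getColumnsFromQuerySchema(query);
  return schemaColumns != null ? schemaColumns : Collections.emptyMap();
}

// Call-site shape, e.g. in describeCommand:
if (portal.columns.isEmpty() && portal.cachedResultSet.isEmpty()) {
  final Map<String, PostgresType> schemaColumns = resolveEmptyResultSchemaColumns(portal.query);
  if (!schemaColumns.isEmpty())
    portal.columns = schemaColumns;
}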

Comment on lines +378 to +382
if (portal.columns.isEmpty() && portal.cachedResultSet.isEmpty() && portal.isExpectingResult) {
  final Map<String, PostgresType> schemaColumns = getColumnsFromQuerySchema(portal.query);
  if (schemaColumns != null && !schemaColumns.isEmpty())
    portal.columns = schemaColumns;
}
Contributor


medium

Similar to the describeCommand path, the fallback to schema columns should be restricted to SELECT queries. Additionally, the check for portal.isExpectingResult is redundant here as the entire block is already nested within an if (portal.isExpectingResult) check at line 370.

Suggested change
-  if (portal.columns.isEmpty() && portal.cachedResultSet.isEmpty() && portal.isExpectingResult) {
-    final Map<String, PostgresType> schemaColumns = getColumnsFromQuerySchema(portal.query);
-    if (schemaColumns != null && !schemaColumns.isEmpty())
-      portal.columns = schemaColumns;
-  }
+  if (portal.columns.isEmpty() && portal.cachedResultSet.isEmpty() && portal.query.toUpperCase(Locale.ENGLISH).startsWith("SELECT")) {
+    final Map<String, PostgresType> schemaColumns = getColumnsFromQuerySchema(portal.query);
+    if (schemaColumns != null && !schemaColumns.isEmpty())
+      portal.columns = schemaColumns;
+  }

Collaborator Author


Fixed in 1fd3a8a. Redundant portal.isExpectingResult removed; SELECT guard centralized in the helper.

Comment on lines +525 to +529
if (columns.isEmpty() && cachedResultSet.isEmpty() && upperCaseText.startsWith("SELECT")) {
  final Map<String, PostgresType> schemaColumns = getColumnsFromQuerySchema(query.query);
  if (schemaColumns != null && !schemaColumns.isEmpty())
    columns = schemaColumns;
}
Contributor


medium

The logic for schema column fallback is duplicated in three different methods (describeCommand, executeCommand, and queryCommand). Consider refactoring this into a single helper method to improve maintainability and ensure consistent behavior across all protocol paths.

Collaborator Author


Refactored in 1fd3a8a - extracted resolveEmptyResultSchemaColumns(query) and replaced the duplicated 4-line block at all three sites.

@claude

claude Bot commented May 6, 2026

Code Review: Fix PostgreSQL Wire Protocol Empty Result Schema (#3971)

Overview

This PR fixes a real compatibility issue: when WHERE 1=0 returns 0 rows, the PostgreSQL wire protocol was sending a 0-column RowDescription, breaking Spark/PySpark JDBC schema probing. The fix adds a getColumnsFromQuerySchema() fallback in three code paths. The root cause analysis is correct and the approach is reasonable for a schema-less database like ArcadeDB.


Correctness Issues

1. Column projection mismatch (behavioral bug)

getColumnsFromQuerySchema() returns ALL columns of the type - either from a sample row or schema-defined properties. But for a projected query like SELECT name FROM SparkSchemaTest WHERE 1=0, the JDBC spec requires ResultSetMetaData.getColumnCount() to return 1 (just name), not all columns in the type plus system columns.

The tests pass because they only assert assertThat(meta.getColumnCount()).isGreaterThan(0) and check that named columns are present - they do not verify that extra columns are absent. Spark may see unexpected extra columns in the schema.

2. Fragile FROM-clause parsing

The typeName extraction uses simple string indexing on " FROM ". This silently fails or extracts the wrong name for:

  • JOINs: SELECT a.x, b.y FROM TypeA a JOIN TypeB b ON a.id = b.id WHERE 1=0
  • Subqueries: SELECT * FROM (SELECT FROM TypeA) sub WHERE 1=0
  • Quoted identifiers: SELECT * FROM "My Type" WHERE 1=0
  • Aliases: SELECT t.x FROM TypeA t WHERE 1=0

These cases silently return null, which is safe (no crash), but inconsistent - the Spark compatibility story only works for the simplest SELECT * FROM Type WHERE 1=0 form.
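
To make the limitation concrete, the extraction being described is roughly of this shape (a sketch reconstructed from the review comments, not the exact source; the method name is an assumption):

// Naive extraction: take the token after " FROM " and trim it at the first clause keyword.
// For joins/subqueries/aliases the extracted string is not a plain type name, the subsequent
// schema lookup fails, and the caller ends up with null (no fallback) rather than a crash.
static String extractTypeName(final String query) {
  final int fromIdx = query.toUpperCase(java.util.Locale.ENGLISH).indexOf(" FROM ");
  if (fromIdx < 0)
    return null;
  String typeName = query.substring(fromIdx + " FROM ".length()).trim();
  for (final String terminator : new String[] { " WHERE ", " LIMIT ", " ORDER ", " GROUP ", ";" }) {
    final int idx = typeName.toUpperCase(java.util.Locale.ENGLISH).indexOf(terminator);
    if (idx >= 0)
      typeName = typeName.substring(0, idx).trim();
  }
  return typeName.isEmpty() ? null : typeName;
}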

3. Condition inconsistency in describeCommand

At line 294, the fallback condition is:

if (portal.columns.isEmpty() && portal.cachedResultSet.isEmpty()) {

The executeCommand path at line 378 additionally checks && portal.isExpectingResult. The describeCommand check is already inside an if (portal.isExpectingResult) block so it is effectively equivalent, but the asymmetry is worth documenting or unifying to reduce future confusion.

4. Canonical type name not used in sample query

After database.getSchema().getType(typeName) validates the type exists, the original user-supplied typeName string is still used in the sample SQL rather than the canonical name from the schema object. This is low-risk (because the schema lookup acts as a whitelist), but it is better practice to use the validated canonical form:

final DocumentType docType = database.getSchema().getType(typeName);
if (docType == null) return null;
// Use docType.getName() instead of typeName below
final String sampleQuery = "SELECT FROM " + docType.getName() + " LIMIT 1";

Performance Concern

For every SELECT that returns 0 rows, the fallback now executes an additional SELECT FROM TypeName LIMIT 1. In Spark's schema-probe scenario this doubles the query load on table open. For large tables, getting 1 row is cheap with an index but adds latency on each probe. For an empty table, it falls through to the schema-property walk instead, which is fine.

This is an acceptable tradeoff for a schema probe, but a comment on the performance trade-off would help future readers.


Test Coverage Gaps

5. Missing test for truly empty table (main edge case)

All three new tests insert at least one row before querying with WHERE 1=0. This means the code path at lines 836-854 (the schema-property fallback when the type has no rows at all) is never exercised. The fix's primary audience is Spark connecting to an empty table - that path must be tested:

@Test
void selectWhere1Eq0OnEmptyTableReturnsSchemaColumns() throws Exception {
    try (var conn = getConnection(); var st = conn.createStatement()) {
        // Create type with defined properties but NO rows
        st.execute("CREATE DOCUMENT TYPE EmptySparkTest IF NOT EXISTS");
        st.execute("ALTER TYPE EmptySparkTest ADD PROPERTY name STRING");
        st.execute("ALTER TYPE EmptySparkTest ADD PROPERTY age INTEGER");

        ResultSet rs = st.executeQuery("SELECT name, age FROM EmptySparkTest WHERE 1=0");
        assertThat(rs.next()).isFalse();
        ResultSetMetaData meta = rs.getMetaData();
        assertThat(meta.getColumnCount()).isGreaterThan(0);
    }
}

6. Assertions are too weak

assertThat(meta.getColumnCount()).isGreaterThan(0) will pass even if only @rid is returned. For SELECT name, age FROM Type WHERE 1=0 the count should be exactly 2 (or 5 counting system columns if they are always included). The assertions should verify the exact column list, not just that something is non-zero.

7. Javadoc-style comments on test methods

The three new tests use /** ... */ doc-comments. The existing test at line 684 (preparedStatementDescribe) does not. Minor style inconsistency - prefer single-line or block comments on tests.


Minor / Suggestions

  • The null vs empty-map return from getColumnsFromQuerySchema() forces callers to do if (schemaColumns != null && !schemaColumns.isEmpty()) in three identical places. Returning an empty map instead of null would simplify callers and make the three call sites identical enough to extract a helper.
  • The queryCommand path uses upperCaseText.startsWith("SELECT") to gate the fallback, while the extended-protocol paths use portal.isExpectingResult. Both are reasonable proxies for the same intent; adding a comment explaining why startsWith("SELECT") is used there (since there is no isExpectingResult flag in the simple query path) would help future readers.
  • The schema: prefix check at line 807 lowercases only the prefix to do the comparison, which is correct, but a brief comment explaining what schema: queries are (internal system catalog queries) would help.

Summary

Area                                            Status
Root cause analysis                             Correct
Three-path coverage (describe/execute/simple)   Good
Column projection accuracy                      Needs fix - returns too many columns
FROM-clause parsing robustness                  Limited - fails for JOINs/subqueries
Empty table test coverage                       Missing
Assertion strength                              Weak
Performance                                     Acceptable with caveat
Security                                        Safe (schema lookup acts as whitelist)

The fix addresses the reported issue, but the column-projection mismatch and missing empty-table test are worth addressing before merge. The FROM-clause parsing limitation should at least be documented.

🤖 Generated with Claude Code

…dd empty-type test

- Extract resolveEmptyResultSchemaColumns helper to dedupe the three call sites
- Centralize the SELECT guard inside the helper so DELETE FROM / UPDATE
  cannot accidentally drive RowDescription off a parsed type name
- Drop the redundant portal.isExpectingResult check in executeCommand
  (already inside that conditional)
- Add selectWhere1Eq0OnEmptyTypeReturnsSchemaColumns to cover the
  schema-property fallback path (truly empty type with defined properties)
- Tighten test assertions to verify exact projected column names
@robfrank
Collaborator Author

robfrank commented May 6, 2026

Round-2 changes in 1fd3a8a.

Addressed

  • Extracted resolveEmptyResultSchemaColumns(query) helper; the three call sites now share the SELECT guard, so DELETE FROM / UPDATE cannot trigger schema lookup off a parsed type name (addresses all three review comments above).
  • Removed the redundant portal.isExpectingResult check in executeCommand (already inside that conditional).
  • Added selectWhere1Eq0OnEmptyTypeReturnsSchemaColumns to cover the schema-property fallback path - a type with declared properties but zero rows. This is the primary Spark-on-empty-table case and was previously uncovered.
  • Tightened the existing tests: assertions now verify exact projected column names via a columnNames(meta) helper instead of the weaker getColumnCount() > 0.
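
For illustration, such a helper can be as small as the following sketch (not necessarily the merged test code):

// Collects the column names exposed by ResultSetMetaData so tests can assert the exact projected set;
// names are normalized to lower case so the assertions are case-insensitive.
private static java.util.Set<String> columnNames(final ResultSetMetaData meta) throws SQLException {
  final java.util.Set<String> names = new java.util.HashSet<>();
  for (int i = 1; i <= meta.getColumnCount(); i++)
    names.add(meta.getColumnName(i).toLowerCase(java.util.Locale.ENGLISH));
  return names;
}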

Deferred (not introduced by this PR)

  • Column-projection mismatch: getColumnsFromQuerySchema returns all type columns rather than only the projected ones. This is the existing behavior of the DESCRIBE 'S' path (line 311-312), present before this PR. Properly projecting requires SQL-grammar work that's out of scope for the empty-result fix. Spark's actual probe form is SELECT * FROM type WHERE 1=0, which this fix handles correctly.
  • Fragile FROM-clause parsing for JOINs/subqueries/quoted identifiers: same source - inherited from getColumnsFromQuerySchema. The parser returns null for those cases, which is safe (no crash, no schema fallback).

All 70 PostgresProtocolIT tests pass (70/70).

@claude

claude Bot commented May 6, 2026

Code Review: Fix PostgreSQL Wire Protocol Empty Result Schema (#3971)

Overview

This PR fixes a genuine compatibility bug: when WHERE 1=0 returns 0 rows, the PostgreSQL wire protocol was sending a 0-column RowDescription, breaking Spark/PySpark JDBC schema probing. The fix adds a resolveEmptyResultSchemaColumns() fallback applied in all three protocol paths (describeCommand, executeCommand, queryCommand). The root cause analysis is correct and the approach is sound.


Issues

1. Column projection mismatch (behavioral bug)

getColumnsFromQuerySchema() returns ALL columns of the type - from a sample row or schema properties. For SELECT name, age FROM SparkSchemaTest WHERE 1=0, the JDBC spec requires ResultSetMetaData.getColumnCount() == 2, not all columns of the type plus system columns (@rid, @type, @cat).

The tests only assert contains("name", "age") - they do not verify the column count matches what was projected. Spark may see unexpected extra columns in the schema and behave incorrectly downstream. A test like this would expose it:

assertThat(rs.getMetaData().getColumnCount()).isEqualTo(2); // currently fails

2. System columns always injected when type has no rows

In the empty-type fallback path (lines 840-842), @rid, @type, and @cat are always prepended, even when the query explicitly selects specific columns. For SELECT title, score FROM EmptySparkTest WHERE 1=0, Spark would receive [@rid, @type, @cat, title, score] - that is not what was requested.

3. Fragile FROM-clause parsing

The typeName extraction uses indexOf(" FROM "). This silently fails or extracts the wrong name for JOINs, subqueries, quoted identifiers, and aliases. All silently return null, which avoids a crash but means the Spark compatibility story only holds for the simplest single-table SELECT * FROM Type WHERE 1=0 form. That is probably fine for the documented use case, but should be noted in a comment so future maintainers understand the scope.

4. Missing Locale in getColumnsFromQuerySchema

Line 788 uses query.toUpperCase() without a Locale, while resolveEmptyResultSchemaColumns (line 871) correctly uses toUpperCase(Locale.ENGLISH). The inner method should be consistent:

final String upperQuery = query.toUpperCase(Locale.ENGLISH);

5. Constant array allocated on every call

Line 798 allocates a new String[] on every invocation:

for (String terminator : new String[]{" WHERE ", " LIMIT ", " ORDER ", " GROUP ", ";"}) {

This should be a private static final constant to reduce GC pressure (per ArcadeDB's performance guidelines).
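
For example (sketch; the constant name is an assumption):

private static final String[] QUERY_CLAUSE_TERMINATORS = { " WHERE ", " LIMIT ", " ORDER ", " GROUP ", ";" };

// ... and inside getColumnsFromQuerySchema():
for (final String terminator : QUERY_CLAUSE_TERMINATORS) {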

6. Broad exception catch in getColumnsFromQuerySchema

The catch (Exception e) at line 856 silently swallows all errors except when DEBUG is on. Schema-not-found is expected, but other exceptions (OOM, assertion errors, runtime bugs) should not be silently ignored. Consider catching only SchemaException or similar checked exceptions and letting unexpected ones propagate.
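
A narrower shape could look like this (sketch; assumes SchemaException is the expected "type not found" failure mode):

try {
  // ... parse the type name, look it up in the schema, run the LIMIT 1 sample query ...
} catch (final SchemaException e) {
  // expected when the parsed token is not a known type: no schema fallback available
  return null;
}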

7. Test helper uses fully qualified class names instead of imports

The columnNames() helper (line 995) uses java.util.Set, java.util.HashSet, and java.util.Locale fully qualified. Per existing code style in this file, these should be imported at the top.

8. Missing column count assertions in tests

As noted in issue #1, the tests do not assert that no unexpected extra columns are present. Each test should verify both that expected columns are present AND that no unexpected columns appear (or at minimum that the count equals the number of projected columns).


Minor Points

  • typeName.toUpperCase() is called on every loop iteration inside the terminator loop (line 799). Computing it once before the loop avoids repeated allocations.
  • The duplicate if (portal.columns.isEmpty() && portal.cachedResultSet.isEmpty()) block in executeCommand is inside if (!portal.rowDescriptionSent), which is already inside if (portal.isExpectingResult). The describeCommand block is also inside if (portal.isExpectingResult). The logic is equivalent but asymmetric - a brief comment noting "isExpectingResult already guaranteed by enclosing if" would help future readers.
  • The Javadoc comment on resolveEmptyResultSchemaColumns is helpful and well-written.

Summary

The fix is directionally correct and solves a real Spark/PySpark compatibility problem. The main concern is the column projection mismatch: clients requesting SELECT a, b receive metadata for all columns in the type. This may work acceptably for Spark's schema probe (it only checks the type exists, not the exact column set), but it is a protocol deviation. The tests should be tightened to catch this, or the behavior documented explicitly as a known limitation of schema-less schema discovery.

@robfrank robfrank added this to the 26.5.1 milestone May 6, 2026
@robfrank robfrank merged commit 39bf4db into main May 6, 2026
20 of 21 checks passed
@robfrank robfrank deleted the fix/3971-postgres-empty-result-schema branch May 6, 2026 20:39
@codecov

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 41.17647% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.36%. Comparing base (c485040) to head (1fd3a8a).
⚠️ Report is 6 commits behind head on main.

Files with missing lines                                 Patch %   Lines
...com/arcadedb/postgres/PostgresNetworkExecutor.java    41.17%    2 Missing and 8 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4118      +/-   ##
==========================================
- Coverage   64.48%   64.36%   -0.12%     
==========================================
  Files        1627     1627              
  Lines      123784   123851      +67     
  Branches    26373    26391      +18     
==========================================
- Hits        79822    79719     -103     
- Misses      32866    32935      +69     
- Partials    11096    11197     +101     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.



