Conversation
…pty SELECT results

Spark/PySpark probes schema by sending `SELECT ... WHERE 1=0` and expects a RowDescription with column metadata. ArcadeDB's Postgres wire protocol returned 0 columns when results were empty, causing clients to create schemas with no columns and display empty rows. The fix adds a schema-discovery fallback via `getColumnsFromQuerySchema()` in all three protocol paths that build column metadata from result sets: Describe Portal, Execute (when RowDescription has not yet been sent), and the simple query path.
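The fallback described above can be sketched in isolation. This is a hypothetical, simplified model: the helper name mirrors the PR's `resolveEmptyResultSchemaColumns`, but the `Map<String, String>` stand-in for column metadata and the `schemaColumns` parameter are illustrative only (the real code uses `Map<String, PostgresType>` and queries the live schema).

```java
import java.util.Locale;
import java.util.Map;

public class SchemaFallbackSketch {

  // Hypothetical sketch: decide whether schema-derived columns may replace
  // the (empty) columns derived from an empty result set.
  static Map<String, String> resolveEmptyResultSchemaColumns(final String query,
      final Map<String, String> columnsFromRows, final Map<String, String> schemaColumns) {
    // Only fall back when the result carried no column metadata at all...
    if (!columnsFromRows.isEmpty())
      return columnsFromRows;
    // ...and only for SELECT: DELETE FROM / UPDATE also contain " FROM "
    // but must not produce a RowDescription without a RETURNING clause.
    if (!query.toUpperCase(Locale.ENGLISH).startsWith("SELECT"))
      return columnsFromRows;
    return (schemaColumns == null || schemaColumns.isEmpty()) ? columnsFromRows : schemaColumns;
  }

  public static void main(final String[] args) {
    final Map<String, String> schema = Map.of("name", "varchar", "age", "int4");
    // The WHERE 1=0 probe picks up the 2 schema columns instead of 0
    System.out.println(resolveEmptyResultSchemaColumns("SELECT FROM Person WHERE 1=0", Map.of(), schema).size()); // 2
    // A DELETE never picks up schema columns
    System.out.println(resolveEmptyResultSchemaColumns("DELETE FROM Person", Map.of(), schema).size()); // 0
  }
}
```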
Up to standards ✅

| Metric | Results |
|---|---|
| 🟢 Issues | |
| Complexity | 17 |
🟢 Coverage: 88.24% diff coverage · -7.81% coverage variation

| Metric | Results |
|---|---|
| Coverage variation | ✅ -7.81% |
| Diff coverage | ✅ 88.24% |

Coverage variation details:

| Commit | Coverable lines | Covered lines | Coverage |
|---|---|---|---|
| Common ancestor (c485040) | 123784 | 90948 | 73.47% |
| Head (1fd3a8a) | 155200 (+31416) | 101916 (+10968) | 65.67% (-7.81%) |

Coverage variation is the difference between the coverage of the head and common ancestor commits of the pull request branch: `<coverage of head commit> - <coverage of common ancestor commit>`

Diff coverage details:

| | Coverable lines | Covered lines | Diff coverage |
|---|---|---|---|
| Pull request (#4118) | 17 | 15 | 88.24% |

Diff coverage is the percentage of lines covered by tests out of the coverable lines the pull request added or modified: `<covered lines added or modified> / <coverable lines added or modified> * 100%`
Code Review
This pull request implements a fallback mechanism to provide column metadata for empty result sets, which is necessary for schema discovery in tools like Apache Spark. The changes affect the PostgreSQL network executor and include comprehensive integration tests. Review feedback recommends restricting the fallback logic to SELECT queries to ensure protocol compliance, eliminating redundant conditional checks, and refactoring the repeated logic into a shared helper method.
```java
if (portal.columns.isEmpty() && portal.cachedResultSet.isEmpty()) {
  final Map<String, PostgresType> schemaColumns = getColumnsFromQuerySchema(portal.query);
  if (schemaColumns != null && !schemaColumns.isEmpty())
    portal.columns = schemaColumns;
}
```
The schema discovery fallback should be restricted to SELECT queries. Other commands like `DELETE FROM` or `UPDATE` might contain the `" FROM "` substring used by `getColumnsFromQuerySchema` for parsing, but they should not return a RowDescription message in the PostgreSQL protocol unless they have a RETURNING clause. Adding a check for the query type ensures protocol compliance.
Suggested change:

```diff
-if (portal.columns.isEmpty() && portal.cachedResultSet.isEmpty()) {
+if (portal.columns.isEmpty() && portal.cachedResultSet.isEmpty() && portal.query.toUpperCase(Locale.ENGLISH).startsWith("SELECT")) {
   final Map<String, PostgresType> schemaColumns = getColumnsFromQuerySchema(portal.query);
   if (schemaColumns != null && !schemaColumns.isEmpty())
     portal.columns = schemaColumns;
 }
```
Fixed in 1fd3a8a. The SELECT guard now lives inside the new `resolveEmptyResultSchemaColumns` helper so all three call sites share it.
```java
if (portal.columns.isEmpty() && portal.cachedResultSet.isEmpty() && portal.isExpectingResult) {
  final Map<String, PostgresType> schemaColumns = getColumnsFromQuerySchema(portal.query);
  if (schemaColumns != null && !schemaColumns.isEmpty())
    portal.columns = schemaColumns;
}
```
Similar to the `describeCommand` path, the fallback to schema columns should be restricted to SELECT queries. Additionally, the check for `portal.isExpectingResult` is redundant here, as the entire block is already nested within an `if (portal.isExpectingResult)` check at line 370.
Suggested change:

```diff
-if (portal.columns.isEmpty() && portal.cachedResultSet.isEmpty() && portal.isExpectingResult) {
+if (portal.columns.isEmpty() && portal.cachedResultSet.isEmpty() && portal.query.toUpperCase(Locale.ENGLISH).startsWith("SELECT")) {
   final Map<String, PostgresType> schemaColumns = getColumnsFromQuerySchema(portal.query);
   if (schemaColumns != null && !schemaColumns.isEmpty())
     portal.columns = schemaColumns;
 }
```
Fixed in 1fd3a8a. Redundant `portal.isExpectingResult` removed; SELECT guard centralized in the helper.
```java
if (columns.isEmpty() && cachedResultSet.isEmpty() && upperCaseText.startsWith("SELECT")) {
  final Map<String, PostgresType> schemaColumns = getColumnsFromQuerySchema(query.query);
  if (schemaColumns != null && !schemaColumns.isEmpty())
    columns = schemaColumns;
}
```
Refactored in 1fd3a8a: extracted `resolveEmptyResultSchemaColumns(query)` and replaced the duplicated 4-line block at all three sites.
Code Review: Fix PostgreSQL Wire Protocol Empty Result Schema (#3971)

**Overview**

This PR fixes a real compatibility issue: a `SELECT ... WHERE 1=0` schema probe against ArcadeDB's Postgres wire protocol came back with a 0-column RowDescription, breaking schema discovery in Spark/PySpark.

**Correctness Issues**

**1. Column projection mismatch (behavioral bug)**

The fallback returns every property defined on the type rather than only the columns the query projects. The tests pass because they only assert `getColumnCount() > 0`, so the mismatch goes undetected.

**2. Fragile FROM-clause parsing**

The type name is recovered by scanning the raw query for the `" FROM "` substring, which breaks on subqueries, quoted identifiers, joins, and aliases.
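To make the fragility concrete, here is a standalone sketch, not ArcadeDB's actual implementation, of the kind of `" FROM "` substring scan being criticized, together with inputs where it goes wrong (the terminator list matches the one quoted later in this thread):

```java
import java.util.Locale;

public class FromParseSketch {

  // Naive substring-based extraction of the type name after " FROM ".
  static String parseTypeName(final String query) {
    final int from = query.toUpperCase(Locale.ENGLISH).indexOf(" FROM ");
    if (from < 0)
      return null;
    String rest = query.substring(from + " FROM ".length()).trim();
    // Cut the name at the first clause terminator found
    for (final String terminator : new String[] { " WHERE ", " LIMIT ", " ORDER ", " GROUP ", ";" }) {
      final int idx = rest.toUpperCase(Locale.ENGLISH).indexOf(terminator);
      if (idx >= 0)
        rest = rest.substring(0, idx);
    }
    return rest.trim();
  }

  public static void main(final String[] args) {
    System.out.println(parseTypeName("SELECT name FROM Person WHERE 1=0")); // Person (works)
    System.out.println(parseTypeName("SELECT FROM (SELECT FROM Person)")); // (SELECT FROM Person) -- not a type name
    System.out.println(parseTypeName("SELECT FROM `My Type` WHERE 1=0")); // `My Type` -- quoting leaks through
  }
}
```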
These cases silently return no usable type name.

**3. Condition inconsistency in `describeCommand`**

At line 294, the fallback condition is:

```java
if (portal.columns.isEmpty() && portal.cachedResultSet.isEmpty()) {
```

The simple-query path guards with `startsWith("SELECT")`, but this path does not, so the call sites behave inconsistently.

**4. Canonical type name not used in sample query**

After resolving the type, the parsed `typeName` should be swapped for the canonical name from the schema:

```java
final DocumentType docType = database.getSchema().getType(typeName);
if (docType == null) return null;
// Use docType.getName() instead of typeName below
final String sampleQuery = "SELECT FROM " + docType.getName() + " LIMIT 1";
```

**Performance Concern**

For every SELECT that returns 0 rows, the fallback now executes an additional schema-probe query. This is an acceptable tradeoff for a schema probe, but a comment on the performance trade-off would help future readers.

**Test Coverage Gaps**

**5. Missing test for truly empty table (main edge case)**

All three new tests insert at least one row before querying with `WHERE 1=0`. A test against a type with defined properties but no rows is missing:

```java
@Test
void selectWhere1Eq0OnEmptyTableReturnsSchemaColumns() throws Exception {
  try (var conn = getConnection(); var st = conn.createStatement()) {
    // Create type with defined properties but NO rows
    st.execute("CREATE DOCUMENT TYPE EmptySparkTest IF NOT EXISTS");
    st.execute("ALTER TYPE EmptySparkTest ADD PROPERTY name STRING");
    st.execute("ALTER TYPE EmptySparkTest ADD PROPERTY age INTEGER");
    ResultSet rs = st.executeQuery("SELECT name, age FROM EmptySparkTest WHERE 1=0");
    assertThat(rs.next()).isFalse();
    ResultSetMetaData meta = rs.getMetaData();
    assertThat(meta.getColumnCount()).isGreaterThan(0);
  }
}
```

**6. Assertions are too weak**

The new tests only check that some columns come back; they should assert the exact projected column names and count.

**7. Javadoc-style comments on test methods**

The three new tests use Javadoc-style `/** ... */` comments where plain comments are conventional.

**Minor / Suggestions**
**Summary**

The fix addresses the reported issue, but the column-projection mismatch and missing empty-table test are worth addressing before merge. The FROM-clause parsing limitation should at least be documented.

🤖 Generated with Claude Code
…dd empty-type test

- Extract `resolveEmptyResultSchemaColumns` helper to dedupe the three call sites
- Centralize the SELECT guard inside the helper so `DELETE FROM` / `UPDATE` cannot accidentally drive RowDescription off a parsed type name
- Drop the redundant `portal.isExpectingResult` check in `executeCommand` (already inside that conditional)
- Add `selectWhere1Eq0OnEmptyTypeReturnsSchemaColumns` to cover the schema-property fallback path (truly empty type with defined properties)
- Tighten test assertions to verify exact projected column names
Round-2 changes in 1fd3a8a.

Addressed

Deferred (not introduced by this PR)

70/70 tests passing.
Code Review: Fix PostgreSQL Wire Protocol Empty Result Schema (#3971)

**Overview**

This PR fixes a genuine compatibility bug: a `SELECT ... WHERE 1=0` probe produced a 0-column RowDescription, breaking Spark/PySpark schema discovery.

**Issues**

**1. Column projection mismatch (behavioral bug)**

The tests only assert that some columns are returned. A stricter assertion on the projected column count currently fails:

```java
assertThat(rs.getMetaData().getColumnCount()).isEqualTo(2); // currently fails
```

**2. System columns always injected when type has no rows**

In the empty-type fallback path (lines 840-842), system columns are added unconditionally, so they appear even when the projection names only user properties.

**3. Fragile FROM-clause parsing**

The `" FROM "` substring parsing remains fragile (subqueries, quoted identifiers, joins), as noted in the previous review.

**4. Missing `Locale` in `toUpperCase`**

Line 788 upper-cases the query without an explicit locale; it should be:

```java
final String upperQuery = query.toUpperCase(Locale.ENGLISH);
```

**5. Constant array allocated on every call**

Line 798 allocates a new array on each invocation:

```java
for (String terminator : new String[]{" WHERE ", " LIMIT ", " ORDER ", " GROUP ", ";"}) {
```

This should be a `private static final` constant to reduce GC pressure (per ArcadeDB's performance guidelines).

**6. Broad exception catch**

The schema-discovery helper catches exceptions broadly, which can mask real errors; a narrower catch would be safer.

**7. Test helper uses fully qualified class names instead of imports**

The test code references classes by their fully qualified names where imports would be cleaner.

**8. Missing column count assertions in tests**

As noted in issue #1, the tests do not assert that no unexpected extra columns are present. Each test should verify both that expected columns are present AND that no unexpected columns appear (or at minimum that the count equals the number of projected columns).

**Minor Points**
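A standalone illustration of the `Locale` point in issue 4 (self-contained demo, unrelated to ArcadeDB's code): under a Turkish locale the dotted lowercase `i` uppercases to `İ` (U+0130), so a locale-default `toUpperCase()` breaks comparisons against keywords containing `I`, such as the `" LIMIT "` terminator:

```java
import java.util.Locale;

public class LocaleUpperDemo {
  public static void main(final String[] args) {
    final Locale turkish = Locale.forLanguageTag("tr-TR");
    // Turkish mapping: 'i' -> 'İ' (U+0130), so " limit " becomes " LİMİT "
    System.out.println(" limit ".toUpperCase(turkish).equals(" LIMIT "));        // false
    System.out.println(" limit ".toUpperCase(Locale.ENGLISH).equals(" LIMIT ")); // true
  }
}
```

Pinning `Locale.ENGLISH` makes the keyword checks behave identically regardless of the JVM's default locale.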
**Summary**

The fix is directionally correct and solves a real Spark/PySpark compatibility problem. The main concern is the column projection mismatch: clients requesting a specific projection get back the full set of type properties instead.
Codecov Report

❌ Patch coverage is …
Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##               main    #4118      +/-   ##
============================================
- Coverage     64.48%   64.36%    -0.12%
============================================
  Files          1627     1627
  Lines        123784   123851       +67
  Branches      26373    26391       +18
============================================
- Hits          79822    79719      -103
- Misses        32866    32935       +69
- Partials      11096    11197      +101
```

☔ View full report in Codecov by Sentry.
Summary

- Return schema-derived column metadata for empty SELECT results (the `WHERE 1=0` schema probe)
- Add a fallback via `getColumnsFromQuerySchema()` in three protocol paths: `describeCommand` (type='P'), `executeCommand` (when RowDescription not yet sent), and `queryCommand` (simple query)

Root cause
When `getColumns()` is called on an empty result set, it correctly returns an empty map (no rows means no property names to inspect), but no fallback to the type schema was attempted. PySpark caches the 0-column schema and later renders N empty rows even though data exists.
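The client-side symptom can be simulated without any wire protocol (hypothetical names; this is not PySpark's code): once a 0-column schema is cached, every subsequent row renders as an empty tuple.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptySchemaSymptom {

  // Render each row through a cached schema, as a schema-caching client would.
  static List<String> renderRows(final List<String> cachedSchema, final int rowsReturned) {
    final List<String> rendered = new ArrayList<>();
    for (int i = 0; i < rowsReturned; i++)
      rendered.add("(" + String.join(", ", cachedSchema) + ")"); // "()" when the schema has no columns
    return rendered;
  }

  public static void main(final String[] args) {
    // 0-column schema cached from the WHERE 1=0 probe, but the real query matched 3 rows
    System.out.println(renderRows(List.of(), 3)); // [(), (), ()]
  }
}
```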
describeCommandtype='P' (Describe Portal): it executes the query, gets 0 rows, thengetColumns(emptyList)returns empty and a 0-column RowDescription is sent.Test plan
New tests in `PostgresProtocolIT`:

- `selectWhere1Eq0ReturnsSchemaColumns` (simple `Statement.executeQuery`)
- `selectStarWhere1Eq0ReturnsSchemaColumns` (Spark's exact `SELECT *` probe)
- `preparedSelectWhere1Eq0ReturnsSchemaColumns` (`PreparedStatement` path)

Each asserts `ResultSetMetaData.getColumnCount() > 0` and that projected columns appear in metadata even with 0 rows returned. The full `PostgresProtocolIT` suite (69 tests) passes with no regressions.

🤖 Generated with Claude Code