Fix bins on time-related fields#4612
Conversation
| RexNode maxValue = context.relBuilder.max(fieldExpr).over().toRex(); | ||
| RexNode dataRange = context.relBuilder.call(SqlStdOperatorTable.MINUS, maxValue, minValue); | ||
| RexNode minValue = context.relBuilder.min(fieldExpr).over().toRex(); |
There was a problem hiding this comment.
nitnit: more intuitive to start with minValue
| } | ||
|
|
||
| /** Width bucket calculation using nice number algorithm. */ | ||
| public static String calculateWidthBucket( |
There was a problem hiding this comment.
I think it is cleaner to have separate method for timestamp.
| @@ -0,0 +1 @@ | |||
| {"calcite":{"logical":"LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])\n LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4], city=[$5], employer=[$6], state=[$7], email=[$8], lastname=[$9], age=[$16])\n LogicalSort(fetch=[5])\n LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4], city=[$5], employer=[$6], state=[$7], email=[$9], lastname=[$10], _id=[$11], _index=[$12], _score=[$13], _maxscore=[$14], _sort=[$15], _routing=[$16], age=[WIDTH_BUCKET($8, 3, MIN($8) OVER (), MAX($8) OVER ())])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n","physical":"EnumerableLimit(fetch=[10000])\n EnumerableCalc(expr#0..12=[{inputs}], expr#13=[3], expr#14=[WIDTH_BUCKET($t8, $t13, $t11, $t12)], proj#0..7=[{exprs}], email=[$t9], lastname=[$t10], age=[$t14])\n EnumerableLimit(fetch=[5])\n EnumerableWindow(window#0=[window(aggs [MIN($8), MAX($8)])])\n CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[PROJECT->[account_number, firstname, address, balance, gender, city, employer, state, age, email, lastname]], OpenSearchRequestBuilder(sourceBuilder={\"from\":0,\"timeout\":\"1m\",\"_source\":{\"includes\":[\"account_number\",\"firstname\",\"address\",\"balance\",\"gender\",\"city\",\"employer\",\"state\",\"age\",\"email\",\"lastname\"],\"excludes\":[]}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])\n"}} No newline at end of file | |||
There was a problem hiding this comment.
It looks like JSON file instead of YAML.
| String timestampStr = (String) timestamp; | ||
| java.time.LocalDateTime localDateTime = | ||
| java.time.LocalDateTime.parse( | ||
| timestampStr, DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")); |
There was a problem hiding this comment.
Why this pattern is different from something used in time_bins_test_data.json file?
And I was not quite sure if we can always assume this format.
There was a problem hiding this comment.
The test data uses ISO 8601 format (2025-07-28T10:00:00 with 'T') for ingestion into OpenSearch, but when Calcite retrieves timestamp values and converts them to strings, it uses the SQL literal format (yyyy-MM-dd HH:mm:ss with space).
| private static String formatTimestamp(long timestampMillis) { | ||
| Instant instant = Instant.ofEpochMilli(timestampMillis); | ||
| ZonedDateTime zdt = instant.atZone(ZoneOffset.UTC); | ||
| return zdt.format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")); |
There was a problem hiding this comment.
Should we make DateTimeFormatter instance a constant? (it is used above as well)
There was a problem hiding this comment.
Great suggestion!
I updated the following:
For parsing (input) - uses DATE_TIMESTAMP_FORMATTER
For formatting (output) - uses SQL_LITERAL_DATE_TIME_FORMAT
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
…g, AVG rounding, timestamp support
…g, AVG rounding, timestamp support Signed-off-by: Kai Huang <ahkcs@amazon.com>
…g, AVG rounding, timestamp support Signed-off-by: Kai Huang <ahkcs@amazon.com>
Description
Summary
This PR extends the
bincommand to support time-related fields (timestamp,date,time) in non-aggregation scenarios using theWIDTH_BUCKETUDF.Previously,
binson time fields only worked with aggregations viaauto_date_histogrampushdown.Now, users can apply binning on timestamp fields in projection queries without aggregation.
Related Issues
Resolves ##4578