-
Notifications
You must be signed in to change notification settings - Fork 190
[BUG] Streamstats command incorrectly treats null as a valid group #4751
Copy link
Copy link
Closed
Labels
PPLPiped processing languagePiped processing languagebugSomething isn't workingSomething isn't working
Description
What is the bug?
When using the streamstats by <field> clause, the current implementation treats null values as a valid group key and computes separate statistics for them. Which is not consistent with the expected behavior.
How can one reproduce the bug?
Steps to reproduce the behavior:
- Set data:
PUT stream_test_null
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"age": {
"type": "integer"
},
"state": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"country": {
"type": "text"
},
"year": {
"type": "integer"
},
"month": {
"type": "integer"
}
}
}
}
POST stream_test_null/_bulk?refresh=true
{"index":{"_id":"1"}}
{"name":"Jake","age":70,"state":"California","country":"USA","year":2023,"month":4}
{"index":{"_id":"2"}}
{"name":"Hello","age":30,"state":"New York","country":"USA","year":2023,"month":4}
{"index":{"_id":"3"}}
{"name":"John","age":25,"state":"Ontario","country":"Canada","year":2023,"month":4}
{"index":{"_id":"4"}}
{"name":"Jane","age":20,"state":"Quebec","country":"Canada","year":2023,"month":4}
{"index":{"_id":"5"}}
{"name":null,"age":10,"state":null,"country":"Canada","year":2023,"month":4}
{"index":{"_id":"6"}}
{"name":"Kevin","year":2023,"month":4}
- Run this query
POST _plugins/_ppl
{
"query": """
source=stream_test_null | streamstats avg(age) as avg_age by state
"""
}
- See error
{
"schema": [
{
"name": "name",
"type": "string"
},
{
"name": "country",
"type": "string"
},
{
"name": "state",
"type": "string"
},
{
"name": "month",
"type": "int"
},
{
"name": "year",
"type": "int"
},
{
"name": "age",
"type": "int"
},
{
"name": "avg_age",
"type": "double"
}
],
"datarows": [
[
"Jake",
"USA",
"California",
4,
2023,
70,
70
],
[
"Hello",
"USA",
"New York",
4,
2023,
30,
30
],
[
"John",
"Canada",
"Ontario",
4,
2023,
25,
25
],
[
"Jane",
"Canada",
"Quebec",
4,
2023,
20,
20
],
[
null,
"Canada",
null,
4,
2023,
10,
10
],
[
"Kevin",
null,
null,
4,
2023,
null,
10
]
],
"total": 6,
"size": 6
}
The two null value for state are incorrectly considered in one valid group.
What is the expected behavior?
"datarows": [
[
"Jake",
"USA",
"California",
4,
2023,
70,
70
],
[
"Hello",
"USA",
"New York",
4,
2023,
30,
30
],
[
"John",
"Canada",
"Ontario",
4,
2023,
25,
25
],
[
"Jane",
"Canada",
"Quebec",
4,
2023,
20,
20
],
[
null,
"Canada",
null,
4,
2023,
10,
null
],
[
"Kevin",
null,
null,
4,
2023,
null,
null
]
]
What is your host/environment?
- OS: [e.g. iOS]
- Version [e.g. 22]
- Plugins
Do you have any screenshots?
If applicable, add screenshots to help explain your problem.
Do you have any additional context?
Add any other context about the problem.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
PPLPiped processing languagePiped processing languagebugSomething isn't workingSomething isn't working
Type
Projects
Status
Done