-
Notifications
You must be signed in to change notification settings - Fork 190
[BUG] Regression (3.1 -> 3.3): PPL join with aggregates on IPs is now returning empty results. #4726
Copy link
Copy link
Closed
Labels
PPLPiped processing languagePiped processing languagebugSomething isn't workingSomething isn't working
Description
What is the bug?
When trying to do an aggregate-join-aggregate on an IP field, it returns no data. Example (mappings & data in repro):
source=web_logs
| stats count() as request_count by client_ip
| join type=inner client_ip [source=ip_geodata]
| stats sum(request_count) as total_requests by country
How can one reproduce the bug?
$ echo '{
"mappings": {
"properties": {
"client_ip": {
"type": "ip"
},
"status": {
"type": "long"
},
"timestamp": {
"type": "date"
},
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}' | xh put localhost:9200/web_logs
$ echo '{
"mappings": {
"properties": {
"city": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"client_ip": {
"type": "ip"
},
"country": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}' | xh put localhost:9200/ip_geodata
$ echo '{"index":{"_id":"1"}}
{"client_ip":"203.0.113.1","timestamp":"2024-04-03T10:01:00Z","url":"/products","status":200}
{"index":{"_id":"2"}}
{"client_ip":"198.51.100.1","timestamp":"2024-04-03T10:07:00Z","url":"/cart","status":200}
{"index":{"_id":"3"}}
{"client_ip":"203.0.113.3","timestamp":"2024-04-03T10:08:00Z","url":"/home","status":200}
{"index":{"_id":"4"}}
{"client_ip":"203.0.113.1","timestamp":"2024-04-03T10:00:00Z","url":"/home","status":200}
{"index":{"_id":"5"}}
{"client_ip":"203.0.113.2","timestamp":"2024-04-03T10:02:00Z","url":"/cart","status":200}
{"index":{"_id":"6"}}
{"client_ip":"198.51.100.1","timestamp":"2024-04-03T10:04:00Z","url":"/about","status":200}
{"index":{"_id":"7"}}
{"client_ip":"203.0.113.1","timestamp":"2024-04-03T10:05:00Z","url":"/checkout","status":200}
{"index":{"_id":"8"}}
{"client_ip":"198.51.100.1","timestamp":"2024-04-03T10:03:00Z","url":"/home","status":200}
{"index":{"_id":"9"}}
{"client_ip":"203.0.113.2","timestamp":"2024-04-03T10:06:00Z","url":"/products","status":200}' | xh post localhost:9200/web_logs/_bulk
$ echo '{"index":{"_id":"1"}}
{"client_ip": "198.51.100.1","country": "Canada","city": "Toronto"}
{"index":{"_id":"2"}}
{"client_ip": "203.0.113.3","country": "UK","city": "London"}
{"index":{"_id":"3"}}
{"client_ip": "203.0.113.1","country": "USA","city": "New York"}
{"index":{"_id":"4"}}
{"client_ip": "203.0.113.2","country": "USA","city": "Seattle"}' | xh post localhost:9200/ip_geodata/_bulk
$ echo '{"query":"source=web_logs | stats count() as request_count by client_ip | join type=inner client_ip [source=ip_geodata] | stats sum(request_count) as total_requests by country"}' | xh post localhost:9200/_plugins/_ppl
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 141
Content-Type: application/json; charset=UTF-8
X-Opensearch-Version: OpenSearch/3.4.0-SNAPSHOT (opensearch)
{
"schema": [
{
"name": "total_requests",
"type": "bigint"
},
{
"name": "country",
"type": "string"
}
],
"datarows": [],
"total": 0,
"size": 0
}What is the expected behavior?
This should return 3 results:
┏━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ country ┃total_requests┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ USA │ 5 │
├──────────┼──────────────┤
│ Canada │ 3 │
├──────────┼──────────────┤
│ UK │ 1 │
└──────────┴──────────────┘
This is correctly returned if the respective ip types are changed to keywords. It also works correctly if you simplify by removing the initial count() stage before the join, and just do source | join | stats count() as total_count by country
What is your host/environment?
- Mainline, JDK 21
Do you have any screenshots?
N/A
Do you have any additional context?
Works on 3.1, stops working on 3.3, haven't yet bissected to when exactly it broke.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
PPLPiped processing languagePiped processing languagebugSomething isn't workingSomething isn't working
Type
Projects
Status
Done