Elasticsearch Version
8.12.1
Installed Plugins
No response
Java Version
bundled
OS Version
docker
Problem Description
The uri_parts ingest pipeline processor outputs a wrong extension when the URL has no extension and the URL path contains dot character(s).
For example, the URL https://www.example.com/path.withdot/filenamewithoutextension yields "extension": "withdot/filenamewithoutextension", although the file name has no extension.
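The reported value is consistent with taking everything after the last dot of the whole path instead of the last path segment. The sketch below is only an illustration of that hypothesis in Python, not the actual Elasticsearch implementation; `naive_extension` and `segment_extension` are hypothetical helper names.

```python
from urllib.parse import urlsplit


def naive_extension(url):
    """Everything after the last '.' in the whole path (reproduces the reported value)."""
    path = urlsplit(url).path
    dot = path.rfind(".")
    return path[dot + 1:] if dot != -1 else None


def segment_extension(url):
    """Everything after the last '.' in the last path segment only (the expected behavior)."""
    path = urlsplit(url).path
    last_segment = path.rsplit("/", 1)[-1]
    dot = last_segment.rfind(".")
    return last_segment[dot + 1:] if dot != -1 else None


url = "https://www.example.com/path.withdot/filenamewithoutextension"
print(naive_extension(url))    # withdot/filenamewithoutextension
print(segment_extension(url))  # None
```

Both helpers agree on https://www.example.com/path.withdot/filenamewithextension.zip, where the last dot already falls inside the file name.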
Steps to Reproduce
- Create the pipeline:
PUT /_ingest/pipeline/test-uri-parts
{
  "processors": [
    {
      "uri_parts": {
        "field": "url.original",
        "target_field": "url.parsed"
      }
    }
  ]
}
- Simulate the pipeline:
POST _ingest/pipeline/test-uri-parts/_simulate
{
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "url": {
          "original": "https://www.example.com/path.withdot/filenamewithoutextension"
        }
      }
    }
  ]
}
The output contains a wrong value for extension:
{
  "docs": [
    {
      "doc": {
        "_index": "index",
        "_version": "-3",
        "_id": "id",
        "_source": {
          "url": {
            "parsed": {
              "path": "/path.withdot/folder/filenamewithoutextension",
              "extension": "withdot/folder/filenamewithoutextension",
              "original": "https://www.example.com/path.withdot/folder/filenamewithoutextension",
              "scheme": "https",
              "domain": "www.example.com"
            },
            "original": "https://www.example.com/path.withdot/folder/filenamewithoutextension"
          }
        },
        "_ingest": {
          "timestamp": "2024-02-19T09:47:21.38168605Z"
        }
      }
    }
  ]
}
Workaround
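- The issue does not appear when the file name has an extension, e.g. https://www.example.com/path.withdot/filenamewithextension.zip, so a wrongly computed extension can be recognized by the '/' it contains. The following workaround removes the unwanted extension field in that case: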
PUT /_ingest/pipeline/test-uri-parts
{
  "processors": [
    {
      "uri_parts": {
        "field": "url.original",
        "target_field": "url.parsed"
      }
    },
    {
      "remove": {
        "field": "url.parsed.extension",
        "if": "ctx?.url?.parsed?.extension != null && ctx?.url?.parsed?.extension.indexOf('/') != -1"
      }
    }
  ]
}
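For documents that were already indexed with the wrong value, the same condition could be applied on the client side. This is a minimal Python sketch that mirrors the `if` condition of the remove processor above; `drop_bogus_extension` is a hypothetical helper, and the dict is assumed to have the shape of `url.parsed` from the simulate output.

```python
def drop_bogus_extension(parsed):
    """Return a copy of a url.parsed dict without 'extension' when the value contains '/'."""
    ext = parsed.get("extension")
    if ext is not None and "/" in ext:
        # Same test as the pipeline condition: a real extension never contains a slash.
        return {key: value for key, value in parsed.items() if key != "extension"}
    return parsed


parsed = {
    "path": "/path.withdot/filenamewithoutextension",
    "extension": "withdot/filenamewithoutextension",
    "scheme": "https",
    "domain": "www.example.com",
}
print(drop_bogus_extension(parsed))
```

A dict whose extension is a plain value such as "zip" is returned unchanged.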