Skip to content

Fix uri_parts processor behaviour for missing extensions#105689

Merged
nielsbauman merged 4 commits intoelastic:mainfrom
nielsbauman:uri-parts-extension
Feb 22, 2024
Merged

Fix uri_parts processor behaviour for missing extensions#105689
nielsbauman merged 4 commits intoelastic:mainfrom
nielsbauman:uri-parts-extension

Conversation

@nielsbauman
Copy link
Copy Markdown
Contributor

The uri_parts processor was behaving incorrectly for URI's that included a dot in the path but did not have an extension. For example: https://www.example.com/path.withdot/filenamewithoutextension would get processed as "extension": "withdot/filenamewithoutextension".
Instead of determining the extension based on the full path, we should use the last segment of the path (i.e. the part after the last /).
This PR also includes (missing) YAML REST tests for the uri_parts processor.

Fixes #105612

@nielsbauman nielsbauman added >bug :Distributed/Ingest Node Execution or management of Ingest Pipelines Team:Data Management (obsolete) DO NOT USE. This team no longer exists. v8.14.0 labels Feb 21, 2024
@nielsbauman nielsbauman requested a review from joegallo February 21, 2024 11:10
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @nielsbauman, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@joegallo
Copy link
Copy Markdown
Contributor

joegallo commented Feb 21, 2024

PUT _ingest/pipeline/uri-processor
{
  "processors": [
    {
      "uri_parts": {
        "field" : "my_uri"
      }
    }
  ]
}

POST /_ingest/pipeline/uri-processor/_simulate
{
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "my_uri": "http://example.com/foo/bar/baz."
      }
    }
  ]
}

On main this yields "extension" : "", but I think on this PR it would no longer result in there being an "extension" at all.

I'm not actually sure I care about the old behavior, but I don't think we should change it unintentionally. Can you add a test for the old behavior and switch it back? I'd include a comment in the test about how it just happens to be the case that we do this, and it's not necessarily something we intend to keep forever but that if we change it we want to be conscious of it being a change in the observable behavior of the system.

@nielsbauman nielsbauman requested a review from joegallo February 21, 2024 13:30
@nielsbauman nielsbauman merged commit b1fcedd into elastic:main Feb 22, 2024
@nielsbauman nielsbauman deleted the uri-parts-extension branch February 22, 2024 08:41
andrewkroh added a commit to elastic/integrations that referenced this pull request Apr 17, 2024
Ignore the existence of an invalid  `url.extension` field. Stack versions < 8.14 had a
bug that populated the field with bad data. After a package uses a minimum stack version
of 8.14.0 then this addition to `dynamic_fields` can be removed.

This fixes errors like this which occur under v8.14.0

    test case failed: Expected results are different from actual ones: --- want
    +++ got
    @@ -1797,7 +1797,6 @@
                     "preserve_duplicate_custom_fields"
                 ],
                 "url": {
    -                "extension": "com/page",
                     "original": "www.example.com/page",
                     "path": "www.example.com/page"
                 }

Relates: elastic/elasticsearch#105689
kcreddy added a commit to elastic/integrations that referenced this pull request Aug 6, 2024
Extension to #9623

Ignore the existence of an invalid url.extension field. Stack versions < 8.14
had a bug that populated the field with bad data. After a package uses a minimum
stack version of 8.14.0 then this addition to dynamic_fields can be removed.

This fixes errors like this which occur under v8.14.0+

+++ got
@@ -1797,7 +1797,6 @@
                 "preserve_duplicate_custom_fields"
             ],
             "url": {
-                "extension": "com/page",
                 "original": "www.example.com/page",
                 "path": "www.example.com/page"
             }

Relates: elastic/elasticsearch#105689
harnish-crest-data pushed a commit to chavdaharnish/integrations that referenced this pull request Feb 4, 2025
Extension to elastic#9623

Ignore the existence of an invalid url.extension field. Stack versions < 8.14
had a bug that populated the field with bad data. After a package uses a minimum
stack version of 8.14.0 then this addition to dynamic_fields can be removed.

This fixes errors like this which occur under v8.14.0+

+++ got
@@ -1797,7 +1797,6 @@
                 "preserve_duplicate_custom_fields"
             ],
             "url": {
-                "extension": "com/page",
                 "original": "www.example.com/page",
                 "path": "www.example.com/page"
             }

Relates: elastic/elasticsearch#105689
harnish-crest-data pushed a commit to chavdaharnish/integrations that referenced this pull request Feb 5, 2025
Extension to elastic#9623

Ignore the existence of an invalid url.extension field. Stack versions < 8.14
had a bug that populated the field with bad data. After a package uses a minimum
stack version of 8.14.0 then this addition to dynamic_fields can be removed.

This fixes errors like this which occur under v8.14.0+

+++ got
@@ -1797,7 +1797,6 @@
                 "preserve_duplicate_custom_fields"
             ],
             "url": {
-                "extension": "com/page",
                 "original": "www.example.com/page",
                 "path": "www.example.com/page"
             }

Relates: elastic/elasticsearch#105689
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Distributed/Ingest Node Execution or management of Ingest Pipelines Team:Data Management (obsolete) DO NOT USE. This team no longer exists. v8.14.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

uri_parts parse directory structure for extension

3 participants