Introduce instant enrich policies. #73407

Closed
martijnvg wants to merge 6 commits into elastic:main from martijnvg:instant_enrich_policy

@martijnvg
Member

@martijnvg martijnvg commented May 26, 2021

Instant enrich policies don't need to be executed and
the enrich processor performs the lookups directly against
the enrich policy's source index.

Instant enrich policies are a trade-off that allows more flexibility.
Changes made to the source index are taken into account almost immediately
by an enrich processor that uses an instant enrich policy.
There is no need to wait for the enrich policy to be executed and a new
enrich index to be created before those changes become visible.

An enrich policy with the instant property set to false is
more efficient when it comes to performing the lookup for the enrichment,
because the lookup is performed against an optimized enrich index.

Relates to #48988

Example usage:

PUT /_enrich/policy/my-policy
{
    "match": {
        "instant": true,
        "indices": "users",
        "match_field": "email",
        "enrich_fields": [
            "first_name",
            "last_name",
            "address",
            "city",
            "zip",
            "state"
        ]
    }
}

// No need to execute the policy

PUT /_ingest/pipeline/my-pipeline
{
    "processors": [
        {
            "enrich" : {
                "policy_name": "my-policy",
                "field": "user.email",
                "target_field": "user"
            }
        }
    ]
}

POST /messages-201905/_doc?pipeline=my-pipeline
{
    "user": {
        "email": "zita.strunk@gmail.com"
    },
    "timestamp": "2019-05-01T13:07:45",
    "message": "..." 
}
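
The trade-off described above can be modelled with a small in-memory sketch. This is a toy illustration only, not code from this PR: `ExecutedPolicy`, `InstantPolicy`, and the sample rows are hypothetical names and made-up data, and a plain Python list stands in for the source index.

```python
# Toy model of the two lookup strategies; nothing here is an Elasticsearch API.

class ExecutedPolicy:
    """Regular policy: lookups hit a snapshot that only execute() refreshes."""

    def __init__(self, source, match_field, enrich_fields):
        self.source = source                      # list of dicts standing in for the source index
        self.match_field = match_field
        self.fields = [match_field] + list(enrich_fields)
        self.enrich_index = {}                    # empty until the policy is executed

    def execute(self):
        # Rebuild the snapshot, keyed by the match field for fast lookups.
        self.enrich_index = {
            doc[self.match_field]: {f: doc[f] for f in self.fields if f in doc}
            for doc in self.source
            if self.match_field in doc
        }

    def lookup(self, value):
        return self.enrich_index.get(value)


class InstantPolicy:
    """Instant policy: every lookup goes straight to the source data."""

    def __init__(self, source, match_field, enrich_fields):
        self.source = source
        self.match_field = match_field
        self.fields = [match_field] + list(enrich_fields)

    def lookup(self, value):
        # Slower per lookup (a scan in this toy), but never stale.
        for doc in self.source:
            if doc.get(self.match_field) == value:
                return {f: doc[f] for f in self.fields if f in doc}
        return None


users = [{"email": "zita.strunk@gmail.com", "first_name": "Zita"}]  # made-up sample row
executed = ExecutedPolicy(users, "email", ["first_name"])
instant = InstantPolicy(users, "email", ["first_name"])
executed.execute()

users.append({"email": "new.user@example.com", "first_name": "New"})
print(executed.lookup("new.user@example.com"))  # None until execute() runs again
print(instant.lookup("new.user@example.com"))   # the new row is found right away
```

In this sketch the executed policy pays for its fast dictionary lookup with staleness between executions, while the instant policy always reflects the current source data at the cost of a more expensive lookup.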

@martijnvg martijnvg added the :Distributed/Ingest Node Execution or management of Ingest Pipelines label May 26, 2021
martijnvg added 2 commits May 26, 2021 09:56
No need to provide Supplier<EnrichPolicy>,
because an enrich policy can't be updated after it has been created.
@consulthys
Contributor

@martijnvg Has there been any progress on this? Or maybe you can point me to another ongoing effort to make enrich policies more dynamic, i.e. an enrich policy sourcing its data directly from the source index. Thanks in advance for shedding some light on this.

@martijnvg
Member Author

@consulthys As far as I know, this is the only effort to make enrich policies more dynamic.
I will update this PR soon and raise awareness.

@martijnvg martijnvg mentioned this pull request Sep 2, 2021
10 tasks
@dakrone
Member

dakrone commented Nov 1, 2022

Closing for now until we can get back to this. We will probably end up reusing this code in the future, though.

@dakrone dakrone closed this Nov 1, 2022