Modify foreach processor to accept a single processor instead of collection #19345

@BigFunger

@BigFunger, @Bargs, and @talevy had a discussion on Zoom earlier today, and this is one of the issues that we discussed.

Summary

I would like to see the foreach processor reworked so that it only accepts a single processor instead of an array of processors as it does now.
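For illustration, the reworked processor might look something like the sketch below. The key name `processor` is an assumption made for this example and was not settled in the discussion; `field` and `_value` are used the same way as in the current array-based form.

```json
{
  "foreach": {
    "field": "message",
    "processor": {
      "uppercase": {
        "field": "_value"
      }
    }
  }
}
```

With this shape, each foreach wraps exactly one processor, so there is a single input and a single output to report per foreach.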

Background

I have been working on the UI for ingest pipelines, and specifically I have been trying to implement the foreach processor. The UI uses the verbose setting on the simulate API to report back to the user.

This way, when the user adds or edits a processor, I can use the output from the parent processor as the input of the next and provide them with the data necessary to build out their processors. This is a problem in the context of the foreach processor because I can't provide the user with the input and output of each of the processors defined within the foreach processor.

Per our discussion, it also makes sense to structure the foreach processor in this way because it follows the pattern established by the other processors. For example, if you want to apply an uppercase processor to more than one field in the document, you need to create one uppercase processor for each field you want to act on.
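To make the existing pattern concrete, here is a minimal sketch of uppercasing two fields. The field names `first_name` and `last_name` are hypothetical and chosen only for this example.

```json
{
  "processors": [
    { "uppercase": { "field": "first_name" } },
    { "uppercase": { "field": "last_name" } }
  ]
}
```

Each processor acts on one field, so verbose simulate output can show the document after every individual step.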

Example

With the following simulate request (verbose enabled):

{
  "pipeline": {
    "description": "",
    "processors": [
      {
        "split": {
          "tag": "processor_1",
          "field": "message",
          "separator": " "
        }
      },
      {
        "foreach": {
          "tag": "processor_2",
          "field": "message",
          "processors": [
            {
              "uppercase": {
                "tag": "processor_3",
                "field": "_value"
              }
            },
            {
              "lowercase": {
                "tag": "processor_4",
                "field": "_value"
              }
            },
            {
              "uppercase": {
                "tag": "processor_5",
                "field": "_value"
              }
            }
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "these are the words of a sentence"
      }
    }
  ]
}

I would expect the following output:

{
  "docs": [
    {
      "processor_results": [
        {
          "tag": "processor_1",
          "doc": {
            "_type": "_type",
            "_id": "_id",
            "_index": "_index",
            "_source": {
              "message": [
                "these",
                "are",
                "the",
                "words",
                "of",
                "a",
                "sentence"
              ]
            },
            "_ingest": {
              "timestamp": "2016-07-06T21:27:14.585+0000"
            }
          }
        },
        {
          "tag": "processor_3",
          "doc": {
            "_type": "_type",
            "_id": "_id",
            "_index": "_index",
            "_source": {
              "message": [
                "THESE",
                "ARE",
                "THE",
                "WORDS",
                "OF",
                "A",
                "SENTENCE"
              ]
            },
            "_ingest": {
              "timestamp": "2016-07-06T21:27:14.585+0000"
            }
          }
        },
        {
          "tag": "processor_4",
          "doc": {
            "_type": "_type",
            "_id": "_id",
            "_index": "_index",
            "_source": {
              "message": [
                "these",
                "are",
                "the",
                "words",
                "of",
                "a",
                "sentence"
              ]
            },
            "_ingest": {
              "timestamp": "2016-07-06T21:27:14.585+0000"
            }
          }
        },
        {
          "tag": "processor_5",
          "doc": {
            "_type": "_type",
            "_id": "_id",
            "_index": "_index",
            "_source": {
              "message": [
                "THESE",
                "ARE",
                "THE",
                "WORDS",
                "OF",
                "A",
                "SENTENCE"
              ]
            },
            "_ingest": {
              "timestamp": "2016-07-06T21:27:14.585+0000"
            }
          }
        },
        {
          "tag": "processor_2",
          "doc": {
            "_type": "_type",
            "_id": "_id",
            "_index": "_index",
            "_source": {
              "message": [
                "THESE",
                "ARE",
                "THE",
                "WORDS",
                "OF",
                "A",
                "SENTENCE"
              ]
            },
            "_ingest": {
              "timestamp": "2016-07-06T21:27:14.585+0000"
            }
          }
        }
      ]
    }
  ]
}

Instead, I get this back:

{
  "docs": [
    {
      "processor_results": [
        {
          "tag": "processor_1",
          "doc": {
            "_id": "_id",
            "_type": "_type",
            "_index": "_index",
            "_source": {
              "message": [
                "these",
                "are",
                "the",
                "words",
                "of",
                "a",
                "sentence"
              ]
            },
            "_ingest": {
              "timestamp": "2016-07-08T21:36:14.400+0000"
            }
          }
        },
        {
          "tag": "processor_2",
          "doc": {
            "_id": "_id",
            "_type": "_type",
            "_index": "_index",
            "_source": {
              "message": [
                "these",
                "are",
                "the",
                "words",
                "of",
                "a",
                "sentence"
              ]
            },
            "_ingest": {
              "timestamp": "2016-07-08T21:36:14.400+0000"
            }
          }
        }
      ]
    }
  ]
}
