Add ingest-attachment support for per document indexed_chars limit #28977
dadoonet merged 4 commits into elastic:master
Today we support a global `indexed_chars` processor parameter. But in some cases, users would like to set this limit depending on the document itself.
This used to be supported in the mapper-attachments plugin, which extracted the limit value from a meta field in the document sent for indexing.
We add an option that reads this limit value from the document itself,
through a new setting named `indexed_chars_field`.
This allows running:
```
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information. Used to parse pdf and office files",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "indexed_chars_field" : "size"
      }
    }
  ]
}
```
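How the processor picks the limit for each document can be sketched as follows. This is a hypothetical Python model, not the plugin's Java code: `resolve_indexed_chars` and `truncate_content` are illustrative names, and the default of 100000 characters (with `-1` meaning "no limit") is assumed from the `indexed_chars` documentation. If the document contains the field named by `indexed_chars_field`, its value wins; otherwise the processor-level `indexed_chars` applies. The limit is modeled here as simple truncation of the extracted text.

```python
from typing import Any, Mapping, Optional

# Assumed default for the processor-level `indexed_chars` setting.
DEFAULT_INDEXED_CHARS = 100_000


def resolve_indexed_chars(
    document: Mapping[str, Any],
    indexed_chars: int = DEFAULT_INDEXED_CHARS,
    indexed_chars_field: Optional[str] = None,
) -> int:
    """Pick the extraction limit for one document (hypothetical helper)."""
    if indexed_chars_field is not None:
        # Only top-level field names are handled in this sketch.
        per_doc = document.get(indexed_chars_field)
        if per_doc is not None:
            # The document carries its own limit, e.g. {"size": 1000}.
            return int(per_doc)
    # Field not configured or missing from the document: use the global limit.
    return indexed_chars


def truncate_content(text: str, limit: int) -> str:
    """Keep at most `limit` characters; -1 is assumed to mean no limit."""
    return text if limit == -1 else text[:limit]


# The two documents from the description, with the pipeline reading "size":
print(resolve_indexed_chars({"data": "BASE64"}, indexed_chars_field="size"))  # prints 100000
print(resolve_indexed_chars({"data": "BASE64", "size": 1000}, indexed_chars_field="size"))  # prints 1000
```

Falling back instead of failing when the field is absent is what lets both index requests go through the same pipeline.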
Then index either:
```
PUT index/doc/1?pipeline=attachment
{
  "data": "BASE64"
}
```
This uses the default limit (or the one defined by `indexed_chars`).
Or:
```
PUT index/doc/2?pipeline=attachment
{
  "data": "BASE64",
  "size": 1000
}
```
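The `BASE64` placeholder in these requests stands for the base64-encoded file content. A minimal Python sketch of building both request bodies follows; `build_attachment_doc` is a hypothetical helper, and the sample bytes are an assumed tiny RTF file. With the pipeline above, the second body should be limited to at most 1000 extracted characters because the processor reads the limit from `size`.

```python
import base64
import json
from typing import Any, Dict, Optional


def build_attachment_doc(raw: bytes, size: Optional[int] = None) -> Dict[str, Any]:
    """Body for PUT index/doc/<id>?pipeline=attachment (hypothetical helper)."""
    # The attachment processor expects the file content base64-encoded in `data`.
    doc: Dict[str, Any] = {"data": base64.b64encode(raw).decode("ascii")}
    if size is not None:
        # Picked up by the processor because the pipeline sets
        # "indexed_chars_field": "size".
        doc["size"] = size
    return doc


# A tiny RTF file used as sample input.
rtf = b"{\\rtf1\\ansi\r\nLorem ipsum dolor sit amet\r\n\\par }"

print(json.dumps(build_attachment_doc(rtf)))             # doc 1: default limit
print(json.dumps(build_attachment_doc(rtf, size=1000)))  # doc 2: per-document limit
```

Omitting `size` entirely, rather than sending `null`, keeps the first request identical to the one shown in the description.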
Closes elastic#28942
Pinging @elastic/es-core-infra
"data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
"attachment": {
  "content_type": "application/rtf",
  "language": "ro",
The doc build fails because `sl` is returned instead of `ro`.
True. TBH I expected that, as I did not run the doc check myself but let CI do it (I was in a taxi when I pushed my PR ;) )...
I'll fix it, run a check locally and let the CI say ok.
Thanks!
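For context on the `ro`/`sl` mismatch: the `data` value in the docs snippet decodes to a tiny RTF file whose only text is the placeholder "Lorem ipsum dolor sit amet". A plausible reading (an assumption, not stated in the thread) is that such a short, non-natural-language sample gives the language detector little to work with, so its guess can differ from what the docs expect. The Python snippet below only decodes the value to show its content.

```python
import base64

# The `data` value from the documentation snippet under review.
SAMPLE = "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="

# Decode the base64 payload back into the original RTF text.
text = base64.b64decode(SAMPLE).decode("ascii")
print(repr(text))  # an RTF header, the Lorem ipsum line, then \par
```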
@elasticmachine test this please
@martijnvg Apparently CI is still unhappy, but I'm unsure if it's my fault. Could you check it, please?
@dadoonet The test failure looks unrelated to this change. Maybe rebase on master and re-run the PR build?
Hmmm. Sounds like I forgot to backport this PR to the 6.3 branch.
Yes, that makes sense to me.
Backport of #28977 to the 6.x branch (6.4.0).