[Bug] zentity fails to obtain attribute values from object arrays during resolution #85
Description
Environment
- zentity version: 1.8.0
- Elasticsearch version: 7.11.1
Describe the bug
During a resolution job, zentity fails to access attributes whose values appear in an array of objects in the "_source" field of the matching documents. This is likely due to the use of JsonPointer to access attributes from documents, because JSON Pointer syntax requires an explicit index for each array element and so cannot address every element of an array at once. A potential solution is to replace JsonPointer with JsonPath, whose syntax can return all values within an array.
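The limitation can be illustrated with a minimal sketch in plain Python. This is not zentity's Java code: `resolve_pointer` is a hypothetical helper that follows the JSON Pointer rules of RFC 6901, and the idea that the field name "phone.number" is looked up as the pointer "/phone/number" is an assumption made only for illustration.

```python
def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer; return None when nothing matches."""
    node = doc
    # An empty pointer refers to the whole document.
    for token in pointer.split("/")[1:]:
        # Unescape per RFC 6901: "~1" is "/", "~0" is "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, dict):
            if token not in node:
                return None
            node = node[token]
        elif isinstance(node, list):
            # Arrays can only be entered through an explicit numeric index.
            if not (token.isascii() and token.isdigit()) or int(token) >= len(node):
                return None
            node = node[int(token)]
        else:
            return None
    return node


# The "_source" of document 1 from the steps to reproduce.
source = {
    "first_name": "alice",
    "last_name": "jones",
    "phone": [
        {"number": "555-123-4567", "type": "home"},
        {"number": "555-987-6543", "type": "mobile"},
    ],
}

print(resolve_pointer(source, "/last_name"))       # plain object value
print(resolve_pointer(source, "/phone/number"))    # no index given: nothing matches
print(resolve_pointer(source, "/phone/0/number"))  # works only with an explicit index
```

A pointer without an index never reaches the strings inside the array, which is consistent with the missing string value reported in the error.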
Expected behavior
zentity should assume (like Elasticsearch) that each object in an array of objects has the same schema, and then during a resolution job, zentity should obtain attribute values from arrays of objects just like it obtains attribute values from object values or arrays of values.
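A rough sketch of that expected behavior, written in plain Python rather than zentity's Java: `collect_values` is a hypothetical helper (not part of zentity) that walks a dotted field path and fans out across arrays at any level, so object values, arrays of values, and arrays of objects are all handled the same way.

```python
def collect_values(node, path):
    """Return every leaf value reachable via a dotted field path.

    Arrays are traversed element by element at any level, on the
    assumption that each object in an array shares the same schema.
    """
    if isinstance(node, list):
        values = []
        for item in node:
            values.extend(collect_values(item, path))
        return values
    if not path:
        # End of the path: the node itself is the value (skip nulls).
        return [] if node is None else [node]
    if not isinstance(node, dict):
        return []
    head, _, rest = path.partition(".")
    if head not in node:
        return []
    return collect_values(node[head], rest)


source = {
    "first_name": "alice",
    "last_name": "jones",
    "phone": [
        {"number": "555-123-4567", "type": "home"},
        {"number": "555-987-6543", "type": "mobile"},
    ],
}

print(collect_values(source, "last_name"))     # value of a plain field
print(collect_values(source, "phone.number"))  # values from the array of objects
```

Returning a list in every case keeps the caller uniform: a single value becomes a one-element list and a missing field becomes an empty list.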
Steps to reproduce
Step 1. Create an index with a nested object.
PUT my_index
{
"mappings": {
"properties": {
"first_name": {
"type": "text"
},
"last_name": {
"type": "text"
},
"phone": {
"type": "nested",
"properties": {
"number": {
"type": "keyword"
},
"type": {
"type": "keyword"
}
}
}
}
}
}
Step 2. Index two documents.
POST my_index/_bulk?refresh
{"index":{"_id":1}}
{"first_name":"alice","last_name":"jones","phone":[{"number":"555-123-4567","type":"home"},{"number":"555-987-6543","type":"mobile"}]}
{"index":{"_id":2}}
{"first_name":"allison","last_name":"jones","phone":[{"number":"555-987-6543","type":"mobile"}]}
Step 3. Create an entity model.
PUT _zentity/models/my_entity_model
{
"attributes": {
"first_name": {},
"last_name": {},
"phone": {}
},
"resolvers": {
"name_phone": {
"attributes": [
"last_name",
"phone"
]
}
},
"matchers": {
"exact": {
"clause": {
"term": {
"{{ field }}": "{{ value }}"
}
}
},
"exact_phone": {
"clause": {
"nested": {
"path": "phone",
"query": {
"term": {
"{{ field }}": "{{ value }}"
}
}
}
}
}
},
"indices": {
"my_index": {
"fields": {
"first_name": {
"attribute": "first_name",
"matcher": "exact"
},
"last_name": {
"attribute": "last_name",
"matcher": "exact"
},
"phone.number": {
"attribute": "phone",
"matcher": "exact_phone"
}
}
}
}
}
Step 4. Run a resolution job. Expect the first hop to match the given last name and phone number (555-123-4567), and expect the second hop to match the new phone number (555-987-6543) found in the document from the first hop.
POST _zentity/resolution/my_entity_model?queries
{
"attributes": {
"first_name": [ "alice" ],
"last_name": [ "jones" ],
"phone": [ "555-123-4567" ]
}
}
Step 5. The resolution job fails with the following error message:
io.zentity.model.ValidationException: Expected 'string' attribute data type.
at io.zentity.resolution.input.value.StringValue.validate(StringValue.java:52)
at io.zentity.resolution.input.value.Value.<init>(Value.java:35)
at io.zentity.resolution.input.value.StringValue.<init>(StringValue.java:28)
at io.zentity.resolution.input.value.Value.create(Value.java:57)
at io.zentity.resolution.Job.onSearchComplete(Job.java:755)
at io.zentity.resolution.Job.access$000(Job.java:50)
at io.zentity.resolution.Job$1.onResponse(Job.java:1052)
at io.zentity.resolution.Job$1.onResponse(Job.java:1045)
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:83)
at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:77)
at org.elasticsearch.action.ActionListener$4.onResponse(ActionListener.java:253)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.sendSearchResponse(AbstractSearchAsyncAction.java:595)
at org.elasticsearch.action.search.ExpandSearchPhase.run(ExpandSearchPhase.java:109)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:372)
at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:366)
at org.elasticsearch.action.search.FetchSearchPhase.moveToNextPhase(FetchSearchPhase.java:219)
at org.elasticsearch.action.search.FetchSearchPhase.lambda$innerRun$1(FetchSearchPhase.java:101)
at org.elasticsearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:107)
at org.elasticsearch.action.search.FetchSearchPhase.access$000(FetchSearchPhase.java:36)
at org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:84)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:732)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:830)
Additional context
The following request shows the query that zentity submits to Elasticsearch in the first hop, and the response that zentity receives from Elasticsearch to process. The error occurs when zentity tries to parse the values of the phone numbers, which are inside of an object array.
Request:
GET my_index/_search
{
"_source": true,
"query": {
"bool": {
"filter": [
{
"term": {
"last_name": "jones"
}
},
{
"nested": {
"path": "phone",
"query": {
"term": {
"phone.number": "555-123-4567"
}
}
}
}
]
}
},
"size": 1000
}
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "my_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.0,
"_source" : {
"first_name" : "alice",
"last_name" : "jones",
"phone" : [
{
"number" : "555-123-4567",
"type" : "home"
},
{
"number" : "555-987-6543",
"type" : "mobile"
}
]
}
}
]
}
}
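For reference, the following Python sketch shows which values the second hop would be expected to pick up from this response if the object array were parsed successfully. It is an illustration only, not zentity's implementation: `unseen_values` is a hypothetical helper, and `hits` is a trimmed copy of the response above.

```python
def unseen_values(hits, array_field, key, known):
    """Collect values of `key` from each object in `array_field`
    that are not already in `known`, preserving first-seen order."""
    unseen = []
    for hit in hits:
        for obj in hit["_source"].get(array_field, []):
            value = obj.get(key)
            if value is not None and value not in known and value not in unseen:
                unseen.append(value)
    return unseen


# Trimmed copy of the "hits" array from the first-hop response.
hits = [
    {
        "_index": "my_index",
        "_id": "1",
        "_source": {
            "first_name": "alice",
            "last_name": "jones",
            "phone": [
                {"number": "555-123-4567", "type": "home"},
                {"number": "555-987-6543", "type": "mobile"},
            ],
        },
    }
]

# The phone number supplied in the resolution request is already known.
known = {"555-123-4567"}

print(unseen_values(hits, "phone", "number", known))
```

The only new value is 555-987-6543, which is the phone number the second hop should use to find document 2.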