Add ManticoresearchAdapter by alexander-schranz · Pull Request #103 · PHP-CMSIG/search

alexander-schranz · 2023-02-13T22:17:13Z

Manticore Search is a Sphinx fork with a PHP client available at https://github.com/manticoresoftware/manticoresearch-php. As requested by some users on Reddit, we are trying to support it as well.

TODO

External

How can I index multi string, date, ... in manticoresearch manticoresoftware/manticoresearch#1047

sanikolaev · 2023-05-16T03:34:53Z

> currently stuck on create valid schema for complex document

Hi. I'm a member of Manticore team. Please let me know if we can help with this.

alexander-schranz · 2023-05-16T19:43:50Z

@sanikolaev your help is really welcome here. Maybe we can do this step by step. First, it would be nice if you could help with how to map a SEAL schema to a Manticore Search schema.

The abstraction supports the following field types. For single-value fields I currently use the mapping below, which I think should be correct. Datetimes / timestamps seem to be represented in Manticore Search as numbers, so our converter turns "2023-..." into a numeric timestamp, as we already do for Apache Solr. So a very basic mapping should look like this; I hope at least this is correct:

| SEAL Field Type | Manticore Field Type | Example in JSON |
| --- | --- | --- |
| BooleanField | `bool` | { "field": false } |
| IntegerField | `int` | { "field": 1 } |
| FloatField | `float` | { "field": 1.5 } |
| DateTimeField | `timestamp` | { "field": 1684265085 } |
| TextField | `text` | { "field": "Text" } |
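The datetime conversion mentioned above can be sketched as follows. This is only a minimal illustration: `toManticoreTimestamp` is a hypothetical helper name, not a function of SEAL or manticoresearch-php, and it assumes the input is an ISO 8601 string with an explicit UTC offset.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of SEAL or manticoresearch-php):
 * converts an ISO 8601 date string into a Unix timestamp.
 */
function toManticoreTimestamp(string $isoDate): int
{
    // DateTimeImmutable takes the UTC offset embedded in the string into account.
    return (new DateTimeImmutable($isoDate))->getTimestamp();
}

// 12:00 at +01:00 is 11:00 UTC.
echo toManticoreTimestamp('2022-01-24T12:00:00+01:00'), PHP_EOL; // prints 1643022000
```

The resulting integer is what would be sent for a `timestamp` field instead of the original date string.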

But now the more difficult part: every field can be multiple. I'm not yet sure how to map anything other than an integer field to an array, because according to https://manual.manticoresearch.com/Creating_a_table/Data_types#Multi-value-integer-(MVA) multi-value attributes currently only work for integers, not for other types of values. How to store other types of data that are represented in an array?

| SEAL Schema Type | Manticore Field Type | Example in JSON |
| --- | --- | --- |
| multi BooleanField | ??? | { "field": [false, true] } |
| multi IntegerField | `multi` | { "field": [1, 3] } |
| multi FloatField | ??? | { "field": [1.5, 2.5] } |
| multi DateTimeField | ??? | { "field": [1684265085, 1684265025] } |
| multi TextField | ??? | { "field": ["Text 1", "Text"] } |

The TextField has a special flag called searchable, which can be true (the default) or false. Based on that flag I currently add ['indexed'] or not. I think that should work, but as the issue above is currently blocking, I have not been able to test it yet (see https://manual.manticoresearch.com/Creating_a_table/Data_types#Character-data-types):

| SEAL Schema Type | Manticore Field Type | Example in JSON |
| --- | --- | --- |
| TextField searchable | text ['indexed'] | { "field": "Text 1" } |
| multi TextField searchable | ??? ['indexed'] | { "field": ["Text 1", "Text"] } |

While reading the documentation about text / string, I'm not sure whether a field which contains text but is not searchable would be better as a string field instead of a text field.

| SEAL Schema Type | Manticore Field Type | Example in JSON |
| --- | --- | --- |
| TextField not searchable | string | { "field": "Text 1" } |
| multi TextField not searchable | ??? | { "field": ["Text 1", "Text"] } |

All kinds of fields can be filterable and sortable. Based on the documentation, such fields need to be marked as attribute (see https://manual.manticoresearch.com/Creating_a_table/Data_types#Character-data-types):

| SEAL Schema Type | Manticore Field Type | Example in JSON |
| --- | --- | --- |
| BooleanField | `bool` ['attribute'] | { "field": false } |
| IntegerField | `int` ['attribute'] | { "field": 1 } |
| FloatField | `float` ['attribute'] | { "field": 1.5 } |
| DateTimeField | `timestamp` ['attribute'] | { "field": 1684265085 } |
| TextField | `text` ['attribute'] | { "field": "Text" } |
| multi BooleanField | ??? ['attribute'] | { "field": [false, true] } |
| multi IntegerField | `multi` ['attribute'] | { "field": [1, 3] } |
| multi FloatField | ??? ['attribute'] | { "field": [1.5, 2.5] } |
| multi DateTimeField | ??? ['attribute'] | { "field": [1684265085, 1684265025] } |
| multi TextField | ??? ['attribute'] | { "field": ["Text 1", "Text"] } |

The multiple fields are what currently makes the implementation crash, as I'm not sure how they can be handled with the Manticore Search engine or Sphinx:

Warning
Manticoresearch\Exceptions\ResponseException: "MVA elements should be integers"

From the previous discussion, a JSON field might support this, but I'm not sure how to define those types correctly, as it fails in combination with indexed:

Warning
Manticoresearch\Exceptions\ResponseException: "sphinxql: syntax error, unexpected INDEXED, expecting ')' or ',' near 'indexed,blocks_text_description json indexed,blocks_text_media multi,blocks_embed_title json indexed,blocks_embed_media json,footer_title text indexed,created timestamp,commentsCount integer,rating float,comments_email json,comments_text json indexed,tags json indexed attribute,categoryIds multi,_source text)'"

As an example, our test has a tags field, and the tags are a filterable, searchable, multi TextField in the definitions, see here.

sanikolaev · 2023-05-18T04:36:19Z

> How to store other types of data that are represented in an array?

This is only possible using the json type, e.g.:

mysql> drop table if exists t; create table t(string_array json, float_array json, bool_array json); insert into t values(0, '["abc", "def"]', '[1.23, 2.34]', '[true, false]'),(0, '["ghi", "jkl"]', '[3.45, 4.56]', '[true, true]'); select *, any(x = 'abc' for x in string_array), any(x > 3.0 and x < 4.0 for x in float_array), all(x = 1 for x in bool_array) from t;
--------------
drop table if exists t
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
create table t(string_array json, float_array json, bool_array json)
--------------

Query OK, 0 rows affected (0.01 sec)

--------------
insert into t values(0, '["abc", "def"]', '[1.23, 2.34]', '[true, false]'),(0, '["ghi", "jkl"]', '[3.45, 4.56]', '[true, true]')
--------------

Query OK, 2 rows affected (0.00 sec)

--------------
select *, any(x = 'abc' for x in string_array), any(x > 3.0 and x < 4.0 for x in float_array), all(x = 1 for x in bool_array) from t
--------------

+---------------------+---------------+---------------------+--------------+--------------------------------------+-----------------------------------------------+--------------------------------+
| id                  | string_array  | float_array         | bool_array   | any(x = 'abc' for x in string_array) | any(x > 3.0 and x < 4.0 for x in float_array) | all(x = 1 for x in bool_array) |
+---------------------+---------------+---------------------+--------------+--------------------------------------+-----------------------------------------------+--------------------------------+
| 1515343812221005444 | ["abc","def"] | [1.230000,2.340000] | [true,false] |                                    1 |                                             0 |                              0 |
| 1515343812221005445 | ["ghi","jkl"] | [3.450000,4.560000] | [true,true]  |                                    0 |                                             1 |                              1 |
+---------------------+---------------+---------------------+--------------+--------------------------------------+-----------------------------------------------+--------------------------------+
2 rows in set (0.00 sec)

BTW timestamp internally is just int, so an array of timestamps would be multi.
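Putting the answers so far together, the type mapping could be sketched roughly like this. The function name `manticoreType` and the plain-string type names are assumptions made for illustration only (SEAL uses field classes, not strings), and the sketch needs PHP 8.0 or later for `match`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical mapping helper, not part of SEAL or manticoresearch-php.
 * $sealType is one of: boolean, integer, float, datetime, text.
 */
function manticoreType(string $sealType, bool $multiple): string
{
    if ($multiple) {
        // Integers and timestamps (internally ints) fit into an MVA ("multi");
        // every other array has to be stored as json.
        return in_array($sealType, ['integer', 'datetime'], true) ? 'multi' : 'json';
    }

    return match ($sealType) {
        'boolean' => 'bool',
        'integer' => 'int',
        'float' => 'float',
        'datetime' => 'timestamp',
        'text' => 'text',
        default => throw new InvalidArgumentException("Unknown field type: $sealType"),
    };
}

echo manticoreType('datetime', true), PHP_EOL; // prints multi
echo manticoreType('text', true), PHP_EOL;     // prints json
```

Options such as indexed or attribute are deliberately left out here, since the thread has not yet settled how they combine with json.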

alexander-schranz · 2023-05-18T14:16:22Z

@sanikolaev thx for the response. What about indexed and attribute on these fields? It currently ends in the following error:

Warning
Manticoresearch\Exceptions\ResponseException: "sphinxql: syntax error, unexpected INDEXED, expecting ')' or ',' near 'indexed,blocks_text_description json indexed,blocks_text_media multi,blocks_embed_title json indexed,blocks_embed_media json,footer_title text indexed,created timestamp,commentsCount integer,rating float,comments_email json,comments_text json indexed,tags json indexed attribute,categoryIds multi,_source text)'"

alexander-schranz · 2023-05-18T15:03:31Z

I tried to skip the attribute and indexed part for the json fields but still run into another error. These are the Manticore field definitions (I had to use _ instead of . as the field separator for nested objects):

{
    "title": {
        "type": "text",
        "options": [
            "indexed"
        ]
    },
    "header_image_media": {
        "type": "integer",
        "options": []
    },
    "header_video_media": {
        "type": "string",
        "options": []
    },
    "article": {
        "type": "text",
        "options": [
            "indexed"
        ]
    },
    "blocks_text_title": {
        "type": "json",
        "options": []
    },
    "blocks_text_description": {
        "type": "json",
        "options": []
    },
    "blocks_text_media": {
        "type": "multi",
        "options": []
    },
    "blocks_embed_title": {
        "type": "json",
        "options": []
    },
    "blocks_embed_media": {
        "type": "json",
        "options": []
    },
    "footer_title": {
        "type": "text",
        "options": [
            "indexed"
        ]
    },
    "created": {
        "type": "timestamp",
        "options": []
    },
    "commentsCount": {
        "type": "integer",
        "options": []
    },
    "rating": {
        "type": "float",
        "options": []
    },
    "comments_email": {
        "type": "json",
        "options": []
    },
    "comments_text": {
        "type": "json",
        "options": []
    },
    "tags": {
        "type": "json",
        "options": []
    },
    "categoryIds": {
        "type": "multi",
        "options": []
    },
    "_source": {
        "type": "string",
        "options": []
    }
}

This is the document:

{
    "title": "New Blog",
    "header_image_media": 1,
    "article": "<article><h2>New Subtitle<\/h2><p>A html field with some content<\/p><\/article>",
    "blocks_text_title": "[\"Titel\",\"Titel 2\",\"Titel 4\"]",
    "blocks_text_description": "[\"<p>Description<\\\/p>\",\"<p>Description 4<\\\/p>\"]",
    "blocks_text_media": [
        3,
        4,
        3,
        4
    ],
    "blocks_embed_title": "[\"Video\"]",
    "blocks_embed_media": "[\"https:\\\/\\\/www.youtube.com\\\/watch?v=iYM2zFP3Zn0\"]",
    "footer_title": "New Footer",
    "created": "2022-01-24T12:00:00+01:00",
    "commentsCount": 2,
    "rating": 3.5,
    "comments_email": "[\"admin.nonesearchablefield@localhost\",\"example.nonesearchablefield@localhost\"]",
    "comments_text": "[\"Awesome blog!\",\"Like this blog!\"]",
    "tags": "[\"Tech\",\"UI\"]",
    "categoryIds": [
        1,
        2
    ],
    "_source": "{\"unrelated\":\"Unrelated\"}"
}

It is indexed via the PHP client this way:

$searchIndex = $this->client->index('test_complex');
$searchIndex->addDocument($aboveDocument, '23b30f01-d8fd-4dca-b36a-4710e360a965');

But when I try to load that document via:

$searchIndex = $this->client->index('test_complex');
$searchIndex->getDocumentById('23b30f01-d8fd-4dca-b36a-4710e360a965');

It errors with:

Manticoresearch\Exceptions\ResponseException: "index test_complex: unsupported filter type 'string' on attribute 'id'"

Not sure why this is happening.

sanikolaev · 2023-05-18T15:45:08Z

> what about indexed and attribute on these fields

indexed makes sense only for textual fields. It makes the field full-text indexed. It may be a little bit confusing since "string" and "text" without additional properties mean different things, but when you add one of the properties ("indexed", "stored", "attribute") they become aliases. We tried to describe all of that in the docs. Let me know if something is not clear, I'll be glad to help and update the docs afterwards.

> had to use _ instead of . as the field separator for nested objects

I see. This is right. Manticore doesn't natively support nested objects and the period sign is used for json, e.g.: where json_attr.a.b = 123, that's why it's not allowed in column names.
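The renaming described here can be sketched in a couple of lines. `flattenFieldName` is a hypothetical helper name, and the dotted path `blocks.text.title` is an assumed example of how the flattened column `blocks_text_title` from the schema above may have been produced.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turns a dotted field path into a valid Manticore
 * column name by replacing the period, which is reserved for json access.
 */
function flattenFieldName(string $path): string
{
    return str_replace('.', '_', $path);
}

echo flattenFieldName('blocks.text.title'), PHP_EOL; // prints blocks_text_title
```

One caveat with this scheme: two different paths such as `a.b_c` and `a_b.c` would end up with the same column name, so it only works when field names avoid such overlaps.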

> unsupported filter type 'string' on attribute 'id'

Manticore doesn't support string IDs. The ID requirements can be found here https://manual.manticoresearch.com/Creating_a_table/Data_types#Document-ID.
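One possible workaround, sketched here purely as an assumption and not something suggested in this thread, is to derive a numeric ID from the UUID and keep the UUID itself in a separate string field. `numericDocumentId` is a hypothetical name, the sketch assumes a 64-bit PHP build, and because it hashes the UUID, collisions are unlikely but not impossible.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: maps a string identifier to a positive, non-zero
 * integer that fits into a signed 64-bit document ID.
 */
function numericDocumentId(string $uuid): int
{
    // 15 hex digits are 60 bits, so the value always fits into a positive int
    // on 64-bit PHP and hexdec() returns an int rather than a float.
    $id = (int) hexdec(substr(md5($uuid), 0, 15));

    // Guard against the (extremely unlikely) zero value.
    return $id === 0 ? 1 : $id;
}

$id = numericDocumentId('23b30f01-d8fd-4dca-b36a-4710e360a965');
var_dump($id > 0); // bool(true)
```

The same UUID always yields the same ID, so the adapter could recompute it for lookups instead of storing a mapping.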

alexander-schranz · 2023-05-18T15:57:43Z

> indexed makes sense only for textual fields. It makes the field full-text indexed. It may be a little bit confusing since "string" and "text" without additional properties mean different things

From the document above, we have text which is searchable but is represented by an array of texts, as we flattened the whole blocks objects. As you suggested, I now use the type json for these array text fields (tags, blocks_text_description, blocks_text_title, ...). Is that maybe not the correct way for searchable text, if it cannot be indexed?

The tags field, sometimes called search keywords, is I think a very common example of a multi field which is searchable and filterable. In Elasticsearch this is achieved this way:

[
    'type' => 'text',
    'index' => true,
    'fields' => [
        'raw' => ['type' => 'keyword'],
    ],
]

So a field tags is created for searchability and a field tags.raw for filterability. How is this done in Manticore Search?

sanikolaev · 2023-05-18T16:01:59Z

The equivalent of Elasticsearch's

> The tags field is I think a very common example of a multi field which is searchable and filterable. In Elasticsearch this is achieved this way:

[
    'type' => 'text',
    'index' => true,
    'fields' => [
        'raw' => ['type' => 'keyword'],
    ],
]

in Manticore is type text indexed attribute or type string indexed attribute, e.g.:

create table t(type text indexed attribute, type2 string indexed attribute)
--------------

Query OK, 0 rows affected (0.00 sec)

--------------
desc t
--------------

+-------+--------+-------------------+
| Field | Type   | Properties        |
+-------+--------+-------------------+
| id    | bigint |                   |
| type  | string | indexed attribute |
| type2 | string | indexed attribute |
+-------+--------+-------------------+
3 rows in set (0.00 sec)

alexander-schranz · 2023-05-18T16:28:24Z

I'm not sure if I understood you correctly. Tags or keywords are a list of data which can contain zero or many keywords, e.g.:

{
   "uuid": "23b30f01-d8fd-4dca-b36a-4710e360a965",
   "tags": ["UI", "UX"]
}

For searchability I think we could use type string indexed, by converting the document's multi text fields to something like this:

{
   "uuid": "23b30f01-d8fd-4dca-b36a-4710e360a965",
   "tags": "UI UX"
}
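The conversion from the array form to the space-separated form is a one-liner. `joinTags` is a hypothetical helper name used only for this sketch.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: collapses a list of tags into one space-separated
 * string suitable for a single text/string field.
 *
 * @param string[] $tags
 */
function joinTags(array $tags): string
{
    return implode(' ', $tags);
}

echo joinTags(['UI', 'UX']), PHP_EOL; // prints UI UX
```

Note that a tag which itself contains a space (for example "Web Design") can no longer be told apart from two separate tags once joined this way.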

But how could we then still get filterability to work, to fetch the documents tagged with those tags? I think that would still require a json field, but how can we make that json filter attribute return only documents having a specific tag?

{
   "uuid": "23b30f01-d8fd-4dca-b36a-4710e360a965",
   "tags": "UI UX",
   "tags_raw": ["UI", "UX"]
}

PS: we are using https://github.com/manticoresoftware/manticoresearch-php here, so we do not actually run any create table ... statements ourselves.

sanikolaev · 2023-05-19T06:25:05Z

> how can we make that json filter attribute return only documents having a specific tag?

For filtering on an array of strings separated by spaces, there's a special mechanism in Manticore that lets you avoid JSON. Here's what it looks like in PHP:

<?php
require_once __DIR__ . '/vendor/autoload.php';
use Manticoresearch\Search;

$config = ['host'=>'127.0.0.1','port'=>9308];
$client = new \Manticoresearch\Client($config);
$index = $client->index('test');

$index->drop();

$index->create([
  'uuid'=>['type'=>'string'],
  'tags'=>['type'=>'text indexed attribute']
]);

$index->addDocument([
  'uuid' => '23b30f01-d8fd-4dca-b36a-4710e360a965',
  'tags' => 'UI UX'
]);

echo "--------------- \$index->search('UI')->get(): -------------------\n";
$docs = $index->search('UI')->get();
foreach($docs as $doc) print_r($doc);

echo "--------------- \$index->search('UI')->get(): -------------------\n";
$docs = $index->search('UI')->get();
foreach($docs as $doc) print_r($doc);

echo "--------------- \$index->search('')->filter('any(tags)', 'in', ['UI'])->get(): -------------------\n";
$docs = $index->search('')->filter('any(tags)', 'in', ['UI'])->get();
foreach($docs as $doc) print_r($doc);

which will give you:

➜  ~ php schranz.php
--------------- $index->search('UI')->get(): -------------------
Manticoresearch\ResultHit Object
(
    [data:protected] => Array
        (
            [_id] => 1515343812221005526
            [_score] => 1500
            [_source] => Array
                (
                    [tags] => UI UX
                    [uuid] => 23b30f01-d8fd-4dca-b36a-4710e360a965
                )

        )

)
--------------- $index->search('UI')->get(): -------------------
Manticoresearch\ResultHit Object
(
    [data:protected] => Array
        (
            [_id] => 1515343812221005526
            [_score] => 1500
            [_source] => Array
                (
                    [tags] => UI UX
                    [uuid] => 23b30f01-d8fd-4dca-b36a-4710e360a965
                )

        )

)
--------------- $index->search('')->filter('any(tags)', 'in', ['UI'])->get(): -------------------
Manticoresearch\ResultHit Object
(
    [data:protected] => Array
        (
            [_id] => 1515343812221005526
            [_score] => 1
            [_source] => Array
                (
                    [tags] => UI UX
                    [uuid] => 23b30f01-d8fd-4dca-b36a-4710e360a965
                )

        )

)

So in this specific case 'tags'=>['type'=>'text indexed attribute'] should suffice.

From the docs:

alexander-schranz · 2024-09-13T12:17:19Z

Closing this as I'm too inexperienced with Manticore to finish it. If somebody wants to give it a try, feel free to reopen a merge request. Every adapter should fulfill the tests run against it; the only thing which not all adapters currently support is the multi index search, so it would be fine to skip that for Manticore as well.

The main focus should be on the different SearchTests, which I was not able to get working correctly for the different expected cases.

alexander-schranz added the features New feature or request label Feb 13, 2023

alexander-schranz force-pushed the feature/add-manticoresearch-adapter branch 2 times, most recently from ab0ccb5 to b74f9aa Compare February 13, 2023 22:21

alexander-schranz added the Adapter: Manticoresearch label Feb 13, 2023

alexander-schranz force-pushed the feature/add-manticoresearch-adapter branch from e85cd0b to 7a97e4c Compare February 15, 2023 22:08

alexander-schranz added the help wanted Extra attention is needed label Feb 15, 2023

alexander-schranz force-pushed the feature/add-manticoresearch-adapter branch 4 times, most recently from 8cff708 to 2d70635 Compare February 22, 2023 21:43

alexander-schranz marked this pull request as draft April 1, 2023 17:41

sanikolaev mentioned this pull request May 16, 2023

Feature request: add Manticore Search #188

Open

alexander-schranz force-pushed the feature/add-manticoresearch-adapter branch 5 times, most recently from c95518c to c508f6c Compare May 16, 2023 19:10

alexander-schranz force-pushed the feature/add-manticoresearch-adapter branch from c508f6c to 38db6ff Compare May 16, 2023 19:49

Add ManticoresearchAdapter

9087029

alexander-schranz force-pushed the feature/add-manticoresearch-adapter branch from 38db6ff to 9087029 Compare May 16, 2023 20:02

Add JSON fields to FlattenMarshaller

20e2fa4

alexander-schranz closed this Sep 13, 2024

alexander-schranz deleted the feature/add-manticoresearch-adapter branch September 13, 2024 12:17

alexander-schranz mentioned this pull request Feb 2, 2026

Sphinx / Manticore #646

Closed

alexander-schranz wants to merge 2 commits into 0.5 from feature/add-manticoresearch-adapter
