{"id":3518,"date":"2018-05-06T13:55:44","date_gmt":"2018-05-06T11:55:44","guid":{"rendered":"https:\/\/codingexplained.com\/?p=3518"},"modified":"2020-05-08T10:39:36","modified_gmt":"2020-05-08T08:39:36","slug":"creating-custom-elasticsearch-analyzers","status":"publish","type":"post","link":"https:\/\/codingexplained.com\/coding\/elasticsearch\/creating-custom-elasticsearch-analyzers","title":{"rendered":"Creating Custom Elasticsearch Analyzers"},"content":{"rendered":"<p>In a previous post, you saw <a title=\"Configuring Analyzers and Token Filters\" href=\"\/coding\/elasticsearch\/configuring-elasticsearch-analyzers-token-filters\" target=\"_blank\" rel=\"noopener noreferrer\">how to configure one of the built-in analyzers as well as a token filter<\/a>. Now it&#8217;s time to see how we can build our own custom analyzer. We do that by defining which character filters, tokenizer, and token filters the analyzer should consist of, and potentially configuring them.<\/p>\n<pre><code class=\"json\">PUT \/analyzers_test\n{\n  \"settings\": {\n    \"analysis\": {\n      \"analyzer\": {\n        \"english_stop\": {\n          \"type\": \"standard\",\n          \"stopwords\": \"_english_\"\n        },\n        \"my_analyzer\": {\n          \"type\": \"custom\",\n          \"tokenizer\": \"standard\",\n          \"char_filter\": [\n            \"html_strip\"\n          ],\n          \"filter\": [\n            \"lowercase\",\n            \"trim\",\n            \"my_stemmer\"\n          ]\n        }\n      },\n      \"filter\": {\n        \"my_stemmer\": {\n          \"type\": \"stemmer\",\n          \"name\": \"english\"\n        }\n      }\n    }\n  }\n}<\/code><\/pre>\n<p>The above request creates an index with two analyzers and one token filter, which is used within the custom analyzer. Apart from the custom token filter, the custom analyzer uses a built-in character filter, built-in token filters, and the <span class=\"code\">standard<\/span> tokenizer. 
You can find a list of the available ones <a title=\"Built-in analyzers, tokenizers, token filters, and character filters\" href=\"https:\/\/www.elastic.co\/guide\/en\/elasticsearch\/reference\/current\/analysis-analyzers.html#_custom_analyzers\" target=\"_blank\" rel=\"external nofollow noopener noreferrer\">in the documentation right here<\/a>.<\/p>\n<p>Great, so the index has been created with our settings. Let&#8217;s use the Analyze API to verify that the analyzer works as we expect.<\/p>\n<pre><code class=\"json\">POST \/analyzers_test\/_analyze\n{\n  \"analyzer\": \"my_analyzer\",\n  \"text\": \"I'm in the mood for drinking &lt;strong&gt;semi-dry&lt;\/strong&gt; red wine!\"\n}<\/code><\/pre>\n<pre><code class=\"json\">{\n  \"tokens\": [\n    {\n      \"token\": \"i'm\",\n      \"start_offset\": 0,\n      \"end_offset\": 3,\n      \"type\": \"&lt;ALPHANUM&gt;\",\n      \"position\": 0\n    },\n    {\n      \"token\": \"in\",\n      \"start_offset\": 4,\n      \"end_offset\": 6,\n      \"type\": \"&lt;ALPHANUM&gt;\",\n      \"position\": 1\n    },\n    {\n      \"token\": \"the\",\n      \"start_offset\": 7,\n      \"end_offset\": 10,\n      \"type\": \"&lt;ALPHANUM&gt;\",\n      \"position\": 2\n    },\n    {\n      \"token\": \"mood\",\n      \"start_offset\": 11,\n      \"end_offset\": 15,\n      \"type\": \"&lt;ALPHANUM&gt;\",\n      \"position\": 3\n    },\n    {\n      \"token\": \"for\",\n      \"start_offset\": 16,\n      \"end_offset\": 19,\n      \"type\": \"&lt;ALPHANUM&gt;\",\n      \"position\": 4\n    },\n    {\n      \"token\": \"drink\",\n      \"start_offset\": 20,\n      \"end_offset\": 28,\n      \"type\": \"&lt;ALPHANUM&gt;\",\n      \"position\": 5\n    },\n    {\n      \"token\": \"semi\",\n      \"start_offset\": 37,\n      \"end_offset\": 41,\n      \"type\": \"&lt;ALPHANUM&gt;\",\n      \"position\": 6\n    },\n    {\n      \"token\": \"dry\",\n      \"start_offset\": 42,\n      \"end_offset\": 54,\n      \"type\": \"&lt;ALPHANUM&gt;\",\n      \"position\": 7\n    },\n    {\n      \"token\": \"red\",\n      \"start_offset\": 55,\n      \"end_offset\": 58,\n      \"type\": 
\"&lt;ALPHANUM&gt;\",\n      \"position\": 8\n    },\n    {\n      \"token\": \"wine\",\n      \"start_offset\": 59,\n      \"end_offset\": 63,\n      \"type\": \"&lt;ALPHANUM&gt;\",\n      \"position\": 9\n    }\n  ]\n}<\/code><\/pre>\n<p>Taking a glance at the results, we can see that the letter &#8220;i&#8221; has been lowercased within the first token. We can also see that the HTML tags have been stripped out, and that the word &#8220;drinking&#8221; has been stemmed to &#8220;drink.&#8221; Great, so the analyzer is good to go and we can now use it within field mappings.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In a previous post, you saw how to configure one of the built-in analyzers as well as a token filter. Now it&#8217;s time to see how we can build our own custom analyzer. We do that by defining which character filters, tokenizer, and token filters the analyzer should consist of, and potentially configuring them. PUT&hellip; <a href=\"https:\/\/codingexplained.com\/coding\/elasticsearch\/creating-custom-elasticsearch-analyzers\" class=\"more-link\">read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"jetpack_post_was_ever_published":false,"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[154],"tags":[155],"series":[],"jetpack_publicize_connections":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Creating Custom Elasticsearch Analyzers<\/title>\n<meta name=\"description\" content=\"Learn how to create custom analyzers in Elasticsearch, using both built-in and custom tokenizers, character filters, token filters, etc.\" \/>\n<meta 
name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/codingexplained.com\/coding\/elasticsearch\/creating-custom-elasticsearch-analyzers\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Creating Custom Elasticsearch Analyzers\" \/>\n<meta property=\"og:description\" content=\"Learn how to create custom analyzers in Elasticsearch, using both built-in and custom tokenizers, character filters, token filters, etc.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/codingexplained.com\/coding\/elasticsearch\/creating-custom-elasticsearch-analyzers\" \/>\n<meta property=\"og:site_name\" content=\"Coding Explained\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/codingexplained\" \/>\n<meta property=\"article:author\" content=\"https:\/\/www.facebook.com\/codingexplained\" \/>\n<meta property=\"article:published_time\" content=\"2018-05-06T11:55:44+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2020-05-08T08:39:36+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/codingexplained.com\/wp-content\/uploads\/2015\/11\/codingexplained-fb-promote.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"444\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Bo Andersen\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@codingexplained\" \/>\n<meta name=\"twitter:site\" content=\"@codingexplained\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Bo Andersen\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/codingexplained.com\/coding\/elasticsearch\/creating-custom-elasticsearch-analyzers\",\"url\":\"https:\/\/codingexplained.com\/coding\/elasticsearch\/creating-custom-elasticsearch-analyzers\",\"name\":\"Creating Custom Elasticsearch Analyzers\",\"isPartOf\":{\"@id\":\"https:\/\/codingexplained.com\/#website\"},\"datePublished\":\"2018-05-06T11:55:44+00:00\",\"dateModified\":\"2020-05-08T08:39:36+00:00\",\"author\":{\"@id\":\"https:\/\/codingexplained.com\/#\/schema\/person\/e19c92ec991f571605f047cefeaa950d\"},\"description\":\"Learn how to create custom analyzers in Elasticsearch, using both built-in and custom tokenizers, character filters, token filters, etc.\",\"breadcrumb\":{\"@id\":\"https:\/\/codingexplained.com\/coding\/elasticsearch\/creating-custom-elasticsearch-analyzers#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/codingexplained.com\/coding\/elasticsearch\/creating-custom-elasticsearch-analyzers\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/codingexplained.com\/coding\/elasticsearch\/creating-custom-elasticsearch-analyzers#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/codingexplained.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Creating Custom Elasticsearch Analyzers\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/codingexplained.com\/#website\",\"url\":\"https:\/\/codingexplained.com\/\",\"name\":\"Coding Explained\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/codingexplained.com\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/codingexplained.com\/#\/schema\/person\/e19c92ec991f571605f047cefeaa950d\",\"name\":\"Bo Andersen\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/codingexplained.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/28f5826f9d5d544b0c5e1ec321dfdfb8?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/28f5826f9d5d544b0c5e1ec321dfdfb8?s=96&d=mm&r=g\",\"caption\":\"Bo Andersen\"},\"description\":\"I am a back-end web developer with a passion for open source technologies. I have been a PHP developer for many years, and also have experience with Java and Spring Framework. I currently work full time as a lead developer. Apart from that, I also spend time on making online courses, so be sure to check those out!\",\"sameAs\":[\"https:\/\/codingexplained.com\",\"https:\/\/www.facebook.com\/codingexplained\",\"https:\/\/www.linkedin.com\/in\/ba0708\",\"https:\/\/twitter.com\/codingexplained\",\"https:\/\/www.youtube.com\/c\/codingexplained\"],\"url\":\"https:\/\/codingexplained.com\/author\/andy\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Creating Custom Elasticsearch Analyzers","description":"Learn how to create custom analyzers in Elasticsearch, using both built-in and custom tokenizers, character filters, token filters, etc.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/codingexplained.com\/coding\/elasticsearch\/creating-custom-elasticsearch-analyzers","og_locale":"en_US","og_type":"article","og_title":"Creating Custom Elasticsearch Analyzers","og_description":"Learn how to create custom analyzers in Elasticsearch, using both built-in and custom tokenizers, character filters, token filters, etc.","og_url":"https:\/\/codingexplained.com\/coding\/elasticsearch\/creating-custom-elasticsearch-analyzers","og_site_name":"Coding Explained","article_publisher":"https:\/\/www.facebook.com\/codingexplained","article_author":"https:\/\/www.facebook.com\/codingexplained","article_published_time":"2018-05-06T11:55:44+00:00","article_modified_time":"2020-05-08T08:39:36+00:00","og_image":[{"width":1200,"height":444,"url":"https:\/\/codingexplained.com\/wp-content\/uploads\/2015\/11\/codingexplained-fb-promote.png","type":"image\/png"}],"author":"Bo Andersen","twitter_card":"summary_large_image","twitter_creator":"@codingexplained","twitter_site":"@codingexplained","twitter_misc":{"Written by":"Bo Andersen","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/codingexplained.com\/coding\/elasticsearch\/creating-custom-elasticsearch-analyzers","url":"https:\/\/codingexplained.com\/coding\/elasticsearch\/creating-custom-elasticsearch-analyzers","name":"Creating Custom Elasticsearch Analyzers","isPartOf":{"@id":"https:\/\/codingexplained.com\/#website"},"datePublished":"2018-05-06T11:55:44+00:00","dateModified":"2020-05-08T08:39:36+00:00","author":{"@id":"https:\/\/codingexplained.com\/#\/schema\/person\/e19c92ec991f571605f047cefeaa950d"},"description":"Learn how to create custom analyzers in Elasticsearch, using both built-in and custom tokenizers, character filters, token filters, etc.","breadcrumb":{"@id":"https:\/\/codingexplained.com\/coding\/elasticsearch\/creating-custom-elasticsearch-analyzers#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/codingexplained.com\/coding\/elasticsearch\/creating-custom-elasticsearch-analyzers"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/codingexplained.com\/coding\/elasticsearch\/creating-custom-elasticsearch-analyzers#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/codingexplained.com\/"},{"@type":"ListItem","position":2,"name":"Creating Custom Elasticsearch Analyzers"}]},{"@type":"WebSite","@id":"https:\/\/codingexplained.com\/#website","url":"https:\/\/codingexplained.com\/","name":"Coding Explained","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/codingexplained.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/codingexplained.com\/#\/schema\/person\/e19c92ec991f571605f047cefeaa950d","name":"Bo 
Andersen","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/codingexplained.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/28f5826f9d5d544b0c5e1ec321dfdfb8?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/28f5826f9d5d544b0c5e1ec321dfdfb8?s=96&d=mm&r=g","caption":"Bo Andersen"},"description":"I am a back-end web developer with a passion for open source technologies. I have been a PHP developer for many years, and also have experience with Java and Spring Framework. I currently work full time as a lead developer. Apart from that, I also spend time on making online courses, so be sure to check those out!","sameAs":["https:\/\/codingexplained.com","https:\/\/www.facebook.com\/codingexplained","https:\/\/www.linkedin.com\/in\/ba0708","https:\/\/twitter.com\/codingexplained","https:\/\/www.youtube.com\/c\/codingexplained"],"url":"https:\/\/codingexplained.com\/author\/andy"}]}},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p3mJkW-UK","_links":{"self":[{"href":"https:\/\/codingexplained.com\/wp-json\/wp\/v2\/posts\/3518"}],"collection":[{"href":"https:\/\/codingexplained.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/codingexplained.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/codingexplained.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/codingexplained.com\/wp-json\/wp\/v2\/comments?post=3518"}],"version-history":[{"count":5,"href":"https:\/\/codingexplained.com\/wp-json\/wp\/v2\/posts\/3518\/revisions"}],"predecessor-version":[{"id":3882,"href":"https:\/\/codingexplained.com\/wp-json\/wp\/v2\/posts\/3518\/revisions\/3882"}],"wp:attachment":[{"href":"https:\/\/codingexplained.com\/wp-json\/wp\/v2\/media?parent=3518"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/codingexplained.com\/wp-json\/wp\/v2\/categories?post=3518"},{"taxonomy":"post
_tag","embeddable":true,"href":"https:\/\/codingexplained.com\/wp-json\/wp\/v2\/tags?post=3518"},{"taxonomy":"series","embeddable":true,"href":"https:\/\/codingexplained.com\/wp-json\/wp\/v2\/series?post=3518"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}