Most teams working with Elasticsearch, OpenSearch or RAG pipelines focus on ranking, embeddings or model quality when trying to improve relevance. But in many cases, the issue starts much earlier: in how text is normalized before indexing. In a previous post, we...
Some RAG issues have a simpler fix than people think: better text normalization. One common culprit is stemming. Stemming is a blunt, error-prone approach: it strips word endings mechanically, without properly accounting for morphology, part of speech, or context....
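To make the point about mechanical suffix stripping concrete, here is a minimal sketch. It is not a real stemmer such as Porter or Snowball: `naive_stem` and its `SUFFIXES` rule list are hypothetical, invented only to illustrate the kind of errors that appear when word endings are removed without regard to morphology, part of speech, or context.

```python
# Toy suffix stripper: a deliberately naive stand-in for a real stemmer.
# SUFFIXES is a hypothetical rule list, checked in order.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for word in ("meeting", "meets", "news", "wedding", "ran"):
        print(word, "->", naive_stem(word))
```

In this sketch the noun "meeting" and the verb "meets" collapse to the same token, "news" is cut down to "new", "wedding" becomes the non-word "wedd", and the irregular form "ran" is never linked to "run". Production stemmers use far more careful rules, but they operate on the same principle of rewriting surface endings.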
The case for evaluation of NLU platforms. Synthetic images and video have proven to be a big success for cutting costs. Synthetic text is following suit: tabular data (that is, data organized in a table with rows and columns) is already becoming mainstream, and the...
What Is Synthetic Training Data? Synthetic training data is data used to train an NLU engine. An NLU engine allows chatbots to understand the intent of user queries. The training data is enriched by data labeling or data annotation, with information about...
It is always important to evaluate the quality of your chatbots and conversational agents in order to know their real health, accuracy and efficiency. A chatbot's accuracy can only be increased by constantly evaluating it and retraining it with new data that answers your...
Talking, expressing ourselves through words, using speech to exchange information: all of this comes naturally to humans. Then why don’t we just talk with providers of goods and services on the internet, instead of using all kinds of user interfaces, buttons and...