Feature/Search: Reorganize section, and add content about hybrid search #106

Merged
amotl merged 6 commits into main from collab/better-search
Jul 31, 2024

@amotl commented Jul 30, 2024

About

Inspired by "Doing Hybrid Search in CrateDB" (thanks, @surister!), this patch reorganizes the "Search" section so that its structure is strong enough to cover the distinction between Full-Text Search, Vector Search, and Hybrid Search well.

Preview

https://cratedb-guide--106.org.readthedocs.build/feature/search/

Details

  • The "Search" section no longer covers any details itself. Instead, it has been repurposed into an "index" page that guides readers to the topics of Full-Text Search, Vector Search, and Hybrid Search in a balanced way.

Thoughts

  • This enhancement aims to set the stage for further improvements and contributions in this area. Please support us in shaping this important section of the documentation.
  • Please also share your opinions on what can be improved, and/or submit patches. Thanks!

Inspirations

Only just a bit. Relevant sections need to be improved further.

/cc @matriv, @mkleen, @seut, @BaurzhanSakhariev, @karynzv, @hlcianfagna, @hammerhead, @proddata, @WalBeh, @selina-meyer, @donmadeus, @widmogrod, @kneth

@amotl marked this pull request as ready for review July 30, 2024 13:17
Comment thread docs/feature/search/vector.md Outdated
Comment on lines +28 to +37
## Learn

To learn more about vector search, please visit the corresponding page about
[](#vector-store).


:::{todo}
Bring page into the same shape like the others in this section.
Maybe just move the [](#vector-store) page here without further ado.
:::
Member Author

Better repurpose and dissolve the main-level Vector Store page completely, and just move it here, in order to provide a better narrative and enhanced guidance. wdyt?

Member Author

@amotl Jul 30, 2024

What do you think about this, @surister? My impression, looking at it from a fresh perspective, is that listing "Vector Store" as a main feature does not make much sense, because you will mostly use it for searching anyway.

In this spirit, I am planning to move that page to the revamped /search/ section, in order to make the trio of (fts, vector, hybrid) complete. Do you agree?

Member Author

Hm. Thinking about it once more, it might still make sense to keep the Vector Store page, so that "Vector" is listed on the top menu level in that section. Otherwise, people might feel it is missing here.

Maybe still keep that page, but switch the content around?

Member Author

Hi again. I decided to refactor mercilessly, and dissolved both the "Vector Store" and the "Geospatial Data" pages, refactoring them into "Vector Search" and "Geospatial Search", which now make up a quartet together with the newly added "Hybrid Search" page.

Member Author

That other patch adds relevant redirects to compensate for refactored pages now being available at a different location.

@amotl mentioned this pull request Jul 30, 2024
Contributor

@surister left a comment

Brilliant, I love the idea and look forward to putting in more content, great stuff.

@surister
Contributor

At the beginning, you talk about how vectors alone are not enough, hence we need to mix them with BM25. This is very well written in the description of https://haystackconf.com/us2023/talk-16/; maybe it can serve as an inspiration?

@amotl
Member Author

amotl commented Jul 30, 2024

Brilliant, I love the idea and look forward to putting in more content, great stuff.

Thanks a stack, appreciate it.

See [...] for other sources of inspiration.

I've adjusted my OP by adding a few inspirations I had loosely drawn on when conceiving the first sketch of this patch. I was sure you would provide much better resources than those from my quick research. Thanks!

I will be so happy to accept well-rewritten paragraphs on this PR, based on the material you are suggesting, if you can spare a few cycles. Every chunk counts!

Otherwise, we will probably need to add it to the backlog summary at GH-101, because I need to take care of other obligations.

@amotl
Member Author

amotl commented Jul 30, 2024

[improve] about how vectors is not enough hence we need to mix with bm25

I've diverted this to the backlog at #101 (comment). Thanks!

amotl added 5 commits July 31, 2024 00:22
- Dedicate a slot to each of FTS, Vector, and Hybrid, in order
  to give the structure enough strength for adding more relevant content
  next to the introductory tidbits.

- Refactor feature/vector page into feature/search/vector, renaming it
  from "Vector Store" to "Vector Search". It has already been curated
  well, and gives this documentation slot the same valuable shape
  as all the other "feature card" pages in the whole "All Features"
  subsection.
- FTS: Re-shuffle content cards in "Learn" subsection. Add blog articles
  about "Indexing and Storage in CrateDB" and "Indexing Text for Both
  Effective Search and Accurate Analysis".

- Vector+Hybrid: This and that.
- Hybrid: Bring page into the same shape as the others. Add a little
  "Usage" section that can be improved later.

- Hybrid: Add two output examples from blog article.

- Hybrid: Cross-linking.

- FTS: Add another SQL example.

- Vector: Minor improvements.

- Advanced Querying: Add references to FTS-, Vector-, and Hybrid-Search
  pages.
@amotl merged commit fe8ea98 into main Jul 31, 2024
@amotl deleted the collab/better-search branch July 31, 2024 07:09