Skip to content

Expose FTS5Pattern initializer for raw pattern.#1050

Closed
GetToSet wants to merge 1 commit intogroue:developmentfrom
GetToSet:development
Closed

Expose FTS5Pattern initializer for raw pattern.#1050
GetToSet wants to merge 1 commit intogroue:developmentfrom
GetToSet:development

Conversation

@GetToSet
Copy link
Copy Markdown
Contributor

@GetToSet GetToSet commented Sep 4, 2021

Pull Request Checklist

  • This pull request is submitted against the development branch.
  • Inline documentation has been updated.
  • README.md or another dedicated guide has been updated.
  • Changes are tested.

FTS3Pattern as an initializer public init(rawPattern: String) throws that constructs it directly from raw pattern, while FTS5Pattern can be constructed using Database extensions, making public init(rawPattern: String, allowedColumns: [String] = []) throws publicly available helps make it constructed easier outside a database-agnostic context.

This PR also compared FTS3Pattern.swift with FTS5Pattern.swift, and made some minor adjustments to make it easier to those APIs aligned up.

@GetToSet GetToSet force-pushed the development branch 2 times, most recently from aa8efd3 to 7ce0c5c Compare September 4, 2021 11:53
@groue
Copy link
Copy Markdown
Owner

groue commented Sep 4, 2021

Hello @GetToSet,

Thank you for your contribution. I won't be able to review your pull request until mid September. Please be patient.

@groue
Copy link
Copy Markdown
Owner

groue commented Sep 20, 2021

Hello @GetToSet, I do not quite understand what feature is made possible with this pull request. Would you please explain what you need?

@GetToSet
Copy link
Copy Markdown
Contributor Author

The main purpose of this PR is to make init(rawPattern: String, ...) of FTS5Pattern public rather than internal so that patterns can be constructed directly from raw pattern directly while FTS3Pattern supports this kind of initialization, but FTS5Patterns don't.

@groue
Copy link
Copy Markdown
Owner

groue commented Sep 20, 2021

Thanks @GetToSet, but "because FTS3 can" is not a reason. FTS5 is not FTS3, and FTS5 pattern validation is not the same as FTS3 pattern validation (FTS5 cares about tables and columns when FTS3 does not). You have to find a better argument. For example, tell about what you can't possibly do, without any known workaround?

@GetToSet
Copy link
Copy Markdown
Contributor Author

GetToSet commented Sep 21, 2021

My use cases contain prefix queries like one + two + three*, these kind of queries may not be easily constructed by current initializers, since they transform queries into tokens and then reconstruct them into patterns to suit specific needs.

While using FTS3, I use the init(rawPattern: String, ...) to construct queries with prefix tokens, but after migrating to FTS5, I found I have no way to construct FTS5Patterns that contains prefix tokens.

@GetToSet
Copy link
Copy Markdown
Contributor Author

My use cases contain prefix queries like one + two + three*, these kind of queries may not be easily constructed by current initializers, since they transform queries into tokens and then reconstruct them into patterns to suit specific needs.

While using FTS3, I use the init(rawPattern: String, ...) to construct queries with prefix tokens, but after migrating to FTS5, I found I have no way to construct FTS5Patterns that contains prefix tokens.

Or maybe I can enhance current tokenization process to correctly support prefix tokens rather than exposing rawPattern initializer?

@groue
Copy link
Copy Markdown
Owner

groue commented Sep 21, 2021

Now we're talking :-)

It happens I had a similar conversation with one of my coworker yesterday, and I pushed on the development branch four new APIs:

  • FTS3Pattern(matchingAllPrefixesIn:)
  • FTS5Pattern(matchingAllPrefixesIn:)
  • FTS3.tokenize(_:)
  • FTS5.tokenize(_:)

The two new initializers accept user input and turn "foo bar" into the foo* bar* pattern. They are suitable for live updates of a list of results as a user is typing a query in a search bar. https://github.com/groue/GRDB.swift/blob/development/Documentation/FullTextSearch.md#fts5pattern

The two tokenize methods accept user input and turn "foo bar" into the ["foo", "bar"] array of tokens, and make it easier for applications to build raw tokens that are not built by ready-made initializers. https://github.com/groue/GRDB.swift/blob/development/Documentation/FullTextSearch.md#fts5-tokenizers

You still need a database connection in order to call Database.makeFTS5Pattern(rawPattern:forTable:), though. If this is too much an inconvenience, please explain why.

@groue
Copy link
Copy Markdown
Owner

groue commented Sep 21, 2021

  • FTS3.tokenize(_:)
  • FTS5.tokenize(_:)

Those methods accept a tokenizer:

// Default tokenization using the `ascii` tokenizer:
try FTS5.tokenize("SQLite database")  // ["sqlite", "database"]
try FTS5.tokenize("Gustave Doré")     // ["gustave", "doré"])

// Tokenization with an explicit tokenizer:
try FTS5.tokenize("SQLite database", withTokenizer: .porter()) // ["sqlite", "databas"]
try FTS5.tokenize("Gustave Doré", withTokenizer: .unicode61()) // ["gustave", "dore"])

They currently do not accept a custom FTS5 tokenizer - I'll have to make it possible.

@GetToSet
Copy link
Copy Markdown
Contributor Author

Thanks I'll checkout the latest development branch to have a try.

@groue
Copy link
Copy Markdown
Owner

groue commented Sep 21, 2021

They currently do not accept a custom FTS5 tokenizer - I'll have to make it possible.

Done. See https://github.com/groue/GRDB.swift/blob/development/Documentation/FullTextSearch.md#fts5-tokenization

Note that a database connection is needed for FTS5 tokenization.

@groue
Copy link
Copy Markdown
Owner

groue commented Sep 22, 2021

@GetToSet I'm closing this PR because it will not be merged. But you can provide feedback about the new APIs, of course. They are not merged into master yet, so we can discuss them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants