Skip to content

Move SQLite pragma configuration to the database configuration file#49691

Closed
suwyn wants to merge 1 commit intorails:mainfrom
AccountAim:ar-sqlite-configurable-pragma
Closed

Move SQLite pragma configuration to the database configuration file#49691
suwyn wants to merge 1 commit intorails:mainfrom
AccountAim:ar-sqlite-configurable-pragma

Conversation

@suwyn
Copy link

@suwyn suwyn commented Oct 18, 2023

Motivation / Background

A recent PR #49349 introduced a number of SQLite optimizations to improve database performance.

Those optimizations were hard coded in the SQLite adapter which left developers in the same position they were before - that if they wanted to configure SQLite beyond that default configuration, they still needed to rely on monkey patching the adapter.

Hard coding those parameters also introduce the potential for breakage. For example, the write ahead log is now the default journal mode. The side effect of this is that SQLite will use three files instead of one for the database which may break any operational scripts (e.g. the backup strategy must change).

Another issue is that WAL mode is an optimization for writing to databases, and now with Rails 7.1 it makes a false assumption that you'll always want to write to any SQLite database.

Rails 7.1 now throws an error connecting to read-only sqlite database that isn't already in WAL mode.

A use case where that is problematic. Let's say on each build of my application I embed the latest version of a Zipcode database that we receive from a 3rd party. The database, since it is readonly, has the correct file permissions (readonly) and is embedded with journal mode turned off (no -shm or -wal files). 7.1 will throw an error now since it can't change the database to use WAL.

Detail

This Pull Request moves those performance optimizations into the database configuration file. Developers will now have the option to tune their SQLite configurations by specifying PRAGMA values along with the rest of their database configuration, in config/database.yml.

The defaults that were introduced in 7.1 have been moved to the database.yml generator so new applications can benefit from them while existing applications upgrading 7.1+ won't be impacted side effects.

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

@rafaelfranca
Copy link
Member

while existing applications upgrading 7.1+ won't be impacted side effects.

How they will not be affected? The defaults that were applied will just not be applied anymore.

@rafaelfranca
Copy link
Member

rafaelfranca commented Oct 18, 2023

Ah, I get what you mean now, this code can't be released in 7.1 anymore. It will only be released in 7.2, so not applying defaults automatically with affect any app that is upgrading from 7.1.

We need a way to apply it by default if it isn't configured explicitly in the database.yml. Or at least deprecate it unless it is explicitly configured.

@eileencodes
Copy link
Member

We need a way to apply it by default if it isn't configured explicitly in the database.yml. Or at least deprecate it unless it is explicitly configured.

For other client requirements we do defaults like this: https://github.com/rails/rails/blob/main/activerecord/lib/active_record/database_configurations/hash_config.rb#L71-L73

But I feel a bit weird having sqlite3 specific defaults set in the hash_config.rb. None of the other options specified are db specific. 🤔

@suwyn
Copy link
Author

suwyn commented Oct 18, 2023

Ah, I get what you mean now, this code can't be released in 7.1 anymore. It will only be released in 7.2, so not applying defaults automatically with affect any app that is upgrading from 7.1.

Yes, sorry poorly worded. The WAL journal mode is persistent, so the database will remain in WAL even if the PRAGMA isn't executed when the connection is established. So I believe upgrading from 7.1->7.2 will be less impactful than upgrading to 7.1

We need a way to apply it by default if it isn't configured explicitly in the database.yml. Or at least deprecate it unless it is explicitly configured.

I could see the case for applying the defaults on a new database, but not sure about applying persistent changes implicitly to an existing database...

@rafaelfranca
Copy link
Member

Got it. I think this patch is ok then.

@suwyn
Copy link
Author

suwyn commented Oct 18, 2023

But I feel a bit weird having sqlite3 specific defaults set in the hash_config.rb. None of the other options specified are db specific. 🤔

That's interesting @eileencodes I hadn't thought about that. There is evidence of other configuration options that are not represented in the HashConfig. My preference would be to keep as generalized and not introduce anything db specific.

This allows developers to fine tune their SQLite pragma through
additional configuration options available in the database configuration
file.
@suwyn suwyn force-pushed the ar-sqlite-configurable-pragma branch from 76293cd to 1fd33bf Compare October 18, 2023 20:19
@suwyn
Copy link
Author

suwyn commented Oct 19, 2023

Updated the PR description with another use case where the changes in 7.1 are problematic. tl;dr; Rails 7.1 is no longer compatible with readonly sqlite databases.

Also - although I preserved the performance optimizations of #49349 by putting them as defaults in the database configuration file generator, my preference would be to remove them from there as well and require any pragma beyond the FK statement (which existed before 7.1) be explicitly set by the developer. Looking forward to others thoughts and opinions on this.

VERY excited on the momentum growing behind SQLite and Rails! :itshappening:

raw_execute("PRAGMA mmap_size = #{128.megabytes}", "SCHEMA")

(@config.keys & PRAGMA_STATEMENTS).each do |key|
@raw_connection.send "#{key}=", @config[key]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I preferred using the setters offered by the sqlite gem rather than executing the PRAGMA statements directly because the gem provides normalization of the values before setting them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this code correctly, it will also accept something like journal_mode=(:wal). Which I think is nicer than journal_mode: WAL in the config.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, same could be done for for synchronous

@matthewd
Copy link
Member

Yeah, we don't want to unwind the new defaults or force them into the user's configuration. We should set defaults (which might change over time) that we think are the most appropriate in general, and then allow the user to override them via their config where circumstance dictates. If we want to get fancy, I'd be open to considering a basic check for a filesystem-readonly DB file when choosing them. (Or in extreme, even rescue an attempt to journalize a currently-not-journalled DB and just keep going? 🤔)

That is (broadly) what we do in the other adapters too -- though more rarely, because most "how your database operates" config lives in the server in those cases. Following that precedent, I suspect SQLite pragmas are probably most equivalent to session variables, which would mean these values should live under a variables: section in the config:

variables = @config.fetch(:variables, {}).stringify_keys

variables = @config.fetch(:variables, {}).stringify_keys


I'd rather we not list out the possible pragma names... @flavorjones @tenderlove can we rely on SQLite3::Pragmas.method_defined?("#{key}=") sticking around? I see the module is documented, but also "intended .. solely". (But also maybe WTB set_pragma that figures out the appropriate casting from the pragma name, so this use case could avoid relying on send? 🙏🏻)

@suwyn
Copy link
Author

suwyn commented Oct 19, 2023

Yeah, we don't want to unwind the new defaults or force them into the user's configuration.

I don't have a strong preference either way, but I believe we should be able to unset them.

The goal of this PR is to allow me to connect to ActiveRecord to SQLite databases without forcing it to use WAL, and to allow activerecord to connect to databases that are readonly. Both were possible before 7.1.

I suspect SQLite pragmas are probably most equivalent to session variables, which would mean these values should live under a variables: section in the config

I'd be happy to move them to the variables section if that's desired. I would like to point out that there are two types of pragmas. Ones that persist beyond the connection (e.g. journal_mode) and others that are connection specific (e.g. cache_size). The former write to the database and change it for all connections (so not like session variables).

I'd rather we not list out the possible pragma names...

Perhaps that constant is badly named. It's a list of pragmas that are available to be configured via the database config, not meant to be all the pragmas that exist. I've found it difficult to discern which configuration options were available with each adapter so this approach was meant to make that more discoverable.

I see the module is documented, but also "intended .. solely"

I read that as the module should not be included elsewhere, not that the methods shouldn't be used.

@flavorjones
Copy link
Member

flavorjones commented Oct 19, 2023

can we rely on SQLite3::Pragmas.method_defined?("#{key}=") sticking around?

@matthewd Yep, that module is included in the Database class and so it's part of the public API.

I will note, though, that not all pragmas supported by SQLite3 have an accessor, and as new pragmas are added upstream the gem will likely always lag unless and until someone needs to use the pragma and takes the time to write and test the accessor with the correct type coercion.

So: @suwyn maybe it's a good idea to use the accessor if one exists, but fall back to using raw_execute("PRAGMA ...") if not? This would allow users to set the latest and greatest pragmas regardless of whether the gem knows about it.

@yahonda yahonda mentioned this pull request Oct 31, 2023
4 tasks
@suwyn
Copy link
Author

suwyn commented Nov 7, 2023

@flavorjones The challenge i have with the fallback is that we'd still need a way to be explicit on which configuration options are intended to be set as PRAGMAs, otherwise we'd potentially be attempting to set invalid pragma options.

Some options that come to mind -

  1. Specify pragma statements as a subkey in the database configuration. e.g.
default: &default
  adapter: sqlite3
  encoding: unicode
  pool: <%%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000
  database: storage/development.sqlite3
  pragma:
    journal_mode: wal
    synchronous: normal
    mmap_size: 134217728
    journal_size_limit: 67108864
    cache_size: 2000    

or (within a variables subkey)

default: &default
  adapter: sqlite3
  encoding: unicode
  pool: <%%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000
  database: storage/development.sqlite3
  variables:
    pragma:
      journal_mode: wal
      synchronous: normal
      mmap_size: 134217728
      journal_size_limit: 67108864
      cache_size: 2000    
  1. Keep the PRAGMA_STATEMENTS constant (I'd rename it to PRAGMA_OPTIONS) to include those supported by the gem. Anyone wanting to use an unsupported pragma would have to configure the connection manually (or monkey patch the adapter).

I think I prefer the first option, but indifferent on whether to use variables or not. I don't fully understand why the base configuration block can't be database specific.

The bigger concern for me still is that with the changes in 7.1 (forcing WAL & the setting the other pragma defaults), there is no way to unset them i.e. how do we use defaults that are compiled with SQLite? The only way would be to explicitly copy them into the configuration. That's another reason why I think all pragma's should be specified in the database configuration rather than hard coding them into the adapter and forcing developers to overwrite them.

So to push this PR forward, I need direction on:

  1. Should the journal mode (WAL) (and the other pragma) be forced by Rails? Again, this makes the adapter incompatible with read-only SQLite databases and the only way to opt out would be to reconfigure the options back to SQLite defaults? This PR retains compatibility with rails<7.1 but allows developers to configure sqlite databases to use WAL if that is their intention.
  2. Should SQLite specific configuration live in the variables subkey or is it fine at the top level?
  3. Should we allow any pragma to be configured, or should we be strict and only allow to those supported by the setters in the gem?

@flavorjones
Copy link
Member

Should the journal mode (WAL) (and the other pragma) be forced by Rails? ... this makes the adapter incompatible with read-only SQLite databases

I think the read-only incompatibility is a good reason to prefer this PR over #49349 and to revert that PR if/when this is merged (especially because of the good defaults being declared in sqlite3.yml.tt). But we need a core team member to weigh in.

Should SQLite specific configuration live in the variables subkey or is it fine at the top level

I know remarkably little about the database adapters and the design principles they follow, but maybe @eileencodes can either give an opinion or tag someone else close to those classes.

Should we allow any pragma to be configured, or should we be strict and only allow to those supported by the setters in the gem?

My preference is to only support the pragmas supported by the setters in the gem; and then we can push users to request (or write) new feature code and tests for any pragmas they want to use.

Especially since any pragma that users want to experiment with can be set via raw_execute (and I can take the responsibility for documenting that clearly in the gem README), it feels like delegating this API to the gem is the prudent decision. (I realize this is a flip-flop from my earlier comment and apologize for the noise.)

Copy link
Contributor

@fractaledmind fractaledmind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@suwyn I like the direction of this PR. I confess that when I opened the original PR with these configuration changes, I overlooked the possibility that a Rails app might want to connect to a read-only database. Thanks for pointing that oversight out and pushing for a solution that still provides healthy defaults for apps, but allows devs the sharp knives to make changes as they need.

I made a few comments and suggestions for how I think this PR could be improved at the margins, but in general, I'm a fan.

Comment on lines +11 to +15
journal_mode: WAL
synchronous: NORMAL
mmap_size: 134217728
journal_size_limit: 67108864
cache_size: 2000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should put back the comments that were originally in the configure_connection code to explain what these pragmas are doing and like to the SQLite docs.

Copy link
Author

@suwyn suwyn Dec 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO the comments don't add value. Comments are code, in that they need to be maintained. One seeking to understand what the pragma's do should reference the official documentation as the behavior may change.

timeout: 5000
journal_mode: WAL
synchronous: NORMAL
mmap_size: 134217728
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make these numbers more human-meaningful

Suggested change
mmap_size: 134217728
mmap_size: <%%= 128.megabytes %>

journal_mode: WAL
synchronous: NORMAL
mmap_size: 134217728
journal_size_limit: 67108864
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here

Suggested change
journal_size_limit: 67108864
journal_size_limit: <%%= 64.megabytes %>

# https://www.sqlite.org/mmap.html
raw_execute("PRAGMA mmap_size = #{128.megabytes}", "SCHEMA")

(@config.keys & PRAGMA_STATEMENTS).each do |key|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why limit the PRAGMAs that can be passed from the database.yml file? If we are already moving the SQLite configuration into the file, we should grant developers the power to configure other SQLite PRAGMA statements as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't limit them, but the thought behind this approach is to only allow the pragma's supported by the adapter and avoid arbitrarily executing database statements.

@fractaledmind
Copy link
Contributor

I have also now read @matthewd's comment (should have started there). I personally prefer this approach as well (defaults are invisible inside of Rails, but can be overridden explicitly via the database.yml file). I have opened a PR with my take on this suggestion: #50460

@suwyn
Copy link
Author

suwyn commented Dec 27, 2023

Closing this in favor of #50460 authored by the contributor who introduced the change.

@suwyn suwyn closed this Dec 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants