WIP: Add the ability to swap connections in Active Record#33877
WIP: Add the ability to swap connections in Active Record#33877eileencodes wants to merge 1 commit intorails:masterfrom
Conversation
This is the first iteration of adding support for multiple databases in
Rails. Please note that if this version is accepted the next step will
be adding support for QueryCache on multiple handlers and allowing the
application to register multiple handlers.
This PR adds the ability for an application to have multiple handlers
and switch between those handlers in order to change connections.
By default Rails will register a `default` handler (for write
connections), and a `readonly` handler (for readonly connections). On
load these handlers are registered.
In an application model we can connect to the database using
`establish_connection` which will determine which handler to connect to
based on options passed in. For example given a configuration like this:
```
development:
primary:
database: my_primary_db
user: root
primary_replica:
database: my_primary_db
user: ro_user
replica: true
animals:
database: my_animals_db
user: root
animals_replica
database: my_animals_db
user: ro_user
replica: true
```
A model would connect like this:
```
class AnimalsBase < ApplicationRecord
establish_connection :animals, :default
establish_connection :animals, :readonly
end
```
This will ensure when a model under `AnimalsBase` is loaded, for example
`Dog`, `Dog` will have a connection to the write db and the readonly db
through the connection handlers.
To swap between databases applications can call `use_write_connection`
or `use_readonly_connection`. If no handler or connection type is
provided the application will default to the write db (later I'd like to
make this configurable, for example at GitHub we default to the readonly
connections).
Examples:
```
Animals.use_write_connection do
# dog will be created
Dog.create!
end
Animals.use_readonly_db do
# exception will be thrown due to using a readonly user
Dog.create!
end
```
|
The other way of doing this is to use one connection handler and swap connections on https://github.com/rails/rails/compare/master...eileencodes:connection-switching-matthewd?expand=1 I think the pros of having multiple connection handlers (my original implementation are):
While I personally prefer the multiple connection handler method I'd love to hear your thoughts on what makes sense going forward. Also note in both implementations suggested here, the public API remains the same. |
|
I think there are a mix of terms here that makes this confusing. In the database configuration we call the connection as About the approach, I prefer your approach for the reasons you outlined in the comment. As future work, while I think we will need methods to explicitly switch connection handlers I'd love if we could do that automatically depending on the need. All reads by default goes to the readonly handler and as soon we have a write we change all the connections on that session to go to the writer. He having being using this strategy in Shopify for a while. |
Ah yes I should have mentioned that re the terms approach. My thinking is that a database is a replica and a connection is readonly. You connect to a replica database with a readonly user. If we feel it's too confusing I can switch everything to readonly.
Yes my plan is to add that in the next PR or so. I wanted to get the basic API ironed out first before doing too much work in the wrong direction. 😄 |
|
Ok, now the names make more sense. But how would work in the case of |
|
I’m working on an API proposal for this I’ll publish shortly.
… On Sep 13, 2018, at 16:34, Rafael França ***@***.***> wrote:
Ok, now the names make more sense. But how would work in the case of replica_slow? Just wondering the case were I want to add a third custom handler.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
|
My concerns about this version are:
|
|
Just to provide some context here of what we do at Discourse, we use: https://github.com/discourse/rails_multisite This was born out of the need to support "multisite", we have 100s of customers as shared tenants on a single application each with dedicated databases. The features we use are:
Overall we would be happy to move to "the official way" (tm) provided we don't lose out on any of our features. Note where our usage is extremely different to the GitHub usage is that we are segmented based on databases with identical structure and you only ever talk to one database to populate a chain of models and so on. All DBs share the exact same schema. The use case of "send some slow reports to the replica to alleviate load on the master" is possibly there for us, but to date we have not hit a real strong need for it. |
|
Here's the API design I've been playing with: # Configuration...
# 1st level: Environment, 2nd level: Name, 3rd level: Role
development:
base:
primary:
database: base_writing_db
user: root
replica:
database: base_reading_db
user: ro_user
animals:
primary:
database: animals_writing_db
user: root
replica_one:
database: animals_reading_db
user: ro_user
replica_two:
database: animals_reading_db
user: ro_user
machines:
database: machines_db
user: root
class ApplicationRecord < ActiveRecord::Base
connects_to database: :base, writing: :primary, reading: :replica
end
class Animal < ApplicationRecord
# Built-in round robin for reading
connects_to database: :animals, writing: :primary, reading: [ :replica_one, :replica_two ]
end
class Dog < Animal
end
Dog.reading_from(:primary) do
Dog.first # Works just fine, since primary database can be read from as well
end
Dog.writing_to(:replica) do
Dog.create! # Exception bubbled up from the db given that ro_user doesn't have write access
end
class Machine < ApplicationRecord
connects_to database: :machines
end
class Saw < Machine
end
Saw.reading_from(database: :base, role: :replica) do
Saw.first # Tries to read the Saw table from the base database off the replica role
end |
|
That proposal would clarify a few things. First, it's the model DSL that designates whether a connection is used for reading or writing, not the database.yml configuration. Second, we explicitly use the env/name/role hierarchy to establish a pattern of how this is supposed to be used. Third, we reserve the primary/replica terms for roles, not names of databases. |
|
@dhh thanks for sharing that API! I think we can address your concerns and accomplish this without moving away from the three-tier config. A couple clarifying questions first:
The interesting thing about multi-db applications is you never start out that way. As applications evolve and grow you may add one replica, then a second primary and then more replicas. There's definitely value in the 4-tier proposal, but I'm wondering if it's too much setup for a rare scenario? The three-tier config gives just enough structure and just enough flexibility to allow applications to define what their database infrastructure looks like. Instead of a 4-tier config, we can keep the 3-tier and address your concerns around the config defining the type of the connection. development:
primary:
database: base_db
user: root
replica:
database: base_db
user: ro_user
aniamls_primary:
database: animals_db
user: root
animals_replica:
database: animals_db
user: ro_user
some_other_db:
database: some_other_db
user: ro_user
machines:
database: machines_db
user: rootWe can then use the class AnimalsModel < ApplicationRecord
connects_to write: :animals_primary, read: :animals_replica, read_slow: :some_other_db
end
class MachineModel < ApplicationRecord
# for the scenario where we don't have multiple connections
connects_to :machines
endSince we defined the roles of the databases in Dog.using_write do
Dog.create!
end
Dog.using_write do
Dog.first # read from write db cause we can
endDog.using_read do
Dog.create! #exception
end
Dog.using_read_slow do
Dog.create! #exception
endThe benefit to the 3-tier is that it's flexible enough to not force you to define all of your databases under a name, while having enough structure that you know what databases belong to what environment. It's entirely possible that an application |
|
@eileencodes I'd like for the new Yes, I like the idea of development:
animals:
primary:
database: animals_writing_db
user: root
replica_one:
database: animals_reading_db
user: ro_user
replica_two:
database: animals_reading_db
user: ro_user
class Animal < ApplicationRecord
connects_to database: :animals, writing: :primary, reading: :replica_one
end
Animal.reading_from(:replica_two) do
Animal.first!
endWe only need to declare the standard read/write roles in the DSL declaration, but we'll allow any additional roles from the nested hierarchy to be used when explicitly setting a read or write scope. That's another reason to go with the nested setup. But even if you didn't, or if you were accessing outside of your nest, you could do something like this: Animal.reading_from(base: :replica_two) do
Animal.first!
endNo, nesting isn't strictly necessary, but I think the API is a big improvement when we do it. What's the perceived cost to allowing nesting for roles? |
IMO the 4-tier config feels over-specialized and heavy. It's a pretty rare setup and I don't see us (github) renaming all our db keys That said, how we do the config nesting isn't a blocker for adding the |
|
I don’t follow the overspecialized notion. We have two separate use cases that we are trying to address: multi-db with the same scheme (that’s primary/replica) and multi-db with different schemes (base vs animals vs machines). Differentiating between the two approaches feels like a substantial step up to me.
What’s the cost for renaming database keys in the GitHub config? The actual databases, their names, and their configuration would stay the same.
But agree that we can proceed even in absence of nailing the split. I just don’t want us to get to a place where the primary/replica scenarios aren’t directly represented with their own concept for whatever makes it into the final version of Rails 6. Whatever road we take to get there is fine though. We can get some of the API improvements done before that 👍
… On Sep 14, 2018, at 12:31, Eileen M. Uchitelle ***@***.***> wrote:
No, nesting isn't strictly necessary, but I think the API is a big improvement when we do it. What's the perceived cost to allowing nesting for roles?
IMO the 4-tier config feels over-specialized and heavy. It's a pretty rare setup and I don't see us (github) renaming all our db keys primary and replica under a namespace. I've done a lot of work to get the 3-tier properly working and I feel that adding an extra nesting doesn't improve the experience.
That said, how we do the config nesting isn't a blocker for adding the connects_to API. I will start working on that. I put a lot of time into reconfiguring the configuration and want to work on other parts of the multi-db setup so we can successfully get multi-db support in for Rails 6.0.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
|
Another piece to consider here is https://github.com/discourse/discourse/blob/master/lib/active_record/connection_adapters/postgresql_fallback_adapter.rb In particular we always configure Discourse in our production environments to have a master database and "backup" readonly replica database. If the master dies we automatically switch to the backup database and log warnings. Which means the apps keep on running in a readonly mode. I wonder if this is in the scope of this work, cause its a very strong use case. Setting up replicas is pretty easy in PG. |
|
@eileencodes Here's an intermediary API setup that we can take that doesn't require formalizing the # Intermediary step..
development:
base_primary:
database: base_writing_db
user: root
base_replica:
database: base_reading_db
user: ro_user
animals_primary:
database: animals_writing_db
user: root
animals_replica_one:
database: animals_reading_db
user: ro_user
animals_replica_two:
database: animals_reading_db
user: ro_user
machines:
database: machines_db
user: root
class ApplicationRecord < ActiveRecord::Base
connects_to database: { writing: :base_primary, reading: :base_replica }
end
class Animal < ApplicationRecord
# Built-in round robin for reading
connects_to database: { writing: :animals_primary, reading: :animals_replica_one, reading_slow: :animals_replica_two }
end
class Dog < Animal
end
class Machine < ApplicationRecord
connects_to database: :machines
end
Dog.connected_to(:base_primary) do
# Everything happens off base rather than animals
end
Dog.reading_from(:animals_primary) do
Dog.first # Works just fine, since primary database can be read from as well
end
Dog.writing_to(:animals_replica_one) do
Dog.create! # Exception bubbled up from the db given that ro_user doesn't have write access
endI still want to end up with a formalized role API, but we don't have to go there in one leap 👍 |
|
I agree that that intermediary API is a better short-term target... while a worthy goal, I don't think we're close enough to providing the entirely turn-key solution that defining the DB-interrelationships inside the config implies. My judgement of where we're likely to get to (for this release) looks more like "we have strong underpinning/support at the low levels, and a comfortable mid-level API, but not yet any of the highest-level 'just turn it on and it works' interface". We're going from hack-the-internals-to-make-it-work, so getting to a point where those people can transition to the same mid-API that we'll ultimately use to build the super-friendly stuff, is going to be a huge win... and IMO plenty to chew for now. There are also just practical advantages to getting those users who've previously had to do their own thing, and therefore do understand the details and technologies, moved onto our new foundation for a release -- they'll properly prove it works and supports their use cases before we commit on UX for, and encourage adoption by, people who don't need it strongly enough to have already hacked it somehow.
That sounds like the model would internally be tracking separate read & write connections and using them, respectively, at the same time. We certainly don't have anything like that right now, and I'm not sure it's a thing we even want: normally when you start writing to the primary, you also switch reads to it, to avoid replication delays. (Which does require knowledge of the primary and replica, but they're never both in use at the same time / for alternating queries within a request.) i.e., |
|
It was my understanding that @eileencodes primary motivation for pursuing this was indeed the read/write splitting. The #reading_from API is particularly important when used in conjunction with purpose-specific replicas, like the reading_slow example. If you have a replica dedicated to slow-running queries, you're going to need a way to designate when that replica is to be used. Either way, these are all building blocks. Just getting the Did you have another idea for an API that designates the read/write splitting and specific selection when doing slow reads etc? |
|
Sorry, I said The immediate target here (as I understand it) would be to do both # (config and model as per your comment)
# I'd use the :reading key from connects_to, so the model manages the
# role:dbname mapping, keeping the latter out of actual app code
Dog.connected_to(:reading) do # or would connected_for(:reading) make more sense here?
Dog.first # works
Dog.create! # fails
end
Dog.connected_to(:reading_slow) do
Dog.all.each { |a| Dog.all.each { |b| a.sniff(b) } }
endThus we have API to dynamically switch between the read-only and writable database connections -- but no internal concept of when/why one might automatically switch (at this building-block stage). |
|
If we're going to offer model-level read/write splitting, I'd like to actually expose that. Let's say you want to read from the slow replica, but then you need to update some data after that's done. This could happen like so: class Animal < ApplicationRecord
connects_to database: { writing: :animals_primary, reading: :animals_replica_one }
end
Dog.reading_from(:animals_replica_two) do
list_of_dog_ids = Dog.some_long_query # reads from animals_replica_two
Dog.where(id: list_of_dog_ids).do_something_with_the_dogs # writes to animals_primary
list_of_other_dog_ids = Dog.some_other_long_query # STILL reads from animals_replica_two
endThis would allow us to be explicit about what's happening within these designated scopes. The default behavior would be like so: list_of_dog_ids = Dog.some_long_query # reads from animals_replica_one
Dog.where(id: list_of_dog_ids).do_something_with_the_dogs # writes to animals_primary
list_of_other_dog_ids = Dog.some_other_long_query # NOW reads from animals_primary to prevent lagSo we'd have the following methods: Also, I made a mistake in this example: class Animal < ApplicationRecord
connects_to database: { writing: :animals_primary, reading: :animals_replica_one, reading_slow: :animals_replica_two }
endI don't think we should ever declare the |
|
We could also consider a smaller API surface with something more verbose, like: Dog.connected_to(database: { reading: :animals_replica_two }) do
Dog.create! # writes to whatever was set at the model level
Dog.first # reads from animals_replica_two
end
Dog.connected_to(database: :animals_replica_two) do
# R + W happens against :animals_replica_two
endActually, looking at that now, I prefer it. It means complete parity between the model-level definitions and the scopes we provide. It's more verbose, but these aren't methods you should be slinging about constantly anyway. So rather go with the parity and clarity 👍 |
That's the thing.. I don't think we are. Models have one connection active at a time, and I don't see that changing: the closest I see us getting is "use the read connection until we need to write", but that's still one-at-a-time behaviour, just with a trigger to automatically change which one that is. |
|
I don’t follow. Why are we going through the exercise of declaring reading and writing roles if that’s not how it’s going to work? @eileencodes, did I misunderstand your intentions with this patch? I thought the purpose was explicitly to provide read/write splitting? If that’s not what we’re doing, then none of these API conversations make sense. I’d be happy just to get a good API for switching between databases for the purpose of splintering (base vs animals vs machines). So if that’s the level of ambition we are starting with, let’s just nail that first. But then as soon as that’s sorted, I’d like to see us get to the read/write splitting. It’s the basic scaling approach, and it should be built into Rails. |
|
I think we're talking past each other. The goal is to provide the dynamic connection switching needed to talk to a readonly database, and then later switch to the writable one. I do not believe we have any intention for a model to simultaneously be writing to database one but reading from database two, which is what the later parts of your API seem to describe. Read/write switching yes, read/write splitting no. Thus my belief that we need a concise and friendly API to say "switch to database X now please"... but there's nothing special about X being "read" or "write" beyond the fact that's a common switch-triggering pattern. (Note that what I'm describing is what Basecamp, GitHub, and Shopify all do, to the best of my knowledge.) |
|
I don't understand the difference. We need a way to instruct a model to be able to read from one database and write to another, as well as once a write has occurred, switch to reading from the one we were writing to. That API should be available at the model level, and it should be available at the scope level. If you're connecting to the replica you have designated for slow reading, then any writes that happen within that scope should continue to go to the designated database for writing. That designation can either have happened at the model level or the scope level. The API will be the same. |
|
FWIW, I don't think that using a YAML configuration for a multi-gateway configuration is the best solution. It's fairly limiting and gets more awkward as the complexity of your DB setup increases. For example, say you wanted to compute the size of the connection pool dynamically based on you number of processors, size of concurrent-ruby threadpools, etc; you can do this in ERB but it's starts getting ugly quick in my experience. A nice middle ground would be a configuration DSL, which builds a registry of different connections that can be used. That way user can write Real Ruby™️ to configure their DB connections. |
|
Closing in favor of #34052. We can hash out configurations at a later date, for now I'm focusing on building out the API for connection switching. |
|
@eileencodes I hope this isn't totally left-field, I just saw your keynote on this stuff from rails conf. Is there a place you guys are talking about handling tenanting/sharding strategy vs the 2 mentioned above? What I mean is you have:
We started down this route for our SaaS platform (3) but I never considered 2 as a potential solution. So tldr, is there a conversation happening somewhere about this? Are we just doing it wrong and we should pick one of the rails ways? Are we doing it right and we should just use the apartment gem and not worry about core support? Happy to move this elsewhere if there is a better home! |
This is a work in progress but I want to get feedback from @tenderlove @rafaelfranca @matthewd and @dhh before continuing. In the comments I'll include a second way of doing connection switching that Matthew suggested.
Background & Inspiration:
In github we have 10+ primary configurations, 10+ replica configurations, and a few other types of configurations. To organize these configurations into connections we create a connection handlers hash that creates a mapping from connection type to a handler;
:default,:readonly, and:readonly_slow. Note this PR only sets up 2 handlers as I'm trying to keep this PR simple. Later I'd like to add the ability for an application to register custom handlers that would generateuse_x_connectionmethods from those handlers.This PR takes that concept and refactors it a bit. It allows for a simple API for establishing connections and swapping connections for rw/ro splitting.
This is the first iteration of adding support for multiple databases in
Rails. Please note that if this version is accepted the next step will
be adding support for QueryCache on multiple handlers and allowing the
application to register multiple handlers.
This PR adds the ability for an application to have multiple handlers
and switch between those handlers in order to change connections.
By default Rails will register a
defaulthandler (for writeconnections), and a
readonlyhandler (for readonly connections). Onload these handlers are registered.
In an application model we can connect to the database using
establish_connectionwhich will determine which handler to connect tobased on options passed in. For example given a configuration like this:
A model would connect like this:
This will ensure when a model under
AnimalsBaseis loaded, for exampleDog,Dogwill have a connection to the write db and the readonly dbthrough the connection handlers.
To swap between databases applications can call
use_default_connectionor
use_readonly_connection. If no handler or connection type isprovided the application will default to the write db (later I'd like to
make this configurable, for example at GitHub we default to the readonly
connections).
Examples:
To do: