Skip to content

Refactor connection handler#24844

Merged
arthurnn merged 13 commits intorails:masterfrom
arthurnn:arthurnn/conn
May 6, 2016
Merged

Refactor connection handler#24844
arthurnn merged 13 commits intorails:masterfrom
arthurnn:arthurnn/conn

Conversation

@arthurnn
Copy link
Member

@arthurnn arthurnn commented May 4, 2016

### Summary

ConnectionHandler will not have any knowledge of AR models now, it will
only know about the specs.
Like that we can decouple the two, and allow the same model to use more
than one connection.
### Context

Historically, folks used to create abstract AR classes on the fly in
order to have multiple connections for the same model, and override the
connection methods.

With this, now we can override the `specificiation_id` method in the
model, to return a key, that will be used to find the connection_pool
from the handler.

@jeremy thoughts?
### How can users extend this?

database.yml

```
production:
  database: ...
readonly:
  database: ...
```

```
class User < ApplicationRecord
  self.connection_specification_name = "readonly"
end
```

or

```
class User < ApplicationRecord
  def self.connection_specification_name 
     @in_readonly_mode ? "readonly" : "primary"
  end
end
```

In this example `in_readonly_mode` could be an ivar that is set at runtime to change the connection at runtime.

Pretty much the gist of this is, models can now extend that method, and change the connection it will be used. This is possible because now the connection handler doesnt hash the connections by the class name anymore.
### Is this backwards compatible?

Yes, 
- The default `connection_specification_name` behaviour is to fallback to the parent method, if `connection_specification_name` not defined.
- When doing an establish_connection in a AR model: ie: `User.establish_connection()`, that will set the `connection_specification_name` to the model name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to do some gymnastics here to be backwards compatible.
However if we deprecate the establish_connection from the model, eventually we would be able to remove this fallback into the Class.name method.

The way I envision this, is we would establish the connection straight to the handler, as this connection is not attached to the model anymore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore this comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could spare an array allocation by calling each_value if Concurrent::Map supports it 😁

@arthurnn
Copy link
Member Author

arthurnn commented May 4, 2016

@tenderlove @matthewd @rafaelfranca would ❤️ your review here too.

@matthewd
Copy link
Member

matthewd commented May 5, 2016

Doesn't this break the current inheritance behaviour?

@arthurnn
Copy link
Member Author

arthurnn commented May 5, 2016

Doesn't this break the current inheritance behaviour?

It should not. the specification_id method fallbacks to the parent if not set. Which brings the same inheritance behaviour as before. Also, we had tests for that, and they still work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put id ahead of the pack here as well?

arthurnn added 10 commits May 5, 2016 15:29
ConnectionHandler will not have any knowlodge of AR models now, it will
only know about the specs.
Like that we can decouple the two, and allow the same model to use more
than one connection.

Historically, folks used to create abstract AR classes on the fly in
order to have multiple connections for the same model, and override the
connection methods.

With this, now we can override the `specificiation_id` method in the
model, to return a key, that will be used to find the connection_pool
from the handler.
@arthurnn
Copy link
Member Author

arthurnn commented May 5, 2016

@matthewd @kaspth thanks for bearing with me on this. I renamed the key to specification_name.

def retrieve_connection_pool(spec_name)
owner_to_pool.fetch(spec_name) do
if ancestor_pool = pool_from_any_process_for(spec_name)
# A connection was established in an ancestor process that must have
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may have

bquorning added a commit to bquorning/rails that referenced this pull request May 20, 2016
After PR rails#24844 the documentation for
`#retrieve_connection_pool` was out of date. This commit changes:

- the reference from `@class_to_pool` to `@owner_to_pool`.
- with newer Rubies, `#fetch` isn't significantly slower than `#[]`. Since Rails 5
requires Ruby >= 2.2.2, we can just use `#fetch` here.
bquorning added a commit to zendesk/active_record_host_pool that referenced this pull request Jun 28, 2016
kamipo added a commit to kamipo/rails that referenced this pull request Jul 17, 2016
Follow up to rails#20818.

`retrieve_connection` is passed `spec_name` instead of `klass` since rails#24844.
@GreenFomo
Copy link

@arthurnn sorry for commenting on a closed issue, but looking through this thread + the changes in the code, I'm not sure I understand how I can make use of these changes.

Basically, my scenario is that I would like to be able to have multiple databases configured at runtime (e.g. new user signs up -> make the Rails app "aware" of a new database -> make a new Pool for that database), and I can switch on each request which database to use.

To my understanding, this PR would make the above use-case possible, including using a multi-threaded server like Puma - it seems though that it eludes me how I can actually define the "specifications" for each DB. Could you please give me some pointers in the right direction?

To summarize:

  1. Having connection pools to multiple DBs - is it thread safe? Can I safely switch between DB specs using connection_specification_name per request in a multi-threaded server?
  2. How do I specify the .. connection specifications? Is it possible to do so while the Rails app is running or does it need to restart? :)

Thanks a lot!

@arthurnn
Copy link
Member Author

Hey @rzv01 I am glad you asked. I am planning to write a few blog posts on how people can use this, but I didn't have time yet. I did a talk that touch into this changes too, here is the link: https://youtu.be/-41V0qETxmE?list=PLo3sQWbYZbskTlzH7Ert-FucR7QmIuwE5

Having connection pools to multiple DBs - is it thread safe? Can I safely switch between DB specs using connection_specification_name per request in a multi-threaded server?

The idea is to not use the same connection pool for different database. Each database should have its own connection pool.

How do I specify the .. connection specifications? Is it possible to do so while the Rails app is running or does it need to restart? :)

You can create the connection pools passing a specification name, and in your model you would extend the method connection_specification_name and return the right name.

@GreenFomo
Copy link

Hey @arthurnn - thanks a lot for your response. I've watched the video you linked - very informative, thanks! and it got me closer to understanding how multiple DBs should work.

Now, in your talk you mentioned that the future direction for Rails 5.1 and onward would be to have all the DB configs in the database.yml file and to initialize all connections at app startup.

However, this means the config is now static. Would it be possible to make the app "aware" of a new DB config during runtime?

Not sure how this would work to be honest, maybe write to the database.yml file (say, an external process which knows when a new DB is needed) and send a signal to the Rails server process? which would in turn reload the database.yml, which contains the new DB config as well.

The use case I'm thinking of is:

  1. one DB per customer
  2. one/few Rails app instances (each one serving all/multiple clients)
  3. Rails app connects to all/multiple DBs that is assigned to it

here comes the problem:
4. customer signs up, a DB is created for him/her.

How would the Rails app know about the new DB? :)

Ideally, without having to restart the app - I guess it wouldn't be the end of the world, especially with a phased restart of a multi-threaded server, but still, not that elegant, hence me asking if a new DB config can be "injected" in the app dynamically.

Btw, sorry if I missed something and the answer to my question above is obvious!

Thanks a lot, once again! 🍻

@milep
Copy link

milep commented Oct 27, 2016

I'm also interested in use case where there is one DB per user/customer.

@mikecmpbll
Copy link

mikecmpbll commented Oct 27, 2016

the database connection details don't technically need to be in database.yml, you should be able to create connection pools yourself; the code could be something like this (untested):

def self.connection_specification_name
  if !connection_handler.retrieve_connection_pool(Thread.current['customer_db_name'])
    spec = database_spec.merge(name: Thread.current['customer_db_name'])
    connection_handler.establish_connection(spec)
  end

  Thread.current['customer_db_name']
end

only looked briefly though, and i might be missing something basic there :). i assume, as we're creating a pool per database we'll end up with pool_size * customers database connections. (traditional pg schema based switching or mysql use statement switching uses one pool)

edit: would probably be better do do the connection pool initialisation out of band though (per request?) as this method will be called a ton 😅

@arthurnn
Copy link
Member Author

@mikecmpbll is right.
You can call connection_handler.establish_connection yourself and pass a spec. That spec needs to be a hash, which can come from anywhere, a file, db, http, etc.
So in your case, you would establish the connection, and indeed, probably better to do that during initialization, or out-of-band somehow.

@GreenFomo
Copy link

@mikecmpbll @arthurnn thanks a ton for your help, you guys are great :)

I think right now i got all the info i need. I'll give it a try sometime in the next few days and see how it goes.

@milep
Copy link

milep commented Dec 27, 2016

Is there a thread safe way to remove idle connection pools? With some testing it looks like I just can't use ConnectionHandler's remove_connection, because it removes pool from all threads?

@njakobsen
Copy link
Contributor

@arthurnn Is there a way to avoid using an instance variable to store @connection_specification_name in remove_connection (see 537a342)? I need to dynamically swap the connection per-model, in a multi-threaded environment. Sometimes this means the same model, usually ActiveRecord::Base, will be connected to separate databases in separate threads. I achieve this by overriding the connection_specification_name method suggested in #24844 (comment) (though with a different, threadsafe implementation). However, the remove_connection method accesses the instance variable directly, bypassing the logic in the overridden method, and becoming un-threadsafe since the instance variable is shared across threads.

@jeremy
Copy link
Member

jeremy commented Feb 5, 2018

Cool @njakobsen. Please do contribute a PR to no longer refer to the ivar directly. And a PR with your dynamic connection swapping would be wonderful as well! 😍

@njakobsen
Copy link
Contributor

njakobsen commented Feb 5, 2018

I'm definitely just crashing around, but I'll show you my monkeypatch and maybe you can let me know if there's a better way to store information about which connection the model should use on a per-thread basis. See https://github.com/culturecode/stagehand/blob/5c2d1815738e602584633ef48c7aa2b62ba79ce6/lib/stagehand/active_record_extensions.rb#L28-L53. With some feedback, I'll see what sort of PR I can make from all of this.

@matthewd
Copy link
Member

matthewd commented Feb 5, 2018

Do you have a use case that involves calling remove_connection, or are you just trying to cover the full API?

@njakobsen
Copy link
Contributor

njakobsen commented Feb 5, 2018

I don't have a use case for removing the connection, but if someone using my patch were to call establish_connection on the model, it would call remove_connection, so I want to make sure that it uses the overridden method to know which connection to remove based on the current thread. Otherwise I think it'll just default to AR:B's "primary" connection since no other classes are directly setting the instance variable in my monkeypatch.

Basically @arthurnn's suggestion of how his PR can be used opened up a nice avenue for my Gem to use a lighter touch to swap connections dynamically by changing the connection spec name instead of calling establish_connection each time, but it's being hamstrung by the ivar in the remove_connection method.

@njakobsen
Copy link
Contributor

njakobsen commented Feb 5, 2018

I am also not happy with my solution of storing the connection spec name for each model on Thread.current. So if there are any suggestions of a better place for it, I'd happily incorporate that into a potential PR.

@mikecmpbll
Copy link

i think establish_connection calls this remove_connection

@matthewd
Copy link
Member

matthewd commented Feb 6, 2018

If you've overwritten the getter/setter, then @connection_specification_name won't be defined anyway, will it?

I think it's pretty common to need to thread-scope the behaviour.. the only quick observation I'd make is that you don't necessarily need to use the setter: you could define connection_specification_name to directly read the top of the ConnectionStack, for example.

@njakobsen
Copy link
Contributor

njakobsen commented Feb 6, 2018

@matthewd that's an interesting idea, but I was trying to keep the changes to AR:B isolated from the inner machinations of the rest of the gem. If I could simply make it threadsafe, then any other code (including subclasses) calling establish_connection or remove_connection would continue to function as expected.

My gem seems to work with my monkeypatches, but at @jeremy's urging, my questions are now more about how to apply this threadsafety to a PR for Rails itself.

  • Does Rails have any mechanism other than Thread.current for storing information per-thread (I see some places using Concurrent::Map, and Process.pid, so maybe something similar would be useful here)
  • Do the threads ever get reaped? Or is that dependent on which webserver we're running?
  • Have I missed something important?

@njakobsen
Copy link
Contributor

@jeremy @matthewd I think I'm going to tackle this with a very light touch. Instead of making a bunch of decisions about how the connection should be tracked independently per-model, per-thread (which could cause issues in the general case where we don't want independent tracking per thread). I'll make a PR which simply uses the connection_specification_name method when removing a connection, instead of the ivar, while at the same time keeping in mind the change from this commit 537a342. This way, an app that needs independent tracking per thread can implement it by overriding the connection_specification_name with whatever logic is required by their specific app.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.