Skip to content

new database.yml format#27611

Closed
arthurnn wants to merge 25 commits intorails:masterfrom
arthurnn:arthurnn/new_db_yaml2
Closed

new database.yml format#27611
arthurnn wants to merge 25 commits intorails:masterfrom
arthurnn:arthurnn/new_db_yaml2

Conversation

@arthurnn
Copy link
Member

@arthurnn arthurnn commented Jan 9, 2017

Summary

New database.yml format. This is how it looks:

default: &default
  adapter: sqlite3
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000

development:
  primary:
    <<: *default
    database: db/development.sqlite3
  readonly:
    <<: *default
    database: db/readonly.sqlite3

We are moving from a two levels config to a three levels.

This is backward compatible, apps using the default, old, 2 levels database.yml will still work.

Advantages of this change

  • Better structure on the database.yml file when dealing with multiples databases.
  • One environment can have multiple configs.
  • Having all the configs under an environment key, enables us to create/drop multiple databases on the create/drop tasks.
  • Removes the necessity of adding the Rails.env when calling establish_connection for a non standard connection: i.e: establish_connection "#{Rails.env}_readonly

Implementation details

The config file might have 3 levels, however, ActiveRecord now has a local_configurations option, which is scoped by the running Rails.env. That's why the code on ConnectionSpecification::Resolver didn't change much. Internally we still handle a 2 levels config Hash.
However, we are keeping the configurations getter/setter to be backward compatible and to be able to access it at the tasks level. (We might be able to deprecate this moving forward, that's another discussion tho)
The good thing about this, is that I was able to remove the Rails.env call from inside AR connection code. Now AR won't depend on the RAILS_ENV anymore.

Caveats

If you were using a non-default database.yml, for instance:

development:
  adapter: sqlite3
development_readonly:
  adapter: sqlite3

There will be some necessary changes in the app, as AR will scope the config by the environment, so AR would only see:

development:
  adapter: sqlite3

The upgrade path would be to move the development_readonly config to inside development and reference it when calling establish_connection. (*see the third task on future work)

Future work

  • Change the guides.
  • Change the default database.yml template
  • Connect to all databases under the env during Rails initializer, so apps won't need to call establish_connection(:readonly) for instance.
  • Deprecate create:all/drop:all and make create/drop create all the dbs under the development/test envs?
  • Deprecate the old template/config with a warning message?

All the future work are not in this PR, because I would like to discuss one by one before implementing them, to make sure they make sense.

review @kaspth @dhh @rafaelfranca @tenderlove @jeremy
@schneems (specially on commit e51edd4, because it changes a lot from 6cc0367, However, I made sure the core functionality still works, which is make DATABASE_URL work without Rails.)

@arthurnn arthurnn added this to the 5.1.0 milestone Jan 9, 2017
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should call it a legacy config. It's still a valid config that most of Rails apps are going to use, as long as we don't deprecate one level config.

@matzke
Copy link

matzke commented Jan 9, 2017

+1 for decoupling Rails.env call from inside AR connection code

-1 for 3 level configuration as a "default", because for simple projects setup is imho way too complex. if 3 levels work that's fine, but "default" should be the simple version

@siegy22
Copy link
Contributor

siegy22 commented Jan 9, 2017

Just a side note:

Why do you use ENV.fetch("RAILS_MAX_THREADS") { 5 } instead of ENV.fetch("RAILS_MAX_THREADS", 5)? 😁

@dhh
Copy link
Member

dhh commented Jan 9, 2017

Like the new format, but concur with @matzke that most apps won't need multiple DBs starting out, so the default database.yml should still be a standard 2-level setup. Then we can explain the 3-level setup in the guide. So that informs that whatever we call the code isn't about "legacy", it's about 2-level depth or 3-level depth.

@arthurnn
Copy link
Member Author

arthurnn commented Jan 9, 2017

👍 thanks @kirs @matzke and @dhh for the comment on the legacy naming.
I, indeed, thought about it multiple times. I agree that 98% of the rails apps won't probably need the 3 level connections, so let's not call it legacy, and make it live in our code base as a proper 'transformer' stage.

Copy link
Contributor

@kaspth kaspth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial rough thoughts and suggestions 😊

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find local_configurations confusing: every config is defined locally in database.yml.

Personally, I'd keep this all in configurations and turn it into an object that handles mapping between 2 or 3 level hashes:

class Configurations < Hash
  def initialize(hash)
    @configs = normalize_nesting(hash)
  end

  def [](env_or_spec_name)
    # If we couldn't normalize it properly we could handle both env and spec names here.
  end

  private
    def normalize_nesting(hash)
      # Insert Arthur Magic…
    end
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 agree with Kasper, the naming here is difficult to understand; it took me several read throughs to understand the difference between local_configurations and configurations 😬

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduces yet another name for a configuration: main. Think we need to prune the terminology :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Active Record

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it knows lodge like no other 🤓

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think we need these two new lines :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think I finally get why "connection specification name" has bothered me so. Partly because it's so abstract, but mainly it's because Active Record just calls these configurations and that's simpler to grasp.

So while I think spec or connection specification is fine internally, I think we should try using configuration as the public terminology.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Agree @kaspth ,

So the model using a readonly config could look like this:

class User < ApplicationRecord
  self.connection_configuration = :readonly
end

or:

class User < ApplicationRecord
  def self.connection_configuration
    Application.readonly_mode? ? :readonly : :primary
  end
end

Pretty much we could replace the connection_specification_name to connection_configuration in the model.

I don't want to change those in this PR tho, we can try in a next one. Also there will be some deprecation cycle for the name change, as 5.0 uses connection_specification_name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to know the primary wording internally? Shouldn't establish connection just handle a config without a name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New lines around here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think it's a smell that the database tasks have to reach for the ConnectionAdapters::ConnectionSpecification::LegacyConfigTransformer.

@schneems
Copy link
Member

schneems commented Jan 13, 2017

I like the idea of being able to specify a readonly shard and have AR aware of it out of the box. I'm guessing this is what a recommended Heroku config would look like:

production:
  primary:
    url: <%= ENV['DATABASE_URL'] %>
  readonly:
    url: <%= ENV['HEROKU_POSTGRESQL_BLACK_URL'] %>

It looks like some tests were moved but no behavior changes were made.

Questions

Are we ever interested in connecting to multiple "readonly" databases? For example if we wanted to round-robin reads on say 2 or 10 "readonly" shards? I'm not sure if such behavior would be in scope.

Thinking about it now, i guess you could hack it in on a per webserver basis where each connects to a random readonly server

production:
  primary:
    url: <%= ENV['DATABASE_URL'] %>
  readonly:
    url: <%= [ ENV['HEROKU_POSTGRESQL_ROSE_URL'],
               ENV['HEROKU_POSTGRESQL_BLACK_URL'],
               ENV['HEROKU_POSTGRESQL_BLUE_URL']
             ].sample %>

Copy link
Member Author

@arthurnn arthurnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I clean-up this a bit. There is no legacy reference anymore, as we will support 2 level configs moving forward.
@kaspth I liked your suggestion on having a Configuration class, that's what I did.
However, I didn't want to extend Hash, and instead, I did a Proxy. (http://words.steveklabnik.com/beware-subclassing-ruby-core-classes)
Also, I cannot normalize the config in the initializer, as AR::Base.configurations is a public API, and people could be using that configuration for something. so instead I just created a normalize method, that will return the config the way the resolver expects.

Thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Agree @kaspth ,

So the model using a readonly config could look like this:

class User < ApplicationRecord
  self.connection_configuration = :readonly
end

or:

class User < ApplicationRecord
  def self.connection_configuration
    Application.readonly_mode? ? :readonly : :primary
  end
end

Pretty much we could replace the connection_specification_name to connection_configuration in the model.

I don't want to change those in this PR tho, we can try in a next one. Also there will be some deprecation cycle for the name change, as 5.0 uses connection_specification_name

With .local_configurations we don't need to lookup the connection config
using an enviroment. This opens the door for us to have a 2 levels
config under an enviroment, to be able to make ActiveRecord work with
multiple connections.
With the `DATABASE_URL` logic living in the connection config resolver, we don't
need to mess around with enviroment(such as RAILS_ENV, RACK_ENV) in it.
Because Active Record will now use the `local_configurations`, which is a
configuration hash for the running environment, the `DATABASE_URL` logic
gets easier to handle as it only needs to set the `url` config for the
`primary` database config, which is the default one.
As we pass the local configuration to ActiveRecord, we don't need the
`Rails.env` anymore to find the right configuration.
also make the default establish_connection call use :primary
Replace the LegacyTransformer, with a better configuration class. We
will still support 2 level configurations for apps that won't connect to
multiple databases, so we should not call legacy. Instead, we can
normalize the configuration before passing to the resolvers.
@arthurnn arthurnn force-pushed the arthurnn/new_db_yaml2 branch from c6e1735 to 63f084c Compare January 15, 2017 04:31
@arthurnn
Copy link
Member Author

arthurnn commented Jan 15, 2017

@matthewd: As we discussed. I changed the PR to make this possible:
database.yml

production:
  database: ...
production_readonly:
  database: ...

So, old setups, with a 2 levels config, will still work, and will still be able to do a establish_connection(:production_readonly)

@kirs
Copy link
Member

kirs commented Jan 15, 2017

Do we have any plan about how this works with migrations?

I see at least two scenarios:

  1. You have primary and readonly connections and you want to use the first one to run the migrations.

  2. You have different connections with different databases. In our case, master and shard_X databases have completely different tables. For every migration, we have a method that describes which database the migration is going to alter.

@kaspth
Copy link
Contributor

kaspth commented Jan 15, 2017

@kirs I think bin/rails g migration some_name --db readonly with self.connection = :readonly in the migration would suffice, but maybe we should shelve that discussion for later. Think this PR has big implications as is 😊

Copy link
Contributor

@kaspth kaspth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more feedback ❤️

config = config ? config.dup : {}

# if the configuration is a one level config, pushes that to be a two level config, with primary as key
if !config.key?("primary") && config.key?("adapter")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of checking for an adapter key we should check config.values.any? { |v| !v.is_a?(Hash) }?

resolve_connection(config).merge("name" => spec.to_s)
else
raise(AdapterNotSpecified, "'#{spec}' database is not configured. Available: #{configurations.keys.inspect}")
config = configurations.respond_to?(:normalized) && configurations.normalized[spec.to_s]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When won't configurations respond to normalized? Think this might be too protective if it guards against people overriding ActiveRecord::Base.configurations to return whatever.

Getting rid of that check also means we can slim the diff and keep the previous easier-to-read structure of the method as a bonus 😁

# Accepts:
# - Hash: one layer deep Hash that contains all connection information
# - Symbol: a configuration name that will be looked-up
# from the configurations Hash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should leave out that configurations is a Hash 😉

self.configurations = {}

# Returns fully resolved configurations hash
self.configurations = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the previous newline below and remove the one above :)

should_reconnect = ActiveRecord::Base.connection_pool.active_connection?
ActiveRecord::Schema.verbose = false
ActiveRecord::Tasks::DatabaseTasks.load_schema ActiveRecord::Base.configurations["test"], :ruby, ENV["SCHEMA"]
ActiveRecord::Tasks::DatabaseTasks.load_schema ActiveRecord::Tasks::DatabaseTasks.config_at("test"), :ruby, ENV["SCHEMA"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these changes mean we've broken backwardscompatibility?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it indeed broke.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding it back


def normalized(key = @root_level)
config = key ? self[key] : self
config = config ? config.dup : {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Find these two lines hard to parse because of the two conditionals. The empty hash fallback seems too defensive at first sight. What case is it trying to cover? That there's a key in self[key] but with no config at all?

Could we simplify to this?

config = fetch(key, self).dup

end

if url = ENV["DATABASE_URL"]
config["primary"] ||= {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've just spent lines establishing "primary", surely it should have been made an empty hash by now?

if config
resolve_connection config
elsif env = ActiveRecord::ConnectionHandling::RAILS_ENV.call
resolve_symbol_connection env.to_sym
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we removing this because resolve_connection calls it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added back

config[key] = resolve(value) if value
end
config
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me wonder how much we need to worry about people calling all the available Hash methods on ActiveRecord::Base.configurations[some_env] and if we can just normalize the configurations up front.

end

# Takes the environment such as +:production+ or +:development+.
# Takes the specification name such as +:primary+.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

configuration name 😉

# config = { "production" => { "host" => "localhost", "database" => "foo", "adapter" => "sqlite3" } }
# spec = Resolver.new(config).spec(:production)
# config = { "host" => "localhost", "database" => "foo", "adapter" => "sqlite3" }
# spec = Resolver.new({}).spec(config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API looks weird to me. I'd expect the resolver to always be initialized with the config and resolve a spec.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

should_reconnect = ActiveRecord::Base.connection_pool.active_connection?
ActiveRecord::Schema.verbose = false
ActiveRecord::Tasks::DatabaseTasks.load_schema ActiveRecord::Base.configurations["test"], :ruby, ENV["SCHEMA"]
ActiveRecord::Tasks::DatabaseTasks.load_schema ActiveRecord::Tasks::DatabaseTasks.config_at("test"), :ruby, ENV["SCHEMA"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it indeed broke.

"development" => { "database" => "dev-db" },
"test" => { "database" => "test-db" },
"production" => { "database" => "prod-db" }
"development" => { "database" => "dev-db", "adapter" => "sqlite3" },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the adapter is now required?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's because we only check for the adapter key at the moment. Finding a more accurate way to find an actual config could fix the need to change this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already required in practice for an actual config you intend to use, and while this test was accidentally assuming it, we've never claimed to be an arbitrary hash storage bucket.

@kaspth
Copy link
Contributor

kaspth commented Feb 18, 2017

Looks like the backwardscompatibility issues have been solved, so I'd say rebase, fix tests, run rubocop -a, squash and then you have my ok to merge 👍

This was a hard and long haul so I definitely appreciate your willingness to run my review gauntlet so many times 😄

ret
end

def keys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is this called? My cmd-E on keys can't find any instances?

def to_hash
@config
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still found a lot of this logic hard to read, so I took the liberty of trying to make it more concise:

class ConnectionConfigurations #:nodoc:
  def initialize(config)
    @root_config = @config = config
    @database_url_config = resolve_database_url_config || {}
  end
 
  def scope=(scope)
    @root_config, @config = @config, (@config[scope] || {})
  end
 
  def [](key)
    @database_url_config.reverse_merge(@config[key] || @root_config[key] || {})
  end

  def keys
    @config.keys
  end

  def to_h
    @config
  end

  private
    def resolve_database_url_config
      if ENV["DATABASE_URL"]
        ConnectionUrlResolver.new(ENV["DATABASE_URL"]).to_hash
      end
    end
end

I'm curious why we need the @root_config[key] fallback in []?

NOTE: scope= basically assumes that it's never assigned more than once, which it doesn't seem to be in library code, but it is in test code. Would it make more sense to pass it in at the constructor? Or have a scope method that returns a new instance of ConnectionConfigurations?

end
ret ||= @config[key]
if @database_url_config
ret = (ret || {}).merge(@database_url_config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this applying the URL-supplied database to every configuration we return?

That's wrong, and I'm alarmed the tests aren't point that out. It's hard to see what's going on with them, because they're all renamed at the same time, though. 😟

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.