Skip to content

sql: disallow the use of pg_catalog tables with no cur db defined#23045

Merged
knz merged 1 commit intocockroachdb:masterfrom
knz:20180224-avoid-nonsensical-pg-catalog
Feb 27, 2018
Merged

sql: disallow the use of pg_catalog tables with no cur db defined#23045
knz merged 1 commit intocockroachdb:masterfrom
knz:20180224-avoid-nonsensical-pg-catalog

Conversation

@knz
Copy link
Copy Markdown
Contributor

@knz knz commented Feb 24, 2018

Fixes #23028.

The contents of the various pg_catalog tables are very ill-defined
when there is no current database set.

(For context, in PostgreSQL it is not possible for a client to have no
"current database" set. There is always a current database in a pg
session.)

This ill definition surfaces in the shape of mildly confusing
situations, like duplicate schema/table names in pg_tables when
there there are two tables with the same name in different databases
(because pg_catalog tables do not contain a catalog column, unlike
information_schema tables), to blatantly problematic output
(pg_proc wants a pronamespace OID column, which suggests separate
entries for each database/schema, but all the built-in functions
should only be listed once) and outright bugs (::regproc and
::regclass conversions don't resolve properly with no current
database set).

In order to restore sanity, this patch outright disallows using any of
the pg_catalog vtables with no current databaset set. This should
not create incompatibility with pg-specific tools anyways, since these
should have been designed to always specify a current db in the
connection string, and would have been bound to not-work with no
current database set anyway.

The other vtables (crdb_internal and information_schema) do not
have these problems are are thus still allowed when there is no
current database set.

Release note (sql change): the pg_catalog virtual tables, as well as
the special casts ::regproc and ::regclass, can only be queried by
clients that properly set a current database (SET database = ... or
USE).

cc @rytaft

@knz knz requested review from a team, jordanlewis and nvb February 24, 2018 17:10
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@knz knz force-pushed the 20180224-avoid-nonsensical-pg-catalog branch from 782510f to b80a8cb Compare February 24, 2018 17:11
@knz knz force-pushed the 20180224-avoid-nonsensical-pg-catalog branch from b80a8cb to 35e8c5e Compare February 24, 2018 18:21
@knz knz requested a review from a team as a code owner February 24, 2018 18:21
@knz knz requested a review from a team February 24, 2018 18:21
@knz knz force-pushed the 20180224-avoid-nonsensical-pg-catalog branch from 35e8c5e to f826503 Compare February 25, 2018 00:15
@knz
Copy link
Copy Markdown
Contributor Author

knz commented Feb 25, 2018

Rebased on top of #23051 to ensure the logic underlying the test that this PR touches is correct.

PTAL

The contents of the various `pg_catalog` tables are very ill-defined
when there is no current database set.

(For context, in PostgreSQL it is not possible for a client to have no
"current database" set. There is always a current database in a pg
session.)

This ill definition surfaces in the shape of mildly confusing
situations, like duplicate schema/table names in `pg_tables` when
there there are two tables with the same name in different databases
(because `pg_catalog` tables do not contain a catalog column, unlike
`information_schema` tables), to blatantly problematic output
(`pg_proc` wants a `pronamespace` OID column, which suggests separate
entries for each database/schema, but all the built-in functions
should only be listed once) and outright bugs (`::regproc` and
`::regclass` conversions don't resolve properly with no current
database set).

In order to restore sanity, this patch outright disallows using any of
the `pg_catalog` vtables with no current databaset set. This should
not create incompatibility with pg-specific tools anyways, since these
should have been designed to always specify a current db in the
connection string, and would have been bound to not-work with no
current database set anyway.

The other vtables (`crdb_internal` and `information_schema`) do not
have these problems are are thus still allowed when there is no
current database set.

Release note (sql change): the `pg_catalog` virtual tables, as well as
the special casts `::regproc` and `::regclass`, can only be queried by
clients that properly set a current database (`SET database = ...` or
`USE`).
@knz knz force-pushed the 20180224-avoid-nonsensical-pg-catalog branch from f826503 to 5ca5c5a Compare February 25, 2018 10:00
@nvb
Copy link
Copy Markdown
Contributor

nvb commented Feb 26, 2018

Instead of this approach, did you give any more thought to removing the "no cur database" state entirely? I'm worried that this change is going to make our new database/schema distinction even more unclear.

If we do decide to keep this behavior, should we do the same thing for information_schema? In fact, do any of the schemas make sense to expose when no current database is set?

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Feb 26, 2018

Instead of this approach, did you give any more thought to removing the "no cur database" state entirely?

Yes. This would be a feature change and not suitable for the 2.0 release, yet there is a clear problem in 2.0 to be solved.

I'm worried that this change is going to make our new database/schema distinction even more unclear.

How so? Please be concrete here. I see only upsides and no downsides myself.

If we do decide to keep this behavior, should we do the same thing for information_schema? In fact, do any of the schemas make sense to expose when no current database is set?

Yes, both crdb_internal and information_schema are fully sane/safe when there are multiple catalogs visible.

Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see only upsides and no downsides myself.

The major downside I see is that there will now be a discrepancy between the output of SHOW SCHEMAS with and without a database set. This may be unclear to anyone exploring the object hierarchy of Cockroach and I suspect that we'll end up with issues like "why is pg_catalog missing!!".

EDIT: I now see that SHOW SCHEMAS does not work when a database is not specified. This helps relieve my previous concern. However, it does introduce a new issue where some schemas can be queried outside of the context of a database but it's not clear which ones they are. These schemas are also not properly reflected in information_schema.schemata. This is a problem with the "no current db" state itself, so I won't push it here. I also agree that removing this state should not be in 2.0, but do we at least have an issue to remove it in 2.1?

}
} else {
if !e.validWithNoDatabaseContext {
return nil, errNoDatabase
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add PG codes to all four errors defined at the top of descriptor.go.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Feb 26, 2018

I now see that SHOW SCHEMAS does not work when a database is not specified.

Sounds like a bug to me. Will have a look :)

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Feb 27, 2018

Ok indeed SHOW SCHEMAS is nonsensical if there is no current database set.

The fact that querying from "".information_schema.xxx and "".crdb_internal.xxx is supported is an "internal feature" (it's used by SHOW but I don't see any reason to explain it to users) and so I am comfortable not listing them in information_schema.

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Feb 27, 2018

Filed the discussion about removing the "no current db" situation as #23145.

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Feb 27, 2018

Ok merging this now -- I'm willing to revisit this area as we try more pg tools and discover there's more to be done.

@knz knz merged commit f3d80dd into cockroachdb:master Feb 27, 2018
@knz knz deleted the 20180224-avoid-nonsensical-pg-catalog branch April 27, 2018 18:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: CRDB improperly evaluates certain requests when no database is set

3 participants