Skip to content

sql: Introduce VirtualSchema infrastructure#8119

Merged
nvb merged 5 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/infoSchem
Aug 2, 2016
Merged

sql: Introduce VirtualSchema infrastructure#8119
nvb merged 5 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/infoSchem

Conversation

@nvb
Copy link
Copy Markdown
Contributor

@nvb nvb commented Jul 29, 2016

Related to #7965.

This PR introduces the VirtualSchema infrastructure into the SQL layer. This new structure allows static virtual descriptors to be used for non-distributed read-only SQL relations, and provides a means to mock out the results of selecting from tables within these schemas.

This is the initial step towards supporting information_schema and pg_catalog.


This change is Reviewable

@petermattis
Copy link
Copy Markdown
Collaborator

:lgtm: though someone else should give it a more thorough review. The approach matches what I was expecting.


Review status: 0 of 18 files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


sql/data_source.go, line 205 [r4] (raw file):

      // Check for a virtual table.
      virtual, err := getVirtualTable(t)

If someone has created a database named information_schema prior to this change, they will no longer be able to access it. Probably acceptable, but we'll likely want to call this out in the release notes. Be sure to highlight it in the commit message if it isn't already.


Comments from Reviewable

@andreimatei
Copy link
Copy Markdown
Contributor

LGTM


Review status: 0 of 18 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


sql/information_schema.go, line 54 [r4] (raw file):

);`,
  populate: func(p *planner, addRow func(...parser.Datum)) error {
      // TODO(nvanbenschoten) This isn't actually the correct implementation

instead, say that not all fields all filled in.
Also nit: ':' after TODO(nvanbenschoten)


sql/virtual_schema.go, line 98 [r1] (raw file):

// checkVirtualDatabaseDesc checks if the provided name matches a virtual database,
// and if so, returns that database's descriptor.
func checkVirtualDatabaseDesc(name string) *sqlbase.DatabaseDescriptor {

s/checkVirtualDBDesc/getVirtualDBDesc ?


sql/virtual_schema.go, line 103 [r1] (raw file):

// isVirtualDatabase checks if the provided name corresponds to a virtual database.
func isVirtualDatabase(name string) bool {

really necessary?


sql/virtual_schema.go, line 42 [r4] (raw file):

type virtualSchemaTable struct {
  schema   string
  populate func(p *planner, addRow func(...parser.Datum)) error

wasn't the previous definition nicer? And in fact can't we make it more specific - can't it return a... ValuesNode?


sql/virtual_schema.go, line 74 [r4] (raw file):

}

type virtualTableEntry struct {

do we really need struct separately from virtualSchemaTable?


sql/virtual_schema.go, line 79 [r4] (raw file):

}

func (e virtualTableEntry) getPlanNode(p *planner) (planNode, error) {

this method deserves a comment


Comments from Reviewable

@nvb nvb force-pushed the nvanbenschoten/infoSchem branch from 7694193 to 82ba1b1 Compare August 1, 2016 20:28
@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Aug 1, 2016

Thanks for the reviews. I added in a final commit which has the real implementation for information_schema.tables. I figured this wouldn't be suitable for merging until that was added in.


Review status: 0 of 19 files reviewed at latest revision, 7 unresolved discussions, some commit checks pending.


sql/data_source.go, line 205 [r4] (raw file):

Previously, petermattis (Peter Mattis) wrote…

If someone has created a database named information_schema prior to this change, they will no longer be able to access it. Probably acceptable, but we'll likely want to call this out in the release notes. Be sure to highlight it in the commit message if it isn't already.

Yes, that's a good point that I meant to highlight. My feeling was that anyone who previously had an `information_schema` database had created it to hack around our lack of support, and therefore it would be fine to "shadow" their previous database now that we are providing real support. Still, we should call this out in the release notes and I noted it in the first commit message.

sql/information_schema.go, line 54 [r4] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

instead, say that not all fields all filled in.
Also nit: ':' after TODO(nvanbenschoten)

See final commit.

sql/virtual_schema.go, line 98 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/checkVirtualDBDesc/getVirtualDBDesc ?

This was addressed in the next commit.

sql/virtual_schema.go, line 103 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

really necessary?

No, but it adds a layer of abstraction so that other parts of `sql` dont need to know about the details of `virtualSchemaMap`.

sql/virtual_schema.go, line 42 [r4] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

wasn't the previous definition nicer? And in fact can't we make it more specific - can't it return a... ValuesNode?

I made the change because all of these tables are going to be doing the same thing: - Creating a `ValuesNode` - Populating the `ValuesNode.Column` from the table descriptor - Populating rows

This new signature allows step 1 and 2 above to be collapsed into getPlanNode, and also allows common assertions like len(columns per row) == len(valuesNode.columns) to be shared.


sql/virtual_schema.go, line 74 [r4] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

do we really need struct separately from virtualSchemaTable?

I'm going for a clear distinction between the programmer interface of virtualSchemas (see `virtualSchema`, `virtualSchemaTable`, and `virtualSchemas` and how they are used in `information_schema.go`) from the statically generated data structures. Both of the descriptors are statically generated during `init`, so it makes things much clearer if they are stored in the generated entry objects.

sql/virtual_schema.go, line 79 [r4] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

this method deserves a comment

Done.

Comments from Reviewable

@nvb nvb changed the title [WIP] sql: Introduce VirtualSchema infrastructure sql: Introduce VirtualSchema infrastructure Aug 1, 2016
@nvb nvb force-pushed the nvanbenschoten/infoSchem branch from 82ba1b1 to ca6544c Compare August 1, 2016 21:37
@andreimatei
Copy link
Copy Markdown
Contributor

:lgtm:


Review status: 0 of 19 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.


sql/virtual_schema.go, line 42 [r4] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I made the change because all of these tables are going to be doing the same thing:

  • Creating a ValuesNode
  • Populating the ValuesNode.Column from the table descriptor
  • Populating rows

This new signature allows step 1 and 2 above to be collapsed into getPlanNode, and also allows common assertions like len(columns per row) == len(valuesNode.columns) to be shared.

OK. But how about extracting steps 1 and 2 into a helper, and leaving a single method on the struct. If not, a more minor suggestion - can we make populate take a `ValuesNode`, instead of the callback (and make the assertion into a loop over the results)?

sql/virtual_schema.go, line 74 [r4] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm going for a clear distinction between the programmer interface of virtualSchemas (see virtualSchema, virtualSchemaTable, and virtualSchemas and how they are used in information_schema.go) from the statically generated data structures. Both of the descriptors are statically generated during init, so it makes things much clearer if they are stored in the generated entry objects.

to be honest, I already got confused between the two types. I think the random future reader will not make much of your distinction. Particularly with `getPlanNode()` being split from `populate()` in different structs. No strong opinions.

sql/virtual_schema.go, line 79 [r4] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Done.

can you change the return type to `ValuesNode` ?

Comments from Reviewable

@nvb nvb force-pushed the nvanbenschoten/infoSchem branch from ca6544c to f33fc0c Compare August 1, 2016 22:53
@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Aug 1, 2016

Review status: 0 of 19 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


sql/virtual_schema.go, line 42 [r4] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

OK. But how about extracting steps 1 and 2 into a helper, and leaving a single method on the struct.
If not, a more minor suggestion - can we make populate take a ValuesNode, instead of the callback (and make the assertion into a loop over the results)?

I really don't see the benefit in that. The current approach makes the purpose of the `populate` function more directed, and it means that the instances of `populate` can ignore the specifics of `ValuesNode` completely. It also means that the instances are not allowed to manipulate any fields in `ValuesNode`, which is desirable.

All future efforts towards supporting information_schema will be creating virtualSchemaTable objects, like in the last commit in this PR. There will be about 30 of these, which is why minimizing the scope of this interface is so important. With this structure, the only thing these objects will need to worry about is mapping a planner to a set of rows. Importantly, they wont need to worry about their own TableDescriptors or setting up valuesNode, because this approach hides that all within virtualSchema.

Are there any disadvantages to this approach that we should try to address?


sql/virtual_schema.go, line 74 [r4] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

to be honest, I already got confused between the two types. I think the random future reader will not make much of your distinction. Particularly with getPlanNode() being split from populate() in different structs.
No strong opinions.

Yeah I see where you're coming from. I added a few comments to try to clear up the confusion.

I also un-embedded virtualSchemaTable from virtualTableEntry to make their respective purposes a bit more concrete. This also had the nice effect that virtualTableEntry no longer has a populate field directly on it.


sql/virtual_schema.go, line 79 [r4] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

can you change the return type to ValuesNode ?

Done.

Comments from Reviewable

nvb added 5 commits August 1, 2016 22:55
This first commit creates the skeleton of the `virtualSchema`
infrastructure. To test this out, it adds the initial
"information_schema" database, with no tables.

Note that this change will result in previous databases called
`information_schema` being inaccessible. This is because the new
virtual database will shadow it.
This commit makes sure no mutations on virtual tables are possible, and
that the tables are indistinguishable from real tables with read-only
privileges.
This commit replaces the old mock population function for
`information_schema.tables` with the real one. One thing to note is that
the commit removed a number of columns from the table which were all
MySQL extensions.
@nvb nvb force-pushed the nvanbenschoten/infoSchem branch from f33fc0c to 4581375 Compare August 2, 2016 03:11
@andreimatei
Copy link
Copy Markdown
Contributor

Review status: 0 of 19 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


sql/virtual_schema.go, line 42 [r4] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I really don't see the benefit in that. The current approach makes the purpose of the populate function more directed, and it means that the instances of populate can ignore the specifics of ValuesNode completely. It also means that the instances are not allowed to manipulate any fields in ValuesNode, which is desirable.

All future efforts towards supporting information_schema will be creating virtualSchemaTable objects, like in the last commit in this PR. There will be about 30 of these, which is why minimizing the scope of this interface is so important. With this structure, the only thing these objects will need to worry about is mapping a planner to a set of rows. Importantly, they wont need to worry about their own TableDescriptors or setting up valuesNode, because this approach hides that all within virtualSchema.

Are there any disadvantages to this approach that we should try to address?

OK, if there's so many of these virtual tables than I'll shut up.

sql/virtual_schema.go, line 74 [r4] (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Yeah I see where you're coming from. I added a few comments to try to clear up the confusion.

I also un-embedded virtualSchemaTable from virtualTableEntry to make their respective purposes a bit more concrete. This also had the nice effect that virtualTableEntry no longer has a populate field directly on it.

thanks

Comments from Reviewable

@nvb nvb merged commit e534ddf into cockroachdb:master Aug 2, 2016
@nvb nvb deleted the nvanbenschoten/infoSchem branch August 2, 2016 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants