type virtualSchemaTable struct {
  schema   string
  populate func(p *planner, addRow func(...parser.Datum)) error

wasn't the previous definition nicer? And in fact can't we make it more specific - can't it return a... ValuesNode?

}

type virtualTableEntry struct {

do we really need struct separately from virtualSchemaTable?

}

func (e virtualTableEntry) getPlanNode(p *planner) (planNode, error) {

this method deserves a comment

Comments from Reviewable

nvb · 2016-08-01T20:31:00Z

Thanks for the reviews. I added in a final commit which has the real implementation for information_schema.tables. I figured this wouldn't be suitable for merging until that was added in.

Review status: 0 of 19 files reviewed at latest revision, 7 unresolved discussions, some commit checks pending.

sql/data_source.go, line 205 [r4] (raw file):

Previously, petermattis (Peter Mattis) wrote…

If someone has created a database named information_schema prior to this change, they will no longer be able to access it. Probably acceptable, but we'll likely want to call this out in the release notes. Be sure to highlight it in the commit message if it isn't already.

Yes, that's a good point that I meant to highlight. My feeling was that anyone who previously had an `information_schema` database had created it to hack around our lack of support, and therefore it would be fine to "shadow" their previous database now that we are providing real support. Still, we should call this out in the release notes and I noted it in the first commit message.
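
To make the shadowing concrete, here is a minimal sketch. It assumes name resolution consults the virtual databases before the stored ones; the `catalog` type, its fields, and `resolve` are hypothetical stand-ins, not the actual lookup code in `sql/data_source.go`.

```go
package main

import "fmt"

// catalog is a hypothetical stand-in for database name resolution.
type catalog struct {
	virtual map[string]bool // names of virtual databases
	stored  map[string]int  // user-created databases: name -> descriptor ID
}

// resolve consults the virtual databases first, so a stored database with
// the same name still exists but can no longer be reached by name.
func (c catalog) resolve(name string) string {
	if c.virtual[name] {
		return "virtual"
	}
	if _, ok := c.stored[name]; ok {
		return "stored"
	}
	return "missing"
}

func main() {
	c := catalog{
		virtual: map[string]bool{"information_schema": true},
		stored:  map[string]int{"information_schema": 51, "bank": 52},
	}
	fmt.Println(c.resolve("information_schema")) // virtual
	fmt.Println(c.resolve("bank"))               // stored
}
```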

sql/information_schema.go, line 54 [r4] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

instead, say that not all fields are filled in.
Also nit: ':' after TODO(nvanbenschoten)

See final commit.

sql/virtual_schema.go, line 98 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/checkVirtualDBDesc/getVirtualDBDesc ?

This was addressed in the next commit.

sql/virtual_schema.go, line 103 [r1] (raw file):

Previously, andreimatei (Andrei Matei) wrote…

really necessary?

No, but it adds a layer of abstraction so that other parts of `sql` don't need to know about the details of `virtualSchemaMap`.

Previously, andreimatei (Andrei Matei) wrote…

wasn't the previous definition nicer? And in fact can't we make it more specific - can't it return a... ValuesNode?

I made the change because all of these tables are going to be doing the same thing:

1. Creating a `ValuesNode`
2. Populating the `ValuesNode.Column` from the table descriptor
3. Populating rows

This new signature allows steps 1 and 2 above to be collapsed into `getPlanNode`, and also allows common assertions like `len(columns per row) == len(valuesNode.columns)` to be shared.
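
A minimal sketch of that shape, under stated assumptions: `datum`, `planner`, and `valuesNode` are simplified stand-ins for the real `sql` types, the `columns` field stands in for the columns taken from the table descriptor, and `getPlanNode` is written as a free function for brevity. It is not the code in this PR, only an illustration of how the callback lets the row-width assertion live in one place.

```go
package main

import "fmt"

// Simplified stand-ins for the real sql package types (assumptions).
type datum string

type planner struct{ database string }

type valuesNode struct {
	columns []string
	rows    [][]datum
}

// virtualSchemaTable: the table author only supplies populate, which emits
// rows through the addRow callback.
type virtualSchemaTable struct {
	columns  []string // stand-in for the columns of the table descriptor
	populate func(p *planner, addRow func(...datum)) error
}

// getPlanNode creates the node and fills in its columns (steps 1 and 2),
// then runs populate and checks the width of every emitted row.
func getPlanNode(tbl virtualSchemaTable, p *planner) (*valuesNode, error) {
	v := &valuesNode{columns: tbl.columns}
	var widthErr error
	addRow := func(row ...datum) {
		if len(row) != len(v.columns) {
			if widthErr == nil {
				widthErr = fmt.Errorf("got %d datums, expected %d", len(row), len(v.columns))
			}
			return
		}
		v.rows = append(v.rows, row)
	}
	if err := tbl.populate(p, addRow); err != nil {
		return nil, err
	}
	if widthErr != nil {
		return nil, widthErr
	}
	return v, nil
}

func main() {
	tables := virtualSchemaTable{
		columns: []string{"TABLE_SCHEMA", "TABLE_NAME"},
		populate: func(p *planner, addRow func(...datum)) error {
			addRow(datum(p.database), "users")
			addRow(datum(p.database), "orders")
			return nil
		},
	}
	v, err := getPlanNode(tables, &planner{database: "bank"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(v.rows), "rows,", len(v.columns), "columns")
}
```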

Previously, andreimatei (Andrei Matei) wrote…

do we really need struct separately from virtualSchemaTable?

I'm going for a clear distinction between the programmer interface of virtualSchemas (see `virtualSchema`, `virtualSchemaTable`, and `virtualSchemas` and how they are used in `information_schema.go`) from the statically generated data structures. Both of the descriptors are statically generated during `init`, so it makes things much clearer if they are stored in the generated entry objects.
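
Roughly, the split looks like the sketch below. It is illustrative only: the real `virtualSchemaTable` carries a `schema` string, which is replaced here by hypothetical `name` and `columns` fields to avoid parsing, and `tableDescriptor`, `tableDef`, `informationSchemaTables`, and `virtualTableEntries` are assumed names.

```go
package main

import "fmt"

type datum string

type planner struct{}

// tableDescriptor is a simplified stand-in for the real descriptor type.
type tableDescriptor struct {
	name    string
	columns []string
}

// virtualSchemaTable is the programmer interface: written by hand, once per
// virtual table.
type virtualSchemaTable struct {
	name     string
	columns  []string
	populate func(p *planner, addRow func(...datum)) error
}

// virtualTableEntry is the generated side: it pairs a definition with the
// descriptor derived from it.
type virtualTableEntry struct {
	tableDef virtualSchemaTable
	desc     tableDescriptor
}

var informationSchemaTables = []virtualSchemaTable{
	{
		name:    "tables",
		columns: []string{"TABLE_SCHEMA", "TABLE_NAME"},
		populate: func(p *planner, addRow func(...datum)) error {
			addRow("bank", "accounts")
			return nil
		},
	},
}

var virtualTableEntries = map[string]virtualTableEntry{}

// init builds every entry once, before any query runs.
func init() {
	for _, def := range informationSchemaTables {
		virtualTableEntries[def.name] = virtualTableEntry{
			tableDef: def,
			desc:     tableDescriptor{name: def.name, columns: def.columns},
		}
	}
}

func main() {
	e := virtualTableEntries["tables"]
	fmt.Println(e.desc.name, e.desc.columns)
}
```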

Previously, andreimatei (Andrei Matei) wrote…

this method deserves a comment

Done.

andreimatei · 2016-08-01T22:10:59Z

Review status: 0 of 19 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending.

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I made the change because all of these tables are going to be doing the same thing:

1. Creating a ValuesNode
2. Populating the ValuesNode.Column from the table descriptor
3. Populating rows

This new signature allows steps 1 and 2 above to be collapsed into getPlanNode, and also allows common assertions like len(columns per row) == len(valuesNode.columns) to be shared.

OK. But how about extracting steps 1 and 2 into a helper, and leaving a single method on the struct. If not, a more minor suggestion - can we make populate take a `ValuesNode`, instead of the callback (and make the assertion into a loop over the results)?
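
For comparison, the minor suggestion could look something like this sketch, where `populate` receives the node directly and the assertion becomes a loop over the finished rows. All types are simplified stand-ins, and `populateFunc`, `checkRowWidths`, and `buildNode` are hypothetical names.

```go
package main

import "fmt"

// Simplified stand-ins for the real sql package types (assumptions).
type datum string

type planner struct{ database string }

type valuesNode struct {
	columns []string
	rows    [][]datum
}

// populateFunc receives the node itself and appends rows to it directly.
type populateFunc func(p *planner, v *valuesNode) error

// checkRowWidths is the assertion expressed as a loop over the results.
func checkRowWidths(v *valuesNode) error {
	for i, row := range v.rows {
		if len(row) != len(v.columns) {
			return fmt.Errorf("row %d has %d datums, expected %d", i, len(row), len(v.columns))
		}
	}
	return nil
}

// buildNode is the shared helper: create the node, set its columns, run
// populate, then validate the rows afterwards.
func buildNode(columns []string, p *planner, populate populateFunc) (*valuesNode, error) {
	v := &valuesNode{columns: columns}
	if err := populate(p, v); err != nil {
		return nil, err
	}
	if err := checkRowWidths(v); err != nil {
		return nil, err
	}
	return v, nil
}

func main() {
	v, err := buildNode([]string{"TABLE_SCHEMA", "TABLE_NAME"}, &planner{database: "bank"},
		func(p *planner, v *valuesNode) error {
			v.rows = append(v.rows, []datum{datum(p.database), "users"})
			return nil
		})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(v.rows), "rows")
}
```

Note that in this variant `populate` can reach every field of the node, not just its rows.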

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm going for a clear distinction between the programmer interface of virtualSchemas (see virtualSchema, virtualSchemaTable, and virtualSchemas and how they are used in information_schema.go) from the statically generated data structures. Both of the descriptors are statically generated during init, so it makes things much clearer if they are stored in the generated entry objects.

to be honest, I already got confused between the two types. I think the random future reader will not make much of your distinction. Particularly with `getPlanNode()` being split from `populate()` in different structs. No strong opinions.

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Done.

can you change the return type to `ValuesNode` ?

nvb · 2016-08-01T22:53:54Z

Review status: 0 of 19 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.

Previously, andreimatei (Andrei Matei) wrote…

OK. But how about extracting steps 1 and 2 into a helper, and leaving a single method on the struct.
If not, a more minor suggestion - can we make populate take a ValuesNode, instead of the callback (and make the assertion into a loop over the results)?

I really don't see the benefit in that. The current approach makes the purpose of the `populate` function more directed, and it means that the instances of `populate` can ignore the specifics of `ValuesNode` completely. It also means that the instances are not allowed to manipulate any fields in `ValuesNode`, which is desirable.

All future efforts towards supporting information_schema will be creating virtualSchemaTable objects, like in the last commit in this PR. There will be about 30 of these, which is why minimizing the scope of this interface is so important. With this structure, the only thing these objects will need to worry about is mapping a planner to a set of rows. Importantly, they won't need to worry about their own TableDescriptors or setting up valuesNode, because this approach hides that all within virtualSchema.

Are there any disadvantages to this approach that we should try to address?

Previously, andreimatei (Andrei Matei) wrote…

to be honest, I already got confused between the two types. I think the random future reader will not make much of your distinction. Particularly with getPlanNode() being split from populate() in different structs.
No strong opinions.

Yeah I see where you're coming from. I added a few comments to try to clear up the confusion.

I also un-embedded virtualSchemaTable from virtualTableEntry to make their respective purposes a bit more concrete. This also had the nice effect that virtualTableEntry no longer has a populate field directly on it.
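
The effect of un-embedding can be seen in a small sketch. The types are heavily simplified, and `embeddedEntry`, `namedEntry`, and the `tableDef` field name are hypothetical. With an embedded struct, Go promotes `populate` onto the entry; with a named field it is reachable only through that field.

```go
package main

import "fmt"

// Simplified definition type; the real populate takes a planner and a callback.
type virtualSchemaTable struct {
	populate func() []string
}

// embeddedEntry embeds the definition, so populate is promoted onto the entry.
type embeddedEntry struct {
	virtualSchemaTable
	desc string
}

// namedEntry holds the definition in a named field instead.
type namedEntry struct {
	tableDef virtualSchemaTable
	desc     string
}

func main() {
	def := virtualSchemaTable{populate: func() []string { return []string{"row"} }}

	e := embeddedEntry{virtualSchemaTable: def, desc: "tables"}
	n := namedEntry{tableDef: def, desc: "tables"}

	fmt.Println(e.populate())          // promoted: callable directly on the entry
	fmt.Println(n.tableDef.populate()) // must go through the named field
	// n.populate() would not compile: namedEntry has no populate field.
}
```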

Previously, andreimatei (Andrei Matei) wrote…

can you change the return type to ValuesNode ?

Done.

This first commit creates the skeleton of the `virtualSchema` infrastructure. To test this out, it adds the initial "information_schema" database, with no tables. Note that this change will result in previous databases called `information_schema` being inaccessible. This is because the new virtual database will shadow it.

This commit makes sure no mutations on virtual tables are possible, and that the tables are indistinguishable from real tables with read-only privileges.

This commit replaces the old mock population function for `information_schema.tables` with the real one. One thing to note is that the commit removed a number of columns from the table which were all MySQL extensions.

andreimatei · 2016-08-02T15:46:59Z

Review status: 0 of 19 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I really don't see the benefit in that. The current approach makes the purpose of the populate function more directed, and it means that the instances of populate can ignore the specifics of ValuesNode completely. It also means that the instances are not allowed to manipulate any fields in ValuesNode, which is desirable.

All future efforts towards supporting information_schema will be creating virtualSchemaTable objects, like in the last commit in this PR. There will be about 30 of these, which is why minimizing the scope of this interface is so important. With this structure, the only thing these objects will need to worry about is mapping a planner to a set of rows. Importantly, they won't need to worry about their own TableDescriptors or setting up valuesNode, because this approach hides that all within virtualSchema.

Are there any disadvantages to this approach that we should try to address?

OK, if there are so many of these virtual tables then I'll shut up.