workload/schemachange support user defined schemas in workload by jayshrivastava · Pull Request #54889 · cockroachdb/cockroach

jayshrivastava · 2020-09-28T21:36:43Z

Adds support for user-defined schemas

create schema op
drop schema op
updates tables, enums, and views to be qualified with schema name prefixes

Resolves: #54408
Release note: None

jayshrivastava · 2020-09-30T13:42:52Z

pkg/workload/schemachange/schemachange.go


 	stmt := rowenc.RandCreateTable(w.rng, "table", int(atomic.AddInt64(w.seqNum, 1)))
-	stmt.Table = tree.MakeUnqualifiedTableName(tree.Name(tableName))
+	stmt.Table = tree.MakeTableNameWithSchema((tree.Name)(w.database), (tree.Name)(schemaName), (tree.Name)(tableName))


I'm not sure if specifying the db is the best thing to do here. It might be better to leave it blank until the workload supports multiple dbs.

jayshrivastava · 2020-09-30T13:46:21Z

pkg/workload/schemachange/schemachange.go

+		return "", err
+	}
+
+	stmt := rowenc.MakeSchemaName(w.rng.Intn(2) == 0, schemaName, "root")


It may be better to leave authorization blank until the workload supports multiple roles.

ajwerner

Reviewed 2 of 3 files at r1, 2 of 2 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 1 of 1 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jayshrivastava)

pkg/workload/schemachange/schemachange.go, line 1100 at r1 (raw file):

Previously, jayshrivastava (Jayant Shrivastava) wrote…

It may be better to leave authorization blank until the workload supports multiple roles.

I agree completely. We should just leave a TODO to support authorization. Also worth noting that it is valid to not provide a name when you provide an authorization as it implies the schema of the same name.
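
As a rough illustration of the name-less form mentioned above, here is a minimal Go sketch that renders the variants of CREATE SCHEMA. The helper names `createSchemaStmt` and `quoteIdent` are hypothetical and are not part of the workload or of `rowenc`; the sketch only assumes the standard `CREATE SCHEMA [IF NOT EXISTS] [name] [AUTHORIZATION role]` grammar.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// quoteIdent is a hypothetical helper that double-quotes a SQL identifier
// and escapes any embedded double quotes by doubling them.
func quoteIdent(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `""`) + `"`
}

// createSchemaStmt is a hypothetical helper, not the workload's code.
// It renders CREATE SCHEMA with an optional name and an optional role.
// When only the role is given, the schema takes the role's name.
func createSchemaStmt(ifNotExists bool, name, role string) (string, error) {
	if name == "" && role == "" {
		return "", errors.New("CREATE SCHEMA needs a name or an AUTHORIZATION role")
	}
	var b strings.Builder
	b.WriteString("CREATE SCHEMA")
	if ifNotExists {
		b.WriteString(" IF NOT EXISTS")
	}
	if name != "" {
		b.WriteString(" " + quoteIdent(name))
	}
	if role != "" {
		b.WriteString(" AUTHORIZATION " + quoteIdent(role))
	}
	return b.String(), nil
}

func main() {
	// Name only, which is what leaving authorization blank would produce.
	s, _ := createSchemaStmt(false, "schema1", "")
	fmt.Println(s)
	// Authorization only: the schema name is implied by the role.
	s, _ = createSchemaStmt(false, "", "root")
	fmt.Println(s)
}
```

Leaving the role empty until the workload supports multiple roles keeps the generated statement in the first form.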

pkg/workload/schemachange/schemachange.go, line 1110 at r1 (raw file):

	const q = `
  SELECT *
    FROM [information_schema.schemata]

This query is suspect to me. First, I don't think those square brackets make sense. Second, if you do SELECT *, I think it's going to fail when you try to scan only a single column.

I think you want:

  SELECT schema_name
    FROM information_schema.schemata
   WHERE schema_name LIKE 'schema%' OR schema_name = 'public'
ORDER BY random()
   LIMIT 1;

I'm starting to think that we need to make it so that any errors coming from these schema introspection functions fail the workload completely. My read is that they currently do not.

pkg/workload/schemachange/schemachange.go, line 565 at r3 (raw file):

Previously, jayshrivastava (Jayant Shrivastava) wrote…

I'm not sure if specifying the db is the best thing to do here. It might be better to leave it blank until the workload supports multiple dbs.

This seems fine. If you'd prefer to not use the database name you could do:

tree.MakeTableNameFromPrefix(tree.ObjectPrefix{
    SchemaName:     tree.Name(schemaName),
    ExplicitSchema: true,
}, tree.Name(tableName))

Also nit: you don't need parens around the type name for the cast.

pkg/workload/schemachange/schemachange.go, line 584 at r3 (raw file):

		return "", err
	}
	qualifiedTableName := fmt.Sprintf("%s.%s", schemaName, tableName)

On this whole commit I think it'd be better to have a helper here that can choose to omit public or something like that. I'd take it one step further and have randTable just return a string or a tree.TableName (which has a String() method).

The repetition in this commit isn't very valuable.
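
A minimal sketch of the kind of helper suggested above, assuming a hypothetical `qualifiedName` type that stands in for `tree.TableName`. It is not the real type, and the choice to drop `public` is an assumption that only holds when `public` is on the search path.

```go
package main

import "fmt"

// qualifiedName is a hypothetical stand-in for tree.TableName. It carries
// the schema and object name together so callers never format them by hand.
type qualifiedName struct {
	Schema string
	Object string
}

// String renders the name, omitting the schema prefix when the schema is
// empty or "public" (an assumption of this sketch, not workload behaviour).
func (n qualifiedName) String() string {
	if n.Schema == "" || n.Schema == "public" {
		return n.Object
	}
	return n.Schema + "." + n.Object
}

func main() {
	fmt.Println(qualifiedName{Schema: "public", Object: "table1"})
	fmt.Println(qualifiedName{Schema: "schema2", Object: "table1"})
}
```

Returning one value with a `String()` method means each call site formats the name in a single place instead of repeating `fmt.Sprintf("%s.%s", ...)`.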

pkg/workload/schemachange/schemachange.go, line 1077 at r5 (raw file):

}

func (w *schemaChangeWorker) randView(tx *pgx.Tx, pctExisting int) (string, string, error) {

Same comment on the return types here: returning two strings makes the caller do a lot of work and makes the code harder to read, as far as I can tell. What do you think?

jayshrivastava · 2020-10-07T13:41:08Z

pkg/workload/schemachange/schemachange.go

+func (w *schemaChangeWorker) randTable(tx *pgx.Tx, pctExisting int) (tree.TableName, error) {
 	if w.rng.Intn(100) >= pctExisting {
-		randSchema, err := w.randSchema(tx, pctExisting)
+		randSchema, err := w.randSchema(tx, 90)


pctExisting will be a low value when randTable is called with the intention of creating a table, so a nonexistent table is returned most of the time. Before I added schemas, this meant that a create table operation would mostly succeed, because most of the time a nonexistent table name was returned. However, when you pass pctExisting to randSchema, the schema will most likely not exist either. This would cause the create table operation to fail most of the time. To restore the original behaviour of normally succeeding, I thought 90 would be a good value here.

The same thing applies to createEnum and createView.
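
The argument above can be put into rough numbers with a small Go sketch. It assumes a simplified model that is not taken from the workload: the table name and the schema are chosen independently, a create succeeds only when the name is new and the schema exists, and collisions between generated names are ignored. The function name `createSuccessProb` and the example value of 10 for pctExisting are hypothetical.

```go
package main

import "fmt"

// createSuccessProb estimates the chance that a create operation succeeds
// under the simplified model described above: the object name must be new
// and the schema it is placed in must already exist.
func createSuccessProb(pctExistingObject, pctExistingSchema int) float64 {
	pNewObject := float64(100-pctExistingObject) / 100
	pSchemaExists := float64(pctExistingSchema) / 100
	return pNewObject * pSchemaExists
}

func main() {
	const pctExisting = 10 // hypothetical low value used for create ops

	// Propagating pctExisting to the schema choice.
	fmt.Printf("propagated: %.2f\n", createSuccessProb(pctExisting, pctExisting))
	// Using a fixed 90 for the schema choice, as in the diff.
	fmt.Printf("fixed 90:   %.2f\n", createSuccessProb(pctExisting, 90))
}
```

Under these assumptions the propagated value makes most creates fail on a missing schema, while the fixed value keeps most of them succeeding.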

jayshrivastava

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)

pkg/workload/schemachange/schemachange.go, line 584 at r3 (raw file):

Previously, ajwerner wrote…

On this whole commit I think it'd be better to have a helper here that can choose to omit public or something like that. I'd take it one step further and have randTable just return a string or a tree.TableName (which has a String() method).

The repetition in this commit isn't very valuable.

Ok. I changed the return type.

pkg/workload/schemachange/schemachange.go, line 1077 at r5 (raw file):

Previously, ajwerner wrote…

Same comment on the return types here: returning two strings makes the caller do a lot of work and makes the code harder to read, as far as I can tell. What do you think?

Ok. It makes sense to me. I just put out these changes.

ajwerner

Seems like you need a rebase. This is getting close.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @jayshrivastava)

pkg/workload/schemachange/schemachange.go, line 1035 at r12 (raw file):

Previously, jayshrivastava (Jayant Shrivastava) wrote…

pctExisting will be a low value when randTable is called with the intention of creating a table, so a nonexistent table is returned most of the time. Before I added schemas, this meant that a create table operation would mostly succeed, because most of the time a nonexistent table name was returned. However, when you pass pctExisting to randSchema, the schema will most likely not exist either. This would cause the create table operation to fail most of the time. To restore the original behaviour of normally succeeding, I thought 90 would be a good value here.

The same thing applies to createEnum and createView.

Interesting. I feel like if we're choosing to not return an existing name, we know that no matter what, what we return here will not exist. The question at hand is whether the schema also does not exist. I don't know how to be principled about what to do here. Picking an arbitrary constant feels worse than just propagating pctExisting, no?

pkg/workload/schemachange/schemachange.go, line 502 at r16 (raw file):

	}
	def.Nullable.Nullability = tree.Nullability(rand.Intn(1 + int(tree.SilentNull)))
	return fmt.Sprintf(`ALTER TABLE "%s" ADD COLUMN %s`, tableName.String(), tree.Serialize(def)), nil