Skip to content

workload/schemachange support user defined schemas in workload#54889

Merged
craig[bot] merged 7 commits intocockroachdb:masterfrom
jayshrivastava:support-user-defined-schemas-in-workload
Oct 13, 2020
Merged

workload/schemachange support user defined schemas in workload#54889
craig[bot] merged 7 commits intocockroachdb:masterfrom
jayshrivastava:support-user-defined-schemas-in-workload

Conversation

@jayshrivastava
Copy link
Copy Markdown
Contributor

@jayshrivastava jayshrivastava commented Sep 28, 2020

Adds support for user-defined schemas

  • create schema op
  • drop schema op
  • updates tables, enums, and views to be qualified with schema name prefixes

Resolves: #54408
Release note: None

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@jayshrivastava jayshrivastava force-pushed the support-user-defined-schemas-in-workload branch 2 times, most recently from 93ed9b5 to 2ca209d Compare September 29, 2020 20:53

stmt := rowenc.RandCreateTable(w.rng, "table", int(atomic.AddInt64(w.seqNum, 1)))
stmt.Table = tree.MakeUnqualifiedTableName(tree.Name(tableName))
stmt.Table = tree.MakeTableNameWithSchema((tree.Name)(w.database), (tree.Name)(schemaName), (tree.Name)(tableName))
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if specifying the best db is the best thing to do here. It might be better to leave it blank until the workload supports multiple dbs.

return "", err
}

stmt := rowenc.MakeSchemaName(w.rng.Intn(2) == 0, schemaName, "root")
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be better to leave authorization plank until the workload supports multiple roles.

@jayshrivastava jayshrivastava changed the title Support user defined schemas in workload workload/schemachange support user defined schemas in workload Sep 30, 2020
@jayshrivastava jayshrivastava force-pushed the support-user-defined-schemas-in-workload branch from 7edd1eb to ff3a5dc Compare September 30, 2020 14:11
@jayshrivastava jayshrivastava marked this pull request as ready for review September 30, 2020 14:11
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 3 files at r1, 2 of 2 files at r2, 1 of 1 files at r3, 1 of 1 files at r4, 1 of 1 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jayshrivastava)


pkg/workload/schemachange/schemachange.go, line 1100 at r1 (raw file):

Previously, jayshrivastava (Jayant Shrivastava) wrote…

It may be better to leave authorization plank until the workload supports multiple roles.

I agree completely. We should just leave a TODO to support authorization. Also worth noting that it is valid to not provide a name when you provide an authorization as it implies the schema of the same name.


pkg/workload/schemachange/schemachange.go, line 1110 at r1 (raw file):

	const q = `
  SELECT *
    FROM [information_schema.schemata]

This query is suspect to me. Firstly, I don't think those where brackets make sense. Secondly, if you do SELECT * I think it's going to fail when you only try to scan a single column.

I think you want:

  SELECT schema_name
    FROM information_schema.schemata
   WHERE schema_name LIKE 'schema%' OR schema_name = 'public'
ORDER BY random()
   LIMIT 1;

I'm starting to think that we need to make it so that any errors coming from these schema introspection functions fail the workload completely. My read is that they currently do not.


pkg/workload/schemachange/schemachange.go, line 565 at r3 (raw file):

Previously, jayshrivastava (Jayant Shrivastava) wrote…

I'm not sure if specifying the best db is the best thing to do here. It might be better to leave it blank until the workload supports multiple dbs.

This seems fine. If you'd prefer to not use the database name you could do:

tree.MakeTableNameFromPrefix(tree.ObjectPrefix{
    SchemaName:     tree.Name(schemaName),
    ExplicitSchema: true,
}, tree.Name(tableName))

Also nit: you don't need parents around the type name for the cast.


pkg/workload/schemachange/schemachange.go, line 584 at r3 (raw file):

		return "", err
	}
	qualifiedTableName := fmt.Sprintf("%s.%s", schemaName, tableName)

On this whole commit I think it'd be better to have a helper here that can choose to omit public or something like that. I'd take it one step further and have randTable just return a string or a tree.TableName (which has a String() method)

The repetition in this commit isn't very valuable.


pkg/workload/schemachange/schemachange.go, line 1077 at r5 (raw file):

}

func (w *schemaChangeWorker) randView(tx *pgx.Tx, pctExisting int) (string, string, error) {

same comment on the return types here, returning two strings makes the caller do a lot of work and makes the code harder to read as far as I can tell. what do you think?

@jayshrivastava jayshrivastava force-pushed the support-user-defined-schemas-in-workload branch from ff3a5dc to 46f3e5f Compare October 6, 2020 21:42
@jayshrivastava jayshrivastava requested a review from a team October 6, 2020 21:42
@jayshrivastava jayshrivastava force-pushed the support-user-defined-schemas-in-workload branch from 46f3e5f to d46a9dd Compare October 6, 2020 21:43
@jayshrivastava jayshrivastava removed the request for review from a team October 6, 2020 21:44
@jayshrivastava jayshrivastava force-pushed the support-user-defined-schemas-in-workload branch from d46a9dd to 3bfe843 Compare October 6, 2020 23:13
func (w *schemaChangeWorker) randTable(tx *pgx.Tx, pctExisting int) (tree.TableName, error) {
if w.rng.Intn(100) >= pctExisting {
randSchema, err := w.randSchema(tx, pctExisting)
randSchema, err := w.randSchema(tx, 90)
Copy link
Copy Markdown
Contributor Author

@jayshrivastava jayshrivastava Oct 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pctExisting will be a low value when randTable is called with the intention of creating a table, so a non existing table is returned most of the time. Before I added schemas, this means that a create table operation would mostly succeed because most of the time a non existing table name is returned. However, when you pass pctExisting to randSchema, the schema will most likely not exist. This would cause the create table operation to fail most of the time. To restore the original behaviour of normally succeeding, I thought 90 would be a good value here.

The same thing applies to createEnum and createView.

Copy link
Copy Markdown
Contributor Author

@jayshrivastava jayshrivastava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/workload/schemachange/schemachange.go, line 584 at r3 (raw file):

Previously, ajwerner wrote…

On this whole commit I think it'd be better to have a helper here that can choose to omit public or something like that. I'd take it one step further and have randTable just return a string or a tree.TableName (which has a String() method)

The repetition in this commit isn't very valuable.

Ok. I Changed the return type.


pkg/workload/schemachange/schemachange.go, line 1077 at r5 (raw file):

Previously, ajwerner wrote…

same comment on the return types here, returning two strings makes the caller do a lot of work and makes the code harder to read as far as I can tell. what do you think?

Ok. It makes sense to me. I just put out these changes.

@jayshrivastava jayshrivastava force-pushed the support-user-defined-schemas-in-workload branch from 3bfe843 to 017c5a8 Compare October 8, 2020 21:13
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like you need a rebase. This is getting close.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @jayshrivastava)


pkg/workload/schemachange/schemachange.go, line 1035 at r12 (raw file):

Previously, jayshrivastava (Jayant Shrivastava) wrote…

pctExisting will be a low value when randTable is called with the intention of creating a table, so a non existing table is returned most of the time. Before I added schemas, this means that a create table operation would mostly succeed because most of the time a non existing table name is returned. However, when you pass pctExisting to randSchema, the schema will most likely not exist. This would cause the create table operation to fail most of the time. To restore the original behaviour of normally succeeding, I thought 90 would be a good value here.

The same thing applies to createEnum and createView.

interesting, I feel like if we're choosing to not return an existing name we know that no matter what, what we return here will not exist. The question at hand is whether the schema also does not exist. I don't know how to be principled about what to do here. Picking a random constant feels worse than just propagating pctExisting, no?


pkg/workload/schemachange/schemachange.go, line 502 at r16 (raw file):

	}
	def.Nullable.Nullability = tree.Nullability(rand.Intn(1 + int(tree.SilentNull)))
	return fmt.Sprintf(`ALTER TABLE "%s" ADD COLUMN %s`, tableName.String(), tree.Serialize(def)), nil

nit: the %s verb calls String()


pkg/workload/schemachange/schemachange.go, line 600 at r16 (raw file):

	return fmt.Sprintf(`CREATE TABLE "%s" AS SELECT %s FROM "%s"`,
		tableName.String(), tree.Serialize(&names), destTableName.String()), nil

nit: the %s verb calls String()


pkg/workload/schemachange/schemachange.go, line 640 at r16 (raw file):

		return "", err
	}
	return fmt.Sprintf(`ALTER TABLE "%s" DROP COLUMN "%s"`, tableName.String(), columnName), nil

nit: the %s verb calls String()


pkg/workload/schemachange/schemachange.go, line 652 at r16 (raw file):

		return "", err
	}
	return fmt.Sprintf(`ALTER TABLE "%s" ALTER COLUMN "%s" DROP DEFAULT`, tableName.String(), columnName), nil

nit: the %s verb calls String()


pkg/workload/schemachange/schemachange.go, line 664 at r16 (raw file):

		return "", err
	}
	return fmt.Sprintf(`ALTER TABLE "%s" ALTER COLUMN "%s" DROP NOT NULL`, tableName.String(), columnName), nil

nit: the %s verb calls String()


pkg/workload/schemachange/schemachange.go, line 747 at r16 (raw file):

	return fmt.Sprintf(`ALTER TABLE "%s" RENAME COLUMN "%s" TO "%s"`,
		tableName.String(), srcColumnName, destColumnName), nil

nit: the %s verb calls String()


pkg/workload/schemachange/schemachange.go, line 767 at r16 (raw file):

	return fmt.Sprintf(`ALTER TABLE "%s" RENAME CONSTRAINT "%s" TO "%s"`,
		tableName.String(), srcIndexName, destIndexName), nil

nit: the %s verb calls String()


pkg/workload/schemachange/schemachange.go, line 795 at r16 (raw file):

	}

	return fmt.Sprintf(`ALTER TABLE "%s" RENAME TO "%s"`, srcTableName.String(), destTableName.String()), nil

nit: the %s verb calls String()


pkg/workload/schemachange/schemachange.go, line 809 at r16 (raw file):

	}

	return fmt.Sprintf(`ALTER VIEW "%s" RENAME TO "%s"`, srcViewName.String(), destViewName.String()), nil

nit: the %s verb calls String()


pkg/workload/schemachange/schemachange.go, line 827 at r16 (raw file):

		return "", err
	}
	return fmt.Sprintf(`ALTER TABLE "%s" ALTER COLUMN "%s" SET NOT NULL`, tableName.String(), columnName), nil

nit: the %s verb calls String()


pkg/workload/schemachange/schemachange.go, line 844 at r16 (raw file):

	}
	return fmt.Sprintf(`ALTER TABLE "%s" ALTER COLUMN "%s" SET DATA TYPE %s`,
		tableName.String(), columnName, typ), nil

nit: the %s verb calls String()


pkg/workload/schemachange/schemachange.go, line 872 at r16 (raw file):

	return fmt.Sprintf(
		`INSERT INTO "%s" (%s) VALUES %s`,
		tableName.String(),

nit: the %s verb calls String()

@jayshrivastava jayshrivastava force-pushed the support-user-defined-schemas-in-workload branch 4 times, most recently from 4b25259 to 7924846 Compare October 9, 2020 16:22
Copy link
Copy Markdown
Contributor Author

@jayshrivastava jayshrivastava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @jayshrivastava)


pkg/workload/schemachange/schemachange.go, line 1035 at r12 (raw file):

Previously, ajwerner wrote…

interesting, I feel like if we're choosing to not return an existing name we know that no matter what, what we return here will not exist. The question at hand is whether the schema also does not exist. I don't know how to be principled about what to do here. Picking a random constant feels worse than just propagating pctExisting, no?

It is worth noting that we do use int literals in a few places, but that I plan to clean up all of them with my work related to error classification and control.

I think 100 - pctExisting might be the best. We mainly use pctExisting as a way to control errors. For example, we call w.randTable(tx, 10) in createTable. I feel that the intention is to have it fail roughly 10% of the time due to an existing table being given, otherwise succeed 90% of the time. My problem with propagatingpctExisting is that we will actually fail most of the time in that 90% case by calling w.randSchema(tx, 10)). The problem is we do not distinguish between if the error case is when a name does exist (ie. the table name) and when it doesn't (the schema name). There are a lot of examples of this in the workload (ex. create a column on a table - in the non error case, the column name does not exist, but the table name does). It makes sense to use two different pctExistings here to achieve some overall error rate.

Overall, I think the idea of pctExisting needs to be refactored into something called an errorRate. We should also configure it as a property in schemaChangeWorker rather than int literals. Then, we need to provide an abstraction that generates the pctExisting to use based on if the error case is having a name exist or not. I plan to accomplish this during my next project.


pkg/workload/schemachange/schemachange.go, line 600 at r16 (raw file):

Previously, ajwerner wrote…

nit: the %s verb calls String()

It seems like this happens: CREATE TABLE {{table1647 { public %!s(bool=false) %!s(bool=true)}}} AS SELECT col676_6, col676_3, col676_1 FROM {{table675 { public %!s(bool=false) %!s(bool=true)}}} it also causes CI to fail due to linting.

Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still needs a rebase against master, no?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @jayshrivastava)


pkg/workload/schemachange/schemachange.go, line 1035 at r12 (raw file):

Previously, jayshrivastava (Jayant Shrivastava) wrote…

It is worth noting that we do use int literals in a few places, but that I plan to clean up all of them with my work related to error classification and control.

I think 100 - pctExisting might be the best. We mainly use pctExisting as a way to control errors. For example, we call w.randTable(tx, 10) in createTable. I feel that the intention is to have it fail roughly 10% of the time due to an existing table being given, otherwise succeed 90% of the time. My problem with propagatingpctExisting is that we will actually fail most of the time in that 90% case by calling w.randSchema(tx, 10)). The problem is we do not distinguish between if the error case is when a name does exist (ie. the table name) and when it doesn't (the schema name). There are a lot of examples of this in the workload (ex. create a column on a table - in the non error case, the column name does not exist, but the table name does). It makes sense to use two different pctExistings here to achieve some overall error rate.

Overall, I think the idea of pctExisting needs to be refactored into something called an errorRate. We should also configure it as a property in schemaChangeWorker rather than int literals. Then, we need to provide an abstraction that generates the pctExisting to use based on if the error case is having a name exist or not. I plan to accomplish this during my next project.

sgtm.


pkg/workload/schemachange/schemachange.go, line 600 at r16 (raw file):

Previously, jayshrivastava (Jayant Shrivastava) wrote…

It seems like this happens: CREATE TABLE {{table1647 { public %!s(bool=false) %!s(bool=true)}}} AS SELECT col676_6, col676_3, col676_1 FROM {{table675 { public %!s(bool=false) %!s(bool=true)}}} it also causes CI to fail due to linting.

okay, what's going on here is that the string method is on the pointer. What if we changed the return type to be the pointer rather than the struct?

Copy link
Copy Markdown
Contributor Author

@jayshrivastava jayshrivastava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. I'll rebase momentarily.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @jayshrivastava)


pkg/workload/schemachange/schemachange.go, line 600 at r16 (raw file):

Previously, ajwerner wrote…

okay, what's going on here is that the string method is on the pointer. What if we changed the return type to be the pointer rather than the struct?

Because you cannot take the address of a fcn in Go, we would have to change all the return statements in randTable and randView to look like this.

tableName := tree.MakeTableNameFromPrefix(tree.ObjectNamePrefix{
	SchemaName:     tree.Name(randSchema),
	ExplicitSchema: true,
}, tree.Name(fmt.Sprintf("table%d", atomic.AddInt64(w.seqNum, 1)))), nil
return tableName

of alternatively, we can update the Sprintfs to take the address.
fmt.Sprintf(CREATE TABLE "%s" AS SELECT %s FROM %s, &destTableName, tree.Serialize(&names), &tableName)

Copy link
Copy Markdown
Contributor Author

@jayshrivastava jayshrivastava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @jayshrivastava)


pkg/workload/schemachange/schemachange.go, line 600 at r16 (raw file):

Previously, jayshrivastava (Jayant Shrivastava) wrote…

Because you cannot take the address of a fcn in Go, we would have to change all the return statements in randTable and randView to look like this.

tableName := tree.MakeTableNameFromPrefix(tree.ObjectNamePrefix{
	SchemaName:     tree.Name(randSchema),
	ExplicitSchema: true,
}, tree.Name(fmt.Sprintf("table%d", atomic.AddInt64(w.seqNum, 1)))), nil
return tableName

of alternatively, we can update the Sprintfs to take the address.
fmt.Sprintf(CREATE TABLE "%s" AS SELECT %s FROM %s, &destTableName, tree.Serialize(&names), &tableName)

I'll go with the first option. It'll keep the changes in 1 spot for each rand fcn.

Updated the 'Example usage' comment at the top of the workload
to use the correct flag names and types

Release note: None
…ements

This fixes a syntax error. Specifying `"public.table_name"` instead of
`public.table_name` will not recognize `public` as the schema name.

Release note: None
@jayshrivastava jayshrivastava force-pushed the support-user-defined-schemas-in-workload branch from 7924846 to a8bcdd6 Compare October 9, 2020 20:27
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @jayshrivastava)

@jayshrivastava
Copy link
Copy Markdown
Contributor Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Oct 13, 2020

Build succeeded:

@craig craig bot merged commit d66b7de into cockroachdb:master Oct 13, 2020
@jayshrivastava jayshrivastava deleted the support-user-defined-schemas-in-workload branch October 31, 2020 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

workload/schemachange: support user-defined schemas

3 participants