Skip to content

feat: Expression builder (WIP)#7282

Closed
nebkat wants to merge 48 commits intotypeorm:masterfrom
nebkat:expression-builder
Closed

feat: Expression builder (WIP)#7282
nebkat wants to merge 48 commits intotypeorm:masterfrom
nebkat:expression-builder

Conversation

@nebkat
Copy link
Copy Markdown
Contributor

@nebkat nebkat commented Jan 16, 2021

Depends on #7058 (refactor: Query builders and entity/column/relation metadata)

Necessary due to large amount of duplicate code before refactor.

Only last commit is relevant to this PR! See #7058 for rest.

Description of change

This introduces the concept of "expressions" in TypeORM that represent SQL expressions, as well as an accompanying "expression builder" system. SQL expressions are found in a variety of places, but most importantly in the WHERE clause, INSERT VALUES and UPDATE SET. Currently only QueryBuilder.where() has basic support for constructing expressions using FindOperators.

Whenever "raw" expressions are used a number of major problems arise, all of which can be solved by the expression builder:

Implementation

Slightly simplified from actual implementation for better readability

Base Types

// Raw values, such as 1, "Paris", TRUE or NULL, which will always get converted to query parameters(!)
type ExpressionLiteral = boolean | string | number | null;

// Class which produces an expression string when its build() method is called
abstract class ExpressionBuilder {
    abstract build(ctx: ExpressionBuildContext): string;
}

type Expression = ExpressionLiteral | ExpressionBuilder;

Build Context

Certain tasks of the expression builder are left to be implemented by the query builder, via an abstract ExpressionBuildContext.

export abstract class ExpressionBuildContext {
    // Query builder will replace property name or path and escape
    abstract buildColumn(column: string, alias?: string): string;
    // Query builder will add values to the native parameters list and replace with `?` / `$1`
    abstract buildLiteral(column: string): string;
    // Query builder will replace all property names and parameters
    abstract buildRaw(expression: string): string;
}

Builders

Every part of an expression that isn't a literal will be constructed using builders. These are functional building blocks that combine other expressions to form the final expression in raw string form.

Builders extend the ExpressionBuilder class and have a function to initialize them (i.e. XYZ() instead of new XYZBuilder()). Some builder functions may even have tagged template literal equivalents for conciseness.

Expressions will be built as the final step in constructing queries in the query builder, in the order that they appear in the query. Builders themselves will also build their sub-expressions in the order that they appear. This way, the order of native parameters will always be preserved. (#7256, #7104, #6607).

Raw

Raw unedited value that is passed as provided.

function Raw(raw: string): RawBuilder;
Raw("1 = 1 AND ?'!@@").build(ctx); /// Becomes: 
r`1 = 1 AND ?'!@@` // Tagged template literal variant
1 = 1 AND ?'!@@ # Note everything remains unescaped

This is precisely what the expression builder is trying to avoid but it remains for users that would like to still have raw SQL access. It will also replace the () => "RAW" syntax, which should become deprecated.

Column

Explicit reference to a column name or property name/path.

function Col(): ColumnBuilder;
function Col(column: string): ColumnBuilder;
function Col(alias: string, column: string): ColumnBuilder;
Col("userId").build(ctx);
c`userId` // Tagged template literal variant
`userId` # Note the column name is escaped

If Col() is used, it becomes a reference to the current column within a where conditions object or inset/update values object.

replacePropertyNames() can be used accurately and efficiently because we know the value we are dealing with is some kind of column, as opposed to having to search the entire WHERE clause for matching property names using regular expressions.

Function

Function within a query such as IF, LENGTH, or CURRENT_TIMESTAMP. Known functions have initializers with fixed parameters, but user can also manually call an arbitrary function with arbitrary parameters.

function Fn(name: string, parameters: Expression[]): UserFunctionBuilder;
Fn("CONCAT", [c`firstName`, " ", c`lastName`]).build(ctx);
CONCAT(`firstName`, " ", `lastName`)

Operator

All operators in the form of operand *operator* operand.

function *Operator*(a: Expression, b: Expression): *Operator*Builder;
Equal(1, 2).build(ctx);
LessThan(1, 2).build(ctx);
Like(c`name`), "John").build(ctx);
// MoreThan, LessThanOrEqual, Is, etc are all the same
WHERE 1 = 2
WHERE 1 < 2
WHERE `name` LIKE "John"

And, Or, Xor

function And(...expressions: []): AndBuilder;
function Or(...expressions: []): OrBuilder;
function Xor(...expressions: []): XorBuilder;
And(c`isUnread`), LessThan(c`date`, "2020-10-01"), Or(false, 1)).build(ctx);
WHERE `isUnread` AND `date` < "2020-10-01" AND (FALSE OR 1)

Find Conditions

Normally when we are dealing with entities and the WHERE clause, we define the find conditions using an object.

When we write:

cities.find({
    city: "Paris",
    country: "France"
})

It is the equivalent of ANDing every key = value of the object i.e.:

And(...Object.entries(conditions).map(([key, value]) => Equal(Col(key), value)));
And(Equal(c`city`, "Paris"), Equal(c`country`, "France"));
WHERE `city` = "Paris" AND `country` = "France"

Comparators

However there is an exception to this rule if we are using comparison builders:

cities.find({
    population: MoreThan(10000),
    country: Equal("France")
})

We don't want to check if population = > 10000 AND country = = "France", but population > 10000 AND country = "France". Note also that where previously there were two arguments to MoreThan and Equal there is now only one.

This is a special variant of the comparator where only one argument is given instead of two. In this variant, the first argument becomes Col(), so that it references the current column based on the key in the object, and a flag is set to achieve the desired behavior of not checking for equality. We update the object mapping code:

And(...Object.entries(conditions).map(([key, value]) => {
        // If the find condition value was already comparing to the current column, use it directly
        if (value instanceof ExpressionBuiler && value.columnComparator) return value;
        // Otherwise check for equality
        return Equal(value);
}));

Which gives us the distinction between:

cities.find({ population: MoreThan(10000); });
cities.find({ population: MoreThan(c`population`, 10000); });
WHERE `population` > 10000
WHERE `population` = (`population` > 10000) # Broken query here, but could be useful elsewhere

And, Or, Xor

One remaining special case is the ability to use And, Or, and Xor in combination with a find object (long been requested #2100).

cities.find({
    name: Or("Paris", "London", Like("A%"), And(Like("B%"), Like("%B")),
    country: In(["England", "France"])
}); 
WHERE (`name` = "Paris" OR `name` = "London" OR `name` LIKE "A%" OR (`name` LIKE "B%" AND `name` LIKE "%B")
AND `country` IN ("England", "France")

When used as part of a find conditions object, And, Or, and Xor are always distributed over their column.

To avoid this behavior, there is a wrapper function:

// Does nothing to the expression being wrapped, but prevents the distributive property from applying
function Ex(expression: Expression): WrappedExpressionBuilder;

cities.find({
    name: Ex(Or("Paris", "London")),
})
WHERE `name` = ("Paris" OR "London") # Broken query, but might be intended in other queries

Examples

When used with query builder all values get replaced with parameters (?, $1)

Find All

cities.find(true);
WHERE 1

Basic Conditions

cities.find(Equal(c`name`, "London"));
cities.find({name: "London"});
WHERE `city` = "London"

OR Conditions

cities.find(Or(Equal(c`name`, "London"), Equal(c`name`, "Paris"));
cities.find(Or({name: "London"}, {name: "Paris"}));
cities.find(Or({name: "London"}, Equal(c`name`, "Paris"));
cities.find({name: Or("London", "Paris")});
WHERE `city` = "London" OR `city` = "Paris"

Not

cities.find({
    name: Not("Paris"),
    population: Not(MoreThan(10000),
    country: Not(In(["Germany", "Spain", "Italy"]),
    capital: Not(Not(true))
    rural: Not(Coalesce(null, null, true))
})
WHERE
`name` != "Paris AND
`population` <= 10000 AND 
`country` NOT IN ("Germany", "Spain", "Italy") AND
capital = 1 AND
rural = Not(Coalesce(NULL, NULL, 1))

Nested With Column

cities.find({
    name: MoreThan(Length(), 10)
})
WHERE LENGTH(`name`) > 10

Case Expression

cities.find({
    population: Case(c`country`,
            When("France", 10000),
            When("England", 50000),
            Else(100000)
    )
})
WHERE population = (CASE `country` WHEN "France" THEN 10000 WHEN "England" THEN 50000 ELSE 100000 END)

Join Conditions

qb.leftJoin(AddressEntity, "address", Equal(c`address.id`, c`contact.addressId`));
qb.leftJoin(AddressEntity, "address", { id: c`contact.addressId` });
LEFT JOIN `addresses` ON `address`.`id` = `contact`.`addressId`;

Update Increment

users.update({
    id: 1,
}, {
    score: Plus(Col(), 1)
});
SET `score` = `score` + 1 WHERE `id` = 1

Update Reference

cities.update({
    country: "France"
}, {
    large: MoreThan(c`population`, 10000
});
SET `large` = `population` > 10000 WHERE `country` = "France

Column Default

@Column({default: CurrentTimestamp() }) createDate: Date;
@Column({default: r`CURRENT_TIMESTAMP`}) createDate: Date;
createDate DATETIME DEFAULT CURRENT_TIMESTAMP

Pull-Request Checklist

  • Code is up-to-date with the master branch
  • npm run lint passes with this change
  • npm run test passes with this change
  • This pull request links relevant issues as Fixes #0000
  • There are new or updated unit tests validating the change
  • Documentation has been updated to reflect this change
  • The new commits follow conventions explained in CONTRIBUTING.md

@nebkat
Copy link
Copy Markdown
Contributor Author

nebkat commented Jan 16, 2021

@pleerock

This is about 90% complete now and I have identified some very very significant benefits from introducing this system and its related changes:


  • Adds And()/Or() builders to find()/where()
  • Adds ability to filter by relation and join columns
  • Adds ability to create very complex expressions previously only possible with raw expressions
  • Adds ability to cleanly use expressions in select(), orderBy(), groupBy(), where() column defaults, insert/update values
  • Adds ability to generate subqueries without breaking parameter ordering

  • Improves performance by using targeted property name replacement only on column expressions and raw expressions
  • Improves performance by using targeted parameter replacement only on parameter expressions and raw expressions
  • Improves safety from SQL injections by parameterizing all values except those used in raw expressions
  • Improves user experience by moving query construction to code, where missing brackets, typos, etc are noticed

  • Eliminates all usage of TypeORM parameters in built in functions and replaces with native parameters
  • Eliminates all manual query construction in relations/joins
  • Eliminates all problems with native parameter ordering
  • Eliminates a large amount of duplicate code

I have also identified countless issues that can directly be solved with this: #7280, #7256, #7248, #7247, #7226, #7210, from the first 2 pages of issues alone 🎉

Amazingly I believe it may even be possible to achieve all of this without introducing any breaking changes relative to the current version!

That said, a bunch of features remain that would effectively become obsolete, and would only make it more difficult to implement:

  • SelectQueryBuilder.select(select: string)
    • Previously does not differentiate between column/property name/path and a raw expression
    • Now it should assume that a column is being selected, and if user wants expressions they can pass qb.select(Raw("..."))
    • Better yet, qb.select(SomeExpressionBuild())
    • If no breaking changes allowed, it must assume that a string could be either a column or an expression
  • SelectQueryBuilder.where(where: Brackets)
    • Previously used for qb.andWhere(new Brackets(qb => qb.where("1 = 2").orWhere("2 < 3")))
    • Replaced by expressions qb.andWhere(Or(Equal(1, 2), LessThan(2, 3))
    • If no breaking changes allowed, all Brackets inputs have their wheres converted to expressions as above
  • SelectQueryBuilder.where(where: (qb: QueryBuilder) => string)
    • Previously used for qb.andWhere(qb => { const sq = qb.subQuery() ... return "post.title IN " + sq.getQuery() })
    • Replaced by expressions qb.andWhere(In(Col("post", "title"), SubQuery(qb => ...)))
    • Better yet, qb.andWhere({title: In(SubQuery(qb => ...))})
    • If no breaking changes allowed, mapped to Raw(where(this))
  • SelectQueryBuilder.where(where: string)
    • Previously used for passing string expression qb.andWhere("post.title = :title", { title: "Title" })
    • Replaced by expression qb.andWhere(Equal(Col("post", "title", "Title"), or qb.andWhere(Raw("post.title = :title"), { title: "Title" })
    • Better yet, qb.andWhere({title: "Title"})
    • If no breaking changes allowed, all string inputs are mapped to Raw(where)
  • SelectQueryBuilder.join(..., on: string)
    • Previously used for qb.leftJoin(..., "user", "user.id = post.userId")
    • Replaced by expressions qb.leftJoin(..., "user", Equal(Col("user", "id"), Col("post", "userId")))
    • If no breaking changes allowed, all string inputs are mapped to Raw(on)
  • SelectQueryBuilder.orderBy(orderBy: string)
    • Previously does not differentiate between column/property name/path and a raw expression
    • Now it should assume ordering by a column, and if user wants expression they can pass qb.orderBy(Raw("..."))
    • If no breaking changes allowed, all string inputs are mapped to Raw(orderBy)
  • Insert/UpdateQueryBuilder.set(values: { [key: string]: () => string })
    • Previously used for passing raw expressions as values qb.set({count: () => "count + 1"})
    • Replaced by expressions qb.set({count: Plus(Col(), 1)})
    • If no breaking changes allowed, mapped to Raw(value())
  • @Column({default: (() => string)})
    • Previously used for passing raw expressions as values @Column({default: () => "CURRENT_TIMESTAMP"})
    • Replaced by expressions @Column({default: CurrentTimestamp()})
    • If no breaking changes allowed, mapped to Raw(default())

Considering this, it might be better to target the next branch, but I would first like to get your opinion.

Is there a possibility that we could work towards the release of next in the near future if a few features were to get merged, and if so what level of breaking changes would we be willing to accept? Aside from this if #7076 (upsert support) and #7010 (select and map) were to be merged there would already be a significant difference between the two versions that would be worthy of a major release.

@nebkat nebkat force-pushed the expression-builder branch 3 times, most recently from 49ce176 to 9c04ac5 Compare January 17, 2021 18:17
@nebkat nebkat force-pushed the expression-builder branch 3 times, most recently from 4177377 to e421232 Compare January 24, 2021 04:25
@draaglom
Copy link
Copy Markdown
Contributor

you love to see it!

Improves performance by using targeted property name replacement only on column expressions and raw expressions

Ever since looking into #4760 I was hoping someone would tackle this -- those regexes in replacePropertyNames dominate performance above a certain query rate, but the generic/blunt regex approach was necessary given the current QueryBuilder.

@nebkat nebkat force-pushed the expression-builder branch from e421232 to 49c0e5c Compare January 27, 2021 05:15
@nebkat nebkat force-pushed the expression-builder branch 10 times, most recently from 2f5e0c8 to 824115a Compare February 1, 2021 15:49
@pleerock
Copy link
Copy Markdown
Member

pleerock commented Feb 2, 2021

All problems you have listed are super important and it will be great if we solve them.

I'm not sure I can review this PR, however I read your description.
Everything looks good, but one important note:

cities.find({
    name: Not("Paris")
})

Here Not should only accept typeof City.name. And in this example: cities.find(Or({name: "London"})) - { name: ... } must be typeof City. In order to implement it expression must have a generic type, and find should only accept expressions of Expression<City>. I'm not sure how it will play well with nested expressions, but we should try to make type-safe as much as possible.

…rm#6977)

Searches embedded entity columns for relation ID column if relation column
is in embedded entity. If not found, creates new relation ID with embedded
metadata set to match the relation column.

fixes: typeorm#2254
fixes: typeorm#3132
fixes: typeorm#3226
fixes: typeorm#6977
Moves QueryBuilder and its subclasses to their own directory.
In preparation for "virtual"/computed columns, call internally generated columns "internal"
rather than "virtual" to avoid confusion.
…ction)

All options in @column() define a real column except for embedded. This adds an additional
@Embedded decorator for more explicitly defining embeddeds while still maintaining the
option in @column().
* Introduces new type to describe the fields parameter of an @Index or @unique.
* Allows (type) => [type.column] syntax inside embedded
…yMetadatas()

* Removes unnecessary array pushing/slicing
* Matches buildMigrations()/buildSubscribers() more closely
…elete

* Rename ModificationQueryBuilder to AbstractModifyQueryBuilder (UPDATE/DELETE)

* New AbstractPersistQueryBuilder for INSERT/UPDATE/DELETE
    * Common code for RETURNING expressions
    * Common code for INSERT/UPDATE value expressions

* Move UpdateDateColumn, VersionDateColumn, etc to value expression calculation
    * Matches InsertQueryBuilder behavior
refactor: driver: Move database specific EntityManagers to driver

Accidentally ammended during rebase
…lags and generators

Introduces `DriverConfig` and `DriverQueryGenerators` interfaces that include various
flags that drivers can use to customize behavior of the built in functionality. This
decouples these built in features from the drivers themselves, moving towards the
possibility of having each driver in a separate package.
* InsertQueryBuilder.orUpdate() now takes 3 parameters, overwrite columns, column values and conflict condition
    * Overwrite either specific property paths in array or `true` to overwrite all
    * Values same as UPDATE query values, replaces overwritten columns
    * Conflict see onConflict()

* InsertQueryBuilder.orIgnore() now takes 2 parameters, ignore and conflict condition
    * Ignore boolean whether to add IGNORE / DO NOTHING, no longer accepts statement
    * Conflict see onConflict()

* InsertQueryBuilder.onConflict() now accepts either raw expression or property paths array
    * For Postgres/Sqlite ON CONFLICT ...

* EntityManager.upsert(), Repository.upsert(), BaseEntity.upsert()

Fixes: typeorm#1090
@nebkat nebkat force-pushed the expression-builder branch from 67d196a to fc0ce38 Compare February 27, 2021 02:13
@michaelbromley
Copy link
Copy Markdown
Member

Just chiming in here to say that this looks really great and would solve some particularly tricky issues I am dealing with. Has any decision been made about whether this will go in next or master?

If there is any outside help you need which might expedite the merging of this work, please let me know!

@pleerock
Copy link
Copy Markdown
Member

@michaelbromley I'm currently experimenting on how I can make it completely type-safe on my own branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants