Lens formula exploration #88997

Closed
flash1293 wants to merge 1 commit into elastic:master from flash1293:lens/math-operation

@flash1293 flash1293 commented Jan 21, 2021

Quick and dirty exploration of the Lens formula concept

[Screenshot 2021-01-21 at 17 59 04]

Parts of this PR:

  • Add a new operation type, managedReference (perhaps it should be called selfManagedReference). It behaves similarly to a fullReference in that it specifies a list of referenced columns tied to it, but it doesn't specify metadata for accepted reference types, because it doesn't use the global logic for matching operations and references (the user specifies them manually and there is no automatic switching). It's a mixture of the existing "none" and "fullReference" operation types.
  • There are two new operations: formula and math. Formula is the top-level operation that lets the user enter a formula. It contains the parsing logic for turning the tinymath string into a series of operations; it creates its own referenced operations and adds them to the state. Math is the operation that executes row-wise math using a tinymath string.
  • Short description of how the tinymath string is turned into a tree of referenced operations:
    • Parse the tinymath string into an AST
    • Walk the AST from the top
    • If a node with an unknown function is encountered, assume it's a math operation (like subtract) and recurse into its args
    • If a node with a function name matching an operation name is encountered, look up the operation. If it's a field-based operation, create a new column, add it to a global column list, and use the first argument of the function node to look up the field in the index pattern. Then replace the node in the AST with the name of the column (it will be turned into a variable lookup later on)
    • If it's a fullReference-based operation, create a new column, add it to the global column list, then walk the first argument of the function node as a new root. Once that returns, replace the argument node with the name of the created column
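
The walk described above can be sketched roughly as follows. This is not the code from the PR: the AST shape, the `Column` type, the `operationTypes` registry, and all function names are assumptions made for illustration, and the formula is passed in as an already-parsed AST rather than being parsed by tinymath.

```typescript
// Hypothetical, simplified shapes: not the actual tinymath or Lens types.
type AstNode = number | string | { name: string; args: AstNode[] };

type Column =
  | { id: string; kind: 'field'; operation: string; field: string }
  | { id: string; kind: 'reference'; operation: string; references: string[] }
  | { id: string; kind: 'math'; tinymath: string; references: string[] };

// Assumed operation registry; the real one lives in the Lens operation definitions.
const operationTypes = new Map<string, 'field' | 'fullReference'>([
  ['sum', 'field'],
  ['min', 'field'],
  ['max', 'field'],
  ['derivative', 'fullReference'],
]);

// Print an AST back into a tinymath-style string.
function printAst(node: AstNode): string {
  if (typeof node !== 'object') return String(node);
  return `${node.name}(${node.args.map(printAst).join(',')})`;
}

function formulaToColumns(root: AstNode): Column[] {
  const columns: Column[] = []; // the "global column list"
  let counter = 0;
  const nextId = () => `col${++counter}`;

  // Create the column for a known operation and return its ID.
  function createOperationColumn(
    node: { name: string; args: AstNode[] },
    type: 'field' | 'fullReference'
  ): string {
    const [firstArg] = node.args;
    if (firstArg === undefined) throw new Error(`${node.name} needs an argument`);
    const id = nextId();
    if (type === 'field') {
      columns.push({ id, kind: 'field', operation: node.name, field: String(firstArg) });
    } else {
      // fullReference: walk the first argument as a new root.
      const inner = walkAsRoot(firstArg);
      columns.push({ id, kind: 'reference', operation: node.name, references: [inner] });
    }
    return id;
  }

  // Replace operation calls inside a math expression with column IDs,
  // recording every created column ID in `refs`.
  function replaceOperations(node: AstNode, refs: string[]): AstNode {
    if (typeof node !== 'object') return node; // number literal or variable
    const type = operationTypes.get(node.name);
    if (type === undefined) {
      // Unknown function: assume plain math (e.g. subtract) and recurse.
      return { name: node.name, args: node.args.map((arg) => replaceOperations(arg, refs)) };
    }
    const id = createOperationColumn(node, type);
    refs.push(id);
    return id;
  }

  // Walk a subtree as a new root; return the ID of the column holding its result.
  function walkAsRoot(node: AstNode): string {
    if (typeof node === 'object') {
      const type = operationTypes.get(node.name);
      if (type !== undefined) return createOperationColumn(node, type);
    }
    const id = nextId();
    const refs: string[] = [];
    const math = replaceOperations(node, refs);
    columns.push({ id, kind: 'math', tinymath: printAst(math), references: refs });
    return id;
  }

  const top = walkAsRoot(root);
  columns.push({ id: 'col', kind: 'reference', operation: 'formula', references: [top] });
  return columns;
}

// derivative(sum(bytes) + 50) - min(bytes), written out as an already-parsed AST.
const exampleAst: AstNode = {
  name: 'subtract',
  args: [
    { name: 'derivative', args: [{ name: 'add', args: [{ name: 'sum', args: ['bytes'] }, 50] }] },
    { name: 'min', args: ['bytes'] },
  ],
};
const exampleColumns = formulaToColumns(exampleAst);
```

This sketch assigns IDs depth-first, so the numbering of the generated columns can differ from the numbering in a hand-written listing even though the reference structure is the same.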

Example

The following string derivative(sum(bytes) + 50) - min(bytes) is turned into the following operations:

  • col: formula, references: col1
  • col1: math, tinymath: subtract(col2,col3) references: col2,col3
  • col2: derivative, references: col4
  • col3: min, field: bytes
  • col4: math, tinymath: add(col5,50) references: col5
  • col5: sum, field: bytes

To turn these columns into an expression, the following is done:

  • Separate field-based columns and reference-based columns
  • Add all field-based columns to the esaggs call
  • Sort the reference-based columns using topological sorting (start with the ones without references, then walk up)
  • Add them to the expression one after the other, which gives:

esaggs {aggSum field="bytes" id="col5"} {aggMin field="bytes" id="col3"}
| mapColumn id="col4" {math "add(col5,50)"}
| derivative outputId="col2" inputId="col4"
| mapColumn id="col1" {math "subtract(col2,col3)"}
| mapColumn id="col" {math "col1"}
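
A rough sketch of these steps, under the same caveat that it is not the PR's implementation: the column types and the function names `sortByDependencies` and `toExpression` are hypothetical, and the `agg<Operation>` naming simply follows the expression shown in the example.

```typescript
// Hypothetical column shapes mirroring the example column list; not the real Lens types.
type FieldColumn = { id: string; kind: 'field'; operation: string; field: string };
type ReferenceColumn = { id: string; kind: 'reference'; operation: string; references: string[] };
type MathColumn = { id: string; kind: 'math'; tinymath: string; references: string[] };
type AnyColumn = FieldColumn | ReferenceColumn | MathColumn;
type DependentColumn = ReferenceColumn | MathColumn;

// Topological sort: repeatedly emit the columns whose references are all available.
function sortByDependencies(
  columns: DependentColumn[],
  available: Iterable<string>
): DependentColumn[] {
  const done = new Set<string>(available);
  const sorted: DependentColumn[] = [];
  let pending = [...columns];
  while (pending.length > 0) {
    const ready = pending.filter((c) => c.references.every((ref) => done.has(ref)));
    if (ready.length === 0) throw new Error('Circular or missing reference');
    for (const c of ready) {
      done.add(c.id);
      sorted.push(c);
    }
    pending = pending.filter((c) => !done.has(c.id));
  }
  return sorted;
}

function toExpression(columns: AnyColumn[]): string {
  // Step 1: separate field-based and reference-based columns.
  const fieldColumns = columns.filter((c): c is FieldColumn => c.kind === 'field');
  const dependent = columns.filter((c): c is DependentColumn => c.kind !== 'field');

  // Step 2: all field-based columns go into the esaggs call.
  const aggs = fieldColumns.map((c) => {
    const aggName = `agg${c.operation.charAt(0).toUpperCase()}${c.operation.slice(1)}`;
    return `{${aggName} field="${c.field}" id="${c.id}"}`;
  });
  const steps = [`esaggs ${aggs.join(' ')}`];

  // Steps 3 and 4: sort the rest and append them one after the other.
  for (const c of sortByDependencies(dependent, fieldColumns.map((f) => f.id))) {
    if (c.kind === 'math') {
      steps.push(`mapColumn id="${c.id}" {math "${c.tinymath}"}`);
    } else if (c.operation === 'formula') {
      // The formula column just exposes the result of its single reference.
      steps.push(`mapColumn id="${c.id}" {math "${c.references[0]}"}`);
    } else {
      steps.push(`${c.operation} outputId="${c.id}" inputId="${c.references[0]}"`);
    }
  }
  return steps.join('\n| ');
}

// The columns from the example, in listing order.
const pipelineColumns: AnyColumn[] = [
  { id: 'col', kind: 'reference', operation: 'formula', references: ['col1'] },
  { id: 'col1', kind: 'math', tinymath: 'subtract(col2,col3)', references: ['col2', 'col3'] },
  { id: 'col2', kind: 'reference', operation: 'derivative', references: ['col4'] },
  { id: 'col3', kind: 'field', operation: 'min', field: 'bytes' },
  { id: 'col4', kind: 'math', tinymath: 'add(col5,50)', references: ['col5'] },
  { id: 'col5', kind: 'field', operation: 'sum', field: 'bytes' },
];
const pipelineExpression = toExpression(pipelineColumns);
```

With this input the sketch emits the reference-based steps in the order col4, col2, col1, col. The aggregations inside esaggs follow the input order, so they may appear in a different order than in a hand-written expression.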

Things to look into:

  • Parameters - metadata on the operation definition describing how to map additional parameters (like for percentile)
  • Error handling - what happens if something can't be mapped?
  • Count - the field should be implicit
  • Field displayNames - how to handle renames?
  • Use the previous column and turn it into a formula (e.g. "Moving average of sum of bytes" turns into the formula moving_average(sum(bytes)))
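
For the last item, one possible approach is to walk the existing column and its references and print them as nested function calls. A minimal sketch, assuming a hypothetical `ExistingColumn` shape and a `columnToFormula` helper that do not exist in Lens under these names:

```typescript
// Hypothetical shape of already-configured columns; not the real Lens state.
type ExistingColumn =
  | { operation: string; field: string }
  | { operation: string; references: string[] };

// Recursively print a column and everything it references as a formula string.
function columnToFormula(id: string, columns: Record<string, ExistingColumn>): string {
  const column = columns[id];
  if (column === undefined) throw new Error(`Unknown column: ${id}`);
  if ('field' in column) return `${column.operation}(${column.field})`;
  const args = column.references.map((ref) => columnToFormula(ref, columns));
  return `${column.operation}(${args.join(', ')})`;
}

// "Moving average of sum of bytes" expressed as two linked columns.
const existingColumns: Record<string, ExistingColumn> = {
  col1: { operation: 'moving_average', references: ['col2'] },
  col2: { operation: 'sum', field: 'bytes' },
};
const formulaString = columnToFormula('col1', existingColumns);
```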

Notes

The math function is a little annoying because it throws on null. As we need to handle empty buckets all the time, I think we need to tweak the behavior a bit to just let the null bubble up.
I think we need a new type of operation; I called it "managedReference". It sometimes behaves like a "fullReference" operation, sometimes like a "none" operation.
The validation we already have kicks in, but it's a little confusing because it references "hidden" dimensions. I think we can change the validation logic to follow references up to the top-level metric and report errors on that.
Not sure what to do about suggestions. Just treat the formula as one existing dimension, and everything else works as it does now? One thing we could do is split out parts of the formula as separate series (e.g. an xy chart with max(bytes) - min(bytes) suggests an xy chart with two series, max(bytes) and min(bytes)). We can also do that later.
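
To illustrate the first note, letting null bubble up could look roughly like the wrapper below. It is only a sketch: `mathOrNull` and its parameters are hypothetical, and the actual change would more likely live inside the math function itself.

```typescript
// A row of column values; empty buckets show up as null (or are missing entirely).
type Row = Record<string, number | null | undefined>;

// Evaluate row-wise math, but return null instead of throwing
// when any referenced column is empty for this row.
function mathOrNull(
  row: Row,
  references: string[],
  evaluate: (...values: number[]) => number
): number | null {
  const values: number[] = [];
  for (const id of references) {
    const value = row[id];
    if (value === null || value === undefined) return null; // let the null bubble up
    values.push(value);
  }
  return evaluate(...values);
}

const subtract = (a: number, b: number) => a - b;
const filledBucket = mathOrNull({ col2: 5, col3: 2 }, ['col2', 'col3'], subtract);
const emptyBucket = mathOrNull({ col2: null, col3: 2 }, ['col2', 'col3'], subtract);
```

Here `filledBucket` is 3 and `emptyBucket` is null, so an empty bucket produces an empty cell rather than an error.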

[skip ci]

@dej611 dej611 mentioned this pull request Jan 27, 2021
30 tasks
@flash1293

Superseded by #89457

@flash1293 flash1293 closed this Feb 15, 2021