Skip to content

Adds nexus endpoint for listing timeseries schema#518

Merged
bnaecker merged 6 commits into
mainfrom
oximeter/paginate-timeseries-output
Dec 21, 2021
Merged

Adds nexus endpoint for listing timeseries schema#518
bnaecker merged 6 commits into
mainfrom
oximeter/paginate-timeseries-output

Conversation

@bnaecker

Copy link
Copy Markdown
Collaborator
  • Adds pagination to the oximeter_db::Client API. This includes
    timeseries schema and actual timeseries. Adds unit tests for both.
  • Adds a timeseries database configuration table to Nexus's
    configuration objects and files. This includes the timeseries database
    address for now.
  • Adds an oximeter_db::Client to the main Nexus object.
  • Adds a public HTTP endpoint for listing timeseries schema, using the
    paginated API of the client.

- Adds pagination to the `oximeter_db::Client` API. This includes
  timeseries schema and actual timeseries. Adds unit tests for both.
- Adds a timeseries database configuration table to Nexus's
  configuration objects and files. This includes the timeseries database
  address for now.
- Adds an `oximeter_db::Client` to the main Nexus object.
- Adds a public HTTP endpoint for listing timeseries schema, using the
  paginated API of the client.
@bnaecker bnaecker requested a review from davepacheco December 15, 2021 18:37
@bnaecker

Copy link
Copy Markdown
Collaborator Author

A couple of things here.

First, there's a good bit of noise in the diff, because I moved types from oximeter_db::model into the top-level library. Those types weren't really used in the public API prior. Nexus now consumes them, and they're not exactly part of the ClickHouse database model, so I've moved them to the crate root. This includes the TimeseriesSchema, Timeseries, FieldSchema, and FieldSource types.

Second, this implements pagination on both the schema and the actual data. There's a unit test on the latter, but it's not part of the Nexus API yet. The main reason for that is that the querying model I'm using, where one describes a list of field selectors like field_name==field_value, doesn't fit into Dropshot's pagination types. Specifically, you can't have a Vec in either the PageSelector or ScanParams types.

One solution would be to write a specific endpoint for each timeseries, and then write a static type that describes the query parameters (the fields) for that timeseries. This might work for a small set of important timeseries (e.g., instance metrics), but it is not feasible in general. Timeseries can be added dynamically by metric producers, so it's not possible to write out all the schema at compile time.

So my current plan is to use a string as the main query parameter, similar to how tools like Prometheus work. This string will implement a small query language, where one can express the field selectors, time ranges, and potentially operations like selecting raw data or aggregating data by specified fields. I'm really trying to limit the scope of this, since writing a full SQL-like language is not something I want to tackle right now, but it does seem like the best approach given the query model we'd like to use. Sticking to operations like select and a small number of aggregations (sum, avg, min, max, maybe percentile) will support a lot of tools, including visualization in the console and things like alerts.

Comment thread oximeter/db/Cargo.toml Outdated
Comment thread oximeter/db/src/lib.rs
Comment thread oximeter/db/src/model.rs
Comment thread oximeter/db/src/lib.rs
@davepacheco

Copy link
Copy Markdown
Collaborator

Sorry if you've explained this and I already forgot...but why not use limit/offset for the timeseries data pagination as well?

Comment thread nexus/tests/config.test.toml Outdated
Comment thread oximeter/db/src/lib.rs
Comment thread nexus/src/nexus.rs
@david-crespo

david-crespo commented Dec 16, 2021

Copy link
Copy Markdown
Contributor

So my current plan is to use a string as the main query parameter, similar to how tools like Prometheus work. This string will implement a small query language, where one can express the field selectors, time ranges, and potentially operations like selecting raw data or aggregating data by specified fields. I'm really trying to limit the scope of this, since writing a full SQL-like language is not something I want to tackle right now, but it does seem like the best approach given the query model we'd like to use. Sticking to operations like select and a small number of aggregations (sum, avg, min, max, maybe percentile) will support a lot of tools, including visualization in the console and things like alerts.

This sounds very powerful. I think I would like to have a short meeting sometime to help Justin and I understand the pagination limitation and the goals of the query language.

It might make sense to make that a POST endpoint with the query in the body. It's weird because it's clearly retrieving a resource, but this is how things like GraphQL do it, only because you can't put a body on a GET. This is why they're working on a new HTTP method for this, QUERY.


Edit: Added topic to the CP huddle agenda for tomorrow.

@bnaecker

Copy link
Copy Markdown
Collaborator Author

@davepacheco @david-crespo Let me know if there's anything else on this PR, just so I can take it off the queue if it's ready. Thanks!

@bnaecker bnaecker force-pushed the oximeter/paginate-timeseries-output branch from b9e49db to 6011c93 Compare December 17, 2021 23:32
@bnaecker bnaecker merged commit 186a94a into main Dec 21, 2021
@bnaecker bnaecker deleted the oximeter/paginate-timeseries-output branch December 21, 2021 18:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants