Skip to content

Conversation

@mildbyte
Copy link
Contributor

Add a bunch of various timing / sizing metrics to the object store cache:

  • Bytes read from disk and from memory for cache hits
  • Bytes written to disk
  • Bytes requested in get_range for DataFusion
  • Success/error rate of all outbound object store calls and time-to-first-byte
  • Cache consistency "warnings" (our mitigations against race conditions)

Limitations:

  • Doesn't instrument the sizes returned by non-get-range methods (those return streams and we don't wrap them in an instrumenting wrapper)
  • Because of this, some timings on certain methods are meaningless (e.g. list seems to return almost immediately and only loads data when the stream is polled) and should potentially be removed

This is based on top of #518 which has some DataFusion MemoryManager metrics

Sample metrics scrape from a TPC-H/DS run:

# HELP seafowl_object_store_cache_get_range_requested_bytes_total Bytes requested in get_range calls by DataFusion before caching
# TYPE seafowl_object_store_cache_get_range_requested_bytes_total counter
seafowl_object_store_cache_get_range_requested_bytes_total 1653431887

# HELP seafowl_object_store_requests_total Number of calls to the actual object store
# TYPE seafowl_object_store_requests_total counter
seafowl_object_store_requests_total{operation="get_range",status="success"} 73
seafowl_object_store_requests_total{operation="list",status="unknown"} 4587
seafowl_object_store_requests_total{operation="get",status="success"} 13761

# HELP seafowl_object_store_cache_hit_read_bytes_total Bytes read from the object store cache (hit)
# TYPE seafowl_object_store_cache_hit_read_bytes_total counter
seafowl_object_store_cache_hit_read_bytes_total{location="disk"} 3768333104
seafowl_object_store_cache_hit_read_bytes_total{location="memory"} 2763647

# HELP seafowl_object_store_cache_get_range_requests_total Number of get_range requests from DataFusion before caching
# TYPE seafowl_object_store_cache_get_range_requests_total counter
seafowl_object_store_cache_get_range_requests_total 23932

# HELP seafowl_object_store_cache_evicted_bytes Bytes evicted from cache
# TYPE seafowl_object_store_cache_evicted_bytes counter
seafowl_object_store_cache_evicted_bytes 0

# HELP seafowl_object_store_cache_disk_written_bytes_total Bytes written to on-disk cache
# TYPE seafowl_object_store_cache_disk_written_bytes_total counter
seafowl_object_store_cache_disk_written_bytes_total 13268070

# HELP grpc_requests Counter tracking gRPC request statistics
# TYPE grpc_requests counter
grpc_requests{path="/arrow.flight.protocol.FlightService/GetFlightInfo",status="13"} 373
grpc_requests{path="/arrow.flight.protocol.FlightService/GetFlightInfo",status="0"} 4661
grpc_requests{path="/arrow.flight.protocol.FlightService/DoGet",status="0"} 4661

# HELP seafowl_object_store_cache_warnings_total Number of times various cache race conditions were discovered (read-after-evict, double-write, double-delete)
# TYPE seafowl_object_store_cache_warnings_total counter
seafowl_object_store_cache_warnings_total{error="deletion"} 0
seafowl_object_store_cache_warnings_total{error="double_write"} 0
seafowl_object_store_cache_warnings_total{error="redownload"} 0

# HELP seafowl_object_store_cache_get_range_read_bytes_total Bytes downloaded from the upstream object store for get_range cache misses
# TYPE seafowl_object_store_cache_get_range_read_bytes_total counter
seafowl_object_store_cache_get_range_read_bytes_total 15893361

# HELP seafowl_object_store_cache_usage_bytes Approximate current occupation of the cache
# TYPE seafowl_object_store_cache_usage_bytes gauge
seafowl_object_store_cache_usage_bytes 13267128

# HELP seafowl_object_store_cache_capacity_bytes Total cache capacity
# TYPE seafowl_object_store_cache_capacity_bytes gauge
seafowl_object_store_cache_capacity_bytes 268435456

# HELP seafowl_object_store_request_latency_seconds Time-to-first-byte of various requests to the actual object store
# TYPE seafowl_object_store_request_latency_seconds summary
seafowl_object_store_request_latency_seconds{operation="get_range",status="success",quantile="0"} 0
seafowl_object_store_request_latency_seconds{operation="get_range",status="success",quantile="0.5"} 0
seafowl_object_store_request_latency_seconds{operation="get_range",status="success",quantile="0.9"} 0
seafowl_object_store_request_latency_seconds{operation="get_range",status="success",quantile="0.95"} 0
seafowl_object_store_request_latency_seconds{operation="get_range",status="success",quantile="0.99"} 0
seafowl_object_store_request_latency_seconds{operation="get_range",status="success",quantile="0.999"} 0
seafowl_object_store_request_latency_seconds{operation="get_range",status="success",quantile="1"} 0
seafowl_object_store_request_latency_seconds_sum{operation="get_range",status="success"} 0.23178057900000007
seafowl_object_store_request_latency_seconds_count{operation="get_range",status="success"} 73
seafowl_object_store_request_latency_seconds{operation="get",status="success",quantile="0"} 0.000781908
seafowl_object_store_request_latency_seconds{operation="get",status="success",quantile="0.5"} 0.0007819654274285743
seafowl_object_store_request_latency_seconds{operation="get",status="success",quantile="0.9"} 0.0007819654274285743
seafowl_object_store_request_latency_seconds{operation="get",status="success",quantile="0.95"} 0.0007819654274285743
seafowl_object_store_request_latency_seconds{operation="get",status="success",quantile="0.99"} 0.0007819654274285743
seafowl_object_store_request_latency_seconds{operation="get",status="success",quantile="0.999"} 0.0007819654274285743
seafowl_object_store_request_latency_seconds{operation="get",status="success",quantile="1"} 0.000781908
seafowl_object_store_request_latency_seconds_sum{operation="get",status="success"} 11.07166186999995
seafowl_object_store_request_latency_seconds_count{operation="get",status="success"} 13761
seafowl_object_store_request_latency_seconds{operation="list",status="unknown",quantile="0"} 0.000001859
seafowl_object_store_request_latency_seconds{operation="list",status="unknown",quantile="0.5"} 0.0000033102062906382345
seafowl_object_store_request_latency_seconds{operation="list",status="unknown",quantile="0.9"} 0.000004868619916292589
seafowl_object_store_request_latency_seconds{operation="list",status="unknown",quantile="0.95"} 0.000005156254853472572
seafowl_object_store_request_latency_seconds{operation="list",status="unknown",quantile="0.99"} 0.000006454712686792021
seafowl_object_store_request_latency_seconds{operation="list",status="unknown",quantile="0.999"} 0.000006454712686792021
seafowl_object_store_request_latency_seconds{operation="list",status="unknown",quantile="1"} 0.000012291
seafowl_object_store_request_latency_seconds_sum{operation="list",status="unknown"} 0.013296752000000018
seafowl_object_store_request_latency_seconds_count{operation="list",status="unknown"} 4587

# HELP seafowl_object_store_cache_disk_latency_seconds Time spent waiting for disk cache read / write
# TYPE seafowl_object_store_cache_disk_latency_seconds summary
seafowl_object_store_cache_disk_latency_seconds{operation="read",quantile="0"} 0.000084778
seafowl_object_store_cache_disk_latency_seconds{operation="read",quantile="0.5"} 0.00018950407426943935
seafowl_object_store_cache_disk_latency_seconds{operation="read",quantile="0.9"} 0.0002668830441480235
seafowl_object_store_cache_disk_latency_seconds{operation="read",quantile="0.95"} 0.00032911608360367857
seafowl_object_store_cache_disk_latency_seconds{operation="read",quantile="0.99"} 0.0004255619484064256
seafowl_object_store_cache_disk_latency_seconds{operation="read",quantile="0.999"} 0.0004255619484064256
seafowl_object_store_cache_disk_latency_seconds{operation="read",quantile="1"} 0.000481224
seafowl_object_store_cache_disk_latency_seconds_sum{operation="read"} 23.07007866300003
seafowl_object_store_cache_disk_latency_seconds_count{operation="read"} 29875
seafowl_object_store_cache_disk_latency_seconds{operation="write",quantile="0"} 0
seafowl_object_store_cache_disk_latency_seconds{operation="write",quantile="0.5"} 0
seafowl_object_store_cache_disk_latency_seconds{operation="write",quantile="0.9"} 0
seafowl_object_store_cache_disk_latency_seconds{operation="write",quantile="0.95"} 0
seafowl_object_store_cache_disk_latency_seconds{operation="write",quantile="0.99"} 0
seafowl_object_store_cache_disk_latency_seconds{operation="write",quantile="0.999"} 0
seafowl_object_store_cache_disk_latency_seconds{operation="write",quantile="1"} 0
seafowl_object_store_cache_disk_latency_seconds_sum{operation="write"} 0.030619615999999995
seafowl_object_store_cache_disk_latency_seconds_count{operation="write"} 81

@mildbyte mildbyte requested a review from gruuya April 23, 2024 11:48
self.inner.abort_multipart(location, multipart_id).await
}

async fn get(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(get is implemented in terms of get_opts nowadays, so just doing get_opts is sufficient)

) -> BoxStream<'_, object_store::Result<ObjectMeta>> {
self.inner.list(prefix)
let start = Instant::now();
let result = self.inner.list(prefix);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. the p50 latency here is 3.31 microseconds, surely it's not connecting to Minio this quickly (although who knows)

"Re-downloading cache value for {key:?}: {err}"
);

self.metrics.redownload_errors.increment(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"redownload_errors" doesn't seem appropriate here, more like "cache_file_missing", "file_read_error" or "eviction_race".

Comment on lines 573 to 577
warn!("Invalidating cache entry for {key:?}; failed writing to a file: {err}");
cache.invalidate(&key).await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also use a counter for these.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By these I mean counter for cache file write errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, good idea, I missed it somehow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@mildbyte mildbyte force-pushed the feature/memory-manager-observability branch from 213dbae to 5445458 Compare April 24, 2024 05:04
@mildbyte mildbyte force-pushed the feature/object-store-creation-reuse branch from 20264b4 to aa8d012 Compare April 24, 2024 05:25
Base automatically changed from feature/memory-manager-observability to main April 24, 2024 05:26
@mildbyte mildbyte force-pushed the feature/object-store-creation-reuse branch from aa8d012 to 28dd821 Compare April 24, 2024 05:26
@mildbyte mildbyte merged commit fa095cf into main Apr 24, 2024
@mildbyte mildbyte deleted the feature/object-store-creation-reuse branch April 24, 2024 06:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants