Description
Enhancement
As metadata scales up, listing all entries can require a large number of requests and connections, especially on HTTP/1.1 storage backends (such as some MinIO setups). HTTP/1.1 cannot multiplex requests over a single connection, so every concurrent request needs its own TCP connection, and when connections are not reused, each request opens a new one. This leads to heavy connection churn, increased load on clients and servers, and a higher risk of hitting connection limits, slowdowns, or throttling.
While listing can become slow under these conditions, the primary performance bottleneck often shifts to downloading all metadata files locally after listing.
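To make the listing-versus-download imbalance concrete, the sketch below counts requests for both phases. It assumes an S3-compatible `ListObjectsV2`-style listing that returns at most 1,000 keys per page, and that each metadata file is a separate object fetched with one GET. The function names are hypothetical and for illustration only.

```python
import math

# Assumption: S3-compatible listing returns at most 1,000 keys per page.
MAX_KEYS_PER_PAGE = 1000


def list_requests(num_objects: int, page_size: int = MAX_KEYS_PER_PAGE) -> int:
    """Paginated LIST calls needed to enumerate num_objects keys (always at least one call)."""
    return max(1, math.ceil(num_objects / page_size))


def download_requests(num_objects: int) -> int:
    """Hypothetical model: one GET per metadata file, each file stored as its own object."""
    return num_objects


if __name__ == "__main__":
    for n in (10_000, 1_000_000):
        print(f"{n:>9} files: {list_requests(n):>5} LIST calls, {download_requests(n):>9} GET calls")
```

With 1,000,000 metadata files this works out to 1,000 LIST calls but 1,000,000 GETs, so even a slow listing phase issues far fewer requests than the download phase that follows it. If connections are not reused, every one of those requests also pays for a new TCP (and possibly TLS) handshake.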
Current Challenges:

- Listing Speed
  - When the metadata volume is very large, listing all metadata (e.g., with S3 or MinIO) becomes increasingly slow, especially due to connection overhead and connection limits with HTTP/1.1.
  - However, listing is only part of the overall process.
- Download Bottleneck
  - After listing, all metadata files must be downloaded locally before further processing.
  - As metadata accumulates over time, the amount of data to download grows, and the download phase becomes the main bottleneck in both time and resource consumption.
  - Large numbers of small metadata files can result in significant network and disk overhead, since each file incurs its own request and its own local file write.
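A crude back-of-envelope model helps show why many small files hurt more than total bytes suggest. The sketch below assumes each connection handles requests one after another, that HTTP/1.1 limits in-flight requests to the number of open connections, and that every request pays a fixed latency. It simply adds the latency-bound and transfer-bound terms (ignoring any overlap), so it is a rough, pessimistic estimate. The `Backend` fields and all numbers are hypothetical inputs, not measurements.

```python
import math
from dataclasses import dataclass


@dataclass
class Backend:
    # Hypothetical parameters for an estimate, not measured values.
    request_latency_s: float      # fixed per-request cost (RTT, server work, handshake if not reused)
    max_connections: int          # on HTTP/1.1, each in-flight request needs its own connection
    bandwidth_bytes_per_s: float  # aggregate transfer bandwidth


def estimate_download_seconds(num_files: int, avg_file_bytes: int, b: Backend) -> float:
    """Latency term (requests spread evenly over connections) plus transfer term, summed without overlap."""
    requests_per_connection = math.ceil(num_files / b.max_connections)
    latency_time = requests_per_connection * b.request_latency_s
    transfer_time = num_files * avg_file_bytes / b.bandwidth_bytes_per_s
    return latency_time + transfer_time


if __name__ == "__main__":
    backend = Backend(request_latency_s=0.02, max_connections=64, bandwidth_bytes_per_s=100e6)
    print(f"{estimate_download_seconds(1_000_000, 4096, backend):.1f} s")
```

With these illustrative inputs (1,000,000 files of 4 KiB, 20 ms per request, 64 connections, 100 MB/s), the latency term is about 312 s while the transfer term is only about 41 s, which matches the observation that per-file overhead, not raw data volume, tends to dominate.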