Add a way to show the contents of the ListFilesCache in datafusion-cli #19055

@alamb

Is your feature request related to a problem or challenge?

As we roll out the ListFilesCache from @BlakeOrth in #18855, it would be very helpful to be able to see its contents so we can debug any issues that come up.

@nuno-faria made a really nice feature to view the contents of the file metadata cache: metadata_cache()

For example:

> select * from metadata_cache();
+---------------------------------------------------+---------------------+-----------------+--------------------------------------+---------+---------------------+------+------------------+
| path                                              | file_modified       | file_size_bytes | e_tag                                | version | metadata_size_bytes | hits | extra            |
+---------------------------------------------------+---------------------+-----------------+--------------------------------------+---------+---------------------+------+------------------+
| hits_compatible/athena_partitioned/hits_1.parquet | 2022-07-03T15:33:57 | 174965044       | "1f5da68e097309811a675c849491ac48-9" | NULL    | 165128              | 0    | page_index=false |
+---------------------------------------------------+---------------------+-----------------+--------------------------------------+---------+---------------------+------+------------------+
1 row(s) fetched.
Elapsed 0.005 seconds.

Describe the solution you'd like

I would like a table function similar to metadata_cache() for the list files cache. Since each entry is a Vec<ObjectMeta>, one option would be to flatten the entries so there is one row per ObjectMeta stored.

Something like:

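select * from list_files_cache();
+----------+---------------------+-----------------+-------+---------+---------------------+---------------------+
| path     | file_modified       | file_size_bytes | e_tag | version | metadata_size_bytes | expires             |
+----------+---------------------+-----------------+-------+---------+---------------------+---------------------+
| /foo/bar | 2022-07-03T15:33:57 | 1234            | ...   | ...     | 132                 | NULL                |
| /foo/baz | 2022-07-03T15:33:57 | 5678            | ...   | ...     | 3112                | 2026-07-03T15:33:57 |
| ...      | ...                 | ...             | ...   | ...     | ...                 | ...                 |
+----------+---------------------+-----------------+-------+---------+---------------------+---------------------+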

Here, metadata_size_bytes shows the size of the statistics in bytes, and expires shows when the entry expires.

This would mean that a single ListFilesEntry object is displayed as multiple rows.

It would also mean we would have to find some way to represent a ListFilesEntry that has no entries (i.e. metas is an empty Vec). Perhaps it could be shown as a single row that is NULL in every column except expires:

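+------+---------------+-----------------+-------+---------+---------------------+---------------------+
| path | file_modified | file_size_bytes | e_tag | version | metadata_size_bytes | expires             |
+------+---------------+-----------------+-------+---------+---------------------+---------------------+
| NULL | NULL          | NULL            | NULL  | NULL    | NULL                | 2026-07-03T15:33:57 |
+------+---------------+-----------------+-------+---------+---------------------+---------------------+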
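
To make the flattening concrete, here is a rough sketch of how cache entries could be turned into rows, including the empty-entry case. It is only an illustration under assumptions: the ObjectMeta and ListFilesEntry structs below are simplified, hypothetical stand-ins (the real ObjectMeta lives in the object_store crate and the real entry type may look different), timestamps are plain strings, and metadata_size_bytes is left out because this issue does not say how it would be computed.

```rust
// Hypothetical, simplified stand-in for object_store's ObjectMeta.
// Timestamps are plain strings here to keep the sketch dependency-free.
#[derive(Debug, Clone, PartialEq)]
struct ObjectMeta {
    location: String,
    last_modified: String,
    size: u64,
    e_tag: Option<String>,
    version: Option<String>,
}

// Hypothetical cache entry: the listed files plus an optional expiry time.
#[derive(Debug, Clone)]
struct ListFilesEntry {
    metas: Vec<ObjectMeta>,
    expires: Option<String>,
}

// One output row of the proposed table function. Every column is nullable
// so that an empty entry can still be represented.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    path: Option<String>,
    file_modified: Option<String>,
    file_size_bytes: Option<u64>,
    e_tag: Option<String>,
    version: Option<String>,
    expires: Option<String>,
}

// Flatten cache entries into rows: one row per ObjectMeta, or a single
// mostly-NULL row (only `expires` set) for an entry with no files.
fn flatten(entries: &[ListFilesEntry]) -> Vec<Row> {
    let mut rows = Vec::new();
    for entry in entries {
        if entry.metas.is_empty() {
            rows.push(Row {
                path: None,
                file_modified: None,
                file_size_bytes: None,
                e_tag: None,
                version: None,
                expires: entry.expires.clone(),
            });
            continue;
        }
        for meta in &entry.metas {
            rows.push(Row {
                path: Some(meta.location.clone()),
                file_modified: Some(meta.last_modified.clone()),
                file_size_bytes: Some(meta.size),
                e_tag: meta.e_tag.clone(),
                version: meta.version.clone(),
                // Every row of the same entry repeats the entry's expiry.
                expires: entry.expires.clone(),
            });
        }
    }
    rows
}
```

One consequence of this layout is that rows from the same entry cannot be told apart from rows of a different entry with the same expiry, so an actual implementation might also want a column identifying the cached listing each row belongs to.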

Describe alternatives you've considered

No response

Additional context

No response

Labels: enhancement
