Skip to content

Add stats for time spent fetching data while searching snapshots#51866

Merged
DaveCTurner merged 12 commits intoelastic:feature/searchable-snapshotsfrom
DaveCTurner:2020-02-04-searchable-snapshots-track-time-spent-fetching-from-blob-store
Feb 24, 2020
Merged

Add stats for time spent fetching data while searching snapshots#51866
DaveCTurner merged 12 commits intoelastic:feature/searchable-snapshotsfrom
DaveCTurner:2020-02-04-searchable-snapshots-track-time-spent-fetching-from-blob-store

Conversation

@DaveCTurner
Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner commented Feb 4, 2020

This commit builds on #51637, adding tracking of the total time spent fetching
data from the blob store.

Relates #50999.

This commit builds on elastic#51637, adding tracking of the total time spent fetching
data from the blob store and a linear regression model for these fetches.
@DaveCTurner DaveCTurner added WIP :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Feb 4, 2020
@DaveCTurner DaveCTurner requested review from tlrx and ywelsch February 4, 2020 14:19
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@DaveCTurner
Copy link
Copy Markdown
Member Author

I'm not sure the linear regression is actually useful here. Since we're tracking stats on a file-by-file basis, essentially every fetch will be the same size, which stops us from building a meaningful model of the fetch time as a function of size. Maybe we just want to track the total took time? Possibly min and max too?

@tlrx
Copy link
Copy Markdown
Member

tlrx commented Feb 6, 2020

I tend to agree with you, unless we implement different range sizes per files (which we could do easily) I'm not sure the linear regression is very useful. Maybe we could reuse the existing counters and track the total/min/max took times as you suggested.

@DaveCTurner DaveCTurner removed the WIP label Feb 24, 2020
Copy link
Copy Markdown
Member Author

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated following discussions on other channels; this is ready for a proper review now.

public Map<String, DirectoryFactory> getDirectoryFactories() {
return Map.of(SearchableSnapshotRepository.SNAPSHOT_DIRECTORY_FACTORY_KEY,
SearchableSnapshotRepository.newDirectoryFactory(repositoriesService::get, cacheService::get));
SearchableSnapshotRepository.newDirectoryFactory(repositoriesService::get, cacheService::get, System::nanoTime));
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using System::nanoTime since we need finer resolution than ThreadPool::relativeTimeInNanos offers.

@DaveCTurner DaveCTurner changed the title Add stats for fetch speed as a function of size Add stats for time spent fetching data while searching snapshots Feb 24, 2020
@DaveCTurner DaveCTurner requested a review from tlrx February 24, 2020 13:52
Copy link
Copy Markdown
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks David!

@DaveCTurner DaveCTurner merged commit c9ac57f into elastic:feature/searchable-snapshots Feb 24, 2020
@DaveCTurner DaveCTurner deleted the 2020-02-04-searchable-snapshots-track-time-spent-fetching-from-blob-store branch February 24, 2020 15:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants