
news/blog: Pawsey scale testing #408

Merged
Thingee merged 1 commit into ceph:main from pcuzner:blog-scaletesting-with-pawsey
Jul 18, 2022


Contributor

@pcuzner pcuzner commented Jul 5, 2022

Blog describing the testing done with Pawsey during the Quincy development cycle.

@pcuzner pcuzner force-pushed the blog-scaletesting-with-pawsey branch from 6a5f01a to 788c3ef Compare July 5, 2022 22:38
@pcuzner pcuzner assigned neha-ojha and ljflores and unassigned neha-ojha and ljflores Jul 5, 2022
@pcuzner pcuzner requested review from ljflores and neha-ojha July 5, 2022 22:41
@pcuzner pcuzner added the ceph.io label Jul 5, 2022
@pcuzner pcuzner marked this pull request as ready for review July 5, 2022 22:41
Member

@epuertat epuertat left a comment

LGTM! Thanks @pcuzner!


@adk3798 adk3798 left a comment

looks good


- `ceph-volume` was enhanced to better handle unexpected disk formats like Atari partitions and GPT based devices
- `cephadm` uses ssh as the control path to each cluster host. During our negative testing, we found that when ssh issues were encountered (bad/missing keys), the resulting information returned to the user was overly verbose. This was fixed in [PR 43880](https://github.com/ceph/ceph/pull/43880).
- Cephadm’s experimental 'agent' feature was used to provide the orchestrator with host, device and daemon state metadata. Testing showed that deployment and data quality were as expected, but the agent generated too much log traffic, making potential troubleshooting problematic. The intent is for the Agent to become a supported feature in the Ceph Reef release.

Not sure if you want to include it, but we actually got rid of some of the worst of this with ceph/ceph#44017.

@pcuzner pcuzner force-pushed the blog-scaletesting-with-pawsey branch from 788c3ef to 69a6606 Compare July 6, 2022 23:07
@pcuzner pcuzner added the do not merge label Jul 6, 2022
@pcuzner pcuzner force-pushed the blog-scaletesting-with-pawsey branch from 69a6606 to 91ac975 Compare July 6, 2022 23:15
@pcuzner pcuzner removed the do not merge label Jul 6, 2022
@neha-ojha neha-ojha requested a review from Thingee July 6, 2022 23:33
Contributor

@anthonyeleven anthonyeleven left a comment

Overall really terrific. I've made my usual slate of nitpicky requests.


Pawsey provided access to over **200** servers, all connected via 100Gb networking. The server specifications were optimized for a large Object storage use case, and enabled us to create the following Ceph cluster;

- 180 OSD Hosts
Contributor

Above we say >200 hosts; are 20+ of them non-OSD?

Contributor Author

Yes, they were; that's why I said it.

- Plan for approximately 512GB of storage, based on the default 15 day retention cycle.

Getting a deeper understanding of the effects of scale on the metrics 'path' was a key finding, and formed the basis for further analysis work that will be covered in a subsequent blog.

Contributor

s/Getting a/A/
s/work//
s/blog/blog post/

Contributor Author

Applied suggestions 2 and 3.

Member

@ljflores ljflores left a comment

We could use a "quincy@scale" tag to bind all of these blog posts together. Otherwise, looks great to me!

@neha-ojha
Member

> Overall really terrific. I've made my usual slate of nitpicky requests.

@anthonyeleven do you think these can be addressed in a later PR? This PR was intended to be merged yesterday https://github.com/ceph/ceph.io/pull/408/files#diff-04d54a1f366be0c6924c573a3aa35605b0ebe2ac9ed1764ed0f6d1b27b8b6e5eR3

Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
@pcuzner pcuzner force-pushed the blog-scaletesting-with-pawsey branch from 91ac975 to 279508c Compare July 13, 2022 04:20
@pcuzner pcuzner requested a review from anthonyeleven July 13, 2022 04:21
anthonyeleven previously approved these changes Jul 13, 2022
@anthonyeleven
Contributor

Apologies. Reviewed as soon as I got the notification.

@Thingee Thingee merged commit eaaa020 into ceph:main Jul 18, 2022