Skip to content

os/bluestore: fix no metadata update on truncate+fsync #45883

Merged
yuriw merged 2 commits intoceph:mainfrom
ifed01:wip-ifed-fix-bluefs-truncate
Sep 6, 2022
Merged

os/bluestore: fix no metadata update on truncate+fsync #45883
yuriw merged 2 commits intoceph:mainfrom
ifed01:wip-ifed-fix-bluefs-truncate

Conversation

@ifed01
Copy link
Contributor

@ifed01 ifed01 commented Apr 12, 2022

Fixes: https://tracker.ceph.com/issues/55307

Although met this during new WAL development only IMO this is potentially dangerous for NCB stuff as well.

Signed-off-by: Igor Fedotov igor.fedotov@croit.io

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@benhanokh
Copy link
Contributor

Igor,
I don't understand the role of the is_dirty flag.
It is only set from BlueFS->flush_xxx() and then cleared by BlueFs->fsync() after committing data to the disk.

It is not being set on BlueFS->preallocate() BlueFS->open_for_write() File->append() or any other operation.

Is the semantics that a file become dirty only after flush()?
If so, maybe we should require a flush operation after truncate() ?

@ifed01
Copy link
Contributor Author

ifed01 commented Apr 15, 2022

jenkins test make check

@ifed01 ifed01 force-pushed the wip-ifed-fix-bluefs-truncate branch from f528825 to e3777ba Compare April 15, 2022 12:30
@ifed01
Copy link
Contributor Author

ifed01 commented Apr 15, 2022

Igor, I don't understand the role of the is_dirty flag. It is only set from BlueFS->flush_xxx() and then cleared by BlueFs->fsync() after committing data to the disk.

It is not being set on BlueFS->preallocate() BlueFS->open_for_write() File->append() or any other operation.

Is the semantics that a file become dirty only after flush()? If so, maybe we should require a flush operation after truncate() ?

IIUC the original meaning of the flag was to note a file that has got some content flushed to disk but not yet synced (i.e. metadata isn't in the log yet).
In that respect truncate op looks like a kind of "negative writing/appending" - we just "updated" some file content and want to sync metadata with that change.. Which didn't happen before the patch as fsync relies on file's is_dirty flag for that.
Perhaps we might want to check if other pure metadata-related operations (e.g. delete or touch) are handled properly in this respect - whether they get metadata synchronization on fsync call if no data has been updated as well.
But that's rather out of scope of this commit.

@ifed01
Copy link
Contributor Author

ifed01 commented Apr 16, 2022

jenkins test make check

@djgalloway djgalloway changed the base branch from master to main May 25, 2022 20:00
@ifed01
Copy link
Contributor Author

ifed01 commented Jul 28, 2022

@aclamk - ping

h->file->fnode.size = offset;
h->file->is_dirty = true;
vselector->add_usage(h->file->vselector_hint, h->file->fnode.size);
log.t.op_file_update_inc(h->file->fnode);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It unfortunate we have now both direct update of log.t and signaling is_dirty.
How about:

{
  std::lock_guard ll(log.lock);
  vselector->sub_usage(h->file->vselector_hint, h->file->fnode.size);
  h->file->fnode.size = offset;
  vselector->add_usage(h->file->vselector_hint, h->file->fnode.size);
  log.t.op_file_update_inc(h->file->fnode);
  }
  _flush_and_sync_log_LD(0);

This way we would be in-sync right after truncate(), without need to fsync().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget it.
I just realized, that sequence truncate(h); fsync(h); should give us proper results even if we ship sync_metadata() or umount().

@ifed01
Copy link
Contributor Author

ifed01 commented Aug 2, 2022

jenkins test make check arm64

@ifed01 ifed01 added the needs-qa label Aug 2, 2022
@aclamk
Copy link
Contributor

aclamk commented Aug 3, 2022

@ifed I checked that the same problem occurs in pacific. Do we want to backport fix to octopus/pacifc?

@ifed01
Copy link
Contributor Author

ifed01 commented Aug 3, 2022

@ifed I checked that the same problem occurs in pacific. Do we want to backport fix to octopus/pacifc?

Apparently "no" for Octopus and "may be" for Pacific. Honestly I haven't seen that issue under regular usage - I only faced it when developing that new BlueWAL functionality...

@ifed01
Copy link
Contributor Author

ifed01 commented Aug 3, 2022

jenkins test signing

@github-actions
Copy link

github-actions bot commented Aug 3, 2022

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

ifed01 added 2 commits August 12, 2022 13:59
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
Fixes: https://tracker.ceph.com/issues/55307
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
@ifed01 ifed01 force-pushed the wip-ifed-fix-bluefs-truncate branch from e3777ba to 73a7e92 Compare August 14, 2022 07:46
@ifed01 ifed01 requested a review from a team as a code owner August 14, 2022 07:46
@ifed01
Copy link
Contributor Author

ifed01 commented Aug 14, 2022

jenkins test make check

1 similar comment
@ifed01
Copy link
Contributor Author

ifed01 commented Aug 15, 2022

jenkins test make check

@ifed01
Copy link
Contributor Author

ifed01 commented Aug 23, 2022

jenkins test make check

@sseshasa
Copy link
Contributor

sseshasa commented Sep 5, 2022

Teuthology Test Result
http://pulpito.front.sepia.ceph.com/?branch=wip-yuri5-testing-2022-08-18-0812
Summary: 294/319 Passed, 22 Failed, 3 dead jobs

Unrelated Failures
The following are the unrelated failures for which trackers already exist:

  1. https://tracker.ceph.com/issues/57368 - The CustomResourceDefinition "installations.operator.tigera.io" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
  2. https://tracker.ceph.com/issues/57122 - test failure: rados:singleton-nomsgr librados_hello_world - Ceph - RADOS
  3. https://tracker.ceph.com/issues/55853 - cls_rgw test failures
  4. https://tracker.ceph.com/issues/57165 - expected valgrind issues and found none
  5. https://tracker.ceph.com/issues/57270 - stderr E: The repository 'https://download.ceph.com/debian-octopus jammy Release' does not have a Release file
  6. https://tracker.ceph.com/issues/52321 - qa/tasks/rook times out: 'check osd count' reached maximum tries (90) after waiting for 900 seconds - Ceph - Orchestrator
  7. https://tracker.ceph.com/issues/56850 - src/mon/PaxosService.cc: 193: FAILED ceph_assert(have_pending)

@yuriw
Copy link
Contributor

yuriw commented Sep 5, 2022

jenkins test this please

@yuriw yuriw merged commit 747d1d3 into ceph:main Sep 6, 2022
@ifed01 ifed01 deleted the wip-ifed-fix-bluefs-truncate branch September 7, 2022 11:52
ifed01 added a commit to ifed01/ceph that referenced this pull request Jun 27, 2023
This effectively enables having 4K allocation units for BlueFS.
But it doesn't turn it on by default for the sake of performance.
Using main device which lacks enough free large continuous extents
might do the trick though.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 001b08d)

 Conflicts:
	src/os/bluestore/BlueFS.cc
(trivial, no ceph#39871)
	src/os/bluestore/BlueStore.cc
(trivial, no commits for zoned support)
	src/test/objectstore/test_bluefs.cc
(trivial, no ceph#45883)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants