This repository was archived by the owner on Mar 9, 2019. It is now read-only.

ext3/ext4 is not fully POSIX, to be safe there, need to fsync after file size changes #284

@tv42

@cespare pointed out this conversation to me: http://www.openldap.org/lists/openldap-devel/201411/msg00000.html (I've seen the talk before, but didn't pay enough attention).

Reading http://linux.die.net/man/2/fdatasync says (my emphasis) "fdatasync() [...] does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. [...] On the other hand, a change to the file size (st_size, as made by say ftruncate(2)), would require a metadata flush."

Quoting the paper (my emphasis): "On ext3 with the default “ordered” journaling mode, the file data is forced directly out to the main file system prior to its metadata being committed to the journal. This is why we observe the journaling of the length update (op#399, 400, and 402) after the file data updates(op#342–398)." "[...] means fdatasync on ext3 does not wait for the completion of journaling (similar behavior has been observed on ext4)."

Currently, bolt seems to rely on individual page writes increasing the st_size of the file; there's no explicit file size change at the point where it actually notices it needs more space:

bolt/db.go, line 554 in 15a58b0:

    // Resize mmap() if we're at the end.

So, my takeaway from this is that ext3, and likely ext4, have a bug / "design tradeoff". If bolt wants to be pragmatic, it should probably work around that bug.

I think it would be enough to do something like this:

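    if minsz >= db.datasz {
        // Grow the file explicitly instead of relying on page writes to extend it.
        if err := db.file.Truncate(minsz); err != nil {
            return nil, fmt.Errorf("file resize error: %s", err)
        }
        // Full fsync (not fdatasync) so the new file size is flushed too.
        if err := db.file.Sync(); err != nil {
            return nil, fmt.Errorf("file sync error: %s", err)
        }
        // Remap only after the larger size has been synced.
        if err := db.mmap(minsz); err != nil {
            return nil, fmt.Errorf("mmap allocate error: %s", err)
        }
    }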

Doing the resize once, and not page-by-page when writing out dirty pages, might even be more efficient.
