Skip to content

Shared Cache Locking#9258

Merged
andrewrk merged 6 commits intomasterfrom
shared-cache-locking
Jun 29, 2021
Merged

Shared Cache Locking#9258
andrewrk merged 6 commits intomasterfrom
shared-cache-locking

Conversation

@andrewrk
Copy link
Copy Markdown
Member

@andrewrk andrewrk commented Jun 28, 2021

The cache system now drops from exclusive locks to shared locks after the artifact is built. Additionally, when checking the cache, if an exclusive lock cannot be obtained, a shared lock will be obtained instead. Together, this solves a design flaw in the caching system that was causing deadlocks, and allows multiple processes to share the same read-only build artifacts, while still allowing a cache garbage collection utility to avoid deleting files which are being actively used.

This required new std lib functionality: std.fs.File.setLock. This PR cannot be merged until the Windows implementation is completed. This may require switching strategies to using NtLockFile and NtUnlockFile rather than relying on ShareAccess flags and manual polling.

I was able to reproduce the deadlock problems reported in #9139 and observed this change fixed the problem. It also allowed me to delete another workaround to a deadlock when supplying the same source file multiple times.

Closes #7596
Fixes #9139
Fixes #9187

@andrewrk andrewrk added the breaking Implementing this issue could cause existing code to no longer compile or have different behavior. label Jun 28, 2021
@andrewrk andrewrk force-pushed the shared-cache-locking branch from 271eac8 to ba9b554 Compare June 28, 2021 23:53
@andrewrk
Copy link
Copy Markdown
Member Author

This is ready to go as soon as it passes tests. Code review welcome, especially from those knowledgeable about Windows File System APIs.

@andrewrk andrewrk added standard library This issue involves writing Zig code for the standard library. frontend Tokenization, parsing, AstGen, Sema, and Liveness. labels Jun 29, 2021
.Shared => false,
.Exclusive => true,
};
try w.LockFile(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're losing the operation atomicity by acquiring the lock after opening the file.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is consistent with Linux; addressed this concern by updating doc comments in e32530b.

};
var io: w.IO_STATUS_BLOCK = undefined;
const range_off: w.LARGE_INTEGER = 0;
const range_len: w.LARGE_INTEGER = 1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd lock the whole maxInt(LARGE_INTEGER) range instead of a single byte only.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review.

Based on the docs of NtUnlockFile:

The ZwUnlockFile routine takes a range of bytes as specified by the ByteOffset and Length arguments. This range must be identical to a range of bytes in the file that was previously locked with a single call to the ZwUnlockFile routine. It is not possible to unlock two previously locked adjacent ranges with a single call to ZwUnlockFile. It is also not possible to unlock part of a range that was previously locked with a single call to the ZwUnlockFile routine.

As far as I can tell, these act like keys in a table, rather than actual spans of bytes. So the only consideration here is to pick a range that is self-consistent as well as sensible for third party code to coordinate with if it so chooses. Given that this PR does not add documentation that guarantees the implementation on Windows, I don't think it matters what range we pick.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, these act like keys in a table, rather than actual spans of bytes.

The kernel lets you lock lock/unlock specific (and disjoint) sets of bytes, not unlike the Posix locking facilities, so the suggestion was to lock the whole span [0..maxInt(w.LARGE_INTEGER)] to ensure the whole file is locked.

Picking a consistent offset/length pair to lock/unlock is fine anyway fine for the purpose of ensuring the locking is effective when using the Zig stdlib.

return .{ .context = file };
}

const range_off: windows.LARGE_INTEGER = 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If those two fields are still needed you can slim down the structure by making them void on platforms that are not windows.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand this suggestion - did you mistake these global variables for fields?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, taking them by pointer is what confused me.

null,
null,
&io_status_block,
&range_off,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this missing a file.? As suggested above why not lock/unlock the whole maxInt range?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see the confusion now. The range is arbitrary and hard-coded; no need to store any data in fields. If you know something I don't know about why maxint would be any different than 1, please share.

andrewrk added 6 commits June 29, 2021 14:25
Windows implementation is still missing.
This logic was a workaround to prevent cache deadlocks which happened
from always using exclusive file locks. Now that the Cache system
supports sharing cached artifacts, this workaround is no longer needed.

Closes #7596
This modifies the lock semantics from using AccessMode to using
NtLockFile/NtUnlockFile.

This is a breaking change.
Update to accomodate the differences in Windows, which is now advisory
file locking, and include details about which operating systems have
atomic locking flags.
@andrewrk andrewrk force-pushed the shared-cache-locking branch from 8220816 to e32530b Compare June 29, 2021 21:27
@andrewrk andrewrk merged commit 37fbf5b into master Jun 29, 2021
@andrewrk andrewrk deleted the shared-cache-locking branch June 29, 2021 21:30
zeramorphic added a commit to quill-lang/quill-v1 that referenced this pull request Jul 1, 2021
Regression fixed by ziglang/zig#9258

Signed-off-by: thirdsgames <thirdsgames2018@gmail.com>
andrewrk added a commit that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking Implementing this issue could cause existing code to no longer compile or have different behavior. frontend Tokenization, parsing, AstGen, Sema, and Liveness. standard library This issue involves writing Zig code for the standard library.

Projects

None yet

2 participants