
Experiments with distributed compilation #249

Merged: luser merged 53 commits into mozilla:master from aidanhs:aphs-dist-sccache-1 on Aug 14, 2018
Conversation

@aidanhs (Contributor) commented on May 17, 2018:

An initial version that works and can actually be used to at least try it out.

@aidanhs force-pushed the aphs-dist-sccache-1 branch from 8b0bb57 to 963947c on June 7, 2018
@aidanhs force-pushed the aphs-dist-sccache-1 branch 2 times, most recently from 22355ca to ef230a7, on July 30, 2018
@aidanhs force-pushed the aphs-dist-sccache-1 branch from 605f57f to 997c6e8 on August 1, 2018
Cargo.toml (outdated):
serde_derive = "1.0"
serde_json = "1.0"
# TODO: publish to crates.io
sccache-dist = { path = "sccache-dist" }
@aidanhs (author):

I'm not sure how you'd prefer to deal with this: I can publish and update this PR, or you can do it the next time you release.

@luser (reviewer):

Is there a specific reason you're adding this code as a separate crate? It's not really usable standalone, right?

@aidanhs (author):

The intention is to split things out into two separate binaries - the sccache-dist binary has a number of things (e.g. rouille) that sccache itself doesn't need to have compiled in.

It also makes the compile a bit more incremental. Are you thinking I could just have it be a separate binary within the same crate?

@luser (reviewer):

Yeah, a separate binary within the same crate is what I had envisioned. I get the compile argument (although incremental compilation should help with that nowadays), but managing crates.io releases for two intertwined crates like that is a PITA. (I've dealt with this with another project of mine.)

// With -fprofile-generate line number information is important, so don't use -P.
if !parsed_args.profile_generate {
cmd.arg("-P");
}
@aidanhs (author):

With distributed compilation, line number information is always important. More generally, it's important if you want to use a preprocessed file as input.

@luser (reviewer):

Hm. This will regress the change in #208. Would it be possible to change things so that when distributed compilation is not enabled we can continue to use this?

@aidanhs (author) commented on Aug 2, 2018:

Oh, this is disappointing; I should've done a bit more digging.

I'm not sure whether it's better to say "distributed compilation may happen, so your cache is completely invalid (because it's based on source files without line-number annotations)", or to just preprocess a second time on demand when sending a job for remote compilation. Looking at a reasonably large C++ file I have to hand, 10-20% of wall-clock time is spent in preprocessing, so running it twice isn't ideal.

What about this: if we might be distributing, do the full preprocess with line numbers, but strip the line numbers out with a regexp when computing the cache key?

@luser (reviewer):

That sounds fine. Since users have to explicitly opt-in to using distributed compilation, I don't think it's unreasonable for them to get this behavior in that case. Running the preprocessor twice seems like too much overhead just for this purpose. We can make a note somewhere if necessary that cache hits using different source paths are not possible with distributed compilation enabled.
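A minimal sketch of the regexp-stripping idea agreed above (hypothetical; the function name and exact matching are assumptions, not the code that landed in sccache): drop cpp linemarkers from the preprocessed output before hashing, so the cache key is independent of source paths and line numbers, much as `-P` output would have been.

```rust
// Hypothetical sketch: filter out cpp linemarkers (lines such as
// `# 42 "foo.c" 2`) from preprocessor output before it is hashed
// for the cache key.
fn strip_linemarkers(preprocessed: &str) -> String {
    preprocessed
        .lines()
        .filter(|line| {
            let t = line.trim_start();
            // GCC linemarkers are `# ` followed by a line number; real
            // directives like `#define` or `#pragma` are left untouched.
            !(t.starts_with("# ")
                && t[2..].trim_start().starts_with(|c: char| c.is_ascii_digit()))
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

The stripped text would only be used for key computation; the annotated text is what gets sent to the remote compiler.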

fn box_clone(&self) -> Box<CompilerHasher<T>>;
}

#[cfg(not(feature = "dist"))]
@aidanhs (author):

This feature is disabled by default, so distributed compilation is too.

@aidanhs (Contributor) commented on Aug 1, 2018:

@luser this is ready for review - it adds the basic structure of two binaries for distributed compilation, the way they communicate, and initial support for C, C++ and Linux. Although this unconditionally shuffles some things around (e.g. generating compilation commands separately from executing them), distributed compilation itself is hidden behind a feature flag.
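The feature-flag fallback described here can be pictured roughly as follows (a hypothetical sketch with assumed names, not sccache's actual trait): when the feature is off, a no-op client implements the same interface as a real distributed client, so callers simply fall back to local compilation.

```rust
// Hypothetical sketch of the feature-gated fallback pattern: the no-op
// client satisfies the same interface but never accepts work.
trait DistClient {
    fn submit(&self, job: &str) -> Result<String, String>;
}

struct NoopClient;

impl DistClient for NoopClient {
    fn submit(&self, _job: &str) -> Result<String, String> {
        // Always report failure so the caller compiles locally instead.
        Err("distributed compilation disabled".to_string())
    }
}
```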

Followups coming soon.


use errors::*;

// TODO: possibly shouldn't be public
@luser (reviewer):

I'm fine with TODO comments, but I'd prefer if they either get removed or if you file follow-up issues and put the issue URL in the comment.

@aidanhs (author):

I think all of the TODOs will be gone by the final PR.

Ok(hex(m.finish().as_ref()))
}

fn hex(bytes: &[u8]) -> String {
@luser (reviewer):

This feels like something we could pull in from a crate, like the hex crate or similar.

@aidanhs (author) commented on Aug 2, 2018:

I just stole it from src/util.rs, I can pull in the crate if you like.
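For reference, the helper in question is essentially a two-lowercase-hex-digits-per-byte encode; a minimal equivalent is below (the `hex` crate's `encode` produces the same output):

```rust
// Minimal equivalent of the hex helper: two lowercase hex digits per byte.
fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```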

@luser (reviewer):

Oh, hah! If you move this code into the existing crate then you can just reuse that code, I guess. I'm certainly not going to hold you to a higher standard than the existing code. :)

.truncate(true)
.open("/tmp/toolchain_cache.tar")?;
create(file)?;
// TODO: after, if still exists, remove it
@luser (reviewer):

This sounds like something that should definitely be fixed.

@aidanhs (author):

Refined caching and scheduling are a slightly later PR.

return Ok(strong_key)
}
debug!("Weak key {} appears to be new", weak_key);
// TODO: don't use this as a poor exclusive lock on this global file
@luser (reviewer):

Does this need to be an exclusive lock? You've already got a mutex around the LRUDiskCache here. If not, using a tempfile would be much nicer.

@aidanhs (author):

Added a tempfile (for now)
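A std-only sketch of why a temporary file beats a fixed global path like `/tmp/toolchain_cache.tar` (hypothetical illustration; the PR's actual mechanism may differ): write to a uniquely named file and atomically rename it into place, so concurrent readers never observe a half-written archive.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical sketch: write the archive to a uniquely named temporary
// file, then atomically rename it into place, instead of treating a
// fixed path as a poor exclusive lock.
fn write_atomically(dest: &Path, contents: &[u8]) -> io::Result<()> {
    let tmp = dest.with_extension(format!("tmp.{}", std::process::id()));
    fs::write(&tmp, contents)?;
    // rename is atomic within one filesystem, so readers of `dest`
    // never see a partially written file.
    fs::rename(&tmp, dest)
}
```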

pub struct ClientToolchainCache {
cache_dir: PathBuf,
cache: Mutex<TcCache>,
// Local machine mapping from 'weak' hashes to strong toolchain hashes
@luser (reviewer):

Some exposition on what weak and strong hashes are would be helpful. I gather that the strong hash is a SHA512 of the toolchain tar file.

@aidanhs (author):

Added the explanation that 'weak' is just what sccache uses as part of the cache key.
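A toy illustration of the relationship (all names hypothetical): the 'weak' hash is the cheap identifier sccache already uses as part of its cache key, and the map remembers which 'strong' content hash (e.g. a SHA-512 of the toolchain tar) it resolved to, so the toolchain isn't repackaged on every request.

```rust
use std::collections::HashMap;

// Hypothetical illustration of the weak -> strong mapping: `record` is
// called after packaging a toolchain and hashing the archive; `lookup`
// lets later requests skip repackaging.
struct WeakToStrongMap {
    map: HashMap<String, String>,
}

impl WeakToStrongMap {
    fn new() -> Self {
        WeakToStrongMap { map: HashMap::new() }
    }
    fn record(&mut self, weak: &str, strong: &str) {
        self.map.insert(weak.to_owned(), strong.to_owned());
    }
    fn lookup(&self, weak: &str) -> Option<&str> {
        self.map.get(weak).map(String::as_str)
    }
}
```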

@aidanhs force-pushed the aphs-dist-sccache-1 branch from 81785e5 to 191a45b on August 3, 2018
@aidanhs force-pushed the aphs-dist-sccache-1 branch from 7f36f9c to 0f799cb on August 4, 2018
Some(_) => {
warn!("Scheduler address configured but dist feature disabled, disabling distributed sccache");
Arc::new(dist::NoopClient)
},
@aidanhs (author):

This bit was added in light of the new preprocessing rules, to use -P in more cases where it's possible.

@aidanhs force-pushed the aphs-dist-sccache-1 branch from 0f799cb to 8fb37aa on August 4, 2018
-if !parsed_args.profile_generate {
+if !may_dist && !parsed_args.profile_generate {
     cmd.arg("-P");
 }
@aidanhs (author):

Added this back in and explained why it's important for a distributed compile

@aidanhs commented on Aug 4, 2018:

sccache-dist has been moved back in-crate as a separate binary, enabled by the dist-server feature.

Are you ok to wait for the rest of my PRs for the TODOs to be resolved?

@luser commented on Aug 6, 2018:

Are you ok to wait for the rest of my PRs for the TODOs to be resolved?

Yes. The separate crate part was the only thing I was really hung up on. I'll make sure I've given the rest of the code here a once-over today and we should be good to merge this.

Cargo.toml (outdated):
[[bin]]
name = "sccache-dist"
required-features = ["dist-server"]
path = "sccache-dist/main.rs"
@luser (reviewer):

It's a bit unconventional to have source in a path that's not under src/, isn't it?
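Sketch of the conventional alternative (assuming the file moves under src/bin/; this is an illustration, not the manifest that landed): Cargo auto-discovers binaries in src/bin/, so the explicit `path` can be dropped.

```toml
# Sketch: if the source lives at src/bin/sccache-dist/main.rs (or
# src/bin/sccache-dist.rs), Cargo discovers it and no `path` is needed.
[[bin]]
name = "sccache-dist"
required-features = ["dist-server"]
```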

}

/// Add a file by calling `with` with the open `File` corresponding to the cache at path `key`.
pub fn insert_with<K: AsRef<OsStr>, F: FnOnce(File) -> io::Result<()>>(&mut self, key: K, with: F) -> Result<()> {
@luser (reviewer):

I'm not quite sure about the precise difference between this and insert_by. Could this just call insert_by, passing a closure that opens the file and calls with(f)?
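The suggestion can be sketched as follows (hypothetical; the `Cache` struct, signatures, and size handling here are assumptions, not sccache's real LRU code): insert_with opens the file itself and delegates the bookkeeping to a lower-level insert_by.

```rust
use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical sketch of implementing insert_with in terms of insert_by.
struct Cache {
    root: PathBuf,
}

impl Cache {
    /// Lower-level primitive: run `by` against the target path, then
    /// account for whatever ended up on disk.
    fn insert_by<F: FnOnce(&Path) -> io::Result<()>>(&mut self, key: &str, by: F) -> io::Result<u64> {
        let path = self.root.join(key);
        by(&path)?;
        // The size is only known after the closure runs.
        Ok(std::fs::metadata(&path)?.len())
    }

    /// insert_with expressed via insert_by, as suggested: the closure
    /// opens the file and hands it to `with`.
    fn insert_with<F: FnOnce(File) -> io::Result<()>>(&mut self, key: &str, with: F) -> io::Result<u64> {
        self.insert_by(key, |path| {
            let f = File::create(path)?;
            with(f)
        })
    }
}
```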

@aidanhs commented on Aug 8, 2018:

Added two new commits, one to put sccache-dist in a conventional location (and remove explicit paths to the main.rs files), one to simplify insert_with (this had some minor knock-on effects in insert_by as we don't have the size immediately available).

I also took the opportunity to clean up a couple of things in add_file, as well as add a rel_to_abs_path helper that means we no longer need to store the absolute path in the LRU (since it can be derived from the key).
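A helper in the spirit of the rel_to_abs_path mentioned here could be as simple as the sketch below (hypothetical signature): the absolute on-disk location is always derivable from the cache root plus the key, so the LRU only needs to store the relative key.

```rust
use std::path::{Path, PathBuf};

// Hypothetical sketch of a rel_to_abs_path-style helper: derive the
// absolute path from the cache root and the stored relative key.
fn rel_to_abs_path(rel_path: &Path, root: &Path) -> PathBuf {
    root.join(rel_path)
}
```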

Labels: none yet
Projects: none yet
2 participants