Skip to content
This repository was archived by the owner on Dec 29, 2022. It is now read-only.

Use linked compiler during Cargo exec() callback#424

Merged
nrc merged 3 commits intorust-lang:masterfrom
Xanewok:linked-compiler
Aug 5, 2017
Merged

Use linked compiler during Cargo exec() callback#424
nrc merged 3 commits intorust-lang:masterfrom
Xanewok:linked-compiler

Conversation

@Xanewok
Copy link
Copy Markdown
Contributor

@Xanewok Xanewok commented Jul 25, 2017

This is a first step to not be as reliant on Cargo and makes it so RLS uses linked compiler instead of rustc or $RUSTC process.
Passing analysis data from the compiler should be faster in few cases since analysis data will be passed in-memory, rather than unconditionally via file on disk. Additionally cargo routine can now collect diagnostics messages and display them until subsequent bare rustc build routine is performed.

if let Some(new_analysis) = new_analysis {
analysis.reload_from_analysis(new_analysis, &project_path_clone, &cwd, false).unwrap();
} else {
if new_analysis.is_empty() {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After each crate compilation rustc can return an Option with analysis data and we collect unwrapped payloads into a vector, which we then load from file. For a single crate the behaviour is the same, since it's a change None => vec![] and Some(analysis) => vec![analysis]. However with this, now Cargo routine can collect and load the analysis data. There's one what-if: if a compiled indirect dependency will return None for analysis but there will be some collected into a vector, then a missing one won't be loaded from disk, since it'll only try to load the provided analysis in-memory, and possible cause subtle problems resolving the data? @nrc is such a case possible?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is rare for there to be no save-analysis info, even if we fail to compile, so I expect this will not be a problem. In any case, it should be resolved once we build successfully, so it should only be temporary.

_ => {}
}
} else {
cmd.status().expect("Couldn't execute rustc");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we must do this before storing the args/envs into the compialtion_cx

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that I made it so the lock on compilation_cx is unnecessarily held for the whole duration of compilation, I'll fix that.

Besides that, should we set it after in case cmd.status() or rustc::rustc() panics and we don't store bogus values from faulty compilation? If not, then I think this may not be a problem, since we feed the values directly to rustc::rustc and it doesn't touch it nor does cmd.status() require it.


if self.workspace_mode {
let args = &compilation_cx.args;
let envs = &compilation_cx.envs;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how these ever get populated since we never do a Cargo run without running rustc directly in workspace mode? How does this work?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only cargo routine populates the args and envs, rustc consumes them externally. In workspace mode currently only ever cargo routine is called, which then prepares RLS-specific args/envs from those generated by Cargo itself as usual. Then it feeds them directly to, introduced in this change, rustc::rustc (unlike being passed via Command API and actually executing the command). In practice it doesn't use use compilation_cx.{args,envs} at all, however it doesn't prevent it from being set here

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, going to take another look, I think I misunderstood how it all works

// however order of received messages is non-deterministic and this
// would require implementing something like `or_expect_contains`
ExpectedMessage::new(None).expect_contains("publishDiagnostics"),
ExpectedMessage::new(None).expect_contains("publishDiagnostics"),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, I think we should only send a single publish/end pair, rather than one per crate. But that can be fixed later. You should check for a second diagnosticsEnd though

Copy link
Copy Markdown
Contributor Author

@Xanewok Xanewok Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure? There is only one requested build here, so there's only one pair of diagnosticsBegin/diagnosticsEnd and multiple publishDiagnostics in between because there are multiple files (issued here while iterating every file => diagnostics results pair), confirmed with output now

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh sorry, I was confused - I thought they were multiple diagnosticsBegin, not publishDiagnostics. Then this is doing what I expect.

}

// Finally, store the modified cargo-generated args/envs for future rustc calls
args.insert(0, rustc_exe);
Copy link
Copy Markdown
Contributor Author

@Xanewok Xanewok Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nrc while we're at it, do you know why we need to pass a rustc exe as a first argument to the linked compiler?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to pass it, but it is so that the args look like what would happen if you took the args from inside the program (there you get the executable name as the first arg). We could not insert it and strip that arg if we ever still take the args from inside the program (I'm not sure we do).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, okay, I thought run_compiler accepts args with the first program argument omitted, nevermind then 👍

@Xanewok
Copy link
Copy Markdown
Contributor Author

Xanewok commented Jul 26, 2017

As a note, test_simple_workspace is failing right now because of a env-related deadlock caused by locking in run_cargo followed by running rustc::rustc since 79d659e. Environment should be handled in a more principled way.

@nrc
Copy link
Copy Markdown
Member

nrc commented Jul 27, 2017

OK, I think I understand better what you're doing and it looks good. Just need to fix up the locking on the env and it should be good to go.

@Xanewok
Copy link
Copy Markdown
Contributor Author

Xanewok commented Jul 28, 2017

Force-pushed a version with two locks (one for Cargo, one for rustc scopes) that fixes the workspace deadlock. Unfortunately right now this still can fail (what's interesting that it only seems to affect find_all_refs tests?). I don't think it's good to try and fix that and devise a more complex and foolproof lock in the code that will be practically only needed by the tests, so I pushed the current version for now.

Since more work in general and the more complete workspace support will probably mean more work involving env vars, I'll try to move relevant tests into separate processes so we don't have to work around env var as much and unnecessarily.

@Xanewok
Copy link
Copy Markdown
Contributor Author

Xanewok commented Jul 28, 2017

I'll get working on more foolproof locking for the tests during the weekend and hopefully will get back whenever I manage to do something with it.

@Xanewok
Copy link
Copy Markdown
Contributor Author

Xanewok commented Jul 31, 2017

Pushed a prototype version of EnvironmentLock, which is supposed to be double-layered lock that ensures consistent env vars during the build and testing.
The lock has to be double-layered, since we'll begin to call linked rustc during cargo, and not only do we need to provide consistent and mutually-exclusive access to env vars during cargo routine, but also during it, we need to do the same for multiple rustc calls while holding the outer/parent lock.

Naive approach of doing two separate locks for each procedure won't work, because our builds has 2 modes: bare rustc and cargo (which can now run nested rustc as well). There can be a situation where we start a cargo build in one RLS server instance (multiple instances run during testing) and before we get to the nested rustc lock, some other instance can start building while acquiring only rustc lock. That's why any env var locking required for the build must go through the same lock (regardless of whether it's cargo or bare rustc build routine) and only then, while holding it, another one (inner) can be acquired if needed for additional calls.

I'm currently using EnvironmentLockFacade to pass appropriate outer/inner mutex access to underlying rustc and cargo procedures, while the actual Mutexes (in EnvironmentLock as ENV_LOCK) are initialized statically, ensuring single access across many spawned language services.

It's only a prototype and requires more work, like organizing code better in general, encapsulating the logic a bit more, so we can ensure to the best of our ablities that the locks will be acquired in correct order and it needs proper comments when the design will be set in stone.

let mut restore_env = Environment::push(&env);
// There can be at most one Cargo instance ran, so don't guard env vars with a lock in case
// unerlying rustc calls will require mutually exclusive access to this process' env vars
// FIXME: Don't lock me. This is only required for tests and shouldn't modify program's behaviour
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be removed, forgot about this comment

@Xanewok
Copy link
Copy Markdown
Contributor Author

Xanewok commented Aug 1, 2017

Pushed a reorganized and documented version of the double mutex. For this, I separated this into util::environment module to hide the implementation details and clean up in general. More implementation details are in the description of 5bd732f (the last commit with the lock).

@nrc does this version look better? I tried to encode the none -> outer -> inner lock access order via types, but locking order is still not guaranteed. However, I think it's a lot more clear and more ergonomical (?) now, for what it's worth 🙂

Copy link
Copy Markdown
Member

@nrc nrc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to come back and do a proper review, but I'm feeling fuzzy. Here are some initial comments

@@ -0,0 +1,141 @@
use std::env;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets keep this in the build module, rather than util (and the file needs the copyright header)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to separate it into a different module to hide the impl details and this seemed abstract enough to put under util module. Can I put it in the build module but under inline module to achieve the same visibility semantics (so the build/cargo modules can't access private fields)?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that sounds perfect

src/util/mod.rs Outdated
@@ -0,0 +1 @@
pub mod environment;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although we don't need this with the previous comment, for future reference, you don't need a separate file here, you could use an inline module in the parent.

let outer_lock = env_lock.as_outer();
let (outer, inner) = outer_lock.get_lock();
// Lock early to guarantee synchronized access to read RUSTFLAGS env var
let env_lock_guard = outer.lock().unwrap();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we encapsulate this stuff in the EnvironmentLock class and expose descriptively named API instead of low-level locking primitives?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nrc what should the API return? MutexGuard<'a, ()> (newtyped?) instead of &Mutex<()>? I just realized that returning a scoped value in this case does not require providing explicit lifetime to the lock structs, so this will be an improvement.
In general, though, I'd love to limit the lifetime of InnerLock to the one of the guard that's returned alongside of it, but since it's passed to Arc<Executor>, I'm not sure that's possible.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine you would have your own guard struct that you would return, rather than returning either the mutex itself or a guard.

@Xanewok
Copy link
Copy Markdown
Contributor Author

Xanewok commented Aug 2, 2017

Pushed updated and rebased version, hope I addressed some points you wanted.
The lock etc. are still under different module to hide private data and such, but the module is in build now.
I know the lock implementation probably still leaves much to be desired, but if it's acceptable for now in the current form, I'd love for the PR to be merged, mainly because of the linked compiler changes in workspace mode. If needed, I can improve the lock or work on a way to handle scoped, synchronized environment properly later.
However, if necessary, I'll obviously address further issues if there will be any with the current implementation.

@Xanewok
Copy link
Copy Markdown
Contributor Author

Xanewok commented Aug 3, 2017

macOS build got stuck, retrying

@Xanewok Xanewok closed this Aug 3, 2017
@Xanewok Xanewok reopened this Aug 3, 2017
@nrc
Copy link
Copy Markdown
Member

nrc commented Aug 4, 2017

I've read through and I think this looks OK. I'd be interested in trying to iterate on the env lock a bit more, but I can't imagine how to do that without playing with the code myself. So, lets land this as is. Could you rebase please?

Xanewok added 3 commits August 4, 2017 10:05
This introduces a new `build::environment` module with moved `Environment` RAII guard and new global `EnvironmentLock` and related. The lock has to be a double one, because we might need doubly scoped env vars (outer one for Cargo routine and inner one for underlying rustc calls across different threads, contained within Cargo routine invocation scope). While the lock itself and provided types don't enforce a strict lock ordering, the order of retrieved locks and types is still encoded in the type system.
This might not be a complete and fully abstracted away solution to N-valued mutex and guarded environment, but works good enough and seems to cover enough cases in both production and testing scenarios.
@Xanewok
Copy link
Copy Markdown
Contributor Author

Xanewok commented Aug 5, 2017

Rebased now

@nrc nrc merged commit c4dac14 into rust-lang:master Aug 5, 2017
@nrc
Copy link
Copy Markdown
Member

nrc commented Aug 5, 2017

Thanks for the rebase!

@Xanewok Xanewok deleted the linked-compiler branch August 6, 2017 17:57
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants