Fix circular source causing Nushell to crash#12262
Fix circular source causing Nushell to crash#12262kubouch merged 14 commits intonushell:mainfrom yizhepku:fix-circular-source
Conversation
EngineState now tracks the script currently running, instead of the parent directory of the script. This also provides an easy way to expose the current running script to the user (Issue #12195). Similarly, StateWorkingSet now tracks scripts instead of directories. `parsed_module_files`, which was used to prevent circular module import, is renamed to `scripts` and acts like a stack for tracking the current running script (which is on the top of the stack). Circular import check is now extended for source as well. A simple testcase is added for circular source.
It seems that CI is failed because of this |
sholderbach
left a comment
There was a problem hiding this comment.
Do you have some ideas, what needs resolution here?
| // Path to the script Nushell is currently running, if any. | ||
| pub script: Option<PathBuf>, |
There was a problem hiding this comment.
Is this only used for scripts or does it have uses for other kinds of files (modules or config as in #12338 ) else the renaming is probably bad as we still attach some specific semantics in evaluation to scripts.
There was a problem hiding this comment.
I've renamed it to simply "file". It's less descriptive, but with the comments I've added, it should be fine?
crates/nu-cli/src/eval_file.rs
Outdated
| }); | ||
|
|
||
| engine_state.start_in_file(Some(file_path_str)); | ||
| engine_state.script = Some(file_path.clone()); |
There was a problem hiding this comment.
Personally I would prefer this to remain an accessor instead of assignment to pub fields, if we intend to do more logic when switching the file.
There was a problem hiding this comment.
I removed start_in_file() because its usage is inconsistent (only two occurances throughout the entire codebase). We can re-introduce an abstraction later if necessary.
| /// Absolute paths to scripts that are being processed. | ||
| /// The script currently being processed is on the top of the stack. | ||
| /// Circular import/sourcing is prevented by checking the stack before processing. | ||
| pub scripts: Vec<PathBuf>, |
There was a problem hiding this comment.
Same, we should be accurate with our terminology (defering to @kubouch for the prime expertise on this part of the engine)
There was a problem hiding this comment.
Agreed, we have more files than scripts and the plain name “scripts” doesn't carry the intended meaning for the field (i.e., stack of accessed parsed files).
There was a problem hiding this comment.
Noted. I didn't think too much on the names initially, and I agree "file" is a much better name than "script".
There was a problem hiding this comment.
Same as above, I opted for the shorter name "files" hoping that the comments (and IDE tooltips) would be clear enough.
crates/nu-std/src/lib.rs
Outdated
| let prev_currently_parsed_cwd = working_set.currently_parsed_cwd.clone(); | ||
| working_set.currently_parsed_cwd = Some(PathBuf::from(NU_STDLIB_VIRTUAL_DIR)); | ||
| // Add a placeholder file to the stack of scripts being processed. | ||
| let placeholder = [NU_STDLIB_VIRTUAL_DIR, "placeholder"].iter().collect(); |
There was a problem hiding this comment.
Instead of “placeholder” I'd choose some different name, like “loading_stdlib”. The identical file could also be parsed to the parse() below.
| currently_parsed_cwd.as_path() | ||
| let actual_cwd = if let Some(path) = working_set.scripts.last() { | ||
| // `script_stack` contains absolute paths to scripts, so `path` always has a parent | ||
| path.parent().expect("script path has a parent") |
There was a problem hiding this comment.
Can you refactor the expect() away to reduce potential panic risk?
There was a problem hiding this comment.
The file stack is explictly designed to contain only absolute paths to files, as an invariant. That said, I think the correct approach would be to enforce this invariant early, instead of failing only when parent is accessed. I'll figure something out.
There was a problem hiding this comment.
The file stack is explictly designed to contain only absolute paths to files
This might not work as I hoped. The virtual filesystem depends on the fact that a non-existent relative path is accepted. Hmm...
| currently_parsed_cwd.as_path() | ||
| let actual_cwd = if let Some(path) = working_set.scripts.last() { | ||
| // `script_stack` contains absolute paths to scripts, so `path` always has a parent | ||
| path.parent().expect("script path has a parent") |
There was a problem hiding this comment.
Same here, no expects please.
|
Just some small comments. I like the general idea of using a single stack of files to handle both file-relative current working directory and circular parser access (be it sourced file or imported module). With a few touches, I think this can land.
|
|
#12338 paused to extend this ✨ |
The fix from PR #12338 is removed, but the test is kept.
|
Thanks for all the feedback, and sorry for the late replies. I had something else to tend to, which was why I saved this PR as a draft. I'm back to working on this now. |
|
I'm working on a dedicated I also wants to enforce the invariant that all paths must be absolute, so that we can safely calculate the current working directory. But it doesn't work with virtual filesystem, which uses a non-existent relative path, so a call to Maybe I should just give up the "must be absolute path" part of the invariant? But that would make circular detection unreliable, since two relative paths can resolve to the same file. Here's what it currently looks like: /// Files being evaluated, arranged as a stack.
///
/// The current active file is on the top of the stack.
/// When a file source/import another file, the new file is pushed onto the stack.
/// Attempting to add files that are already in the stack (circular source/import) results in an error.
///
/// Paths stored in a stack are always in a canonical, absolute form.
pub struct FileStack(Vec<PathBuf>);
impl FileStack {
/// Creates an empty stack.
pub fn new() -> Self {
Self(vec![])
}
/// Adds a file to the stack.
///
/// Relative paths are resolved into absolute paths first.
///
/// If the same file is already present in the stack, returns `ParseError::CircularImport`.
/// If the path doesn't exist or is not a regular file, returns `ParseError::FileNotFound`.
pub fn push(&mut self, file: &Path, span: Span) -> Result<(), ParseError> {
// Canonicalize the path.
let path = file
.canonicalize()
.map_err(|_| ParseError::FileNotFound(file.to_string_lossy().to_string(), span))?;
// Check if the path points to a regular file.
if !path.is_file() {
return Err(ParseError::FileNotFound(
path.to_string_lossy().to_string(),
span,
));
}
// Check for circular import.
if let Some(i) = self.0.iter().rposition(|p| p == &path) {
let filenames: Vec<String> = self.0[i..]
.iter()
.chain(std::iter::once(&path))
.map(|p| p.to_string_lossy().to_string())
.collect();
let msg = filenames.join("\nuses ");
return Err(ParseError::CircularImport(msg, span));
}
self.0.push(path);
Ok(())
}
/// Removes a file from the stack and returns its path, or None if the stack is empty.
pub fn pop(&mut self) -> Option<PathBuf> {
self.0.pop()
}
/// Returns the current active file (that is, the file on the top of the stack), or None if the stack is empty.
pub fn top(&self) -> Option<&Path> {
self.0.last().map(PathBuf::as_path)
}
/// Returns the parent directory of the active file, or None if the stack is empty.
pub fn current_working_directory(&self) -> Option<&Path> {
// We maintain the invariant that paths are always absolute, and always points to a regular file.
// Therefore, `path.parent()` always succeeds, and the unwrap is safe.
self.0.last().map(|path| path.parent().unwrap())
}
} |
FileStack represents files that are currently being evaluated. By keeping the files in a stack, we can prevent circular imports and keep track of the active file at the same time.
Unlike `check_circular_import`, `FileStack` doesn't add errors to the working set as a side effect, but returns the error to the caller. Therefore, we need to handle the error at the call sites.
nu-check handles modules and scripts differently. In case of modules, it calls parse_module_file which adds the file to the stack, but in case of scripts it doesn't. The current solution is to just add the file to the stack manually. It's not pretty, but it'll do for now.
|
I'd like to defend my usage of This topic used to be discussed quite a lot, such as this post, which is an interesting read. |
|
It used to be that panics would simply kill our REPL which is unacceptable for a shell. Nowadays, we catch a panic, so it's less of a problem, so I guess it's OK. I'd wrap it into some API, like the FileStack. Having random uncommented freestanding expects/unwraps around the codebase can promote bad hygiene. I like the FileStack API approach better than hacking directly on the working set stack manually. Maybe the FileStack can use the ParserPath instead of PathBuf? The ParserPath was created specifically to abstract over real and virtual paths. Virtual paths can always be considered "absolute" so that should solve your problem (it could have a helper One last thing about the FileStack: Instead of |
I would do that if I could, but sometimes only the caller knows whether an expect/unwrap is safe or not (e.g. pushing paths into a newly-created stack is always safe, but the
I forgot to mention this, but I decided to drop the invariant that paths must be absolute in the final implementation (that code snippet in the comment above is outdated). Refer to the actual commits instead. Without canonicalization, the same file might be imported twice under different paths, but we'll catch the circular import eventually, so this shouldn't be a problem. |
|
OK, looks good. Thanks! |
Description
EngineState now tracks the script currently running, instead of the parent directory of the script. This also provides an easy way to expose the current running script to the user (Issue #12195).
Similarly, StateWorkingSet now tracks scripts instead of directories.
parsed_module_filesandcurrently_parsed_pwdare merged into one variable,scripts, which acts like a stack for tracking the current running script (which is on the top of the stack).Circular import check is added for
sourceoperations, in addition to module import. A simple testcase is added for circular source.User-Facing Changes
It shouldn't have any user facing changes.