Skip to content

Fix circular source causing Nushell to crash#12262

Merged
kubouch merged 14 commits intonushell:mainfrom
yizhepku:fix-circular-source
Apr 19, 2024
Merged

Fix circular source causing Nushell to crash#12262
kubouch merged 14 commits intonushell:mainfrom
yizhepku:fix-circular-source

Conversation

@yizhepku
Copy link
Copy Markdown
Contributor

Description

EngineState now tracks the script currently running, instead of the parent directory of the script. This also provides an easy way to expose the current running script to the user (Issue #12195).

Similarly, StateWorkingSet now tracks scripts instead of directories. parsed_module_files and currently_parsed_pwd are merged into one variable, scripts, which acts like a stack for tracking the current running script (which is on the top of the stack).

Circular import check is added for source operations, in addition to module import. A simple testcase is added for circular source.

User-Facing Changes

It shouldn't have any user facing changes.

EngineState now tracks the script currently running, instead of the
parent directory of the script. This also provides an easy way to expose
the current running script to the user (Issue #12195).

Similarly, StateWorkingSet now tracks scripts instead of directories.
`parsed_module_files`, which was used to prevent circular module import,
is renamed to `scripts` and acts like a stack for tracking the current
running script (which is on the top of the stack).

Circular import check is now extended for source as well. A simple
testcase is added for circular source.
@yizhepku yizhepku marked this pull request as draft March 22, 2024 21:05
@WindSoilder
Copy link
Copy Markdown
Contributor

parsed_module_files and currently_parsed_pwd are merged into one variable

It seems that CI is failed because of this

Copy link
Copy Markdown
Member

@sholderbach sholderbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have some ideas, what needs resolution here?

Comment on lines +103 to +104
// Path to the script Nushell is currently running, if any.
pub script: Option<PathBuf>,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only used for scripts or does it have uses for other kinds of files (modules or config as in #12338 ) else the renaming is probably bad as we still attach some specific semantics in evaluation to scripts.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed it to simply "file". It's less descriptive, but with the comments I've added, it should be fine?

});

engine_state.start_in_file(Some(file_path_str));
engine_state.script = Some(file_path.clone());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would prefer this to remain an accessor instead of assignment to pub fields, if we intend to do more logic when switching the file.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed start_in_file() because its usage is inconsistent (only two occurances throughout the entire codebase). We can re-introduce an abstraction later if necessary.

/// Absolute paths to scripts that are being processed.
/// The script currently being processed is on the top of the stack.
/// Circular import/sourcing is prevented by checking the stack before processing.
pub scripts: Vec<PathBuf>,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, we should be accurate with our terminology (defering to @kubouch for the prime expertise on this part of the engine)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we have more files than scripts and the plain name “scripts” doesn't carry the intended meaning for the field (i.e., stack of accessed parsed files).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted. I didn't think too much on the names initially, and I agree "file" is a much better name than "script".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, I opted for the shorter name "files" hoping that the comments (and IDE tooltips) would be clear enough.

let prev_currently_parsed_cwd = working_set.currently_parsed_cwd.clone();
working_set.currently_parsed_cwd = Some(PathBuf::from(NU_STDLIB_VIRTUAL_DIR));
// Add a placeholder file to the stack of scripts being processed.
let placeholder = [NU_STDLIB_VIRTUAL_DIR, "placeholder"].iter().collect();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of “placeholder” I'd choose some different name, like “loading_stdlib”. The identical file could also be parsed to the parse() below.

currently_parsed_cwd.as_path()
let actual_cwd = if let Some(path) = working_set.scripts.last() {
// `script_stack` contains absolute paths to scripts, so `path` always has a parent
path.parent().expect("script path has a parent")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you refactor the expect() away to reduce potential panic risk?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file stack is explictly designed to contain only absolute paths to files, as an invariant. That said, I think the correct approach would be to enforce this invariant early, instead of failing only when parent is accessed. I'll figure something out.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file stack is explictly designed to contain only absolute paths to files

This might not work as I hoped. The virtual filesystem depends on the fact that a non-existent relative path is accepted. Hmm...

currently_parsed_cwd.as_path()
let actual_cwd = if let Some(path) = working_set.scripts.last() {
// `script_stack` contains absolute paths to scripts, so `path` always has a parent
path.parent().expect("script path has a parent")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, no expects please.

@kubouch
Copy link
Copy Markdown
Contributor

kubouch commented Apr 1, 2024

Just some small comments. I like the general idea of using a single stack of files to handle both file-relative current working directory and circular parser access (be it sourced file or imported module). With a few touches, I think this can land.

nu_check is failing because you set the parsed file in the command, then is_circular_import() is triggered when parse_module_file() is called. The solution is probably to remove the setting in nu-check for modules because that will be done in parse_module_file().

@texastoland
Copy link
Copy Markdown
Contributor

#12338 paused to extend this ✨

The fix from PR #12338 is removed, but the test is kept.
@yizhepku
Copy link
Copy Markdown
Contributor Author

Thanks for all the feedback, and sorry for the late replies. I had something else to tend to, which was why I saved this PR as a draft. I'm back to working on this now.

@yizhepku
Copy link
Copy Markdown
Contributor Author

yizhepku commented Apr 13, 2024

I'm working on a dedicated FileStack abstraction. Circular import detection happens as part of the push method call, which prevents inconsistent states.

I also wants to enforce the invariant that all paths must be absolute, so that we can safely calculate the current working directory. But it doesn't work with virtual filesystem, which uses a non-existent relative path, so a call to canonicalize() will fail.

Maybe I should just give up the "must be absolute path" part of the invariant? But that would make circular detection unreliable, since two relative paths can resolve to the same file.

Here's what it currently looks like:

/// Files being evaluated, arranged as a stack.
///
/// The current active file is on the top of the stack.
/// When a file source/import another file, the new file is pushed onto the stack.
/// Attempting to add files that are already in the stack (circular source/import) results in an error.
///
/// Paths stored in a stack are always in a canonical, absolute form.
pub struct FileStack(Vec<PathBuf>);

impl FileStack {
    /// Creates an empty stack.
    pub fn new() -> Self {
        Self(vec![])
    }

    /// Adds a file to the stack.
    ///
    /// Relative paths are resolved into absolute paths first.
    ///
    /// If the same file is already present in the stack, returns `ParseError::CircularImport`.
    /// If the path doesn't exist or is not a regular file, returns `ParseError::FileNotFound`.
    pub fn push(&mut self, file: &Path, span: Span) -> Result<(), ParseError> {
        // Canonicalize the path.
        let path = file
            .canonicalize()
            .map_err(|_| ParseError::FileNotFound(file.to_string_lossy().to_string(), span))?;

        // Check if the path points to a regular file.
        if !path.is_file() {
            return Err(ParseError::FileNotFound(
                path.to_string_lossy().to_string(),
                span,
            ));
        }

        // Check for circular import.
        if let Some(i) = self.0.iter().rposition(|p| p == &path) {
            let filenames: Vec<String> = self.0[i..]
                .iter()
                .chain(std::iter::once(&path))
                .map(|p| p.to_string_lossy().to_string())
                .collect();
            let msg = filenames.join("\nuses ");
            return Err(ParseError::CircularImport(msg, span));
        }

        self.0.push(path);
        Ok(())
    }

    /// Removes a file from the stack and returns its path, or None if the stack is empty.
    pub fn pop(&mut self) -> Option<PathBuf> {
        self.0.pop()
    }

    /// Returns the current active file (that is, the file on the top of the stack), or None if the stack is empty.
    pub fn top(&self) -> Option<&Path> {
        self.0.last().map(PathBuf::as_path)
    }

    /// Returns the parent directory of the active file, or None if the stack is empty.
    pub fn current_working_directory(&self) -> Option<&Path> {
        // We maintain the invariant that paths are always absolute, and always points to a regular file.
        // Therefore, `path.parent()` always succeeds, and the unwrap is safe.
        self.0.last().map(|path| path.parent().unwrap())
    }
}

FileStack represents files that are currently being evaluated. By keeping the files in a stack, we can prevent circular imports and keep track of the active file at the same time.
Unlike `check_circular_import`, `FileStack` doesn't add errors to the working set as a side effect, but returns the error to the caller. Therefore, we need to handle the error at the call sites.
nu-check handles modules and scripts differently. In case of modules, it calls parse_module_file which adds the file to the stack, but in case of scripts it doesn't. The current solution is to just add the file to the stack manually. It's not pretty, but it'll do for now.
@yizhepku yizhepku marked this pull request as ready for review April 14, 2024 13:58
@yizhepku
Copy link
Copy Markdown
Contributor Author

yizhepku commented Apr 14, 2024

I'd like to defend my usage of expect. In my opinion, using expect to check invariants and catch bugs is a good thing. If the invariant is ever broken, there's some serious issues with the code, and it's better to just crash than continue with an inconsistent state. Using expect instead of bubbling up a result also has the advantange that the user can send a stacktrace to us, which helps with debugging.

This topic used to be discussed quite a lot, such as this post, which is an interesting read.

@yizhepku yizhepku requested a review from kubouch April 14, 2024 15:37
@kubouch
Copy link
Copy Markdown
Contributor

kubouch commented Apr 15, 2024

It used to be that panics would simply kill our REPL which is unacceptable for a shell. Nowadays, we catch a panic, so it's less of a problem, so I guess it's OK. I'd wrap it into some API, like the FileStack. Having random uncommented freestanding expects/unwraps around the codebase can promote bad hygiene.

I like the FileStack API approach better than hacking directly on the working set stack manually. Maybe the FileStack can use the ParserPath instead of PathBuf? The ParserPath was created specifically to abstract over real and virtual paths. Virtual paths can always be considered "absolute" so that should solve your problem (it could have a helper is_absolute() method for example).

One last thing about the FileStack: Instead of canonicalize(), our own nu-path::canonicalize_with() should be used with a provided cwd. Default canonicalize() implicitly uses std::env::current_dir() to expand the path, which results in wrong semantics because we don't use std::env in Nushell at all. Canonicalizing inside the push(), might not be necessary, though. The paths coming from the parser should be already canonicalized, so we could just check if the path is absolute. Either way is fine to me.

@yizhepku
Copy link
Copy Markdown
Contributor Author

I'd wrap it into some API, like the FileStack.

I would do that if I could, but sometimes only the caller knows whether an expect/unwrap is safe or not (e.g. pushing paths into a newly-created stack is always safe, but the FileStack doesn't know that).

Canonicalizing inside the push(), might not be necessary, though.

I forgot to mention this, but I decided to drop the invariant that paths must be absolute in the final implementation (that code snippet in the comment above is outdated). Refer to the actual commits instead.

Without canonicalization, the same file might be imported twice under different paths, but we'll catch the circular import eventually, so this shouldn't be a problem.

Copy link
Copy Markdown
Member

@IanManske IanManske left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, it looks good! The comments below have to do with removing a few expects. (The other expects might be a little harder to eliminate, so I'm fine with leaving them in for now.)

@kubouch
Copy link
Copy Markdown
Contributor

kubouch commented Apr 19, 2024

OK, looks good. Thanks!

@kubouch kubouch merged commit 6d2cb43 into nushell:main Apr 19, 2024
@yizhepku yizhepku deleted the fix-circular-source branch April 19, 2024 06:38
@hustcer hustcer added this to the v0.93.0 milestone Apr 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants