Skip to content

fix(windows): Windows specific script handling#1439

Merged
yohamta0 merged 34 commits into
mainfrom
windows-specific-envs
Dec 4, 2025
Merged

fix(windows): Windows specific script handling#1439
yohamta0 merged 34 commits into
mainfrom
windows-specific-envs

Conversation

@yohamta0

@yohamta0 yohamta0 commented Dec 2, 2025

Copy link
Copy Markdown
Collaborator
  • support relative path script execution
  • support failure on non-zero interim command in a script
  • add missing base environment variables for windows

Summary by CodeRabbit

  • New Features

    • Direct command mode, nix-shell and prioritized shell handlers (PowerShell, cmd, Unix).
    • Platform-aware environment whitelist with normalized env-key handling and platform-specific defaults.
    • Log reading: selectable encodings, platform-aware default encoding, and richer read-result metadata.
    • New helper to detect regular files.
  • Refactor

    • Consolidated command construction for consistent cross-shell execution.
  • Tests

    • Expanded shell, path-normalization, file, and encoding tests.
  • Chore

    • More restrictive permissions for created work directories.

✏️ Tip: You can customize this high-level summary in your review settings.

@ghansham

ghansham commented Dec 2, 2025

Copy link
Copy Markdown

Can we have a base structure and based on os populate environment variables. That may make the code more maintainable, testable and traceable.

Similarly normalizeShell commands.
We can have a factory pattern which implements os specific version of base class functions based on the os choice

@yohamta0

yohamta0 commented Dec 3, 2025

Copy link
Copy Markdown
Collaborator Author

@ghansham Good point, thanks for pointing this out!

@yohamta0

yohamta0 commented Dec 3, 2025

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review request

@coderabbitai

coderabbitai Bot commented Dec 3, 2025

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Dec 3, 2025

Copy link
Copy Markdown

Walkthrough

Centralizes command construction into a shellCommandBuilder with a Shell registry and per-shell builders; moves environment whitelist population to OS-specific init files; adds fileutil.IsFile, log-reading encoding/decoders, platform script-path normalization, tighter directory permissions, and expanded unit/integration tests.

Changes

Cohort / File(s) Summary
Environment whitelisting
internal/common/config/env.go, internal/common/config/env_unix.go, internal/common/config/env_windows.go
env.go uses an empty defaultWhitelist and calls normalizeEnvKey when filtering; env_unix.go and env_windows.go populate defaultWhitelist in init() and provide platform normalizeEnvKey (Unix: identity, Windows: uppercase).
File utilities & tests
internal/common/fileutil/fileutil.go, internal/common/fileutil/fileutil_test.go
Adds IsFile(path string) bool (true only for regular files), updates doc comments, and adds tests covering file, directory, and non-existent cases.
Log reading encoding support
internal/common/fileutil/logutil.go, internal/common/fileutil/logutil_test.go
Adds Encoding to LogReadOptions, LogResult fields extended, adds getEncodingDecoder(charset string) using x/text, threads *encoding.Decoder through read helpers, and updates tests to pass decoder (nil).
Frontend API log options
internal/service/frontend/api/v2/dagruns.go
Passes Encoding into LogReadOptions for DAG run log endpoints.
Config loader: default log encoding
internal/common/config/loader.go, internal/common/config/loader_unix.go, internal/common/config/loader_windows.go
Default charset now from getDefaultLogEncodingCharset(); Unix extracts from LC_ALL/LANG and normalizes via ianaindex; Windows maps ANSI code page to charset and normalizes.
Shell abstraction & registry
internal/runtime/builtin/command/shell.go
Adds Shell interface, shellRegistry (direct, nix, powershell, cmd, unix) and findShell to select implementation by executable name (unixShell as fallback).
Per-shell implementations
internal/runtime/builtin/command/shell_direct.go, .../shell_nix.go, .../shell_powershell.go, .../shell_cmd.go, .../shell_unix.go
New shell implementations with Match and Build logic: direct execution, nix-shell (package flags, --pure/--run handling), PowerShell, cmd.exe (COMSPEC resolution, /c handling), and Unix shell fallback (may inject -e).
Command builder & script support
internal/runtime/builtin/command/command.go, .../command_builder.go, .../command_script.go, .../command_unix.go, .../command_windows.go
Centralizes command creation in shellCommandBuilder.Build(ctx), adds script setup/preprocessing (setupScript, preprocessScript), step validation, exitCodeFromError, and platform normalizeScriptPath (Windows may prefix local scripts with .\).
Command tests (unit & windows)
internal/runtime/builtin/command/command_test.go, internal/runtime/builtin/command/command_windows_test.go
Tests updated/added to exercise shellCommandBuilder.Build(ctx) across shells and scenarios; Windows tests for normalizeScriptPath and isWindowsScriptExtension.
Integration shell tests
internal/integration/shell_test.go
Adds Unix-oriented direct-shell integration tests and many subtests for shell modes, env substitution, globbing, redirection, shebang handling (Unix tests skip on Windows).
GHA executor permission tightening
internal/runtime/builtin/gha/executor.go
Two MkdirAll calls changed from 0755 to 0750.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Config as Step Config
    participant Builder as shellCommandBuilder
    participant Finder as findShell()
    participant Shell as Shell impl
    participant OS as OS utils (env/fileutil/loader)
    participant Exec as exec.Cmd

    Config->>Builder: instantiate (Command, Args, Script, Shell, Dir, ShellPackages)
    Builder->>Finder: findShell(b.Shell[0])
    Finder-->>Builder: selected Shell impl
    alt script present
        Builder->>OS: setupScript(workDir, script, shell)
        OS-->>Builder: scriptFile path / error
    end
    Builder->>Shell: Shell.Build(ctx, b) (may call b.normalizeScriptPath())
    Shell->>OS: may call fileutil.IsFile / normalizeEnvKey / getDefaultLogEncodingCharset
    Shell->>Exec: construct exec.CommandContext(...)
    Exec-->>Builder: *exec.Cmd or error
    Builder-->>Config: return command or error (exit code normalized)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Focus review on:
    • shellRegistry ordering and each Shell.Match/Build implementation.
    • Windows normalizeScriptPath and isWindowsScriptExtension edge cases and fileutil.IsFile usage.
    • setupScript (temp-file creation, write/sync, permission 0750, cleanup-on-error).
    • nix-shell handling: package -p flags, --pure/--impure detection, --run insertion, and set -e placement.
    • Log encoding detection/mapping and correct propagation of decoder to all read helpers.

Poem

🐇 I hopped through code where shells align,
Whitelists wake per OS, keys kept fine.
Scripts are stitched and paths wear tidy shoes,
Permissions snug in tighter, fitter hues.
I nibble tests, they green — my whiskers twitch.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 70.18% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly addresses Windows-specific script path handling, which is a core theme across multiple file changes including new Windows-specific files for environment variables, script normalization, and permissions.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch windows-specific-envs

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai

coderabbitai Bot commented Dec 3, 2025

Copy link
Copy Markdown

@yottahmd I'll review the changes in this PR.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (11)
internal/common/fileutil/fileutil.go (1)

48-55: IsFile semantics look good; just be aware of error/links behavior

Implementation is straightforward and correct: it returns true only for regular files and false on error. Just keep in mind this treats errors (including permission issues) the same as “not a regular file”, and follows symlinks because it uses os.Stat rather than os.Lstat. If that matches the intended use (e.g., script path checks), this is fine.

internal/runtime/builtin/command/command_builder.go (1)

21-30: Consider guarding against a nil Shell implementation from findShell

Build assumes findShell(b.Shell[0]) always returns a non-nil implementation. If that invariant is ever broken (e.g., future refactor, tests swapping the registry), shell.Build would panic.

You could harden this by returning an explicit error when findShell yields nil:

 func (b *shellCommandBuilder) Build(ctx context.Context) (*exec.Cmd, error) {
 	if len(b.Shell) == 0 {
 		return nil, fmt.Errorf("shell command is required")
 	}
 
-	shell := findShell(b.Shell[0])
-	return shell.Build(ctx, b)
+	shell := findShell(b.Shell[0])
+	if shell == nil {
+		return nil, fmt.Errorf("unknown shell: %s", b.Shell[0])
+	}
+	return shell.Build(ctx, b)
 }
internal/runtime/builtin/command/shell_cmd.go (1)

21-39: Verify working directory propagation and direct-command path

The overall branching looks good:

  • When both Command and Script are set, you bypass cmd.exe and invoke the command directly with Args + Script, which matches the “direct command + script file” intent.
  • Otherwise you:
    • Clone shell arguments,
    • Append step args,
    • Ensure /c or /C is present,
    • Append the normalized script path via b.normalizeScriptPath(),
    • Then execute cmd.

Two things to double‑check:

  1. Working directory: exec.CommandContext is created without setting cmd.Dir here. Unless some higher layer sets Dir on the returned *exec.Cmd, the step’s working directory (b.Dir) may be ignored on Windows for cmdShell. If that’s not handled elsewhere, consider:

  • return exec.CommandContext(ctx, cmd, args...), nil // nolint: gosec
  • c := exec.CommandContext(ctx, cmd, args...) // nolint: gosec
  • c.Dir = b.Dir
  • return c, nil

and similarly in the direct‑command branch if needed.

2. **Empty script path:** If neither `b.Script` nor `b.ShellCommandArgs` is set, `b.normalizeScriptPath()` might end up empty. That would add an empty argument to `cmd`. If that’s an impossible state by construction, fine; otherwise it may be worth asserting or failing earlier.

</blockquote></details>
<details>
<summary>internal/runtime/builtin/command/command_script.go (3)</summary><blockquote>

`13-49`: **`setupScript` is robust; consider optional cleanup on failure**

The script setup flow is solid:

- Chooses an extension based on the shell (via `GetScriptExtension`),
- Uses `os.CreateTemp` in `workDir` with a clear pattern,
- Applies preprocessing, writes, syncs, and then `Chmod` to `0750`.

Two optional refinements:

- On write/sync/Chmod failures you currently leave the temp file behind. If those failure paths are likely (e.g., disk full), you might want to `os.Remove(file.Name())` before returning the error.
- The comment says “creates a temporary script file … in the given working directory”; with `workDir == ""`, `os.CreateTemp` falls back to the system temp dir, which is fine but slightly different from the comment.

Both are small polish items; the core behavior looks correct.

---

`64-72`: **Consider always cloning `args` in `createDirectCommand` for consistency**

Logic is correct, but the aliasing behavior differs by branch:

- When `scriptFile != ""`, you clone `args` before appending.
- When `scriptFile == ""`, you pass `args` directly.

For predictable behavior (callers never see their slice modified via `exec.Cmd.Args`), you could always clone:

```diff
func createDirectCommand(ctx context.Context, cmd string, args []string, scriptFile string) *exec.Cmd {
-	if scriptFile != "" {
-		clonedArgs := cloneArgs(args)
-		clonedArgs = append(clonedArgs, scriptFile)
-		return exec.CommandContext(ctx, cmd, clonedArgs...) // nolint: gosec
-	}
-	return exec.CommandContext(ctx, cmd, args...) // nolint: gosec
+	clonedArgs := cloneArgs(args)
+	if scriptFile != "" {
+		clonedArgs = append(clonedArgs, scriptFile)
+	}
+	return exec.CommandContext(ctx, cmd, clonedArgs...) // nolint: gosec
}

74-90: validateCommandStep logic is sound; clarify combinations if needed

The validation covers:

  • Command + Script (both set),
  • Command‑only,
  • Script‑only,
  • SubDAG‑only,

and otherwise returns core.ErrStepCommandIsRequired. That matches “at least one of Command, Script, or SubDAG must be set”.

If the domain model intends to forbid combinations like Command + SubDAG or Script + SubDAG, you might want to tighten the conditions; otherwise this is fine as‑is.

internal/runtime/builtin/command/shell_unix.go (1)

46-62: Consider adding support for additional Unix shells.

The current list covers the most common Unix shells, but consider whether tcsh or csh should be included. However, since these shells have different syntax for errexit (e.g., csh doesn't support -e), the current approach of excluding them is likely intentional and correct.

internal/runtime/builtin/command/shell_nix.go (2)

26-27: Consider making pure mode optional.

The code always enforces --pure mode, but users might want impure nix-shell environments. Consider making this conditional based on whether the user explicitly specified shell arguments.

Apply this diff to make pure mode conditional:

-	// nix-shell runs in pure mode by default
-	args = append(args, "--pure")
+	// Add pure mode if not already specified
+	if !slices.Contains(args, "--pure") && !slices.Contains(args, "--impure") {
+		args = append(args, "--pure")
+	}

35-47: String concatenation for command construction may introduce issues.

Building shell commands via string concatenation (lines 39-43) can be fragile with special characters or whitespace in arguments. While this works for the nix-shell use case, consider if proper shell escaping is needed.

internal/runtime/builtin/command/command_test.go (2)

1800-1814: Consider making the test assertion deterministic.

The comment "This should work since Shell is now a slice" and the conditional if err == nil suggests uncertainty about the expected behavior. For reliable test coverage, consider either:

  1. Fixing the input to guarantee a specific outcome
  2. Using a more deterministic test case
 	t.Run("shellCommandBuilder error with shell cmd args", func(t *testing.T) {
-		// Test the path where shellCommandBuilder fails due to empty Shell
-		// when ShellCommandArgs is set
 		config := commandConfig{
 			Ctx:              ctx,
-			Shell:            []string{"invalid shell with\ttabs"},
+			Shell:            []string{}, // Empty shell should cause error
 			ShellCommandArgs: "echo hello",
 		}
-		// This should work since Shell is now a slice
 		cmd, err := config.newCmd(ctx, "")
-		// The command might succeed or fail depending on the shell parsing
-		if err == nil {
-			assert.NotNil(t, cmd)
-		}
+		assert.Error(t, err)
+		assert.Nil(t, cmd)
 	})

2114-2123: Test assertion is weak and may not provide meaningful coverage.

This test attempts to write to /dev directory but the assertion if err != nil { assert.Error(t, err) } is tautological - it only asserts an error when there is already an error. Consider removing this test or making it more deterministic.

 	if _, err := os.Stat("/dev/full"); err == nil {
 		t.Run("write to full device", func(t *testing.T) {
-			// /dev/full simulates a full disk
-			_, err := setupScript("/dev", "echo hello", []string{"/bin/sh"})
-			// This might fail due to permission or other issues
-			if err != nil {
-				assert.Error(t, err)
-			}
+			t.Skip("Test is non-deterministic and depends on system permissions")
 		})
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b340e6 and 4933434.

📒 Files selected for processing (18)
  • internal/common/config/env.go (1 hunks)
  • internal/common/config/env_unix.go (1 hunks)
  • internal/common/config/env_windows.go (1 hunks)
  • internal/common/fileutil/fileutil.go (1 hunks)
  • internal/integration/shell_test.go (2 hunks)
  • internal/runtime/builtin/command/command.go (2 hunks)
  • internal/runtime/builtin/command/command_builder.go (1 hunks)
  • internal/runtime/builtin/command/command_script.go (1 hunks)
  • internal/runtime/builtin/command/command_test.go (5 hunks)
  • internal/runtime/builtin/command/command_unix.go (1 hunks)
  • internal/runtime/builtin/command/command_windows.go (1 hunks)
  • internal/runtime/builtin/command/command_windows_test.go (1 hunks)
  • internal/runtime/builtin/command/shell.go (1 hunks)
  • internal/runtime/builtin/command/shell_cmd.go (1 hunks)
  • internal/runtime/builtin/command/shell_direct.go (1 hunks)
  • internal/runtime/builtin/command/shell_nix.go (1 hunks)
  • internal/runtime/builtin/command/shell_powershell.go (1 hunks)
  • internal/runtime/builtin/command/shell_unix.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
internal/runtime/builtin/command/shell_unix.go (1)
internal/runtime/builtin/command/shell.go (1)
  • Shell (13-20)
internal/integration/shell_test.go (3)
internal/test/helper.go (2)
  • Setup (79-223)
  • DAG (420-423)
internal/runtime/subcmd.go (1)
  • Run (248-270)
internal/core/dag.go (1)
  • DAG (33-135)
internal/runtime/builtin/command/command_builder.go (1)
internal/runtime/builtin/command/shell.go (1)
  • Shell (13-20)
internal/runtime/builtin/command/shell_nix.go (1)
internal/runtime/builtin/command/shell.go (1)
  • Shell (13-20)
internal/runtime/builtin/command/command_windows.go (1)
internal/common/fileutil/fileutil.go (1)
  • IsFile (49-55)
internal/runtime/builtin/command/command_script.go (1)
internal/common/cmdutil/cmdutil.go (1)
  • GetScriptExtension (306-338)
internal/runtime/builtin/command/command_test.go (2)
internal/runtime/subcmd.go (1)
  • Run (248-270)
internal/runtime/builtin/command/shell.go (1)
  • Shell (13-20)
🔇 Additional comments (20)
internal/runtime/builtin/command/command_builder.go (1)

9-19: Builder shape and fields look reasonable

The struct cleanly captures all the knobs needed for shell construction (dir, command, args, script, shell metadata, user-specified flag). No issues here from a modeling standpoint.

internal/runtime/builtin/command/shell_cmd.go (1)

9-19: cmdShell.Match is clear and correct

Match logic for "cmd.exe" / "cmd" is straightforward and aligned with expectations.

internal/runtime/builtin/command/command_script.go (1)

51-62: PowerShell preprocessing correctly enforces failure on non‑zero exit

The .ps1 branch prepends:

  • $ErrorActionPreference = 'Stop' to fail on cmdlet errors.
  • $PSNativeCommandUseErrorActionPreference = $true to propagate non‑zero native command exits (in PS 7+; harmless as a normal variable earlier).

This aligns well with the PR goal of failing on non‑zero interim commands in scripts. No changes needed here.

internal/common/config/env.go (1)

18-21: Platform‑specific whitelist initialization is reasonable; watch key casing

Moving defaultWhitelist to an empty map and populating it from env_windows.go / env_unix.go is a good step toward OS‑specific behavior.

Because filterEnv does a case‑sensitive allow[key] lookup, it’s important that the keys inserted in those init functions match the actual casing returned by os.Environ() on each platform (especially on Windows where env keys are case‑insensitive but case‑preserving). The rest of the logic here looks fine.

internal/runtime/builtin/command/command_unix.go (1)

1-11: Unix normalizeScriptPath behavior is appropriate

On non‑Windows builds you simply return b.ShellCommandArgs unchanged, keeping the existing Unix expectation that callers use ./script or absolute paths. This cleanly isolates the Windows‑specific correction logic to the windows build without changing Unix semantics.

internal/common/config/env_unix.go (1)

5-30: Unix whitelist contents are sensible and consistent

The selected Unix/macOS environment variables (PATH, HOME, USER, SHELL, TMPDIR, TERM, locale vars, TZ, LD_LIBRARY_PATH, XDG_* dirs) are a reasonable default set to propagate. Combined with the shared prefix rules, this should give child processes what they need without over‑exposing the environment.

internal/runtime/builtin/command/shell_direct.go (1)

29-47: LGTM! Clean direct shell implementation.

The validation logic correctly enforces array syntax for direct shell commands and provides clear error messages to guide users. The use of cloneArgs prevents mutation issues.

internal/integration/shell_test.go (2)

10-139: Excellent test coverage for direct shell execution.

The test suite thoroughly covers direct shell behavior including command arrays, exit codes, special character handling, and Dagu variable substitution. The parallel execution is appropriate for these independent tests.


219-359: Comprehensive shell feature testing.

Good coverage of shell-specific behaviors including command substitution, environment variables, multiline scripts, redirection, and glob expansion. These tests properly validate the distinction between direct and shell-wrapped execution.

internal/runtime/builtin/command/command_windows.go (1)

17-43: Well-designed Windows path normalization logic.

The implementation correctly handles the Windows-specific behavior where PowerShell and cmd.exe don't search the current directory by default. The restriction to script extensions prevents false positives when commands happen to match filenames.

internal/runtime/builtin/command/shell_powershell.go (1)

21-40: PowerShell command construction looks correct.

The implementation properly handles both Windows PowerShell and PowerShell Core, uses the appropriate -Command flag, and correctly bypasses shell wrapping for direct command execution.

internal/runtime/builtin/command/shell_unix.go (1)

19-44: Solid Unix shell implementation with proper errexit handling.

The conditional addition of the -e flag for Unix-like shells improves error handling while respecting user-specified shell configurations. The direct command bypass path is consistent with other shell implementations.

internal/runtime/builtin/command/command_windows_test.go (1)

15-127: Comprehensive test coverage for Windows path normalization.

The test cases thoroughly cover edge cases including subdirectories, absolute paths, command collisions, and non-existent files. The use of real temp files provides confidence in the file system interaction logic.

internal/runtime/builtin/command/shell.go (1)

10-42: Excellent shell abstraction design.

The Shell interface provides a clean abstraction for platform-specific command building. The registry-based approach with ordered matching is elegant and extensible. The fallback to unixShell ensures robustness.

internal/runtime/builtin/command/command.go (2)

131-140: LGTM - Dir field addition supports relative path script execution.

The addition of the Dir field to the shellCommandBuilder aligns with the PR objective of supporting relative path script execution. This ensures the builder has access to the working directory context when constructing commands.


168-182: Passing Command/Args to shellCommandBuilder when ShellCommandArgs is set is intentional.

The shell implementations (unixShell and nixShell) explicitly ignore Command and Args when ShellCommandArgs is set. The logic branches on whether Script exists: if Command != "" && Script != "", it uses Command/Args; otherwise, when ShellCommandArgs is set, it uses only ShellCommandArgs with the -c flag. This is documented in the code comments (shell_unix.go:35-37) and confirmed by existing tests (command_test.go:2023-2034).

internal/runtime/builtin/command/command_test.go (4)

22-85: Good test coverage for the direct shell type.

The TestDirectShell function provides comprehensive coverage of the "direct" shell execution path, including:

  • Command + Script combination
  • Command-only execution
  • Verification that ShellCommandArgs is correctly ignored when Command is set
  • Proper error handling for invalid configurations

160-205: LGTM - PowerShell test coverage is thorough.

The tests properly validate PowerShell command construction including:

  • Basic commands with -Command flag injection
  • Cases where -Command already exists in shell args
  • Script execution bypassing PowerShell when Command is specified

207-252: LGTM - Cmd.exe test coverage is appropriate.

The tests validate Windows cmd.exe command construction, including proper /c flag handling and script execution paths.


1-20: Comprehensive test coverage for the shell abstraction refactoring.

The test file provides excellent coverage including:

  • Direct shell execution paths
  • PowerShell and cmd.exe handling
  • Script execution with and without shebangs
  • Error handling and edge cases
  • Proper platform-specific test skipping

Comment thread internal/common/config/env_windows.go Outdated
@yohamta0

yohamta0 commented Dec 3, 2025

Copy link
Copy Markdown
Collaborator Author

@coderabbitai generate docstrings

@coderabbitai

coderabbitai Bot commented Dec 3, 2025

Copy link
Copy Markdown
✅ Actions performed

Initiated docstring generation; will generate only if new commits exist.

@coderabbitai

coderabbitai Bot commented Dec 3, 2025

Copy link
Copy Markdown

Note

Docstrings generation - SUCCESS
Generated docstrings for this pull request at #1441

coderabbitai Bot added a commit that referenced this pull request Dec 3, 2025
Docstrings generation was requested by @yottahmd.

* #1439 (comment)

The following files were modified:

* `internal/common/config/env.go`
* `internal/common/config/env_unix.go`
* `internal/common/config/env_windows.go`
* `internal/common/fileutil/fileutil.go`
* `internal/runtime/builtin/command/command.go`
* `internal/runtime/builtin/command/command_script.go`
* `internal/runtime/builtin/command/command_windows.go`
* `internal/runtime/builtin/command/shell.go`
* `internal/runtime/builtin/command/shell_unix.go`
Docstrings generation was requested by @yottahmd.

* #1439 (comment)

The following files were modified:

* `internal/common/config/env.go`
* `internal/common/config/env_unix.go`
* `internal/common/config/env_windows.go`
* `internal/common/fileutil/fileutil.go`
* `internal/runtime/builtin/command/command.go`
* `internal/runtime/builtin/command/command_script.go`
* `internal/runtime/builtin/command/command_windows.go`
* `internal/runtime/builtin/command/shell.go`
* `internal/runtime/builtin/command/shell_unix.go`

<details>
<summary>These files were ignored</summary>

* `internal/integration/shell_test.go`
* `internal/runtime/builtin/command/command_test.go`
* `internal/runtime/builtin/command/command_windows_test.go`

</details>

<details>
<summary>ℹ️ Note</summary><blockquote>

CodeRabbit cannot perform edits on its own pull requests yet.

</blockquote></details>

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Yota Hamada <yohamta@gmail.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
internal/common/config/env_windows.go (1)

9-25: COMSPEC casing still inconsistent with Windows env / whitelist lookup

The whitelist still uses "COMSPEC" while Windows commonly exposes this variable as "ComSpec". With a case‑sensitive defaultWhitelist lookup, ComSpec from os.Environ() will not be whitelisted, so this base variable is still effectively filtered out.

Recommend aligning with the actual casing (e.g. replace with "ComSpec", or include both "ComSpec" and "COMSPEC" if you want to be extra defensive), or normalize casing in filterEnv before lookup.

-        "COMSPEC",     // cmd.exe
+        "ComSpec",     // cmd.exe (canonical Windows casing)
🧹 Nitpick comments (2)
internal/runtime/builtin/command/shell_unix.go (1)

19-44: Defensively guard against an empty Shell slice in unixShell.Build

Build assumes len(b.Shell) > 0:

cmd := b.Shell[0]
args := cloneArgs(b.Shell[1:])

If shellCommandBuilder ever routes here with an empty Shell slice, this will panic. That may be impossible by construction, but it’s not enforced locally.

Consider adding a small invariant check (returning an error or similar) so a misconfigured builder can’t crash the worker:

if len(b.Shell) == 0 {
    return nil, fmt.Errorf("unixShell.Build: empty Shell slice")
}
cmd := b.Shell[0]
args := cloneArgs(b.Shell[1:])
internal/runtime/builtin/command/command.go (1)

252-305: Exit-code normalization and executor registration are reasonable

exitCodeFromError cleanly centralizes exit-code extraction from *exec.ExitError, returning 0 for nil and 1 for non‑exit errors, and it’s used in both the Start and Wait error paths so ExitCode() won’t remain 0 on failures. NewCommand/NewCommandConfig and the init registration of "", "shell", "command" executors also look coherent with the runtime/env design (Dir and Shell sourced from runtime.GetEnv(ctx) and UserSpecifiedShell derived from step.Shell != "").

If you ever expect wrapped errors around *exec.ExitError, consider switching to errors.As in exitCodeFromError for extra robustness, but it’s not strictly necessary with the current call sites.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4933434 and 5e1565d.

📒 Files selected for processing (9)
  • internal/common/config/env.go (1 hunks)
  • internal/common/config/env_unix.go (1 hunks)
  • internal/common/config/env_windows.go (1 hunks)
  • internal/common/fileutil/fileutil.go (1 hunks)
  • internal/runtime/builtin/command/command.go (6 hunks)
  • internal/runtime/builtin/command/command_script.go (1 hunks)
  • internal/runtime/builtin/command/command_windows.go (1 hunks)
  • internal/runtime/builtin/command/shell.go (1 hunks)
  • internal/runtime/builtin/command/shell_unix.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • internal/common/fileutil/fileutil.go
  • internal/common/config/env_unix.go
  • internal/runtime/builtin/command/command_windows.go
  • internal/common/config/env.go
  • internal/runtime/builtin/command/command_script.go
🧰 Additional context used
🧬 Code graph analysis (1)
internal/runtime/builtin/command/command.go (1)
internal/common/logger/tag/tag.go (3)
  • Dir (77-79)
  • Command (403-405)
  • Args (408-410)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test on ubuntu-latest
🔇 Additional comments (2)
internal/runtime/builtin/command/shell.go (1)

10-44: Shell interface and registry wiring look solid

The Shell interface, registry ordering, and findShell normalization (strings.ToLower(filepath.Base(cmd))) cleanly capture the “first match wins, unixShell as fallback” model. Given unixShell.Match always returns true, the loop guarantees a match as intended.

internal/runtime/builtin/command/command.go (1)

127-201: Routing command+script and shell-command flows through shellCommandBuilder looks correct

The two updated branches:

  • cfg.Command != "" && scriptFile != ""
  • len(cfg.Shell) > 0 && cfg.ShellCommandArgs != ""

now both use shellCommandBuilder with Dir, Command, Args, Shell, ShellCommandArgs, ShellPackages, and UserSpecifiedShell populated. Given the common post-build setup (cmd.Env, cmd.Dir, cmd.Stdout, cmd.Stderr, cmdutil.SetupCommand), this is consistent with the new shell abstraction and keeps Dir/script handling centralized.

I don’t see functional issues here as long as shellCommandBuilder.Build honours Dir and UserSpecifiedShell as expected.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
internal/common/config/env_windows.go (2)

5-7: Relocate the init comment after imports.

The comment about init (lines 5-6) is placed before the import statement, which is unconventional. By Go convention, import statements typically follow immediately after the package declaration, with package-level comments placed after imports.

Apply this diff to improve code organization:

 package config
 
-// init populates the package's defaultWhitelist with common Windows environment
-// variable names so they are treated as whitelisted on Windows builds.
 import "strings"
 
+// init populates the package's defaultWhitelist with common Windows environment
+// variable names so they are treated as whitelisted on Windows builds.
+
 // normalizeEnvKey converts to uppercase on Windows for case-insensitive matching.

15-32: LGTM! Addresses the previous casing concern.

The uppercase whitelist keys combined with normalizeEnvKey in filterEnv (env.go:50) correctly resolves case-insensitive Windows environment variable matching. When os.Environ() returns variables like "ComSpec" or "COMSPEC", they are normalized to uppercase before the whitelist lookup, ensuring consistent matching.

The selected environment variables are appropriate for Windows: USERPROFILE for user home, SYSTEMROOT/WINDIR for the system directory, COMSPEC for the command interpreter, PATH/PATHEXT for executables, TEMP/TMP for temporary files, PSMODULEPATH for PowerShell, and HOME for Git/Go compatibility.

However, consider adding explicit tests for Windows-specific behavior: normalizeEnvKey case conversion and whitelist matching with case variations (e.g., verifying "ComSpec=cmd.exe" is properly filtered).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e1565d and b97e67e.

📒 Files selected for processing (4)
  • internal/common/config/env.go (2 hunks)
  • internal/common/config/env_unix.go (1 hunks)
  • internal/common/config/env_windows.go (1 hunks)
  • internal/common/fileutil/fileutil_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/common/config/env_unix.go
🧰 Additional context used
🧬 Code graph analysis (1)
internal/common/fileutil/fileutil_test.go (1)
internal/common/fileutil/fileutil.go (1)
  • IsFile (51-57)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test on ubuntu-latest
🔇 Additional comments (5)
internal/common/fileutil/fileutil_test.go (1)

191-221: LGTM! Comprehensive test coverage for IsFile.

The test suite properly covers the three main scenarios: regular file, directory, and non-existent path. The parallel test structure is correctly implemented—t.TempDir() ensures cleanup waits for all subtests, and each subtest uses unique filenames to avoid conflicts.

internal/common/config/env.go (3)

13-14: LGTM! Clear documentation of storage semantics.

The updated documentation explicitly clarifies that the slice is stored without copying and recommends defensive copying by callers when isolation is needed. This transparency about memory semantics helps prevent unintended mutations and follows Go best practices.


19-21: LGTM! Architectural improvement for platform-specific behavior.

Moving from a hard-coded whitelist to platform-specific initialization via init() functions improves maintainability and separation of concerns. This aligns with the PR objective to support OS-specific environments and addresses the maintainability concerns raised in the PR comments.


49-50: LGTM! Elegant solution for cross-platform key matching.

The introduction of normalizeEnvKey elegantly handles platform differences in environment variable casing. On Windows, it uppercases keys for case-insensitive matching; on Unix, it likely returns keys unchanged. This approach resolves the previous review concern about mixed casing in the Windows whitelist while maintaining clean, platform-agnostic code in the core filtering logic.

internal/common/config/env_windows.go (1)

9-13: No action required – cross-platform implementation is already in place.

The normalizeEnvKey function is correctly implemented with proper build tags: env_windows.go (//go:build windows) provides case-insensitive matching, while env_unix.go (//go:build !windows) returns the key unchanged. Go's build system automatically selects the correct version per platform, so env.go can safely call this function without compilation issues.

Likely an incorrect or invalid review comment.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
internal/runtime/builtin/command/shell_cmd.go (2)

26-49: Path resolution is reasonable; consider COMSPEC for extra robustness

The resolveCmdPath flow (default C:\Windows\System32\cmd.exeSystemRoot\System32 → fall back to original) looks correct and should avoid most Go exec security restrictions around relative paths.

If you want to harden this further for non‑standard installs or customized command interpreters, consider optionally checking the COMSPEC environment variable first and validating it points to an existing cmd.exe before falling back to the hard‑coded/default locations. That would better honor user/OS configuration while keeping the same overall behavior.


51-74: Tighten Build arg handling and guard against empty script paths

The overall Build flow looks correct: resolve cmd.exe, preserve any shell args, ensure /c or /C is present, and append the normalized script path.

Two small maintainability/robustness tweaks you might consider:

  1. Avoid cloning b.Shell[1:] when the early b.Command != "" && b.Script != "" branch returns immediately.
  2. Only append the normalized script path if it’s non‑empty, so a future change that allows “no script” for cmd won’t result in cmd.exe /c "".

A refactor along these lines keeps semantics while tightening things up:

 func (s *cmdShell) Build(ctx context.Context, b *shellCommandBuilder) (*exec.Cmd, error) {
-	cmd := resolveCmdPath(b.Shell[0])
-	args := cloneArgs(b.Shell[1:])
-
-	// When running a command directly with a script, don't include cmd.exe arguments
-	if b.Command != "" && b.Script != "" {
-		args := cloneArgs(b.Args)
-		if b.Script != "" {
-			args = append(args, b.Script)
-		}
-		return exec.CommandContext(ctx, b.Command, args...), nil // nolint: gosec
-	}
-
-	args = append(args, b.Args...)
+	cmd := resolveCmdPath(b.Shell[0])
+
+	// When running a command directly with a script, don't include cmd.exe arguments
+	if b.Command != "" && b.Script != "" {
+		args := cloneArgs(b.Args)
+		args = append(args, b.Script)
+		return exec.CommandContext(ctx, b.Command, args...), nil // nolint: gosec
+	}
+
+	args := cloneArgs(b.Shell[1:])
+	args = append(args, b.Args...)

 	// cmd.exe uses /c instead of -c
 	if !slices.Contains(args, "/c") && !slices.Contains(args, "/C") {
 		args = append(args, "/c")
 	}

-	args = append(args, b.normalizeScriptPath())
+	if scriptPath := b.normalizeScriptPath(); scriptPath != "" {
+		args = append(args, scriptPath)
+	}

 	return exec.CommandContext(ctx, cmd, args...), nil // nolint: gosec
 }

This should keep behavior identical for the current flows while making the function a bit safer against future builder changes.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c67d100 and 4aade60.

📒 Files selected for processing (1)
  • internal/runtime/builtin/command/shell_cmd.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/runtime/builtin/command/shell_cmd.go (1)
internal/runtime/builtin/command/shell.go (1)
  • Shell (13-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test on ubuntu-latest
🔇 Additional comments (1)
internal/runtime/builtin/command/shell_cmd.go (1)

14-21: cmdShell.Match logic is clear and correct

The matcher cleanly handles both cmd.exe and cmd and aligns with the documented convention that names are lowercase and pathless. No changes needed here.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
internal/runtime/builtin/command/shell_unix.go (1)

68-92: Consider adding tcsh and csh to shell detection.

The isUnixLikeShell function covers common Bourne-compatible shells but omits csh/tcsh. While these shells don't support -e the same way, the function could be renamed or documented to clarify it specifically identifies "errexit-compatible" shells.

-// isUnixLikeShell reports whether the named shell is known to support the -e (errexit) flag.
-// It returns true for "sh", "bash", "zsh", "ksh", "ash", and "dash"; returns false for "fish", an empty string, or any unrecognized shell.
-func isUnixLikeShell(shell string) bool {
+// isErrexitCompatibleShell reports whether the named shell supports the -e (errexit) flag.
+// Returns true for POSIX-compatible shells (sh, bash, zsh, ksh, ash, dash).
+// Returns false for fish, csh, tcsh, empty string, or unrecognized shells.
+func isErrexitCompatibleShell(shell string) bool {
internal/runtime/builtin/command/command_test.go (1)

1380-1420: Consider adding timeout to concurrent test.

The concurrent execution test uses sleep 0.1 which could be flaky. Consider adding a test timeout or using a more deterministic synchronization mechanism.

 func TestCommandExecutor_ConcurrentRun(t *testing.T) {
 	if goruntime.GOOS == "windows" {
 		t.Skip("Skipping Unix-specific test on Windows")
 	}
+	t.Parallel()

 	step := core.Step{
 		Name:    "test",
-		Command: "sleep",
-		Args:    []string{"0.1"},
+		Command: "true", // Use a fast command instead of sleep
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 264d21b and bd2e4e6.

📒 Files selected for processing (6)
  • internal/runtime/builtin/command/command.go (7 hunks)
  • internal/runtime/builtin/command/command_test.go (8 hunks)
  • internal/runtime/builtin/command/shell_cmd.go (1 hunks)
  • internal/runtime/builtin/command/shell_nix.go (1 hunks)
  • internal/runtime/builtin/command/shell_powershell.go (1 hunks)
  • internal/runtime/builtin/command/shell_unix.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
internal/runtime/builtin/command/shell_nix.go (2)
internal/cmd/context.go (1)
  • Context (38-53)
internal/runtime/builtin/command/shell.go (1)
  • Shell (13-20)
internal/runtime/builtin/command/shell_powershell.go (1)
internal/runtime/builtin/command/shell.go (1)
  • Shell (13-20)
internal/runtime/builtin/command/command.go (2)
internal/common/logger/tag/tag.go (4)
  • Dir (77-79)
  • Command (403-405)
  • Args (408-410)
  • ExitCode (99-101)
internal/runtime/builtin/command/shell.go (1)
  • Shell (13-20)
internal/runtime/builtin/command/command_test.go (4)
internal/runtime/subcmd.go (1)
  • Run (248-270)
internal/runtime/builtin/command/shell.go (1)
  • Shell (13-20)
internal/common/logger/tag/tag.go (3)
  • Command (403-405)
  • Args (408-410)
  • Error (20-22)
internal/test/helper.go (1)
  • Contains (519-519)
internal/runtime/builtin/command/shell_cmd.go (1)
internal/runtime/builtin/command/shell.go (1)
  • Shell (13-20)
internal/runtime/builtin/command/shell_unix.go (1)
internal/runtime/builtin/command/shell.go (1)
  • Shell (13-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test on ubuntu-latest
🔇 Additional comments (17)
internal/runtime/builtin/command/shell_powershell.go (2)

1-19: LGTM!

The Match implementation correctly identifies all common PowerShell executables (Windows PowerShell and PowerShell Core).


26-30: This design is intentional and does not require changes.

The code correctly distinguishes between running an explicit interpreter directly (e.g., python script.py) versus running a PowerShell script file. When both Command and Script are specified, bypassing PowerShell wrapping provides cross-platform consistency. This behavior is explicitly tested in TestCommandExecutor_CommandWithScript and is implemented identically in shell_cmd.go, confirming it's a deliberate architectural pattern.

internal/runtime/builtin/command/shell_cmd.go (3)

1-21: LGTM on package structure and Match implementation.

The cmdShell type correctly implements the Shell interface. The Match method properly handles both cmd.exe and cmd variants.


23-58: Solid fallback chain for resolving cmd.exe path.

The resolveCmdPath function implements a robust resolution strategy: COMSPEC → default path → SystemRoot fallback. This handles non-standard Windows installations correctly.


60-97: This concern is incorrect; b.Script and b.ShellCommandArgs are different fields with different purposes.

The normalizeScriptPath() method operates on b.ShellCommandArgs, not b.Script. In the script-only branch, b.Script is a direct file path already provided by the user, while in the command-string branch, b.ShellCommandArgs is a command string that may require path normalization for Windows script extensions. The two code paths intentionally handle different data structures and are not inconsistent.

Likely an incorrect or invalid review comment.

internal/runtime/builtin/command/shell_unix.go (2)

11-18: LGTM on fallback shell design.

The unixShell with Match always returning true correctly serves as the catch-all fallback in the shell registry.


20-66: Well-structured Build logic with proper errexit handling.

The three execution paths are cleanly separated. The automatic -e flag injection for Unix-like shells (when not user-specified) aligns with the PR objective of supporting failure on non-zero interim commands.

internal/runtime/builtin/command/shell_nix.go (2)

10-15: LGTM on Match implementation.

Clean and precise matching for nix-shell.


17-73: --run argument positioning is correct; the review concern is unfounded.

The code correctly positions --run and its command argument. When line 32 appends "--run" to args, subsequent calls to append(args, cmdStr) (lines 49, 59, 69) place the command string immediately after it in the slice, which is the correct order. The test "NixShellWithScriptOnly" validates this by expecting {"--run", "set -e; /tmp/script.sh"} consecutively in the args. Nix-shell requires --run to be followed directly by its command string, and the current implementation satisfies this requirement.

Extract the duplicated set -e prefix logic into a helper function. The prefix prepending pattern appears identically in three branches (lines 45–47, 56–58, 64–66). A helper function would improve readability and maintainability.

Likely an incorrect or invalid review comment.

internal/runtime/builtin/command/command.go (4)

70-75: Good security improvement with tighter directory permissions.

Changing to 0750 restricts group write and other access, following the principle of least privilege.


127-170: Consistent Dir propagation to shell command builders.

The Dir field is now properly passed to shellCommandBuilder across all code paths, ensuring working directory context is available for script path normalization.


256-267: Clean exit code extraction with proper error unwrapping.

Using errors.As correctly handles wrapped errors, making this more robust than type assertions.


269-307: Well-documented public API with proper init registration.

The docstrings clearly describe the behavior of NewCommand and NewCommandConfig. Registering for "", "shell", and "command" executor types provides good backward compatibility.

internal/runtime/builtin/command/command_test.go (4)

22-85: Comprehensive direct shell tests.

Good coverage of the direct shell execution paths including error cases for missing command field and string-only command syntax.


264-335: Excellent shell-script-only coverage across all shell types.

Testing each shell's script-only behavior (cmd, PowerShell, pwsh, bash, sh, nix-shell) ensures consistent behavior across platforms.


1099-1105: Appropriate platform-specific permission check.

Correctly skipping the Unix permission check on Windows where the permission model differs.


2179-2196: Good edge case test for scanner buffer limits.

Testing the 4KB buffer limit for shebang detection helps ensure the code handles edge cases gracefully.

}

func (s *powerShell) Build(ctx context.Context, b *shellCommandBuilder) (*exec.Cmd, error) {
cmd := b.Shell[0]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Add bounds check for b.Shell[0].

Accessing b.Shell[0] without verifying that b.Shell is non-empty will panic if the slice is empty or nil.

Apply this diff to add defensive bounds checking:

 func (s *powerShell) Build(ctx context.Context, b *shellCommandBuilder) (*exec.Cmd, error) {
+	if len(b.Shell) == 0 {
+		return nil, fmt.Errorf("shell executable not specified")
+	}
 	cmd := b.Shell[0]

Note: You'll need to import "fmt" if not already imported.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cmd := b.Shell[0]
func (s *powerShell) Build(ctx context.Context, b *shellCommandBuilder) (*exec.Cmd, error) {
if len(b.Shell) == 0 {
return nil, fmt.Errorf("shell executable not specified")
}
cmd := b.Shell[0]
🤖 Prompt for AI Agents
In internal/runtime/builtin/command/shell_powershell.go around line 22, the code
unconditionally uses b.Shell[0] which will panic if b.Shell is nil or empty; add
a defensive bounds check that verifies len(b.Shell) > 0 before accessing index
0, return or wrap an error (using fmt.Errorf) if the slice is empty, and ensure
"fmt" is imported if not already present.

Comment on lines +34 to +37
if b.Script != "" {
args := []string{"-ExecutionPolicy", "Bypass", "-File", b.Script}
return exec.CommandContext(ctx, cmd, args...), nil // nolint: gosec
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Include script arguments in PowerShell -File invocation.

When running a PowerShell script with the -File flag, any additional arguments in b.Args should be passed to the script. Currently, only b.Script is included, which means script arguments are ignored.

Apply this diff to include script arguments:

 	// When running just a script file with PowerShell (no explicit command)
 	// e.g., powershell -ExecutionPolicy Bypass -File script.ps1
 	if b.Script != "" {
 		args := []string{"-ExecutionPolicy", "Bypass", "-File", b.Script}
+		args = append(args, b.Args...)
 		return exec.CommandContext(ctx, cmd, args...), nil // nolint: gosec
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if b.Script != "" {
args := []string{"-ExecutionPolicy", "Bypass", "-File", b.Script}
return exec.CommandContext(ctx, cmd, args...), nil // nolint: gosec
}
if b.Script != "" {
args := []string{"-ExecutionPolicy", "Bypass", "-File", b.Script}
args = append(args, b.Args...)
return exec.CommandContext(ctx, cmd, args...), nil // nolint: gosec
}
🤖 Prompt for AI Agents
In internal/runtime/builtin/command/shell_powershell.go around lines 34 to 37,
the code builds args for PowerShell with "-ExecutionPolicy", "Bypass", "-File",
b.Script but does not forward b.Args to the script; update the args construction
to append b.Args (if any) to the args slice so exec.CommandContext(ctx, cmd,
args...) passes script arguments through (preserve the existing nolint comment
and ensure b.Args is appended safely).

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/common/fileutil/logutil.go (1)

314-322: Division‑by‑zero risk in estimateLineCount for files with no sampled newlines

If both startLineCount and middleLineCount are zero (e.g., a large file with a single very long line or very sparse newlines), avgLineLength will divide by zero, causing a panic. This is independent of the new encoding changes but is a correctness issue for large logs.

-    // Calculate average line length from samples
-    var avgLineLength float64
-    if fileSize > sampleSize*2 {
-        // Average of start and middle samples
-        avgLineLength = float64(sampleSize*2) / float64(startLineCount+middleLineCount)
-    } else {
-        // Just use start sample
-        avgLineLength = float64(sampleSize) / float64(startLineCount)
-    }
+    // Calculate average line length from samples
+    totalSampleLines := startLineCount + middleLineCount
+    if totalSampleLines == 0 {
+        // Fallback to exact count in degenerate cases (e.g. huge single-line files)
+        exactCount, err := countLinesExact(filePath)
+        if err != nil {
+            return 0, false, err
+        }
+        return exactCount, false, nil
+    }
+
+    var avgLineLength float64
+    if fileSize > sampleSize*2 {
+        // Average of start and middle samples
+        avgLineLength = float64(sampleSize*2) / float64(totalSampleLines)
+    } else {
+        // Just use start sample
+        avgLineLength = float64(sampleSize) / float64(startLineCount)
+    }

Also applies to: 325-329

🧹 Nitpick comments (2)
internal/common/fileutil/logutil.go (2)

40-176: Encoding decoder helper is comprehensive but could be made more data‑driven

The charset normalization and alias coverage here is extensive and well thought out. Long term, consider refactoring this giant switch into a table‑driven map (normalizedName → encoding) to reduce cognitive load and make it easier to add or tweak aliases.

-func getEncodingDecoder(charset string) *encoding.Decoder {
-    if charset == "" {
-        return nil
-    }
-
-    // Normalize the charset name for comparison
-    normalized := strings.ToLower(strings.ReplaceAll(charset, "_", "-"))
-    normalized = strings.ReplaceAll(normalized, " ", "-")
-
-    switch normalized {
-    // ...
-    }
-}
+var encodingDecoders = map[string]*encoding.Decoder{
+    // utf-8 / default: handled by nil
+    "euc-jp": japanese.EUCJP.NewDecoder(),
+    // (populate with the existing aliases and decoders)
+}
+
+func getEncodingDecoder(charset string) *encoding.Decoder {
+    if charset == "" {
+        return nil
+    }
+
+    normalized := strings.ToLower(strings.ReplaceAll(charset, "_", "-"))
+    normalized = strings.ReplaceAll(normalized, " ", "-")
+
+    if dec, ok := encodingDecoders[normalized]; ok {
+        return dec
+    }
+    // Unknown or utf-8 → nil (no decoding)
+    return nil
+}

538-544: Use strings.Builder or Join in ReadLogContent to avoid quadratic concatenation

Building content via repeated += on strings is O(n²) for many lines. Since you already have []string, you can construct the result more efficiently.

-func ReadLogContent(filePath string, options LogReadOptions) (string, int, int, bool, bool, error) {
-    result, err := ReadLogLines(filePath, options)
+func ReadLogContent(filePath string, options LogReadOptions) (string, int, int, bool, bool, error) {
+    result, err := ReadLogLines(filePath, options)
     if err != nil {
         return "", 0, 0, false, false, err
     }
 
-    content := ""
-    for i, line := range result.Lines {
-        content += line
-        if i < len(result.Lines)-1 {
-            content += "\n"
-        }
-    }
+    // Preserve existing behavior (newline-separated, no trailing newline)
+    content := strings.Join(result.Lines, "\n")
 
     return content, result.LineCount, result.TotalLines, result.HasMore, result.IsEstimate, nil
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd2e4e6 and 2e47bb5.

📒 Files selected for processing (3)
  • internal/common/fileutil/logutil.go (13 hunks)
  • internal/common/fileutil/logutil_test.go (3 hunks)
  • internal/service/frontend/api/v2/dagruns.go (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
internal/service/frontend/api/v2/dagruns.go (1)
api/v2/api.gen.go (4)
  • Head (946-946)
  • Tail (967-967)
  • Offset (952-952)
  • Limit (949-949)
internal/common/fileutil/logutil.go (1)
api/v2/api.gen.go (3)
  • Head (946-946)
  • Tail (967-967)
  • Offset (952-952)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test on ubuntu-latest
🔇 Additional comments (4)
internal/common/fileutil/logutil_test.go (1)

431-431: Tests correctly adapted to new decoder parameter

The extra nil argument at these call sites matches the updated helper signatures and preserves the previous behavior (no decoding for default UTF‑8 logs). No issues from the change itself.

Also applies to: 542-542, 680-680

internal/service/frontend/api/v2/dagruns.go (1)

311-315: Encoding plumbed through log read options consistently

Wiring Encoding: a.logEncodingCharset into all DAG/SubDAG log endpoints keeps pagination behavior the same while enabling charset-aware decoding. The pattern is consistent across all four handlers and looks correct.

Also applies to: 356-360, 491-495, 534-538

internal/common/fileutil/logutil.go (2)

24-29: LogReadOptions.Encoding addition is clear and backward compatible

Adding Encoding string to LogReadOptions cleanly extends the API without affecting existing callers; empty string naturally maps to “no decoding”. The inline comment documents expected values sufficiently for now.


202-205: Decoder propagation through log readers is consistent and safe

Deriving a single decoder from options.Encoding and threading it through readFirstLines, readLastLines, and readLinesRange, with conditional transform.NewReader wrapping, keeps the non‑UTF‑8 path localized and preserves behavior when Encoding is empty/UTF‑8 (decoder is nil). The ownership and lifetime of the decoder are confined to a single read path, so there are no obvious state or concurrency hazards.

Also applies to: 213-251, 357-397, 400-470, 473-527

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
internal/common/config/loader_windows.go (1)

1-83: Windows code-page→charset mapping looks good; consider tests and documenting fallback behavior

The Windows implementation is clear: you read the system code page once and map it through a focused switch, with a sensible UTF‑8 fallback for anything unknown. That should give good defaults for the vast majority of real-world locales.

As non-blocking follow-ups, you might:

  • Add a small table-driven unit test for codePageToCharset (a few known pages plus an unknown) to prevent regressions.
  • Consider a brief comment or doc note about the fallback-to-UTF‑8 behavior so it’s explicit how truly unknown/legacy code pages will be treated.
internal/common/config/loader_unix.go (1)

1-131: Unix locale→charset detection is solid; minor hardening could be added later

The Unix-side logic is well-structured: LC_ALL/LANG precedence is correct, explicit encodings are parsed and normalized sensibly, and you fall back to UTF‑8 (via inferEncodingFromLocale) in ambiguous or modern-default cases. The normalization table covers the usual suspects (UTF‑8 variants, EUC/Shift‑JIS, GBK/Big5, ISO‑8859, KOI8).

If you want a bit more robustness later (non-blocking):

  • strings.TrimSpace on the initial locale/encoding values would make the parsing more tolerant of stray whitespace.
  • A few focused unit tests for normalizeUnixEncoding and getDefaultLogEncodingCharset (e.g., ja_JP.UTF-8, en_US.ISO-8859-1, C, POSIX) would lock in the behavior across platforms.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e47bb5 and ffd7d2d.

📒 Files selected for processing (3)
  • internal/common/config/loader.go (1 hunks)
  • internal/common/config/loader_unix.go (1 hunks)
  • internal/common/config/loader_windows.go (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test on ubuntu-latest
🔇 Additional comments (1)
internal/common/config/loader.go (1)

548-552: Locale-aware default for ui.logEncodingCharset is wired correctly

Switching the default from a hard-coded "utf-8" to getDefaultLogEncodingCharset() cleanly delegates charset choice to the new OS-specific helpers while still allowing config/env overrides through viper. The change is minimal, self-contained, and consistent with the rest of the loader flow.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/common/fileutil/logutil.go (1)

363-403: bufio.Scanner's default 64KB token limit requires explicit handling

While transform.NewReader wrapping the file is the correct pattern, using bufio.Scanner without calling scanner.Buffer() to configure the token size may fail on log files with lines exceeding ~64KB. Either call scanner.Buffer(make([]byte, initialSize), maxTokenSize) before the scan loop to set an appropriate maximum, or use bufio.Reader.ReadString('\n') instead for more robust handling of arbitrarily long lines. The bufio.ErrTooLong error handling is present but only catches the failure after it occurs; it doesn't prevent the limitation.

🧹 Nitpick comments (2)
internal/common/fileutil/logutil.go (2)

479-534: Consider consistent handling of very long lines in range reads

readLinesRange now correctly wraps the reader with transform.NewReader when a decoder is present, matching the other helpers. One small inconsistency:

  • readFirstLines and readLastLines special‑case bufio.ErrTooLong and return a friendly "Error: Line too long to read." message.
  • readLinesRange still propagates scanner.Err() directly, which will surface bufio.ErrTooLong as a raw error to callers of the range API.

If API symmetry matters for your callers, consider matching the head/tail behavior here:

-    if err := scanner.Err(); err != nil {
-        return nil, err
-    }
+    if err := scanner.Err(); err != nil {
+        if errors.Is(err, bufio.ErrTooLong) {
+            return &LogResult{Lines: []string{"Error: Line too long to read."}}, nil
+        }
+        return nil, err
+    }

40-176: Consider using ianaindex to reduce charset alias maintenance and improve unknown-encoding transparency

The manual alias table is comprehensive and correct, but long-term maintainability can be improved. Two points:

  1. Silent fallback for unknown charsets: Treating unrecognized charset names as UTF-8 without logging creates silent data loss. For troubleshooting garbled logs, either log a warning when charset is not recognized or optionally return an error. This surfaces the problem instead of masking it.

  2. Charset alias maintenance: The manual switch handles ~100+ aliases. Consider leveraging golang.org/x/text/encoding/ianaindex.MIME (or .IANA) instead. It provides case-insensitive lookup of IANA-registered charset names and returns encoding.Encoding objects compatible with your .NewDecoder() pattern. This eliminates bookkeeping and ensures consistency with IANA standards. Note: ianaindex.Encoding() returns (nil, nil) for valid but unsupported charsets (distinct from unknown names), so you'll need explicit handling for that case when adopting it.

These are non-blocking suggestions; the current implementation is functionally correct.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ffd7d2d and 271726f.

📒 Files selected for processing (1)
  • internal/common/fileutil/logutil.go (14 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/common/fileutil/logutil.go (1)
api/v2/api.gen.go (3)
  • Head (946-946)
  • Tail (967-967)
  • Offset (952-952)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test on ubuntu-latest
🔇 Additional comments (3)
internal/common/fileutil/logutil.go (3)

10-29: Struct and import changes look good

The added imports and the Encoding field on LogReadOptions are straightforward and keep defaults backward compatible (empty string → UTF-8 path via nil decoder). No issues here.


202-251: Decoder plumbing through ReadLogLines and helpers looks correct

Threading decoder := getEncodingDecoder(options.Encoding) through the head/tail/range helpers is clean and keeps UTF‑8 behavior unchanged (nil decoder). The control flow for Head, Tail, and Offset/Limit is preserved, and IsEstimate is still set only at the top level. No functional issues spotted in this wiring.


406-476: readLastLines decoder usage is correct; control flow preserved

The decoder integration mirrors readFirstLines: you correctly reuse the same decoder when delegating to readFirstLines for n >= totalLines and otherwise wrap the file via transform.NewReader. The ring‑buffer logic is unchanged and still safe under both decoded and undecoded paths. No correctness issues found.

Comment thread internal/common/fileutil/logutil.go

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
internal/common/fileutil/logutil.go (1)

315-319: Address the past review comment: mark zero-sample result as an estimate.

This issue was previously flagged: when totalSampleLines == 0, the function returns (1, false, nil), marking the result as exact even though it's a heuristic guess. The IsEstimate flag should be true to indicate that the line count is estimated.

Apply this diff:

 	if totalSampleLines == 0 {
 		// Avoid division by zero; assume 1 line
-		return 1, false, nil
+		return 1, true, nil
 	}
🧹 Nitpick comments (2)
internal/common/fileutil/logutil.go (2)

172-175: Consider logging or returning an error for unknown encodings.

Returning nil for unknown encodings silently treats them as UTF-8, which may mask configuration errors or typos in encoding names. Consider logging a warning to help users detect misconfigurations.


12-19: golang.org/x/text v0.29.0 is safe but not the latest version.

The project uses v0.29.0, which is free from known vulnerabilities. However, v0.31.0 is the current latest release (as of December 2025). While the current version is reasonably recent and secure, consider updating to v0.31.0 if newer features or performance improvements are beneficial.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 271726f and 028673c.

📒 Files selected for processing (3)
  • internal/common/config/loader_unix.go (1 hunks)
  • internal/common/config/loader_windows.go (1 hunks)
  • internal/common/fileutil/logutil.go (15 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/common/config/loader_unix.go
  • internal/common/config/loader_windows.go
🧰 Additional context used
🧬 Code graph analysis (1)
internal/common/fileutil/logutil.go (1)
api/v2/api.gen.go (3)
  • Head (946-946)
  • Tail (967-967)
  • Offset (952-952)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test on ubuntu-latest
🔇 Additional comments (6)
internal/common/fileutil/logutil.go (6)

10-19: LGTM! Imports are appropriate for encoding support.

The added imports support the new character encoding functionality across multiple language families (Japanese, Chinese, Korean, Latin, etc.).


24-29: LGTM! Clear field addition with helpful documentation.

The Encoding field and its comment appropriately document the supported character encodings.


363-403: LGTM! Decoder integration is correct.

The function properly wraps the file reader with transform.NewReader when a decoder is provided, enabling transparent character set conversion while reading.


406-476: LGTM! Decoder integration preserves ring-buffer logic.

The function correctly integrates the decoder while maintaining the existing ring-buffer approach for efficiently reading the last n lines.


479-534: LGTM! Consistent decoder integration.

The range-reading function follows the same decoder-wrapping pattern as the other helpers, ensuring consistent behavior across all reading modes.


544-544: LGTM! Using strings.Join is more efficient.

This is a better approach than string concatenation in a loop.

Comment thread internal/common/fileutil/logutil.go Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
internal/common/fileutil/logutil.go (1)

67-68: GB2312 decoder mapping is still incorrect.

This critical issue was previously identified and marked as addressed, but the code still contains the incorrect mapping. GB2312 is an 8-bit encoding, but the code maps it to HZGB2312.NewDecoder(), which decodes HZ-GB-2312 (a 7-bit ASCII-safe encoding). This will cause incorrect decoding of GB2312-encoded log files.

Apply this diff to fix the mapping:

 // Simplified Chinese encodings
 case "gb2312", "gb-2312", "csgb2312":
-    return simplifiedchinese.HZGB2312.NewDecoder()
+    return simplifiedchinese.GBK.NewDecoder()

Note: GBK is a superset of GB2312 and correctly decodes GB2312-encoded text.

🧹 Nitpick comments (2)
internal/common/config/loader_unix.go (1)

14-22: Consider checking LC_CTYPE for complete POSIX precedence.

The standard POSIX locale precedence is LC_ALL > LC_CTYPE > LANG. Since LC_CTYPE specifically controls character encoding behavior, it may be worth including it for completeness:

 func getDefaultLogEncodingCharset() string {
 	// Check LC_ALL first, then LANG (standard precedence)
 	locale := os.Getenv("LC_ALL")
 	if locale == "" {
+		locale = os.Getenv("LC_CTYPE")
+	}
+	if locale == "" {
 		locale = os.Getenv("LANG")
 	}
internal/common/fileutil/logutil.go (1)

167-170: Consider consistent BOM handling for explicitly-specified UTF-16 endianness.

When UTF-16BE is explicitly specified, the code uses IgnoreBOM, but UTF-16LE uses UseBOM. For consistency, consider using IgnoreBOM for UTF-16LE as well, since the endianness is already known:

 // Unicode variants
-case "utf-16", "utf16", "utf-16le", "utf16le":
-    return unicode.UTF16(unicode.LittleEndian, unicode.UseBOM).NewDecoder()
+case "utf-16", "utf16":
+    return unicode.UTF16(unicode.LittleEndian, unicode.UseBOM).NewDecoder()
+case "utf-16le", "utf16le":
+    return unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM).NewDecoder()
 case "utf-16be", "utf16be":
     return unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM).NewDecoder()

This makes the handling symmetric: when endianness is explicit (LE or BE), ignore the BOM; when unspecified, use the BOM to determine endianness.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 028673c and a7d7203.

📒 Files selected for processing (3)
  • internal/common/config/loader_unix.go (1 hunks)
  • internal/common/config/loader_windows.go (1 hunks)
  • internal/common/fileutil/logutil.go (16 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/common/fileutil/logutil.go (1)
api/v2/api.gen.go (3)
  • Head (946-946)
  • Tail (967-967)
  • Offset (952-952)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test on ubuntu-latest
🔇 Additional comments (13)
internal/common/config/loader_windows.go (2)

15-18: LGTM!

Clean delegation to codePageToCharset with the system ANSI code page.


22-51: LGTM!

The code page mapping logic is well-structured. The early return for UTF-8 (65001) avoids unnecessary IANA lookup, and the default case correctly handles Windows code pages using the standard "windows-XXX" format that IANA recognizes.

internal/common/config/loader_unix.go (1)

43-53: LGTM!

Clean normalization helper that correctly uses the IANA index for canonical charset name resolution.

internal/common/fileutil/logutil.go (10)

10-19: LGTM!

The imports are appropriate for the encoding support functionality. All packages are used within the file.


23-29: LGTM!

The addition of the Encoding field is well-documented with clear examples. The enhanced comments for other fields improve usability.


202-203: LGTM!

The decoder is obtained once and efficiently passed through to all helper functions. The integration is clean and consistent across all code paths.

Also applies to: 213-213, 223-223, 237-237, 250-250


315-319: LGTM!

The division-by-zero guard is correctly implemented. Returning IsEstimate: true properly indicates that the single-line count is a heuristic for files without newlines in the samples.


363-363: LGTM!

The decoder integration in readFirstLines follows the correct pattern: conditionally wrapping the file reader with a transform.NewReader when a decoder is provided. The implementation is clean and maintains backward compatibility.

Also applies to: 372-378


406-406: LGTM!

The decoder integration in readLastLines is consistent with readFirstLines. The delegation to readFirstLines correctly passes the decoder, and the transform wrapping follows the same pattern.

Also applies to: 428-435, 439-439


479-479: LGTM!

The decoder integration in readLinesRange maintains the same consistent pattern as the other helper functions. All three helpers now uniformly support encoding transformation.

Also applies to: 502-508


547-547: LGTM!

Using strings.Join is more efficient than concatenating strings in a loop. Good optimization.


49-51: LGTM!

The charset normalization logic properly handles common variations (case, underscores, spaces), allowing flexible specification while maintaining consistency.


172-175: LGTM!

Returning nil for unknown encodings (treating them as UTF-8) is a pragmatic default. This provides graceful degradation rather than failing hard, which is appropriate for a log reading utility.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
internal/common/fileutil/logutil.go (4)

24-38: Encoding option in LogReadOptions/LogResult looks reasonable; consider documenting accepted labels

The added Encoding string on LogReadOptions plus the existing IsEstimate flag on LogResult give callers enough control and feedback. To avoid confusion when users pass OS-specific names (e.g., Windows code pages), it would help to clearly document which charset labels are supported and that ""/UTF‑8 use the “no decoding” fast path.


365-405: Decoder plumbing through readFirstLines / readLastLines / readLinesRange is consistent; consider scanner buffer tuning

Passing a single *encoding.Decoder from ReadLogLines into all three helpers and wrapping via transform.NewReader keeps the behavior uniform for non‑UTF‑8 encodings, and error handling for bufio.ErrTooLong is preserved.

One possible follow‑up: if you expect very long log lines (especially in multibyte encodings like UTF‑16 or Shift‑JIS), you may want to raise the scanner’s max token size to avoid tripping the “Line too long to read.” path:

-	scanner := bufio.NewScanner(reader)
+	scanner := bufio.NewScanner(reader)
+	// Optionally allow longer log lines, e.g. up to 1MB.
+	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)

You could factor that into a small helper to keep the three functions aligned.

Also applies to: 408-478, 481-539


204-205: Decoder integration in ReadLogLines is straightforward; consider whether unknown encodings should be surfaced

Using getEncodingDecoder(options.Encoding) once in ReadLogLines and threading the decoder into all code paths (head, tail, range, default range) is clean and minimizes repeated charset lookups. The only behavioral question is whether silently treating unknown encodings as UTF‑8 (via nil decoder) is what you want operationally; if not, this is the natural place to fail fast once getEncodingDecoder starts returning an error.

Also applies to: 215-216, 225-226, 239-240, 252-253


549-550: ReadLogContent join is fine; verify trailing newline semantics vs previous behavior

Switching to strings.Join(result.Lines, "\n") is clearer and avoids manual string concatenation. It does, however, fix the exact placement of newlines to “between lines only”. If the old implementation happened to append a trailing newline after the last line, this would be a subtle behavior change. Please double‑check any callers/tests that might rely on a final newline when returning full‑file content.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a7d7203 and 172ca9a.

📒 Files selected for processing (1)
  • internal/common/fileutil/logutil.go (16 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test on ubuntu-latest
🔇 Additional comments (1)
internal/common/fileutil/logutil.go (1)

317-321: Zero-sample guard in estimateLineCount correctly avoids divide-by-zero and marks result as heuristic

The new totalSampleLines == 0 check with return 1, true, nil fixes the divide‑by‑zero case for large files with no \n in the sampled regions and correctly marks the count as an estimate. This aligns with the earlier review feedback without changing behavior for normal files.

Comment on lines +40 to +178
// getEncodingDecoder returns an encoding.Decoder for the given charset name.
// Returns nil for UTF-8 or empty charset (no decoding needed).
// Supports a wide range of encodings including Japanese, Chinese, Korean,
// and various ISO-8859 and Windows code pages.
func getEncodingDecoder(charset string) *encoding.Decoder {
if charset == "" {
return nil
}

// Normalize the charset name for comparison
normalized := strings.ToLower(strings.ReplaceAll(charset, "_", "-"))
normalized = strings.ReplaceAll(normalized, " ", "-")

switch normalized {
// UTF-8 (no decoder needed)
case "utf-8", "utf8":
return nil

// Japanese encodings
case "euc-jp", "eucjp":
return japanese.EUCJP.NewDecoder()
case "shift-jis", "shiftjis", "sjis", "s-jis", "x-sjis", "ms-kanji", "csshiftjis":
return japanese.ShiftJIS.NewDecoder()
case "iso-2022-jp", "iso2022jp", "csiso2022jp":
return japanese.ISO2022JP.NewDecoder()

// Simplified Chinese encodings
case "gb2312", "gb-2312", "csgb2312":
return simplifiedchinese.GBK.NewDecoder()
case "gbk", "cp936", "ms936", "windows-936":
return simplifiedchinese.GBK.NewDecoder()
case "gb18030":
return simplifiedchinese.GB18030.NewDecoder()
case "hz-gb-2312", "hz":
return simplifiedchinese.HZGB2312.NewDecoder()

// Traditional Chinese encodings
case "big5", "big-5", "csbig5", "x-x-big5", "cn-big5":
return traditionalchinese.Big5.NewDecoder()

// Korean encodings
case "euc-kr", "euckr", "cseuckr", "ks-c-5601-1987", "ksc5601", "iso-ir-149", "korean":
return korean.EUCKR.NewDecoder()

// ISO-8859 encodings (Latin character sets)
case "iso-8859-1", "iso88591", "latin1", "latin-1", "l1", "csisolatin1", "iso-ir-100", "ibm819", "cp819":
return charmap.ISO8859_1.NewDecoder()
case "iso-8859-2", "iso88592", "latin2", "latin-2", "l2", "csisolatin2", "iso-ir-101":
return charmap.ISO8859_2.NewDecoder()
case "iso-8859-3", "iso88593", "latin3", "latin-3", "l3", "csisolatin3", "iso-ir-109":
return charmap.ISO8859_3.NewDecoder()
case "iso-8859-4", "iso88594", "latin4", "latin-4", "l4", "csisolatin4", "iso-ir-110":
return charmap.ISO8859_4.NewDecoder()
case "iso-8859-5", "iso88595", "cyrillic", "iso-ir-144", "csisolatincyrillic":
return charmap.ISO8859_5.NewDecoder()
case "iso-8859-6", "iso88596", "arabic", "iso-ir-127", "csisolatinarabic", "ecma-114", "asmo-708":
return charmap.ISO8859_6.NewDecoder()
case "iso-8859-7", "iso88597", "greek", "greek8", "iso-ir-126", "csisolatingreek", "ecma-118", "elot-928":
return charmap.ISO8859_7.NewDecoder()
case "iso-8859-8", "iso88598", "hebrew", "iso-ir-138", "csisolatinhebrew":
return charmap.ISO8859_8.NewDecoder()
case "iso-8859-9", "iso88599", "latin5", "latin-5", "l5", "iso-ir-148", "csisolatin5", "turkish":
return charmap.ISO8859_9.NewDecoder()
case "iso-8859-10", "iso885910", "latin6", "latin-6", "l6", "iso-ir-157", "csisolatin6":
return charmap.ISO8859_10.NewDecoder()
case "iso-8859-13", "iso885913", "latin7", "latin-7", "l7":
return charmap.ISO8859_13.NewDecoder()
case "iso-8859-14", "iso885914", "latin8", "latin-8", "l8", "iso-ir-199", "iso-celtic":
return charmap.ISO8859_14.NewDecoder()
case "iso-8859-15", "iso885915", "latin9", "latin-9", "l9", "latin0":
return charmap.ISO8859_15.NewDecoder()
case "iso-8859-16", "iso885916", "latin10", "latin-10", "l10", "iso-ir-226":
return charmap.ISO8859_16.NewDecoder()

// Windows code pages
case "windows-1250", "cp1250", "x-cp1250":
return charmap.Windows1250.NewDecoder()
case "windows-1251", "cp1251", "x-cp1251":
return charmap.Windows1251.NewDecoder()
case "windows-1252", "cp1252", "x-cp1252", "ansi":
return charmap.Windows1252.NewDecoder()
case "windows-1253", "cp1253", "x-cp1253":
return charmap.Windows1253.NewDecoder()
case "windows-1254", "cp1254", "x-cp1254":
return charmap.Windows1254.NewDecoder()
case "windows-1255", "cp1255", "x-cp1255":
return charmap.Windows1255.NewDecoder()
case "windows-1256", "cp1256", "x-cp1256":
return charmap.Windows1256.NewDecoder()
case "windows-1257", "cp1257", "x-cp1257":
return charmap.Windows1257.NewDecoder()
case "windows-1258", "cp1258", "x-cp1258":
return charmap.Windows1258.NewDecoder()

// Cyrillic encodings
case "koi8-r", "koi8r", "cskoi8r":
return charmap.KOI8R.NewDecoder()
case "koi8-u", "koi8u":
return charmap.KOI8U.NewDecoder()

// IBM code pages
case "ibm437", "cp437", "437", "cspc8codepage437":
return charmap.CodePage437.NewDecoder()
case "ibm850", "cp850", "850", "cspc850multilingual":
return charmap.CodePage850.NewDecoder()
case "ibm852", "cp852", "852":
return charmap.CodePage852.NewDecoder()
case "ibm855", "cp855", "855":
return charmap.CodePage855.NewDecoder()
case "ibm858", "cp858", "858":
return charmap.CodePage858.NewDecoder()
case "ibm860", "cp860", "860":
return charmap.CodePage860.NewDecoder()
case "ibm862", "cp862", "862":
return charmap.CodePage862.NewDecoder()
case "ibm863", "cp863", "863":
return charmap.CodePage863.NewDecoder()
case "ibm865", "cp865", "865":
return charmap.CodePage865.NewDecoder()
case "ibm866", "cp866", "866", "csibm866":
return charmap.CodePage866.NewDecoder()

// Mac encodings
case "macintosh", "mac", "macroman", "csmacintosh", "x-mac-roman":
return charmap.Macintosh.NewDecoder()

// Unicode variants
case "utf-16", "utf16":
return unicode.UTF16(unicode.LittleEndian, unicode.UseBOM).NewDecoder()
case "utf-16le", "utf16le":
return unicode.UTF16(unicode.LittleEndian, unicode.IgnoreBOM).NewDecoder()
case "utf-16be", "utf16be":
return unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM).NewDecoder()

default:
// Unknown encoding, return nil (treat as UTF-8)
return nil
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Encoding decoder table is solid; add a few common aliases and consider handling unknown encodings explicitly

The normalization and breadth of aliases here are good, and mapping GB2312 → simplifiedchinese.GBK is the right call given GBK is a superset. Two small refinements:

  1. You’re missing some very common Windows aliases, especially for Shift‑JIS. For example, cp932 is how Windows typically labels Shift‑JIS. Right now that will fall through to the default and be treated as UTF‑8, producing mojibake. Suggest at least:
-	case "shift-jis", "shiftjis", "sjis", "s-jis", "x-sjis", "ms-kanji", "csshiftjis":
+	case "shift-jis", "shiftjis", "sjis", "s-jis", "x-sjis", "ms-kanji", "csshiftjis", "cp932", "windows-932":
 		return japanese.ShiftJIS.NewDecoder()
  1. For unknown charset names you currently return nil and silently treat the file as UTF‑8. That’s safe but can hide configuration/typing errors. Consider optionally surfacing this as an error (e.g., by changing getEncodingDecoder to return (*encoding.Decoder, error) and propagating a clear “unsupported encoding” message), or at least logging a warning path so operators can see why their logs are garbled.

@yohamta0 yohamta0 merged commit e75d14f into main Dec 4, 2025
5 checks passed
@yohamta0 yohamta0 deleted the windows-specific-envs branch December 4, 2025 10:33
@codecov

codecov Bot commented Dec 4, 2025

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 61.29032% with 156 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.96%. Comparing base (8b340e6) to head (172ca9a).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
internal/common/fileutil/logutil.go 14.92% 110 Missing and 4 partials ⚠️
internal/runtime/builtin/command/command_script.go 68.18% 10 Missing and 4 partials ⚠️
internal/runtime/builtin/command/shell_cmd.go 71.79% 7 Missing and 4 partials ⚠️
internal/common/config/loader_unix.go 59.09% 5 Missing and 4 partials ⚠️
internal/runtime/builtin/command/command.go 83.33% 1 Missing and 2 partials ⚠️
internal/runtime/builtin/command/shell_unix.go 94.87% 1 Missing and 1 partial ⚠️
internal/runtime/builtin/gha/executor.go 0.00% 2 Missing ⚠️
internal/runtime/builtin/command/shell.go 83.33% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1439      +/-   ##
==========================================
- Coverage   60.13%   59.96%   -0.18%     
==========================================
  Files         176      187      +11     
  Lines       20756    21022     +266     
==========================================
+ Hits        12482    12605     +123     
- Misses       6975     7104     +129     
- Partials     1299     1313      +14     
Files with missing lines Coverage Δ
internal/common/config/env.go 90.47% <100.00%> (ø)
internal/common/config/env_unix.go 100.00% <100.00%> (ø)
internal/common/config/loader.go 91.86% <100.00%> (ø)
internal/common/fileutil/fileutil.go 37.34% <100.00%> (+4.01%) ⬆️
...nternal/runtime/builtin/command/command_builder.go 100.00% <100.00%> (ø)
internal/runtime/builtin/command/command_unix.go 100.00% <100.00%> (ø)
internal/runtime/builtin/command/shell_direct.go 100.00% <100.00%> (ø)
internal/runtime/builtin/command/shell_nix.go 100.00% <100.00%> (ø)
...ternal/runtime/builtin/command/shell_powershell.go 100.00% <100.00%> (ø)
internal/runtime/builtin/command/shell.go 83.33% <83.33%> (ø)
... and 7 more

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8b340e6...172ca9a. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants