[Enhancement] PowerShell - Optimize Entropy Calculation, Add Normalized Entropy, Add Pipeline Benchmark #16707

Merged
w0rk3r merged 6 commits into main from posh_entropy_2
Jan 26, 2026

Conversation

@w0rk3r
Contributor

@w0rk3r w0rk3r commented Dec 26, 2025

Proposed commit message

windows: refine PowerShell script entropy pipeline

Replace code-point HashMap counting with a fixed 65k UTF-16 char histogram
and skip truncated signature fragments before entropy is computed. Add a
normalized entropy field scaled by script length (0–1).

Summary

Related issue:

This PR:

  • Replaces code‑point HashMap counting with a fixed 65k UTF‑16 char histogram for script entropy, reducing script processor time (172.8µs → 53.2µs per document in the warm run) and raising throughput (2924 → 4873 eps in the warm run).
  • Skips truncated signature fragments before entropy is computed.
  • Adds powershell.file.script_block_entropy_normalized = entropy_bits / log2(script_block_length) (0–1).
  • Adds benchmark fixtures to track performance regressions during our research.
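
For readers following along, the histogram counting and the normalized field described above can be sketched in plain Java. This is a minimal illustration, not the pipeline's actual Painless script: the class name `ScriptEntropySketch`, the method names, and the lower clamp at 0 are assumptions made for the example. Only the 65k histogram idea, the `entropy_bits / log2(script_block_length)` formula, and the upper clamp at 1.0 come from this PR.

```java
// Hypothetical sketch; names are illustrative and not taken from the pipeline.
final class ScriptEntropySketch {
    private static final double INV_LOG2 = 1.0 / Math.log(2.0);

    /** Shannon entropy in bits per UTF-16 code unit, counted with a fixed 65,536-slot histogram. */
    public static double entropyBits(String text) {
        int length = text.length();
        if (length == 0) {
            return 0.0;
        }
        // One slot per possible UTF-16 code unit, so no hashing or boxing is needed.
        int[] histogram = new int[65536];
        for (int i = 0; i < length; i++) {
            histogram[text.charAt(i)]++;
        }
        double entropy = 0.0;
        for (int count : histogram) {
            if (count == 0) {
                continue;
            }
            double p = (double) count / length;
            entropy -= p * Math.log(p) * INV_LOG2;
        }
        return entropy;
    }

    /** entropy_bits / log2(length), clamped to the range 0..1. */
    public static double normalizedEntropy(String text) {
        int length = text.length();
        if (length <= 1) {
            return 0.0; // log2(1) is 0, so there is nothing to normalize against
        }
        double maxEntropy = Math.log((double) length) * INV_LOG2; // max bits if every character is unique
        double normalized = entropyBits(text) / maxEntropy;
        if (normalized < 0.0) {
            return 0.0; // assumed lower clamp, guards against rounding
        }
        if (normalized > 1.0) {
            return 1.0; // upper clamp, as discussed in the review
        }
        return normalized;
    }

    public static void main(String[] args) {
        String sample = "Get-Process | Where-Object { $_.CPU -gt 100 }";
        System.out.println("entropy bits: " + entropyBits(sample));
        System.out.println("normalized:   " + normalizedEntropy(sample));
    }
}
```

Under these assumptions, a string of four distinct characters such as "abcd" yields 2 bits and a normalized value of 1, while "aabb" yields 1 bit and a normalized value of 0.5.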

Old pipeline:

image

Improved pipeline:

image

Complete benchmark output

Old:

PS C:\Users\Jonhnathan\Documents\Github\integrations\packages\windows> .\..\..\elastic-package.exe benchmark pipeline --data-streams powershell_operational --use-test-samples=false
Run pipeline benchmarks for the package
--- Benchmark results for package: windows - START ---
╭─────────────────────────╮
│ parameters              │
├──────────────────┬──────┤
│ source_doc_count │   11 │
│ doc_count        │ 2500 │
╰──────────────────┴──────╯
╭───────────────────────────╮
│ pipeline_performance      │
├─────────────────┬─────────┤
│ processing_time │   1.10s │
│ eps             │ 2278.94 │
╰─────────────────┴─────────╯
╭────────────────────────────────────────╮
│ procs_by_total_time                    │
├───────────────────────────────┬────────┤
│ script @ default.yml:322      │ 47.49% │
│ gsub @ default.yml:305        │ 30.36% │
│ fingerprint @ default.yml:311 │  3.19% │
│ set @ default.yml:60          │  2.10% │
│ script @ default.yml:13       │  1.82% │
│ gsub @ default.yml:316        │  1.09% │
│ script @ default.yml:30       │  1.00% │
│ remove @ default.yml:575      │  0.55% │
│ rename @ default.yml:290      │  0.18% │
│ trim @ default.yml:302        │  0.18% │
╰───────────────────────────────┴────────╯
╭─────────────────────────────────────────╮
│ procs_by_avg_time_per_doc               │
├───────────────────────────────┬─────────┤
│ script @ default.yml:322      │ 208.4µs │
│ gsub @ default.yml:305        │ 133.2µs │
│ fingerprint @ default.yml:311 │    14µs │
│ set @ default.yml:60          │   9.2µs │
│ script @ default.yml:13       │     8µs │
│ gsub @ default.yml:316        │   4.8µs │
│ script @ default.yml:30       │   4.4µs │
│ remove @ default.yml:575      │   2.4µs │
│ rename @ default.yml:290      │   800ns │
│ trim @ default.yml:302        │   800ns │
╰───────────────────────────────┴─────────╯

--- Benchmark results for package: windows - END   ---
Done
--- Benchmark results for package: windows - START ---
╭─────────────────────────╮
│ parameters              │
├──────────────────┬──────┤
│ source_doc_count │   11 │
│ doc_count        │ 2500 │
╰──────────────────┴──────╯
╭───────────────────────────╮
│ pipeline_performance      │
├─────────────────┬─────────┤
│ processing_time │   0.85s │
│ eps             │ 2923.98 │
╰─────────────────┴─────────╯
╭────────────────────────────────────────╮
│ procs_by_total_time                    │
├───────────────────────────────┬────────┤
│ script @ default.yml:322      │ 50.53% │
│ gsub @ default.yml:305        │ 34.15% │
│ fingerprint @ default.yml:311 │  2.57% │
│ gsub @ default.yml:316        │  1.17% │
│ script @ default.yml:13       │  0.70% │
│ set @ default.yml:60          │  0.58% │
│ remove @ default.yml:575      │  0.35% │
│ script @ default.yml:30       │  0.35% │
│ rename @ default.yml:290      │  0.12% │
╰───────────────────────────────┴────────╯
╭─────────────────────────────────────────╮
│ procs_by_avg_time_per_doc               │
├───────────────────────────────┬─────────┤
│ script @ default.yml:322      │ 172.8µs │
│ gsub @ default.yml:305        │ 116.8µs │
│ fingerprint @ default.yml:311 │   8.8µs │
│ gsub @ default.yml:316        │     4µs │
│ script @ default.yml:13       │   2.4µs │
│ set @ default.yml:60          │     2µs │
│ remove @ default.yml:575      │   1.2µs │
│ script @ default.yml:30       │   1.2µs │
│ rename @ default.yml:290      │   400ns │
╰───────────────────────────────┴─────────╯

--- Benchmark results for package: windows - END   ---
Done

Improved:

PS C:\Users\Jonhnathan\Documents\Github\integrations\packages\windows> .\..\..\elastic-package.exe benchmark pipeline --data-streams powershell_operational --use-test-samples=false
Run pipeline benchmarks for the package
--- Benchmark results for package: windows - START ---
╭─────────────────────────╮
│ parameters              │
├──────────────────┬──────┤
│ source_doc_count │   11 │
│ doc_count        │ 2500 │
╰──────────────────┴──────╯
╭───────────────────────────╮
│ pipeline_performance      │
├─────────────────┬─────────┤
│ processing_time │   0.51s │
│ eps             │ 4892.37 │
╰─────────────────┴─────────╯
╭────────────────────────────────────────╮
│ procs_by_total_time                    │
├───────────────────────────────┬────────┤
│ gsub @ default.yml:305        │ 55.19% │
│ script @ default.yml:322      │ 28.18% │
│ fingerprint @ default.yml:311 │  4.11% │
│ gsub @ default.yml:316        │  1.96% │
│ script @ default.yml:13       │  0.59% │
│ remove @ default.yml:657      │  0.39% │
│ rename @ default.yml:290      │  0.20% │
│ set @ default.yml:60          │  0.20% │
╰───────────────────────────────┴────────╯
╭─────────────────────────────────────────╮
│ procs_by_avg_time_per_doc               │
├───────────────────────────────┬─────────┤
│ gsub @ default.yml:305        │ 112.8µs │
│ script @ default.yml:322      │  57.6µs │
│ fingerprint @ default.yml:311 │   8.4µs │
│ gsub @ default.yml:316        │     4µs │
│ script @ default.yml:13       │   1.2µs │
│ remove @ default.yml:657      │   800ns │
│ rename @ default.yml:290      │   400ns │
│ set @ default.yml:60          │   400ns │
╰───────────────────────────────┴─────────╯

--- Benchmark results for package: windows - END   ---
Done
--- Benchmark results for package: windows - START ---
╭─────────────────────────╮
│ parameters              │
├──────────────────┬──────┤
│ source_doc_count │   11 │
│ doc_count        │ 2500 │
╰──────────────────┴──────╯
╭───────────────────────────╮
│ pipeline_performance      │
├─────────────────┬─────────┤
│ processing_time │   0.51s │
│ eps             │ 4873.29 │
╰─────────────────┴─────────╯
╭────────────────────────────────────────╮
│ procs_by_total_time                    │
├───────────────────────────────┬────────┤
│ gsub @ default.yml:305        │ 57.89% │
│ script @ default.yml:322      │ 25.93% │
│ fingerprint @ default.yml:311 │  3.51% │
│ gsub @ default.yml:316        │  1.95% │
│ script @ default.yml:13       │  0.78% │
│ remove @ default.yml:657      │  0.39% │
│ set @ default.yml:60          │  0.19% │
╰───────────────────────────────┴────────╯
╭─────────────────────────────────────────╮
│ procs_by_avg_time_per_doc               │
├───────────────────────────────┬─────────┤
│ gsub @ default.yml:305        │ 118.8µs │
│ script @ default.yml:322      │  53.2µs │
│ fingerprint @ default.yml:311 │   7.2µs │
│ gsub @ default.yml:316        │     4µs │
│ script @ default.yml:13       │   1.6µs │
│ remove @ default.yml:657      │   800ns │
│ set @ default.yml:60          │   400ns │
╰───────────────────────────────┴─────────╯

--- Benchmark results for package: windows - END   ---
Done

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

@w0rk3r w0rk3r self-assigned this Dec 26, 2025
@w0rk3r w0rk3r requested review from a team as code owners December 26, 2025 22:21
@w0rk3r w0rk3r added enhancement New feature or request Integration:windows Windows Team:Security-Windows Platform Security Windows Platform team [elastic/sec-windows-platform] labels Dec 26, 2025
@w0rk3r w0rk3r requested review from faec and mauri870 December 26, 2025 22:21
@elasticmachine

Pinging @elastic/sec-windows-platform (Team:Security-Windows Platform)

@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Agent Data Plane team [elastic/elastic-agent-data-plane] label Jan 4, 2026
@elasticmachine

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@mauri870 mauri870 self-requested a review January 5, 2026 12:15
Member

@mauri870 mauri870 left a comment

LGTM, but I'm not very proficient with PowerShell. The code looks fine, but it needs a deeper look from the Windows team.

@andrewkroh andrewkroh added the documentation Improvements or additions to documentation. Applied to PRs that modify *.md files. label Jan 8, 2026

double normalizedEntropy = 0.0;
if (length > 1) {
    double maxEntropy = Math.log((double) length) * invLog2; // max bits if every character is unique
Contributor

I think the normalized entropy calculation looks good 👍

Few notes for posterity:

  • For the line double maxEntropy = Math.log((double) length) * invLog2; // max bits if every character is unique, I think it makes sense to use length here. Typical normalized entropy calculations (like that for R/Posterior ref) use something akin to seenCount instead of length. However, those expect input that is more like categories, where a and a are equivalent regardless of their position in the script block. In our case, I think we want position to matter as well, so each value is by definition unique, which makes length the correct number to use here (as the code correctly does).
  • The pre-output check else if (normalizedEntropy > 1.0) normalizedEntropy = 1.0; is, I think, technically unnecessary, since this case should not occur. However, we should keep it, because it can catch floating-point rounding issues without affecting the integrity of the result (the code is correct as is).
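
As a supporting sketch for the two notes above, here is a short derivation of why dividing by log2(length) keeps the value within 0 and 1. The symbols are notation introduced for this note rather than names from the code: n is the script block length in UTF-16 code units, n_c is the count of character c, and k is the number of distinct characters (the seenCount-style quantity).

```latex
H = -\sum_{c} \frac{n_c}{n} \log_2 \frac{n_c}{n},
\qquad \sum_{c} n_c = n, \quad n_c \ge 1
\;\;\Longrightarrow\;\;
0 \le H \le \log_2 k \le \log_2 n
\;\;\Longrightarrow\;\;
0 \le \frac{H}{\log_2 n} \le 1 \quad (n > 1)
```

Equality with log2(n) holds only when every character is unique (k = n), which matches the code comment. Normalizing by log2(k) instead would score any uniform mix of the seen characters as 1: "abab" has H = 1 bit, so it gives 1 / log2(2) = 1 under that scheme but 1 / log2(4) = 0.5 with length. In exact arithmetic the ratio cannot exceed 1, so the clamp only guards against rounding.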

@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report, comment with /test benchmark fullreport

@elasticmachine

💚 Build Succeeded

History

cc @w0rk3r

@w0rk3r w0rk3r merged commit da83fd3 into main Jan 26, 2026
8 checks passed
@w0rk3r w0rk3r deleted the posh_entropy_2 branch January 26, 2026 23:31
@elastic-vault-github-plugin-prod

Package windows - 3.4.0 containing this change is available at https://epr.elastic.co/package/windows/3.4.0/

jakubgalecki0 pushed a commit to jakubgalecki0/integrations that referenced this pull request Feb 19, 2026
…ed Entropy, Add Pipeline Benchmark (elastic#16707)

* [Enhancement] PowerShell - Optimize Entropy Calculation, Add Normalized Entropy, Add Pipeline Benchmark

* Update test-powershell-operational-events.json-expected.json

* Update changelog.yml

* rename benchmark file

Labels

documentation Improvements or additions to documentation. Applied to PRs that modify *.md files. enhancement New feature or request Integration:windows Windows Team:Elastic-Agent-Data-Plane Agent Data Plane team [elastic/elastic-agent-data-plane] Team:Security-Windows Platform Security Windows Platform team [elastic/sec-windows-platform]

8 participants