Add architecture and imphash for PE field set by andrewstucki · Pull Request #763 · elastic/ecs

andrewstucki · 2020-02-26T18:07:10Z

So, this PR adds the fields imphash and architecture to the PE field set. Both are commonly used in PE parsing tools and in the security industry (see fields for Imphash and Target Machine). A couple of things to throw out there that people may have in mind:

Imphash isn't really a pure "hash" per se, it's more of a feature fingerprint, and it's highly PE-specific, so I didn't extend hash with it.
Architecture could also go under dll or process, but:
a. we'd have to dup the field
b. most of the time under process it's going to be the same as the host architecture unless you're running in some sort of execution subsystem like WSL or WINE for linux or something like that.
c. there's some differences between file formats that support multiple architectures (i.e. fat binaries) and those that don't

So, the thought was due to the above reasons, these fields should exist as a subset of pe since they are tied to the file format itself. Thoughts on getting this in for 1.5?

gabriellandau · 2020-02-26T18:15:20Z

generated/beats/fields.ecs.yml

+      type: keyword
+      ignore_above: 1024
+      description: CPU architecture target for the file.
+      example: x64


Do we care if we make our own values here or should we use the ones that Microsoft defined? For example, in the sensor outputs x64 but Microsoft uses the nomenclature AMD64 (IMAGE_FILE_MACHINE_AMD64)

My thought was to normalize it like VirusTotal does, but not entirely sure if we'd have to be strict about this.

If there's a clear set of instructions we can give on how this should be normalized (e.g. linking to another source) we should do that now.

If there isn't, we can leave this up to the source, and only address later, only if needed.

The thinking: we have to balance the amount of work required by sources to get the normalization right. So I think it's fine to tighten this later, only if needed.

marshallmain · 2020-02-26T18:31:28Z

The PE field set is windows specific though, right? Wouldn't we want architecture to be OS agnostic and not tied to the PE fields?

gabriellandau · 2020-02-26T18:34:47Z

b. most of the time under process it's going to be the same as the host architecture

In many cases yes. In every WOW64 Windows process, however, you will have a combination of 32- and 64-bit DLLs loaded. The x64 DLLs implement the WOW64 emulation layer, among other things. Some security products will inject x64 hook DLLs into WOW64 processes.

Here's an example WOW64 process where you can see several x64 DLLs loaded from C:\Windows\System32

C:\Users\user\Desktop>listdlls 7064

Listdlls v3.2 - Listdlls
Copyright (C) 1997-2016 Mark Russinovich
Sysinternals

------------------------------------------------------------------------------
powershell.exe pid: 7064
Command line: "C:\Windows\SysWOW64\WindowsPowerShell\v1.0\powershell.exe"

Base                Size      Path
0x0000000000fc0000  0x6c000   C:\Windows\SysWOW64\WindowsPowerShell\v1.0\powershell.exe
0x0000000081cf0000  0x1e1000  C:\Windows\SYSTEM32\ntdll.dll
0x0000000077310000  0x52000   C:\Windows\System32\wow64.dll
0x0000000077370000  0x78000   C:\Windows\System32\wow64win.dll
0x0000000077300000  0xa000    C:\Windows\System32\wow64cpu.dll
0x0000000000fc0000  0x6c000   C:\Windows\SysWOW64\WindowsPowerShell\v1.0\powershell.exe
0x00000000773f0000  0x190000  C:\Windows\SysWOW64\ntdll.dll
0x00000000751b0000  0xe0000   C:\Windows\SysWOW64\KERNEL32.DLL
0x0000000073e40000  0x1e4000  C:\Windows\SysWOW64\KERNELBASE.dll
0x0000000074b80000  0xbf000   C:\Windows\SysWOW64\msvcrt.dll
0x0000000074ff0000  0x96000   C:\Windows\SysWOW64\OLEAUT32.dll
0x00000000747a0000  0x7d000   C:\Windows\SysWOW64\msvcp_win.dll
0x0000000075090000  0x11d000  C:\Windows\SysWOW64\ucrtbase.dll
0x0000000075400000  0x25c000  C:\Windows\SysWOW64\combase.dll
0x0000000074080000  0xc0000   C:\Windows\SysWOW64\RPCRT4.dll
0x0000000073cb0000  0x20000   C:\Windows\SysWOW64\SspiCli.dll
0x0000000073ca0000  0xa000    C:\Windows\SysWOW64\CRYPTBASE.dll
0x0000000073d50000  0x57000   C:\Windows\SysWOW64\bcryptPrimitives.dll
0x0000000074e10000  0x44000   C:\Windows\SysWOW64\sechost.dll
0x0000000075660000  0x78000   C:\Windows\SysWOW64\ADVAPI32.dll
0x0000000074820000  0xfc000   C:\Windows\SysWOW64\OLE32.dll
0x0000000074030000  0x22000   C:\Windows\SysWOW64\GDI32.dll
0x0000000074e80000  0x164000  C:\Windows\SysWOW64\gdi32full.dll
0x0000000075b70000  0x18d000  C:\Windows\SysWOW64\USER32.dll
0x0000000074e60000  0x17000   C:\Windows\SysWOW64\win32u.dll
0x000000006a040000  0x18000   C:\Windows\SysWOW64\ATL.DLL
0x0000000069fe0000  0x55000   C:\Windows\SysWOW64\mscoree.dll
0x0000000075d80000  0x26000   C:\Windows\SysWOW64\IMM32.DLL
0x0000000074c40000  0xf000    C:\Windows\SysWOW64\kernel.appcore.dll
0x00000000731a0000  0x7c000   C:\Windows\SysWOW64\uxtheme.dll
0x0000000073db0000  0x83000   C:\Windows\SysWOW64\clbcatq.dll
0x00000000741e0000  0x5bc000  C:\Windows\SysWOW64\windows.storage.dll
0x0000000074c50000  0x45000   C:\Windows\SysWOW64\shlwapi.dll
0x0000000074af0000  0x88000   C:\Windows\SysWOW64\shcore.dll
0x0000000073ce0000  0x18000   C:\Windows\SysWOW64\profapi.dll
0x0000000077250000  0x45000   C:\Windows\SysWOW64\powrprof.dll
0x0000000074ac0000  0x8000    C:\Windows\SysWOW64\FLTLIB.DLL
0x0000000073960000  0x181000  C:\Windows\SysWOW64\propsys.dll
0x0000000075f00000  0x134e000  C:\Windows\SysWOW64\shell32.dll
0x00000000741a0000  0x39000   C:\Windows\SysWOW64\cfgmgr32.dll
0x0000000069ec0000  0xb000    C:\Windows\SysWOW64\LINKINFO.dll
0x000000006d490000  0xc8000   C:\Windows\SysWOW64\ntshrui.dll
0x000000006f9d0000  0x1c000   C:\Windows\SysWOW64\srvcli.dll
0x000000006d480000  0xf000    C:\Windows\SysWOW64\cscapi.dll
0x000000006d340000  0x73000   C:\Windows\SysWOW64\policymanager.dll
0x000000006ea70000  0x67000   C:\Windows\SysWOW64\msvcp110_win.dll
0x0000000073370000  0x19000   C:\Windows\SysWOW64\bcrypt.dll
0x0000000069f50000  0x8d000   C:\Windows\Microsoft.NET\Framework\v4.0.30319\mscoreei.dll
0x0000000073940000  0x8000    C:\Windows\SysWOW64\VERSION.dll
0x0000000068f90000  0x7b0000  C:\Windows\Microsoft.NET\Framework\v4.0.30319\clr.dll
0x0000000069f30000  0x14000   C:\Windows\SysWOW64\VCRUNTIME140_CLR0400.dll
0x0000000069e10000  0xab000   C:\Windows\SysWOW64\ucrtbase_clr0400.dll
0x0000000067b80000  0x140e000  C:\Windows\assembly\NativeImages_v4.0.30319_32\mscorlib\92d30f3dd1d092e15dd783b14354d8ea\mscorlib.ni.dll
0x00000000733c0000  0x13000   C:\Windows\SysWOW64\CRYPTSP.dll
0x0000000073390000  0x2f000   C:\Windows\SysWOW64\rsaenh.dll
0x0000000069d80000  0x89000   C:\Windows\Microsoft.NET\Framework\v4.0.30319\clrjit.dll
0x0000000067120000  0xa55000  C:\Windows\assembly\NativeImages_v4.0.30319_32\System\0eeb857977bf8c4a1fcc54528b8f5853\System.ni.dll
0x0000000066900000  0x818000  C:\Windows\assembly\NativeImages_v4.0.30319_32\System.Core\ee717223ade9f8db56b8e1279d794680\System.Core.ni.dll
0x0000000073cd0000  0x6000    C:\Windows\SysWOW64\psapi.dll
0x0000000073d00000  0x47000   C:\Windows\SysWOW64\wintrust.dll
0x0000000074060000  0xe000    C:\Windows\SysWOW64\MSASN1.dll
0x0000000074920000  0x195000  C:\Windows\SysWOW64\CRYPT32.dll
0x0000000066180000  0x774000  C:\Windows\assembly\NativeImages_v4.0.30319_32\System.Xml\8424b694e88098301691974f4abc2099\System.Xml.ni.dll
0x0000000069f10000  0x1e000   C:\Windows\SysWOW64\gpapi.dll
0x000000006d3c0000  0x15000   C:\Windows\SysWOW64\wldp.dll
0x0000000075b10000  0x60000   C:\Windows\SysWOW64\coml2.dll
0x0000000069ef0000  0xf000    C:\Windows\SysWOW64\amsi.dll
0x0000000073430000  0x22000   C:\Windows\SysWOW64\USERENV.dll
0x0000000069a90000  0x80000   C:\Windows\assembly\NativeImages_v4.0.30319_32\Microsoft.Mf49f6405#\8e60287f3ad05ee5e34d1c308fbf7e27\Microsoft.Management.Infrastructure.ni.dll
0x00000000658b0000  0x354000  C:\Windows\Microsoft.Net\assembly\GAC_32\System.Data\v4.0_4.0.0.0__b77a5c561934e089\System.Data.dll
0x0000000075d10000  0x67000   C:\Windows\SysWOW64\WS2_32.dll
0x0000000069990000  0x4c000   C:\Windows\Microsoft.Net\assembly\GAC_32\System.Transactions\v4.0_4.0.0.0__b77a5c561934e089\System.Transactions.dll
0x0000000073950000  0xa000    C:\Windows\SysWOW64\secur32.dll
0x00000000657a0000  0x105000  C:\Windows\assembly\NativeImages_v4.0.30319_32\System.Configuration\076fae3efb073c01a3a155a96928574b\System.Configuration.ni.dll

andrewstucki · 2020-02-26T18:43:02Z

@marshallmain So that's what I was getting at with:

Architecture could also go under dll or process, but:
a. we'd have to dup the field
b. most of the time under process it's going to be the same as the host architecture unless you're running in some sort of execution subsystem like WSL or WINE for linux or something like that.
c. there's some differences between file formats that support multiple architectures (i.e. fat binaries) and those that don't

Basically shared libraries/"dll"s under linux and Mac systems can have multiple architectures tied to them, in Windows, they're single-valued. In the case of process there's not necessarily a one-to-one mapping between the architecture in the pe file itself and what mode the process is actually executing in.

So, the story around architecture gets kind of confusing between what exists at a higher level construct like a "dll" and what is a particular feature of an object file format itself. If for some reason we want to add something like architecture to "dll" instead of here then when the DLL is a PE file we'd just always have a single valued array that we'd be passing back.

rw-access · 2020-02-26T18:46:49Z

i agree. since PE already exists, anything that belongs in a PE header is fair game to add.
architecture in any other place (other than host) could introduce some ambiguity.

andrewstucki · 2020-02-26T20:08:06Z

going to wait for @webmat 's sign-off since I believe this would add to the 1.5 release scope. Ah, that reminds me... Changelog entry linking this PR...

webmat · 2020-04-01T19:36:47Z

@elasticmachine, run elasticsearch-ci/docs

rw-access · 2020-04-01T20:05:22Z

schemas/pe.yml

+    - name: architecture
+      level: extended
+      type: keyword
+      description: CPU architecture target for the file.


Is it worth adding a note that this is not necessarily the architecture of the machine itself?

I'll leave this one up to you guys.

From the schema POV, there's sometimes a reflex to over-explain, as if we were telling people how to implement the source itself (e.g. a compiler actually populating PE headers in an executable), when in fact the schema's role is simply to explain where to get the data (e.g. getting the "architecture" header from the PE headers) and how to interpret it when looking at events that populate these.

But if you think there's a disconnect to explain or point out between pe.architecture and host.architecture, for example, yeah I think this may make sense.

@rw-access I think that the fact that it says, 'for the file' seems fine to me, and I don't think we should over-explain as I would imagine people who were filling in this field via parsing pe headers would know how to do it. But, if you're thinking it's still vague, we can tighten it up.

webmat

Thanks @andrewstucki

I noted a few things to adjust or discuss further, but nothing big.
I'll trust Endpoint's instinct on whether .architecture should be normalized or where to add it (sounds like pe.architecture is fine and straightforward, so 👍). But from the schema POV, I think it's fine in some cases to not normalize at first, and add instructions to normalize only if it becomes needed.

CHANGELOG.next.md

webmat · 2020-04-01T19:45:09Z

generated/beats/fields.ecs.yml

+      type: keyword
+      ignore_above: 1024
+      description: CPU architecture target for the file.
+      example: x64


If there's a clear set of instructions we can give on how this should be normalized (e.g. linking to another source) we should do that now.

If there isn't, we can leave this up to the source, and only address later, only if needed.

The thinking: we have to balance the amount of work required by sources to get the normalization right. So I think it's fine to tighten this later, only if needed.

schemas/pe.yml

Co-Authored-By: Mathieu Martin <webmat@gmail.com>

andrewstucki · 2020-04-04T14:12:13Z

@webmat updated some of the verbage like you requested and merged master, so if you're 👍 I'll merge

webmat

LGTM

You had requested @dcode's review as well so I'm not merging right away. I'll let you merge whenever you're ready (remove @dcode or after his review).

If you want me to do the merge for any reason, let me know.

* Add architecture and imphash for PE field set * Add changelog entry * Update schemas/pe.yml Co-Authored-By: Mathieu Martin <webmat@gmail.com> Co-authored-by: Mathieu Martin <webmat@gmail.com>

Add architecture and imphash for PE field set

bd385ec

andrewstucki requested review from dcode, rw-access and webmat February 26, 2020 18:07

gabriellandau reviewed Feb 26, 2020

View reviewed changes

rw-access approved these changes Feb 26, 2020

View reviewed changes

gabriellandau approved these changes Feb 26, 2020

View reviewed changes

Add changelog entry

0bef404

webmat added the 1.6.0 label Mar 25, 2020

rw-access reviewed Apr 1, 2020

View reviewed changes

webmat reviewed Apr 1, 2020

View reviewed changes

andrewstucki and others added 2 commits April 4, 2020 09:46

Update schemas/pe.yml

6b11695

Co-Authored-By: Mathieu Martin <webmat@gmail.com>

Merge branch 'master' of github.com:elastic/ecs into imphash

6ca7236

webmat approved these changes Apr 6, 2020

View reviewed changes

andrewstucki removed the request for review from dcode April 6, 2020 18:26

andrewstucki merged commit 1f7fa10 into elastic:master Apr 6, 2020

andrewstucki deleted the imphash branch April 6, 2020 18:27

leehinman mentioned this pull request Aug 18, 2020

[Meta] support new ECS 1.6 fields elastic/beats#19472

Closed

22 tasks

Conversation

andrewstucki commented Feb 26, 2020

Uh oh!

gabriellandau Feb 26, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

andrewstucki Feb 26, 2020

Choose a reason for hiding this comment

Uh oh!

webmat Apr 1, 2020

Choose a reason for hiding this comment

Uh oh!

marshallmain commented Feb 26, 2020

Uh oh!

gabriellandau commented Feb 26, 2020

Uh oh!

andrewstucki commented Feb 26, 2020

Uh oh!

rw-access commented Feb 26, 2020

Uh oh!

andrewstucki commented Feb 26, 2020

Uh oh!

webmat commented Apr 1, 2020

Uh oh!

rw-access Apr 1, 2020

Choose a reason for hiding this comment

Uh oh!

webmat Apr 1, 2020

Choose a reason for hiding this comment

Uh oh!

andrewstucki Apr 4, 2020

Choose a reason for hiding this comment

Uh oh!

webmat left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

webmat Apr 1, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

andrewstucki commented Apr 4, 2020

Uh oh!

webmat left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

gabriellandau Feb 26, 2020 •

edited

Loading