Add architecture and imphash for PE field set #763
andrewstucki merged 4 commits into elastic:master
type: keyword
ignore_above: 1024
description: CPU architecture target for the file.
example: x64
Do we care if we make our own values here or should we use the ones that Microsoft defined? For example, in the sensor outputs x64 but Microsoft uses the nomenclature AMD64 (IMAGE_FILE_MACHINE_AMD64)
My thought was to normalize it like VirusTotal does, but not entirely sure if we'd have to be strict about this.
If there's a clear set of instructions we can give on how this should be normalized (e.g. linking to another source) we should do that now.
If there isn't, we can leave this up to the source, and only address later, only if needed.
The thinking: we have to balance the amount of work required by sources to get the normalization right. So I think it's fine to tighten this later, only if needed.
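For context on what "normalizing" would involve: the raw value comes from the Machine field of the COFF file header, which holds IMAGE_FILE_MACHINE_* constants (e.g. IMAGE_FILE_MACHINE_AMD64 = 0x8664). The sketch below is a minimal illustration, assuming short labels in the style of the field's example value (x86, x64, arm, arm64, ia64); that label set, and the names pe_architecture and MACHINE_LABELS, are hypothetical and not something this PR defines. It only covers a handful of machine types.

```python
import struct

# IMAGE_FILE_MACHINE_* constants from the PE/COFF specification.
# The labels on the right are an illustrative normalization (assumption),
# not values mandated by the schema.
MACHINE_LABELS = {
    0x014C: "x86",    # IMAGE_FILE_MACHINE_I386
    0x8664: "x64",    # IMAGE_FILE_MACHINE_AMD64
    0x01C4: "arm",    # IMAGE_FILE_MACHINE_ARMNT
    0xAA64: "arm64",  # IMAGE_FILE_MACHINE_ARM64
    0x0200: "ia64",   # IMAGE_FILE_MACHINE_IA64
}


def pe_architecture(data: bytes) -> str:
    """Return a normalized architecture label for a PE image held in memory."""
    if data[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' signature")
    # e_lfanew (file offset of the PE signature) is a little-endian uint32 at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing 'PE\\0\\0' signature")
    # The COFF file header follows the signature; Machine is its first field.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    # Unknown machine types fall back to the raw hex value.
    return MACHINE_LABELS.get(machine, f"0x{machine:04x}")


# Minimal synthetic header: "MZ", e_lfanew = 0x40, then "PE\0\0" + Machine.
header = bytearray(0x48)
header[:2] = b"MZ"
struct.pack_into("<I", header, 0x3C, 0x40)
header[0x40:0x44] = b"PE\0\0"
struct.pack_into("<H", header, 0x44, 0x8664)
print(pe_architecture(bytes(header)))  # x64
```

Falling back to the raw hex value keeps unmapped machine types visible instead of guessing a label, which leaves room to tighten the normalization later if needed.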
The PE field set is Windows-specific though, right? Wouldn't we want architecture to be OS-agnostic and not tied to the PE fields?
In many cases, yes. In every WOW64 Windows process, however, you will have a combination of 32- and 64-bit DLLs loaded. The x64 DLLs implement the WOW64 emulation layer, among other things. Some security products will also inject x64 hook DLLs into WOW64 processes. Here's an example WOW64 process where you can see several x64 DLLs loaded from
@marshallmain So that's what I was getting at with:
Basically, shared libraries ("DLLs") on Linux and Mac systems can have multiple architectures tied to them; on Windows, they're single-valued. In the case of So, the story around
I agree. Since PE already exists, anything that belongs in a PE header is fair game to add.
Going to wait for @webmat's sign-off, since I believe this would add to the 1.5 release scope. Ah, that reminds me: changelog entry linking this PR...
@elasticmachine, run elasticsearch-ci/docs |
- name: architecture
  level: extended
  type: keyword
  description: CPU architecture target for the file.
Is it worth adding a note that this is not necessarily the architecture of the machine itself?
I'll leave this one up to you guys.
From the schema POV, there's sometimes a reflex to over-explain, as if we were telling people how to implement the source itself (e.g. a compiler actually populating PE headers in an executable), when in fact the schema's role is simply to explain where to get the data (e.g. getting the "architecture" header from the PE headers) and how to interpret it when looking at events that populate these.
But if you think there's a disconnect to explain or point out between pe.architecture and host.architecture, for example, then yeah, I think this may make sense.
@rw-access I think saying "for the file" is fine, and I don't think we should over-explain: I would imagine people filling in this field by parsing PE headers would know how to do it. But if you think it's still vague, we can tighten it up.
webmat left a comment
Thanks @andrewstucki
I noted a few things to adjust or discuss further, but nothing big.
I'll trust Endpoint's instinct on whether .architecture should be normalized or where to add it (sounds like pe.architecture is fine and straightforward, so 👍). But from the schema POV, I think it's fine in some cases to not normalize at first, and add instructions to normalize only if it becomes needed.
Co-Authored-By: Mathieu Martin <webmat@gmail.com>
@webmat I updated some of the verbiage like you requested and merged master, so if you're 👍 I'll merge.
* Add architecture and imphash for PE field set
* Add changelog entry
* Update schemas/pe.yml

Co-Authored-By: Mathieu Martin <webmat@gmail.com>
Co-authored-by: Mathieu Martin <webmat@gmail.com>
So, this PR adds the fields imphash and architecture to the PE field set. Both are commonly used in PE parsing tools and in the security industry (see fields for Imphash and Target Machine). A couple of things to throw out there that people may have in mind: hash with it. dll or process, but:
a. we'd have to duplicate the field;
b. most of the time under process it's going to be the same as the host architecture, unless you're running in some sort of execution subsystem like WSL, or WINE for Linux, or something like that;
c. there are some differences between file formats that support multiple architectures (i.e. fat binaries) and those that don't.
So, the thought was that, for the above reasons, these fields should exist as a subset of pe, since they are tied to the file format itself. Thoughts on getting this in for 1.5?
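To make concrete what pe.imphash would carry: the commonly used definition (introduced by Mandiant and implemented by the pefile library's get_imphash()) is an MD5 over the import table, rendered as lowercased "library.function" entries in import order and joined with commas, with a .dll, .ocx, or .sys extension dropped from the library name. The sketch below assumes the import table has already been parsed into (library, [imports]) pairs; the function name imphash_from_imports is hypothetical. It also simplifies ordinal imports: pefile first tries to resolve ordinals to names for a few well-known DLLs, whereas this sketch always emits ordN, so results can differ for those DLLs.

```python
import hashlib
from typing import Iterable, Sequence, Tuple, Union

# One import entry: a function name, or an int ordinal for by-ordinal imports.
Import = Union[str, int]

# Library extensions that are dropped before hashing.
STRIPPED_EXTENSIONS = {"dll", "ocx", "sys"}


def imphash_from_imports(imports: Iterable[Tuple[str, Sequence[Import]]]) -> str:
    """Compute an imphash-style MD5 from (library, [imports]) pairs in table order.

    Simplification (assumption): ordinal imports always become 'ordN';
    no ordinal-to-name lookup is attempted.
    """
    entries = []
    for library, functions in imports:
        lib = library.lower()
        # Drop the extension only if it is one of the known library extensions.
        base, dot, ext = lib.rpartition(".")
        if dot and ext in STRIPPED_EXTENSIONS:
            lib = base
        for func in functions:
            name = f"ord{func}" if isinstance(func, int) else func
            entries.append(f"{lib}.{name.lower()}")
    return hashlib.md5(",".join(entries).encode()).hexdigest()


imports = [
    ("KERNEL32.dll", ["CreateFileW", "ReadFile"]),
    ("MYLIB.DLL", [5]),  # hypothetical library imported by ordinal
]
# Hashes the string "kernel32.createfilew,kernel32.readfile,mylib.ord5".
print(imphash_from_imports(imports))
```

Because the hash depends on import order and on the exact name rendering, two tools only produce matching imphash values if they follow the same conventions, which is one reason the field is tied to the PE file format rather than to the process.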