I've been using the Elastic Stack for log analytics for a few years, since v1.4.0, with more than 30 types of logs. I think most of us know the challenge of not having common fields across log types, so we came up with our own internal core and standard fields to enable log correlation. When I heard about the release of ECS, I thought I could just read the standard document and easily map our current fields to ECS fields.
That hasn't really been the case, at least for me. I suspect some new Elastic users will hit the same frustration when looking to adopt ECS.
From the ECS documentation:

> The goal of ECS is to enable and encourage users of Elasticsearch to normalize their event data, so that they can better analyze, visualize, and correlate the data represented in their events.

> ECS fields follow a series of guidelines, to ensure a consistent and predictable feel, across various use cases.
If Elastic expects the community to adopt ECS, then ECS should be as simple as possible, not confusing. I've read and re-read the ECS document multiple times and still end up searching through issues in the GitHub repo to figure out how to properly map a field to ECS. We almost have to pull a few people into a meeting every time we have a new field.
I like the idea of grouping related fields into fieldsets, but I think there is room for improvement.
source/destination vs. client/server
source and client (and likewise destination and server) carry the same nested fields, so why not choose one pair and drop the other? The source/destination pair is more generic and fits most use cases.
The ECS documentation says:

> Client / server representations can add semantic context to an exchange, which is helpful to visualize the data in certain situations. If your context falls in that category, you should still ensure that source and destination are filled appropriately.
I think this approach is unnecessarily complicated. It confuses users and, at billions of events, can significantly increase storage size. Netflow, firewall, web access, and IPS/IDS logs can all use the source/destination pair alone. With two pairs, users face these questions:
- When to populate source/destination?
- When to populate client/server?
- When to populate both pairs?
- Which pair should I use when searching by field:value syntax?
- Should correlation be done on one pair or the other?
These questions aren't answered in the current ECS document.
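To make the duplication concrete, here is a minimal sketch (the field names are real ECS fields, but the values and the `leaf_count` helper are invented for illustration):

```python
# Hypothetical web-access event; ECS field names, made-up values.
# With only source/destination, the endpoints appear once:
source_destination_only = {
    "source": {"ip": "10.0.0.5", "port": 51234},
    "destination": {"ip": "203.0.113.7", "port": 443},
}

# Populating both pairs (as ECS suggests when client/server semantics
# apply) repeats every endpoint field in the second pair:
both_pairs = {
    **source_destination_only,
    "client": {"ip": "10.0.0.5", "port": 51234},
    "server": {"ip": "203.0.113.7", "port": 443},
}

def leaf_count(obj):
    """Count leaf values in a nested dict."""
    if not isinstance(obj, dict):
        return 1
    return sum(leaf_count(v) for v in obj.values())

print(leaf_count(source_destination_only))  # 4
print(leaf_count(both_pairs))               # 8
```

Twice the indexed endpoint fields per event, carrying the same information, multiplied across billions of documents.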
log vs. event
I have the same confusion with these two field sets. I believe ECS is designed for the log analytics use case and aims to support the new Kibana SIEM app. Any log message that enters a SIEM is usually treated as an event, so the event field set makes sense. Why add log and confuse users? If I want to map the attributes of a message to ECS, do I have to jump between the log and event field sets?
log.level vs. event.severity? Someone has even proposed an event.level in #129.
log.original vs. event.original? Would most users know which field to use just by reading the ECS document? @ruflin clarified the two in #127, but I don't think that's enough. Why not stick with the event field set and add event.raw, event.original, event.normalized/transformed, or similar nested fields to cover the need?
There are already many nested fields under the event field set to describe an event. Will the log field set keep adding the same nested fields and end up in the same situation as source/destination and client/server?
I propose getting rid of the log field set and keeping only the event field set.
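As a concrete illustration of the ambiguity, here is a hypothetical mapping of a single firewall syslog line (the raw message and the field choices below are invented, not an official mapping):

```python
# Hypothetical syslog record; deciding where each attribute lands is
# exactly the ambiguity described above.
raw = "<134>Jan 1 12:00:00 fw01 deny tcp 10.0.0.5 -> 203.0.113.7"

event_doc = {
    # The same attribute could plausibly land in either field set:
    "log": {
        "level": "informational",  # derived from the syslog priority...
        "original": raw,           # ...and the untouched message
    },
    "event": {
        "severity": 6,             # ...or should severity live here?
        "original": raw,           # ...and which "original" wins?
    },
}
```

Either choice is defensible from the current document, which is precisely the problem: two teams mapping the same log will produce two incompatible documents.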
event field set
Has anyone looked at the nested fields under the event field set and not found them confusing?
- event.category
- event.dataset
- event.kind
- event.module
- event.type
event.dataset vs. event.module?
event.category vs. event.kind?
Some of these fields come with the following warning:
> Warning: In future versions of ECS, we plan to provide a list of acceptable values for this field, please use with caution.
I read the warning as "Don't use these nested fields yet".
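For what it's worth, here is how I currently understand the intended hierarchy, inferred from how Beats modules populate these fields today (an assumption on my part, not documented behavior):

```python
# My best guess at the intent, based on Beats module output -- an
# assumption, since the ECS document doesn't spell this out:
nginx_access_event = {
    "event": {
        "module": "nginx",          # the ingesting module
        "dataset": "nginx.access",  # the specific stream within that module
        "kind": "event",            # broad bucket (event vs. alert vs. metric?)
        "category": "web",          # narrower classification; allowed values TBD
    },
}
```

If that reading is right, the document should say so explicitly; if it's wrong, that only reinforces the point that the distinctions are unclear.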
The release of ECS came a bit late, but it's a great step for log analytics use cases. I'm looking forward to migrating to ECS once the standard is less confusing.