While putting together dashboards for the Auditbeat system module I realized that with the current data model it's not possible to visualize the number of processes, sockets, users, and packages since there is no way to identify a unique entity.
For example, each process can and often will have multiple events of different types (when it starts, when it ends, and when it's reported by a regular state update). There's no single identifying field at the moment to count them properly: process name, executable, args, pid, and ppid are each not unique. The same goes for sockets, users, and packages. Only the host dataset already has a field, host.id, that should be unique.
I'm proposing to introduce new fields that identify those entities.
As a field name, I'm still torn between:
- {entity}.id - we already have host.id doing the same for hosts, but unfortunately there is already a user.id, so that wouldn't work for users.
- {entity}.hash - not filled anywhere afaik, so no conflicts. But it doesn't follow the convention of host.id.
As for the value, I'm thinking a hash of some of the fields of the entity and the host.id:
- Process:
pid + start + host.id
- Socket:
inode + source.ip + source.port + destination.ip + destination.port + host.id. The possibility of inode reuse with the same IP/port combinations seems remote.
- User:
user.id + user.name + host.id. At least on Linux, this is not foolproof as the user could be deleted and re-created. But I don't think there's really anything we can do, since on Linux /etc/passwd is just a text file that can theoretically be edited at will. More likely, multiple users would share a UID ("virtual" users; some mail servers do this, e.g. Dovecot), in which case our UID to username lookup in various places would probably be off already.
- Package:
name + version + host.id. I guess it would be possible to remove a package and install it with the same name and version but from a different source and we'd treat it as the same, but again I don't think there's much we can do about that. At the moment, at least.
- Login:
Just a note here: the login dataset does not send state, so cardinality is not a problem - every event is unique. That only applies when all the data is from the system module though, not when logins are also reported by the auditd module. For that, we would need an ID that is stable across modules. But I think we can treat that as out of scope for now.
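To make the proposed values concrete, here is a rough sketch of how they could be computed. It is not the Auditbeat implementation: the function names, the choice of SHA-256, the length-prefixed encoding, and the sample events are all assumptions for illustration. Python is used only to keep the example short.

```python
import hashlib


def entity_hash(host_id, *fields):
    """Hash host.id together with an entity's identifying fields (hypothetical helper)."""
    h = hashlib.sha256()  # assumption: any stable hash function would do
    for value in (host_id, *fields):
        data = str(value).encode("utf-8")
        # Length-prefix every field so ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.hexdigest()


def process_hash(host_id, pid, start):
    return entity_hash(host_id, pid, start)


def socket_hash(host_id, inode, src_ip, src_port, dst_ip, dst_port):
    return entity_hash(host_id, inode, src_ip, src_port, dst_ip, dst_port)


def user_hash(host_id, uid, name):
    return entity_hash(host_id, uid, name)


def package_hash(host_id, name, version):
    return entity_hash(host_id, name, version)


# Made-up (host.id, pid, start) values: three events for one process, one for a
# later process that reuses the PID, and one for the same PID on another host.
events = [
    ("host-a", 100, "2019-01-01T10:00:00Z"),  # start event
    ("host-a", 100, "2019-01-01T10:00:00Z"),  # state update
    ("host-a", 100, "2019-01-01T10:00:00Z"),  # end event
    ("host-a", 100, "2019-01-01T12:00:00Z"),  # PID reused later
    ("host-b", 100, "2019-01-01T10:00:00Z"),  # same PID, different host
]

unique_processes = {process_hash(*e) for e in events}
print(len(events), len(unique_processes))  # 5 3
```

Counting distinct hash values gives three processes for five events, which is the number a dashboard would want to show. The sketch does not mix an entity type into the hash, on the assumption that each value lives in its own per-entity field.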