Observability Architecture

Context

I realized that architecting a comprehensive observability solution is a task many people struggle with. For that reason, I propose a well-architected framework for several topological scenarios. Each scenario is intended to represent a relevant category of computing environment, similar to the ones I found in the market after some research.

Scenario 1 - on-premises environment for a medium-sized organization

Environment assumptions

on-premises environment comprising:

a VMware vSphere cluster running on an array of physical servers
several Dell EMC Unity XT storage appliances
a SAN based on Brocade switches, connecting the storage appliances to the servers
Fortinet switches and firewalls, centrally managed through FortiGate
the vSphere cluster hosts multiple VMs, some running Windows Server and others running Linux
there are several Kubernetes clusters with Linux nodes; some nodes are VMs on the vSphere cluster and others are bare-metal physical servers
all physical servers are Dell machines with the Dell iDRAC management interface
there are several databases and other software packages deployed across the environment
there is a Kafka cluster managed by the Strimzi operator, installed in one of the Kubernetes clusters
there is an S3-compatible object storage cluster based on MinIO
public cloud services are not available

Proposed architecture

The proposed architecture follows these guidelines:

the log delivery system must be reliable, and logs should not be lost
we intend to centralize logs into a unified bus from which any solution may ingest or consume log entries, as long as it has the proper permissions
logs may contain sensitive data and must be protected against unauthorized access
logs must be retained long-term and remain queryable, even though queries against historical logs may take a long time to process
recent logs must be easy to query, and all queries must meet defined latency and availability SLOs
metrics must be stored long-term, but historical metrics may be compressed into longer time intervals (downsampled)
metrics from all sources must be stored in a unified format
all long-term data must be stored on the MinIO cluster using the S3 standard
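The downsampling guideline for historical metrics can be illustrated with a minimal Python sketch. It assumes raw samples arrive as (Unix timestamp, value) pairs and that each coarse time bucket keeps count, sum, minimum and maximum rather than a single average, so later queries can still derive averages and extremes. The `Aggregate` schema and `downsample` function are hypothetical and not part of any specific product; tools such as the Thanos compactor apply a comparable idea when downsampling to 5-minute and 1-hour resolutions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Aggregate:
    """Summary kept for one downsampled time bucket (hypothetical schema)."""
    start: int      # bucket start, Unix seconds
    count: int
    total: float
    minimum: float
    maximum: float

    @property
    def mean(self) -> float:
        return self.total / self.count


def downsample(samples: Iterable[tuple[int, float]], resolution: int) -> list[Aggregate]:
    """Group raw (timestamp, value) samples into fixed buckets of `resolution` seconds.

    Keeping count/sum/min/max instead of one averaged value lets later queries
    still compute averages and extremes over the coarser series.
    """
    buckets: dict[int, list[float]] = {}
    for ts, value in samples:
        start = ts - ts % resolution  # align the sample to its bucket boundary
        buckets.setdefault(start, []).append(value)
    return [
        Aggregate(start, len(vals), sum(vals), min(vals), max(vals))
        for start, vals in sorted(buckets.items())
    ]


if __name__ == "__main__":
    raw = [(0, 1.0), (15, 3.0), (30, 2.0), (300, 10.0), (315, 12.0)]
    for agg in downsample(raw, 300):  # 5-minute buckets
        print(agg.start, agg.count, agg.mean, agg.minimum, agg.maximum)
```

With 5-minute buckets, the five raw samples above collapse into two aggregates, one starting at 0 and one at 300.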

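One way to keep long-term logs on MinIO queryable, even if historical queries are slow, is to partition object keys by source and time so a query only scans the hours it covers. The sketch below assumes a Hive-style `key=value` layout under a `logs/` prefix; the layout and both function names are hypothetical, not something this blueprint prescribes. It expects timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone


def partition_prefix(source: str, ts: datetime) -> str:
    """Object-key prefix for one hour of logs from `source` (hypothetical layout)."""
    ts = ts.astimezone(timezone.utc)  # partition by UTC hour
    return f"logs/source={source}/date={ts:%Y-%m-%d}/hour={ts:%H}/"


def prefixes_for_range(source: str, start: datetime, end: datetime) -> list[str]:
    """Hourly prefixes a historical query over [start, end) has to scan.

    Both arguments must be timezone-aware datetimes.
    """
    # Truncate to the hour so a partially covered first hour is still included.
    current = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    end = end.astimezone(timezone.utc)
    prefixes = []
    while current < end:
        prefixes.append(partition_prefix(source, current))
        current += timedelta(hours=1)
    return prefixes


if __name__ == "__main__":
    start = datetime(2024, 5, 1, 22, 30, tzinfo=timezone.utc)
    end = datetime(2024, 5, 2, 1, 0, tzinfo=timezone.utc)
    for prefix in prefixes_for_range("k8s", start, end):
        print(prefix)
```

Each returned prefix could then be used as the `Prefix` parameter of an S3 ListObjectsV2 request against the MinIO cluster, so that objects outside the queried time range are never listed or read.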
Architecture blueprint

Scenario 2 - on-premises environment for a large organization, based on platform engineering and microservices

Environment assumptions

on-premises environment comprising:

large Kubernetes clusters
each microservice has its own namespace
each developer team is responsible for a group of microservices
each developer team consumes internal services provided by platforms inside the organization
the team responsible for the observability platform should provide a standard self-service catalog, so that teams that own microservices can use it to observe their services on their own according to corporate standards, taking responsibility for their own services' observability
public cloud services are not available

Proposed architecture

The proposed architecture follows these guidelines:

the log delivery system must be reliable, and logs should not be lost
each microservice's logs, metrics and traces should be consolidated into an auxiliary namespace mapped 1:1 to the microservice's namespace, so that the team that owns the microservice also owns the auxiliary resources
the platform team consumes the logs, metrics and traces from each namespace and ingests filtered data into a central data lake
when the scope of a query or analysis is restricted to a single microservice, only the components inside that specific namespace should be used
it is assumed that the microservice owner team also has access to its own logs, metrics and traces
there is no need to retain observability data long-term in the per-namespace resources; all necessary long-term retention is handled by the observability platform, transparently to its customers
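The 1:1 namespace mapping and the platform team's filtering step can be sketched in a few lines of Python. The `-obs` suffix, the severity table and the choice to filter by log level are all hypothetical assumptions; the guidelines only say that data is "filtered" before reaching the central data lake. The length check reflects the fact that Kubernetes namespace names must be valid DNS labels of at most 63 characters.

```python
from dataclasses import dataclass

# Hypothetical naming convention: "<service-namespace>-obs".
AUX_SUFFIX = "-obs"

# Hypothetical severity order used to decide what leaves a team's namespace.
SEVERITY = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}


@dataclass(frozen=True)
class LogRecord:
    namespace: str  # namespace of the microservice that emitted the record
    level: str
    message: str


def aux_namespace(service_namespace: str) -> str:
    """1:1 mapping from a microservice namespace to its auxiliary namespace."""
    name = service_namespace + AUX_SUFFIX
    if len(name) > 63:  # namespace names must be DNS labels (max 63 characters)
        raise ValueError(f"auxiliary namespace name too long: {name}")
    return name


def filter_for_lake(records: list[LogRecord], min_level: str = "WARN") -> list[dict]:
    """Select the records the platform team forwards to the central data lake.

    Only records at or above `min_level` are forwarded, tagged with the owning
    namespace and its auxiliary namespace. Unknown levels are treated as lowest.
    """
    threshold = SEVERITY[min_level]
    return [
        {
            "owner_namespace": r.namespace,
            "aux_namespace": aux_namespace(r.namespace),
            "level": r.level,
            "message": r.message,
        }
        for r in records
        if SEVERITY.get(r.level, 0) >= threshold
    ]
```

Keeping the filter on the platform side means each team's auxiliary namespace still holds its complete short-term data for namespace-scoped queries, while the central data lake only receives the subset the platform team chooses to retain.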
