Collect metrics from Fedora CoreOS machines #86

@bgilbert

The idea of a Linux distribution collecting metrics about its installed base has long been controversial. Many Linux users do not want their operating system to phone home about the state of their system. At the same time, it is difficult to effectively allocate development resources for an operating system whose installed base is not well understood. Decisions often need to be made about what platforms or container runtimes to prioritize; what hardware, system services, or system or network configurations are commonly used; which corner cases need to be especially well tested during upgrades; and which third-party services impose important compatibility constraints. Metrics can inform these decisions, which benefits the operating system and the user base as a whole. External mechanisms for measuring the installed base, such as analysis of logs from download mirrors, are inaccurate at best; much better data can be collected from installed machines directly.

We would like to collect metrics from Fedora CoreOS systems by default. We want to be clear about exactly what will be collected, how it will be used, and how to disable that collection. We may also allow opting into collection of additional metrics beyond the default set.

Background: Container Linux

Container Linux metrics are collected via the update system. CoreUpdate collects metrics about each machine that checks in: its update channel, its state in the client state machine, what OS version is running, what version was originally installed, the OEM ID (platform) of the machine, and its check-in history. This works okay but gives us an incomplete picture of the installed base: we do not receive any information about machines that use a private CoreUpdate server, that use a third-party update server such as CoreRoller, or that have updates disabled.

Fedora CoreOS

Fedora CoreOS will not couple metrics to its update system. This not only allows greater freedom in the design of both, but also allows privacy-conscious users to disable metrics while continuing to receive automatic updates. (In any event, the Cincinnati update protocol, #83, is not designed to collect client metrics.) We will need a separate metrics-collection system, including:

  • A client daemon or timer unit to perform the collection
  • A cloud service to collect and aggregate metrics data

Initial metrics might include:

  • Random unique identifier for deduplicating reports
  • Platform (cloud environment or hypervisor)
  • On bare-metal systems, a summary of hardware
  • On cloud systems, the instance type
  • Original OS version
  • Current OS version
  • Container runtimes in use
  • Summary of network configuration
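
To make the list above concrete, here is a minimal sketch of what a single report might look like as a data structure, and how a random unique identifier could be used to deduplicate reports on the collection side. Nothing here is a decided format: the `MetricsReport` class, its field names, and the `new_machine_id`, `to_json`, and `deduplicate` helpers are all hypothetical names chosen for illustration.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class MetricsReport:
    """Hypothetical report shape; field names are assumptions, not a spec."""
    machine_id: str                        # random unique identifier
    platform: str                          # cloud environment or hypervisor
    original_os_version: str
    current_os_version: str
    instance_type: Optional[str] = None    # cloud systems only
    hardware_summary: Optional[str] = None # bare-metal systems only
    container_runtimes: List[str] = field(default_factory=list)
    network_summary: Optional[str] = None


def new_machine_id() -> str:
    """Generate a random identifier that is not derived from machine data."""
    return str(uuid.uuid4())


def to_json(report: MetricsReport) -> str:
    """Serialize a report with stable key order."""
    return json.dumps(asdict(report), sort_keys=True)


def deduplicate(reports: List[MetricsReport]) -> List[MetricsReport]:
    """Keep only the most recently seen report for each machine_id."""
    latest = {}
    for report in reports:
        latest[report.machine_id] = report
    return list(latest.values())
```

Using a random UUID rather than something derived from the machine (such as a hardware serial number) lets the collector count machines without learning anything else about them.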

Metrics might be grouped into multiple levels, such as minimal and full, which collect different amounts of information. The default could be minimal, with the option to switch to full or off by writing a config file via Ignition. Corresponding documentation should explain what metrics are collected, how the metrics are used, and the consequences of disabling them.
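
As a rough sketch of how levels could work, the client might build a full report and then drop fields according to the configured level. Which metrics belong to minimal is not decided in this issue, so the `MINIMAL_FIELDS` set below, the field names in it, and the `filter_report` function are assumptions made purely for illustration.

```python
from typing import Optional

LEVELS = ("off", "minimal", "full")

# Assumption: the split between minimal and full is undecided; this set is
# only an example of what a minimal level might keep.
MINIMAL_FIELDS = frozenset({
    "machine_id",
    "platform",
    "original_os_version",
    "current_os_version",
})


def filter_report(report: dict, level: str) -> Optional[dict]:
    """Return the fields to send at the given level, or None to send nothing."""
    if level not in LEVELS:
        raise ValueError(f"unknown metrics level: {level!r}")
    if level == "off":
        return None
    if level == "full":
        return dict(report)
    # level == "minimal": keep only the allow-listed fields
    return {key: value for key, value in report.items() if key in MINIMAL_FIELDS}
```

An allow-list for the minimal level means that a metric added later is only sent at full until someone explicitly decides it belongs in minimal.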

Due to time constraints, functioning metrics collection may not be included in the first release of Fedora CoreOS. However, the first release should include at least the following:

  • A configuration file
  • A service that parses the config file and rejects invalid configs
  • Documentation for configuring metrics collection

In other words, metrics collection should be configurable and documented from day 1. This should reduce unpleasant surprises for the user community when the metrics infrastructure is deployed and enabled.
