Missing logs for ZooKeeperUserExceptions #20048

@ben-efiz

Describe the issue
I have a ReplicatedReplacingMergeTree table on one shard with two replicas coordinated by ZooKeeper, all running on a Kubernetes cluster managed by clickhouse-operator. The setup works fine and data is replicated correctly, but the Prometheus metrics report several ZooKeeperUserExceptions. ClickHouse does not log those exceptions, not even at trace level, so I have no way of seeing what the issues are about.

How to reproduce

  • ClickHouse server version 21.1.2
  • ZooKeeper 3.6.1
  • CREATE TABLE with ReplicatedReplacingMergeTree as engine
  • Set logger level to Trace
<logger>
    <level>trace</level>
    <log>/var/log/clickhouse-server/clickhouse-server.log</log>
    <errorlog>/var/log/clickhouse-server/clickhouse-server.err.log</errorlog>
    <size>1000M</size>
    <count>10</count>
</logger>
  • Grep logs for anything which might be related to the error messages, e.g.
 grep -i -E "No node|Bad version|No children for ephemerals|Node exists|Not empty" -r /var/log/clickhouse-server/
  • No further reproduction steps, as I can't see the root cause without the missing logs.

Expected behavior
If Prometheus metrics report ZooKeeperUserExceptions, I want to see them in the logs as well. They do not seem to be fatal errors, but I would normally expect to see them at logger level Error, or otherwise at least at Debug or Trace.

Error message and/or stacktrace
No stacktrace, as the exceptions do not appear in the logs.

Additional context
I first thought it was related to the ClickHouse Kubernetes operator, so I opened a ticket there with additional details, but we found out it is rather a ClickHouse logger issue.

When looking at the implementation, you can find the profile event increment here:

ProfileEvents::increment(ProfileEvents::ZooKeeperUserExceptions);

The places using Coordination::isUserError(Error code) do not really log anything besides one LOG_INFO under a specific condition:

LOG_INFO(log, "Block with ID {} already exists (it was just appeared). Renaming part {} back to {}. Will retry write.",

I would expect more logs in that context, or the correctly thrown exceptions to be logged where they are caught.

Metadata

    Labels

    not planned (Known issue, no plans to fix it currently), usability