
audit logging #8071

Merged
generall merged 22 commits into dev from audit-logging on Feb 9, 2026

Conversation

@generall (Member) commented Feb 7, 2026

This PR introduces optional Audit Logging functionality.

This functionality, if enabled, logs all authentication events (except health checks, metrics, and internal operations) into a dedicated directory with rich metadata.

Config example:

audit:
   enabled: false
   dir: ./storage/audit
   rotation: daily
   max_log_files: 7
   trust_forwarded_headers: false
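For illustration, the config section above could be modelled by a struct along these lines. This is a hedged, dependency-free sketch: field names mirror the YAML example, but the real struct in the PR may differ (e.g. `rotation` is likely an enum, and deserialization goes through serde).

```rust
use std::path::PathBuf;

/// Hypothetical mirror of the `audit:` config section shown above.
#[derive(Debug, Clone)]
pub struct AuditConfig {
    pub enabled: bool,
    pub dir: PathBuf,
    pub rotation: String,     // e.g. "daily"
    pub max_log_files: usize, // retention: how many rotated files to keep
    pub trust_forwarded_headers: bool,
}

impl Default for AuditConfig {
    fn default() -> Self {
        // Defaults chosen to match the example config values.
        AuditConfig {
            enabled: false,
            dir: PathBuf::from("./storage/audit"),
            rotation: "daily".to_string(),
            max_log_files: 7,
            trust_forwarded_headers: false,
        }
    }
}
```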
Log examples
{"timestamp":"2026-02-07T11:38:47.774671900Z","method":"cluster_status","auth_type":"ApiKey","remote":"127.0.0.1","result":"ok"}
{"timestamp":"2026-02-07T11:38:47.775804751Z","method":"telemetry_cluster","auth_type":"ApiKey","remote":"127.0.0.1","result":"ok"}
{"timestamp":"2026-02-07T11:38:47.775878592Z","method":"telemetry_requests","auth_type":"ApiKey","remote":"127.0.0.1","result":"ok"}
{"timestamp":"2026-02-07T11:38:47.782785083Z","method":"get_collection_cluster","auth_type":"ApiKey","remote":"127.0.0.1","collection":"mj","result":"ok"}
{"timestamp":"2026-02-07T11:38:48.850584097Z","method":"list_collections","auth_type":"ApiKey","remote":"127.0.0.1","result":"ok"}
{"timestamp":"2026-02-07T11:38:48.851509857Z","method":"list_collections","auth_type":"ApiKey","remote":"127.0.0.1","result":"ok"}
{"timestamp":"2026-02-07T11:39:06.058730125Z","method":"list_collections","auth_type":"Jwt","remote":"127.0.0.1","result":"ok"}
{"timestamp":"2026-02-07T11:39:06.059294963Z","method":"telemetry_cluster","auth_type":"Jwt","remote":"127.0.0.1","result":"denied","error":"Forbidden: Global access is required"}
{"timestamp":"2026-02-07T11:39:06.059325539Z","method":"telemetry_requests","auth_type":"Jwt","remote":"127.0.0.1","result":"denied","error":"Forbidden: Global access is required"}
{"timestamp":"2026-02-07T11:39:06.131026875Z","method":"list_collections","auth_type":"Jwt","remote":"127.0.0.1","result":"ok"}
{"timestamp":"2026-02-07T11:39:06.132768220Z","method":"telemetry_cluster","auth_type":"Jwt","remote":"127.0.0.1","result":"denied","error":"Forbidden: Global access is required"}
{"timestamp":"2026-02-07T11:39:06.132855791Z","method":"telemetry_requests","auth_type":"Jwt","remote":"127.0.0.1","result":"denied","error":"Forbidden: Global access is required"}
{"timestamp":"2026-02-07T11:39:06.201205875Z","method":"list_collections","auth_type":"Jwt","remote":"127.0.0.1","result":"ok"}
{"timestamp":"2026-02-07T11:39:06.202292016Z","method":"list_collections","auth_type":"Jwt","remote":"127.0.0.1","result":"ok"}
{"timestamp":"2026-02-07T11:39:06.204635985Z","method":"list_aliases","auth_type":"Jwt","remote":"127.0.0.1","result":"ok"}
{"timestamp":"2026-02-07T11:39:06.205305160Z","method":"list_aliases","auth_type":"Jwt","remote":"127.0.0.1","result":"ok"}
{"timestamp":"2026-02-07T11:39:06.207033235Z","method":"get_collection","auth_type":"Jwt","remote":"127.0.0.1","collection":"mj","result":"ok"}
{"timestamp":"2026-02-07T11:39:06.209684136Z","method":"get_collection","auth_type":"Jwt","remote":"127.0.0.1","collection":"mj","result":"ok"}
{"timestamp":"2026-02-07T11:39:07.630188759Z","method":"get_collection","auth_type":"Jwt","remote":"127.0.0.1","collection":"mj","result":"ok"}
{"timestamp":"2026-02-07T11:39:07.630418614Z","method":"strict_mode_check","auth_type":"Jwt","remote":"127.0.0.1","collection":"mj","result":"ok"}
{"timestamp":"2026-02-07T11:39:07.630498184Z","method":"scroll","auth_type":"Jwt","remote":"127.0.0.1","collection":"mj","result":"ok"}
{"timestamp":"2026-02-07T11:39:10.864667330Z","method":"strict_mode_check","auth_type":"Jwt","remote":"127.0.0.1","collection":"mj","result":"ok"}
{"timestamp":"2026-02-07T11:39:10.864786859Z","method":"delete_points","auth_type":"Jwt","remote":"127.0.0.1","collection":"mj","result":"denied","error":"Forbidden: Write access to collection mj is required"}

Initial implementation guidelines:

* `Claims` should have a new optional field `subject`, which can contain any string provided by the client.
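A minimal sketch of that field (illustrative only; the real `Claims` struct has more fields and derives serde traits):

```rust
/// Hypothetical sketch: `Claims` extended with an optional,
/// client-provided `subject` identifier.
#[derive(Debug, Default, Clone)]
pub struct Claims {
    /// Free-form identifier of the caller, taken verbatim from the token.
    pub subject: Option<String>,
}
```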

* Currently we propagate only the `Access`, wrapped into `ActixAccess` or `extract_access` in tonic.
We would additionally need to propagate the `subject` and the `remote` - the IP address from the request.
A new struct is needed for that; call it `Auth`, and replace the usage of `Access` with `Auth` in the codebase.

* In addition to `remote` and `subject`, `Auth` should also contain an `auth_type` enum, which can be `Jwt`, `ApiKey`, or `None`,
depending on how the request was authenticated.
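Putting the two bullets above together, the proposed `Auth` wrapper could be sketched like this (`Access` is stubbed as a unit struct here; the real type carries the RBAC payload):

```rust
/// How the request was authenticated.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthType {
    Jwt,
    ApiKey,
    None,
}

/// Placeholder for the real RBAC access type.
#[derive(Debug, Clone)]
pub struct Access;

/// Per-request authentication context, carrying everything the
/// audit logger needs alongside the access rights.
#[derive(Debug, Clone)]
pub struct Auth {
    pub access: Access,
    pub subject: Option<String>, // from `Claims.subject`, if any
    pub remote: Option<String>,  // client IP taken from the request
    pub auth_type: AuthType,
}

impl Auth {
    pub fn new(
        access: Access,
        subject: Option<String>,
        remote: Option<String>,
        auth_type: AuthType,
    ) -> Self {
        Auth { access, subject, remote, auth_type }
    }
}
```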

* `Access` has methods like `check_global_access` and `check_collection_access`.
The `Auth` object should wrap those methods, but add a logging layer to them.
In addition to the regular function arguments, they should also require the method name, as a `&str`.
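The shape of one audit line such a wrapper could emit can be sketched dependency-free. JSON is assembled by hand here and the timestamp is omitted to keep the example std-only; the real logger uses a proper serializer and writes the `timestamp` field first:

```rust
/// Hypothetical helper: render one audit-log line for a single
/// access-check decision, matching the field layout of the log
/// examples above (minus the timestamp).
pub fn audit_line(
    method: &str,
    auth_type: &str,
    remote: Option<&str>,
    result: &Result<(), String>,
) -> String {
    let mut line = format!("{{\"method\":\"{method}\",\"auth_type\":\"{auth_type}\"");
    if let Some(remote) = remote {
        line.push_str(&format!(",\"remote\":\"{remote}\""));
    }
    match result {
        // Allowed requests log `result: ok`.
        Ok(()) => line.push_str(",\"result\":\"ok\"}"),
        // Denied requests additionally log the error message.
        Err(err) => line.push_str(&format!(",\"result\":\"denied\",\"error\":\"{err}\"}}")),
    }
    line
}
```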

* The logging layer in `Auth` should be configurable with a global audit-log logger. It can be empty, in which case no logging is done.

* It should be possible to configure the audit logger to log into a specific directory with log rotation and retention policies. Logs should be in JSON format.

* Initialization of the global audit logger should be done in `main.rs`, from the same place everything else is configured: the `Settings` structure.
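The one-time global initialization can be sketched with std's `OnceLock` (illustrative only; the PR's actual logger is built on the tracing/tracing-appender stack):

```rust
use std::sync::OnceLock;

// Global flag set exactly once from `main.rs` during settings parsing.
static AUDIT_ENABLED: OnceLock<bool> = OnceLock::new();

/// Initialize the global audit state; a second call is rejected.
pub fn init_audit(enabled: bool) -> Result<(), &'static str> {
    AUDIT_ENABLED
        .set(enabled)
        .map_err(|_| "audit logger already initialised")
}

/// Cheap check used on the hot path: before initialization,
/// auditing is treated as disabled.
pub fn is_audit_enabled() -> bool {
    *AUDIT_ENABLED.get().unwrap_or(&false)
}
```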

Refactoring of #8028

Refactor for the use of `X-Forwarded-For` header.

Recently we introduced an extra config parameter called `trust_forwarded_headers`.
This parameter allows the use of `X-Forwarded-For` for logging the source of the access.

But we don't actually intend this parameter for use in regular logs.
Its original intent was to be used in audit logs, as we want to log the real source of the access, not the proxy IP address.

We would need to refactor `trust_forwarded_headers` in the following way:

- move the configuration parameter into the `AuditConfig`
- use the parameter only in the audit logger, use it to decide what to use in the `remote` field of the `Auth` struct.
- Move the logic of parsing the `X-Forwarded-For` header into separate files for both tonic and actix, for better readability. Avoid deep parsing of the header; just put it as-is into the `remote` field, as it can contain multiple IP addresses and we don't want to mess with that. Log it as-is, and let log analysis tools handle it.
- Restore the default behaviour for regular loggers, so they won't depend on `trust_forwarded_headers`.
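The rules above can be condensed into a small pure function. This is a hypothetical helper, not the PR's code: trust the raw (unparsed) header value only when `trust_forwarded_headers` is set, otherwise fall back to the peer address.

```rust
/// Hypothetical helper: pick the value for the audit `remote` field.
/// The forwarded value is passed through verbatim - it may contain
/// several comma-separated IPs, and no parsing is attempted.
pub fn remote_from_headers(
    forwarded: Option<&str>,        // raw `Forwarded` header, if present
    x_forwarded_for: Option<&str>,  // raw `X-Forwarded-For` header, if present
    peer_addr: Option<&str>,        // direct TCP peer address
    trust_forwarded_headers: bool,
) -> Option<String> {
    if trust_forwarded_headers {
        // Prefer the standard header, fall back to the legacy one.
        if let Some(value) = forwarded.or(x_forwarded_for) {
            return Some(value.to_string());
        }
    }
    peer_addr.map(|addr| addr.to_string())
}
```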

@coderabbitai (Contributor) left a comment

This comment was marked as resolved.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@lib/storage/src/audit.rs`:
- Around line 166-187: The init_audit_logger function currently returns early
when AuditConfig.enabled is false, so TRUST_FORWARDED_HEADERS is never set in
that case; update init_audit_logger to persist config.trust_forwarded_headers
regardless of config.enabled by moving the
TRUST_FORWARDED_HEADERS.set(config.trust_forwarded_headers) call so it executes
before the enabled check (or alternatively document this coupling in
AuditConfig), ensuring the global forwarded-header flag is applied even when
audit logging is disabled; reference init_audit_logger, AuditConfig, and
TRUST_FORWARDED_HEADERS when making the change.

In `@src/tonic/auth.rs`:
- Around line 44-49: Health-check endpoints are created with AuthType::None
which means emit_audit (lib/storage/src/rbac/auth.rs::emit_audit) will still log
them; change the health-check Auth creation to use AuthType::Internal instead of
AuthType::None (update the Auth::new call that builds the health-check auth in
src/tonic/auth.rs) so emit_audit's existing Internal check skips these requests,
or alternatively update emit_audit to explicitly ignore health-check paths if
you prefer filtering there (reference Auth::new, AuthType::Internal, and
emit_audit when making the change).

In `@src/tonic/forwarded.rs`:
- Around line 13-28: The actix implementation of forwarded_for currently only
checks X-Forwarded-For, creating inconsistent behavior versus tonic's
forwarded_for which checks Forwarded then falls back to X-Forwarded-For; update
the actix forwarded_for function to mirror tonic's logic: fetch request headers,
try the standard FORWARDED header first (using the FORWARDED constant and
to_str().ok()), return its value if present, and only if absent fall back to
checking X_FORWARDED_FOR (using the X_FORWARDED_FOR constant) and return that
value; ensure you use the same string conversion and Option<String> return
pattern as in tonic::forwarded_for.
- Around line 6-12: Update the doc comment in src/tonic/forwarded.rs to
accurately describe that the function extracts the raw value from either the
standard RFC 7239 "Forwarded" header (preferred) or, if absent, falls back to
"X-Forwarded-For"; state that the returned value is provided as-is (may be
comma-separated or contain multiple tokens), no parsing/validation is performed,
and the function returns None when the header is missing or not valid UTF-8.
Reference the header names "Forwarded" and "X-Forwarded-For" and keep the
existing note about downstream tools handling parsing.
🧹 Nitpick comments (1)
src/actix/forwarded.rs (1)

1-28: Clean helper extraction — consider adding Forwarded header support for parity with tonic.

As noted in the tonic review, this file only checks X-Forwarded-For while src/tonic/forwarded.rs also checks the RFC 7239 Forwarded header. Otherwise the code is clean and well-structured.

Comment on lines +166 to +187
pub fn init_audit_logger(config: Option<&AuditConfig>) -> anyhow::Result<Option<WorkerGuard>> {
    let Some(config) = config else {
        return Ok(None);
    };

    if !config.enabled {
        return Ok(None);
    }

    // Persist the forwarded-headers flag so it is available globally even
    // outside the audit logger itself (e.g. in auth middleware).
    let _ = TRUST_FORWARDED_HEADERS.set(config.trust_forwarded_headers);

    let (logger, guard) = AuditLogger::new(config)?;
    AUDIT_LOGGER
        .set(logger)
        .map_err(|_| anyhow::anyhow!("Audit logger already initialised"))?;

    log::info!("Audit logging enabled, writing to {}", config.dir.display());

    Ok(Some(guard))
}

⚠️ Potential issue | 🟠 Major

trust_forwarded_headers is silently ignored when audit logging is disabled.

When config.enabled is false, the function returns early at Line 172 before TRUST_FORWARDED_HEADERS is set at Line 177. This means the forwarded-header trust flag only takes effect when audit is actively enabled. If a user sets trust_forwarded_headers = true but enabled = false, the remote IP in Auth objects will always use peer_addr().

This may be intentional (no audit → no need for accurate IPs), but it's surprising behavior — the flag name doesn't communicate this dependency. Consider either:

  1. Moving Line 177 before the enabled check, or
  2. Documenting this coupling in the config comments.
Option 1: Set the flag regardless of enabled
     let Some(config) = config else {
         return Ok(None);
     };
 
+    // Persist the forwarded-headers flag so it is available globally even
+    // outside the audit logger itself (e.g. in auth middleware).
+    let _ = TRUST_FORWARDED_HEADERS.set(config.trust_forwarded_headers);
+
     if !config.enabled {
         return Ok(None);
     }
 
-    // Persist the forwarded-headers flag so it is available globally even
-    // outside the audit logger itself (e.g. in auth middleware).
-    let _ = TRUST_FORWARDED_HEADERS.set(config.trust_forwarded_headers);
-
     let (logger, guard) = AuditLogger::new(config)?;

Comment on lines +44 to +49
let auth = Auth::new(
    Access::full("Health check endpoints have full access without authentication"),
    None,
    remote,
    AuthType::None,
);

⚠️ Potential issue | 🟠 Major


Health-check Auth uses AuthType::None — audit logging will NOT skip these entries.

The PR objectives state healthcheck requests are excluded from audit logging, but the emit_audit function (lib/storage/src/rbac/auth.rs:102) only filters AuthType::Internal:

if !is_audit_enabled() || self.auth_type == AuthType::Internal {
    return;
}

Health-check endpoints using AuthType::None will be audit-logged and create noise. Either use AuthType::Internal for health-checks or add explicit filtering logic for health-check paths.
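The second alternative could look roughly like this. A hedged sketch: `should_audit` is a hypothetical helper, not code from the PR, and the `is_health_check` flag stands in for whatever path-based detection the middleware would use.

```rust
/// How the request was authenticated (mirrors the enum discussed
/// in this PR, including the `Internal` variant).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthType {
    Jwt,
    ApiKey,
    None,
    Internal,
}

/// Hypothetical predicate: audit everything except internal
/// operations and explicitly flagged health-check requests.
pub fn should_audit(audit_enabled: bool, auth_type: AuthType, is_health_check: bool) -> bool {
    audit_enabled && auth_type != AuthType::Internal && !is_health_check
}
```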


Comment on lines +136 to 149
/// Extract the per-request [`Auth`] context from a tonic request.
///
/// When no authentication middleware is configured, a default `Auth` with full
/// access is returned.
pub fn extract_auth<R>(req: &mut tonic::Request<R>) -> Auth {
    req.extensions_mut().remove::<Auth>().unwrap_or_else(|| {
        Auth::new(
            Access::full("All requests have full by default access when API key is not configured"),
            None,
            None,
            AuthType::None,
        )
    })
}

⚠️ Potential issue | 🟠 Major

Audit logs will lack remote IP when no API key is configured.

When no AuthKeys are configured, the auth middleware is not added (see src/tonic/mod.rs line 153–161), so extract_auth always hits the fallback path which creates Auth with remote: None. If audit logging is enabled without API key authentication, all audit entries will have no client IP.

Consider extracting the remote address outside of the auth middleware (e.g., in a separate lightweight layer or in extract_auth itself from the request extensions) so audit logs always capture it.

Comment on lines +13 to +28
pub fn forwarded_for(
    req: &tonic::codegen::http::Request<tonic::transport::Body>,
) -> Option<String> {
    let headers = req.headers();

    // Try standard Forwarded header first (RFC 7239)
    if let Some(forwarded) = headers.get(FORWARDED).and_then(|v| v.to_str().ok()) {
        return Some(forwarded.to_string());
    }

    // Fallback to legacy X-Forwarded-For header
    headers
        .get(&X_FORWARDED_FOR)
        .and_then(|v| v.to_str().ok())
        .map(|s| s.to_string())
}

⚠️ Potential issue | 🟠 Major

Behavioral inconsistency with the actix counterpart.

The tonic forwarded_for checks Forwarded first, then falls back to X-Forwarded-For. The actix version (src/actix/forwarded.rs) only checks X-Forwarded-For. This means the same proxy setup can yield different client IPs in audit logs depending on whether the request arrives via REST or gRPC.

Consider aligning both implementations — either both should handle Forwarded + X-Forwarded-For, or neither should.


Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@qdrant qdrant deleted a comment from coderabbitai bot Feb 9, 2026
@generall generall requested a review from timvisee February 9, 2026 12:23
Comment on lines -54 to -58
/// Parses the standard Forwarded header (RFC 7239) and extracts the client IP.
/// Format: Forwarded: for=192.0.2.43;proto=https
/// IPv6 format: Forwarded: for="[2001:db8::1]:4711"
/// Note: RFC 7239 specifies parameter names are case-insensitive.
fn parse_forwarded_header(header: &str) -> Option<String> {
Member:
While I agree that this is complicated logic, I'm not too happy about removing all of it. In my opinion logging real IPs in access logs is still valuable.

Member Author:

but we didn't bother to do it before the whole discussion about audit logging

return;
}
};
buf.push(b'\n');
Member:

This is not cross-platform.

@timvisee (Member) commented Feb 9, 2026:

This would be good enough:

Suggested change
buf.push(b'\n');
#[cfg(not(windows))]
buf.push(b'\n');
#[cfg(windows)]
buf.extend_from_slice(b"\r\n");

Member Author:

do we want it to be cross-platform? Even if audit logs are created on windows, linux users should be able to read it.

@timvisee (Member) commented Feb 9, 2026

I do have a few concerns about this PR.

With a simple bfb upload or search load I can grow the audit file by multiple megabytes per second. I easily reach 1GB in a short amount of time, and have generated multiple millions of entries. I feel like this needs to be much better, possibly by logging less and/or by changing the log format.

With the above in mind, we might also want to consider compressing rotated files. The current format would definitely benefit from that. Using gzip already reduces 117MB to 4.9MB.

The audit logging infrastructure doesn't seem very robust. For example, if I delete the audit file, Qdrant happily continues without any warnings or errors. Audit entries are written into the void and are permanently lost.

In the current implementation I do still see a lot of strict mode checks, I believe we want to get rid of them.

  • Upsert with bfb:
    {"timestamp":"2026-02-09T14:33:08.370406428Z","method":"strict_mode_check","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:33:08.370406448Z","method":"strict_mode_check","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:33:08.370438058Z","method":"upsert_points","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:33:08.370440228Z","method":"upsert_points","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:33:08.371099895Z","method":"strict_mode_check","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:33:08.371107395Z","method":"upsert_points","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:33:08.371303954Z","method":"strict_mode_check","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:33:08.371312434Z","method":"upsert_points","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:33:08.371424004Z","method":"strict_mode_check","auth_type":"None","collection":"benchmark","result":"ok"}
  • Query with bfb:
    {"timestamp":"2026-02-09T14:34:12.690573046Z","method":"strict_mode_check","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:34:12.690577576Z","method":"query_batch","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:34:12.690991134Z","method":"strict_mode_check","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:34:12.690998224Z","method":"query_batch","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:34:12.691137854Z","method":"strict_mode_check","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:34:12.691143074Z","method":"query_batch","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:34:12.691575662Z","method":"strict_mode_check","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:34:12.691581072Z","method":"query_batch","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:34:12.691653822Z","method":"strict_mode_check","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:34:12.691659562Z","method":"query_batch","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:34:12.692167740Z","method":"strict_mode_check","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:34:12.692172990Z","method":"query_batch","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:34:12.692269470Z","method":"strict_mode_check","auth_type":"None","collection":"benchmark","result":"ok"}
    {"timestamp":"2026-02-09T14:34:12.692275459Z","method":"query_batch","auth_type":"None","collection":"benchmark","result":"ok"}

@coderabbitai (Contributor) left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/common/telemetry_ops/memory_telemetry.rs (1)

38-53: ⚠️ Potential issue | 🟡 Minor

Misleading log message when auth check fails.

The && short-circuit means if epoch::advance() succeeds but auth.check_global_access() fails, the else branch logs "Failed to advance Jemalloc stats epoch", which is incorrect — the actual cause is an authorization failure.

Proposed fix: separate the two checks
-        if epoch::advance().is_ok()
-            && auth
-                .check_global_access(required_access, "telemetry_memory")
-                .is_ok()
-        {
+        if auth
+            .check_global_access(required_access, "telemetry_memory")
+            .is_err()
+        {
+            return None;
+        }
+        if epoch::advance().is_ok() {
             Some(MemoryTelemetry {
                 active_bytes: stats::active::read().unwrap_or_default(),
                 allocated_bytes: stats::allocated::read().unwrap_or_default(),
                 metadata_bytes: stats::metadata::read().unwrap_or_default(),
                 resident_bytes: stats::resident::read().unwrap_or_default(),
                 retained_bytes: stats::retained::read().unwrap_or_default(),
             })
         } else {
             log::info!("Failed to advance Jemalloc stats epoch");
             None
         }
🤖 Fix all issues with AI agents
In `@lib/storage/Cargo.toml`:
- Line 74: Update the local crate's dependency declaration for tracing-appender
so it uses the workspace member instead of a fixed version: add tracing-appender
to the root workspace [dependencies] in the workspace Cargo.toml, then change
the line in this crate's Cargo.toml from tracing-appender = "0.2" to
tracing-appender = { workspace = true } (referencing the dependency name
tracing-appender in this crate and the root workspace Cargo.toml).

In `@lib/storage/src/content_manager/collection_verification.rs`:
- Around line 100-119: The access check in check_strict_mode_timeout currently
calls auth.check_collection_access(...) with the "strict_mode_timeout_check"
audit tag and thus emits an audit entry; change it to use
auth.unlogged_access().check_collection_access(...) (matching
check_strict_mode_toc_batch) so the pre-check does not create audit noise—locate
the call to auth.check_collection_access in check_strict_mode_timeout and wrap
it with unlogged_access() while keeping the same AccessRequirements and tag
string.

In `@src/actix/api/snapshot_api.rs`:
- Line 304: The closure unnecessarily clones auth before moving it into the
async block; update the future creation so the async move uses the already-moved
auth (remove the .clone())—locate the line that constructs future (the async
move invoking do_create_full_snapshot(dispatcher.get_ref(), auth.clone())) and
change it to call do_create_full_snapshot with auth (no .clone()) since auth is
captured by value into the async move closure.

In `@src/actix/forwarded.rs`:
- Around line 5-12: Update the doc comment for forwarded_for_headers to
accurately describe behavior: state that it first checks the RFC 7239
"Forwarded" header and returns its raw value (as-is) and only falls back to the
legacy "X-Forwarded-For" header if "Forwarded" is absent; also keep that no
parsing/validation is performed and that the function returns None when the
chosen header is missing or not valid UTF-8. Refer to the forwarded_for_headers
function name when making the doc change.
🧹 Nitpick comments (13)
src/migrations/single_to_cluster.rs (1)

27-31: Unnecessary .clone(): full_access is not used after constructing full_auth.

full_access is consumed only by Auth::new on Line 28 and never referenced again, so the .clone() allocates needlessly.

♻️ Suggested fix
     let full_access = Access::full("Migration from single to cluster");
-    let full_auth = Auth::new(full_access.clone(), None, None, AuthType::Internal);
+    let full_auth = Auth::new(full_access, None, None, AuthType::Internal);
src/tonic/api/points_internal_api.rs (1)

41-43: full_internal_auth() is duplicated across internal API files.

This helper is identical to the one in collections_internal_api.rs (lines 16-18). Consider extracting it into a shared location (e.g., a common module for internal tonic helpers) to avoid the duplication.

#!/bin/bash
# Verify how many places define full_internal_auth
rg -n 'fn full_internal_auth' --type=rust
src/tonic/logging.rs (1)

52-54: Error log doesn't include the error details.

The error is propagated but not logged. Consider logging a summary of the error for operational visibility, especially since the forwarded-IP context has also been removed.

Suggested improvement
                Err(error) => {
-                    log::error!("gRPC request error {method_name}");
+                    log::error!("gRPC request error {method_name}: {error}");
                    Err(error)
                }
#!/bin/bash
# Check if S::Error implements Display (needed for the suggestion)
# Look at the trait bound on error type
rg -n 'type Error' --type=rust -A2 -g '*logging*'
lib/storage/tests/integration/alias_tests.rs (1)

173-181: Consider a more descriptive operation name for the audit log.

The third argument "test" in check_collection_access is the operation name for audit logging. A more descriptive name like "get_collection" or "test_alias_resolution" would be clearer, though this is just test code so it's minor.

lib/storage/src/content_manager/toc/mod.rs (1)

364-373: The _multipass parameter acts as a capability token but isn't used — consider documenting intent.

The _multipass: &CollectionMultipass parameter serves as proof-of-authorization but is never read. A brief doc comment explaining this pattern (e.g., "multipass is required as proof of authorization but not inspected") would help future readers understand why it exists. Compare with the existing collection_aliases method on Lines 376-392, which actively filters aliases by access — this new method deliberately skips that filtering.

config/config.yaml (1)

366-380: Well-documented default configuration.

The commented-out block with clear warnings about trust_forwarded_headers is good. Two observations from the PR discussion worth considering:

  1. There's no option for compressing rotated files (e.g., gzip). Given the reported volume (~1GB quickly under load), a compress_rotated: true option would be valuable.
  2. There's no mention of what happens if the audit directory becomes unavailable (disk full, deleted, etc.) — the PR comments note that Qdrant silently continues without audit entries in this case.

These could be addressed as follow-ups.

src/actix/api/cluster_api.rs (1)

166-178: toc() is called before the access check.

On Line 167, dispatcher.toc(&auth, &pass) is called before the access check on Line 168. While toc() likely doesn't enforce authorization itself, it's a better defensive pattern to check access first and only then obtain the toc reference — consistent with other handlers like recover_current_peer (Line 69-70) and remove_peer (Line 86-89) which check access first.

The same pattern appears in delete_cluster_metadata_key (Lines 190-194).

Proposed fix
     helpers::time(async move {
+        auth.check_global_access(
+            AccessRequirements::new().write(),
+            "update_cluster_metadata_key",
+        )?;
         let toc = dispatcher.toc(&auth, &pass);
-        auth.check_global_access(
-            AccessRequirements::new().write(),
-            "update_cluster_metadata_key",
-        )?;
 
         toc.update_cluster_metadata(key.into_inner(), value.into_inner(), params.wait)
lib/storage/src/rbac/ops_checks.rs (1)

338-348: Audit log for collection meta-operations omits the collection name.

emit_audit is called with None for the collection, but many CollectionMetaOperations variants carry a collection name (e.g., CreateCollection, DeleteCollection, CreatePayloadIndex, DropPayloadIndex). This makes it harder to correlate audit entries with the affected collection.

Consider extracting the collection name from the operation and passing it:

♻️ Suggested approach

You could add a method like fn collection_name(&self) -> Option<&str> on CollectionMetaOperations (or on AuditableOperation) and use it here:

     pub(crate) fn check_collection_meta_operation(
         &self,
         operation: &CollectionMetaOperations,
     ) -> Result<(), StorageError> {
         let result = self
             .unlogged_access()
             .check_collection_meta_operation(operation);
-        self.emit_audit(operation.operation_name(), None, &result);
+        self.emit_audit(operation.operation_name(), operation.collection_name(), &result);
         result
     }
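A minimal sketch of such a helper over a simplified stand-in enum — the real `CollectionMetaOperations` has more variants with different payloads, and the variant fields here are illustrative:

```rust
// Simplified stand-in for CollectionMetaOperations; only illustrates
// the suggested `collection_name` helper, not the actual Qdrant enum.
#[allow(dead_code)]
enum CollectionMetaOperations {
    CreateCollection { name: String },
    DeleteCollection { name: String },
    CreatePayloadIndex { collection: String, field: String },
    // Some variants may not target a single collection.
    TransferShard,
}

impl CollectionMetaOperations {
    /// Name of the affected collection, if the operation has one.
    fn collection_name(&self) -> Option<&str> {
        match self {
            Self::CreateCollection { name } | Self::DeleteCollection { name } => {
                Some(name.as_str())
            }
            Self::CreatePayloadIndex { collection, .. } => Some(collection.as_str()),
            Self::TransferShard => None,
        }
    }
}
```

Variants without a clear target simply return `None`, so `emit_audit` keeps its current behavior for them.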
src/actix/auth.rs (1)

128-135: Duplicated remote-IP derivation logic.

The forwarded-header-then-peer-addr pattern appears in both the middleware success path (Lines 129–134) and the FromRequest fallback (Lines 171–176). Consider extracting a small helper (e.g., fn resolve_remote(...)) to keep them in sync and reduce duplication.

♻️ Example helper
fn resolve_remote_from_service_req(req: &ServiceRequest) -> Option<String> {
    if audit_trust_forwarded_headers() {
        forwarded::forwarded_for(req)
    } else {
        None
    }
    .or_else(|| req.peer_addr().map(|a| a.ip().to_string()))
}

A similar helper for HttpRequest (using forwarded_for_http) could DRY up the FromRequest path.

Also applies to: 170-176
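The shared precedence logic can also be sketched independently of the actix request types, with the header and peer address passed in as plain `Option`s; `resolve_remote` and its parameters are illustrative names, not the actual Qdrant API:

```rust
// Hedged sketch of the shared remote-IP resolution: honor the forwarded
// header only when explicitly trusted (cf. `trust_forwarded_headers` in
// the audit config), otherwise fall back to the socket peer address.
fn resolve_remote(
    trust_forwarded: bool,
    forwarded_for: Option<String>,
    peer_addr: Option<String>,
) -> Option<String> {
    if trust_forwarded { forwarded_for } else { None }.or(peer_addr)
}
```

Both the `ServiceRequest` and `HttpRequest` paths could then feed their respective header extraction into the same function.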

src/common/telemetry.rs (1)

92-93: Comment is slightly misleading — only collection-level telemetry uses unlogged access.

The comment "Do not log access to telemetry" on line 92 applies only to CollectionsTelemetry::collect (lines 110–112) and HardwareTelemetry::new (line 147), which use the unlogged_access() reference. However, ClusterTelemetry::collect, RequestsTelemetry::collect, and MemoryTelemetry::collect (lines 136–144) receive the full &auth and will emit audit entries via their internal check_global_access calls (visible in the relevant snippets for those methods).

If the intent is to suppress only the per-collection access checks (which would be noisy), consider clarifying the comment to reflect that:

-        // Do not log access to telemetry
-        let access = auth.unlogged_access();
+        // Use unlogged access for collection-level telemetry to avoid per-collection audit noise
+        let access = auth.unlogged_access();
lib/storage/src/audit.rs (1)

100-153: Audit logger write path looks solid.

The serialize-then-write approach ensures atomicity under the Mutex, and the non-blocking channel keeps the critical section short. Error handling via log::error! is appropriate for an audit side-channel.

One note: tracing_appender::non_blocking uses a bounded channel with a default of 128,000 buffered messages. Under the extreme write volumes noted in PR comments (millions of entries), the channel could saturate. By default, it's lossy—new messages are dropped when the queue fills. If audit completeness is important, consider using NonBlockingBuilder::default() with .lossy(false) to apply backpressure instead, or document this lossy behavior.

lib/storage/src/content_manager/toc/point_ops.rs (1)

79-108: Batch methods emit one audit entry per sub-request — may contribute to high log volume.

Each batch method (recommend_batch, core_search_batch, discover_batch, query_batch) calls auth.check_point_op() in a loop, emitting an audit entry per sub-request. As noted in the PR discussion, this can generate millions of entries under search/upsert workloads. Since audit logging is opt-in, this is acceptable, but consider whether a single batch-level audit entry (with a count) would suffice for deployments with high request throughput.

Also applies to: 125-155, 288-318, 357-387
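A batch-level entry could carry the sub-request count instead of one entry per sub-request. The sketch below is hypothetical — `BatchAuditEntry` and its fields are illustrative, not the actual Qdrant audit schema:

```rust
// Hypothetical batch-level audit record: one entry per batch request,
// with the number of sub-requests folded into a count field.
#[derive(Debug, PartialEq)]
struct BatchAuditEntry {
    method: String,
    batch_size: usize,
}

/// Collapse N per-sub-request audit entries into a single batch entry.
fn batch_audit_entry(method: &str, batch_size: usize) -> BatchAuditEntry {
    BatchAuditEntry {
        method: format!("{method}_batch"),
        batch_size,
    }
}
```

The trade-off is losing per-sub-request access results, so this only works cleanly when all sub-requests share the same collection and outcome.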

src/common/update.rs (1)

930-957: Use the existing Auth::new_internal() helper instead of manually constructing with Auth::new().

The codebase already has Auth::new_internal(access: Access) which sets subject: None, remote: None, and auth_type: AuthType::Internal. The pattern at src/common/update.rs:953 and 1020 (and also in points_internal_api.rs:42 and collections_internal_api.rs:18) should use this existing helper:

♻️ Refactor suggestion

Replace:

Auth::new(Access::full("Internal API"), None, None, AuthType::Internal)

With:

Auth::new_internal(Access::full("Internal API"))

Apply to all 4 occurrences in the codebase.

async-trait = { workspace = true }

tracing = { workspace = true, optional = true }
tracing-appender = "0.2"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Check workspace root Cargo.toml for tracing-appender and tracing definitions
rg -n 'tracing-appender|tracing' Cargo.toml | head -30

Repository: qdrant/qdrant



🏁 Script executed:

# Check the full dependencies section in lib/storage/Cargo.toml
cat -n lib/storage/Cargo.toml | head -100



tracing-appender should be a workspace dependency for consistency.

Most dependencies in this project use { workspace = true }. While a few exceptions exist for compatibility-pinned versions (http, protobuf), tracing-appender doesn't fall into that category — especially since its sibling tracing is already a workspace dep.

Add tracing-appender to the workspace [dependencies] in the root Cargo.toml and reference it here as tracing-appender = { workspace = true }.


Comment on lines 100 to +119
pub async fn check_strict_mode_timeout(
timeout: Option<usize>,
collection_name: &str,
dispatcher: &Dispatcher,
access: &Access,
auth: &Auth,
) -> Result<VerificationPass, StorageError> {
let Some(timeout) = timeout else {
return Ok(new_unchecked_verification_pass());
};

let toc = get_toc_without_verification_pass(dispatcher, access);
let toc = get_toc_without_verification_pass(dispatcher, auth);

// Check access here first since strict-mode gets checked before `access`.
// If we simply bypassed here, requests to a collection a user doesn't has access to could leak
// information, like existence, strict mode config, payload indices, ...
let collection_pass =
access.check_collection_access(collection_name, AccessRequirements::new())?;
let collection_pass = auth.check_collection_access(
collection_name,
AccessRequirements::new(),
"strict_mode_timeout_check",
)?;

⚠️ Potential issue | 🟡 Minor

Inconsistency: check_strict_mode_timeout uses logged access while other strict-mode checks use unlogged_access().

check_strict_mode_toc_batch (Line 34) deliberately uses auth.unlogged_access() to avoid audit noise, since the actual operation is logged later. But check_strict_mode_timeout (Line 115) calls auth.check_collection_access(...) directly with "strict_mode_timeout_check" — this will emit an audit entry for what is also a pre-operation check.

This inconsistency could reintroduce the audit log noise that unlogged_access() was designed to prevent, particularly for endpoints that only check timeout strict mode.

Suggested fix
-    let collection_pass = auth.check_collection_access(
-        collection_name,
-        AccessRequirements::new(),
-        "strict_mode_timeout_check",
-    )?;
+    // Strict mode uses unlogged access — actual logging happens
+    // in the operations after the strict mode check.
+    let collection_pass = auth
+        .unlogged_access()
+        .check_collection_access(collection_name, AccessRequirements::new())?;

ActixAuth(auth): ActixAuth,
) -> impl Responder {
let future = async move { do_create_full_snapshot(dispatcher.get_ref(), access).await };
let future = async move { do_create_full_snapshot(dispatcher.get_ref(), auth.clone()).await };

⚠️ Potential issue | 🟡 Minor

Unnecessary auth.clone(): auth is already moved into the closure.

The async move block captures auth by value and it is not used afterward. The .clone() allocates a copy that is immediately dropped.

Proposed fix
-    let future = async move { do_create_full_snapshot(dispatcher.get_ref(), auth.clone()).await };
+    let future = async move { do_create_full_snapshot(dispatcher.get_ref(), auth).await };

@qdrant qdrant deleted a comment from coderabbitai bot Feb 9, 2026
@generall generall merged commit 1ae2070 into dev Feb 9, 2026
15 checks passed
@generall generall deleted the audit-logging branch February 9, 2026 18:16
generall added a commit that referenced this pull request Feb 9, 2026
* initial implementation

* internal logging type

* wrap more places into auth function

* [manual refactor] use &Auth in .toc function - minimise direct .access()

* [manual refactor] fmt

* [manual refactor] remove unannotated .access in creation of snapshots

* [manual refactor] remove unannotated .access in cluster telemetry

* [manual refactor] remove unannotated .access

* [manual refactor] do not log /metrics api access

* [manual refactor] do not run the service if audit logging failed to init

* [manual refactor] make auditable operation names for point and cluster updates

* [manual refactor] remove excess cloning of auth object

* [manual refactor] fmt

* [AI] instead of manual writing into the file, use tracing_appender crate

* [AI] simplify configuration to match internal crate

* fmt

* [manual refactor] cover staging options

* [AI] refactor x-forwarded-for handling

* [manual refactor] use consts for x-forwarded-for + recover handling for RFC case in tonic

* Update src/tonic/forwarded.rs

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* [manual] review fix: use FORWARDED in actix

* [manual] review fix: do not log strict mode checks

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@timvisee timvisee mentioned this pull request Feb 17, 2026
5 tasks