refactor: JSON message with less allocations by weihanglo · Pull Request #16130 · rust-lang/cargo

weihanglo · 2025-10-17T20:59:33Z

What does this PR try to resolve?

This was found during experimenting -Zbuild-analysis with ndjson.

From me tracing the code with cargo expand, basically there shouldn't have any significant performance difference between serde(flatten) and inlining all the fields. Here the differences between them

flatten one calls Serialize::serialize_map without fields size hint so cannot pre-allocate Vec with Vec::with_capacity, whereas inline case calls Serialize::serialize_struct with a known length of fields.
flatten would end up calling Serializer::serialize_map and line calls Serializer::serialize_struct. And in serde_json serializer serialize_struct actually call serailze_map. So no difference on serializer side.
There might be some function calls not inlined I like FlatMapSerializer but I doubt it is costly than allocation.

Here is the cargo-expand'd result:

#[derive(Serialize)]
pub struct Foo<D: Serialize> {
    id: u8,
    #[serde(flatten)]
    data: D,
}

#[derive(Serialize)]
struct Bar {
    a: bool,
}

// Expand to

extern crate serde as _serde;
impl<D: Serialize> _serde::Serialize for Foo<D>
where
    D: _serde::Serialize,
{
    fn serialize<__S>(
        &self,
        __serializer: __S,
    ) -> _serde::__private228::Result<__S::Ok, __S::Error>
    where
        __S: _serde::Serializer,
    {
        let mut __serde_state = _serde::Serializer::serialize_map(
            __serializer,
            _serde::__private228::None,
        )?;
        _serde::ser::SerializeMap::serialize_entry(
            &mut __serde_state,
            "id",
            &self.id,
        )?;
        _serde::Serialize::serialize(
            &&self.data,
            _serde::__private228::ser::FlatMapSerializer(&mut __serde_state),
        )?;
        _serde::ser::SerializeMap::end(__serde_state)
    }
}

impl _serde::Serialize for Bar {
    fn serialize<__S>(
        &self,
        __serializer: __S,
    ) -> _serde::__private228::Result<__S::Ok, __S::Error>
    where
        __S: _serde::Serializer,
    {
        let mut __serde_state = _serde::Serializer::serialize_struct(
            __serializer,
            "Bar",
            false as usize + 1,
        )?;
        _serde::ser::SerializeStruct::serialize_field(
            &mut __serde_state,
            "a",
            &self.a,
        )?;
        _serde::ser::SerializeStruct::end(__serde_state)
    }
}

#[derive(Serialize)]
pub struct Foo<D: Serialize> {
    id: u8,
    a: bool,
}

// Expand to

impl<D: Serialize> _serde::Serialize for Foo<D> {
    fn serialize<__S>(
        &self,
        __serializer: __S,
    ) -> _serde::__private228::Result<__S::Ok, __S::Error>
    where
        __S: _serde::Serializer,
    {
        let mut __serde_state = _serde::Serializer::serialize_struct(
            __serializer,
            "Foo",
            false as usize + 1 + 1,
        )?;
        _serde::ser::SerializeStruct::serialize_field(
            &mut __serde_state,
            "id",
            &self.id,
        )?;
        _serde::ser::SerializeStruct::serialize_field(
            &mut __serde_state,
            "a",
            &self.a,
        )?;
        _serde::ser::SerializeStruct::end(__serde_state)
    }
}

How to test and review this PR?

CI passing.

One change is that reason will no longer be the first field. We could activate the preserve_order feature in serde_json if we want, though that may get more perf loess than the gain.

See the benchmark result in #16130 (comment)

This was found during experimenting `-Zbuild-analysis` with ndjson. From me tracing the code with `cargo expand`, basically there shouldn't have any significant performance difference between `serde(flatten)` and inlining all the fields. Here the differences between them * flatten one calls `Serialize::serialize_map` without fields size hint so cannot pre-allocate Vec with `Vec::with_capacity`, whereas inline case calls `Serialize::serialize_struct` with a known length of fields. * flatten would end up calling `Serializer::serialize_map` and line calls `Serializer::serialize_struct`. And in serde_json serializer `serialize_struct` actually call `serailze_map`. So no difference on serializer side. * There might be some function calls not inlined I like `FlatMapSerializer` but I doubt it is costly than allocation. Here is the `cargo-expand`'d result: ```rust #[derive(Serialize)] pub struct Foo<D: Serialize> { id: u8, #[serde(flatten)] data: D, } #[derive(Serialize)] struct Bar { a: bool, } // Expand to extern crate serde as _serde; impl<D: Serialize> _serde::Serialize for Foo<D> where D: _serde::Serialize, { fn serialize<__S>( &self, __serializer: __S, ) -> _serde::__private228::Result<__S::Ok, __S::Error> where __S: _serde::Serializer, { let mut __serde_state = _serde::Serializer::serialize_map( __serializer, _serde::__private228::None, )?; _serde::ser::SerializeMap::serialize_entry( &mut __serde_state, "id", &self.id, )?; _serde::Serialize::serialize( &&self.data, _serde::__private228::ser::FlatMapSerializer(&mut __serde_state), )?; _serde::ser::SerializeMap::end(__serde_state) } } impl _serde::Serialize for Bar { fn serialize<__S>( &self, __serializer: __S, ) -> _serde::__private228::Result<__S::Ok, __S::Error> where __S: _serde::Serializer, { let mut __serde_state = _serde::Serializer::serialize_struct( __serializer, "Bar", false as usize + 1, )?; _serde::ser::SerializeStruct::serialize_field( &mut __serde_state, "a", &self.a, )?; _serde::ser::SerializeStruct::end(__serde_state) } } ``` ```rust #[derive(Serialize)] pub struct Foo<D: Serialize> { id: u8, a: bool, } // Expand to impl<D: Serialize> _serde::Serialize for Foo<D> { fn serialize<__S>( &self, __serializer: __S, ) -> _serde::__private228::Result<__S::Ok, __S::Error> where __S: _serde::Serializer, { let mut __serde_state = _serde::Serializer::serialize_struct( __serializer, "Foo", false as usize + 1 + 1, )?; _serde::ser::SerializeStruct::serialize_field( &mut __serde_state, "id", &self.id, )?; _serde::ser::SerializeStruct::serialize_field( &mut __serde_state, "a", &self.a, )?; _serde::ser::SerializeStruct::end(__serde_state) } } ```

rustbot · 2025-10-17T20:59:37Z

r? @epage

rustbot has assigned @epage.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

weihanglo · 2025-10-18T22:52:48Z

Just used Claude Haiku 4.5 to generate a criterioin benchamrk script for me. The benchmark was run under opt-level=3 and debug=false.

Here is the result:

Message Type	master `d80156f`	`serde(flatten)`	Improvement
BuildFinished	173.91 ns	89.814 ns	48.5%
Artifact	1.4953 µs	1.3515 µs	9.6%

BTW, it is interesting that I can directly pull in local path dependencies in -Zscript.

Click to expand the AI-generated benchmark script (reviewed by human)

#!/usr/bin/env -S cargo +nightly -Zscript
---cargo
[package]
name = "bench_message_serialization"
edition = "2024"

[dependencies]
cargo = { path = "." }
cargo-util-schemas = { path = "crates/cargo-util-schemas" }
criterion = { version = "0.5" }
serde = { version = "1.0", features = ["derive"] }
serde_json = { version = "1.0", features = ["raw_value"] }

[profile.dev]
opt-level = 3
debug = false
---

//! # Machine Message Serialization Benchmark (Criterion-based)
//!
//! ## Quick Start
//! ```sh
//! cargo +nightly -Zscript bench_message_serialization.rs
//! ```
//!
//! ## Overview
//!
//! This benchmark uses criterion.rs for statistically rigorous benchmarking of
//! real message types from the cargo crate (`cargo::util::machine_message`):
//! - BuildFinished (simple)
//! - Artifact (complex)
//!
//! Compares:
//! - Original: Double serialization + string manipulation
//! - Optimized: Single serialization using `#[serde(flatten)]`
//!
//! ## Features
//!
//! - Multiple samples (100) for statistical reliability
//! - Automatic outlier detection and handling
//! - Confidence intervals and variance analysis
//! - Statistical significance testing

use cargo::core::Edition;
use cargo::core::Target;
use cargo::core::compiler::CrateType;
use cargo::util::machine_message::{Artifact, ArtifactDebuginfo, ArtifactProfile, BuildFinished};
use cargo_util_schemas::core::PackageIdSpec;
use criterion::{Criterion, black_box};
use serde::Serialize;
use std::path::PathBuf;
use std::time::Duration;

// ============================================================================
// Trait with Both Original and Optimized Methods
// ============================================================================

pub trait Message: Serialize {
    fn reason(&self) -> &str;

    fn to_json_string_original(&self) -> String {
        // Before: 2 serializations + string concatenation
        let json = serde_json::to_string(self).unwrap();
        assert!(json.starts_with("{\""));
        let reason = serde_json::json!(self.reason());
        format!("{{\"reason\":{},{}", reason, &json[1..])
    }

    fn to_json_string_optimized(&self) -> String {
        // After: 1 serialization using flatten
        #[derive(Serialize)]
        struct WithReason<'a, S: Serialize> {
            reason: &'a str,
            #[serde(flatten)]
            msg: &'a S,
        }
        let with_reason = WithReason {
            reason: self.reason(),
            msg: &self,
        };
        serde_json::to_string(&with_reason).unwrap()
    }
}

impl Message for BuildFinished {
    fn reason(&self) -> &str {
        "build-finished"
    }
}

impl<'a> Message for Artifact<'a> {
    fn reason(&self) -> &str {
        "compiler-artifact"
    }
}

// ============================================================================
// Test Data Builders
// ============================================================================

fn build_finished() -> BuildFinished {
    BuildFinished { success: true }
}

fn artifact<'a>() -> Artifact<'a> {
    let target = Box::leak(Box::new(Target::lib_target(
        "my_package",
        vec![CrateType::Rlib],
        PathBuf::from("/home/user/projects/my-package/src/lib.rs"),
        Edition::Edition2021,
    )));

    Artifact {
        package_id: PackageIdSpec::parse("my-package").unwrap(),
        manifest_path: PathBuf::from("/home/user/projects/my-package/Cargo.toml"),
        target,
        profile: ArtifactProfile {
            opt_level: "3",
            debuginfo: Some(ArtifactDebuginfo::Int(2)),
            debug_assertions: false,
            overflow_checks: false,
            test: false,
        },
        features: vec![
            "default".to_string(),
            "experimental".to_string(),
            "feature3".to_string(),
            "feature4".to_string(),
            "feature5".to_string(),
            "feature6".to_string(),
            "feature7".to_string(),
        ],
        filenames: vec![
            PathBuf::from("/home/user/projects/my-package/target/debug/deps/libmy_package.rlib"),
            PathBuf::from("/home/user/projects/my-package/target/debug/deps/libmy_package.d"),
        ],
        executable: None,
        fresh: false,
    }
}

// ============================================================================
// Benchmark Helper
// ============================================================================

fn benchmark_message<M: Message>(criterion: &mut Criterion, name: &str, msg: M) {
    let mut group = criterion.benchmark_group(name);
    group.sample_size(100);
    group.measurement_time(Duration::from_secs(5));
    group.warm_up_time(Duration::from_secs(3));

    group.bench_function("original", |b| {
        b.iter(|| black_box(&msg).to_json_string_original())
    });

    group.bench_function("optimized", |b| {
        b.iter(|| black_box(&msg).to_json_string_optimized())
    });

    group.finish();
}

// ============================================================================
// Main
// ============================================================================

fn main() {
    let mut criterion = Criterion::default()
        .measurement_time(Duration::from_secs(5))
        .sample_size(100)
        .warm_up_time(Duration::from_secs(3))
        .with_plots();

    eprintln!();
    eprintln!("╔══════════════════════════════════════════════════════════╗");
    eprintln!("║   Machine Message Serialization Benchmark (Criterion)    ║");
    eprintln!("║   Statistical Analysis • Real Cargo Types                ║");
    eprintln!("╚══════════════════════════════════════════════════════════╝");
    eprintln!();

    // Verify correctness before benchmarking
    {
        let bf = build_finished();
        let orig = bf.to_json_string_original();
        let opt = bf.to_json_string_optimized();
        assert_eq!(orig, opt, "BuildFinished outputs don't match!");

        let art = artifact();
        let orig = art.to_json_string_original();
        let opt = art.to_json_string_optimized();
        assert_eq!(orig, opt, "Artifact outputs don't match!");

        eprintln!("✓ Correctness check passed\n");
    }

    eprintln!("📊 Benchmarking BuildFinished (simple message)...");
    eprintln!();
    benchmark_message(&mut criterion, "BuildFinished", build_finished());
    eprintln!();

    eprintln!("📊 Benchmarking Artifact (complex message)...");
    eprintln!();
    benchmark_message(&mut criterion, "Artifact", artifact());
    eprintln!();

    criterion.final_summary();

    eprintln!();
    eprintln!("╔════════════════════════════════════════╗");
    eprintln!("║   Benchmark Complete                   ║");
    eprintln!("║   Results saved to: target/criterion/  ║");
    eprintln!("╚════════════════════════════════════════╝");
    eprintln!();
}

epage · 2025-10-20T16:35:02Z

From me tracing the code with cargo expand, basically there shouldn't have any significant performance difference between serde(flatten) and inlining all the fields. Here the differences between them

Deserialize is where differences are noticed but we're not really setup for that so it should be fine

epage · 2025-10-20T16:39:22Z

Here is the result:
Message Type master d80156f serde(flatten) Improvement
BuildFinished 173.91 ns 89.814 ns 48.5%
Artifact 1.4953 µs 1.3515 µs 9.6%

BTW, it is interesting that I can directly pull in local path dependencies in -Zscript.

BuildFinished is relatively small, making it more sensitive to change.

I'm surprised we noticed that big of a difference for Artifact. Our old version is doing 3 distinct allocations (to_string, json!, format!) vs just 1 (to_string). I'm surprised it makes that much of a difference. Likely the way to get the original version faster is to stream the content all into the same string which would be uglier.

Since this simplifies the code, I'm good with it. I do like having reason first though.

In case there is something unexpected, like key ordering, I looked up when this code was added (#6081) and it doesn't seem like we're losing anything with this change.

Update cargo submodule 7 commits in 367fd9f213750cd40317803dd0a5a3ce3f0c676d..344c4567c634a25837e3c3476aac08af84cf9203 2025-10-15 15:01:32 +0000 to 2025-10-21 21:29:43 +0000 - refactor: Centralize CONTEXT style (rust-lang/cargo#16135) - chore(triagebot): `A-json-output` for machine_message.rs (rust-lang/cargo#16133) - refactor: JSON message with less allocations (rust-lang/cargo#16130) - More warning conversions (rust-lang/cargo#16126) - fix(check): Fix suggested command for bin package (rust-lang/cargo#16127) - fix(script): Remove name sanitiztion outside what is strictly required (rust-lang/cargo#16120) - refactor: Centralize some more styling (rust-lang/cargo#16124) r? ghost

rustbot assigned epage Oct 17, 2025

rustbot added A-console-output Area: Terminal output, colors, progress bar, etc. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 17, 2025

weihanglo added Performance Gotta go fast! A-json-output Area: JSON message output and removed A-console-output Area: Terminal output, colors, progress bar, etc. labels Oct 20, 2025

weihanglo mentioned this pull request Oct 20, 2025

Tracking Issue for cargo-script RFC 3424 #12207

Open

39 tasks

epage approved these changes Oct 20, 2025

View reviewed changes

epage added this pull request to the merge queue Oct 20, 2025

Merged via the queue into rust-lang:master with commit 802826e Oct 20, 2025
25 checks passed

rustbot removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 20, 2025

weihanglo deleted the serialize branch October 20, 2025 19:04

weihanglo mentioned this pull request Oct 22, 2025

Update cargo submodule rust-lang/rust#147954

Merged

rustbot added this to the 1.92.0 milestone Oct 22, 2025

weihanglo mentioned this pull request Mar 11, 2026

Make build folder content (target/) deterministic #16693

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: JSON message with less allocations#16130

refactor: JSON message with less allocations#16130
epage merged 1 commit intorust-lang:masterfrom
weihanglo:serialize

weihanglo commented Oct 17, 2025 •

edited

Loading

Uh oh!

rustbot commented Oct 17, 2025

Uh oh!

weihanglo commented Oct 18, 2025

Uh oh!

epage commented Oct 20, 2025

Uh oh!

epage commented Oct 20, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

weihanglo commented Oct 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR try to resolve?

How to test and review this PR?

Uh oh!

rustbot commented Oct 17, 2025

Uh oh!

weihanglo commented Oct 18, 2025

Uh oh!

epage commented Oct 20, 2025

Uh oh!

epage commented Oct 20, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

weihanglo commented Oct 17, 2025 •

edited

Loading