Skip to content

Conversation

@jecsand838
Copy link
Contributor

@jecsand838 jecsand838 commented Sep 22, 2025

Which issue does this PR close?

Rationale for this change

@alamb had some recommendations for improving the arrow-avro documentation in #8316. This is a follow-up to address those suggestions.

What changes are included in this PR?

  1. lib.rs documentation
  2. reader/mod.rs improved documentation and inlined examples
  3. writer/mod.rs improved documentation and inlined examples

NOTE: Some doc tests are temporarily ignored until #8371 is merged in.

Are these changes tested?

Yes, doc tests have been included which all run (with the exception of 3 ignored ones that will work soon)

Screenshot 2025-09-22 at 3 36 02 AM Screenshot 2025-09-22 at 3 36 19 AM Screenshot 2025-09-22 at 3 36 34 AM

Are there any user-facing changes?

N/A

@github-actions github-actions bot added arrow Changes to the arrow crate arrow-avro arrow-avro crate labels Sep 22, 2025
@jecsand838 jecsand838 force-pushed the avro-documentation-follow-up branch 2 times, most recently from 7aa5764 to e5cd3d8 Compare September 22, 2025 08:50
@jecsand838 jecsand838 force-pushed the avro-documentation-follow-up branch from e5cd3d8 to 93b6e0a Compare September 22, 2025 08:52
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jecsand838 -- this is looking great. I had a few suggestions, but otherwise it looks like a great improvement to me

//! 3) Decode it back to Arrow using a `Decoder` configured with a `SchemaStore` that
//! maps the schema ID to the Avro schema used by the writer.
//!
//! ```ignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we enable this one now that we have merged

//! This example registers the writer schema (computing a Rabin fingerprint), writes a
//! single‑row Avro body, constructs the SOE frame, and decodes it back to Arrow.
//!
//! ```ignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likewise here, this should now work, right and we can use the Avro writer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, and I just pushed up those changes.

//! while (n & !0x7F) != 0 { out.push(((n as u8) & 0x7F) | 0x80); n >>= 7; }
//! out.push(n as u8);
//! }
//! fn enc_len(l: usize, out: &mut Vec<u8>) { enc_long(l as i64, out); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example doesn't' actually use an AvroStreamWriter but the text says it does.

Can we update it to use the writer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good callout! I cleaned this up.

//! {"name":"id","type":"long"}]}"#.to_string());
//! let fp = store.register(avro_schema)?;
//!
//! // Minimal Avro long encoder (zig-zag + varint).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use the Avro writer here too? If the idea is to show how to use the decoder, maybe we could just hide the code that creates the input body (aka prefix all the lines related to creating the file with # )

The same thing applies to the examples below too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely. I made sure to hide the input body code as well.

//! - **Confluent wire format**: magic `0x00` + **big‑endian** 4‑byte schema ID and Avro body.
//! <https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/index.html#wire-format>
//!
//! ## Quickstart: Write an OCF in memory (runnable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally suggest moving this example to the AvroWriter struct itself as I think it will be easier to find

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the docs before pushing and agree with you. 100% good callout!

//! use std::sync::Arc;
//! use arrow_array::{ArrayRef, Int64Array, RecordBatch};
//! use arrow_schema::{DataType, Field, Schema};
//! use arrow_avro::writer::AvroStreamWriter;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

likewise, i suggest moving this to the docs on AvroStreamWriter as I think it will be easier to find

//! If you plan to interoperate with a schema registry, add the appropriate prefix yourself
//! (see links above).
//!
//! ```ignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can run too?

jecsand838 and others added 2 commits September 22, 2025 14:52
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@jecsand838 jecsand838 force-pushed the avro-documentation-follow-up branch from 034b6e7 to 2cc0264 Compare September 22, 2025 20:42
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@jecsand838 jecsand838 force-pushed the avro-documentation-follow-up branch from 2cc0264 to 9fe5fe4 Compare September 22, 2025 20:51
@jecsand838 jecsand838 force-pushed the avro-documentation-follow-up branch from f8a11b4 to 76827d7 Compare September 22, 2025 21:49
@jecsand838
Copy link
Contributor Author

Thank you @jecsand838 -- this is looking great. I had a few suggestions, but otherwise it looks like a great improvement to me

@alamb Really appreciate the solid review. I went ahead and pushed up changes that address your comments. Let me know what you think when you get a chance!

@jecsand838 jecsand838 requested a review from alamb September 22, 2025 21:52
@alamb
Copy link
Contributor

alamb commented Sep 23, 2025

CI failure unrelated to this PR

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me -- thanks @jecsand838

@alamb alamb merged commit 3027dbc into apache:main Sep 23, 2025
22 of 23 checks passed
@jecsand838 jecsand838 deleted the avro-documentation-follow-up branch September 23, 2025 18:38
alamb added a commit that referenced this pull request Nov 6, 2025
# Which issue does this PR close?

Follow on to 
- #8402 

# Rationale for this change

Let's tell people about arrow-avro

# What changes are included in this PR?

Touchup the main crate docs to mention arrow-avro

# Are these changes tested?

CI and I tested the docs manually

# Are there any user-facing changes?

Docs only

cc @jecsand838
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate arrow-avro arrow-avro crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants