
@alamb (Contributor) commented Jul 2, 2024

Which issue does this PR close?

Fixes #11210
Part of #11172 and #1813

Rationale for this change

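Hopefully this fixes the CI failure we were seeing in #11210 (the `datafusion-examples` build failing with "No space left on device") by building fewer example binaries, while also improving the documentation.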

The existing documentation already shows how to run SQL via `SessionContext` in several places:

  1. https://datafusion.apache.org/user-guide/example-usage.html#run-a-sql-query-against-data-stored-in-a-csv
  2. https://docs.rs/datafusion/latest/datafusion/index.html#sql

Thus we don't need yet another copy in the examples directory.

What changes are included in this PR?

  1. Add a SQL section in the library user guide with the basic examples (thanks @tshauck for starting this)
  2. Consolidate the `parquet_sql`, `avro_sql`, and `csv_sql` examples into the docs
  3. Make sure the examples run as part of the doctests (`cargo test --doc ...`)

Are these changes tested?

Yes, they are run via doctests

Are there any user-facing changes?

New docs

@alamb added the documentation label (Improvements or additions to documentation) Jul 2, 2024
@github-actions bot added the core label (Core DataFusion crate) and removed the documentation label Jul 2, 2024
```rust
    user_guid_example_tests
);

#[cfg(doctest)]
```
Contributor Author

This `#[cfg(doctest)]` block means the examples in the guide are compiled and run as part of the doctests:

```shell
$ cargo test --doc --features avro,json -- library_user_guide

...
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.16s
   Doc-tests datafusion

running 5 tests
test datafusion/core/src/lib.rs - library_user_guide_example_usage (line 764) ... ok
test datafusion/core/src/lib.rs - library_user_guide_example_usage (line 718) ... ok
test datafusion/core/src/lib.rs - library_user_guide_example_usage (line 673) ... ok
test datafusion/core/src/lib.rs - library_user_guide_example_usage (line 775) ... ok
test datafusion/core/src/lib.rs - library_user_guide_example_usage (line 641) ... ok
...
```

@alamb force-pushed the alamb/sql_user_guide branch from 07bd95b to f21b576 July 2, 2024 15:22
```diff
 - [`parse_sql_expr.rs`](examples/parse_sql_expr.rs): Parse SQL text into Datafusion `Expr`.
 - [`plan_to_sql.rs`](examples/plan_to_sql.rs): Generate SQL from Datafusion `Expr` and `LogicalPlan`
-- [`pruning.rs`](examples/parquet_sql.rs): Use pruning to rule out files based on statistics
+- [`pruning.rs`](examples/pruning.rs): Use pruning to rule out files based on statistics
```
Contributor Author

Drive-by cleanup: the `pruning.rs` entry linked to `examples/parquet_sql.rs` instead of `examples/pruning.rs`.

```markdown
## Single Process

- [`advanced_udaf.rs`](examples/advanced_udaf.rs): Define and invoke a more complicated User Defined Aggregate Function (UDAF)
- [`advanced_udf.rs`](examples/advanced_udf.rs): Define and invoke a more complicated User Defined Scalar Function (UDF)
```
Contributor Author

The removed examples were just inlined into the guide

```markdown
with the ListingTableProvider which takes a list of file paths and reads them
as a single table, matching schemas as appropriate

Coming Soon
```
Contributor Author

This would be cool to do, but this PR is already large enough.

@alamb marked this pull request as ready for review July 2, 2024 15:46
@jonahgao (Member) left a comment

Looks good to me!

@alamb (Contributor, Author) commented Jul 3, 2024

Thank you for the review @jonahgao

@alamb merged commit 03848c5 into apache:main Jul 3, 2024
comphead pushed a commit to comphead/arrow-datafusion that referenced this pull request Jul 8, 2024
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024

Labels

core Core DataFusion crate

Development

Successfully merging this pull request may close these issues.

datafusion-examples CI run is failing: final link failed: No space left on device

2 participants