Skip to content

GH-38916: [R] Simplify dataset and table print output#38917

Merged
thisisnic merged 9 commits intoapache:mainfrom
thisisnic:GH-38916_print_output
Mar 13, 2024
Merged

GH-38916: [R] Simplify dataset and table print output#38917
thisisnic merged 9 commits intoapache:mainfrom
thisisnic:GH-38916_print_output

Conversation

@thisisnic
Copy link
Copy Markdown
Member

@thisisnic thisisnic commented Nov 28, 2023

Rationale for this change

When printing objects with data with lots of rows, the output is long and unwieldy.

What changes are included in this PR?

  • Truncates long schema print output and adds the number of columns to dataset print output.
  • Add number of columns to output so it's clear how many there are in total

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

Before:

library(arrow)
x <- tibble::tibble(!!!letters, .rows = 5)
InMemoryDataset$create(x)
#> InMemoryDataset
#> "a": string
#> "b": string
#> "c": string
#> "d": string
#> "e": string
#> "f": string
#> "g": string
#> "h": string
#> "i": string
#> "j": string
#> "k": string
#> "l": string
#> "m": string
#> "n": string
#> "o": string
#> "p": string
#> "q": string
#> "r": string
#> "s": string
#> "t": string
#> "u": string
#> "v": string
#> "w": string
#> "x": string
#> "y": string
#> "z": string
arrow_table(x)
#> Table
#> 5 rows x 26 columns
#> $"a" <string>
#> $"b" <string>
#> $"c" <string>
#> $"d" <string>
#> $"e" <string>
#> $"f" <string>
#> $"g" <string>
#> $"h" <string>
#> $"i" <string>
#> $"j" <string>
#> $"k" <string>
#> $"l" <string>
#> $"m" <string>
#> $"n" <string>
#> $"o" <string>
#> $"p" <string>
#> $"q" <string>
#> $"r" <string>
#> $"s" <string>
#> $"t" <string>
#> $"u" <string>
#> $"v" <string>
#> $"w" <string>
#> $"x" <string>
#> $"y" <string>
#> $"z" <string>
record_batch(x)
#> RecordBatch
#> 5 rows x 26 columns
#> $"a" <string>
#> $"b" <string>
#> $"c" <string>
#> $"d" <string>
#> $"e" <string>
#> $"f" <string>
#> $"g" <string>
#> $"h" <string>
#> $"i" <string>
#> $"j" <string>
#> $"k" <string>
#> $"l" <string>
#> $"m" <string>
#> $"n" <string>
#> $"o" <string>
#> $"p" <string>
#> $"q" <string>
#> $"r" <string>
#> $"s" <string>
#> $"t" <string>
#> $"u" <string>
#> $"v" <string>
#> $"w" <string>
#> $"x" <string>
#> $"y" <string>
#> $"z" <string>

After:

library(arrow)

x <- tibble::tibble(!!!letters, .rows = 5)
InMemoryDataset$create(x)
#> InMemoryDataset
#> 26 columns 
#> "a": string
#> "b": string
#> "c": string
#> "d": string
#> "e": string
#> "f": string
#> "g": string
#> "h": string
#> "i": string
#> "j": string
#> "k": string
#> "l": string
#> "m": string
#> "n": string
#> "o": string
#> "p": string
#> "q": string
#> "r": string
#> "s": string
#> "t": string
#> ...
#> Use `schema()` to see entire schema
arrow_table(x)
#> Table
#> 5 rows x 26 columns
#> $"a" <string>
#> $"b" <string>
#> $"c" <string>
#> $"d" <string>
#> $"e" <string>
#> $"f" <string>
#> $"g" <string>
#> $"h" <string>
#> $"i" <string>
#> $"j" <string>
#> $"k" <string>
#> $"l" <string>
#> $"m" <string>
#> $"n" <string>
#> $"o" <string>
#> $"p" <string>
#> $"q" <string>
#> $"r" <string>
#> $"s" <string>
#> $"t" <string>
#> ...
#> Use `schema()` to see entire schema
record_batch(x)
#> RecordBatch
#> 5 rows x 26 columns
#> $"a" <string>
#> $"b" <string>
#> $"c" <string>
#> $"d" <string>
#> $"e" <string>
#> $"f" <string>
#> $"g" <string>
#> $"h" <string>
#> $"i" <string>
#> $"j" <string>
#> $"k" <string>
#> $"l" <string>
#> $"m" <string>
#> $"n" <string>
#> $"o" <string>
#> $"p" <string>
#> $"q" <string>
#> $"r" <string>
#> $"s" <string>
#> $"t" <string>
#> ...
#> Use `schema()` to see entire schema

@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #38916 has been automatically assigned in GitHub to PR creator.

@paleolimbot
Copy link
Copy Markdown
Member

I see this still has some tests failing...just making sure you weren't waiting on any of us for a review!

@thisisnic
Copy link
Copy Markdown
Member Author

I see this still has some tests failing...just making sure you weren't waiting on any of us for a review!

Nah, life has just been busy, but thanks for checking in on it. I'll ping you once I get time to take a look at it again!

@thisisnic
Copy link
Copy Markdown
Member Author

@paleolimbot This is now ready for review; mind giving this a look over when you get the chance?

r/R/schema.R Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know how many fields we're not showing here? Rather than ..., giving a count of hidden fields would be a nice improvement. I'm thinking about how pillar does it:

> starwars
# A tibble: 87 × 14
   name          height  mass hair_color skin_color eye_color birth_year sex   gender homeworld species films vehicles
   <chr>          <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>  <chr>     <chr>   <lis> <list>  
 1 Luke Skywalk…    172    77 blond      fair       blue            19   male  mascu… Tatooine  Human   <chr> <chr>   
 2 C-3PO            167    75 NA         gold       yellow         112   none  mascu… Tatooine  Droid   <chr> <chr>   
 3 R2-D2             96    32 NA         white, bl… red             33   none  mascu… Naboo     Droid   <chr> <chr>   
 4 Darth Vader      202   136 none       white      yellow          41.9 male  mascu… Tatooine  Human   <chr> <chr>   
 5 Leia Organa      150    49 brown      light      brown           19   fema… femin… Alderaan  Human   <chr> <chr>   
 6 Owen Lars        178   120 brown, gr… light      blue            52   male  mascu… Tatooine  Human   <chr> <chr>   
 7 Beru Whitesu…    165    75 brown      light      blue            47   fema… femin… Tatooine  Human   <chr> <chr>   
 8 R5-D4             97    32 NA         white, red red             NA   none  mascu… Tatooine  Droid   <chr> <chr>   
 9 Biggs Darkli…    183    84 black      light      brown           24   male  mascu… Tatooine  Human   <chr> <chr>   
10 Obi-Wan Keno…    182    77 auburn, w… fair       blue-gray       57   male  mascu… Stewjon   Human   <chr> <chr>   
# ℹ 77 more rows
# ℹ 1 more variable: starships <list>
# ℹ Use `print(n = ...)` to see more rows

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@amoeba
Copy link
Copy Markdown
Member

amoeba commented Mar 8, 2024

Thanks @thisisnic. I see this as a nice improvement and don't see any issues with it. I left a few notes.

Copy link
Copy Markdown
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, I have a few minor comments, but nothing major.

I'm not at all opposed to merging this since it's an improvement, but I think #32110 will be the big win here for most end users.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting changes Awaiting changes labels Mar 9, 2024
@github-actions github-actions bot added Component: Documentation awaiting changes Awaiting changes and removed awaiting merge Awaiting merge labels Mar 10, 2024
@thisisnic
Copy link
Copy Markdown
Member Author

Thanks @jonkeane, I agree we should take a look at #32110 soon, but merge this as an interim thing in the meantime.

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Mar 10, 2024
@thisisnic thisisnic merged commit ac1708c into apache:main Mar 13, 2024
@conbench-apache-arrow
Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit ac1708c.

There was 1 benchmark result with an error:

There was 1 benchmark result indicating a performance regression:

The full Conbench report has more details. It also includes information about 19 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[R] Simplify dataset and table print output

4 participants