docs: addition of interaction/visualization/reporting tutorial #3159

Merged: johanneskoester merged 5 commits into main from docs/tutorial/intvizrep on Oct 20, 2024.

johanneskoester commented on Oct 20, 2024

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

  • New Features

    • Introduced a comprehensive tutorial on interaction, visualization, and reporting with Snakemake.
    • Added new Jupyter notebooks for data manipulation and visualization using Polars and Altair.
    • Created YAML configuration files for managing environment dependencies for data processing and visualization.
  • Documentation

    • Updated the documentation to include new resources in the "Getting Started" section.
    • Enhanced existing documentation files to reflect new visualizations and workflows.
  • Bug Fixes

    • Improved output file handling in Jupyter notebook execution commands.

coderabbitai bot commented on Oct 20, 2024

Walkthrough

This pull request introduces a new tutorial focused on interaction, visualization, and reporting in Snakemake, along with a new table of contents entry for it. The tutorial covers setting up a Conda environment, obtaining data, and creating visualizations using R and Python. Additionally, several new YAML configuration files for environment management and multiple Jupyter notebooks for data manipulation and visualization are added. Minor adjustments to existing files enhance documentation and improve output handling in the Jupyter notebook execution process.

Changes

  • docs/index.rst: Added a new ToC entry: tutorial/interaction_visualization_reporting/tutorial.
  • docs/tutorial/interaction_visualization_reporting/tutorial.rst: Introduced a comprehensive tutorial on Snakemake for interaction, visualization, and reporting.
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/Snakefile: Added multiple rules: all, get_data, plot_with_r, plot_with_python, view_with_datavzrd.
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/envs/download.yaml: Created a new conda environment configuration for data processing.
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/envs/pystats.yaml: Created a new conda environment configuration for Python visualization tools.
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/envs/rstats.yaml: Created a new conda environment configuration for R visualization tools.
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/notebooks/get_data.py.ipynb: Created a new Jupyter notebook for data manipulation using Polars and Vega datasets.
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/notebooks/plot_horsepower_vs_mpg.py.ipynb: Created a new Jupyter notebook for visualizing horsepower vs. MPG using Altair.
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/notebooks/plot_horsepower_vs_mpg.r.ipynb: Created a new Jupyter notebook for visualizing horsepower vs. MPG using ggplot2.
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/report/cars.rst: Added a reference to the cars dataset from the Vega project.
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/report/horsepower_vs_mpg.rst: Added a description of the horsepower vs. MPG visualization.
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/report/workflow.rst: Added a description of Snakemake's capabilities for last-mile data analysis.
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/resources/datavzrd/cars.yaml: Created a new YAML configuration for visualizing the cars dataset.
  • docs/tutorial/tutorial.rst: Updated the title from "Snakemake Tutorial" to "Tutorial: General use".
  • snakemake/notebook.py: Modified the execute_script method for better output file handling.
  • snakemake/report/html_reporter/template/components/menu.js: Changed the menu item text from "Result" to "Results".

coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (23)
docs/tutorial/interaction_visualization_reporting/workdir/workflow/envs/download.yaml (1)

4-9: LGTM: Dependencies are well-defined. Consider adding comments for clarity.

The dependencies list is comprehensive and uses specific versions, which is excellent for reproducibility. However, it might be helpful to add brief comments explaining the purpose of each dependency.

Consider adding comments like this:

dependencies:
  - python =3.11  # Base Python interpreter
  - polars =1.1  # Fast DataFrame library for data manipulation
  - vega_datasets =0.9  # Example datasets for visualization
  - ipykernel =6.29  # IPython kernel for Jupyter
  - notebook =7.2  # Jupyter notebook
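Because the value of these files rests on every dependency being pinned, that property can be checked mechanically. The following is a minimal sketch that assumes environment files use the `- name =version` style shown above. `unpinned_dependencies` is a hypothetical helper, not part of Snakemake or conda. It relies on line-based regex matching rather than a real YAML parser, so it only recognizes operator-style pins such as `=`, `>=`, or `==`.

```python
import re

# Matches dependency lines such as "  - polars =1.1  # comment".
# Simplified, line-based parsing for illustration; not a full YAML parser.
DEP_LINE = re.compile(r"^\s*-\s*([A-Za-z0-9_.\-]+)\s*(?:([=<>!]=?)\s*(\S+))?\s*(?:#.*)?$")


def unpinned_dependencies(env_text: str) -> list:
    """Return the names of dependencies listed without a version constraint."""
    unpinned = []
    in_deps = False
    for line in env_text.splitlines():
        if line.strip().startswith("dependencies:"):
            in_deps = True
            continue
        if in_deps and line and not line.startswith((" ", "-")):
            in_deps = False  # a new top-level key ends the dependency list
        if not in_deps:
            continue
        match = DEP_LINE.match(line)
        if match and match.group(3) is None:
            unpinned.append(match.group(1))
    return unpinned


env = "channels:\n  - conda-forge\n  - nodefaults\ndependencies:\n  - python =3.11\n  - pip\n"
print(unpinned_dependencies(env))
```

For an environment like download.yaml, where every entry carries a `=version` pin, the helper would return an empty list.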
🧰 Tools
🪛 yamllint

[error] 9-9: no new line character at the end of file

(new-line-at-end-of-file)

docs/tutorial/interaction_visualization_reporting/workdir/workflow/envs/rstats.yaml (2)

4-10: LGTM: Well-defined dependencies with a minor suggestion.

The dependencies list is well-structured and appropriate for an R-based data analysis environment:

  • Specific versions are provided for each package, ensuring reproducibility.
  • The inclusion of r-irkernel allows R to be used in Jupyter notebooks.

However, there's a minor inconsistency:

The notebook package version (7.2) seems to be higher than the latest stable version (7.0.7). Consider updating it to:

-  - notebook =7.2
+  - notebook =7.0.7

This ensures you're using a stable, released version of the Jupyter notebook package.

🧰 Tools
🪛 yamllint

[error] 10-10: no new line character at the end of file

(new-line-at-end-of-file)


10-10: Add a newline at the end of the file.

To adhere to common best practices and prevent potential issues with certain tools, please add a newline character at the end of the file.

You can do this by adding an empty line at the end of the file:

   - r-irkernel =1.3
+
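Several files in this PR are flagged by the same yamllint rule, so a small script can apply the fix to all of them. The following sketch is a hypothetical helper, not part of yamllint or Snakemake. It appends a final newline only when one is missing. Note that `read_text`/`write_text` use Python's default newline translation.

```python
from pathlib import Path


def with_trailing_newline(text: str) -> str:
    """Return text with a final newline appended if it does not already end with one."""
    return text if text == "" or text.endswith("\n") else text + "\n"


def fix_file(path: Path) -> bool:
    """Rewrite path in place if it lacks a final newline; return True if it changed."""
    original = path.read_text(encoding="utf-8")
    fixed = with_trailing_newline(original)
    if fixed != original:
        path.write_text(fixed, encoding="utf-8")
        return True
    return False
```

Running `fix_file` over each flagged file (e.g. `envs/rstats.yaml`) and then re-running yamllint should clear the `new-line-at-end-of-file` errors.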

docs/tutorial/interaction_visualization_reporting/workdir/workflow/envs/pystats.yaml (1)

4-13: LGTM: Dependencies are well-chosen and version-constrained.

The dependencies list is comprehensive and appropriate for a data analysis and visualization environment:

  • Python 3.11 is specified, which is a recent version.
  • Modern libraries for data processing (polars) and visualization (altair, vegafusion) are included.
  • Jupyter-related packages (notebook, ipykernel) are present, supporting interactive reporting.
  • Version constraints are used, which is excellent for reproducibility.

Consider adding a comment at the top of the file explaining the purpose of this environment, e.g.:

# Environment for Python-based data analysis and visualization in the Snakemake tutorial

This would provide quick context for users or contributors examining the file.

🧰 Tools
🪛 yamllint

[error] 13-13: no new line character at the end of file

(new-line-at-end-of-file)

docs/tutorial/interaction_visualization_reporting/workdir/workflow/notebooks/get_data.py.ipynb (2)

9-19: LGTM! Consider adding a brief comment explaining the data transformation.

The code is well-structured and uses appropriate libraries for data manipulation. The transformation of the "Year" column and the column name modifications improve the data's usability and readability.

Consider adding a brief comment explaining the purpose of the data transformation, e.g.:

# Transform the cars dataset: extract year, lowercase column names, and replace underscores with spaces
cars = pl.from_pandas(data.cars()).with_columns(
    pl.col("Year").dt.year()
).select(
    pl.col("*").name.map(lambda name: name.lower().replace("_", " "))
)
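The effect of this transformation on each column can be checked without Polars. The following plain-Python sketch makes a few assumptions. The column list reflects what we expect `data.cars()` to return. `normalize_column_name` is a hypothetical helper that mirrors the lambda above. Because `.dt.year()` is applied to it, the Year column is evidently a date/datetime, and the transformation keeps only the calendar year.

```python
from datetime import datetime


def normalize_column_name(name: str) -> str:
    # Same mapping as the lambda passed to .name.map() above.
    return name.lower().replace("_", " ")


# Column names as we assume vega_datasets' data.cars() returns them.
columns = ["Name", "Miles_per_Gallon", "Cylinders", "Displacement",
           "Horsepower", "Weight_in_lbs", "Acceleration", "Year", "Origin"]
print([normalize_column_name(c) for c in columns])

# pl.col("Year").dt.year() reduces each timestamp to its integer year.
print(datetime(1970, 1, 1).year)
```

The renamed columns (e.g. "miles per gallon") are the names the downstream plotting notebooks refer to.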

27-29: LGTM! Consider adding error handling.

The code correctly writes the DataFrame to a CSV file using Snakemake's output specification. The use of a tab separator is a good choice.

Consider adding error handling to make the code more robust:

try:
    cars.write_csv(snakemake.output[0], separator="\t")
    print(f"Data successfully written to {snakemake.output[0]}")
except Exception as e:
    print(f"Error writing data: {e}")
    raise
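The tab separator acts as a contract between the rule that writes the table and the notebooks that read it. The writer and every reader must agree on it, and column names containing spaces survive unchanged. The following stdlib sketch uses made-up rows and only stands in for the Polars and readr calls.

```python
import csv
import io

# Hypothetical rows; the real data comes from the cars dataset.
rows = [{"name": "example car", "miles per gallon": "18.0", "horsepower": "130.0"}]

# Writer side (analogous to cars.write_csv(..., separator="\t") in get_data).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), delimiter="\t")
writer.writeheader()
writer.writerows(rows)

# Reader side (analogous to pl.read_csv(..., separator="\t") or readr::read_tsv()).
buffer.seek(0)
read_back = list(csv.DictReader(buffer, delimiter="\t"))
print(read_back[0]["miles per gallon"])  # column names containing spaces survive intact
```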
docs/tutorial/interaction_visualization_reporting/workdir/workflow/resources/datavzrd/cars.yaml (2)

8-58: Comprehensive view configuration with room for enhancement.

The view configuration for the 'cars' dataset is well-structured and provides a rich set of visualizations for different car attributes. Great job on using appropriate scales and plot types for various data types.

Some notable features:

  1. The Wikipedia link for the 'name' column adds valuable context.
  2. The use of heatmaps for 'cylinders' and 'origin' provides an intuitive visual representation.
  3. The 'display-mode: detail' for 'displacement' and 'horsepower' allows for more in-depth exploration.

Consider the following enhancements:

  1. Add tooltips to explain what 'display-mode: detail' means for users.
  2. Consider grouping related attributes (e.g., 'miles per gallon', 'horsepower', 'acceleration') for easier comparison.
  3. You might want to add a date format for the 'year' column if it's not already handled by the framework.

Would you like assistance in implementing any of these suggestions?

🧰 Tools
🪛 yamllint

[error] 58-58: no new line character at the end of file

(new-line-at-end-of-file)


58-58: Add a newline character at the end of the file.

The YAML linter has flagged that there's no newline character at the end of the file. While this doesn't affect the functionality, it's good practice to end files with a newline character.

Please add a newline character at the end of the file to resolve this linter warning and improve compatibility with various tools.

docs/tutorial/interaction_visualization_reporting/workdir/workflow/notebooks/plot_horsepower_vs_mpg.r.ipynb (2)

32-45: Consider removing or commenting out the data output cell

This cell outputs the contents of the 'cars' data frame, which is useful for development and debugging. However, in a final version of the notebook or tutorial, this cell might not be necessary unless it serves a specific educational purpose.

Consider either removing this cell or adding a comment explaining its purpose in the context of the tutorial. If it's meant to show learners how to inspect data, you could add a markdown cell above it explaining this step.


46-61: LGTM with suggestion: Consider adding labels and title to the plot

The code effectively creates an SVG plot using ggplot2, showing the relationship between 'miles per gallon' and 'horsepower'. The use of snakemake@output[[1]] for the output file path maintains consistency with the Snakemake workflow integration.

To enhance the plot's informativeness, consider adding axis labels and a title. You can modify the ggplot code as follows:

ggplot(cars, aes(`miles per gallon`, horsepower)) + 
  geom_point() + 
  theme_classic(16) +
  labs(x = "Miles per Gallon", y = "Horsepower", title = "Horsepower vs. Miles per Gallon")

This addition will make the plot more self-explanatory and professional-looking.

docs/tutorial/interaction_visualization_reporting/workdir/workflow/notebooks/plot_horsepower_vs_mpg.py.ipynb (3)

15-25: Consider adding error handling for file reading.

The cell correctly uses Polars to read the CSV file, which is efficient. The integration with Snakemake using snakemake.input[0] is appropriate. However, consider adding error handling to manage potential issues when reading the file.

Here's a suggestion for adding basic error handling:

try:
    data = pl.read_csv(snakemake.input[0], separator="\t")
except Exception as e:
    print(f"Error reading the file: {e}")
    raise
data  # a bare expression inside try would not be displayed; keep it as the cell's last line

26-40: LGTM: Well-structured visualization. Consider adding a title and axis labels.

The cell creates an appropriate interactive scatter plot using Altair, effectively visualizing the relationship between miles per gallon and horsepower, with color encoding for origin. The interactivity enhances user experience.

To improve clarity, consider adding a title and explicit axis labels:

chart = alt.Chart(data).mark_point(tooltip=True).encode(
    alt.X("miles per gallon", title="Miles per Gallon"),
    alt.Y("horsepower", title="Horsepower"),
    alt.Color("origin").scale(scheme="accent"),
).properties(
    title="Horsepower vs Miles per Gallon by Origin"
).interactive()
chart

41-49: Consider adding error handling and save confirmation.

The cell correctly uses snakemake.output[0] to save the chart, integrating well with the Snakemake workflow. However, consider adding error handling and providing feedback on successful save.

Here's a suggestion for adding error handling and save confirmation:

try:
    chart.save(snakemake.output[0])
    print(f"Chart successfully saved to {snakemake.output[0]}")
except Exception as e:
    print(f"Error saving the chart: {e}")
    raise
docs/tutorial/tutorial.rst (2)

3-5: Approve title change with a minor suggestion.

The title change from "Snakemake Tutorial" to "Tutorial: General use" is appropriate and aligns well with the PR objective of adding new tutorials. This change effectively positions this document as part of a broader set of tutorials.

Consider capitalizing "General" for consistency with title case:

-Tutorial: General use
+Tutorial: General Use

Line range hint 1-62: Approve overall changes with a minor suggestion for clarity.

The changes to the title are appropriate, and the rest of the document remains relevant and accurate. The content's general nature supports the new, more generic title "Tutorial: General use".

To improve clarity and align with the new title, consider adding a brief introductory sentence after the title to explain that this is a general-use tutorial, and that other specific tutorials (like the new interaction visualization and reporting tutorial) are available elsewhere in the documentation. This will help users understand the structure of the tutorials better.

For example, you could add:

This tutorial provides a general introduction to Snakemake. For more specific tutorials, such as interaction visualization and reporting, please refer to the respective sections in the documentation.
docs/index.rst (1)

141-141: LGTM! Consider adding a descriptive title for the new tutorial entry.

The addition of the new tutorial entry "tutorial/interaction_visualization_reporting/tutorial" to the "Getting started" toctree is appropriate and aligns with the PR objectives. The placement and indentation are correct, maintaining consistency with existing entries.

To improve clarity for users browsing the documentation, consider adding a descriptive title for the new tutorial entry. For example:

-   tutorial/interaction_visualization_reporting/tutorial
+   Interaction, Visualization, and Reporting Tutorial <tutorial/interaction_visualization_reporting/tutorial>

This change would make the purpose of the tutorial more immediately clear in the table of contents.

snakemake/notebook.py (1)

68-72: Improved output handling, but consider further enhancements.

The changes improve the robustness of output handling by ensuring a valid output path is always provided and using absolute paths. However, there are a couple of suggestions for further improvement:

  1. Consider using a unique identifier for the temporary notebook file to prevent potential conflicts when executing multiple notebooks simultaneously. For example:

    import uuid
    temp_notebook = f"{uuid.uuid4()}.ipynb"
    output_parameter = f"--output '{tmp}/{temp_notebook}'"
  2. For consistency, consider using f-strings throughout this method. For example, change line 72 to:

    output_parameter = f"--output {fname_out!r}"
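Two details behind these suggestions can be made concrete. The following is a sketch under stated assumptions, not Snakemake's actual `execute_script` code. `build_output_parameter` is a hypothetical helper, and it assumes the resulting command string is passed to a POSIX shell. It replaces the bare `uuid` name with `tempfile.NamedTemporaryFile(delete=False)`, which both generates and reserves a unique file. It also uses `shlex.quote` instead of `!r`, because `repr()` emits Python string syntax, which a shell does not always parse the same way.

```python
import shlex
import tempfile
from pathlib import Path
from typing import Optional


def build_output_parameter(fname_out: Optional[str], tmpdir: str) -> str:
    """Hypothetical helper: build a shell-safe '--output' argument for a notebook run."""
    if fname_out is None:
        # Reserve a unique file name so concurrent notebook runs cannot collide.
        with tempfile.NamedTemporaryFile(dir=tmpdir, suffix=".ipynb", delete=False) as handle:
            fname_out = handle.name
    # shlex.quote applies POSIX shell quoting; repr() (the !r conversion) applies Python quoting.
    return f"--output {shlex.quote(str(Path(fname_out).absolute()))}"


tricky = "/tmp/it's $HOME.ipynb"
print(f"--output {tricky!r}")  # double-quoted, so a shell would still expand $HOME
print(build_output_parameter(tricky, "/tmp"))
```

With `!r`, a path containing a single quote falls back to double quotes, so `$HOME` would be expanded by the shell. The `shlex.quote` version round-trips through `shlex.split` unchanged.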
docs/tutorial/interaction_visualization_reporting/tutorial.rst (5)

32-104: LGTM: Clear instructions for data acquisition step

The instructions for creating the Snakemake rule, setting up the Conda environment, and using the Jupyter notebook integration are clear and well-explained. The introduction of the --edit-notebook feature is particularly valuable for interactive development.

Consider adding a brief explanation of why the Year column is converted to an integer. This would help users understand the data preprocessing step better.


164-228: LGTM: Clear instructions for creating an interactive plot with Python

This section effectively demonstrates how to integrate Python and Altair into the Snakemake workflow. The instructions are clear, and the explanation of the interactive features adds significant value to the tutorial.

Consider adding a brief comparison between the R and Python approaches, highlighting the strengths of each (e.g., ggplot2's static plots vs. Altair's interactive features). This could help users choose the most appropriate tool for their needs.


229-330: LGTM: Comprehensive instructions for creating interactive table views

This section effectively introduces Datavzrd and demonstrates its integration into the Snakemake workflow using a wrapper. The explanation of the Datavzrd configuration options is comprehensive and valuable.

Consider adding a brief explanation or link to more information about the YTE template engine used in the Datavzrd configuration. This would help users understand this aspect of the configuration better.


358-454: LGTM: Comprehensive instructions for generating reports

This section effectively explains the importance of reporting and provides clear instructions for annotating rules, creating caption files, and generating reports. The process is well-explained and emphasizes the value of connecting code and results for transparency.

Consider adding a brief explanation of how users can customize the report further, such as adding custom CSS or JavaScript. This would provide advanced users with information on how to tailor the report to their specific needs.


1-454: Excellent tutorial on Snakemake's interaction, visualization, and reporting capabilities

This tutorial provides a comprehensive and well-structured guide to using Snakemake for interactive data analysis, visualization, and reporting. It effectively demonstrates the integration of various tools (R, Python, Datavzrd) within the Snakemake workflow, showcasing the flexibility and power of the framework.

The progression from data acquisition to final reporting is logical and easy to follow. The use of Jupyter notebooks and interactive features is well-explained, providing users with valuable insights into modern data analysis workflows.

To further enhance the tutorial:

  1. Consider adding a troubleshooting section or FAQ to address common issues users might encounter.
  2. Include information on how to extend this workflow, such as adding more data sources or visualization types.
  3. Provide links to more advanced Snakemake features that users might want to explore after completing this tutorial.
docs/tutorial/interaction_visualization_reporting/workdir/workflow/Snakefile (1)

29-29: Duplicate captions in plot_with_r and plot_with_python rules

The caption parameter is set to "report/horsepower_vs_mpg.rst" in both the plot_with_r and plot_with_python rules. If the intention is to use the same caption for both plots, this is acceptable. However, if different captions are desired to distinguish between the R and Python visualizations, consider updating one of them.

To provide distinct captions, you might adjust the caption in the plot_with_python rule:

     caption="report/horsepower_vs_mpg.rst",
+    # Consider changing to a different caption file
+    # caption="report/horsepower_vs_mpg_python.rst",

Also applies to: 47-47

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between e10feef and ebad3aa.

⛔ Files ignored due to path filters (1)
  • CHANGELOG.md is excluded by !CHANGELOG.md
📒 Files selected for processing (16)
  • docs/index.rst (1 hunks)
  • docs/tutorial/interaction_visualization_reporting/tutorial.rst (1 hunks)
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/Snakefile (1 hunks)
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/envs/download.yaml (1 hunks)
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/envs/pystats.yaml (1 hunks)
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/envs/rstats.yaml (1 hunks)
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/notebooks/get_data.py.ipynb (1 hunks)
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/notebooks/plot_horsepower_vs_mpg.py.ipynb (1 hunks)
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/notebooks/plot_horsepower_vs_mpg.r.ipynb (1 hunks)
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/report/cars.rst (1 hunks)
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/report/horsepower_vs_mpg.rst (1 hunks)
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/report/workflow.rst (1 hunks)
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/resources/datavzrd/cars.yaml (1 hunks)
  • docs/tutorial/tutorial.rst (1 hunks)
  • snakemake/notebook.py (1 hunks)
  • snakemake/report/html_reporter/template/components/menu.js (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/report/cars.rst
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/report/horsepower_vs_mpg.rst
  • docs/tutorial/interaction_visualization_reporting/workdir/workflow/report/workflow.rst
🧰 Additional context used
📓 Path-based instructions (1)
snakemake/notebook.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

🪛 yamllint
docs/tutorial/interaction_visualization_reporting/workdir/workflow/envs/download.yaml

[error] 9-9: no new line character at the end of file

(new-line-at-end-of-file)

docs/tutorial/interaction_visualization_reporting/workdir/workflow/envs/pystats.yaml

[error] 13-13: no new line character at the end of file

(new-line-at-end-of-file)

docs/tutorial/interaction_visualization_reporting/workdir/workflow/envs/rstats.yaml

[error] 10-10: no new line character at the end of file

(new-line-at-end-of-file)

docs/tutorial/interaction_visualization_reporting/workdir/workflow/resources/datavzrd/cars.yaml

[error] 58-58: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (22)
docs/tutorial/interaction_visualization_reporting/workdir/workflow/envs/download.yaml (1)

1-3: LGTM: Channel specifications are well-defined.

The use of conda-forge and nodefaults channels is a good practice. It ensures up-to-date packages and avoids potential conflicts from default channels.

docs/tutorial/interaction_visualization_reporting/workdir/workflow/envs/rstats.yaml (1)

1-3: LGTM: Appropriate channel configuration.

The channel configuration is well-structured:

  • Using conda-forge as the primary channel ensures access to up-to-date packages.
  • The nodefaults channel helps prevent potential conflicts with default channels.

This setup promotes reproducibility and stability in the environment.

docs/tutorial/interaction_visualization_reporting/workdir/workflow/envs/pystats.yaml (1)

1-3: LGTM: Channel configuration looks good.

The channel configuration is well-structured:

  • Using conda-forge as the primary channel is a good practice for up-to-date packages.
  • Including nodefaults helps prevent potential conflicts with default channels.
docs/tutorial/interaction_visualization_reporting/workdir/workflow/notebooks/get_data.py.ipynb (2)

32-52: LGTM! Metadata is appropriate for a modern Jupyter notebook.

The metadata indicates the use of a recent Python version (3.11.10) and follows the standard Jupyter notebook format. This ensures good compatibility with modern libraries and tools.


1-53: Overall, excellent addition to the tutorial!

This notebook effectively demonstrates the integration of Jupyter notebooks with Snakemake workflows for data processing. It aligns well with the PR objectives of adding a tutorial on interaction, visualization, and reporting. The code is well-structured, uses appropriate libraries, and follows good practices. The minor suggestions provided will further enhance its robustness and readability.

docs/tutorial/interaction_visualization_reporting/workdir/workflow/resources/datavzrd/cars.yaml (3)

1-1: Clarify the purpose of the __use_yte__ flag.

The __use_yte__ flag is set to true. Could you provide more information about what this flag does and why it's necessary for this configuration?


3-6: Confirm the dataset configuration.

The dataset configuration looks good. The use of ?input.table for the path suggests that the actual input file path will be dynamically set, which is flexible.

Please confirm that the tab character ("\t") is the correct separator for the expected input data format.
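The `__use_yte__` question and the `?input.table` path are related. As we understand it, `__use_yte__: true` asks for the config to be rendered with the YTE template engine first. In YTE, values prefixed with `?` are evaluated as Python expressions against variables such as the rule's `input`. The toy evaluator below only illustrates that idea. It is not YTE's implementation or API, it handles none of YTE's `?if`/`?for` constructs, and the path `results/cars.tsv` is made up.

```python
from types import SimpleNamespace
from typing import Any


def render(node: Any, namespace: dict) -> Any:
    """Toy evaluator: strings starting with '?' are evaluated as Python expressions."""
    if isinstance(node, dict):
        return {key: render(value, namespace) for key, value in node.items()}
    if isinstance(node, list):
        return [render(item, namespace) for item in node]
    if isinstance(node, str) and node.startswith("?"):
        return eval(node[1:], {}, namespace)  # only ever use with trusted config files
    return node


# Roughly mirrors the dataset section of cars.yaml; the path value is hypothetical.
config = {"datasets": {"cars": {"path": "?input.table", "separator": "\t"}}}
namespace = {"input": SimpleNamespace(table="results/cars.tsv")}
print(render(config, namespace))
```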


1-58: Overall, excellent configuration for car data visualization.

This YAML configuration file provides a comprehensive and well-structured setup for visualizing the cars dataset. It offers a variety of visualization options that are appropriate for different types of data, enhancing the interaction and reporting capabilities as intended by the new tutorial.

Key strengths:

  1. Flexible input configuration with dynamic path setting.
  2. Rich set of visualizations for various car attributes.
  3. Thoughtful use of different scales and plot types.
  4. Additional context provided through Wikipedia links.

The minor suggestions provided earlier can further improve this already solid configuration. Great job on creating this visualization setup!

docs/tutorial/interaction_visualization_reporting/workdir/workflow/notebooks/plot_horsepower_vs_mpg.r.ipynb (4)

1-17: LGTM: Appropriate libraries loaded

The code cell correctly loads the necessary R libraries (readr and ggplot2) for data reading and plotting. These are suitable choices for the task at hand.


18-31: LGTM: Effective data reading with Snakemake integration

This cell efficiently reads the TSV file using read_tsv from the readr library. The use of snakemake@input[[1]] as the input file path demonstrates good integration with the Snakemake workflow. Suppressing column type messages with show_col_types = FALSE is a nice touch for cleaner output.


63-79: LGTM: Proper metadata setup for R notebook

The metadata is correctly configured for an R notebook, specifying R as the kernel and language. The inclusion of file extension, MIME type, and R version information (4.4.1) is comprehensive and helpful for notebook execution and compatibility.


1-80: Overall: Well-structured notebook suitable for the tutorial

This R notebook effectively demonstrates data reading, basic exploration, and visualization within a Snakemake workflow. It serves as a good example for the tutorial on interaction, visualization, and reporting in Snakemake. The code is clean, well-integrated with Snakemake, and accomplishes its purpose of creating a plot of horsepower vs. miles per gallon.

The suggested improvements (considering the removal of the data output cell and adding labels to the plot) would further enhance its educational value. Overall, this notebook is a valuable addition to the tutorial.

snakemake/report/html_reporter/template/components/menu.js (1)

52-52: Approved: Improved clarity in menu heading

The change from "Result" to "Results" in the ListHeading component is a good improvement. It enhances the user experience by using the grammatically correct plural form when displaying multiple categories of results.

This minor update aligns well with the PR's focus on enhancing documentation and reporting features in Snakemake.

docs/tutorial/interaction_visualization_reporting/workdir/workflow/notebooks/plot_horsepower_vs_mpg.py.ipynb (3)

1-14: LGTM: Appropriate library imports and configuration.

The cell correctly imports the necessary libraries (Altair for visualization and Polars for data manipulation) and enables the VegaFusion data transformer for Altair, which is good practice for improved performance with large datasets.


52-72: LGTM: Appropriate metadata for a Jupyter notebook.

The metadata section contains standard information for a Jupyter notebook, including kernel specification and language information. The use of a recent Python version (3.11.10) is commendable.


1-73: Overall, excellent notebook demonstrating Snakemake integration with data visualization.

This notebook effectively achieves the PR objective of adding a tutorial on interaction, visualization, and reporting. It demonstrates good practices in data visualization using Altair and integrates seamlessly with Snakemake for input and output management. The suggested minor improvements in error handling and chart labeling will further enhance its robustness and clarity.

The notebook serves as a valuable addition to the Snakemake documentation, providing a clear example of how to create interactive visualizations within a Snakemake workflow.

docs/tutorial/interaction_visualization_reporting/tutorial.rst (3)

1-31: LGTM: Well-structured introduction and setup instructions

The introduction effectively sets the context for the tutorial, explaining the purpose and tools involved. The setup instructions are clear, concise, and should be easy for users to follow.


105-163: LGTM: Clear instructions for creating a plot with R

This section effectively demonstrates how to integrate R and ggplot2 into the Snakemake workflow. The instructions for creating the rule, setting up the Conda environment, and using the Jupyter notebook are clear and consistent with the previous step.


331-357: LGTM: Clear instructions for defining and using default targets

This section effectively explains how to define default targets for the workflow and why they are useful. The instructions for running the complete workflow are clear and concise.

docs/tutorial/interaction_visualization_reporting/workdir/workflow/Snakefile (3)

17-18: Verify the usage of the notebook directive

The notebook directive is used in multiple rules:

  • Lines 17-18 in rule get_data
  • Lines 35-36 in rule plot_with_r
  • Lines 53-54 in rule plot_with_python

Ensure that the notebook directive is compatible with your version of Snakemake. In some cases, integrating Jupyter notebooks within Snakemake may require additional configuration or the use of wrappers.

Also applies to: 35-36, 53-54


3-72: Well-Structured Workflow with Clear Data Processing and Visualization Steps

The Snakefile defines a coherent workflow that effectively sequences data acquisition, visualization using R and Python, and reporting. The use of conda environments and separation of concerns in different rules enhances reproducibility and maintainability.


71-72: Confirm the wrapper specification for view_with_datavzrd rule

In the view_with_datavzrd rule, the wrapper is specified as:

wrapper:
    "v4.7.2/utils/datavzrd"

Ensure that the wrapper path and version are correct and that this wrapper is available. A wrapper is specified either as a full URL or as the `<version>/<path>` shorthand, which resolves against the official Snakemake wrappers repository.

Run the following script to check the availability of the specified wrapper:

If the output is 200, the wrapper exists; otherwise, you may need to verify the version number or the path.

✅ Verification successful

Wrapper specification confirmed and available

The view_with_datavzrd rule's wrapper v4.7.2/utils/datavzrd is correctly specified and available.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the specified wrapper exists in the Snakemake wrappers repository.

# Expected Result: The wrapper should exist in the specified location.

# Test: Fetch the wrapper URL and check for a 200 OK response.
WRAPPER_URL="https://github.com/snakemake/snakemake-wrappers/raw/v4.7.2/utils/datavzrd"

# Check the HTTP status code.
curl -o /dev/null -s -w "%{http_code}\n" "$WRAPPER_URL"



Script:

#!/bin/bash
# Description: Follow redirects to verify the final location of the specified wrapper.

# Original Wrapper URL
INITIAL_WRAPPER_URL="https://github.com/snakemake/snakemake-wrappers/raw/v4.7.2/utils/datavzrd"

# Follow redirects and get the final URL
FINAL_WRAPPER_URL=$(curl -s -o /dev/null -w "%{url_effective}" -L "$INITIAL_WRAPPER_URL")

# Check the HTTP status code of the final URL
HTTP_STATUS=$(curl -o /dev/null -s -w "%{http_code}\n" "$FINAL_WRAPPER_URL")

echo "Final Wrapper URL: $FINAL_WRAPPER_URL"
echo "HTTP Status Code: $HTTP_STATUS"

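The same availability check can be sketched in Python using only the standard library. Like the shell scripts, it assumes that a wrapper spec of the form `<version>/<path>` maps to `https://github.com/snakemake/snakemake-wrappers/raw/<version>/<path>`. The helper names `wrapper_url` and `wrapper_status` are hypothetical. `urlopen` follows redirects by default, which mirrors `curl -L` in the second script.

```python
import urllib.error
import urllib.request

# Base URL taken from the shell scripts above.
WRAPPER_REPO = "https://github.com/snakemake/snakemake-wrappers"


def wrapper_url(spec: str) -> str:
    """Map a wrapper spec such as 'v4.7.2/utils/datavzrd' to its raw GitHub URL."""
    return f"{WRAPPER_REPO}/raw/{spec.strip('/')}"


def wrapper_status(spec: str, timeout: float = 10.0) -> int:
    """Fetch the wrapper URL (following redirects) and return the final HTTP status code."""
    try:
        with urllib.request.urlopen(wrapper_url(spec), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; report their code instead.
        return err.code
```

`wrapper_status("v4.7.2/utils/datavzrd")` performs a real network request. A result of 200 means the same thing as a 200 from the `curl` checks.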

- polars =1.1
- vega_datasets =0.9
- ipykernel =6.29
- notebook =7.2
\ No newline at end of file

⚠️ Potential issue

Add a newline at the end of the file.

To adhere to YAML best practices and prevent potential issues with certain tools, please add a newline at the end of the file.

Apply this change:

 - ipykernel =6.29
 - notebook =7.2
+
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- notebook =7.2
🧰 Tools
🪛 yamllint

[error] 9-9: no new line character at the end of file

(new-line-at-end-of-file)

Comment on lines +1 to +13
channels:
- conda-forge
- nodefaults
dependencies:
- python =3.11
- polars =1.1
- altair =5.3
- altair_saver =0.5
- vegafusion =1.6
- vegafusion-python-embed =1.6
- vl-convert-python =1.5
- notebook =7.2
- ipykernel =6.29
\ No newline at end of file

⚠️ Potential issue

Add a newline at the end of the file.

The YAML file is well-structured, but it's missing a newline at the end. This is a common best practice and can prevent issues with some tools.

Please add a newline at the end of the file. To do this, make sure the last line of content is terminated by a newline character:

  - notebook =7.2
  - ipykernel =6.29
+

This small change will resolve the YAML linter warning and adhere to best practices for text files.
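The minimal sketch below shows how the missing final newline can be detected and fixed programmatically. It assumes that yamllint's `new-line-at-end-of-file` rule flags non-empty content whose last byte is not a newline, and that an empty file passes. The function names are hypothetical.

```python
from pathlib import Path


def missing_final_newline(data: bytes) -> bool:
    """True when non-empty content does not end with a newline byte."""
    return bool(data) and not data.endswith(b"\n")


def fix_final_newline(path: Path) -> bool:
    """Append one newline byte if the file lacks it; return True if the file changed."""
    if not missing_final_newline(path.read_bytes()):
        return False
    with path.open("ab") as handle:
        handle.write(b"\n")
    return True
```

Running `fix_final_newline` once on each environment YAML file appends at most one byte and leaves files that already comply untouched. It therefore differs from adding an empty line, which a second call would otherwise produce.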

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
channels:
- conda-forge
- nodefaults
dependencies:
- python =3.11
- polars =1.1
- altair =5.3
- altair_saver =0.5
- vegafusion =1.6
- vegafusion-python-embed =1.6
- vl-convert-python =1.5
- notebook =7.2
- ipykernel =6.29
🧰 Tools
🪛 yamllint

[error] 13-13: no new line character at the end of file

(new-line-at-end-of-file)

@johanneskoester johanneskoester merged commit 1d94bd1 into main Oct 20, 2024
@johanneskoester johanneskoester deleted the docs/tutorial/intvizrep branch October 20, 2024 19:41
johanneskoester pushed a commit that referenced this pull request Oct 21, 2024
🤖 I have created a release *beep* *boop*
---


## [8.24.0](v8.23.2...v8.24.0) (2024-10-21)


### Features

* subsample jobs to speed-up scheduler ([#3112](#3112)) ([e10feef](e10feef))


### Documentation

* addition of interaction/visualization/reporting tutorial ([#3159](#3159)) ([1d94bd1](1d94bd1))
* fix tutorial step numbering ([2d7b9e9](2d7b9e9))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>