Skip to content

Add helper function for recipe clean up#223

Merged
ryanking13 merged 24 commits intopyodide:mainfrom
teddygood:feat/build-recipes-clean
Aug 22, 2025
Merged

Add helper function for recipe clean up#223
ryanking13 merged 24 commits intopyodide:mainfrom
teddygood:feat/build-recipes-clean

Conversation

@teddygood
Copy link
Copy Markdown
Contributor

Closes #218

Summary

  • Adds clean subcommand to pyodide build-recipes for removing downloaded sources and intermediate build artifacts.
  • Preserves dist/ by default; can include with --include-dist.
  • Supports package names, tags (tag:<name>), or all packages (--all).
  • No dependency traversal.

Changes

  • Implement clean command with:
    • Default: remove <pkg>/build and <pkg>/build.log.
    • --include-dist: also remove <pkg>/dist and global install dir.
  • Target resolution via existing loader.
  • Added CLI tests (test_clean.py).

Usage

# Clean all (excludes dist/)
pyodide build-recipes clean

# Clean selected targets
pyodide build-recipes clean numpy tag:core

# Include dist/
pyodide build-recipes clean --include-dist

# Custom directories
pyodide build-recipes clean \
  --recipe-dir=PATH_TO_RECIPES \
  --build-dir=PATH_TO_BUILD \
  --install-dir=PATH_TO_DIST \
  PKG1 PKG2

Copy link
Copy Markdown
Member

@ryanking13 ryanking13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @teddygood. I left some comments.

Because the CLI design would need some discussion, I think we can separate this PR. How about

  1. first implementing some helper functions to clean up the recipe build artifacts,
  2. then connect a CLI entrypoints to those functions.

While working on (1), you can probably also refactor some of our code to make it reusable. For instance, locating the log file or build directories are already implemented in recipe package, and hardcoding those directories are quite easy to break.

from pyodide_build import build_env
from pyodide_build.build_env import BuildArgs, init_environment
from pyodide_build.common import get_num_cores
from pyodide_build.errors import ActionableError
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no definition for ActionableError; maybe you forgot to define it, or the hallucination of AI code generators generated it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that. This commit was my mistake. I was originally working on issue #218, and after finishing that, I was addressing issue #220 but accidentally committed it to the wrong branch without checking out properly.

Comment on lines +270 to +272
# Register subcommands to the Typer app
app.command("no-deps")(build_recipes_no_deps)
app.command("build")(build_recipes)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I would like to avoid creating more subcommands.

It was my bad that in the original issue (#218), I suggested adding a clean subcommand under the build-recipes. As there can be (at actually is) a Python package named clean, I think having a subcommand clean under the build-recipes can be confusing.

Instead, maybe let's define it as a top-level subcommand, so that we can extend the clean command itself.

pyodide clean # clean all build artifacts
pyodide clean recipes # clean only recipe-related artifacts
pyodide clean <something else>

@hoodmane @agriyakhetarpal any thoughts on the CLI design?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you are suggesting sounds good to me.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, having the possibility to extend pyodide clean SGTM, too.

Comment on lines +458 to +460
# Legacy usage: `pyodide build-recipes` without subcommand will build all packages
build_recipes(
[], # Build all packages when no subcommand is specified
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a breaking change, and I would like to avoid that. As I commented in the above comment, let's not introduce another subcommand.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll drop this commit and remove the subcommand-related changes from this PR so the current behavior stays the same.



@app.command("clean")
def clean(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make it easier to write unittests (and also to reduce the dependency on the CLI package) let's split the functions.

  1. The CLI function should only be responsible for handling arguments and filling out the default values.
  2. Then, it should call the actual function that does the cleanup process.
  3. The function that does the clean up can also be separated to multiple small functions that can be tested separately.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll split the CLI from the cleanup logic. The CLI will only parse args/defaults and call small, testable helpers (e.g., resolve targets, remove build/log, optional dist, optional global dist). Unit tests will focus on the helpers.

@teddygood
Copy link
Copy Markdown
Contributor Author

teddygood commented Aug 13, 2025

Thanks for the feedback! To proceed, here’s a small plan - please let me know if this looks good:

  • Keep this PR limited to helper functions + unit tests (no CLI changes).

    • Reuse existing recipe utilities (no hardcoded paths).
    • Default: preserve dist; add an option in helpers to include dist when needed.
    • No dependency traversal; don’t implicitly include the “always” tag.
  • Drop the unrelated commit (9673975) and remove subcommand/callback changes so current behavior stays the same.

  • In a follow-up PR, wire the helpers to a top-level CLI:

    • pyodide clean / pyodide clean recipes [<pkg>|tag:<name>]
    • --include-dist to opt in removing per-package and global install dist.

If this direction sounds good, I’ll push the updates for this PR accordingly. Thanks!

@ryanking13
Copy link
Copy Markdown
Member

Thanks for summarizing the plans.

Reuse existing recipe utilities (no hardcoded paths).

Yes, sounds good. Actually, some paths might be hardcoded in recipe classes as well, but I would like to avoid adding more of hardcoded paths. If you think you can refactor some code to make it more reusable, feel free to go ahead and refactor it. But please do not mix refactoring and feature implementation, unless the change is very trivial. Try to do it step-by-step.

Default: preserve dist; add an option in helpers to include dist when needed.

Yes. Note that there are two dist directories, one for global dist directory that stores all the wheels and another for storing local wheels per package.

No dependency traversal; don’t implicitly include the “always” tag.

This part can be a matter of choice. We can change the decision at any time later. So when implementing the cleanup feature, this part should be easy to modify.

In a follow-up PR, wire the helpers to a top-level CLI:

Yes, sounds good.

--include-dist to opt in removing per-package and global install dist.

Maybe something like --all would be better. Users do not need to know about the name dist. Anyway, the name can be changed easily, so let's discuss the name in a separate PR.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this file in this PR

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I forgot to remove it. I’ll remove it in this PR.

@@ -0,0 +1,118 @@
from __future__ import annotations
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is required. Was there some issue with type annotations without this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. There was no issue with type annotations. I added it out of habit. I’ll remove it.

return list(recipes.keys())


def remove_package_build(recipe_dir: Path, build_dir_base: Path, package: str) -> bool:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the remove_... functions are doing the similar thing with duplicate codes, so it would be nice to refactor the functions to be more general.

I think we can have two functions that:

  1. Removes any file or directory given as a parameter
  2. Locate a set of files to remove

Then, the function 1 would have a more simple signature, and can be used in a more general purposes.

If we have more files to delete in the future, we can update function 2 to include those files.

Copy link
Copy Markdown
Contributor Author

@teddygood teddygood Aug 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementing as suggested I will introduce _remove_path(path) to delete files or directories and _locate_cleanup_paths_for_package(...) to compute the paths to remove then update clean_recipes(...) to use these helpers and remove the duplicated remove_* functions If we need to delete more artifacts later we will only update the locator function.

"""
Clean recipe build artifacts and optionally dist directories.

Returns the number of items removed.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this function should return the number of items removed. It doesn't seem very useful to me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I will remove the return value and make the function void, since the count is not particularly useful and no callers rely on it.

Comment on lines +82 to +83
def perform_recipe_cleanup(
*,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can make recipe_dir and targets as positional parameters, instead of making everything keyword-only parameters.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make recipe_dir and targets positional and keep the rest keyword-only in clean_recipes.

_make_pkg_with_artifacts(pkg)

removed = perform_recipe_cleanup(
recipe_dir=RECIPE_DIR,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to use a real recipe directory for this. I think you can utilize tmp_path.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think to avoid using a real recipe directory we should use pytest tmp_path to create an isolated temporary recipe directory and generate minimal meta.yaml files so that the loader can discover packages Is this approach okay for these tests?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it is okay.

return False


def perform_recipe_cleanup(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming: How about simply clean_recipes?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I will rename it to clean_recipes.

Copy link
Copy Markdown
Member

@ryanking13 ryanking13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I left some minor comments, but it looks good overall.

"""
try:
if path.is_dir():
import shutil
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(friendly question) Is there a specific reason you wanted to import shutil dynamically? If it were to optimize the speed, I think we don't need to care much about the import speed in this case.

Otherwise, let's move this import statement to the top of the file.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I recently started using Cursor and didn’t review the import carefully, sorry about that. You’re right that there’s no need to import it dynamically, I’ll move it to the top as suggested.

logger.info("Removing %s", str(path))
path.unlink(missing_ok=True) # type: ignore[call-arg]
else:
# Path does not exist; nothing to do
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe let's print a debug message in this case. So we can debug the issue when it does not work well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that makes sense. I’ll add a debug log for this case so it’s easier to trace when nothing happens.

# Path does not exist; nothing to do
return
except Exception:
# Best-effort cleanup; ignore failures
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I’ll add a debug log here too.

Comment on lines +40 to +46
# Per-package build directory
paths.append(build_dir_base / package / "build")
# Per-package build log
paths.append(recipe_dir / package / "build.log")
# Per-package dist directory (optional)
if include_dist:
paths.append(recipe_dir / package / "dist")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(No action required for this PR)

As I mentioned in the previous comment (#223 (review)), I think this part can be improved to reuse other existing code (or to be more object-oriented).

For instance, we might be able to do something like

builder = RecipeBuilder(recipe_dir, ...)
return [builder.build_dir, builder.log_file, ...]

or maybe the RecipeBuilder class can be responsible for cleaning up the files itself.

builder = RecipeBuilder(recipe_dir, ...)
builder.clean()

Of course, our RecipeBuilder class does not implement such attributes or methods yet, so you don't need to address it in this PR.

If you are interested, feel free to refactor it in a follow-up PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I’ll work on this refactor in a follow-up PR. Thanks for the suggestion.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be renamed to test_cleanup now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I’ll rename it.

@ryanking13 ryanking13 changed the title Add clean subcommand to pyodide build-recipes CLI Add helper function for recipe clean up Aug 22, 2025
Copy link
Copy Markdown
Member

@ryanking13 ryanking13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I updated the PR title as the content of the PR changed. Otherwise looks good to me.

@ryanking13 ryanking13 merged commit 48a980d into pyodide:main Aug 22, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add pyodide build-recipes clean subcommand for cleaning up the downloaded files

4 participants