Skip to content

pkg: implement a package search command#12547

Merged
Alizter merged 1 commit intoocaml:mainfrom
punchagan:dune-pkg-search
Oct 17, 2025
Merged

pkg: implement a package search command#12547
Alizter merged 1 commit intoocaml:mainfrom
punchagan:dune-pkg-search

Conversation

@punchagan
Copy link
Copy Markdown
Collaborator

@punchagan punchagan commented Oct 10, 2025

This PR adds an initial version of dune pkg search <query> in an attempt to address #12106.

  • Adds a command dune pkg search >query>. <query> is a required parameter which can be a Perl regular expression that is used to search package names.

  • The command only works in a dune workspace, since it would require
    repositories to be configured to perform the search.

  • Currently, search looks through valid package directory names in a repository and lists the maximum version that has not been marked as avoid-version along with the synopsis.

  • Currently, we only search in the repositories configured in the lock dir for the default context.

NOTE: I'm new to the dune codebase and this is my first "significant" code contribution. Please do point me to things to read and learn about, if there are things that can help with building missing conceptual understanding, etc.

@Alizter
Copy link
Copy Markdown
Collaborator

Alizter commented Oct 10, 2025

I've made some initial comments. Some more points

  1. We sign each commit with a "DCO". Have a look at CONTRIBUTING.md on what to put. Git has an option -S you can locally configure to always do it if you like.

  2. I think writing a cram test demonstrating some of the basic behaviour would be a good idea. You can search in test/blackbox-tests/test-cases/pkg for other tests, perhaps ones about dune pkg outdated would be useful. We can create some mock repositories and test some various things about the command. Typically its easier to have one test per thing we are testing for, as it makes running them concurrently easier, and fixing breakages easier. There is more information in doc/hacking.rst.

Things to test for:

  • basic usage
  • multiple repositories
  • conflicting packages
  • no search result
  • many search results

Since there might be a few, probably best to make a subdirectory in test/.../pkg/.

  1. Regarding the design, Marek will add the fetching step as part of the build soon, so we will have to make a decision if we fetch or not with this command. I think its sensible to always fetch, and when network is not available, warn the user and fallback to the local cache. This will probably end up being implemented as a more general mechanism anyway, but something to be aware of.

@punchagan
Copy link
Copy Markdown
Collaborator Author

punchagan commented Oct 10, 2025

I've made some initial comments. Some more points

Thanks for your quick initial review! It is helpful to validate that the general approach here isn't completely off-track, etc.

  1. We sign each commit with a "DCO". Have a look at CONTRIBUTING.md on what to put. Git has an option -S you can locally configure to always do it if you like.

I'm aware, thanks! My preference is to do it just before marking the PR as ready for merge as a signal that I'm happy with what is here to be merged (if and when it is). But, I can make sure all commits are signed, if that's preferred, generally.

  1. I think writing a cram test demonstrating some of the basic behaviour would be a good idea. You can search in test/blackbox-tests/test-cases/pkg for other tests, perhaps ones about dune pkg outdated would be useful. We can create some mock repositories and test some various things about the command. Typically its easier to have one test per thing we are testing for, as it makes running them concurrently easier, and fixing breakages easier. There is more information in doc/hacking.rst.

Sure, I'll add cram tests. Thanks for the pointer to the docs!

Since there might be a few, probably best to make a subdirectory in test/.../pkg/.

Makes sense. 👍

  1. Regarding the design, Marek will add the fetching step as part of the build soon, so we will have to make a decision if we fetch or not with this command. I think its sensible to always fetch, and when network is not available, warn the user and fallback to the local cache. This will probably end up being implemented as a more general mechanism anyway, but something to be aware of.

Understood.

@punchagan
Copy link
Copy Markdown
Collaborator Author

I have added some cram tests, and addressed some of your initial comments, @Alizter .

I, for one, know that the output produced could use some improvement to make it more readable. But, would you mind taking another look to see if there are other things that need work? Thanks!

@Alizter
Copy link
Copy Markdown
Collaborator

Alizter commented Oct 13, 2025

Regarding the style in the cram tests. We generally like comments to be wrapped at 80 and you don't need to start comments with # in cram syntax, any line starting at col 0 is a comment, and any line starting with $ at 2 is a command, > at 2 is a command continuation and starting at 2 is an output.

@punchagan punchagan force-pushed the dune-pkg-search branch 2 times, most recently from 3047aae to 26f0064 Compare October 13, 2025 15:09
Copy link
Copy Markdown
Member

@rgrinberg rgrinberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're on the right track. Thanks for doing this work. I've left some pointers. One last bit of general advice:

Try to keep the implementation of the search itself self contained in bin/pkg and only extend opam_repo as little as possible to provide the apis needed for search.

@punchagan punchagan force-pushed the dune-pkg-search branch 3 times, most recently from 15c12bd to c277143 Compare October 14, 2025 14:18
@punchagan punchagan force-pushed the dune-pkg-search branch 6 times, most recently from 65cc64c to 61e63a4 Compare October 14, 2025 16:28
@punchagan
Copy link
Copy Markdown
Collaborator Author

You're on the right track. Thanks for doing this work.

Thanks for taking a look and the quick confirmation that I'm on the right track! I appreciate it.

Try to keep the implementation of the search itself self contained in bin/pkg and only extend opam_repo as little as possible to provide the apis needed for search.

This makes sense! I've refactored the code taking this piece of advice into account.

@punchagan punchagan changed the title WIP: Implement a package search command pkg: implement a package search command Oct 14, 2025
@punchagan punchagan marked this pull request as ready for review October 14, 2025 17:02
@Alizter
Copy link
Copy Markdown
Collaborator

Alizter commented Oct 15, 2025

Perl style regular expressions adds a lot of extra stuff that probably won't be needed. I think using Re.re is better since we already do that in dune_glob for instance. (This gets used by the (glob ...) dependency. This also saves us from having to explain what Perl style regular expressions are.

However if you think Perl style regular expressions would be useful for package search, I would be open to it. I couldn't come up with a use case myself.

@punchagan
Copy link
Copy Markdown
Collaborator Author

Perl style regular expressions adds a lot of extra stuff that probably won't be needed. I think using Re.re is better since we already do that in dune_glob for instance

Yes, I'm happy to switch over! I'll update the PR soon.

@punchagan
Copy link
Copy Markdown
Collaborator Author

Perl style regular expressions adds a lot of extra stuff that probably won't be needed. I think using Re.re is better since we already do that in dune_glob for instance. (This gets used by the (glob ...) dependency. This also saves us from having to explain what Perl style regular expressions are.

@Alizter I'm a little confused about the suggestion here.

My understanding is that Re.re is the core type of the Re library and multiple different kinds of regular expression languages like Perl regexp, Emacs regexp, etc are supported by the library. I looked at dune-glob and it looks like there's a custom lexer for parsing and creating Re.re objects from the input.

  1. It looks like the glob patterns lean towards making it easier to glob file paths, though it shouldn't stop us from using it for package names. Do you have any thoughts on this?

  2. If that makes sense, does it seem reasonable to use Glob.parse_string_exn to add support for these glob patterns?

@Alizter
Copy link
Copy Markdown
Collaborator

Alizter commented Oct 16, 2025

@punchagan Yes, directly using dune_glob makes sense. It would probably be easier if you changed ~loc in Glob.parse_string_exn to ?loc since we don't have a location at the command line. Then you pass ?loc directly to the User_message.raise.

You can also mention the section from the manual on glob in the help description. https://dune.readthedocs.io/en/stable/concepts/dependency-spec.html#glob. I'm not sure what the best way to link that is since links can change over time, so probably writing "search for glob in the dependency specification" is enough.

@punchagan punchagan force-pushed the dune-pkg-search branch 3 times, most recently from 098d05e to 0cbbdad Compare October 16, 2025 11:34
@punchagan
Copy link
Copy Markdown
Collaborator Author

@punchagan Yes, directly using dune_glob makes sense. It would probably be easier if you changed ~loc in Glob.parse_string_exn to ?loc since we don't have a location at the command line. Then you pass ?loc directly to the User_message.raise.

You can also mention the section from the manual on glob in the help description. https://dune.readthedocs.io/en/stable/concepts/dependency-spec.html#glob. I'm not sure what the best way to link that is since links can change over time, so probably writing "search for glob in the dependency specification" is enough.

Thanks! I've implemented search using dune-glob's functionality.

I didn't do the suggested refactor to Glob.parse_string_exn since loc is a positional argument and not a named argument currently, and the change seemed more "intrusive" than necessary for this PR. I ended up using Loc.none instead.

There's a quirk of using the glob search, though, that may trip up some users. * at the beginning of a glob pattern is not treated as optional. I've added some tests in glob.t to demonstrate this.

@punchagan
Copy link
Copy Markdown
Collaborator Author

punchagan commented Oct 16, 2025

There's a quirk of using the glob search, though, that may trip up some users. * at the beginning of a glob pattern is not treated as optional. I've added some tests in glob.t to demonstrate this.

I pushed a couple of separate commits that implement the search using Re.Str and Re.Glob to see if they give better UX than dune-glob, and don't seem like they are adding a lot of stuff.

Also, sharing some "paraphrased" notes from the discussion we had outside GitHub, for the sake of completeness, here:

To prevent painting ourselves into a corner, it seems like a good idea to just allow for exact substring matches and not worry about anything fancier. Also, nobody really would want to write regular expressions to search package names, would they? If we are able to just pipe the output through grep, it will give advanced users what they need, if they really need to use regular expressions.

@Alizter noted that dune prints to standard error by default and we might need to change it for this command.

EDIT: @Alizter also mentioned that any packages exactly matching a query could be displayed as the top result to make the results nicer.

An additional question:

Based on previous input from @Leonidas-from-XIV , I made the query argument mandatory, since Glob or regex search implementations would allow for listing all the package names if required using the appropriate query argument. If we switch over to just supporting simple substring matches, would it make sense to make the query CLI agument optional again and list all the packages when no query is provided?

@shonfeder
Copy link
Copy Markdown
Member

shonfeder commented Oct 16, 2025

If we switch over to just supporting simple substring matches, would it make sense to make the query CLI agument optional again and list all the packages when no query is provided?

This makes sense to me on the face of it. Comparing with cargo, cargo searc give output this like this:

$ cargo search
A-Mazed = "0.1.0"            # App to generate mazes in terminal
A2VConverter = "0.1.1"       # AudioVideoConverter is a Rust library that provides functionality to convert between audio and vid…
ABC-ECS = "0.2.1"            # A simple, fast, and flexible Entity-Component-System library for Rust.
ABC_Game_Engine = "0.1.2"    # A simple, fast, and flexible Game Engine written in Rust, with simplicity in mind.
ABC_lumenpyx = "0.1.0"       # The official ABC-Engine integration for the lumenpyx rendering engine.
ABtree = "0.8.0"             # AVL and Btree for rust
ADA_Standards = "0.3.0"      # A library to help you handle checks on your ADA projects, especially good to build scripts to chec…
ADuCM302x = "0.1.0"          # A Peripheral Access Crate for the Analog Devices ADuCM302x series of MCUs
AJ_find = "0.1.0"            # find the path for a file or directory
AJ_jmp = "0.1.0"             # jump to a directory
... and 200608 crates more (use --limit N to see more)

So it is "morally" returning everything, but practically limiting the results. But this is because they have have roughly 200k more packages that opam does. IMO, listing all 4k packages from dune pkg search seems sensible, and it makes it trivial for people to use whatever searching tool they want on the output via a pipe.

@punchagan
Copy link
Copy Markdown
Collaborator Author

So it is "morally" returning everything, but practically limiting the results. But this is because they have have roughly 200k more packages that opam does. IMO, listing all 4k packages from dune pkg search seems sensible, and it makes it trivial for people to use whatever searching tool they want on the output via a pipe.

Makes sense to me!

I pushed a commit that implements this basic sub-string search, making the query argument optional. If we are happy with how this works, I think only a couple of things remain here:

  1. Printing the output to stdout instead of stderr to reduce surprises when people try to pipe the output through grep, etc.
  2. Ensuring exact matches show up at the top of the search results for a nicer UX.

Copy link
Copy Markdown
Collaborator

@Leonidas-from-XIV Leonidas-from-XIV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggesting some possible improvements to the source as I was reading through it.

@punchagan
Copy link
Copy Markdown
Collaborator Author

Suggesting some possible improvements to the source as I was reading through it.

I think I've addressed all the suggestions for improvement. Thank you!

  • Printing the output to stdout instead of stderr to reduce surprises when people try to pipe the output through grep, etc.

Created #12586 to track this, since the issue is a bigger one and not limited to the scope of this PR.

  • Ensuring exact matches show up at the top of the search results for a nicer UX.

I've tweaked the sorting to incorporate this, and added a cram test that demonstrates this (in pkg/search/many-results.t).

@Alizter
Copy link
Copy Markdown
Collaborator

Alizter commented Oct 17, 2025

Let me know when you are finished tweaking stuff and I can merge.

Signed-off-by: Puneeth Chaganti <punchagan@muse-amuse.in>
@punchagan
Copy link
Copy Markdown
Collaborator Author

Let me know when you are finished tweaking stuff and I can merge.

I've tweaked the imports, and I think I'm done with addressing all other review comments too.

@Alizter Alizter merged commit bd4de34 into ocaml:main Oct 17, 2025
26 checks passed
@punchagan punchagan deleted the dune-pkg-search branch October 17, 2025 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants