Skip to content

Pkg dependenciestree to handle topological sort#2138

Closed
unmultimedio wants to merge 2 commits intomainfrom
jfigueroa/sort-workspace-modules
Closed

Pkg dependenciestree to handle topological sort#2138
unmultimedio wants to merge 2 commits intomainfrom
jfigueroa/sort-workspace-modules

Conversation

@unmultimedio
Copy link
Member

For dealing with workspaces (push, sync), we need first to sort modules in topological order. This pkg will be useful in the BSR as well for syncing third party modules, where a similar logic lives, and will be replaced by this pkg.

@unmultimedio unmultimedio self-assigned this May 30, 2023
Copy link
Member

@bufdev bufdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this package could be replaced with a DAG and topo sort, see https://github.com/stevenle/topsort as an example

"github.com/bufbuild/buf/private/pkg/stringutil"
)

// DependenciesTree is a tree that holds nodes depending on other nodes. One level deep.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "one level deep" mean? Does this mean this structure only deals with direct dependencies?

Copy link
Member Author

@unmultimedio unmultimedio May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, which for our cases is good. Workspaces have modules (first level), and each module have dependencies (all direct and transitive, second level).

For the third-party modules sync job in the BSR, modules have a single list of dependencies, which are also root nodes that have their own deps.

This doesn't mean we cannot have a multi-level link like a->b->...->z, it just means that all of those need to also be root nodes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking now that the fact that the tree is one-level deep it's an implementation detail that doesn't need to be disclosed in the docs. To the user, it's a regular graph with nodes that depend on other nodes.

// - c: [d]
// - d: [b] // circular dependency: b.c.d.b

package dependenciestree
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat awkward package name - plurals, named after main type in package. Consider some other name, would need to brainstorm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This package is also in pkg, so should be universal and not buf-specific. Is this package intended to be useful across constructs other than modules?

Copy link
Member Author

@unmultimedio unmultimedio May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I thought about it and doesn't feel something that's Buf-specific. Tbh locally I tried other two, I started with toposort, but discarded it because it doesn't only sort, it also validates, and I'd like to leave it open for more functionalities when handling a tree, so I then went to depstree, and then ended up with dependenciestree. Open for suggestions.

for _, opt := range opts {
opt(&config)
}
return &depsTree{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per Go Code Standards, this would be in a separate file named deps_tree.go. However, it likely would be named after the interface as well, in this case dependenciesTree, although this is an awkward name. Perhaps instead, standardize on DepTree (and depTree in dep_tree.go).

}

// NewDependenciesTreeOption are options that can be passed when instantiating a new dependencies tree.
type NewDependenciesTreeOption func(*newDependenciesTreeOptions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All other options structs in this codebase are named after the type they initialize, in this case dependenciesTreeOption, however it is unlikely this is needed at all (see below).

return t.sort(pendingRootNodes, nil)
}

type newDependenciesTreeOptions struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's likely no need for this at all - you can make the options func(*depTree)

// (external or circular) is possible, tree should be checked against the Validate func.
NewRootNode(rootNode string, dependencies map[string]struct{}) error
// Validate checks that the tree nodes have valid dependencies, with no circular dependencies.
Validate() error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interfaces in this codebase are generally validated upon construction, perhaps you want a *Builder?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! A builder strategy will help me here.

Validate() error
// Sort validates that the tree is valid, and then returns the tree in topological order, leafs
// first. Useful for installation or update order.
Sort() ([]string, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need an explicit Validate if Sort validates as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, having a builder solves this.

}

// New initializes a new dependencies tree, ready to have root nodes added to it.
func New(opts ...NewDependenciesTreeOption) DependenciesTree {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically would be named options...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find almost an equal amount of results of opts ... and options .... in the codebase. I'll change it though.

// NewDependenciesTreeOption are options that can be passed when instantiating a new dependencies tree.
type NewDependenciesTreeOption func(*newDependenciesTreeOptions)

// NewDependenciesTreeWithAllowExternalDeps allows a root node in the tree to depend on external
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what an "external dependency" is, can you clarify? What dependencies are internal dependencies, and what are external?

Copy link
Member Author

@unmultimedio unmultimedio May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs can be improved, you're right. I "internal" means that a tree node depends on another tree node, while an external depends on a value that's not a tree node:

{
  aaa: {
    bbb, // internal
    ccc, // internal
  },
  bbb: {
    ccc, // internal
    zzz, // external
  },
  ccc: {},
}

allowExternalDependencies bool
}

func (t *depsTree) validateDependencies(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This package switches between deps and dependencies a lot, see above comments

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll default to deps.

@unmultimedio
Copy link
Member Author

I think this package could be replaced with a DAG and topo sort, see https://github.com/stevenle/topsort as an example

Thanks for the review, I thought it's a simple enough functionality that could live in the repo instead of vendoring it from another library?

As per the questions: is this a tree? how many root nodes? wdym one-level deep?, probably those can be solved by renaming structs/interfaces, and better documenting? I'd hope the testing scenarios would make it much clearer.

@bufdev
Copy link
Member

bufdev commented May 31, 2023

I get the instinct of not depending on another lib, but for the record, I just looked through https://github.com/stevenle/topsort a bit more carefully including who is using it and who is committing to it, and it looks legitimate. I'm putting up a PR that effectively vendors it with proper attribution - take it or leave it, but just as a competing idea here of what we could do.

@bufdev bufdev mentioned this pull request May 31, 2023
@bufdev
Copy link
Member

bufdev commented Jun 5, 2023

Merged #2141, which I believe replaces this, let me know otherwise.

@bufdev bufdev closed this Jun 5, 2023
@bufdev bufdev deleted the jfigueroa/sort-workspace-modules branch November 27, 2023 17:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants