Conversation
✅ Deploy Preview for nextflow-docs-staging ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
Regarding the schema generation, there are two approaches we could take:
I was initially leaning towards (2), because it would simplify the schema generation and we could validate the available schema properties. But this would also make the params definition really long and verbose in the script, whereas the pipeline code only cares about the name / type / default value. So now I'm starting to lean towards (1). In that case we could have a really concise definition: params {
input: Path
save_intermeds: boolean = false
}
workflow {
println "input = ${params.input}"
println "save_intermeds = ${params.save_intermeds}"
}Or even directly in the entry workflow: workflow {
params:
input: Path
save_intermeds: boolean = false
main:
println "input = ${input}"
println "save_intermeds = ${save_intermeds}"
}This concise syntax will work only if we're certain we don't need anything else in the script. I thought maybe the help text would be useful for CLI help, but that could be provided through a Javadoc comment In this case, the schema generation would look something like this:
|
I'm not entirely sure that this is the case, the schema is used for validation of more than just type. I know that some of these things can be handled with Records (eg.
I'd be curious to see how this might look - ideally for both |
Any kind of validation should be possible through custom types and constructor functions (functions that create the custom type and just implements the validation logic). But not all of those cases can be automatically translated to the schema. For example, I can automatically generate a I could go either way at this point. I like the concise syntax of declaring params in the entry workflow, but CLI libraries like argparse are also pretty standard, so maybe the concise syntax is just too restrictive
Copying your example from our slack convo: workflow {
params:
/**
* Path to comma-separated file containing information about the samples in the experiment.
*
* You will need to create a design file with information about the samples in your experiment
* before running the pipeline. Use this parameter to specify its location.
* It has to be a comma-separated file with 4 columns, and a header row.
* See [usage docs](https://nf-co.re/rnaseq/usage#samplesheet-input).
*/
input: Path
/**
* If generated by the pipeline save the STAR index in the results directory.
*
* If an alignment index is generated by the pipeline use this parameter
* to save it to your results folder.
* These can then be used for future pipeline runs, reducing processing times.
*/
save_reference: boolean = false
main:
// ..
} |
|
Using the Javadoc comment is "better" in the sense that you only need to parse the script to produce the CLI help, you don't have to execute it, which i think would be excessive |
|
See also Pydantic: https://docs.pydantic.dev/latest/concepts/fields/#validate-default-values Simple param: myparam: String = "default-value"Full param: myparam: String = Field(default: "default-value", pattern: "/some.*regex/") |
|
Declaring params in the entry workflow means that you don't need the workflow {
params:
input: Path
save_intermeds: boolean = false
main:
println "input = ${input}"
println "save_intermeds = ${save_intermeds}"
}On the one hand I like that it makes params more like workflow takes. On the other hand, you still need the That would suggest that the params {
input: Path {
description '...'
pattern '*.csv'
}
save_intermeds: boolean = false
}
workflow {
println "input = ${params.input}"
println "save_intermeds = ${params.save_intermeds}"
}Though I always hesitate to add shortcuts if it makes the code less consistent |
|
+1 for the separate Not sure about the squiggly bracket syntax. I like the thinking, but it means that we now have three different types of syntax for them. Nothing for config, That said, the
|
|
Yeah, confusing the The block syntax appeals to me because it is consistent with workflow outputs: // fetchngs...
outputs {
samples: Channel<Sample> {
path '...'
// ...
}
}
// rnaseq...
params {
input: List<Sample> {
// ...
}
}Since we want to be able to match outputs to inputs for pipeline chaining, it makes sense to me that the syntax for inputs and outputs mirror each other. The config is another issue. Let me think through that and write a separate comment... |
|
We could add the same params block syntax to config files if consistency is an issue. But I fear this might feel too "weird" in a configuration context. The nice thing about config params is that they basically have to be simple values (numbers, strings, booleans, etc). So being able to declare the type isn't so important because it can be inferred from the default value. Meanwhile, if we take the hybrid approach of generating a skeleton schema that the user can annotate manually as needed, we don't need to add new syntax to the config file to support things like validation and help text, because those can just be defined in the JSON schema |
|
I really like this syntax proposal a lot: params {
input: Path {
description '...'
pattern '*.csv'
}
save_intermeds: boolean = false
}This is likely in part because it mirrors Python/Pydantic but I think that's a good thing for us to emulate. Imitating the syntax of the most popular typed python extension will make it easier for folks to learn and feel like they can read and understand. |
|
If we're leaning more towards that syntax, we could arguably have all JSON schema parameters covered. I think we need to make sure we're crystal clear on what we want to support. For example, maybe no params {
input: Path { pattern '*.csv' }
save_intermeds: boolean = false
} |
|
@bentsherman - any ideas how the schema builder might be able to reach into Nextflow code to update these definitions based on changes made in the GUI / JSON? |
|
I don't think that will be possible. I think the flow will have to be:
Or if you use the schema builder from scratch, you have to update the params definition by hand (or not use it at all) |
|
I am just thinking of the following user story: I set the type of a parameter to boolean in the parameter defintion. While writing the param definition in the schema builder I see that the tool actually accepts |
|
The kind of discovery work you describe (i.e. figuring out the appropriate type of a param) is exactly the kind of thing that needs to happen in the pipeline code, so that the Nextflow compiler can verify param names and types as they are used in the entry workflow. You can't get that kind of validation in the schema builder. Instead, the schema builder should be used to annotate a fixed set of params with things like help text, icons, form validation rules, etc. It should be primarily concerned with how params are accessed by external systems. |
|
I agree that the logic / definition should be in the pipeline code: at least, that should be the source of truth. I was mostly wondering if we could have some way to update the Nextflow code from the JSON schema builder, to have the best of both worlds. The schema builder has some nice beginner-friendly functionality in it, for example a GUI with a built-in regex tester for writing patterns, and a bunch of built-in help text. Maybe this is something that we could do with @RefTrace ? eg. From the Python CLI that launches the schema builder GUI, then go back and access the Nextflow code to edit it in place. Not sure if that's possible. Or if we launch the schema GUI editor from Nextflow itself (with a local server etc) could there be a callback which is able to edit the Nextflow code? 🤔 We would know the param name and attributes.. |
|
The furthest I would go is to generate a code block in the schema builder that the user can copy into their Nextflow pipeline if they want.. Automatically updating code from an external source is an anti-pattern in my view |
|
Suggestion from call: Would be nice to support single-line comments ( |
christopher-hakkaart
left a comment
There was a problem hiding this comment.
I've added some suggestions to the docs. Looking good.
christopher-hakkaart
left a comment
There was a problem hiding this comment.
Docs looking good 👍
pditommaso
left a comment
There was a problem hiding this comment.
I like the general idea, but there are some points that needs to be improved.
I'm a bit concerned about this:
Config files can override script params or define "config params" which are only used by the config
Maybe I'm misunderstanding, but I don't config params should be managed differently from script params
pditommaso
left a comment
There was a problem hiding this comment.
I believe, this needs some work (as per previous comment)
b4b321e to
069653d
Compare
db9d727 to
082d590
Compare
|
Went ahead and added type annotations for params in this PR, since it seemed weird to add the One thing we need to figure out is what to do with the type casting of CLI params. We don't know at this point whether the user is using the new params definition, so I think we would need to either disable it by default or somehow "undo" it during the params validation. EDIT: Since the |
082d590 to
d4c5bf3
Compare
|
Docs are awesome. |
e48aa31 to
b3e79ba
Compare
|
@pditommaso |
Can I specify That would be like |
|
@muffato yes, you can default to |
e722241 to
95f9fa5
Compare
|
@pditommaso now tests are passing, ready for you |
95f9fa5 to
8bb2b36
Compare
8bb2b36 to
414f513
Compare
8d676e4 to
79e1c57
Compare
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
79e1c57 to
d6496c9
Compare
Related to #4669
This PR is the first step towards defining workflow params entirely in the pipeline script. It allows you to define a
paramsblock like so:params { input: Path save_intermeds: Boolean = false } workflow { // ... }Instead of assigning individual params. The advantage is that Nextflow can validate params natively once the
paramsblock is defined, because it guarantees that all params are declared in one place.There is still some work required to make the validation work, but the high-level flow is:
TODO: