Feat: Implement frontmatter based filtering#163
Conversation
src/lib.rs
Outdated
| match frontmatter.get(self.ignore_frontmatter_keyword) { | ||
| Some(serde_yaml::Value::Bool(true)) => Ok(()), |
There was a problem hiding this comment.
this match is the meat of the change - the bits below are just the other branch of that case being indented - no change in functionality there
There was a problem hiding this comment.
turning off whitespace changes helps with the review here
tests/export_test.rs
Outdated
| } | ||
|
|
||
| #[test] | ||
| fn test_ignore_frontmatter_default_keyword() { |
There was a problem hiding this comment.
testing the behaviour if the keyword is the default keyword private
tests/export_test.rs
Outdated
| } | ||
|
|
||
| #[test] | ||
| fn test_ignore_frontmatter_specific_keyword() { |
There was a problem hiding this comment.
Testing if files are excluded when the keyword is something more specific - using no-expört to check that even with non-ascii characters the exclusion works
There was a problem hiding this comment.
this one and the files below are just making sure all permutations of no frontmatter, frontmatter with keyword true/false and without keyword behave as expected.
There was a problem hiding this comment.
updated the docs to include the new feature
|
@zoni sorry I don't understand how those ci test failures relate to my changes. |
010a232 to
e0f63b4
Compare
|
@zoni good news I managed to fix the whole CI/CD |
| - uses: actions-rs/tarpaulin@v0.1 | ||
| with: | ||
| version: "latest" | ||
| version: "0.22.0" |
There was a problem hiding this comment.
fixes tarpaulin download - see commit message for more details
There was a problem hiding this comment.
upgrading dependencies
There was a problem hiding this comment.
upgrading dependencies
| } | ||
|
|
||
| fn make_link_to_file<'b, 'c>( | ||
| fn make_link_to_file<'c>( |
| fn make_link_to_file<'c>( | ||
| &self, | ||
| reference: ObsidianNoteReference<'b>, | ||
| reference: ObsidianNoteReference<'_>, |
| path_normalized.ends_with(&filename_normalized) | ||
| || path_normalized.ends_with(filename_normalized.clone() + ".md") | ||
| || path_normalized_lowered.ends_with(&filename_normalized.to_lowercase()) | ||
| || path_normalized_lowered.ends_with(filename_normalized.to_lowercase()) |
| /// Reduce a given `MarkdownEvents` to just those elements which are children of the given section | ||
| /// (heading name). | ||
| fn reduce_to_section<'a, 'b>(events: MarkdownEvents<'a>, section: &'b str) -> MarkdownEvents<'a> { | ||
| fn reduce_to_section<'a>(events: MarkdownEvents<'a>, section: &str) -> MarkdownEvents<'a> { |
|
@zoni any issues with this PR? |
|
Thanks for working on this @epsilonhalbe. There is also #67, which addresses the same problem, and which I neglected to see through to the end, but I think the two approaches should converge. Some of the feedback I gave on that PR applies here as well. That is, I'd like to see this implemented entirely through postprocessors without changing the core parsing logic. I'm also wondering if it might be better to base the "export this note yes or no?" decision based on the presence or absence of a specific tag. Then you can do stuff like:
Which would export your notes tagged |
|
PS. I fixed the build on |
|
what is the reason for doing this in postprocessing, my reason for not doing it that way was that one doesn't need to parse the whole file and gets to exit early, which should save resources. regarding the build fix, i thought you just might just cherry pick the fixes. i don't mind to rebase |
That's not actually what is happening though. The full note is still being parsed (line 412) even when it doesn't end up exported (line 414): https://github.com/epsilonhalbe/obsidian-export/blob/e0f63b48ab63aa040215a312e783f3fdcd50106d/src/lib.rs#L412-L414 Regardless, as for the reasons for preferring this to be done in postprocessing, I'll quote the design principles with some emphasis:
Your point about the performance benefit of an early exit (if it actually worked like that) is fair, but I'm not too worried about the need for that performance boost. I also prefer not to have too many special cases here. Having it implemented as a postprocessor means other postprocessors can actually influence the decision whether to export or not, which I think is a better/more consistent design overall. |
|
I'll see when I find time to put this into the post-processor - next week is a bit tight but the week after is looking better |
should close zoni#66
e0f63b4 to
f47e86b
Compare
|
hey I am on the way back from my vacation, this week is looking better, I think I have time to fix the PR |
TODO: tests for tags
|
managed to implement everything using closures, only thing missing is tests for the tags filtering |
zoni
left a comment
There was a problem hiding this comment.
Thanks for sticking with this so far. Missing tests for --skip-tags/--only-tags notwithstanding, we're close to where I feel comfortable accepting this!
src/main.rs
Outdated
| #[options( | ||
| no_short, | ||
| help = "Exclude files with this flag in the frontmatter from the export", | ||
| default = "private" | ||
| )] | ||
| ignore_frontmatter_flag: String, |
There was a problem hiding this comment.
I'm not feeling too sure about keeping this in.
--skip-tags private effectively achieves the same behavior, but in a more generic and flexible way. Are there any use-cases where setting private: true in frontmatter has clearly better end-user ergonomics over asking people to just set a tag with value private?
There's already multiple ways to adjust what obsidian-export selects or ignores (exclude files, --start-at, and now these new options) and while I'm obviously not against offering flexibility, I would like to avoid this growing into having to support tens of options that do almost the same thing.
Even more so because unlike --skip-tags and --only-tags that are optional/no-op when unspecified, this option is forced on users now.
Are you okay with settling just on --skip-tags and --only-tags, dropping this one?
There was a problem hiding this comment.
I included it since I was already halfway there, I don't have any problems with dropping it.
do you have any advice on programmatically changing the frontmatter from private: true to a tag?
There was a problem hiding this comment.
I included it since I was already halfway there, I don't have any problems with dropping it.
Great! Then let's remove that code from the implementation and settle on just the --skip-tags and --only-tags.
do you have any advice on programmatically changing the frontmatter from
private: trueto a tag?
My go-to for this tends to be one-off Python scripts, because it's so easy to quickly iterate on something that works, even if it's ugly or inefficient 🙈
As it happens, you're in luck because I have an example of such a script which I used to update/set some date-related frontmatter attributes, that should get you 90% of the way there already. Uploaded it as a gist at https://gist.github.com/zoni/d3f6a803f552cc946d3bc00447d3e051 (requires python-frontmatter)
You'd just have to change update_frontmatter so it looks something like (untested):
def update_frontmatter(path):
doc = frontmatter.load(path)
tags = doc.get("tags", [])
private = doc.get("private", False)
if private:
tags += ["private"]
doc["tags"] = sorted(set(tags))
path.write_text(frontmatter.dumps(doc), encoding="utf-8")Co-authored-by: Nick Groenen <nick@groenen.me>
Co-authored-by: Nick Groenen <nick@groenen.me>
|
hey @zoni any objections? |
zoni
left a comment
There was a problem hiding this comment.
Apologies, thank you for the nudge! I had missed that you'd already pushed updated tests. 😄
We're getting really close I think, just a few last points to address and fine-tune now.
src/lib.rs
Outdated
| destination: PathBuf, | ||
| start_at: PathBuf, | ||
| frontmatter_strategy: FrontmatterStrategy, | ||
| ignore_frontmatter_keyword: &'a str, |
There was a problem hiding this comment.
You forgot to remove this attribute here, which is no longer used now that we're only operating based on tags
src/postprocessors.rs
Outdated
| PostprocessorResult::Continue | ||
| } | ||
| } | ||
| _ => PostprocessorResult::Continue, |
There was a problem hiding this comment.
Thinking through this a little more, I don't think this case is always correct.
When I run with --only-tags publish (that is, skip_tags = [], only_tags = ["publish"]), I would expect notes without frontmatter (or frontmatter but no tags value) to be skipped (PostprocessorResult::StopAndSkipNote).
In other words, no frontmatter or a missing tags attribute should I think be treated as if tags was present, but empty.
There was a problem hiding this comment.
This postprocessor probably also benefits from some unit tests to verify these various invariants, instead of relying only on the integration tests.
Do you think you could add some? Would you need some extra pointers on how to go about this?
There was a problem hiding this comment.
i think i have some time tomorrow to finish this
There was a problem hiding this comment.
FINALLY I found time again - and (hopefully) fixed up the last bits.
Let me know if this is now the logic you're expecting.
There was a problem hiding this comment.
mh-trimble == epsilonhalbe, one is a work account the other my personal
|
hey @zoni happy with the changes? |
|
I've not forgotten about this, it's on my list to give this a final look-over when I get a bit of time, but I've just been a little preoccupied with other things the past two weeks. Please give me another nudge/ping if I still haven't reviewed this by Sunday. |
|
hey @zoni no worries, take your time. |
zoni
left a comment
There was a problem hiding this comment.
Thank you so much for your patience and persisting through the couple of iterations that we went through to get to this implementation 😄
I will likely make a couple of small changes on some internal implementation details, but this is definitely ready to be merged as-is now. I'll also publish a new release soon to get this out to users.
Thanks for implementing this!
An earlier iteration of #163 included this flag, but the final version only uses `--skip-tags` and `--only-tags`.
|
My content authoring pipeline has evolved to be opt-in for publishing - things are unpublished by default, and publishing (of which obsidian-export is a step in the pipeline) is an opt-in behavior using I currently use a python script that prunes any file without a It looks like using @zoni's script above to do a 1-off transformation from Thanks for building this! Great feature 👍 I realize it's shipped already, but hopefully you will appreciate hearing that it's useful to another person and the use case! |
I had a stab at implementing a filter for exports based on their frontmatter.
private: truethen the file is omitted from the export. this keyword can be adjusted with a flag--ignore-frontmatter-keyword. (suggestions for better naming are highly appreciated - maybe--no-export-frontmatter-keyword).I added tests for the default keyword & changing it. I decided to go the ignore rather than the explicit export keyword bc. I thought the default behaviour of an exporter should be to export and not exporting is the special case.
Should close #66 and duplicate #161
UPD
this now also fixes
cargo fix)