Skip to content

Feat: Implement frontmatter based filtering#163

Merged
zoni merged 7 commits intozoni:mainfrom
epsilonhalbe:feature/frontmatter-export-filter
Dec 2, 2023
Merged

Feat: Implement frontmatter based filtering#163
zoni merged 7 commits intozoni:mainfrom
epsilonhalbe:feature/frontmatter-export-filter

Conversation

@epsilonhalbe
Copy link
Contributor

@epsilonhalbe epsilonhalbe commented Sep 16, 2023

I had a stab at implementing a filter for exports based on their frontmatter.

  • if the frontmatter contains private: true then the file is omitted from the export. this keyword can be adjusted with a flag --ignore-frontmatter-keyword. (suggestions for better naming are highly appreciated - maybe --no-export-frontmatter-keyword).

I added tests for the default keyword & changing it. I decided to go the ignore rather than the explicit export keyword bc. I thought the default behaviour of an exporter should be to export and not exporting is the special case.

Should close #66 and duplicate #161

UPD

this now also fixes

  • clippy/cargo warnings in lib.rs (these were done with cargo fix)
  • the broken code coverage

src/lib.rs Outdated
Comment on lines +413 to +414
match frontmatter.get(self.ignore_frontmatter_keyword) {
Some(serde_yaml::Value::Bool(true)) => Ok(()),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this match is the meat of the change - the bits below are just the other branch of that case being indented - no change in functionality there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turning off whitespace changes helps with the review here

}

#[test]
fn test_ignore_frontmatter_default_keyword() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testing the behaviour if the keyword is the default keyword private

}

#[test]
fn test_ignore_frontmatter_specific_keyword() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing if files are excluded when the keyword is something more specific - using no-expört to check that even with non-ascii characters the exclusion works

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one and the files below are just making sure all permutations of no frontmatter, frontmatter with keyword true/false and without keyword behave as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the docs to include the new feature

@epsilonhalbe
Copy link
Contributor Author

epsilonhalbe commented Sep 16, 2023

@zoni sorry I don't understand how those ci test failures relate to my changes.

@epsilonhalbe epsilonhalbe force-pushed the feature/frontmatter-export-filter branch 2 times, most recently from 010a232 to e0f63b4 Compare September 17, 2023 14:57
@epsilonhalbe
Copy link
Contributor Author

@zoni good news I managed to fix the whole CI/CD

- uses: actions-rs/tarpaulin@v0.1
with:
version: "latest"
version: "0.22.0"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixes tarpaulin download - see commit message for more details

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upgrading dependencies

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

upgrading dependencies

}

fn make_link_to_file<'b, 'c>(
fn make_link_to_file<'c>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cargo fix

fn make_link_to_file<'c>(
&self,
reference: ObsidianNoteReference<'b>,
reference: ObsidianNoteReference<'_>,
Copy link
Contributor Author

@epsilonhalbe epsilonhalbe Sep 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cargo fix

path_normalized.ends_with(&filename_normalized)
|| path_normalized.ends_with(filename_normalized.clone() + ".md")
|| path_normalized_lowered.ends_with(&filename_normalized.to_lowercase())
|| path_normalized_lowered.ends_with(filename_normalized.to_lowercase())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cargo fix

/// Reduce a given `MarkdownEvents` to just those elements which are children of the given section
/// (heading name).
fn reduce_to_section<'a, 'b>(events: MarkdownEvents<'a>, section: &'b str) -> MarkdownEvents<'a> {
fn reduce_to_section<'a>(events: MarkdownEvents<'a>, section: &str) -> MarkdownEvents<'a> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cargo fix

@epsilonhalbe
Copy link
Contributor Author

@zoni any issues with this PR?

@zoni
Copy link
Owner

zoni commented Sep 22, 2023

Thanks for working on this @epsilonhalbe. There is also #67, which addresses the same problem, and which I neglected to see through to the end, but I think the two approaches should converge.

Some of the feedback I gave on that PR applies here as well. That is, I'd like to see this implemented entirely through postprocessors without changing the core parsing logic.

I'm also wondering if it might be better to base the "export this note yes or no?" decision based on the presence or absence of a specific tag. Then you can do stuff like:

--only-tags "psychology, business" --skip-tags "private"

Which would export your notes tagged psychology OR business, as long as they aren't tagged private. It becomes a lot more flexible that way as opposed to a fixed boolean attribute.

@zoni
Copy link
Owner

zoni commented Sep 22, 2023

PS. I fixed the build on main, and in general, prefer such unrelated changes/fixes to happen in a separate PR. You will need to do a little bit of rebasing to address the conflicts that has introduced.

@epsilonhalbe
Copy link
Contributor Author

what is the reason for doing this in postprocessing, my reason for not doing it that way was that one doesn't need to parse the whole file and gets to exit early, which should save resources.

regarding the build fix, i thought you just might just cherry pick the fixes.

i don't mind to rebase

@zoni
Copy link
Owner

zoni commented Sep 30, 2023

what is the reason for doing this in postprocessing, my reason for not doing it that way was that one doesn't need to parse the whole file and gets to exit early, which should save resources.

That's not actually what is happening though. The full note is still being parsed (line 412) even when it doesn't end up exported (line 414): https://github.com/epsilonhalbe/obsidian-export/blob/e0f63b48ab63aa040215a312e783f3fdcd50106d/src/lib.rs#L412-L414

Regardless, as for the reasons for preferring this to be done in postprocessing, I'll quote the design principles with some emphasis:

My intention is to keep the core of obsidian-export as limited and small as possible, avoiding changes to the core Exporter struct or any of its methods whenever possible. This improves long-term maintainability and makes investigation of bugs simpler.

To keep the core of obsidian-export small while still supporting a wide range of use-cases, additional functionality should be pushed down into postprocessors as much as possible.

Your point about the performance benefit of an early exit (if it actually worked like that) is fair, but I'm not too worried about the need for that performance boost. I also prefer not to have too many special cases here. Having it implemented as a postprocessor means other postprocessors can actually influence the decision whether to export or not, which I think is a better/more consistent design overall.

@epsilonhalbe
Copy link
Contributor Author

I'll see when I find time to put this into the post-processor - next week is a bit tight but the week after is looking better

@epsilonhalbe epsilonhalbe force-pushed the feature/frontmatter-export-filter branch from e0f63b4 to f47e86b Compare September 30, 2023 20:59
@epsilonhalbe
Copy link
Contributor Author

hey I am on the way back from my vacation, this week is looking better, I think I have time to fix the PR

@epsilonhalbe
Copy link
Contributor Author

managed to implement everything using closures, only thing missing is tests for the tags filtering

Copy link
Owner

@zoni zoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sticking with this so far. Missing tests for --skip-tags/--only-tags notwithstanding, we're close to where I feel comfortable accepting this!

src/main.rs Outdated
Comment on lines +48 to +53
#[options(
no_short,
help = "Exclude files with this flag in the frontmatter from the export",
default = "private"
)]
ignore_frontmatter_flag: String,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not feeling too sure about keeping this in.

--skip-tags private effectively achieves the same behavior, but in a more generic and flexible way. Are there any use-cases where setting private: true in frontmatter has clearly better end-user ergonomics over asking people to just set a tag with value private?

There's already multiple ways to adjust what obsidian-export selects or ignores (exclude files, --start-at, and now these new options) and while I'm obviously not against offering flexibility, I would like to avoid this growing into having to support tens of options that do almost the same thing.

Even more so because unlike --skip-tags and --only-tags that are optional/no-op when unspecified, this option is forced on users now.

Are you okay with settling just on --skip-tags and --only-tags, dropping this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included it since I was already halfway there, I don't have any problems with dropping it.
do you have any advice on programmatically changing the frontmatter from private: true to a tag?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included it since I was already halfway there, I don't have any problems with dropping it.

Great! Then let's remove that code from the implementation and settle on just the --skip-tags and --only-tags.

do you have any advice on programmatically changing the frontmatter from private: true to a tag?

My go-to for this tends to be one-off Python scripts, because it's so easy to quickly iterate on something that works, even if it's ugly or inefficient 🙈

As it happens, you're in luck because I have an example of such a script which I used to update/set some date-related frontmatter attributes, that should get you 90% of the way there already. Uploaded it as a gist at https://gist.github.com/zoni/d3f6a803f552cc946d3bc00447d3e051 (requires python-frontmatter)

You'd just have to change update_frontmatter so it looks something like (untested):

def update_frontmatter(path):
    doc = frontmatter.load(path)
    tags = doc.get("tags", [])
    private = doc.get("private", False)

    if private:
        tags += ["private"]
    doc["tags"] = sorted(set(tags))

    path.write_text(frontmatter.dumps(doc), encoding="utf-8")

epsilonhalbe and others added 3 commits October 14, 2023 14:12
Co-authored-by: Nick Groenen <nick@groenen.me>
Co-authored-by: Nick Groenen <nick@groenen.me>
@epsilonhalbe
Copy link
Contributor Author

hey @zoni any objections?

Copy link
Owner

@zoni zoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, thank you for the nudge! I had missed that you'd already pushed updated tests. 😄

We're getting really close I think, just a few last points to address and fine-tune now.

src/lib.rs Outdated
destination: PathBuf,
start_at: PathBuf,
frontmatter_strategy: FrontmatterStrategy,
ignore_frontmatter_keyword: &'a str,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You forgot to remove this attribute here, which is no longer used now that we're only operating based on tags

PostprocessorResult::Continue
}
}
_ => PostprocessorResult::Continue,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking through this a little more, I don't think this case is always correct.

When I run with --only-tags publish (that is, skip_tags = [], only_tags = ["publish"]), I would expect notes without frontmatter (or frontmatter but no tags value) to be skipped (PostprocessorResult::StopAndSkipNote).

In other words, no frontmatter or a missing tags attribute should I think be treated as if tags was present, but empty.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This postprocessor probably also benefits from some unit tests to verify these various invariants, instead of relying only on the integration tests.

Do you think you could add some? Would you need some extra pointers on how to go about this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think i have some time tomorrow to finish this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FINALLY I found time again - and (hopefully) fixed up the last bits.
Let me know if this is now the logic you're expecting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mh-trimble == epsilonhalbe, one is a work account the other my personal

@epsilonhalbe
Copy link
Contributor Author

hey @zoni happy with the changes?

@zoni
Copy link
Owner

zoni commented Nov 13, 2023

I've not forgotten about this, it's on my list to give this a final look-over when I get a bit of time, but I've just been a little preoccupied with other things the past two weeks. Please give me another nudge/ping if I still haven't reviewed this by Sunday.

@epsilonhalbe
Copy link
Contributor Author

hey @zoni no worries, take your time.

Copy link
Owner

@zoni zoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for your patience and persisting through the couple of iterations that we went through to get to this implementation 😄

I will likely make a couple of small changes on some internal implementation details, but this is definitely ready to be merged as-is now. I'll also publish a new release soon to get this out to users.

Thanks for implementing this!

@zoni zoni merged commit 018c960 into zoni:main Dec 2, 2023
@zoni zoni mentioned this pull request Dec 2, 2023
zoni added a commit that referenced this pull request Dec 3, 2023
An earlier iteration of #163 included this flag, but the final version
only uses `--skip-tags` and `--only-tags`.
@Masstronaut
Copy link

My content authoring pipeline has evolved to be opt-in for publishing - things are unpublished by default, and publishing (of which obsidian-export is a step in the pipeline) is an opt-in behavior using publish: true.

I currently use a python script that prunes any file without a publish: true from what gets passed to obsidian-export. I'd be happy to drop that if the pruning can be handled by obsidian-export instead. One less step in the pipeline, yay!

It looks like using @zoni's script above to do a 1-off transformation from publish: true to tags = [...tags, "publish"] and then exporting using --only-tags "publish" would maintain my existing opt-in behavior. Do I have that right?

Thanks for building this! Great feature 👍 I realize it's shipped already, but hopefully you will appreciate hearing that it's useful to another person and the use case!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support page inclusion using YAML frontmatter

4 participants