Implemented classifier transform by ErikBjare · Pull Request #67 · ActivityWatch/aw-server-rust

ErikBjare · 2019-09-20T10:33:09Z

No description provided.

codecov · 2019-09-20T12:33:58Z

Codecov Report

❗ No coverage uploaded for pull request base (master@589115f). Click here to learn what that means.
The diff coverage is 72.46%.

@@            Coverage Diff            @@
##             master      #67   +/-   ##
=========================================
  Coverage          ?   80.74%           
=========================================
  Files             ?       30           
  Lines             ?     3132           
  Branches          ?        0           
=========================================
  Hits              ?     2529           
  Misses            ?      603           
  Partials          ?        0

Impacted Files	Coverage Δ
src/transform/mod.rs	`85.71% <ø> (ø)`
src/models/event.rs	`53.84% <0%> (ø)`
src/transform/classify.rs	`68.18% <68.18%> (ø)`
src/query/functions.rs	`92.13% <91.66%> (ø)`
tests/query.rs	`88.46% <92.85%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 589115f...f65d6ec. Read the comment docs.

Cargo.toml

src/models/event.rs

src/transform/classify.rs

johan-bjareholt · 2019-09-20T14:34:02Z

src/transform/classify.rs

+}
+
+#[test]
+fn test_classify() {


Currently we have all tests in the tests folder.

The Rust way to do things is to have unittests inside the source and integration tests inside the tests directory. Not sure which way is the best way for us.

I quite like having small unittests close to the function definition. Although it doesn't seem to show up in the coverage properly?

Yeah, coverage is kind of broken currently anyway though, so whatever I guess

…vior)

src/models/event.rs

src/query/functions.rs

src/transform/classify.rs

src/models/event.rs

johan-bjareholt · 2019-09-22T10:58:13Z

src/transform/classify.rs

+
+fn choose_category(tags: HashSet<String>) -> String {
+    tags.iter().fold(&"Uncategorized".to_string(), |acc: &String, item: &String| {
+        if item.matches("->").count() >= acc.matches("->").count() {


This is bad practice, please don't use -> anywhere in the server code and let the web-ui pretty-print it with -> later.

Having tags/categories be arrays doesn't make sense imo. (having classes be a Vec<(Vec<String>, Regex)> would be annoying)

The "->" is not primarily intended to be pretty, it's intended to:

Convey the hierarchy order

Act as an intuitive separator

Differentiate a category from a tag (a category is a special case of a tag)

I'm really unconvinced about changing this into an array, I've been writing all the classifiers like this and it's really not that bad. I'm actually very happy with using "->" in this way, even though I get why you might not like it.

Also, dot uses the "->" separator (which is where I got the idea from, I think).

A nice/fun consequence of this is that if you basically dumped all your categories into a .dot file then you'd get a nice visualization of the category structure.

having classes be a Vec<(Vec, Regex)> would be annoying

I disagree, it clearly defines what the data is without hiding anything

The "->" is not primarily intended to be pretty, it's intended to:

Convey the hierarchy order

Act as an intuitive separator

Differentiate a category from a tag (a category is a special case of a tag)

To me all of those reasons say "pretty" but in a different way.

Arrays have a order

It's a separator, that's for sure. I don't know what intuitive means in this case as it's just a separator, but I assume that you mean more readable which I agree but that also comes at a cost of being a exotic construction which otherwise doesn't fit how the other transforms work which might actually make it less intuitive.

A comment could just as well serve as a differentiation

Also, dot uses the "->" separator (which is where I got the idea from, I think).

A nice/fun consequence of this is that if you basically dumped all your categories into a .dot file then you'd get a nice visualization of the category structure.

Doing ["a", "b"].join("->") would do the same thing, it's not something complex we're talking about here.

Other reasons to avoid it:

Dot is a whole markup language, this is an entry in an already existing language which means that it's an format within a format which is messy

Using -> means extra parsing and simply matching them with the currently weak system by simply splitting them would mean possible errors. There's a huge amount of edge cases in both the aw-server-python and aw-server-rust code which needs extra testing because of this.

If someone automates something which creates tags and it contains "->" it will get messy unless you escape the -> somehow.

Arrays already have a standard hierarchy order which is left to right (same as your "->" solution)

Just because you are used to something and that it currently works doesn't mean that it's a good idea.

I disagree, it clearly defines what the data is without hiding anything

It doesn't. It says a tiny bit about how tags are for whatever reason Vec's, which helps the Rust compiler a little bit but doesn't define anything about the underlying behavior of tags/categories. This is unlike using # for tags and -> for categories, which makes it apparent to the reader that there is a distinction.

A comment could just as well serve as a differentiation

No. Tags begin with #, categories have a hierarchy order. These need to be checked at runtime, and I'm not going to both check if an array is 1-length and starts with a # to determine if it's a tag or not.

Using -> means extra parsing and simply matching them with the currently weak system by simply splitting them would mean possible errors. There's a huge amount of edge cases in both the aw-server-python and aw-server-rust code which needs extra testing because of this.

Not really, the parsing is literally just a check for # or -> and then a split. I'm not sure what edge cases you are referring to and I can't recall any open issues about it.

If someone automates something which creates tags and it contains "->" it will get messy unless you escape the -> somehow.

This seems extremely unlikely, and would be easy for them to just avoid doing.

Regardless, I've gone over solutions like these before in the aw-research code, and what I've come up with was the easiest to work with in the end. So unless you want to get into a long debate over all the other options I've explored that solve the same problem in a better way than the Vec you're proposing (such as letting tags have metadata, or letting the class spec be a dict) I'm going to go ahead and merge this.

I obviously can't disagree with that, but it doesn't help me to merge this PR and isn't an argument for or against the options I mentioned.

I've experimented long enough to have settled on what I believe will be a good API. Unless you think I'm being totally unreasonable here I'd suggest you take my word for the fact that I've explored plenty of options and that I didn't find any of them particularly appealing.

Another option would be to have a category and/or tag type in the query language.

What does this even mean? How does that affect the underlying transform?

Sorry for the short answer before, did not have time to write a proper answer (and probably don't have now either).

I obviously can't disagree with that, but it doesn't help me to merge this PR

Reviews aren't supposed to help getting something merged, it's supposed to increase the quality of the code getting merged.
It's also already been as a PR for 9 days and nothing else has been merged since so there shouldn't be a hurry.
You should also expect that a bigger PR should get a longer review time, I have only taken a quick glance at the code as of now.

A third option is to split up the behavior for tags and categories entirely, in which case I'd be fine with using your Vec<(Vec, Regex)> for the category part and keep using Vec<(String, Regex)> for plain tags. But I don't think that makes sense since it's more or less identical code apart from selecting the highest-ranking category.

To me that sounds reasonable. If they have no reason to be coupled and they do different things they shouldn't be coupled.

What does this even mean? How does that affect the underlying transform?

It was just kind of a stupid idea to maybe have a construct in the language to make things more clear. Either just defining a function called category("work", "project") which returns a actual category type, or possibly even have a custom syntax such as category["work" -> "project"]. I don't think it's a good idea, just brainstorming to see if it's possible to get the best of both worlds.

Reviews aren't supposed to help getting something merged, it's supposed to increase the quality of the code getting merged.

It's the same thing in the end: We only merge code that's in good shape, and reviews help get code in shape, therefore reviews help get things merged. All I was saying was that your short comment didn't help me fix the issues you mentioned.

It's also already been as a PR for 9 days and nothing else has been merged since so there shouldn't be a hurry.

It's been stale for 9 days in part because I myself have been bikeshedding over the very thing you requested changes on since you were seemingly adamant about the status quo not being satisfactory.

You should also expect that a bigger PR should get a longer review time, I have only taken a quick glance at the code as of now.

But it's not big? It's +177 -11. (You might be thinking about the aw-webui PR)

To me that sounds reasonable. If they have no reason to be coupled and they do different things they shouldn't be coupled.

Really? This would lead to a lot of almost duplicated code/APIs in several places (two query2 functions, two transforms, two settings in the web UI). They don't really do different things, they do 90% exactly the same thing, apart from the category selection at the end. All that work just to get rid of a few string splits with -> seems ridiculous tbh.

All I was saying was that your short comment didn't help me fix the issues you mentioned.

Fair enough, misunderstood you

But it's not big? It's +177 -11. (You might be thinking about the aw-webui PR)

The API part is a big thing to me. This will be with us for a long time.

Really? This would lead to a lot of almost duplicated code/APIs in several places (two query2 functions, two transforms, two settings in the web UI). They don't really do different things, they do 90% exactly the same thing, apart from the category selection at the end. All that work just to get rid of a few string splits with -> seems ridiculous tbh.

Two query2 functions is fine, that boilerplate code is minimal and stable.
Two transforms is acceptable. It's worth it for having to avoid a custom string format IMO.
Two settings in the web UI is probably a good thing if they don't belong with each other, no point in mixing them really.

Also, currently there does not seem to be any examples of using tagging in the visualization yet?

Another option would be to try to be more generic and merge the functionality of categories and tags and just have them as one implementation and instead add the ability to save different sets of classes and choose which one you want to use.

Maybe make it optional to have a single event have multiple tags/categories or not and always allow the depth to be more than 1. If you don't want the depth to be more than 1, simply don't specify any classes which has more than 1 entry. That's essentially the same way we do with filter_events_by_keys, it allows supplying multiple keys but most often we only supply one.

Kind of hard to explain, but I hope I get my point across.

src/transform/classify.rs

… for Event, fixed non-category tags not accidentally being used as categories

src/transform/classify.rs

johan-bjareholt · 2019-10-04T21:02:18Z

src/transform/classify.rs

+}
+
+// TODO: Classes should be &Vec<(String, Rule)>
+pub fn autotag(mut events: Vec<Event>, rules: &Vec<(String, Regex)>) -> Vec<Event> {


autotag makes it sound like it comes up with the tags itself (which would be really cool, but unfortunately that's not what it does).
Just "match_tags" would better fit as a name.

I renamed it to just tag because otherwise, I feel like we should also rename the categorize function to match_category and that just seems unnecessarily wordy.

src/transform/classify.rs

…m trait for conversion into/from query2 datatypes

…ports, removed arg_type_x functions

johan-bjareholt

API looks good and from a quick glance code looks good as well!

src/query/datatype.rs

johan-bjareholt · 2019-10-07T14:20:16Z

src/query/datatype.rs

+    }
+}
+
+impl TryFrom<&DataType> for Vec<DataType> {


Awesome with all these TryFrom!

Also nice that you moved types to a seperate file.

progress on classifier transform

9706d8b

ErikBjare requested a review from johan-bjareholt September 20, 2019 10:33

ErikBjare added 3 commits September 20, 2019 13:23

added classify as a query function

4b2981b

added query test for classify, changed tuple order

50b3b3e

fixed issue in travis config

cdd61b1

johan-bjareholt reviewed Sep 20, 2019

View reviewed changes

This was referenced Sep 21, 2019

implemented classify transform ActivityWatch/aw-core#82

Merged

Added classification vis & settings, switched to vue-cli, massiv… ActivityWatch/aw-webui#145

Merged

ErikBjare added 2 commits September 21, 2019 13:23

renamed Event::new_test -> Event::new (and removed test-specific beha…

2747c2f

…vior)

avoided unecessary clone

b2543c0

ErikBjare changed the title ~~Started working on classifier transform~~ Implemented classifier transform Sep 21, 2019

johan-bjareholt reviewed Sep 22, 2019

View reviewed changes

src/transform/classify.rs Outdated Show resolved Hide resolved

johan-bjareholt reviewed Sep 22, 2019

View reviewed changes

src/transform/classify.rs Outdated Show resolved Hide resolved

ErikBjare added 6 commits October 1, 2019 13:41

prefixed computed data properties with '$', implemented Default trait…

41dd23d

… for Event, fixed non-category tags not accidentally being used as categories

fixed test

32b0374

added test for classifying event without matching category

391186f

started splitting classify into categorize and autotag

4f059d5

added categorize as a query2 function

825f50b

added proper/extended parsing of category rules

8540430

johan-bjareholt reviewed Oct 4, 2019

View reviewed changes

ErikBjare added 3 commits October 7, 2019 13:34

made tag function fully functional in query2, changed to using TryFro…

eefda15

…m trait for conversion into/from query2 datatypes

moved all DataType-related things to seperate file, removed unused im…

73b52bf

…ports, removed arg_type_x functions

cleaned up TryFrom traits

f65d6ec

johan-bjareholt approved these changes Oct 7, 2019

View reviewed changes

removed old way of specifying rules for categorize and tag

7bf15f5

johan-bjareholt approved these changes Oct 8, 2019

View reviewed changes

ErikBjare merged commit 48b78f4 into master Oct 18, 2019

ErikBjare deleted the dev/classify branch October 18, 2019 08:44

dependabot bot mentioned this pull request Mar 23, 2023

build(deps): bump android_logger from 0.11.3 to 0.13.1 #356

Merged

Uh oh!

Conversation

ErikBjare commented Sep 20, 2019

Uh oh!

codecov bot commented Sep 20, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ErikBjare Sep 22, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ErikBjare Oct 1, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ErikBjare Oct 1, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

johan-bjareholt left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov bot commented Sep 20, 2019 •

edited

Loading

ErikBjare Sep 22, 2019 •

edited

Loading

ErikBjare Oct 1, 2019 •

edited

Loading

ErikBjare Oct 1, 2019 •

edited

Loading