Conversation
Codecov Report
@@ Coverage Diff @@
## master #67 +/- ##
=========================================
Coverage ? 80.74%
=========================================
Files ? 30
Lines ? 3132
Branches ? 0
=========================================
Hits ? 2529
Misses ? 603
Partials ? 0
Continue to review full report at Codecov.
|
src/transform/classify.rs
Outdated
| } | ||
|
|
||
| #[test] | ||
| fn test_classify() { |
There was a problem hiding this comment.
Currently we have all tests in the tests folder.
The Rust way to do things is to have unittests inside the source and integration tests inside the tests directory. Not sure which way is the best way for us.
There was a problem hiding this comment.
I quite like having small unittests close to the function definition. Although it doesn't seem to show up in the coverage properly?
There was a problem hiding this comment.
Yeah, coverage is kind of broken currently anyway though, so whatever I guess
src/transform/classify.rs
Outdated
|
|
||
| fn choose_category(tags: HashSet<String>) -> String { | ||
| tags.iter().fold(&"Uncategorized".to_string(), |acc: &String, item: &String| { | ||
| if item.matches("->").count() >= acc.matches("->").count() { |
There was a problem hiding this comment.
This is bad practice, please don't use -> anywhere in the server code and let the web-ui pretty-print it with -> later.
There was a problem hiding this comment.
Having tags/categories be arrays doesn't make sense imo. (having classes be a Vec<(Vec<String>, Regex)> would be annoying)
The "->" is not primarily intended to be pretty, it's intended to:
- Convey the hierarchy order
- Act as an intuitive separator
- Differentiate a category from a tag (a category is a special case of a tag)
I'm really unconvinced about changing this into an array, I've been writing all the classifiers like this and it's really not that bad. I'm actually very happy with using "->" in this way, even though I get why you might not like it.
There was a problem hiding this comment.
Also, dot uses the "->" separator (which is where I got the idea from, I think).
A nice/fun consequence of this is that if you basically dumped all your categories into a .dot file then you'd get a nice visualization of the category structure.
There was a problem hiding this comment.
having classes be a Vec<(Vec, Regex)> would be annoying
I disagree, it clearly defines what the data is without hiding anything
The "->" is not primarily intended to be pretty, it's intended to:
- Convey the hierarchy order
- Act as an intuitive separator
- Differentiate a category from a tag (a category is a special case of a tag)
To me all of those reasons say "pretty" but in a different way.
- Arrays have a order
- It's a separator, that's for sure. I don't know what intuitive means in this case as it's just a separator, but I assume that you mean more readable which I agree but that also comes at a cost of being a exotic construction which otherwise doesn't fit how the other transforms work which might actually make it less intuitive.
- A comment could just as well serve as a differentiation
Also, dot uses the "->" separator (which is where I got the idea from, I think).
A nice/fun consequence of this is that if you basically dumped all your categories into a .dot file then you'd get a nice visualization of the category structure.
Doing ["a", "b"].join("->") would do the same thing, it's not something complex we're talking about here.
Other reasons to avoid it:
- Dot is a whole markup language, this is an entry in an already existing language which means that it's an format within a format which is messy
- Using -> means extra parsing and simply matching them with the currently weak system by simply splitting them would mean possible errors. There's a huge amount of edge cases in both the aw-server-python and aw-server-rust code which needs extra testing because of this.
- If someone automates something which creates tags and it contains "->" it will get messy unless you escape the -> somehow.
- Arrays already have a standard hierarchy order which is left to right (same as your "->" solution)
Just because you are used to something and that it currently works doesn't mean that it's a good idea.
There was a problem hiding this comment.
I disagree, it clearly defines what the data is without hiding anything
It doesn't. It says a tiny bit about how tags are for whatever reason Vec's, which helps the Rust compiler a little bit but doesn't define anything about the underlying behavior of tags/categories. This is unlike using # for tags and -> for categories, which makes it apparent to the reader that there is a distinction.
A comment could just as well serve as a differentiation
No. Tags begin with #, categories have a hierarchy order. These need to be checked at runtime, and I'm not going to both check if an array is 1-length and starts with a # to determine if it's a tag or not.
Using -> means extra parsing and simply matching them with the currently weak system by simply splitting them would mean possible errors. There's a huge amount of edge cases in both the aw-server-python and aw-server-rust code which needs extra testing because of this.
Not really, the parsing is literally just a check for # or -> and then a split. I'm not sure what edge cases you are referring to and I can't recall any open issues about it.
If someone automates something which creates tags and it contains "->" it will get messy unless you escape the -> somehow.
This seems extremely unlikely, and would be easy for them to just avoid doing.
Regardless, I've gone over solutions like these before in the aw-research code, and what I've come up with was the easiest to work with in the end. So unless you want to get into a long debate over all the other options I've explored that solve the same problem in a better way than the Vec you're proposing (such as letting tags have metadata, or letting the class spec be a dict) I'm going to go ahead and merge this.
There was a problem hiding this comment.
I obviously can't disagree with that, but it doesn't help me to merge this PR and isn't an argument for or against the options I mentioned.
I've experimented long enough to have settled on what I believe will be a good API. Unless you think I'm being totally unreasonable here I'd suggest you take my word for the fact that I've explored plenty of options and that I didn't find any of them particularly appealing.
Another option would be to have a category and/or tag type in the query language.
What does this even mean? How does that affect the underlying transform?
There was a problem hiding this comment.
Sorry for the short answer before, did not have time to write a proper answer (and probably don't have now either).
I obviously can't disagree with that, but it doesn't help me to merge this PR
Reviews aren't supposed to help getting something merged, it's supposed to increase the quality of the code getting merged.
It's also already been as a PR for 9 days and nothing else has been merged since so there shouldn't be a hurry.
You should also expect that a bigger PR should get a longer review time, I have only taken a quick glance at the code as of now.
A third option is to split up the behavior for tags and categories entirely, in which case I'd be fine with using your Vec<(Vec, Regex)> for the category part and keep using Vec<(String, Regex)> for plain tags. But I don't think that makes sense since it's more or less identical code apart from selecting the highest-ranking category.
To me that sounds reasonable. If they have no reason to be coupled and they do different things they shouldn't be coupled.
What does this even mean? How does that affect the underlying transform?
It was just kind of a stupid idea to maybe have a construct in the language to make things more clear. Either just defining a function called category("work", "project") which returns a actual category type, or possibly even have a custom syntax such as category["work" -> "project"]. I don't think it's a good idea, just brainstorming to see if it's possible to get the best of both worlds.
There was a problem hiding this comment.
Reviews aren't supposed to help getting something merged, it's supposed to increase the quality of the code getting merged.
It's the same thing in the end: We only merge code that's in good shape, and reviews help get code in shape, therefore reviews help get things merged. All I was saying was that your short comment didn't help me fix the issues you mentioned.
It's also already been as a PR for 9 days and nothing else has been merged since so there shouldn't be a hurry.
It's been stale for 9 days in part because I myself have been bikeshedding over the very thing you requested changes on since you were seemingly adamant about the status quo not being satisfactory.
You should also expect that a bigger PR should get a longer review time, I have only taken a quick glance at the code as of now.
But it's not big? It's +177 -11. (You might be thinking about the aw-webui PR)
To me that sounds reasonable. If they have no reason to be coupled and they do different things they shouldn't be coupled.
Really? This would lead to a lot of almost duplicated code/APIs in several places (two query2 functions, two transforms, two settings in the web UI). They don't really do different things, they do 90% exactly the same thing, apart from the category selection at the end. All that work just to get rid of a few string splits with -> seems ridiculous tbh.
There was a problem hiding this comment.
All I was saying was that your short comment didn't help me fix the issues you mentioned.
Fair enough, misunderstood you
But it's not big? It's +177 -11. (You might be thinking about the aw-webui PR)
The API part is a big thing to me. This will be with us for a long time.
Really? This would lead to a lot of almost duplicated code/APIs in several places (two query2 functions, two transforms, two settings in the web UI). They don't really do different things, they do 90% exactly the same thing, apart from the category selection at the end. All that work just to get rid of a few string splits with -> seems ridiculous tbh.
Two query2 functions is fine, that boilerplate code is minimal and stable.
Two transforms is acceptable. It's worth it for having to avoid a custom string format IMO.
Two settings in the web UI is probably a good thing if they don't belong with each other, no point in mixing them really.
Also, currently there does not seem to be any examples of using tagging in the visualization yet?
There was a problem hiding this comment.
Another option would be to try to be more generic and merge the functionality of categories and tags and just have them as one implementation and instead add the ability to save different sets of classes and choose which one you want to use.
Maybe make it optional to have a single event have multiple tags/categories or not and always allow the depth to be more than 1. If you don't want the depth to be more than 1, simply don't specify any classes which has more than 1 entry. That's essentially the same way we do with filter_events_by_keys, it allows supplying multiple keys but most often we only supply one.
Kind of hard to explain, but I hope I get my point across.
… for Event, fixed non-category tags not accidentally being used as categories
src/transform/classify.rs
Outdated
| } | ||
|
|
||
| // TODO: Classes should be &Vec<(String, Rule)> | ||
| pub fn autotag(mut events: Vec<Event>, rules: &Vec<(String, Regex)>) -> Vec<Event> { |
There was a problem hiding this comment.
autotag makes it sound like it comes up with the tags itself (which would be really cool, but unfortunately that's not what it does).
Just "match_tags" would better fit as a name.
There was a problem hiding this comment.
I renamed it to just tag because otherwise, I feel like we should also rename the categorize function to match_category and that just seems unnecessarily wordy.
…m trait for conversion into/from query2 datatypes
…ports, removed arg_type_x functions
johan-bjareholt
left a comment
There was a problem hiding this comment.
API looks good and from a quick glance code looks good as well!
| } | ||
| } | ||
|
|
||
| impl TryFrom<&DataType> for Vec<DataType> { |
There was a problem hiding this comment.
Awesome with all these TryFrom!
Also nice that you moved types to a seperate file.
No description provided.