-
Notifications
You must be signed in to change notification settings - Fork 291
Fail hashing if components are ambiguously ordered. #6007
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b925b4b to
b94a51c
Compare
| verifyTermFormatHash :: ComponentHash -> TermFormat.HashTermFormat -> Maybe HashValidationError | ||
| verifyTermFormatHash (ComponentHash hash) (TermFormat.Term (TermFormat.LocallyIndexedComponent elements)) = toMaybe $ do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the change that allows Share to catch these.
|
@ChrisPenner Not sure why this would be an issue when they're not top level definitions |
Even if not top-level definitions they still need to be hashed, and AFAIK we treat top-level bindings just as one big let-rec. I think it won't run into quite the same problems, since I'd guess that the variable names are going to be the same on every hash since they're baked in, and you can't have external references into that let-rec, so that problem doesn't apply. The one thing would be that this implies if you load a term, then change a variable name, the hash could change (if you had a 3 element component where element ordering matters), which would be annoying, but not entirely problematic. I'll fix this test, but I don't think we should change the behaviour. |
|
@aryairani Can you write up that alternative messaging you wanted when you get a chance please? Thanks :) |
|
@ChrisPenner Just to be clear, this error can happen even nested deep within another definition? Or when can it happen? I don't think I ever got why it doesn't just apply at the top level. |
Yes, but does it need to be an error if it's not a top-level definition? I guess it's the same situation, where you could have two definitions that each had this equivalent cycle inside it, and then the top level definitions' hashes would inherit the issue?
So then can we make it not an error?
I see, so instead of being unable to change the internal variable name, because you end up with a new hash, you could change it.
Ok |
|
@ChrisPenner Wdyt about And then if it it's practical to print the message exactly once, that'd be better than twice. |
|
@ChrisPenner We should probably have some kind of test that captures this behavior, even if we don't like the current behavior, or at least an easy to find issue about it. |
…ng them in the `errors` directory
…ent ordering error.
I don't know for sure, I'm not familiar with that part of hashing, but since we treat top-level definitions as basically just a let-rec I don't think I'm able to distinguish the two without essentially threading that info all the way through. I'm hesitant to spend much more time on this tbh, it's already soaked up like a full week that could've been spent on implementing the fix.
Yes, I think so, but not reliably, since it would only allow you to change it if it caused a re-ordering, which isn't always going to be the case.
I can't get the line number without specializing the whole algorithm to be I also can't tell whether we're inside another definition without rewriting things, which I'm hesitant to do because it may introduce other bugs, and would likely take a similar chunk of time to just implementing the fix. I know it seems like I'm not able to budge on a lot of things that make this error better, but that's mainly because I don't think it makes sense to invest the big chunk of work it'd take to do most of these things (which would later need to be undone) when we could instead invest that work in a proper fix. Or at the very least, I don't want to delay getting this validation in place. If someone uploads a problematic term to share that will generate possibly weeks of additional work. Even if we update the error message later I'd really like to get this merged as soon as possible to prevent such a situation. |
|
@ChrisPenner I think the info is already threaded through in the |
IIRC The less powerful ABT.Var was the one from a paper, and the bigger one was extra functionality that we added. |
@ChrisPenner It's cool. Let's just do something reasonable for tests, get CI passing, and merge it, and then create a nice ticket to track adding the tie-breaking algorithm. |
aryairani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assuming testing is reasonable and CI is passing
The actual I think maybe we had different expectations for the fix; I was hoping to just get a patch in to stop Share from getting infected to buy time to work on other fixes (whether that be a rewrite to the hashing algorithm to fix the problem, or a refactoring to get better error messages). Are you wanting to do something more substantial the first time round? I'd really like to get the Share part deployed ASAP so let's see where we can compromise to help us both be happy :) |
d4ce73d to
21fdf94
Compare
Was there some additional testing on top of the |
Not necessarily; when I wrote that, I didn't know what LSP test crash was exactly (4c8d17f), so I just meant if you had to remove some meaningful test to get it to pass, that we should replace it with some other test to fill the gap. But I think we're good! |
Overview
A temporary measure to prevent components with ambiguous element orderings from propagating through the ecosystem; particularly in Share.
More context on this issue: #2787
This doesn't fix the issue, but it does prevent the problematic situation caused by a single hash having multiple semantically different definitions; which would cause completely erroneous behaviour in certain cases.
I analyzed Share and no such definitions are in the ecosystem at the moment, this change should keep it that way.
@aryairani we'd talked about different approaches for implementing this without making large changes to the codebase;
you seemed hesitant to thread errors all the way through only to (hopefully) remove them later, so instead I inserted calls to
error, which should have the same effect but with a slightly worse user experience.Here's what happens if you write a component like this:
It prevents the user from adding/updating such a component, which accomplishes the goal, but in a bit of a scary way.
Implementation approach and notes
Interesting/controversial decisions
There's some nuance, we could choose allow components where all elements are identical, since those don't technically cause any problems, but it's somewhat annoying and difficult to check if this is the case for non-trivial components, and AFAIK there's never a good reason to write such a component since you could just recurse instead.
This also is the first introduction of the idea that hashing a component could fail; which is undesirable.
We plan to lift this restriction by introducing a complete ordering of component elements in the future, but it involves implementing a whole new component ordering algorithm. (see #2787 )
Test coverage
Loose ends
It would be nice to properly resolve this issue by implementing a complete element ordering algorithm.