Conversation
As long as the fields are public you don't need a getter to access them, and since there's no abstraction at all I removed them while troubleshooting. The one place we instantiate a SID we pass a long array, no need for varargs.
We encountered a OOM error while trying to turn this IRI into a SID: http://dbpedia.org/resource/Permian–Triassic_extinction_event (note the unicode em-dash instead of a hyphen) This reimplementation avoids an infinite loop. It also preserves a bug in the original impl wherein zero bytes are removed, regardless of whether they were padding or not. A fix for that will appear in a subsequent PR.
| this.nameCodes = nameCodes; | ||
| } | ||
|
|
||
| public int getNamespaceCode() { |
There was a problem hiding this comment.
Why did the getters need to be removed?
There was a problem hiding this comment.
I removed them during troubleshooting, I just saw no reason to add them back, they add no meaningful abstraction, and you can access public fields by their name just fine. Removing them also makes my Clojure-bound wariness about hiding data happier : )
There was a problem hiding this comment.
We need to add them back. This is a Java class, not Clojure. Those fields should not be public, and they should not be settable. This change makes them settable.
I don't think anything about the Java class needed to change to fix this bug.
There was a problem hiding this comment.
My understanding is that final indicates that the field cannot be mutated and is more than sufficient for safety, but I've gone ahead and reverted the changes.
There was a problem hiding this comment.
I think we should strive to write code as idiomatically as possible in whatever language we're writing in. I would not have written a clojure namespace hiding data behind a function, but that is the Java way, and this is a Java class.
There was a problem hiding this comment.
My bold take is that idiomatic java is in this case just wrong - getters are a bad design and the convention for using them comes from a time when final did not exist.
Especially since this is in performance sensitive code my thought is why waste a stack frame on something we don't need. Same thought with changing the signature from long... to long[], we only ever supply an array so no need to generate vararg handling. But in the end it's only a stack frame and I'm sure the JIT will handle both cases correctly anyways.
This reverts commit ed9bb63.
src/clj/fluree/db/util/bytes.cljc
Outdated
| n'')))))) | ||
| [l] | ||
| (->> [56 48 40 32 24 16 8 0] ;; byte offsets | ||
| (map (fn [i] (bit-and (bit-shift-right l i) 0xFF))) |
There was a problem hiding this comment.
This probably doesn't matter in practice, but I think we should keep the loop structure here instead of using map + remove. This code ideally should be as fast as possible because it's often called repeatedly when processing a query.
There's a lot of other things to optimize, and the functional implementation is still way faster than having to do an async query the way it was before, so I don't feel that strongly about changing it back.
There was a problem hiding this comment.
You raise a good point, I've rewritten it to be a loop. criterium/quick-bench pegs it as a little more than twice as fast now - ~300ns per long vs ~850ns with the seq version. I've also added a property test to make sure that it's working for inputs we didn't consider.
Using criterium/quick-bench to verify, this implementation is about twice as fast as the seq-based one. old way: Execution time mean : 863.869922 ns new way: Execution time mean : 318.420721 ns
6d3006d to
3ca2c42
Compare
We do not properly handle zero bytes - we indiscriminately remove them all. However, zero bytes are not valid in an IRI, so in practice this ok. The only zero bytes that we have are the ones we add ourselves for padding.
3ca2c42 to
0db07fc
Compare
This fixes an OutOfMemoryError that occurred when we exhaust the heap space while trying to turn the long array on a SID back into a string. The IRI in question in this case was is
http://dbpedia.org/resource/Permian%E2%80%93Triassic_extinction_event.I've started adding property tests to ensure that all UTF-8 strings can be roundtripped properly, but that has exposed a few more uncovered cases around how we deal with padding. In the interest of making this fix available quickly, this PR doesn't include those (failing) tests yet.