You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is a draft proposal for a generic hasher interface (#88) for this library.
How this Works
Adds PhfHasher trait
Adds generic PhfHasher parameter to maps/sets
Adds FmtConstPath trait to hook PhfHasher into phf_codegen
Uses quote::ToTokens trait to hook PhfHasher into phf_macros
Note on Point 1
Right now, the PhfHasher trait is only responsible for the initial PHF distribution (think SipHasher keys) and the actual hashing (computing Hashes object). The PHF generation algorithm is still the same as before, but if it makes more sense to use a separate generation algorithm for each hashing algorithm, then we should replace PhfHasher with something like the following which is a little simpler than what the current draft proposes:
pubtraitPhfHasher{/// PHF Generation ErrortypeError;/// Try to generate a perfect hash function for the list of `entries`.fntry_generate<T>(entries:&[T]) -> Result<HashState,Self::Error>whereT:PhfHash;/// Hashes the data at `x` returning the computed displacement parameters.fnhash<T>(&self,x:&T) -> HasheswhereT:PhfHash;}
where HashState and Hashes are the computed map/displacements and index respectively.
Also, in this PR, phf_generator is removed since it only contains the generation algorithm which should live closer to PhfHasher and eventually could be part of it.
Note on Point 4
Right now, the macro interface itself has not changed so it's not possible with this current draft to use custom hashers with macros. To enable this, we would need to have to extend the macro interface or have some sort of type inference to detect the hasher. It's still not clear to me how we should do this.
@JohnTitor
I think I'm interested in making macros that prioritize fast lookups for static data (set/map etc).
I'm not sure if I should look to start a separate project or try to add to phf.
Sharing some thoughts, @JohnTitor I'd appreciate your feedback when you get a chance (no rush!).
It seems like phf prioritizes space over speed, true to its name - not settling for less than perfect hashing.
The main use case seems to be having a compact-as-possible const map/set.
This PR (I've just barely skimmed it) aimed to add the ability to supply any hash function.
I would like to take it even further:
The ability to relax the minimal hashing requirement (i.e. be willing to produce a sparse non-minimal array, i.e. not a surjective mapping, not a phf).
This allows for very fast lookups using arbitrary hash functions that otherwise may never produce a perfect mapping, or take a very long time to do so.
The ability to supply a comparison function for the keys.
This can be used to squeeze that little bit of extra performance by supplying comparison function more optimal than the default for the key type by utilizing known properties of the set of keys.
For example, suppose the keys are strings that are known to have unique suffixes.
Do think this can/should be a feature of the phf crate?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is a draft proposal for a generic hasher interface (#88) for this library.
How this Works
PhfHashertraitPhfHasherparameter to maps/setsFmtConstPathtrait to hookPhfHasherintophf_codegenquote::ToTokenstrait to hookPhfHasherintophf_macrosNote on Point 1
Right now, the
PhfHashertrait is only responsible for the initial PHF distribution (think SipHasher keys) and the actual hashing (computingHashesobject). The PHF generation algorithm is still the same as before, but if it makes more sense to use a separate generation algorithm for each hashing algorithm, then we should replacePhfHasherwith something like the following which is a little simpler than what the current draft proposes:where
HashStateandHashesare the computed map/displacements and index respectively.Also, in this PR,
phf_generatoris removed since it only contains the generation algorithm which should live closer toPhfHasherand eventually could be part of it.Note on Point 4
Right now, the macro interface itself has not changed so it's not possible with this current draft to use custom hashers with macros. To enable this, we would need to have to extend the macro interface or have some sort of type inference to detect the hasher. It's still not clear to me how we should do this.