0-dependency ENSIP-15 in C#
- Reference Implementation: adraffy/ens-normalize.js
- Unicode: 17.0.0
- Spec Hash: 4febc8f5d285cbf80d2320fb0c1777ac25e378eb72910c34ec963d0a4e319c84
- Passes 100% ENSIP-15 Validation Tests
- Passes 100% Unicode Normalization Tests
- Space Efficient: ~58KB .dll using Inline Blobs via make.js
- Legacy Support: netstandard1.1, net35, netcoreapp3.1
- Nuget Repository:
using ADRaffy.ENSNormalize;
ENSNormalize.ENSIP15 // Main Library (global instance)

Primary API ENSIP15
// string -> string
// throws on invalid names
ENSNormalize.ENSIP15.Normalize("RaFFY🚴♂️.eTh"); // "raffy🚴♂.eth"
// works like Normalize()
ENSNormalize.ENSIP15.Beautify("1⃣2⃣.eth"); // "1️⃣2️⃣.eth"

Additional NormDetails (Experimental)
// works like Normalize(), throws on invalid names
// string -> NormDetails
NormDetails details = ENSNormalize.ENSIP15.NormalizeDetails("💩ì.a");
string Name; // normalized name
bool PossiblyConfusing; // if name should be carefully reviewed
HashSet<Group> Groups; // unique groups in name
HashSet<EmojiSequence> Emojis; // unique emoji in name
string GroupDescription = "Emoji+Latin"; // group summary for name
bool HasZWJEmoji; // if any emoji contain 200D

Output-based Tokenization Label
// string -> Label[]
// never throws
Label[] labels = ENSNormalize.ENSIP15.Split("💩Raffy.eth_");
// [
// Label {
// Input: [ 128169, 82, 97, 102, 102, 121 ],
// Tokens: [
// OutputToken { Codepoints: [ 128169 ], IsEmoji: true }
// OutputToken { Codepoints: [ 114, 97, 102, 102, 121 ] }
// ],
// Normalized: [ 128169, 114, 97, 102, 102, 121 ],
// Group: Group { Name: "Latin", ... }
// },
// Label {
// Input: [ 101, 116, 104, 95 ],
// Tokens: [
// OutputToken { Codepoints: [ 101, 116, 104, 95 ] }
// ],
// Error: NormException { Kind: "underscore allowed only at start" }
// }
// ]

- Group (ENSIP15.Groups: IList<Group>)
- EmojiSequence (ENSIP15.Emojis: IList<EmojiSequence>)
- Whole (ENSIP15.Wholes: IList<Whole>)
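
Because Split() never throws, its result can be inspected label by label. The following is a minimal sketch, assuming the ADRaffy.ENSNormalize package is referenced and that Label exposes the Error and Group members shown in the Split() output above; the SplitExample class is hypothetical and exists only for illustration.

```csharp
using System;
using ADRaffy.ENSNormalize;

// Hypothetical demo class, not part of the library.
static class SplitExample
{
    static void Main()
    {
        // Split() reports problems per label instead of throwing.
        Label[] labels = ENSNormalize.ENSIP15.Split("💩Raffy.eth_");
        foreach (Label label in labels)
        {
            if (label.Error != null)
            {
                // Assumption: Error is null for valid labels, as the dump above suggests.
                Console.WriteLine("invalid: " + label.Error.Kind);
            }
            else
            {
                // Assumption: Group is set whenever the label is valid.
                Console.WriteLine("valid, group: " + label.Group.Name);
            }
        }
    }
}
```

This pattern suits user interfaces that need to highlight the one failing label rather than reject the whole name.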
All errors are safe to print. NormException { Kind: string, Reason: string? } is the base exception. Functions that accept names as input wrap their exceptions in InvalidLabelException { Label: string, Error: NormException } for additional context.

- "disallowed character": DisallowedCharacterException { Codepoint }
- "illegal mixture": IllegalMixtureException { Codepoint, Group, OtherGroup? }
- "whole-script confusable": ConfusableException { Group, OtherGroup }
- "empty label"
- "duplicate non-spacing marks"
- "excessive non-spacing marks"
- "leading fenced"
- "adjacent fenced"
- "trailing fenced"
- "leading combining mark"
- "emoji + combining mark"
- "invalid label extension"
- "underscore allowed only at start"
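
A minimal sketch of catching these errors, assuming the ADRaffy.ENSNormalize package is referenced and that InvalidLabelException lives in the same namespace with the Label and Error members described above. The TryNormalize helper and the ErrorHandlingExample class are hypothetical, not part of the library.

```csharp
using System;
using ADRaffy.ENSNormalize;

// Hypothetical demo class, not part of the library.
static class ErrorHandlingExample
{
    // Hypothetical helper: returns false and a printable message instead of throwing.
    static bool TryNormalize(string name, out string result)
    {
        try
        {
            result = ENSNormalize.ENSIP15.Normalize(name);
            return true;
        }
        catch (InvalidLabelException e)
        {
            // e.Label identifies the offending label; e.Error.Kind is one of the kinds listed above.
            result = "invalid label \"" + e.Label + "\": " + e.Error.Kind;
            return false;
        }
    }

    static void Main()
    {
        foreach (string name in new[] { "RaFFY.eTh", "eth_" })
        {
            string result;
            bool ok = TryNormalize(name, out result);
            Console.WriteLine((ok ? "ok: " : "error: ") + result);
        }
    }
}
```

Catching InvalidLabelException rather than the base NormException keeps the label context available; code that needs the specific failure can inspect e.Error further.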
Normalize name fragments for substring search:
// string -> string
// only throws InvalidLabelException w/DisallowedCharacterException
ENSNormalize.ENSIP15.NormalizeFragment("AB--");
ENSNormalize.ENSIP15.NormalizeFragment("..\u0300");
ENSNormalize.ENSIP15.NormalizeFragment("\u03BF\u043E");
// note: Normalize() throws on these

Construct safe strings:
// int -> string
ENSNormalize.ENSIP15.SafeCodepoint(0x303); // "◌̃"
ENSNormalize.ENSIP15.SafeCodepoint(0xFE0F); // "{FE0F}"
// IList<int> -> string
ENSNormalize.ENSIP15.SafeImplode(new int[]{ 0x303, 0xFE0F }); // "◌̃{FE0F}"

Determine if a character shouldn't be printed directly:
// ReadOnlyIntSet (like IReadOnlySet<int>)
ENSNormalize.ENSIP15.ShouldEscape.Contains(0x202E); // RIGHT-TO-LEFT OVERRIDE => true

Determine if a character is a combining mark:
// ReadOnlyIntSet
ENSNormalize.ENSIP15.CombiningMarks.Contains(0x20E3); // COMBINING ENCLOSING KEYCAP => true

Unicode Normalization Forms NF
using ADRaffy.ENSNormalize;
// string -> string
ENSNormalize.NF.NFC("\x65\u0300"); // "\xE8"
ENSNormalize.NF.NFD("\xE8"); // "\x65\u0300"
// IEnumerable<int> -> List<int>
ENSNormalize.NF.NFC(new int[]{ 0x65, 0x300 }); // [0xE8]
ENSNormalize.NF.NFD(new int[]{ 0xE8 }); // [0x65, 0x300]