
Add German word list for BIP0039 #721

Closed
DavidMStraub wants to merge 4 commits into bitcoin:master from DavidMStraub:bip39de

Conversation

@DavidMStraub

This adds a German word list with the following properties:

  • Only ASCII letters (no äöüß, no dashes or apostrophes)
  • Only nouns in nominative case (no plurals, no genitives), i.e. all starting with an uppercase letter
  • At least 4, at most 6 letters
  • First 4 letters identify the word uniquely
  • No offensive words
  • Tried to avoid words with ambiguous spelling (Delphin, Delfin)
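
The mechanical properties in this list (ASCII letters only, length limits, noun capitalisation, unique four-letter prefix) lend themselves to an automated check. The following is a minimal sketch, not code from this PR: the function name `check_wordlist` and its default parameters are made up for illustration, and the criteria on offensive words and ambiguous spellings still need human review.

```python
import string
from collections import Counter

ASCII_LETTERS = set(string.ascii_letters)

def check_wordlist(words, min_len=4, max_len=6, prefix_len=4):
    """Return (word, problem) tuples for every violated property."""
    problems = []
    # Count how often each leading prefix occurs across the whole list.
    prefix_counts = Counter(w[:prefix_len] for w in words)
    for w in words:
        if not set(w) <= ASCII_LETTERS:
            problems.append((w, "non-ASCII or non-letter character"))
        if not (min_len <= len(w) <= max_len):
            problems.append((w, "length out of range"))
        # German nouns: first letter upper case, the rest lower case.
        if not (w[:1].isupper() and w[1:].islower()):
            problems.append((w, "not capitalised like a noun"))
        if prefix_counts[w[:prefix_len]] > 1:
            problems.append((w, "first letters not unique"))
    return problems
```

Called on a file with one word per line, an empty result means none of these mechanical rules is violated; it says nothing about how the words sound or what they mean.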

@luke-jr
Member

luke-jr commented Sep 7, 2018

@benpixel

What are the chances for this to get merged soon?

@rodasmith

Are 'Abfall' and 'Apfel' too phonetically similar? Would a blind user listening to a spoken seed be potentially confused by having both those words in the list?

@DavidMStraub
Author

@rodasmith, no, those two actually sound quite distinct (Upfall vs. Upfle would be a rough English transcription).

@rodasmith

The words 'Graf' and 'Graph' are homophones, so one should be replaced.

@rodasmith

Another pair of homophones: 'Miene' and 'Mine'.

@DavidMStraub
Author

Here are a few words I've thrown out when shortening to 2048 that can be put back in if more problems appear:

Abakus
Ahorn
Alpen
Antrag
Baby
Banjo
...

@rodasmith

I'm also not sure whether 'Faser' and 'Phase' are distinct enough.

@rodasmith

Are 'Spray' and 'Spree' homophones?

@rodasmith

How about 'Staat' and 'Stadt'? The vowel is longer in the former but I don't know whether that is a clear enough distinction.

@DavidMStraub
Author

Spray and Spree are totally different, the latter is pronounced with 'Sh'.

Staat and Stadt is a very long vs a very short vowel, hardly mistakable. Faser vs Phase might be problematic, throwing the latter out.

@rodasmith

Are 'Uran' and 'Urahn' homophones?

@DavidMStraub
Author

No, emphasis on 2nd vs 1st syllable.

@rodasmith

Ack (now that those pairs of homophones are removed)
I am not a native German speaker, so review by a native speaker would be welcome, but from my non-native perspective of German, this LGTM.

@dabura667

Is there a possibility that the user will forget that all words start with capital?

Does capitalizing the first letter have a grammatical meaning for conjugation in German?

I would say to go with whatever is in the dictionary. If nouns in the dictionary all start capitalized, then I think this is OK.

If a user has to remember case (I know some people who write in all upper case no matter what) then this is not a good idea I think.

@rodasmith

rodasmith commented Sep 17, 2018

German nouns are always capitalized, so that won't be a problem.

@pbengert

pbengert commented Oct 2, 2018

Abart <-> Abort
While being pronounced differently, they only differ in a vs o.
Could be a problem when you recreate the seed from handwriting

@pbengert

pbengert commented Oct 2, 2018

Some more comments from reading the list (thanks for compiling it!)
"Daum" -> I'm not sure what it means. Maybe use "Daumen"
"Epoxyd" -> maybe a misspelling, should be "Epoxid"
"Faun" -> I'm not sure what it means. Maybe use "Fauna"
"Fluor" <-> "Flur" - look and sound rather similar
"Kreis" <-> "Greis" - depending on local dialect, pronounced rather similarly
"Pose" <-> "Posse" - pronounced differently but maybe error-prone
"Spiels" -> I'm not sure what it means. Maybe use "Spiel"
"Neger" -> seems offensive
"Domina", "Heroin", "Kokain" -> might be offensive to some (I imagine explaining to my grandma what a seed is and why it is important and why bitcoin is important, and the seed starts with Domina Heroin Kokain ...)

@DavidMStraub
Author

Thanks @pbengert, you have some good points there, especially the offensive words I overlooked. I'll try to see how I can replace them.

Forbidding words that differ by a single letter but are pronounced very differently (Abart/Abort, Pose/Posse) seems a bit too restrictive to me...

@DavidMStraub
Author

@pbengert, I think I've removed all the problematic words and replaced them with innocuous ones.

@thomasklemm

@DavidMStraub Thanks for putting this list together, it feels quite good already. Since this feels like a major decision for the ecosystem and probably can't be changed easily, I want to point out a few things:

  • To me the words here feel in general much more similar to others on the list than in the English BIP39 dictionary (e.g. Doge, Dogge; Wulst, Wurst, Wust; Sold, Sole, Solo; Warze, Wanze; Zelle, Cell). Technically speaking my guess would be that the Levenshtein distance between many words here is much less than in the English list, though I haven't looked deeper into that. Does someone have a tool against which BIP39 lists can be checked?
  • Some of the words feel a bit hard to spell. Even as a native German speaker I wouldn't have gotten some of them right, since there are multiple correct German spellings for some words (e.g. Aceton can also be Azeton, Hachse can also be Haxe depending on where in Germany you live)
  • Some words have a negative connotation (e.g. Amok, Sexist, Eunuch, Geck, Geifer, Voyeur, Zicke)
  • Some words feel quite "sciency" and might not be in the active vocabulary of many Germans (e.g. Spin, Binom, Axiom, Anion, Boson)
  • Some outliers that have an unclear meaning or feel just very uncommon: Obrist (had to Google that), Lyzeum (outdated), Odem, Sept (unclear, short for September?)

I'll make some concrete suggestions tomorrow, feel like we could bring in some words like city names (Berlin, Kiel, Halle) or common first names instead of the mentioned ones above.

What's the reasoning behind limiting to 6 characters? Couldn't we diversify the words quite a bit if 7 or 8 characters were allowed like with the English list?

@DavidMStraub
Author

@thomasklemm, you find the words "Obrist", "Odem" and "Sept" (https://de.wikipedia.org/wiki/Sept) too uncommon, want to exclude scientific terms, but ask for a larger Levenshtein distance. And in the end we should still have 2048 words without any Umlaute or ß (this is a significant constraint). I don't think this is realistic, but in any case it would mean starting from scratch. If you are serious about this, I guess it's better to close this PR and make a new one.

By the way, I don't have the impression the English wordlist is very different qualitatively.

  • arm - armed
  • auto - autumn
  • awake - away
  • card - cart
  • clock - clog
  • define - defy

and so on.

@DavidMStraub
Author

To make this more quantitative, this notebook proves that the distribution of pairwise Levenshtein distances is not worse than for the English word list.
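
The notebook itself is not reproduced here. As a rough sketch of how such a comparison could be made, the code below builds a histogram of pairwise Levenshtein distances for a word list; running it on two lists and comparing the counts at small distances gives the kind of comparison described. The helper names are hypothetical, and comparing case-insensitively is an assumption.

```python
from collections import Counter
from itertools import combinations

def levenshtein(a, b):
    """Edit distance with insertion, deletion and substitution (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]

def distance_histogram(words):
    """Map each distance to the number of word pairs at that distance."""
    return Counter(levenshtein(a.lower(), b.lower())
                   for a, b in combinations(words, 2))
```

A list of 2048 words has about two million pairs, so this pure-Python version takes a while; a compiled Levenshtein library would be the usual choice for repeated runs.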

@majuric

majuric commented Nov 5, 2018

@DavidMStraub Thanks for creating this PR. Unfortunately I can't really contribute to the discussion, but I was wondering whether there are any indications that this might get merged any time soon?

@DavidMStraub
Author

@majuric that's a good question. At this point I find it frustrating that nothing happens even though I addressed many suggestions.

@SebastianFloKa

Many thanks to @DavidMStraub for initiating this. I fully agree with his intention to achieve a BIP0039 German wordlist with a low number of letters per word. One will see the advantage when engraving the mnemonic into stone, steel, etc. - it would save many people quite a lot of time.
Actually I was not aware of the ongoing discussion here and had already tried to create a list with a maximum of 5 letters per word. Depending on the underlying criteria, this wordlist was not too bad but had room for improvement.
Concerning the wordlist proposed by David, I agree with @thomasklemm that there is also room for improvement, for example regarding the choice of some words. On top of that, there is the conflict with the Spanish wordlist, where it is still not clear whether avoiding it is a “must-have” or a “nice-to-have” criterion.
If you don’t mind, I will try to structure this topic (list of criteria) and come up with a revised proposal in parallel (~CW49).

@SebastianFloKa

Greetings,
here is a first proposal to structure the requirements. If you see anything missing or wrong, please let me know.
20181219 - BIP0039 German Wordlist - SOR.pdf

As @DavidMStraub already mentioned, it’s indeed not possible to create a reasonable BIP0039 German Wordlist out of nouns with maximum 6 letters and a minimum levenshtein distance of 2 (TWO) plus all the other criteria at the same time.
There are obviously two main possibilities for a next step / evaluation:

  1. Scenario 1: keep the intended maximum of six letters per word and allow some of following criteria:
    a. acceptance of more types of words than nouns (verbs, adjectives, ?)
    b. acceptance of a Levenshtein distance of ONE “addition / subtraction” (Example: TIER & STIER)
    c. acceptance of a Levenshtein distance of ONE “substitution” of non-similar sounding letters (Example: ERDE & ERLE allowed / ERDE & ERBE not allowed).
    --> b & c could be limited, for example, to a certain number of Levenshtein collisions, e.g. a maximum of one collision (Example: MAUS & LAUS & HAUS not allowed / MAUS & LAUS allowed)
    --> and/or requiring different grammatical gender of words (Example: MAUS (feminine) & LAUS (feminine) not allowed / MAUS (feminine) & HAUS (neuter) allowed). For verbal transmission the different article could help to identify the correct word. Example on the phone: "MAUS wie die Maus in Tom & Jerry" (MAUS as in the mouse in Tom & Jerry) / "HAUS wie das Haus, in dem ich wohne" (HAUS as in the house I live in)
  2. Scenario 2: Increase the number of accepted letters to 7 or even to 8 and allow only words with minimum levenshtein distance of TWO or more.

Other BIP0039 wordlists solved this quite differently: most allow verbs and adjectives, some allow a Levenshtein distance of ONE --> see attached overview.
So there are two main questions to be answered now:
A) Do you see any concerns about using verbs and adjectives? If we write the complete word in capital letters (which I highly recommend) I don’t see a roadblock. Do you?
B) If you don’t mind, I will continue to look for a reasonable solution within Scenario 1, or do you see a minimum Levenshtein distance of TWO as a must-have in any case?

@DonaldTsang DonaldTsang mentioned this pull request Dec 24, 2018
22 tasks
@SebastianFloKa

In order to proceed, attached is a wordlist for the above-mentioned Scenario 1 (keep max. 6 letters per word), based on the following compromises:

  • Adjectives supplementary to nouns.
  • Due to high similarity of many verbs up to 6 letters to their corresponding nouns (LAUFEN & LAUF, LIEBEN & LIEBE) there’s no big advantage expected: Verbs were therefore NOT taken into consideration.
  • Levenshtein distance (LS) “Substitution” of 1 allowed, but not between two words that differ only in similar letters “B&P”, “D&T”, “G&K”, “S&Z”, “M&N” (GRAD & GRAT).
  • LS “Substitution” of 1 allowed but maximum 2 collisions allowed (= max. 3 affected words).
  • LS “Addition” of 1 allowed but not if double letter causing the creation of a new word (example: OFFEN & OFEN, ROBBE & ROBE, ROGGEN & ROGEN) or if an “H” is involved (RUHM & RUM).
  • LS “Permutation” of 1 was possible to be excluded completely (OSLO & SOLO, FORSCH & FROSCH).
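
To make the compromises above concrete, here is a minimal sketch that classifies a pair of all-caps words that are one edit apart into the categories named in the list. It is an illustration only, not the tool used for the proposal: the function name `classify_pair` and the category labels are invented, and the limit on the number of collisions per word is not covered.

```python
# Letter pairs treated as too similar for a one-letter substitution.
SIMILAR = {frozenset(p) for p in ("BP", "DT", "GK", "SZ", "MN")}

def classify_pair(a, b):
    """Classify two upper-case words that are one edit apart, else return None."""
    if a == b:
        return None
    if len(a) == len(b):
        diff = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diff) == 1:
            i = diff[0]
            similar = frozenset((a[i], b[i])) in SIMILAR
            return "substitution-similar" if similar else "substitution"
        # Two neighbouring letters swapped, e.g. FORSCH & FROSCH.
        if (len(diff) == 2 and diff[1] == diff[0] + 1
                and a[diff[0]] == b[diff[1]] and a[diff[1]] == b[diff[0]]):
            return "permutation"
        return None
    short, long_ = sorted((a, b), key=len)
    if len(long_) - len(short) != 1:
        return None
    for i in range(len(long_)):
        if long_[:i] + long_[i + 1:] == short:
            # The extra letter repeats one of its neighbours, e.g. OFFEN & OFEN.
            doubled = ((i > 0 and long_[i] == long_[i - 1])
                       or (i + 1 < len(long_) and long_[i] == long_[i + 1]))
            if doubled:
                return "addition-double-letter"
            return "addition-h" if long_[i] == "H" else "addition"
    return None
```

Under the compromises listed, pairs classified as "substitution-similar", "addition-double-letter", "addition-h" or "permutation" would be rejected, while plain "substitution" and "addition" pairs would be tolerated within the stated collision limit.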

I assume it’s easier for everybody to look into a PDF-file (at least for the moment), so here’s the Proposal-2019-01-15 and the related overview of criteria (SOR):
BIP0039 German Wordlist - Proposal-2019-01-15.pdf
BIP0039 German Wordlist - SOR-2019-01-15.pdf

This is (at least close to) the best compromise between the max. length of 6 letters per word and the non-similarity of words (better than the BIP0039 English WL, not as good as the BIP0039 French WL) + all the other criteria. By the way: the average word length would currently be only 5.1 letters, compared with 5.4 for the English WL and 6.8 for the French WL --> so not bad.
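
The average word length quoted here is simply the mean number of letters per word. A small sketch for reproducing such a figure from any word list (the function name is made up for illustration):

```python
def average_length(words):
    """Mean number of letters per word."""
    return sum(len(w) for w in words) / len(words)

print(average_length(["TIER", "STIER"]))  # 4.5
```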
@thomasklemm, @DavidMStraub and others:

  • What’s your opinion on this Proposal? Would you agree to above described compromise?
  • If yes, do you still see any “NO-GO-words” (embarrassing / far too unknown / too similar)? Very minor modifications might be possible, for major changes we will have to increase the accepted length per word.

@DavidMStraub
Author

Thanks @SebastianFloKa for this impressive proposal. I will have a detailed look at the wordlist and let you know in case I notice any problems (I don't expect to). I agree that scenario 1 makes a lot of sense.

On a purely aesthetic note, I don't quite understand why you want to have it all-caps. IMO that makes it much less readable (and quite ugly). I don't see any ambiguities when using the proper capitalization as there are no words that differ only by capitalization.

Good work!

@SebastianFloKa

So looks like we all agree that writing nouns completely in lowercase is a no-go for Germans, so this should be out of scope.

Majuscule font (= all caps, = uppercase) in a text is uncomfortable to read - I fully agree with @DavidMStraub. But for standalone information with outstanding character (importance) and need to be associated as such, it's quite common to use all-caps in German. For example the ID-card, passport and driving license of Germany and Austria: All relevant information (names, street, colour of eyes, authority, nationality, etc.) are written in uppercase. Many official documents (tax documents, opening a bank account, contracts etc.) require to be filled out in all-caps as well. This is mainly to avoid confusion.

If somebody stores his seed on an electronic device (wallet), the type of character doesn’t matter. But for people creating a physical backup of their seed, for example with beat letters (= Schlagbuchstaben in German) into steel plates or using prefabricated letters etc., it matters a lot. Allowing a mixture of lowercase AND uppercase would require a separate set of beat letters (respectively prefabricated letters) and cause a more confusing and time consuming procedure to manually create the Backup. As the intention of a BIP0039 Wordlist is to provide an easy to handle solution to users, the amount of potential characters should be kept as low as possible - which would be fulfilled by using all-caps only.

Then the issue with same words as noun or adjective or verb, and there are quite some of them: Husten/husten, Klasse/klasse, Nutzen/nutzen, Rennen/rennen, Wissen/wissen etc.. There will always be uncertainties about the correct writing if a mixture between lowercase and uppercase is allowed, particularly if words are transmitted verbally. If only uppercase characters are set, it doesn’t matter which meaning somebody connects to a word (such as homonyms), the orthography will always be predefined and therefore unambiguous.

If restoring of a partly unreadable seed is necessary, it could be helpful as well. If there will exist many different wordlists in the future (a tendency we can already see today), it would help to distinguish the BIP0039 German Wordlist from others. Software tools (wallets) could take advantage of this difference and provide more relevant word proposals to the user and request for uppercase letters (or exchange automatically) if German is pre-set as language.
To be fair, the last point is only a minor advantage (or even a neutral point) and not an argument to go for majuscule (uppercase) for the BIP0039 German Wordlist, but IMO the others are.

@SebastianFloKa

Thinking about the pros and cons of limiting words to 6 letters, it might make sense to extend this to at least 7 or maybe even 8 letters per word. I think it's worth it.
I could work on a new proposal around August this year...

@neox5

neox5 commented Jun 2, 2020

What's the current state of this issue? I would offer my help

@DavidMStraub
Author

As far as I'm concerned, I've abandoned this. It's really a weird PR. First, I get a few useful comments, make a few improvements. Then, I get a list of requests from @thomasklemm, some of which I proved even with a Jupyter notebook are not realistic/reasonable. No reply. Half a year later, @SebastianFloKa comes with a totally different suggestion, but does not actually submit a PR.

@FaustmannChr, thanks for offering your help, but if I were you I wouldn't invest time unless one of the maintainers confirms that this actually has a chance of getting merged.

@neox5

neox5 commented Jun 2, 2020

Yeah @DavidMStraub, I think you are right...

And many thanks for the summary and the fast response!


@SebastianFloKa SebastianFloKa left a comment


Start 3rd approach: levenshtein distances & other main requirements included

Comment on lines +1 to +2048
Abart
Abbau
Abbild
Abend
Abfall
Abflug
Abgas
Abgott
Abguss
Abhang
Abitur
Abkehr
Ablage
Abluft
Abort
Abraum
Abrede
Abrieb
Abruf
Absatz
Absud
Abtei
Abwahl
Abweg
Abwind
Abwurf
Abzug
Aceton
Achse
Ader
Affe
Afrika
Agave
Agent
Agonie
Ahnung
Akazie
Akkord
Akku
Akne
Akte
Aktie
Aktor
Aktuar
Akzent
Alarm
Alben
Albino
Album
Alge
Alibi
Alkali
Allee
Alltag
Altar
Altbau
Amboss
Ameise
Amme
Amok
Ampel
Amsel
Ananas
Anbau
Anden
Anfall
Anflug
Angabe
Angina
Angler
Anhalt
Anhieb
Anion
Anis
Anlage
Anmut
Anode
Anorak
Anreiz
Anruf
Ansatz
Ansitz
Anteil
Antifa
Antje
Anzahl
Anzug
Apfel
Appell
Apsis
Araber
Arbeit
Arche
Areal
Arena
Arie
Arktis
Armada
Armee
Armut
Aroma
Arrest
Arsen
Artist
Arznei
Arzt
Asbest
Asche
Asiat
Asien
Asket
Aspekt
Asyl
Atem
Athlet
Atmung
Atoll
Atom
Attest
Attika
Aufbau
Aufruf
Aufzug
Auge
Aula
Aura
Ausbau
Aushub
Ausruf
Auster
Ausweg
Auszug
Autist
Auto
Axiom
Bahre
Baiser
Bake
Balsam
Balte
Bammel
Banane
Bande
Bank
Bann
Barbar
Barde
Barium
Baron
Bart
Basar
Base
Baske
Bast
Batik
Batzen
Bauamt
Bauch
Bauxit
Beamer
Becher
Beere
Beet
Befehl
Befund
Behang
Behelf
Beil
Bein
Beirat
Belag
Beleg
Bengel
Beruf
Besatz
Besen
Besitz
Bestie
Besuch
Betrag
Bett
Beule
Beweis
Bezirk
Bezug
Bibel
Biene
Bier
Biest
Bieter
Bikini
Bilanz
Bild
Binder
Binom
Binse
Biotop
Birke
Birne
Bistum
Biwak
Bizeps
Blatt
Blech
Blei
Blitz
Blume
Bluse
Blut
Bock
Boden
Bohle
Bohne
Boiler
Boje
Bolzen
Bombe
Bonbon
Bonmot
Bonus
Bonze
Boogie
Boot
Bord
Borke
Borste
Boson
Bote
Botin
Bowle
Boxer
Brand
Brauer
Bravur
Brei
Brezel
Brille
Brise
Brite
Brokat
Bronze
Brot
Bruch
Bruder
Brust
Bube
Buch
Bude
Bukett
Bund
Burg
Burka
Busen
Cabrio
Camp
Casino
Celle
Chaos
Chef
Chemie
Chip
Chlor
Chor
Chrom
Dach
Dackel
Dame
Damm
Dampf
Darm
Dasein
Datei
Dativ
Dattel
Datum
Dauer
Daum
Daune
Degen
Deich
Dekan
Dekor
Dekret
Delikt
Demenz
Demut
Denker
Despot
Detail
Deut
Devise
Diadem
Diakon
Dialog
Dieb
Diele
Diener
Diktat
Dill
Dimmer
Ding
Dinkel
Diode
Dioxid
Dipol
Diskus
Disput
Distel
Diva
Diwan
Docht
Dock
Doge
Dogge
Dogma
Dohle
Doktor
Doku
Dolch
Domina
Doping
Doppel
Dorf
Dorn
Dorsch
Dose
Dosis
Dotter
Dozent
Drache
Dragee
Draht
Drama
Drange
Dreck
Dreher
Drift
Droge
Drohne
Druide
Duell
Duett
Duft
Dunst
Durst
Dusel
Dynamo
Ebbe
Eber
Echo
Eder
Efeu
Egel
Egge
Egoist
Ehrung
Eibe
Eichel
Eigelb
Eigner
Eiland
Eilgut
Eilzug
Eimer
Einbau
Einehe
Einrad
Einsen
Einzel
Eisen
Eiter
Eklat
Ekzem
Elan
Elch
Elegie
Elfe
Elite
Elle
Elster
Emanze
Emblem
Embryo
Emirat
Empore
Endung
Engel
Enkel
Ente
Entzug
Enzian
Enzym
Epilog
Epoche
Epos
Epoxyd
Erbgut
Erbin
Erbse
Erdgas
Erdung
Eremit
Erfolg
Erguss
Erhalt
Erker
Erlass
Erle
Ersatz
Erwerb
Esche
Esel
Esser
Essig
Etage
Etappe
Ethik
Ethnie
Etui
Eule
Eunuch
Euro
Euter
Examen
Exil
Exkurs
Exot
Expo
Exzess
Fabel
Fabrik
Fahne
Fahrer
Fakt
Falke
Fall
Falter
Falz
Fang
Farbe
Farce
Farn
Fasan
Faser
Fass
Faun
Fazit
Fehde
Fehler
Feld
Felge
Fell
Fels
Ferien
Ferkel
Ferse
Fetzen
Fiasko
Fibel
Fieber
Figur
Filet
Filius
Film
Fimmel
Finanz
Finder
Finger
Fink
Finne
Finte
Firma
First
Fisch
Fiskus
Fistel
Fjord
Flachs
Fladen
Flair
Flak
Flanke
Flaum
Fleck
Flegel
Fliese
Flinte
Flirt
Flop
Flosse
Fluch
Flug
Fluid
Fluor
Flur
Fluss
Flut
Flyer
Fohlen
Folie
Fond
Foren
Form
Forum
Foto
Foul
Foyer
Frack
Fratze
Frau
Freak
Freske
Frevel
Friede
Frist
Fron
Frucht
Frust
Fuge
Fuhre
Fund
Funk
Furche
Furie
Furore
Furt
Fusel
Fusion
Futur
Gabe
Gala
Galgen
Galle
Galopp
Gang
Ganove
Gans
Garage
Garbe
Garde
Garn
Garten
Gasse
Gast
Gatte
Gaucho
Gauda
Gaumen
Gauner
Gaze
Geber
Gebiet
Gebote
Gebung
Geck
Gedeck
Geduld
Gefahr
Gegend
Gegner
Gehabe
Gehege
Gehirn
Gehupe
Gehweg
Geier
Geifer
Geist
Geiz
Gelage
Geld
Gemach
Genf
Genie
Genom
Genre
Gerede
Gerste
Gerte
Geruch
Gesang
Gesetz
Geste
Gesuch
Getue
Gewalt
Gewebe
Gewinn
Geysir
Gicht
Giebel
Gift
Gigant
Gilde
Ginko
Gipfel
Gips
Giro
Gitter
Glanz
Glas
Glatze
Gleis
Glied
Globus
Glocke
Glosse
Glut
Glykol
Gnade
Gnom
Gnosis
Gockel
Gold
Golf
Gondel
Gong
Gosse
Gote
Gotha
Gotik
Gott
Grab
Grad
Graf
Gral
Gramm
Granat
Graph
Gras
Grat
Graus
Gravur
Graz
Greis
Grete
Griffe
Grips
Grog
Groll
Gros
Grotte
Grube
Gruft
Grund
Gruppe
Gulden
Gully
Gummi
Gunst
Gurke
Gurt
Guru
Guss
Gusto
Haar
Hachse
Hacker
Hafen
Haft
Hagel
Hain
Halde
Halm
Hammel
Hanau
Hand
Hanf
Hang
Hantel
Harem
Harfe
Harn
Harz
Hase
Hass
Haube
Hauer
Haupt
Heber
Hebung
Hecht
Heck
Heer
Hefe
Heft
Hehl
Heide
Heirat
Heizer
Hektar
Held
Helfer
Helium
Helm
Hemd
Hengst
Henkel
Henne
Herd
Hering
Heroin
Herr
Herta
Herz
Hetzer
Heus
Hexer
Hieb
Hilde
Hilfe
Himmel
Hinz
Hippie
Hirn
Hirse
Hirte
Hitze
Hiwi
Hobby
Hocker
Hofrat
Hoftor
Hofweg
Hoheit
Hohn
Holger
Holm
Honig
Hopfen
Horde
Hormon
Horror
Hort
Hose
Hospiz
Hostie
Hotdog
Hotel
Hufe
Huhn
Humbug
Humor
Humus
Hund
Hunne
Husar
Husum
Hybris
Hymne
Iberer
Idee
Idiot
Idol
Idyll
Ikarus
Ikone
Iltis
Imam
Imbiss
Imker
Import
Impuls
Inbus
Inder
Indio
Infekt
Info
Ingwer
Inhalt
Inka
Inland
Insel
Inzest
Irak
Irin
Irland
Ironie
Irrtum
Irrung
Irrweg
Isar
Isotop
Jagd
Jahr
Januar
Jargon
Jauche
Jawort
Jazz
Jemen
Jena
Jens
Jeside
Joch
Jodkur
Jodler
Joker
Judo
Jugend
Juli
Jumbo
Juni
Junker
Jura
Jurist
Juror
Jury
Justiz
Jute
Juwel
Kabel
Kabine
Kachel
Kader
Kadi
Kaff
Kahn
Kairo
Kajak
Kakao
Kaktee
Kalk
Kamel
Kamin
Kamm
Kampf
Kanal
Kanne
Kanon
Kanu
Kanzel
Kaplan
Kapuze
Karat
Karte
Kasko
Kasse
Kasten
Kasus
Kater
Kattun
Kauz
Kaviar
Kegler
Kehle
Keil
Keim
Keks
Kelch
Kelle
Kenia
Kenner
Kerbel
Kerker
Kerl
Kerze
Ketzer
Keule
Kibbuz
Kicker
Kiefer
Kiemen
Kies
Killer
Kilo
Kind
Kinn
Kino
Kiosk
Kipper
Kirmes
Kissen
Kiste
Kita
Kitt
Kleber
Klerus
Klette
Kleve
Klima
Klinik
Klippe
Klops
Klos
Klotz
Kluft
Knabe
Knacks
Knast
Knauf
Knecht
Kneipe
Kniffe
Knigge
Knilch
Knirps
Knolle
Knospe
Kobold
Koffer
Koje
Kojote
Kokain
Kokon
Koks
Kolben
Kolik
Koloss
Koma
Kombi
Komet
Komik
Komma
Konsul
Konto
Konvoi
Konzil
Kopf
Kopie
Korb
Kordel
Kork
Korn
Korse
Kosak
Kosmos
Kosovo
Kost
Krabbe
Krach
Kragen
Krake
Kram
Kran
Krater
Kraut
Kredit
Kreis
Kreml
Kreole
Krepp
Kresse
Kreta
Kreuz
Krim
Kripo
Krise
Kritik
Kroate
Krone
Kropf
Krug
Krume
Kruste
Krux
Kuba
Kubus
Kuchen
Kufe
Kuhle
Kulanz
Kuli
Kult
Kummer
Kumpan
Kunde
Kunst
Kunz
Kupfer
Kuppe
Kuramt
Kurbad
Kurde
Kurie
Kurort
Kurs
Kuss
Kutte
Labor
Lachs
Lack
Lader
Ladung
Lady
Lage
Lagune
Laib
Laie
Lakai
Laken
Lama
Lamm
Lampe
Lanze
Lappen
Laptop
Larve
Lasso
Laster
Latex
Latte
Latz
Laub
Lauch
Laune
Laus
Lava
Lawine
Leber
Leck
Leder
Legat
Legion
Leguan
Lehm
Lehrer
Leib
Leier
Leim
Leine
Leiter
Lektor
Lemma
Lende
Lenker
Lenz
Lepra
Lerche
Lesart
Lesbe
Leser
Lesung
Lette
Leute
Libero
Libido
Libyen
Lied
Lift
Liga
Lilie
Limes
Limit
Limo
Lindau
Lineal
Linie
Link
Linse
Linz
Lippe
List
Liter
Litze
Lizenz
Lobby
Loch
Loggia
Logik
Logo
Lohn
Losung
Lothar
Lotion
Lotte
Luchs
Luft
Lugano
Lumen
Lumpen
Lunge
Lunte
Lust
Lutz
Luxus
Luzern
Lydien
Lyrik
Lyzeum
Macho
Macke
Made
Magd
Magen
Magie
Magma
Magnet
Mahl
Mahner
Mail
Main
Mais
Makel
Makler
Makro
Maler
Malz
Mama
Mandat
Manege
Mangan
Manie
Manko
Mantel
Manual
Mappe
Marder
Marmor
Marter
Masche
Masern
Maske
Masse
Mast
Mathe
Matrix
Matsch
Maul
Maurer
Maus
Maut
Meer
Mehl
Meile
Meise
Mekka
Melder
Melone
Memel
Memo
Mentor
Meran
Merkel
Mesner
Messer
Metall
Meter
Methan
Metier
Metro
Metz
Meute
Mexiko
Mieder
Mief
Miene
Mieter
Mieze
Mikado
Milbe
Milch
Miliz
Milz
Mime
Mimik
Mimose
Minden
Mine
Mini
Minute
Misere
Mist
Mitte
Mixer
Mixtur
Mode
Modul
Mofa
Mogul
Mohn
Mohr
Mokka
Moldau
Mole
Moment
Monat
Mond
Monika
Monsun
Moos
Moped
Mops
Moral
Morbus
Mord
Mosaik
Mosel
Moskau
Most
Motel
Motiv
Motor
Motte
Muffe
Mulde
Mumie
Mumm
Mumps
Mund
Murks
Musik
Muskat
Mutant
Mutti
Myrrhe
Mystik
Mythen
Nabe
Nacht
Nacken
Nadel
Nager
Nahost
Name
Napf
Narbe
Narr
Nase
Nato
Natron
Natter
Natur
Neapel
Nebel
Neckar
Neffe
Neger
Nehmer
Neid
Nektar
Nelke
Nenner
Neodym
Neon
Nepp
Neptun
Nerv
Nerz
Nest
Netz
Neubau
Neuron
Neutra
Nichte
Nickel
Niere
Nils
Niltal
Nimbus
Nische
Nitrat
Niveau
Nixe
Nizza
Nocken
Nomade
Nomen
Nonne
Nord
Norm
Note
Notiz
Notruf
Nougat
Novum
Nuance
Nudel
Nugat
Nullen
Nummer
Nuss
Nutzer
Nylon
Nymphe
Oase
Obacht
Obdach
Obfrau
Obhut
Objekt
Oblate
Obmann
Oboe
Oboist
Obrist
Obst
Odem
Ofen
Oheim
Okular
Olymp
Omen
Onkel
Oper
Opium
Optik
Opus
Orakel
Orbit
Orden
Ordner
Organ
Orgel
Orgie
Orkan
Ortung
Osmane
Osmose
Osten
Ostsee
Otter
Ozean
Ozon
Pacht
Packer
Paket
Pakt
Palais
Palme
Panik
Panne
Panzer
Papa
Pappel
Papst
Parade
Pardon
Parfum
Parole
Passau
Pasta
Patron
Patzer
Pauker
Pech
Pedal
Pegel
Pein
Pelz
Pensum
Pest
Pfad
Pfahl
Pfalz
Pfand
Pfau
Pfeil
Pferd
Pflock
Pflug
Pforte
Pfote
Pfuhl
Pfund
Pfusch
Pharao
Phase
Phobie
Phrase
Physik
Pickel
Pieper
Pille
Pilot
Pils
Pilz
Pinie
Pirat
Pirsch
Piste
Pixel
Pizza
Plakat
Plasma
Plenum
Pneu
Pocken
Podest
Podien
Poesie
Poet
Pogrom
Pointe
Pokal
Polin
Polle
Polung
Polyp
Pomade
Pommes
Pomp
Poncho
Ponton
Pony
Popanz
Pore
Pose
Posse
Potenz
Pracht
Prag
Pranke
Pratze
Prinz
Prolog
Promi
Prosa
Proton
Prunk
Psalm
Pudel
Pulk
Pulle
Puls
Pult
Pulver
Puma
Punsch
Puppe
Purist
Purpur
Pute
Putzer
Puzzle
Pyjama
Pylon
Quader
Qual
Quant
Quark
Quasar
Quere
Quint
Quirl
Quitte
Quiz
Rabe
Rache
Radar
Radio
Radler
Radon
Ragout
Rahm
Rakete
Ralf
Rallye
Rampe
Ramsch
Ranzen
Rappe
Raps
Rasse
Raster
Ratte
Raub
Rauch
Raudi
Raum
Raupe
Rausch
Raver
Razzia
Realo
Rebe
Redner
Reeder
Reflex
Reform
Regal
Reggae
Regie
Regler
Regung
Reigen
Reim
Reis
Reiter
Rekord
Rekrut
Rektor
Relais
Relief
Renate
Renner
Rente
Replik
Report
Reptil
Reset
Rest
Retter
Reue
Revier
Revue
Rezept
Rhein
Rheuma
Rhodos
Riege
Riemen
Ries
Riff
Rille
Rind
Ring
Rippe
Risiko
Risse
Riten
Ritual
Ritzel
Rivale
Robe
Rodler
Rodung
Roggen
Rohr
Rolf
Rosine
Rost
Rotor
Route
Rowdy
Ruanda
Rubel
Rubrik
Ruck
Rudel
Rudi
Rugby
Ruhm
Ruin
Rummel
Rumpf
Rums
Rune
Russe
Rute
Saal
Saat
Sabbat
Sache
Sack
Sadist
Safari
Safe
Safran
Saft
Saga
Saison
Saite
Salat
Salbei
Saldo
Salon
Salto
Salve
Salz
Samba
Saphir
Sarde
Sarg
Sarkom
Sascha
Satin
Satz
Sauger
Saum
Sauna
Schaf
Scheck
Schlot
Schnee
Schrei
Schub
Schwan
Seebad
Seele
Seeweg
Segen
Segler
Seher
Seide
Seite
Sekret
Sekt
Selfie
Semit
Semmel
Senat
Sender
Senf
Senkel
Senner
Sensor
Sepp
Sept
Serbe
Serie
Serum
Sesam
Sessel
Setzer
Seuche
Sexist
Sexte
Shrimp
Sicht
Sieb
Sieder
Sieg
Siesta
Siff
Signal
Sigrid
Silbe
Silke
Silo
Single
Sinn
Sinus
Siphon
Sippe
Sirene
Sirup
Sitte
Sitz
Skala
Skat
Sketch
Skizze
Sklave
Skript
Slalom
Slawe
Smog
Snob
Socke
Soda
Sofa
Sohle
Sohn
Soja
Sold
Sole
Solo
Sommer
Sonate
Sonde
Sonett
Sopran
Sorbe
Sorte
Sozius
Spagat
Spalt
Span
Sparer
Spatz
Speck
Spesen
Spezi
Spiels
Spin
Spion
Spital
Spore
Spray
Spree
Sprit
Spruch
Spuk
Spur
Staat
Stab
Stadt
Stall
Stamm
Star
Stasi
Statik
Stau
Steak
Steg
Steher
Stiel
Stift
Stigma
Stil
Stirn
Stock
Stoff
Stola
Strahl
Strick
Stroh
Stube
Stuck
Studie
Stuhl
Stulle
Stunk
Sturm
Stuss
Stute
Sucher
Suffix
Suite
Suizid
Sulfat
Sultan
Sumpf
Sunnit
Suppe
Surfer
Sushi
Swing
Sylt
Symbol
Synode
Syntax
Syrer
Syrien
System
Szene
Tabak
Tadler
Tagbau
Tagung
Taille
Takt
Taler
Talg
Tanne
Tante
Tarif
Tasche
Tasse
Taster
Tatort
Tatze
Taunus
Tausch
Taxi
Team
Techno
Teeei
Teer
Tees
Tegel
Teich
Teig
Teiler
Teint
Telex
Tempo
Tenne
Tenor
Tensor
Term
Terror
Terz
Test
Teufel
Text
Theke
Thema
Tick
Tiegel
Tier
Tiger
Tilde
Tinte
Tipp
Tirade
Tirol
Tisch
Tivoli
Toast
Tobias
Tofu
Tokio
Tomate
Tonarm
Tonika
Tonne
Topf
Torf
Torte
Torus
Tour
Trab
Tracht
Trafo
Tragik
Trakt
Trapez
Trasse
Traum
Treff
Trend
Treppe
Tresen
Treter
Trias
Tribun
Trick
Trier
Trikot
Trio
Trip
Tritte
Troll
Tropf
Tross
Trott
Trubel
Truhe
Trunk
Trupp
Tube
Tuch
Tuff
Tugend
Tulpe
Tumor
Tumult
Tundra
Tunnel
Tupel
Tupfer
Turban
Turm
Tusch
Tutor
Typhus
Typus
Tyrann
Ufer
Uhren
Ulla
Ulme
Ulrich
Umbau
Umbra
Umfang
Umfeld
Umgang
Umhang
Umkehr
Umlage
Umluft
Umsatz
Umweg
Umzug
Unart
Undank
Unding
Unehre
Unfall
Unfug
Ungar
Unheil
Unikat
Unlust
Unmut
Unrat
Unruh
Unsinn
Untat
Untier
Unwort
Unzahl
Unze
Urahn
Uran
Urbild
Urform
Urin
Urlaub
Urne
Urteil
Urtier
Urwald
Urwelt
Urzeit
Usus
Utopie
Vakanz
Valenz
Vasall
Vase
Vater
Vektor
Vene
Ventil
Verb
Verein
Verlag
Verrat
Vers
Verzug
Vesper
Veto
Vetter
Video
Vieh
Vikar
Viper
Virus
Visier
Visum
Vize
Vlies
Vogel
Vogt
Vokal
Volk
Volt
Vorbau
Vorhof
Vorort
Vorrat
Vortag
Vorzug
Votum
Voyeur
Vulkan
Waage
Wabe
Waffe
Wahl
Wahn
Waise
Walart
Walzer
Wams
Wange
Wanne
Wanst
Wanze
Wappen
Warze
Wasser
Wecker
Wehmut
Weib
Weiher
Weiler
Weimar
Weizen
Welpe
Welt
Werber
Werfer
Werk
Werner
Wesen
Wespe
Wicke
Widder
Wien
Wille
Wimpel
Wind
Winter
Winzer
Wipfel
Wirt
Witwe
Witz
Woche
Woge
Wolke
Wonne
Wort
Wrack
Wucht
Wulst
Wunsch
Wurf
Wurst
Wust
Zacken
Zahn
Zander
Zange
Zaster
Zaum
Zaun
Zebra
Zeche
Zecke
Zehe
Zehner
Zeiger
Zeile
Zeiten
Zelle
Zement
Zenit
Zensor
Zepter
Zicke
Ziege
Zieher
Zierde
Ziffer
Zikade
Zimt
Zink
Zinn
Zins
Zipfel
Zirkel
Zitat
Zoff
Zoll
Zombie
Zone
Zopf
Zuber
Zubrot
Zucht
Zucker
Zufall
Zufuhr
Zugabe
Zukauf
Zulage
Zuname
Zunder
Zunft
Zunge
Zuruf
Zusatz
Zuse
Zutat
Zutun
Zuzug
Zweck
Zweig
Zwerg
Zwist
Zyklen
Zypern

This comment was marked as duplicate.

@SebastianFloKa

@cr initiated a second attempt to create a BIP-0039 German Wordlist at #942 which was closed recently, so let’s continue here with a third attempt.
After quite some work I can offer a proposal that fulfils the basic requirements for a BIP-0039 Wordlist: limitations of length per word and levenshtein distance addition, substitution & permutation not lower than 2.

  • The new proposal follows @DavidMStraub requirement of nominative nouns even excluding countries, cities, persons, names etc.
  • @thomasklemm requested a change to more commonly used words; this should be the case now. Some words may still be up for discussion, as there is no endless supply of words like “love” or “peace”. A list of optional words that don’t conflict with the Levenshtein distance & length requirements etc. can be provided for you to work with.
  • @cr requested on top to avoid collision with other released BIP-0039-Wordlists which is taken to 100% into consideration with this new proposal. Of course not possible to achieve this for all wordlists in line.
  • In order to bring in cultural specialty to the BIP-0039 I keep up with the proposal of all-caps as writing nouns in lower-case-letters is conflicting with common sense of German language. A positive side-effect compared with the current proposal is that the number of used characters reduces from 52 to 26. This is an advantage not only for self-filled cold wallets.
  • Going the extra mile even the levenshtein distance “addition” was reduced to a value lower than 3 for the first 3 letters for words with a related meaning. So only 20 words with low correlation are left (Lanze & Pflanze, Sekt & Insekt, etc.).

A standard computer spell checker was used, so prior to release (if this will ever happen) we should find somebody competent to double check. Same for the base criteria check (levenshtein etc.) we should ask somebody such as @bitmover-studio to ensure our tools work properly.

Some discussions are ongoing on external platforms; you are invited to join here.

@SebastianFloKa

... sorry, will adjust the PR properly later ...

Here are the adjusted "special considerations", which need to be added to the main BIP-0039 later:

  1. Words can be uniquely determined by typing the first 4 characters.
  2. Words contain between 3 and 8 letters.
  3. No words with only 1 letter of difference (no Levenshtein distance by substitution, addition or permutation lower than 2).
  4. No words already used in other official BIP-0039 wordlists.
  5. No accents or special characters.
  6. Orthography based on the German spelling reform of 2006 and on the German Duden 2021.
  7. Only singular nouns and plurale tantum nouns (if no singular exists).
  8. All caps, to reflect that nouns are not written in lowercase in German and to keep the character set to 26 letters (A-Z) only.
  9. No words that sound exactly like another word with a different spelling inside the list.
  10. No offensive words and no words implying negative, sad or bad feelings.
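
For anyone who wants to double-check the mechanical criteria (1, 2, 3, 5 and 8), here is a minimal Python sketch. It is not the tooling that was used to build the list. It assumes that "permutation" means swapping two adjacent letters, so it uses the optimal string alignment variant of the Levenshtein distance; the names `osa_distance` and `check_wordlist` are made up for this illustration.

```python
from itertools import combinations


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance.

    Counts insertions, deletions, substitutions and swaps of two
    adjacent letters, each as one edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion (addition)
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Adjacent transposition (assumed meaning of "permutation").
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def check_wordlist(words):
    """Return a list of tuples describing every violated criterion."""
    problems = []
    prefixes = {}
    for w in words:
        if not 3 <= len(w) <= 8:                  # criterion 2
            problems.append(("length", w))
        if not (w.isascii() and w.isalpha() and w.isupper()):
            problems.append(("charset", w))       # criteria 5 and 8
        first = prefixes.setdefault(w[:4], w)     # criterion 1
        if first != w:
            problems.append(("prefix", first, w))
    for a, b in combinations(words, 2):           # criterion 3
        if osa_distance(a, b) < 2:
            problems.append(("distance", a, b))
    return problems
```

With this sketch, `check_wordlist(["SEKT", "INSEKT", "LANZE", "PFLANZE"])` reports no problems, because each of those pairs is two additions apart. The pairwise loop is quadratic, which should still be acceptable for 2048 words.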

@rodasmith rodasmith left a comment

Some of these words seem to be homophones according to my understanding of German pronunciation. I think that may cause problems for blind Germans who use an audio interface to work with their seed mnemonic.

Gong
Gosse
Gote
Gotha

Is this pronounced the same as "Gote"?

Novum
Nuance
Nudel
Nugat

Is this pronounced the same as "Nougat"?

Pensum
Pest
Pfad
Pfahl

In northern Germany, is this pronounced the same as "Fall"?

Pest
Pfad
Pfahl
Pfalz

In northern Germany, is this pronounced the same as "Falz"?

Pfeil
Pferd
Pflock
Pflug

In northern Germany, is this pronounced the same as "Flug"?

Pforte
Pfote
Pfuhl
Pfund

In northern Germany, is this pronounced the same as "Fund"?

Sohn
Soja
Sold
Sole

Is this pronounced the same as "Sohle"?

Unzahl
Unze
Urahn
Uran

Is this pronounced the same as "Urahn"?

@SebastianFloKa

@rodasmith you are referring to the initial proposal. The new proposal is in the comment section because I’m not the initiator. I will look for a way to make the new proposal more visible, maybe with a separate fork.
Sorry @rodasmith for the time spent; I hope you will stay motivated to check the new list as well, because it is supposed to be significantly improved. Before doing so we should align our expectations. If you check the conversations on other wordlists, it’s not uncommon to have one word of a homophone pair in the wordlist, but of course not both. I therefore assume you are not asking the German wordlist to avoid every word that could sound similar to some word outside the list, right? But I’m open to going the extra mile here as well, so here is a proposed definition before I modify the new proposal:
If a homophone for a word exists, only one of these words is allowed in the wordlist, on condition that the grammatical gender makes the spelling unambiguous.
Examples:
(der) Graph & (der) Graf: Neither is allowed, because both are singular nouns with the same grammatical gender (masculine).
(die) Kuh & (der) Coup: Different grammatical gender, so the more common one (Kuh) would be used.
(der) Wirt & wird: Wirt is allowed, because there is no other noun to confuse it with and the list consists of nouns only.
That way, verbal transmission of the wordlist is ensured.
D’accord?

@rodasmith

rodasmith commented Feb 22, 2021

The restriction is against having two or more words that sound the same, regardless of gender. The list could include 'Graph' or 'Graf' but not both.
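
A rough Python sketch of how that restriction could be checked, assuming someone supplies the homophone groups by hand (the function name `homophone_conflicts` and its input format are hypothetical, and the two groups in the usage note are just the pairs mentioned in this thread):

```python
def homophone_conflicts(words, homophone_groups):
    """Return every homophone group with more than one member in the list.

    `homophone_groups` is an assumed, hand-maintained iterable of sets of
    spellings that sound the same; grammatical gender is ignored.
    """
    wordset = {w.upper() for w in words}
    conflicts = []
    for group in homophone_groups:
        # Keep only the spellings of this group that are in the wordlist.
        present = sorted({g.upper() for g in group} & wordset)
        if len(present) > 1:
            conflicts.append(present)
    return conflicts
```

For example, `homophone_conflicts(["Graf", "Graph", "Kuh"], [{"Graf", "Graph"}, {"Kuh", "Coup"}])` flags only the Graf/Graph group, since 'Coup' is not in the list.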

@SebastianFloKa

Because of the confusion caused by the improvement proposal above in the comment section, and for better traceability, a new PR was created for this third attempt: see #1071.

@bitmover-studio
Contributor

A standard computer spell checker was used, so prior to release (if this will ever happen) we should find somebody competent to double check. Same for the base criteria check (levenshtein etc.) we should ask somebody such as @bitmover-studio to ensure our tools work properly.

@SebastianFloKa I will be happy to help. Just mention me when you have your list ready!

@SebastianFloKa

@bitmover-studio excellent, thank you. Other spell checkers found one more error (Avocado), which will be replaced in the next improvement round, now in preparation. You will be informed. Thanks again.

@kallewoof
Contributor

@DavidMStraub Consider closing this in favor of #1071 unless you believe the two proposals are competing with each other. (It appears that #1071 builds on this one, but I could be mistaken.)

@luke-jr
Member

luke-jr commented Jul 2, 2021

@luke-jr luke-jr closed this Jul 2, 2021