
Replace the German word list #59

Merged
grempe merged 4 commits into grempe:master from klamann:master
Dec 2, 2020

klamann commented Nov 2, 2020

The current German word list from The Diceware Passphrase Home Page comes with a few major flaws:

  • many two-letter strings that are not German words
  • lots of numbers and special characters
  • misspelled words
  • highly unusual words and word forms

I have compiled a new word list from the DeReKo 2014 corpus, which only consists of actual German words that are in common use and (hopefully) easier to remember. The word list can be found at klamann/diceware-dereko.

Major benefits:

  • consists of the most frequent words in written German
  • avoids words that are problematic in passphrases, e.g. words that contain umlauts (äöü) or very long words (we have a lot of those in German and some of them are pretty common)
  • only uses the most common inflexion of a word; you should only have to remember the word itself, not some random piece of grammar that is attached to the word
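The selection criteria above could be sketched roughly as follows. This is only an illustration, not the actual diceware-dereko pipeline: the function name `build_word_list`, its parameters, and the default length bounds are assumptions, and the input is assumed to be an iterable of words already ranked by corpus frequency. Picking the most common inflexion would need lemma information from the corpus and is not covered here.

```python
import re

# Only plain ASCII letters: rejects umlauts, digits and special characters.
ASCII_WORD = re.compile(r"[a-z]+")


def build_word_list(ranked_words, size, min_len=3, max_len=10):
    """Return the first `size` acceptable words from a frequency-ranked iterable.

    All names and default bounds are hypothetical, chosen for illustration.
    """
    seen = set()
    result = []
    for word in ranked_words:
        w = word.lower()
        if not ASCII_WORD.fullmatch(w):
            continue  # umlauts, numbers, special characters
        if not (min_len <= len(w) <= max_len):
            continue  # too short or too long for a passphrase
        if w in seen:
            continue  # same word seen earlier with different capitalisation
        seen.add(w)
        result.append(w)
        if len(result) == size:
            break
    return result


sample = ["Über", "und", "an", "Haus", "haus", "donaudampfschiff", "x1", "zeit"]
print(build_word_list(sample, size=3))
```

Because the input is ranked by frequency, taking the first matches keeps the most common words that survive the filters.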

Here are a few random examples of passphrases from the current and the new word list:

  • old
    • korb ek leert china haspe go
    • unzart krung kakadu dp viel krokus
    • batist manche bug speer koks grufti
  • new
    • antike antwortete einfall entschloss woanders kegeln
    • meiden einige silber unmut eifersucht zeitspanne
    • testen innenhof stamm datei jahrestag spannung

As a native German speaker, I find passphrases generated from this list way easier to remember. Please consider merging :)

... with one that contains actual German words that are in common use
source: https://github.com/klamann/diceware-dereko
@klamann
Author

klamann commented Nov 10, 2020

Is this project still alive? Anyone interested in merging this? @grempe maybe?

@grempe
Owner

grempe commented Nov 10, 2020

Please see #55

I won't accept wholesale replacement of the upstream lists maintained by the creator of Diceware.

If you feel strongly enough about the issue, I would consider adding an alternative DE list, but I'd need a native German speaker (other than yourself) to vet the quality and appropriateness of the list's words.

@klamann
Author

klamann commented Nov 10, 2020

Hi, it's fine for me if you want to keep the previous list, though I wouldn't go so far as to say that it is "maintained" by anyone. It clearly wasn't drafted by a native speaker in the first place, as indicated by all the random character combinations and other words that look like they could be German, but really aren't. You might assume that I'm exaggerating here, but it really is that bad. That's why I invested the time to create an alternative.

How would you like to approach the vetting process?

@grempe
Owner

grempe commented Nov 11, 2020

I took a closer look at your methodology for creating the list and the source material. It seems well thought out, unlike some other lists that have been provided to me, which were just random lists of words with no provenance for the contents.

I'd be happy to merge a pull request that adds this as an alternative DE word list in addition to the original upstream list.

No additional vetting is required now.

Thank you.

@klamann
Author

klamann commented Nov 11, 2020

Updated it so that both the old list by Benjamin Tenne and the new list based on DeReKo data are available. I wasn't sure about the labels, so I named them German (DeReKo) and German (Tenne) for now.

@klamann
Author

klamann commented Nov 23, 2020

@grempe are you fine with the changes in this PR or is there anything else I can do?

@sreith1

sreith1 commented Nov 23, 2020

Hi there,

Another native German speaker here. I can only endorse @klamann and this PR. The current German list is unfortunately anything but good. It leads me (and probably others) to push the button several times and pick words I like myself, which lowers the entropy.

I also just checked the submitted list (without reading every single entry). The entries all appear to be common German words, and the list looks very good to me. The only disadvantage might be that some of these words are somewhat longer, but I don't know if there is a way around that in German. Anyway, I would really love to see this merged.

Thank you both!
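To put a rough number on the entropy loss from regenerating and picking a favourite, here is a small sketch. It is a simplification under stated assumptions: the list is assumed to have 7776 entries (the usual Diceware size; the thread does not state it), whole passphrases are regenerated rather than single words, and the attacker is assumed to know the user's preferences perfectly. Under those assumptions, choosing the favourite out of `candidates` uniformly generated passphrases costs at most `log2(candidates)` bits. The function name is hypothetical.

```python
import math


def worst_case_entropy_bits(list_size, num_words, candidates):
    """Lower bound on entropy when the user picks one of `candidates` passphrases.

    Assumes each candidate is drawn uniformly and the attacker knows which
    passphrase the user would prefer (worst case).
    """
    full = num_words * math.log2(list_size)
    return full - math.log2(candidates)


# Hypothetical example: six words, user keeps the favourite of 8 attempts.
print(worst_case_entropy_bits(7776, 6, 1))
print(worst_case_entropy_bits(7776, 6, 8))
```

Hand-picking individual words from several rolls can cost more than this bound suggests, which is one reason a list of memorable words helps: users are less tempted to reroll.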

@klamann
Author

klamann commented Nov 24, 2020

Hi @sreith1, thanks for your feedback! I limited the length of the words in the list to 10 characters, because I didn't want to exclude too many common words that are that long, but I could easily make another list with a limit of 8 characters or maybe even 7.

@klamann
Author

klamann commented Nov 28, 2020

I updated the list so that it now only contains words with a length of 3 to 8 characters (thanks @sreith1 for the suggestion). The list now includes more words that are not as common, but this gives us shorter passphrases with the same security properties, which is more user-friendly.
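The "same security properties" point can be checked with a short calculation: for words drawn uniformly at random, entropy depends only on the number of entries in the list and the number of words drawn, not on how long the words are. The sketch below assumes a list of 7776 entries (6^5, the usual Diceware size; the thread itself does not state the size), and the function name is hypothetical.

```python
import math


def passphrase_entropy_bits(list_size, num_words):
    """Entropy in bits of `num_words` words drawn uniformly from `list_size` entries."""
    return num_words * math.log2(list_size)


# Assumed list size of 7776 entries, six-word passphrase.
per_word = passphrase_entropy_bits(7776, 1)
six_words = passphrase_entropy_bits(7776, 6)
print(f"{per_word:.2f} bits per word, {six_words:.1f} bits for six words")
```

Capping word length at 8 instead of 10 therefore shortens what the user has to type without changing these numbers, as long as the list keeps the same number of entries.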

@grempe is there anything you'd like to change about this PR or can we merge this?

@grempe
Owner

grempe commented Nov 29, 2020

I will try to get to this this week. Thank you.

grempe merged commit e230e85 into grempe:master on Dec 2, 2020
@grempe
Owner

grempe commented Dec 2, 2020

I've merged and pushed this change. Thanks for the patience and the nice work. Please let me know if you see any issues.
