Stop Words

List of common stop words in various languages.

The words are normalized to Unicode Normalization Form C (NFC).
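As a rough illustration of what normalization to form C means, the sketch below uses Python's standard `unicodedata` module. The helper name `to_nfc` is hypothetical and is not taken from this repository.

```python
import unicodedata


def to_nfc(word: str) -> str:
    """Return the word in Unicode Normalization Form C (composed form).

    Hypothetical helper for illustration; not part of manage.py.
    """
    return unicodedata.normalize("NFC", word)


# "é" written as "e" followed by a combining acute accent (two code points)...
decomposed = "cafe\u0301"
# ...becomes the single precomposed code point U+00E9 under NFC.
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))  # 5 4
print(composed == "caf\u00e9")         # True
```

Normalizing every entry this way means two visually identical words always compare equal, which matters when merging or de-duplicating lists.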

Maintaining the lists

The manage.py script helps maintain the word lists.

To merge the English word list with new lists, you can use the following:

python -m manage merge en /tmp/new_list.txt /tmp/another_new_list.txt

The language code above is used for two purposes:

Determining the source file based on languages.json
Determining the libICU locale to use when comparing words
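Conceptually, a merge amounts to taking the union of the existing list and the new lists, then writing the result back in sorted order. The following is a minimal sketch of that idea, not the code in manage.py: the function name `merge_word_lists` is hypothetical, and plain code-point ordering stands in for the locale-aware libICU comparison the script uses.

```python
import unicodedata
from typing import Iterable, List


def merge_word_lists(existing: Iterable[str], *new_lists: Iterable[str]) -> List[str]:
    """Union several word lists, NFC-normalize entries, drop blanks and duplicates.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    merged = set()
    for words in (existing, *new_lists):
        for word in words:
            # Strip surrounding whitespace (e.g. trailing newlines from a file).
            word = unicodedata.normalize("NFC", word.strip())
            if word:
                merged.add(word)
    # Assumption: code-point order as a stand-in for ICU locale collation.
    return sorted(merged)


print(merge_word_lists(["a", "the"], ["an", "the\n"], ["", "of"]))
# ['a', 'an', 'of', 'the']
```

Because file objects iterate line by line, the same function could be fed open text files directly.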

If new words are added manually, you can use the following to maintain the sorting order:

python -m manage sort en

or, to sort every list at once:

python -m manage sort-all
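To show why the sort order depends on the comparison used, here is a small sketch with a pluggable sort key. The name `sort_words` is hypothetical and deliberately differs from the script's own `sort_word_list`, whose signature is not shown here. `str.casefold` is only a stand-in for a locale-aware key.

```python
from typing import Callable, Iterable, List, Optional


def sort_words(
    words: Iterable[str],
    key: Optional[Callable[[str], object]] = None,
) -> List[str]:
    """Return the words sorted by the given key (code-point order if None).

    Hypothetical helper for illustration; not part of manage.py.
    """
    return sorted(words, key=key)


words = ["banana", "Zebra", "apple"]

# Default code-point order places uppercase letters before lowercase ones.
print(sort_words(words))                    # ['Zebra', 'apple', 'banana']

# A case-insensitive key, used here as a stand-in for locale collation.
print(sort_words(words, key=str.casefold))  # ['apple', 'banana', 'Zebra']

# Assumption: with PyICU installed, a collator sort key could be passed
# instead, e.g. key=icu.Collator.createInstance(icu.Locale("en")).getSortKey
```

Keeping the key as a parameter lets each language use its own collation rules without changing the sorting code.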

The management script contains code that can be used as a library. See the LanguageDataIndex class and the sort_word_list function for more details.

Available languages

Arabic
Bulgarian
Catalan
Chinese
Czech
Danish
Dutch
English
Finnish
French
German
Greek
Gujarati
Hindi
Hebrew
Hungarian
Indonesian
Malaysian
Italian
Japanese
Korean
Norwegian
Polish
Portuguese
Romanian
Russian
Slovak
Spanish
Swedish
Turkish
Ukrainian
Vietnamese
Persian/Farsi

Contributing

You know how ;)

Programming language support

Python: https://github.com/Alir3z4/python-stop-words
.NET: https://github.com/hklemp/dotnet-stop-words
Rust: https://github.com/cmccomb/rust-stop-words

License

Creative Commons Attribution 4.0 International (CC BY 4.0)
