cucco
Features added:
Hello,
Great project. I was trying to build something similar until I found yours. To cover all my needs I had to add a few things; let me know what you think.
Unfortunately, I couldn't write test cases for these changes because I have no experience with pytest, but I'm including examples of how they work.
Things that I have added:
- Custom stop words option in the remove_stop_words method
- Added the method "replace_numbers" to remove standalone numbers
- Added the method "replace_custom_regex" to remove every match of a custom regex from the text.
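To make the intended behaviour of the three additions concrete, here is a minimal standalone sketch using only Python's `re` module. It is not the code in this pull request and not cucco's implementation: the function bodies, the `replacement` parameter, the `_collapse_whitespace` helper, and the tiny `DEFAULT_STOP_WORDS` set are all assumptions made for illustration. In particular, this sketch collapses leftover whitespace and preserves the case of the words it keeps, which the real methods may not do.

```python
import re

# Illustrative stand-ins only; none of this is cucco's actual code.
# A tiny placeholder list; a real stop-word list would be far larger.
DEFAULT_STOP_WORDS = frozenset({"a", "an", "and", "of", "the", "to"})

# Matches runs of digits that form a whole token, so "3333" matches
# but the "3" inside "i3s" does not.
_STANDALONE_NUMBER = re.compile(r"\b\d+\b")


def _collapse_whitespace(text):
    """Squeeze runs of whitespace into single spaces and trim both ends."""
    return " ".join(text.split())


def replace_numbers(text, replacement=""):
    """Replace standalone numbers, leaving digits embedded in words alone."""
    return _collapse_whitespace(_STANDALONE_NUMBER.sub(replacement, text))


def replace_custom_regex(regex, text, replacement=""):
    """Replace every match of a compiled pattern supplied by the caller."""
    return _collapse_whitespace(regex.sub(replacement, text))


def remove_stop_words(text, custom_stop_words=None):
    """Drop stop words, using the caller's list when one is given."""
    if custom_stop_words is None:
        stop_words = DEFAULT_STOP_WORDS
    else:
        stop_words = {word.lower() for word in custom_stop_words}
    # Compare case-insensitively but keep the original casing of kept words.
    kept = [word for word in text.split() if word.lower() not in stop_words]
    return " ".join(kept)
```

Passing `custom_stop_words=None` falls back to the default list, so existing callers of `remove_stop_words` would not need to change.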
Sample code to test my changes:
from cucco.cucco import Cucco
from cucco.config import Config
cucco_config = Config(language='en')
c = Cucco(config=cucco_config)
# Replace only standalone numbers, not digits embedded in words.
c.replace_numbers("this 3333 i3s a text with the number 2")
# Removing custom regex, for example all #foo and @bar
import re
regex = re.compile(r"[#@]\w+", re.IGNORECASE)
c.replace_custom_regex(regex=regex, text="Test a string #foo to replace @bar")
# Removing custom stop words
# This is the default one, with all stop words:
c.remove_stop_words("Test to remove stop words")
# This is with a custom set of stop words (in case you want to use your own set):
c.remove_stop_words("Test to remove stop words", custom_stop_words=['test', 'to'])
Sample code with the console output:
In [1]: from cucco.cucco import Cucco
...:
...: from cucco.config import Config
...:
...: cucco_config = Config(language='en')
...:
...: c = Cucco(config=cucco_config)
...:
In [3]: # Replace only standalone numbers, not digits embedded in words.
...: c.replace_numbers("this 3333 i3s a text with the number 2")
Out[3]: 'this i3s a text with the number'
In [4]: # Removing custom regex, for example all #foo and @bar
...: import re
...: regex = re.compile(r"[#@]\w+", re.IGNORECASE)
...: c.replace_custom_regex(regex=regex, text= "Test a string #foo to replace @bar")
...:
Out[4]: 'Test a string to replace '
In [5]: # This is the default one, with all stop words:
...: c.remove_stop_words("Test to remove stop words")
...: 'test remove stop words'
...:
Out[5]: 'test remove stop words'
In [6]: # This is with a custom set of stop words (in case you want to use your own set):
...: c.remove_stop_words("Test to remove stop words", custom_stop_words=['test', 'to'])
...: 'remove stop words'
...:
Out[6]: 'remove stop words'
Thank you for your contribution. I really appreciate it.
It will take me some time to review it. As you can see, this is still a one-person project and I'm a bit busy these days, but I will certainly review it and try to incorporate your great suggestions.
Cheers.