Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
ArthurZucker
left a comment
There was a problem hiding this comment.
Nice! 🔥
Just approved the workflows, let's go 🚀
Feel free to ping me, tho I think you are up to a great start!
|
Sadly ran into numpy compilation issue on Windows (32bit). Details: Hopefully, the fix is on the way: |
|
The fix has merged to rust-numpy. Now we need to wait for a new release that includes the merge. |
|
There is no need to change anything in code. Just re-running checks on this PR is sufficient since |
ArthurZucker
left a comment
There was a problem hiding this comment.
Thanks a TON! Let's remove the name and good to go
|
🎉 thanks for the PR 🤗 |
| &self, | ||
| py: Python<'_>, | ||
| input: Vec<&PyAny>, | ||
| input: Bound<'_, PyList>, |
There was a problem hiding this comment.
actually this broke tokenizers because it only supports PyList now 😓 looking into a fix!
There was a problem hiding this comment.
@diliop we need this to be probably PySequence, but I am not sure about the fix
There was a problem hiding this comment.
Hey, I just saw this :( Yeah I "assumed" that Vec was only accepting list from Python hence PyList but if you need tuples then PySequence is indeed the way to go. Py* will give you the benefit of the Python type check at almost zero cost as you mentioned in your PR so they should be preferred where possible. That said, there should be tests covering this so I can pick this up over the weekend and make sure anything else I changed is also covered.
There was a problem hiding this comment.
Cool yeah was in a rush to fix this, forgot about the tests, super nice if you want to add them 🤗
There was a problem hiding this comment.
Took a quick stab at adding tests but from the looks of it I will need to spend a bit more time here to do this right since PySequence is not the right solution after all. The TL;DR here is that the reason why the change I made was not caught by tests is because the test covering this line was turned off (here). Turning the test back on will now fail on parsing ndarray as an additional input type. So encode_batch and encode_batch_fast need to support list, tuple and ndarray. I think I can support all 3 by changing the input arg type to something like Vec<Bound<'_, PyAny>> with some changes in PreTokenizedEncodeInput and TextEncodeInput. I will have some time to work on this over the weekend so hopefully I have fix for this soon.
There was a problem hiding this comment.
#1679 should be the fix. Still planning to add more tests but this my hope is would be enough to restore the previous functionality.
There was a problem hiding this comment.
Thanks a lot for diving into it!
Upgrade from PyO3
0.21to0.22Closes #1639