Fix to allow unicode py2.7 by jklymak · Pull Request #40 · nucleic/kiwi

jklymak · 2017-09-09T21:31:21Z

This is to fix #39. It works on the test I have. I admit I didn't run the other tests.

It basically allows from __future__ import unicode_literals to be used in Python 2.7. I tested in Python 2.7 with and without that line at the top of the test, and in Python 3 with that line.

from __future__ import unicode_literals

import kiwisolver as kiwi

Variable = kiwi.Variable
solver = kiwi.Solver()
top = Variable('boo')
c = (top == 1.0)
solver.addConstraint(c | 'strong')
solver.dump()

@sccolbert @MatthieuDartiailh

Thanks!

sccolbert · 2017-09-09T21:48:12Z

py/pythonhelpers.h

 #define MOD_INIT_FUNC(name) PyMODINIT_FUNC init##name(void)
 #endif

+#define FROM_STRING PyUnicode_FromString


Why did you make this change? Returning unicode in Py27 where it was previously str is an API-incompatible change.

sccolbert · 2017-09-09T21:48:46Z

py/util.h

+      str = PyUnicode_AsUTF8( value );
 #else
-    if( PyString_Check( value ) )
+    if( PyString_Check( value ) | PyUnicode_Check( value ))


I would make this an if {} else if {} else {} block instead of a nested if.

I agree it looks a bit inelegant, and I'm not the most elegant programmer, so no offense if you or someone else changes it. But, you need an outer if to check for a string to send to the string translators below, and then you need to decide which string decoder you want.

I am not sure it would make this any more elegant, as we would have to either duplicate the following logic (string value comparison) or refactor it into a function. To me this looks fine.

Yep. You're both right. I only looked at the diff preview and was missing the extra context.
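The shape both replies defend (an outer "is this a string?" check, then a choice of decoder, then one shared name comparison) can be sketched in Python. This is a hypothetical illustration, not kiwisolver code: convert_to_strength here only borrows the C++ helper's name, bytes stands in for the Python 2 str type, and the strength values assume kiwi's create(a, b, c) = a*1e6 + b*1e3 + c scheme.

```python
# Hypothetical sketch of the convert_to_strength structure discussed above.
# bytes plays the role of the Python 2 str type; str is unicode text.

# Assumed strength values, following kiwi's create(a, b, c) = a*1e6 + b*1e3 + c.
STRENGTHS = {
    "required": 1001001000.0,
    "strong": 1000000.0,
    "medium": 1000.0,
    "weak": 1.0,
}


def convert_to_strength(value):
    """Return a float strength for a strength name or a number."""
    # Outer branch: is this any kind of string?
    if isinstance(value, (bytes, str)):
        # Inner branch: pick the decoder for the concrete string type.
        text = value.decode("utf-8") if isinstance(value, bytes) else value
        # Shared logic: a single name lookup serves both string types.
        try:
            return STRENGTHS[text]
        except KeyError:
            raise ValueError("unknown strength name: %r" % (text,))
    # Numbers are accepted directly.
    if isinstance(value, (int, float)):
        return float(value)
    raise TypeError("expected str, bytes or a number, got %s" % type(value).__name__)


print(convert_to_strength("strong"))  # 1000000.0
print(convert_to_strength(b"weak"))   # 1.0
```

Flattening this into a single if / else if / else would force the name lookup to appear in two branches or move into yet another helper, which is the trade-off raised in the replies above.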

MatthieuDartiailh · 2017-09-10T15:44:43Z

py/util.h

+      str = PyUnicode_AsUTF8( value );
 #else
-    if( PyString_Check( value ) )
+    if( PyString_Check( value ) | PyUnicode_Check( value ))


I am not sure it would make this any more elegant, as we would have to either duplicate the following logic (string value comparison) or refactor it into a function. To me this looks fine.

MatthieuDartiailh · 2017-09-10T15:46:05Z

py/util.h

    {
-        std::string str( PyString_AS_STRING( value ) );
+      if( PyUnicode_Check( value ) )
+          str = PyString_AS_STRING(PyUnicode_AsASCIIString( value ) );


Why not use PyUnicode_AsUTF8 as we do on Python 3?

Yeah. PyUnicode_AsASCIIString also returns a new object, so this line will cause a memory leak.
https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_AsASCIIString

MatthieuDartiailh · 2017-09-10T15:47:19Z

To me this looks mostly good to go. I particularly like that it actually removes some lines of C :)

MatthieuDartiailh · 2017-09-10T16:05:11Z

@sccolbert Are you happy with it? After merging this, we could tag a 1.0.1, since matplotlib will want a tag if this goes into it.

sccolbert · 2017-09-10T16:09:10Z

@MatthieuDartiailh Not quite happy with it yet, because of the memory leak I mentioned above.


jklymak · 2017-09-10T16:36:48Z

@MatthieuDartiailh No rush on the tag if you and @sccolbert want to accumulate other feedback. 1.0.0 fails with matplotlib on Python 2.7 but does fine on 3.6, which I think most of the devs are on. The changes I'm making in matplotlib will take a good while to get out of development, and I know it works in 2.7, so failing that test for a while (I'd bet a couple of months at least) should be fine.

jklymak · 2017-09-10T16:41:22Z

BTW, thanks for fixing my Python/C/Unicode ignorance. It's been a long time since I coded in C; I'm mostly just a hacker in Python, and I have only a vague understanding that encoding text is a pain.

FWIW, the matplotlib work I'm doing just needs ASCII (users aren't exposed to the strings), but I can imagine other users might try Unicode.

jklymak · 2017-09-10T16:44:13Z

BTW, this works fine with my test. Not sure if you want to add that test. There are probably a good number of py2/3 projects that use the unicode_literals future import...

MatthieuDartiailh · 2017-09-10T17:01:10Z

Now that I am done messing up the C code..., I will add a test by simply forcing a string to be unicode using the u prefix.


MatthieuDartiailh · 2017-09-10T17:20:54Z

@jklymak you forgot the setName method of variable. I fixed that and updated the tests. @sccolbert is this now good to go for you ?

MatthieuDartiailh · 2017-09-10T18:57:28Z

Fixed the possible segfault under Python 2

MatthieuDartiailh · 2017-09-11T18:46:04Z

This is ready for review.

jklymak · 2017-09-11T19:25:17Z

@MatthieuDartiailh Would you like me to rebase this so it is just one commit? matplotlib often asks for this just to keep the commit tree clean (I think it makes blame a lot easier if it's one commit per pull request merge). OTOH, if you want the history above to make perfect sense, then rebasing is a less good idea...

jklymak · 2017-09-11T19:53:07Z

@MatthieuDartiailh Not having managed a codebase before, I didn't know that existed. Very useful! I'll let you guys deal with it on merge.

sccolbert · 2017-09-11T19:55:48Z

py/util.h

+      {
+          ascii_str = PyUnicode_AsASCIIString( value );
+          if( !ascii_str )
+              return 0;


This should be return false; here, since the function returns a bool.

This block should probably use the UTF8 encoding just like we do for py3. You can make use of a smart pointer so you don't need to manage the ref count manually. I would remove the PyObject* ascii_str; declaration, and use a block like this:

PythonHelpers::PyObjectPtr py_str( PyUnicode_AsUTF8String( value ) );
if( !py_str )
    return false;
str = PyString_AS_STRING( py_str.get() );
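Why UTF-8 rather than ASCII is easy to see from Python itself: encoding to ASCII fails on any non-ASCII code point, while UTF-8 can represent every string and decodes back to the original. In this minimal sketch, str.encode stands in for PyUnicode_AsASCIIString / PyUnicode_AsUTF8String, and encode_name is a hypothetical helper name.

```python
def encode_name(text, encoding):
    """Encode text as the C helper would before building a std::string.

    Returns the encoded bytes, or None when the encoding cannot represent
    the text (in the C helper, the encode call would instead return NULL
    with a UnicodeEncodeError set, and the function would return false).
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


name = "t\u00eate"  # "tête": contains the non-ASCII character U+00EA
ascii_bytes = encode_name(name, "ascii")
utf8_bytes = encode_name(name, "utf-8")

print(ascii_bytes)                         # None: ASCII cannot represent U+00EA
print(utf8_bytes)                          # b't\xc3\xaate'
print(utf8_bytes.decode("utf-8") == name)  # True: UTF-8 round-trips
```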

sccolbert · 2017-09-11T20:05:59Z

py/util.h

 inline bool
 convert_to_strength( PyObject* value, double& out )
 {
+    std::string str;


This variable declaration does not need to be hoisted.

sccolbert · 2017-09-11T20:06:10Z

py/util.h

    if( PyUnicode_Check( value ) )
    {
-        std::string str( PyUnicode_AsUTF8( value ) );
+      str = PyUnicode_AsUTF8( value );


This line did not need to be changed.

sccolbert · 2017-09-11T20:06:35Z

py/util.h

+
+    if( PyString_Check( value ) | PyUnicode_Check( value ))
    {
-        std::string str( PyString_AS_STRING( value ) );


This should be changed to std::string str;

sccolbert · 2017-09-11T20:06:45Z

py/util.h

+      str = PyUnicode_AsUTF8( value );
 #else
-    if( PyString_Check( value ) )
+    PyObject* ascii_str;


This is not needed. See longer comment below.

sccolbert · 2017-09-11T20:08:08Z

py/variable.cpp

+         if( !ascii_str )
+             return 0;
+         str = PyString_AS_STRING( ascii_str );
+         Py_DECREF( ascii_str );


This block can be done the same way as above with a smart pointer.

ChrisBarker-NOAA · 2017-09-11T22:50:45Z

py/tests/test_constraint.py

        assert c.strength() == getattr(strength, s)

+    if sys.version_info < (3,):
+        with pytest.raises(UnicodeEncodeError):


In py2, folks very often accidentally use a mixture of Unicode and plain strings. It would be best if that were allowed.

Ideally, you convert all py2 strings into unicode at the calling boundary, and then expect everything to be unicode internally.

I haven't dug into your C++ code (why in the world are you hand-writing the bindings??? -- Cython would make this a lot easier) -- but a little utility conversion function should be fairly easy to write and drop in everywhere a string is expected.

It looks like this is testing for non-ASCII characters, but if they're allowed in py3, why not in py2?

I haven't dug into your C++ code (why in the world are you hand-writing the bindings??? -- Cython would make this a lot easier)

It really doesn't, and this sort of comment is not helpful.

no, it's not -- sorry.

But in any case, I have now looked at the C++ and it looks like you want a utf-8 encoded std::string from a python string.

In python, you would do this by calling .encode('utf-8') on the object, and that would work on py2 strings and unicode objects, and py3 strings. So maybe you could call the python from C++, but that's awkward, and maybe slow, so you could instead call:

PyString_Encode on py2 strings, and

PyUnicode_AsUTF8String on py2 unicode objects and py3 strings.

In any case, why require ascii on py2 and allow full unicode on py3?
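The "convert at the calling boundary" idea can be sketched as one small helper that every string-accepting entry point calls. This is a hypothetical Python illustration (to_utf8 is an invented name, and bytes stands in for the Python 2 str type). It assumes byte strings already hold UTF-8 and validates them by decoding; note that in Python 2, calling .encode('utf-8') on a byte str first decodes it with the default ASCII codec, so that call alone only works for ASCII bytes.

```python
def to_utf8(value):
    """Normalize any "Python string" to UTF-8 bytes at the API boundary.

    bytes stands in for the Python 2 str type; str is unicode text.
    Assumption: byte strings are expected to already hold UTF-8, so they
    are validated rather than passed through blindly.
    """
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, bytes):
        value.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8
        return value
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


print(to_utf8("boo"))       # b'boo'
print(to_utf8(b"boo"))      # b'boo'
print(to_utf8("\u00e9"))    # b'\xc3\xa9'
```

With a single helper like this, the variable name, strength, and constraint operator paths all share one conversion rule, so py2 and py3 accept the same set of names.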

Sorry, my mistake. I chose to stay close to the original contributor's implementation. I will fix this.


MatthieuDartiailh · 2017-09-12T13:17:39Z

After much more tinkering with the C-API I managed to get Python 2 to accept any unicode name for a variable (both at creation and when setting it) and accept unicode for strength and constraint operator.
I added a new helper function in util.h to handle the conversion and avoid code duplication.

sccolbert · 2017-09-12T15:07:44Z

This is a really good update, but there's one more place we need to be defensive with the new changes. I'm going to approve my old change request and make a new comment on the new diff.

sccolbert · 2017-09-12T15:12:26Z

py/variable.cpp

+      std::string c_name;
+      if( !convert_pystr_to_str(name, c_name) )
+          return 0;
+    	new( &self->variable ) kiwi::Variable( c_name );


Since this change makes it possible for the function to return early after the Python variable object has been allocated, we need to use a smart pointer to guard the lifetime of that object. Here's an updated version of this function that does that:

static PyObject*
Variable_new( PyTypeObject* type, PyObject* args, PyObject* kwargs )
{
    static const char *kwlist[] = { "name", "context", 0 };
    PyObject* context = 0;
    PyObject* name = 0;
    if( !PyArg_ParseTupleAndKeywords(
        args, kwargs, "|OO:__new__", const_cast<char**>( kwlist ),
        &name, &context ) )
        return 0;
    PyObjectPtr pyvar( PyType_GenericNew( type, args, kwargs ) );
    if( !pyvar )
        return 0;
    Variable* self = reinterpret_cast<Variable*>( pyvar.get() );
    self->context = xnewref( context );
    if( name != 0 )
    {
#if PY_MAJOR_VERSION >= 3
        if( !PyUnicode_Check( name ) )
            return py_expected_type_fail( name, "unicode" );
#else
        if( !( PyString_Check( name ) | PyUnicode_Check( name ) ) )
        {
            return py_expected_type_fail( name, "str or unicode" );
        }
#endif
        std::string c_name;
        if( !convert_pystr_to_str(name, c_name) )
            return 0;
        new( &self->variable ) kiwi::Variable( c_name );
    }
    else
    {
        new( &self->variable ) kiwi::Variable();
    }
    return pyvar.release();
}
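The guard-then-release pattern in this version (PyObjectPtr owns the new object, every early return 0 drops it, and pyvar.release() hands it to the caller only on success) has a close Python analog in contextlib.ExitStack.pop_all(). This is only an analogy, not how CPython manages memory: FakeVariable and its free() method are hypothetical stand-ins for the allocated object and the final Py_DECREF.

```python
import contextlib


class FakeVariable:
    """Hypothetical stand-in for the newly allocated Variable object."""

    def __init__(self):
        self.name = None
        self.freed = False

    def free(self):
        # Plays the role of dropping the last reference (Py_DECREF).
        self.freed = True


def variable_new(name, allocated):
    """Allocate, validate, and transfer ownership only on success.

    allocated collects every object created so the example can inspect it.
    """
    with contextlib.ExitStack() as guard:
        obj = FakeVariable()
        allocated.append(obj)
        # Like PyObjectPtr: cleanup runs on any early return or exception.
        guard.callback(obj.free)
        if not isinstance(name, (str, bytes)):
            return None  # early return: leaving the block calls obj.free()
        obj.name = name
        # Like pyvar.release(): move the cleanup out so it never runs.
        guard.pop_all()
        return obj


allocated = []
print(variable_new(42, allocated), allocated[0].freed)  # None True
ok = variable_new("x", allocated)
print(ok.name, ok.freed)  # x False
```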

Thanks for catching this.

sccolbert · 2017-09-12T16:20:24Z

Thanks!

MatthieuDartiailh · 2017-09-12T16:23:20Z

A squash and merge would have been cleaner for the history (I tinkered a lot and used Travis for Python 2.7 feedback), but it does not really matter.

jklymak · 2017-10-23T00:18:15Z

@MatthieuDartiailh @scolbert

Would it be possible to issue a release based on this change? That way I can get #9082 in matplotlib merged.

MatthieuDartiailh · 2017-10-23T05:55:59Z

I will try to do one this week. Ping me if I forget.

MatthieuDartiailh · 2017-10-24T16:25:39Z

The release is done.

jklymak · 2017-10-24T16:29:50Z

Thanks!

MatthieuDartiailh · 2017-10-24T16:40:41Z

In case you need it, the conda-forge update is also on its way.

jklymak · 2017-10-24T23:01:31Z

Thanks so much! Seems to be working great.

Fix to allow unicode py2.7

f6ab7f6

sccolbert reviewed Sep 9, 2017

Reversed change in helper

24956fd

MatthieuDartiailh reviewed Sep 10, 2017

MatthieuDartiailh added 7 commits September 10, 2017 18:11

py: use PyUnicode_AsUTF8

aa6f292

Previously PyString_AS_STRING(PyUnicode_AsASCIIString was leaking a ref.

py: stupid typo

29c198e

py: again... a typo

be4d373

py: PyUnicode_AsUTF8 is Python 3 only

8aec3c7

py: ... more stupidity

69f58c7

py: my C is rusty

7c99586

py: hopefully

aa85dcb

MatthieuDartiailh added 5 commits September 10, 2017 19:07

py: fix Variable.setName and add tests

0818a04

py: fixes

86158b2

py: last one

cb262ad

py: fix poorly written tests

b128ee5

Update releasenotes

a431e05

[ci skip]

MatthieuDartiailh added 2 commits September 10, 2017 20:52

py: properly handle unicode decoding error (py2)

9707b0f

py: proper exception in tests

9bcfc03

sccolbert requested changes Sep 11, 2017

ChrisBarker-NOAA reviewed Sep 11, 2017

MatthieuDartiailh added 10 commits September 12, 2017 09:44

py: proper support of unicode names

66ac30a

Also add support for defining a Constraint using a unicode string for the operator.

py: typo

188f8bd

py: typo 2

d4ab9cd

py: even more typos

065688a

py: ... typo

b0b3c2b

py: last typos...

cc7ba90

py

b40adae

py: update tests

f263308

py: allow creating a variable with any utf8 string in Py2

6abc592

py: try with an initialized pointer

1077958

sccolbert approved these changes Sep 12, 2017

sccolbert requested changes Sep 12, 2017

py: apply sccolbert corrections

5aee009

sccolbert approved these changes Sep 12, 2017

sccolbert merged commit 1c89ab8 into nucleic:master Sep 12, 2017

jklymak mentioned this pull request Sep 12, 2017

[MRG] Constrained_layout (geometry manager) matplotlib/matplotlib#9082

Merged

jklymak deleted the fixpy27unicode branch October 24, 2017 22:12
