Skip to content

Fix to allow unicode py2.7#40

Merged
sccolbert merged 27 commits intonucleic:masterfrom
jklymak:fixpy27unicode
Sep 12, 2017
Merged

Fix to allow unicode py2.7#40
sccolbert merged 27 commits intonucleic:masterfrom
jklymak:fixpy27unicode

Conversation

@jklymak
Copy link
Contributor

@jklymak jklymak commented Sep 9, 2017

This is to fix #39. It works on the test I have. I admit I didn't run the other tests.

It basically allows from __future__ import unicode_literals to be used in python 2.7. I tested in Python 2.7 with and with out that line at the top for the test, and in python 3 w/ that line.

from __future__ import unicode_literals

import kiwisolver as kiwi

Variable = kiwi.Variable
solver = kiwi.Solver()
top = Variable('boo')
c = (top == 1.0)
solver.addConstraint(c | 'strong')
solver.dump()

@sccolbert @MatthieuDartiailh

Thanks!

#define MOD_INIT_FUNC(name) PyMODINIT_FUNC init##name(void)
#endif

#define FROM_STRING PyUnicode_FromString
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you make this change? Returning unicode in Py27 where it was previously str is an API-incompatible change.

str = PyUnicode_AsUTF8( value );
#else
if( PyString_Check( value ) )
if( PyString_Check( value ) | PyUnicode_Check( value ))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make this and if { } else if {} else {} block instead of nested if

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it looks a bit inelegant, and I'm not the most elegant programmer, so no offense if you or someone else changes it. But, you need an outer if to check for a string to send to the string translators below, and then you need to decide which string decoder you want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure it would make this any more elegant as we would have either to duplicate the following logic (string value comparison) or refactor it in a function. To me this looks fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. You're both right. I only looked at the diff preview and was missing the extra context.

str = PyUnicode_AsUTF8( value );
#else
if( PyString_Check( value ) )
if( PyString_Check( value ) | PyUnicode_Check( value ))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure it would make this any more elegant as we would have either to duplicate the following logic (string value comparison) or refactor it in a function. To me this looks fine.

py/util.h Outdated
{
std::string str( PyString_AS_STRING( value ) );
if( PyUnicode_Check( value ) )
str = PyString_AS_STRING(PyUnicode_AsASCIIString( value ) );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use PyUnicode_AsUTF8 as we do on Python 3 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. PyUnicode_AsASCIIString also returns a new object, so this line will cause a memory leak.
https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_AsASCIIString

@MatthieuDartiailh
Copy link
Member

To me this looks mostly good to go. I like in particular the fact it actually remove some lines of C : )

@MatthieuDartiailh
Copy link
Member

@sccolbert Are you happy with it ? After merging this, we could tag a 1.0.1 as if this goes into matplotlib they will want a tag.

@sccolbert
Copy link
Member

@MatthieuDartiailh Not quite happy with it yet, because of the memory leak I mentioned above.

@jklymak
Copy link
Contributor Author

jklymak commented Sep 10, 2017

@MatthieuDartiailh No rush on the tag if you and @sccolbert want to accumulate other feedback. 1.0.0 fails matplotlib for python 2.7, but does fine for 3.6, which I think most of the devs are on. The changes I'm making in matplotlib will take a good while to get out of development, and I know it works in 2.7 so failing that test for a while (I'd bet a couple of months at least) should be fine.

@jklymak
Copy link
Contributor Author

jklymak commented Sep 10, 2017

BTW, thanks for fixing my python/C/Unicode ignorance. Its been a long time since I coded in C, I'm mostly just a hacker in python, and I have only a vague understanding that encoding text is a pain.

FWIW, the matplotlibwork I'm doing just needs ASCII (users aren't exposed to the strings), but I can imagine other users might try unicode.

@jklymak
Copy link
Contributor Author

jklymak commented Sep 10, 2017

BTW this works fine w/ my test. Not sure if you want to add that test. There are probably a good chunk of py2/3 projects that use the unicode future...

@MatthieuDartiailh
Copy link
Member

Now that I am done messing up the C code .... I will add a test by simply forcing a string to be unicode using the u prefix.

@MatthieuDartiailh
Copy link
Member

@jklymak you forgot the setName method of variable. I fixed that and updated the tests. @sccolbert is this now good to go for you ?

@MatthieuDartiailh
Copy link
Member

Fixed the possible segfault under Python 2

@MatthieuDartiailh
Copy link
Member

This is ready for review.

@jklymak
Copy link
Contributor Author

jklymak commented Sep 11, 2017

@MatthieuDartiailh Would you like me to rebase this so it is just one commit? matplotlib often asks for this just to keep the commit tree clean. I think it makes blame a lot easier if its one commit per pull request merge). OTOH if you want the history above to make perfect sense, then rebasing is a less good idea....

@jklymak
Copy link
Contributor Author

jklymak commented Sep 11, 2017

@MatthieuDartiailh Not having managed a codebase before I didn't know that existed. Very useful! I'll let you guys deal with it on merge.

py/util.h Outdated
{
ascii_str = PyUnicode_AsASCIIString( value );
if( !ascii_str )
return 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be return false;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block should probably use the UTF8 encoding just like we do for py3. You can make use of a smart pointer so you don't need to manage the ref count manually. I would remove the PyObject* ascii_str; declaration, and use a block like this:

          PythonHelpers::PyObjectPtr py_str( PyUnicode_AsUTF8String( value ) );
          if( !py_str )
              return false;
          str = PyString_AS_STRING( py_str.get() );

py/util.h Outdated
inline bool
convert_to_strength( PyObject* value, double& out )
{
std::string str;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable declaration does not need to be hoisted.

py/util.h Outdated
if( PyUnicode_Check( value ) )
{
std::string str( PyUnicode_AsUTF8( value ) );
str = PyUnicode_AsUTF8( value );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line did not need to be changed.


if( PyString_Check( value ) | PyUnicode_Check( value ))
{
std::string str( PyString_AS_STRING( value ) );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be changed to std::string str;

py/util.h Outdated
str = PyUnicode_AsUTF8( value );
#else
if( PyString_Check( value ) )
PyObject* ascii_str;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed. See longer comment below.

py/variable.cpp Outdated
if( !ascii_str )
return 0;
str = PyString_AS_STRING( ascii_str );
Py_DECREF( ascii_str );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block can be done the same way as above with a smart pointer.

assert c.strength() == getattr(strength, s)

if sys.version_info < (3,):
with pytest.raises(UnicodeEncodeError):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in py2, folks can very often accidentally use a mixture of Unicode and plain strings. It would be best if that was allowed.

Ideally, you convert all py2 strings into unicode at the calling boundary, and then expect everything to be unicode internally.

I haven't dug into your C++ code (why in the world are you hand-writing the bindings??? -- Cython would make this a lot easier) -- but a little utility conversion function should be fairly easy to write and drop in everywhere a string is expected.

It looks like this is testing for non-ascii character -- but if it's allowed in py3, why not py2??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't dug into your C++ code (why in the world are you hand-writing the bindings??? -- Cython would make this a lot easier)

It really doesn't, and this sort of comment is not helpful.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it's not -- sorry.

But in any case, I have now looked at the C++ and it looks like you want a utf-8 encoded std::string from a python string.

In python, you would do this by calling .encode('utf-8') on the object, and that would work on py2 strings and unicode objects, and py3 strings. So maybe you could call the python from C++, but that's awkward, and maybe slow, so you could instead call:

PyString_Encode on py2 strings, and

PyUnicode_AsUTF8String on py2 unicode objects and py3 strings.

In any case, why require ascii on py2 and allow full unicode on py3?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry my mistake. I chose to stay close to the original contributor implementation. I will fix this.

@MatthieuDartiailh
Copy link
Member

After much more tinkering with the C-API I managed to get Python 2 to accept any unicode name for a variable (both at creation and when setting it) and accept unicode for strength and constraint operator.
I added a new helper function in util.h to handle the conversion and avoid code duplication.

@sccolbert
Copy link
Member

This is a really good update, but there's one more place we need to be defensive with the new changes. I'm going to approve my old change request and make a new comment on the new diff.

py/variable.cpp Outdated
std::string c_name;
if( !convert_pystr_to_str(name, c_name) )
return 0;
new( &self->variable ) kiwi::Variable( c_name );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this change causes the function to possible return early after the python variable object has been allocated, we need to use a smart pointer to guard the lifetime of that object. Here's an updated version of this function which does that:

static PyObject*
Variable_new( PyTypeObject* type, PyObject* args, PyObject* kwargs )
{
	static const char *kwlist[] = { "name", "context", 0 };
	PyObject* context = 0;
	PyObject* name = 0;

	if( !PyArg_ParseTupleAndKeywords(
		args, kwargs, "|OO:__new__", const_cast<char**>( kwlist ),
		&name, &context ) )
		return 0;

	PyObjectPtr pyvar( PyType_GenericNew( type, args, kwargs ) );
	if( !pyvar )
		return 0;

	Variable* self = reinterpret_cast<Variable*>( pyvar.get() );
	self->context = xnewref( context );

	if( name != 0 )
	{
#if PY_MAJOR_VERSION >= 3
		if( !PyUnicode_Check( name ) )
			return py_expected_type_fail( name, "unicode" );
#else
		if( !( PyString_Check( name ) | PyUnicode_Check( name ) ) )
		{
			return py_expected_type_fail( name, "str or unicode" );
		}
#endif
		std::string c_name;
		if( !convert_pystr_to_str(name, c_name) )
			return 0;
		new( &self->variable ) kiwi::Variable( c_name );
	}
	else
	{
		new( &self->variable ) kiwi::Variable();
	}

	return pyvar.release();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this.

@sccolbert
Copy link
Member

Thanks!

@sccolbert sccolbert merged commit 1c89ab8 into nucleic:master Sep 12, 2017
@MatthieuDartiailh
Copy link
Member

A squash and merge would have been cleaner for the history ( I tinkered a lot and used Travis for Python 2.7 feedback) but it does not really matter.

@jklymak
Copy link
Contributor Author

jklymak commented Oct 23, 2017

@MatthieuDartiailh @scolbert

Would it be possible to issue a release based on this change? That way I can get #9082 in matplotlib merged.

@MatthieuDartiailh
Copy link
Member

I will try to do one this week. Ping me if I forget.

@MatthieuDartiailh
Copy link
Member

The release is done

@jklymak
Copy link
Contributor Author

jklymak commented Oct 24, 2017

Thanks!

@MatthieuDartiailh
Copy link
Member

In case you need it the conda-forge update is also on its way.

@jklymak jklymak deleted the fixpy27unicode branch October 24, 2017 22:12
@jklymak
Copy link
Contributor Author

jklymak commented Oct 24, 2017

Thanks so much! Seems to be working great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

py27+kiwi doesn't like from __future__ import unicode_literals

4 participants