Object comparison

Some things vary greatly from one language to another. After variable scope, let’s look at object comparison: how are objects compared to each other, and how are they evaluated as a condition?

Boolean conversion

The first question is: how are certain objects converted when evaluated as a condition? If you have:

>>> value = ...
>>> True if value else False

What is the result? Under the hood, Python calls “bool(value)” to convert the value to a boolean. In CPython, that evaluation is performed by the function PyObject_IsTrue():

int
PyObject_IsTrue(PyObject *v)
{
    Py_ssize_t res;
    if (v == Py_True)
        return 1;
    if (v == Py_False)
        return 0;
    if (v == Py_None)
        return 0;
    else if (v->ob_type->tp_as_number != NULL &&
             v->ob_type->tp_as_number->nb_bool != NULL)
        res = (*v->ob_type->tp_as_number->nb_bool)(v);
    else if (v->ob_type->tp_as_mapping != NULL &&
             v->ob_type->tp_as_mapping->mp_length != NULL)
        res = (*v->ob_type->tp_as_mapping->mp_length)(v);
    else if (v->ob_type->tp_as_sequence != NULL &&
             v->ob_type->tp_as_sequence->sq_length != NULL)
        res = (*v->ob_type->tp_as_sequence->sq_length)(v);
    else
        return 1;
    /* if it is negative, it should be either -1 or -2 */
    return (res > 0) ? 1 : Py_SAFE_DOWNCAST(res, Py_ssize_t, int);
}

The code tells us the following:

None is interpreted as False (lines 9, 10)
If the type implements number functions, its conversion to boolean nb_bool is used (lines 11-13). The built-in types int and float implement it through long_bool() and float_bool() respectively, which return False if the number is 0 (or 0.0) and True otherwise
Objects whose type implements mapping methods (i.e. length and subscript methods) are interpreted as True if they are non-empty and False if they are empty (lines 14-16)
- The most common such types are lists, tuples, dictionaries, strings and ranges
- Instances whose class defines the method __len__ are considered as implementing mapping methods and use that __len__ to determine the result (False if it returns 0, True otherwise)
Objects whose type implements sequence methods and defines a length are interpreted as True if they are non-empty and False if they are empty (lines 17-19)
Any other type is interpreted as True (line 21)

In other words, None, zero, and objects whose length is zero (whether lists, tuples, dictionaries, strings or custom objects) are interpreted as False. Everything else is interpreted as True. Note that in Python 2.x, user classes could define a method __nonzero__; in Python 3 this method has been renamed __bool__, and it is what fills the nb_bool function for user classes.
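The rules above can be checked from Python itself. The sketch below is only an illustration: the classes Basket, AlwaysFalse and Plain are hypothetical names made up for this example, not part of CPython.

```python
class Basket:
    """Hypothetical container: its truthiness follows __len__."""
    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


class AlwaysFalse:
    """Hypothetical class: __bool__ (nb_bool) is checked before __len__."""
    def __bool__(self):
        return False

    def __len__(self):
        return 10


class Plain:
    """Neither __bool__ nor __len__: instances are always true."""


falsy = [None, 0, 0.0, "", [], (), {}, range(0), Basket([]), AlwaysFalse()]
truthy = [1, -1, 0.5, " ", [0], (None,), {"k": 0}, range(3), Basket([0]), Plain()]

# Every value of the first list converts to False, every value of the second to True.
falsy_results = [bool(v) for v in falsy]
truthy_results = [bool(v) for v in truthy]
print(any(falsy_results), all(truthy_results))  # False True
```

Note that a list such as [0] is true because it is non-empty, regardless of what it contains.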

“is” comparison

The other evaluation mechanism is when you compare two objects. This may seem like a no-brainer, but it can hide a few surprises.

First of all, Python has two comparison operators: “is” and “==”. The “is” operator is a pure object identity comparison. Because Python sometimes (but not always) reuses objects of some built-in types, you should be careful with this operator. With strings, for example, using “is” can bring unexpected results:

>>> string2 = "ThisIsATest"
>>> string1 = "ThisIsATest"
>>> string1 is string2
True
>>> string1 = "This is a test"
>>> string2 = "This is a test"
>>> string1 is string2
False

As we have previously seen, strings that look like identifiers (with no space, for instance) are interned and reused in order to avoid duplicates. As a result, you are looking at the same object in line 3 but at two different objects in line 7. Likewise, comparing numbers with “is” returns a different result depending on whether the number is pre-allocated (as CPython does for small integers) or not:

>>> nb = 200
>>> nb is 200
True
>>> nb = 1000
>>> nb is 1000
False
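Because interning and pre-allocation are implementation details, the result of “is” on strings and numbers should not be relied upon. As a hedged illustration, the sketch below builds two equal strings at run time (so that they cannot be shared as compile-time constants) and shows that “==” is the reliable test, while sys.intern() explicitly returns a shared object. The helper name build() is made up for this example.

```python
import sys


def build():
    # Assemble the string at run time to avoid compile-time constant sharing.
    return " ".join(["This", "is", "a", "test"])


s1 = build()
s2 = build()

print(s1 == s2)                          # True: same characters
print(sys.intern(s1) is sys.intern(s2))  # True: interning returns one shared object

# Whether "s1 is s2" holds is an implementation detail, so no result is
# promised here; in CPython the two strings are normally distinct objects.
print(s1 is s2)
```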

“==” comparison

The “==” comparison operator (or rich comparison) is a bit more complex. It relies on the do_richcompare() function, which calls the type method tp_richcompare when it is defined, as is the case for many built-in types. User classes can define their own comparison operator by defining an __eq__ method.

If both objects being compared have a comparison operator defined, Python will in general try the one from the left-hand object first, and fall back on the right-hand one if the first returns NotImplemented. If only one of them has it defined, Python will use that operator.

None compared to itself is True, and False when compared to any other built-in object. This may seem obvious, but a value is not always equal to itself. In JavaScript, “NaN == NaN” returns false (NaN = Not a Number).
A comparison between two lists or two tuples will perform a recursive comparison of all the elements and will return True if all the elements are equal (using the “==” operator of course). This means that two lists do NOT need to be the same object to be considered equal: they just need to have the same elements in the same order.
A comparison between two dictionaries will return True if they have the same key/value pairs, the values being compared using the “==” operator.
User classes can define their own comparison operator by defining an __eq__ method
- If the types of both objects implement __eq__, the method from the left-hand object will be used for the comparison
- If only one of the two types implements __eq__, that method will be used (see example below)
- If neither type implements __eq__, the objects will be considered equal only if they are the same object

>>> class MyClass(object):
...     def __init__(self, nb):
...         self.number = nb
...     def __eq__(self, nb):
...         return self.number == nb
...
>>> obj = MyClass(42)
>>> obj == 41
False
>>> obj == 42
True
>>> 42 == obj
True
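The last comparison above, “42 == obj”, works because int.__eq__ does not know how to handle a MyClass instance and returns NotImplemented, after which Python tries MyClass.__eq__ with the operands swapped. The sketch below makes that fallback visible and also exercises the container rules; the class names Celsius and Untyped are hypothetical.

```python
class Celsius:
    """Hypothetical class that can compare itself with numbers and other Celsius."""
    def __init__(self, degrees):
        self.degrees = degrees

    def __eq__(self, other):
        if isinstance(other, Celsius):
            return self.degrees == other.degrees
        if isinstance(other, (int, float)):
            return self.degrees == other
        # Tell Python we cannot decide, so that it can try the other operand.
        return NotImplemented


class Untyped:
    """No __eq__ defined: instances are compared by identity."""


t = Celsius(42)
print(t == 42)          # True: Celsius.__eq__ is called directly
print(42 == t)          # True: int.__eq__ gives up, Celsius.__eq__ is tried
print((42).__eq__(t))   # NotImplemented
print(t == "42")        # False: both sides give up, identity is the last resort

a, b = Untyped(), Untyped()
print(a == a, a == b)   # True False

# Containers compare their elements with "==", recursively.
print([1, [2, t]] == [1, [2, Celsius(42)]])  # True
print({"k": t} == {"k": 42})                 # True
```

Returning NotImplemented instead of False for unknown types is what lets the other operand have its say.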

In general, two objects of different built-in types are not equal to each other by default (numeric types aside: “1 == 1.0” returns True). Two notable cases are True and False, which are considered equal to 1 and 0 respectively. Even though a number like 3 evaluates to True as a condition, “3 == True” will return False whereas “1 == True” will return True. Under the hood, True and False are implemented as the numbers one and zero, albeit as different objects than the actual numbers 1 and 0. As a result, they behave as 1 and 0 when combined with numbers, whether the operation is a comparison, an ordering, an arithmetic or binary operator, etc. (see Objects/boolobject.c).

/* The objects representing bool values False and True */

struct _longobject _Py_FalseStruct = {
    PyVarObject_HEAD_INIT(&PyBool_Type, 0)
    { 0 }
};

struct _longobject _Py_TrueStruct = {
    PyVarObject_HEAD_INIT(&PyBool_Type, 1)
    { 1 }
};
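From the Python side, this implementation shows up as bool being a subclass of int. The short check below is a sketch of what can be observed; the variable name one is only there to avoid using “is” against a literal.

```python
print(issubclass(bool, int))     # True
print(True == 1, False == 0)     # True True
print(3 == True)                 # False, even though bool(3) is True
print(True + True)               # 2
print(sum([True, False, True]))  # 2, handy for counting matches

one = 1
print(True is one)               # False: equal values, distinct objects
```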

Which comparison operator to choose?

In most cases, the “==” operator is preferred, in particular when dealing with numbers or strings. There are however cases where the “is” operator can come in handy.

A first case is when you want to test whether you are dealing with a particular list.

>>> list_ref = [1, 2, 3]
>>> def add_elt(elt):
...     global list_ref
...     if elt is not list_ref:
...         list_ref.append(elt)
...     return list_ref
...
>>> add_elt(list_ref)
[1, 2, 3]
>>> add_elt([1, 2, 3])
[1, 2, 3, [1, 2, 3]]

In the above case, we use “is not” to make sure that “list_ref” does not end up containing itself. Using “if elt != list_ref” could work if the function did not mutate anything and just returned “list_ref + [elt]”: who cares whether we are passing “list_ref” itself or an exact copy as the function argument? But because the function does mutate “list_ref”, which object is actually passed does matter.
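To make the difference concrete, here is a small sketch contrasting the two checks. The function names add_by_identity() and add_by_equality() are hypothetical, and each one receives its list as an argument instead of using a global, purely to keep the example self-contained.

```python
def add_by_identity(target, elt):
    # Refuse only the list itself, so that it never contains itself.
    if elt is not target:
        target.append(elt)
    return target


def add_by_equality(target, elt):
    # Also refuses any other list that merely has the same contents.
    if elt != target:
        target.append(elt)
    return target


ref1 = [1, 2, 3]
print(add_by_identity(ref1, ref1))       # [1, 2, 3]
print(add_by_identity(ref1, [1, 2, 3]))  # [1, 2, 3, [1, 2, 3]]

ref2 = [1, 2, 3]
print(add_by_equality(ref2, ref2))       # [1, 2, 3]
print(add_by_equality(ref2, [1, 2, 3]))  # [1, 2, 3]: the equal copy was rejected
```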

Another use is when a function can return zero or False, each value having a different meaning. In PHP this is the case of strpos() (the equivalent of str.find()), which forces callers to use “=== false” (the closest PHP equivalent of the operator “is”) to tell apart a substring found at the beginning of the string (so at offset 0) from a substring that is not found at all.

Although str.find() does not work that way in Python (it returns -1 when the substring is not found), some functions may. This is where the “is” operator comes in handy. The values None, True and False are indeed singleton objects (i.e. there is only one instance of each). Using the “is” operator helps check unambiguously that a value is False itself, and not just something that evaluates to False. Let’s define a function strpos() that behaves just like in PHP and try to come up with a test that successfully checks whether a string is contained in another. For example, strpos(“This is a test”, “This”) should return the number 0 whereas strpos(“This is a test”, “No”) should return False.

>>> def strpos(string, substring):
...     res = string.find(substring)
...     return False if res < 0 else res
...
>>> True if strpos('This is a test', 'This') else False
False   # wrong result, it should be True
>>> not (strpos('This is a test', 'This') == False)
False   # wrong result, it should be True
>>> strpos('This is a test', 'This') >= 0
True    # right result...
>>> strpos('This is a test', 'No') >= 0
True    # ... except it always returns True

As we can see, comparing the result with False or a number (the two results returned by the function) using the “==” or “>=” operators does not work. Let’s now try with the operator “is”:

>>> not(strpos('This is a test', 'This') is False)
True
>>> strpos('This is a test', 'This') is not False
True
>>> strpos('This is a test', 'No') is not False
False

Using “is not False” is not only accurate, but is also very human-readable.
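As a side note, a function written for Python from scratch would more commonly signal “not found” with None than with False, and the same identity test applies since None is also a singleton. The sketch below assumes a hypothetical helper find_offset(); it is not a standard library function.

```python
def find_offset(string, substring):
    """Hypothetical helper: return the offset of substring, or None if absent."""
    res = string.find(substring)
    return None if res < 0 else res


print(find_offset("This is a test", "This"))              # 0
print(find_offset("This is a test", "This") is not None)  # True
print(find_offset("This is a test", "No") is not None)    # False
```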


Yet Another Python Internals Blog
