Negatively updating a Python dict [NOT "key"]

I am looking for a way to update/access a Python dictionary by addressing all keys that do NOT match the key given.

That is, instead of the usual dict[key], I want to do something like dict[!key]. I found a workaround, but figured there must be a better way which I cannot figure out at the moment.

# I have a dictionary of counts
dicti = {"male": 1, "female": 200, "other": 0}

# Problem: I encounter a record (cannot reproduce here) that 
# requires me to add 1 to every key in dicti that is NOT "male", 
# i.e. dicti["female"] and dicti["other"],
# and other keys I might add later

# Here is what I am doing and I don't like it
dicti.update({k: v + 1 for k,v in dicti.items() if k != "male"})

Solution:

If you have to perform this “add to the others” operation more often, and if all the values are numeric, you could also subtract the value from the given key and add the same value to a global offset that counts toward all the values (including that same key). For example, as a wrapper class:

import collections
class Wrapper:
    def __init__(self, **values):
        self.d = collections.Counter(values)
        self.n = 0
    def add(self, key, value):
        self.d[key] += value
    def add_others(self, key, value):
        self.d[key] -= value
        self.n += value
    def get(self, key):
        return self.d[key] + self.n
    def to_dict(self):
        # fold the global offset back in; keep a Counter so that add()
        # still works for keys introduced later
        if self.n != 0:
            self.d = collections.Counter(
                {k: v + self.n for k, v in self.d.items()})
            self.n = 0
        return dict(self.d)

Example:

>>> dicti = Wrapper(**{"male": 1, "female": 200, "other": 0})
>>> dicti.add("male", 2)
>>> dicti.add_others("male", 5)
>>> dicti.get("male")
3
>>> dicti.to_dict()
{'male': 3, 'female': 205, 'other': 5}

The advantage is that both the add and the add_others operations are O(1), and you only apply the global offset when you actually need the values. The to_dict operation is of course still O(n), but the updated dict can be cached and only needs recomputing when add_others has been called again in between.
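
If the operation stays occasional, a simpler in-place variant (a sketch, not part of the original answer; the helper name is made up) keeps the plain dict:

```python
def add_to_others(d, excluded_key, value=1):
    # Increment every entry except the excluded key, in place
    for k in d:
        if k != excluded_key:
            d[k] += value

dicti = {"male": 1, "female": 200, "other": 0}
add_to_others(dicti, "male")
# dicti == {"male": 1, "female": 201, "other": 1}
```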

Create an indicator column based on one column being within +/- 5% of another column

I would like to populate the ‘Indicator’ column based on both charge columns. If ‘Charge1’ is within plus or minus 5% of the ‘Charge2’ value, set the ‘Indicator’ to RFP, otherwise leave it blank (see example below).

ID  Charge1  Charge2  Indicator
1   9.5      10       RFP
2   22       20 
3   41       40       RFP
4   65       80 
5   160      160      RFP
6   315      320      RFP
7   613      640      RFP
8   800      700    
9   759      800    
10  1480     1500     RFP

I tried using a .loc approach, but struggled to establish if ‘Charge1’ was within +/- 5% of ‘Charge2’.

Solution:

In [190]: df.loc[df.eval("Charge2*0.95 <= Charge1 <= Charge2*1.05"), 'Indicator'] = 'RFP'

In [191]: df
Out[191]:
   ID  Charge1  Charge2 Indicator
0   1      9.5       10       RFP
1   2     22.0       20       NaN
2   3     41.0       40       RFP
3   4     65.0       80       NaN
4   5    160.0      160       RFP
5   6    315.0      320       RFP
6   7    613.0      640       RFP
7   8    800.0      700       NaN
8   9    759.0      800       NaN
9  10   1480.0     1500       RFP
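
For comparison, the same rule can be written as a boolean mask (a sketch, not from the original answer; it rebuilds the question's data and fills non-matches with empty strings rather than NaN):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": range(1, 11),
    "Charge1": [9.5, 22, 41, 65, 160, 315, 613, 800, 759, 1480],
    "Charge2": [10, 20, 40, 80, 160, 320, 640, 700, 800, 1500],
})

# True where Charge1 lies within the +/- 5% band around Charge2 (inclusive)
within = df["Charge1"].between(df["Charge2"] * 0.95, df["Charge2"] * 1.05)
df["Indicator"] = np.where(within, "RFP", "")
```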

How to efficiently find the indices of matching elements in two lists

I am working on two large data sets, and my question is as follows.

Suppose I have two lists:

list1 = [A,B,C,D]

list2 = [B,D,A,G]

How can I efficiently find the matching indices, using Python, other than by O(n^2) searching? The result should look like:

matching_index(list1,list2) -> [(0,2),(1,0),(3,1)]

Solution:

Without duplicates

If your objects are hashable and your lists have no duplicates, you can create an inverse index of the first list and then traverse the second list. This traverses each list only once and thus is O(n).

def find_matching_index(list1, list2):
    inverse_index = {element: index for index, element in enumerate(list1)}
    return [(inverse_index[element], index)
            for index, element in enumerate(list2)
            if element in inverse_index]

find_matching_index([1, 2, 3], [3, 2, 1])  # [(2, 0), (1, 1), (0, 2)]

With duplicates

You can extend the previous solution to account for duplicates by keeping track of multiple indices with a set.

def find_matching_index(list1, list2):

    # Create an inverse index whose values are now sets of indices
    inverse_index = {}
    for index, element in enumerate(list1):
        if element not in inverse_index:
            inverse_index[element] = {index}
        else:
            inverse_index[element].add(index)

    # Traverse the second list
    matching_index = []
    for index, element in enumerate(list2):
        # Create one pair per index in the set of the inverse index
        if element in inverse_index:
            matching_index.extend([(x, index) for x in inverse_index[element]])

    return matching_index

find_matching_index([1, 1, 2], [2, 2, 1]) # [(2, 0), (2, 1), (0, 2), (1, 2)]

Unfortunately, this is no longer O(n) in the worst case. Consider the inputs [1, 1] and [1, 1]: the output is [(0, 0), (1, 0), (0, 1), (1, 1)], so the size of the output alone makes the worst case O(n^2).

That said, this solution is still O(n) if there are no duplicates.
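
A more compact variant of the duplicate-handling version (a sketch using collections.defaultdict, not part of the original answer; sorted() makes the output order deterministic):

```python
from collections import defaultdict

def find_matching_index(list1, list2):
    # Map each element of list1 to the set of indices where it occurs
    inverse_index = defaultdict(set)
    for index, element in enumerate(list1):
        inverse_index[element].add(index)
    # Emit one (index_in_list1, index_in_list2) pair per occurrence;
    # elements missing from list1 yield an empty set and thus no pairs
    return [(x, index)
            for index, element in enumerate(list2)
            for x in sorted(inverse_index[element])]
```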

How can I apply a function to itself?

Suppose I have a function f which takes in some variable and returns a variable of the same type. For simplicity, let’s say

def f(x):
    return x/2+1

I’m interested in applying f to itself over and over. Something like f(f(f(...(f(x))...))).

I could do this like

s = f(x)
for i in range(100):
    s = f(s)

But I was wondering if there was a simpler, less verbose way of doing the same thing. I want to avoid for loops (just as a challenge to myself). Is there maybe some way of using map or a similar function to accomplish this?

Solution:

Is there maybe some way of using map or a similar function to accomplish this?

Not map, but reduce. I wouldn’t use it for this, but you could call reduce on an n-item sequence to cause f to be called n times. (In Python 3, reduce lives in functools.) For example:

>>> from functools import reduce
>>> def f(x):
...   return x+1
... 
>>> reduce(lambda n,_: f(n), range(100), 42)
142

Explanation:

  • n is assigned each successive return value of f.
  • _ takes each successive value from range(100). These values are all ignored; all that matters is how many there are.
  • 42 is the starting value.

100 nested calls to f(f(f...(f(42))...)) results in 142.
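
The same idea can be wrapped in a small reusable helper (a sketch; the name iterate is made up):

```python
from functools import reduce  # built in on Python 2, in functools on Python 3

def iterate(f, x, n):
    # Apply f to x, n times: f(f(...f(x)...))
    return reduce(lambda acc, _: f(acc), range(n), x)

def f(x):
    return x + 1

iterate(f, 42, 100)  # 142
```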

make operators overloading less redundant in python?

I’m writing a class overloading the list type. I just wrote this, and I’m wondering whether there is a less redundant way to do it:

class Vector:
    def __mul__(self, other):
        # Vector([1, 2, 3]) * 5 => Vector([5, 10, 15])
        if isinstance(other, int) or isinstance(other, float):
            tmp = list()
            for i in self.l:
                tmp.append(i * other)
            return Vector(tmp)
        raise VectorException("We can only mul a Vector by a scalar")

    def __truediv__(self, other):
        # Vector([1, 2, 3]) / 5 => Vector([0.2, 0.4, 0.6])
        if isinstance(other, int) or isinstance(other, float):
            tmp = list()
            for i in self.l:
                tmp.append(i / other)
            return Vector(tmp)
        raise VectorException("We can only div a Vector by a scalar")

    def __floordiv__(self, other):
        # Vector([1, 2, 3]) // 2 => Vector([0, 1, 1])
        if isinstance(other, int) or isinstance(other, float):
            tmp = list()
            for i in self.l:
                tmp.append(i // other)
            return Vector(tmp)
        raise VectorException("We can only div a Vector by a scalar")
As you can see, every overloaded method is a copy/paste of the previous with just small changes.

Solution:

What you want to do here is dynamically generate the methods. There are multiple ways to do this, from going super-dynamic and creating them on the fly in a metaclass’s __getattribute__ (although that doesn’t work for some special methods—see the docs)
to generating source text to save in a .py file that you can then import. But the simplest solution is to create them in the class definition, something like this:

import operator

def _make_op(op):
    def _op(self, other):
        if isinstance(other, int) or isinstance(other, float):
            tmp = list()
            for i in self.l:
                tmp.append(op(i, other))
            return Vector(tmp)
        raise VectorException("We can only {} a Vector by a scalar".format(
            op.__name__.strip('_')))
    _op.__name__ = op.__name__
    return _op

__mul__ = _make_op(operator.__mul__)
__truediv__ = _make_op(operator.__truediv__)
# and so on

You can get fancier and set _op.__doc__ to an appropriate docstring that you generate (see functools.wraps in the stdlib for some relevant code), and build __rmul__ and __imul__ the same way you build __mul__, and so on. And you can write a metaclass, class decorator, or function generator that wraps up some of the details if you’re going to be doing many variations of the same thing. But this is the basic idea.

The operator.mul, etc., come from the operator module in the stdlib—they’re just trivial functions where operator.__mul__(x, y) basically just calls x * y, and so on, made for when you need to pass around an operator expression as a function.

There are some examples of this kind of code in the stdlib—although far more examples of the related but much simpler __rmul__ = __mul__.

The key here is that there’s no difference between names you create with def and names you create by assigning with =. Either way, __mul__ becomes an attribute of the class, and its value is a function that does what you want.

If you don’t understand how that works, you probably shouldn’t be doing this, and should settle for Ramazan Polat’s answer. It’s not quite as compact, or as efficient, but it’s surely easier to understand.
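
Put together, a minimal runnable version (a sketch: the VectorException class, the __init__, and the stored list attribute l are assumptions filled in from the question's snippet):

```python
import operator

class VectorException(Exception):
    pass

class Vector:
    def __init__(self, l):
        self.l = list(l)

    # At class-definition time this is just a plain function in the
    # class namespace, so we can call it below to build each operator.
    def _make_op(op):
        def _op(self, other):
            if isinstance(other, (int, float)):
                return Vector([op(i, other) for i in self.l])
            raise VectorException("We can only {} a Vector by a scalar"
                                  .format(op.__name__))
        _op.__name__ = '__{}__'.format(op.__name__)
        return _op

    __mul__ = _make_op(operator.mul)
    __truediv__ = _make_op(operator.truediv)
    __floordiv__ = _make_op(operator.floordiv)

(Vector([1, 2, 3]) * 5).l  # [5, 10, 15]
```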

NumPy broadcasting to improve dot-product performance

This is a rather simple operation, but it is repeated millions of times in my actual code and, if possible, I’d like to improve its performance.

import numpy as np

# Initial data array
xx = np.random.uniform(0., 1., (3, 14, 1))
# Coefficients used to modify 'xx'
a, b, c = np.random.uniform(0., 1., 3)

# Operation on 'xx' to obtain the final array 'yy'
yy = xx[0] * a * b + xx[1] * b + xx[2] * c

The last line is the one I’d like to improve. Basically, each term in xx is multiplied by a factor (given by the a, b, c coefficients) and then all terms are added to give a final yy array with the shape (14, 1) vs the shape of the initial xx array (3, 14, 1).

Is it possible to do this via numpy broadcasting?

Solution:

For the first alternative, we could use broadcasted multiplication and then sum along the first axis.

As the second one, we could also bring in matrix multiplication with np.dot, giving us two more approaches in total. Here are the timings for the sample provided in the question –

# Original one
In [81]: %timeit xx[0] * a * b + xx[1] * b + xx[2] * c
100000 loops, best of 3: 5.04 µs per loop

# Proposed alternative #1
In [82]: %timeit (xx * np.array([a*b,b,c])[:,None,None]).sum(0)
100000 loops, best of 3: 4.44 µs per loop

# Proposed alternative #2
In [83]: %timeit np.array([a*b,b,c]).dot(xx[...,0])[:,None]
1000000 loops, best of 3: 1.51 µs per loop
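
As a sanity check (not part of the original answer), both alternatives can be verified against the original expression:

```python
import numpy as np

xx = np.random.uniform(0., 1., (3, 14, 1))
a, b, c = np.random.uniform(0., 1., 3)

# Original expression
yy = xx[0] * a * b + xx[1] * b + xx[2] * c

# Alternative 1: broadcasted multiply, then sum over the first axis
alt1 = (xx * np.array([a * b, b, c])[:, None, None]).sum(0)

# Alternative 2: matrix multiplication after dropping the trailing axis
alt2 = np.array([a * b, b, c]).dot(xx[..., 0])[:, None]

assert np.allclose(yy, alt1) and np.allclose(yy, alt2)
```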

How can I determine the reason for a Python Type Error

I’m currently using a try/except block to treat a particular variable as an iterable when I can, but handle it a different, though correct, manner when it isn’t iterable.

My problem is that a TypeError may be thrown for reasons other than trying to iterate over a non-iterable. My check was to use the message attached to the TypeError to ensure that this was the reason and not something like an unsupported operand.

But messages as a part of exceptions have been deprecated. So, how can I check the reason for my TypeError?

For the sake of completeness, the code I’m using is fairly similar to this:

try:
    deref = [orig[x].value.flatten() for x in y]
except TypeError as ex:
    if "object is not iterable" in ex.message:
        x = y
        deref = [orig[x].value.flatten()]
    else:
        raise

Solution:

Separate the part that throws the exception you’re interested in from the parts that throw unrelated exceptions:

try:
    iterator = iter(y)
except TypeError:
    handle_that()
else:
    do_whatever_with([orig[x].value.flatten() for x in iterator])
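
As a self-contained sketch of the pattern (the lookup helper and plain-dict data are hypothetical stand-ins for the question's orig[x].value.flatten() structure; note that strings are iterable too, so this check treats them as collections of keys):

```python
def lookup(orig, y):
    # If y is iterable, treat it as a collection of keys;
    # otherwise treat it as a single key.
    try:
        iterator = iter(y)
    except TypeError:
        return [orig[y]]
    else:
        return [orig[x] for x in iterator]

data = {0: "a", 1: "b", 2: "c"}
lookup(data, [0, 2])  # ['a', 'c']
lookup(data, 1)       # ['b']
```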

Python – How to pass a method as an argument to call a method from another library

I want to pass a method name as an argument and call that method from another Python file, as follows:

file2.py

def abc():
    return 'success.'

main.py

import file2
def call_method(method_name):
    #Here the method_name passed will be a method to be called from file2.py
    return file2.method_name()

print(call_method(abc))

What I expect is to return success.

If calling a method within the same file (main.py), I notice it works. However, for a case like the one above, where it involves passing an argument to call a method from another file, how can I do that?

Solution:

You can use getattr to get the function from the module using a string like:

import file2
def call_method(method_name):
    return getattr(file2, method_name)()

print(call_method('abc'))
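
Alternatively (a sketch, not part of the original answer), you can pass the function object itself rather than its name, avoiding the string lookup entirely; abc here is a local stand-in for file2.abc:

```python
def abc():
    # Stand-in for file2.abc
    return 'success.'

def call_method(method):
    # 'method' is the function object itself, so no attribute lookup is needed
    return method()

call_method(abc)  # 'success.'
```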

create module function alias by import or assignment

Say I have imported a module using

import m

and now I want an alias to its function, I can use

from m import f as n

or

n = m.f

I think there is no difference; is one preferred over the other?

Solution:

There is no difference, as far as using the object n is concerned.

There is a slight logical difference: the assignment n = m.f requires the name m to already be bound in scope (which import m does), whereas from m import f as n does not bind the module name at all. Either way, the m module still gets loaded into sys.modules.

Using the import statement for this is more commonly seen.
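
A quick demonstration of both points (a sketch, using math as a stand-in for m):

```python
import sys

# Alias via from-import: binds only n1, not the module name
from math import sqrt as n1

# Alias via assignment: requires the module name to be bound first
import math
n2 = math.sqrt

assert n1 is n2               # both names refer to the same function object
assert 'math' in sys.modules  # the module is loaded either way
```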

Index a NumPy array row-wise

Say I have a NumPy array:

>>> X = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
>>> X
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

and an array of indexes that I want to select for each row:

>>> ixs = np.array([[1, 3], [0, 1], [1, 2]])
>>> ixs
array([[1, 3],
       [0, 1],
       [1, 2]])

How do I index the array X so that for every row in X I select the two indices specified in ixs?

So for this case, I want to select element 1 and 3 for the first row, element 0 and 1 for the second row, and so on. The output should be:

array([[2, 4],
       [5, 6],
       [10, 11]])

A slow solution would be something like this:

output = np.array([row[ix] for row, ix in zip(X, ixs)])

however this can get kinda slow for extremely long arrays. Is there a faster way to do this without a loop using NumPy?

EDIT: Some very approximate speed tests on a 2.5K * 1M array (10GB):

np.array([row[ix] for row, ix in zip(X, ixs)]) 0.16s

X[np.arange(len(ixs)), ixs.T].T 0.175s

X.take(ixs+np.arange(0, X.shape[0]*X.shape[1], X.shape[1])[:,None]) 33s

np.fromiter((X[i, j] for i, row in enumerate(ixs) for j in row), dtype=X.dtype).reshape(ixs.shape) 2.4s

Solution:

You can use this:

X[np.arange(len(ixs)), ixs.T].T

See the NumPy reference on advanced indexing for details.
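
Two equivalent formulations that avoid the double transpose (a sketch, not from the original answer); np.take_along_axis requires NumPy 1.15 or later:

```python
import numpy as np

X = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
ixs = np.array([[1, 3], [0, 1], [1, 2]])

# Broadcast row indices of shape (3, 1) against ixs of shape (3, 2)
rows = np.arange(len(ixs))[:, None]
out1 = X[rows, ixs]

# Same result via np.take_along_axis
out2 = np.take_along_axis(X, ixs, axis=1)
# Both give array([[ 2,  4],
#                  [ 5,  6],
#                  [10, 11]])
```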