How to make all of the permutations of a password for brute force?

So I was trying to make a program that brute forces passwords.

Firstly, I made a program for a password of length 1:

password = input('What is your password?\n')
chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'

def brute_force():
    for char in chars:
        if char == password:
            return char

print(brute_force())

Then I edited it for a password of length 2:

def brute_force():
    guess = [None, None]
    for char in chars:
        guess[0] = char
        for char2 in chars:
            guess[1] = char2
            if ''.join(guess) == password:
                return ''.join(guess)

Finally I did the same for a password of length 3:

def brute_force():
    guess = [None, None, None]
    for char in chars:
        guess[0] = char
        for char2 in chars:
            guess[1] = char2
            for char3 in chars:
                guess[2] = char3
                if ''.join(guess) == password:
                    return ''.join(guess)

How could I generalize this for a variable called length, which contains the integer length of the password?

Solution:

You can use the following recursive function:

def brute_force(string, length, goal):
    if length == 1:
        for c in chars:
            if string + c == goal:
                return string + c
        return False
    for c in chars:
        s = brute_force(string + c, length - 1, goal)
        if s:
            return s
    return False

which you can call with syntax like:

>>> brute_force('', 3, 'bob')
'bob'
>>> brute_force('', 2, 'yo')
'yo'

Why does this work?

We always call the function with three arguments: string, length and goal. The variable string holds the current guess up to this point, so in the first example (searching for 'bob'), string will hold partial guesses such as 'ab' or 'bo'.

The next variable, length, holds how many characters still have to be added before the string reaches the right length.

The last variable, goal, is the correct password, which we simply pass through to each call and compare against.

In the main body of the function, we first check the case where length is 1. This is the case where our string is already one character short of the goal's length and we just want to check the final character. To do so, we loop through chars and, for each character c, add (concatenate) it to the end of the current string and compare the result against our goal.

If it matches, we return the result of that concatenation; otherwise we keep looping and check the next character. If we reach the end of the for-loop without a match, we return False. Returning either the solution or False tells the function that called us (the call above in the stack) whether we found the right password.

We have now finished the case where length = 1 and now need to handle the other cases.

In these cases, the aim is to do exactly what we do in the case where length is one, but instead of comparing the result of the concatenation with the goal, we want to instead call the function again (I know it’s slightly confusing) with the result of this concatenation. You will notice though that we also have the other two variables to worry about: length and goal.

Well, to handle these, we just need to think about what the next call needs to know. It already has the string, since that is the result of concatenating the next character from chars. The length is one less, because we just added one character to the string. And the goal is clearly the same, since we are still searching for the same password.

Now that we have called this function, it will run through subtracting one from the length at each of the subsequent calls it makes until it eventually reaches the case where length == 1. And we are at the easy case again and already know what to do!

So, after calling it, the function will return one of two things, either False indicating that the last node did not find the password (so this would occur in the case where something like ab reached the end in our search for bob so returned False after no solution was found), or, the call could return the actual solution.

Handling these cases is simple: if we got the actual solution, we just return it up the chain; if we got a fail (False), we move on to the next character in the loop. Only once every character has failed do we return False, which tells the node above us that we did not succeed so that it can continue its own search.

So now, we just need to know how to call the function and that is simple, we just need to send in an empty string and a target length and goal value and let the recursion take place.
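For comparison, the same search can be written without recursion using itertools.product from the standard library, which generates every combination of length characters in the same order as the nested loops in the question. This is a swapped-in technique, not the recursive method above; chars is the alphabet from the question and the name brute_force_product is just for illustration.

```python
import itertools

chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'

def brute_force_product(length, goal):
    # product(chars, repeat=length) yields tuples ('a', 'a', 'a'), ('a', 'a', 'b'), ...
    # in the same order as `length` nested for-loops over chars
    for combo in itertools.product(chars, repeat=length):
        guess = ''.join(combo)
        if guess == goal:
            return guess
    return False  # every combination tried, none matched

print(brute_force_product(3, 'bob'))  # bob
```

Either way, the search space grows as len(chars) ** length (62 ** length here), so long passwords quickly become impractical.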


One last note: if you want this to be even neater, you could change the function definition to:

def brute_force(length, goal, string=''):
    ...

and change the recursive call inside it to match. This way, you could call the function with something like brute_force(3, 'bob') and wouldn't need to specify what string should start as. This is just something you can add if you want; it isn't necessary for the function to work.
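A complete sketch of that neater signature, with the recursive call updated to the new argument order; it assumes the same chars alphabet as in the question:

```python
chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'

def brute_force(length, goal, string=''):
    # Base case: one character left to add, so test each candidate directly
    if length == 1:
        for c in chars:
            if string + c == goal:
                return string + c
        return False
    # Recursive case: extend the current guess and let the next call finish it
    for c in chars:
        s = brute_force(length - 1, goal, string + c)
        if s:
            return s
    return False

print(brute_force(3, 'bob'))  # bob
```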

Concatenate string literals to generate variable name

Question

In python, what is the shortest/easiest way to create a function which allows me to access a variable by concatenating two string literals to generate the variable’s name?


Background

In C, I can do something like so:

#define CONCAT(x,y) x ## _ ## y

Then, later in my code, I could do something like:

int i = CONCAT(PRODUCT,BANANA);

Assuming a macro exists with the name PRODUCT_BANANA, its value is now assigned to i. I can accomplish something similar in shell scripts via indirection.


Question – Redux

How can I accomplish this same functionality in Python? I'm doing this because I have a Python class with thousands of variables for dozens of different products, e.g.:

class A(object):
    BANANA_ADDRESS0 = 0xABCD
    PINEAPPLE_ADDRESS0 = 0x1234
    BANANA_ADDRESS1 = 0x4567
    PINEAPPLE_ADDRESS1 = 0x1000

I’d like to be able to have a function that can be, for example, executed via someFunc("BANANA", "ADDRESS0"), resolve the value as A.BANANA_ADDRESS0, and return the associated value (0xABCD, in this case).


Extra

Assuming the above is possible, is it possible to have the function always interpret the supplied function arguments as string literals, so function calls don’t need the arguments wrapped in single/double quotes? i.e. so it can be called via someFunc(BANANA, ADDRESS0), rather than someFunc("BANANA", "ADDRESS0")?

Solution:

The first part is easy:

class A(object):
    BANANA_ADDRESS0 = 0xABCD
    PINEAPPLE_ADDRESS0 = 0x1234
    BANANA_ADDRESS1 = 0x4567
    PINEAPPLE_ADDRESS1 = 0x1000

    @classmethod
    def some_func(cls, name_a, name_b):
        name = '{}_{}'.format(name_a, name_b)
        return getattr(cls, name)

value = A.some_func('BANANA', 'ADDRESS1')

But the second part is not possible: a bare name such as BANANA is looked up as a variable, so it must already be defined. If you have a limited set of names, you would also have to define them, e.g.

BANANA = 'BANANA'
PINEAPPLE = 'PINEAPPLE'

and so on.
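Putting both parts together, a sketch of the constants workaround might look like this. BANANA, PINEAPPLE, ADDRESS0 and ADDRESS1 are module-level string constants you would have to define yourself for every name you want to use unquoted; the set shown here is illustrative only.

```python
class A(object):
    BANANA_ADDRESS0 = 0xABCD
    PINEAPPLE_ADDRESS0 = 0x1234
    BANANA_ADDRESS1 = 0x4567
    PINEAPPLE_ADDRESS1 = 0x1000

    @classmethod
    def some_func(cls, name_a, name_b):
        # Build the attribute name, e.g. 'BANANA_ADDRESS0', and look it up on the class
        return getattr(cls, '{}_{}'.format(name_a, name_b))

# Hypothetical constants: each bare name must exist and hold its own spelling
BANANA = 'BANANA'
PINEAPPLE = 'PINEAPPLE'
ADDRESS0 = 'ADDRESS0'
ADDRESS1 = 'ADDRESS1'

print(hex(A.some_func(BANANA, ADDRESS0)))  # 0xabcd
```

getattr raises AttributeError for a name that does not exist; pass a third argument, as in getattr(cls, name, None), if you would rather get a default.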

Convert 4 one-to-one mapped lists into a list of dicts (python)

I have 4 lists where the elements are one-to-one mapped. There are tens of thousands of elements. I want to create one dict giving the 4 properties for each element, and then I want to put these dicts into a list. (My end goal is to create a pandas DataFrame and save it as an HDF5 file.)

Is there an easy memory-efficient way to do this, perhaps using zip() and dict() instead of a for loop?

As a working example for Python, please consider:

list1 = ['obj1','obj2','obj3']
list2 = ['cat','dog','tree']
list3 = [7,8,9]
list4 = ['red','green','blue']

So the idea is that in the end I want a list of dicts that looks like

[{'obj':'obj1','type':'cat','num':7,'color':'red'}, 
 {'obj':'obj2','type':'dog','num':8,'color':'green'}, 
 {'obj':'obj3','type':'tree','num':9,'color':'blue'}]

Solution:

Since you tagged pandas, you can build a DataFrame and use to_dict with the 'records' orient:

import pandas as pd

pd.DataFrame({'obj':list1,'type':list2,'num':list3,'color':list4}).to_dict('records')
[{'obj': 'obj1', 'type': 'cat', 'num': 7, 'color': 'red'},
 {'obj': 'obj2', 'type': 'dog', 'num': 8, 'color': 'green'},
 {'obj': 'obj3', 'type': 'tree', 'num': 9, 'color': 'blue'}]
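If you'd rather not go through a DataFrame, the zip() and dict() approach mentioned in the question also works in plain Python: zip the four lists together row by row, then zip each row with the key names. The key names below are taken from the desired output in the question.

```python
list1 = ['obj1', 'obj2', 'obj3']
list2 = ['cat', 'dog', 'tree']
list3 = [7, 8, 9]
list4 = ['red', 'green', 'blue']

keys = ('obj', 'type', 'num', 'color')

# zip(list1, ..., list4) yields one tuple per element, e.g. ('obj1', 'cat', 7, 'red');
# dict(zip(keys, row)) pairs each value with its key name
records = [dict(zip(keys, row)) for row in zip(list1, list2, list3, list4)]

print(records[0])  # {'obj': 'obj1', 'type': 'cat', 'num': 7, 'color': 'red'}
```

If the end goal is a DataFrame anyway, pd.DataFrame({'obj': list1, 'type': list2, ...}) builds it directly from the lists, without the intermediate list of dicts.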

Python/Bash – Get filenames with escaped characters

How can I parse filenames with spaces, parentheses etc. into a variable?
For example, from

'Album Artist - Song name (feat Musician) [Year]'

to

'Album\ Artist\ \- Song\ name\ \(feat\ Musician\)\ \[Year\]'

I get the right format with re.escape(filename). However, if I store the result of re.escape in a variable, it seems to get reverted to the initial naming. I know that I could use the "string".replace('x', 'y') method, but it does not seem safe to me.

Does anybody know how I can fix this or work around this problem?
Using Python 3.5.3 btw.

EDIT example code:

>>> import re 
>>> # this is an example list in the format my filenames are stored in
>>> files = ['AA - BB (CC) [DD]', 'EE - FF (GG) [HH]', 'II - JJ (KK) [LL]']
>>> for f in files:
...     print(f)
...
AA - BB (CC) [DD]
EE - FF (GG) [HH]
II - JJ (KK) [LL]
>>> for f in files:
...     print(re.escape(f))
...
AA\ \-\ BB\ \(CC\)\ \[DD\] # desired format
EE\ \-\ FF\ \(GG\)\ \[HH\]
II\ \-\ JJ\ \(KK\)\ \[LL\]
>>> escaped = re.escape(files[0])
>>> escaped
'AA\\ \\-\\ BB\\ \\(CC\\)\\ \\[DD\\]' # actual result
>>>

Solution:

The underlying problem sounds like you want to pass file names, which may contain characters that would otherwise need escaping, as arguments to another program. I suggest looking at the subprocess module.

Specifically, see frequently used arguments:

args is required for all calls and should be a string, or a sequence of program arguments. Providing a sequence of arguments is generally preferred, as it allows the module to take care of any required escaping and quoting of arguments (e.g. to permit spaces in file names). If passing a single string, either shell must be True (see below) or else the string must simply name the program to be executed without specifying any arguments.

For example:

import subprocess

file_names = [r'AA - BB (CC) [DD]', r'EE - FF (GG) [HH]', r'II - JJ (KK) [LL]']

for file_name in file_names:
    subprocess.call([r'touch', file_name])
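Two further points, as a sketch. First, the doubled backslashes in the question are just how the interactive prompt displays a string's repr: the value stored in escaped contains single backslashes, which print() shows. Second, if you really do need a single command string for a shell (for example with shell=True), shlex.quote from the standard library is a safer choice than re.escape, which is meant for regular expressions rather than shell syntax. The file name is taken from the question.

```python
import re
import shlex

name = 'AA - BB (CC) [DD]'

escaped = re.escape(name)
# The repr doubles each backslash, but the string itself holds single ones
print(repr(escaped))  # 'AA\\ \\-\\ BB\\ \\(CC\\)\\ \\[DD\\]'
print(escaped)        # AA\ \-\ BB\ \(CC\)\ \[DD\]

# shlex.quote wraps the name in single quotes so a POSIX shell treats it as one word
quoted = shlex.quote(name)
print(quoted)  # 'AA - BB (CC) [DD]'

# shlex.split reverses the quoting, giving back the original name as one argument
print(shlex.split('touch ' + quoted))  # ['touch', 'AA - BB (CC) [DD]']
```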

How to combine lists of different lengths repeating elements?

So, say I have two lists:

list1 = [1,2,3]

and

list2 = [4]

and I need to combine them to produce the following output:

list3=[[1,4],[2,4],[3,4]]

itertools doesn't seem to have a method to accomplish this, and the zip function stops when the second list runs out…

I’m sure there’s a one liner out there, but I’m finding too much stuff about similar but not the same problems on here and google.

Thanks for any help!

Solution:

You can iterate over list1 and, for each element, concatenate a one-element list containing it with list2:

list1 = [1,2,3]
list2 = [4]
new_list = [[a]+list2 for a in list1]

Output:

[[1, 4], [2, 4], [3, 4]]

Or, an alternative, although slower, solution using map (in Python 3, map returns an iterator, so wrap it in list() to get a list):

final_list = list(map(lambda x: [x, list2[0]], list1))

Output:

[[1, 4], [2, 4], [3, 4]]
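itertools does in fact cover this case. itertools.product pairs every element of list1 with every element of list2, which for a one-element list2 gives exactly the requested pairs (with a longer list2 it would produce every combination, which may not be what you want). If instead you want list2 repeated alongside list1, zip it with itertools.cycle. A short sketch of both:

```python
import itertools

list1 = [1, 2, 3]
list2 = [4]

# product yields tuples (1, 4), (2, 4), (3, 4); convert each to a list
list3 = [list(pair) for pair in itertools.product(list1, list2)]
print(list3)  # [[1, 4], [2, 4], [3, 4]]

# cycle repeats list2 forever, so zip stops only when list1 runs out
list3_cycled = [[a, b] for a, b in zip(list1, itertools.cycle(list2))]
print(list3_cycled)  # [[1, 4], [2, 4], [3, 4]]
```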

Edit boolean and operator

So I've been messing around with the standard operators in classes to try and see what I can make, but I haven't been able to find out how to change the boolean and operator.

I can change the bitwise & operator by defining __and__(self, other), but not the way that and behaves. Does anyone know how I can change the behavior of a and b where a and b are instances of the class I'm making?

Thanks in advance!

Solution:

In Python 2, and and or access __nonzero__:

>>> class Test(object):
...     def __nonzero__(self):
...         print '__nonzero__ called'
...         return True
... 
>>> Test() and 1
__nonzero__ called
1

In Python 3, __nonzero__ has been renamed to __bool__.

>>> class Test:
...     def __bool__(self):
...         print('__bool__ called')
...         return True
... 
>>> Test() and 1
__bool__ called
1

Note that short-circuit evaluation might suppress a call to __nonzero__ or __bool__.

>>> 0 and Test()
0
>>> 1 or Test()
1

Another special case to be aware of is that Python falls back to __len__ if __nonzero__ / __bool__ is not defined, and treats the object as truthy if __len__ returns a value other than 0. If both methods are defined, __nonzero__ / __bool__ wins.

>>> class Test:
...     def __len__(self):
...         return 23
... 
>>> Test() and True
True
>>>
>>> class Test:
...     def __len__(self):
...         return 23
...     def __bool__(self):
...         return False
... 
>>> Test() and True
<__main__.Test object at 0x7fc18b5e26d8> # evaluation stops at Test() because the object is falsy
>>> bool(Test())
False

Is there any way I can have this return something other than a bool, like, say, a list of bools?

Unfortunately, no. The documentation states that the method should return False or True, and in fact you get a TypeError if you let it return something else.

>>> class Test:
...     def __bool__(self):
...         return 1
... 
>>> Test() and 42
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __bool__ should return bool, returned int
>>> 
>>> bool(Test())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __bool__ should return bool, returned int
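Since and itself cannot be overloaded (a and b always evaluates to one of its two operands, chosen via __bool__), the usual workaround, and the one NumPy arrays and pandas objects use, is to overload the bitwise & operator with __and__ and have it return whatever you like, such as an element-wise list of bools. The BoolVec class below is a hypothetical minimal sketch:

```python
class BoolVec:
    """Hypothetical container of bools supporting element-wise & and |."""

    def __init__(self, values):
        self.values = [bool(v) for v in values]

    def __and__(self, other):
        # Element-wise "and"; returns a new BoolVec rather than a single bool
        return BoolVec(a and b for a, b in zip(self.values, other.values))

    def __or__(self, other):
        # Element-wise "or"
        return BoolVec(a or b for a, b in zip(self.values, other.values))

    def __bool__(self):
        # Refuse a single truth value (as NumPy arrays do) so `x and y` fails loudly
        raise TypeError('truth value of a BoolVec is ambiguous; use & or |')

    def __repr__(self):
        return 'BoolVec({!r})'.format(self.values)


x = BoolVec([True, False, True])
y = BoolVec([True, True, False])
print(x & y)  # BoolVec([True, False, False])
print(x | y)  # BoolVec([True, True, True])
```

As with NumPy and pandas, & binds more tightly than comparison operators, so expressions mixing the two need parentheses, e.g. (a > 1) & (b < 2).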

4D array from 2D arrays

I have a certain number of NumPy arrays with the shape

print(x.shape)
>>>(256,256)

How can I stack them so that the shape is

print(y.shape)
>>>(certainnumber,256,256,1)

I've been trying with np.stack and np.concatenate, but I only get axis-out-of-bounds errors or results like

print(y.shape)
>>>(anothernumber,256) 

Solution:

You can add an axis argument to np.stack to specify which axis you want to stack along:

import numpy as np

arrs = [np.random.rand(256, 256) for _ in range(11)]
out = np.stack(arrs, axis=0)
out.shape
# (11, 256, 256)

(Note that axis defaults to zero).

If you need a trailing axis of length 1 at the end of the shape, index with np.newaxis:

out[..., np.newaxis].shape
# (11, 256, 256, 1)
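An equivalent sketch using np.expand_dims, or adding the trailing axis to each 2D array before stacking; both give the same 4D result. The 11 random 256x256 arrays just stand in for your data:

```python
import numpy as np

arrs = [np.random.rand(256, 256) for _ in range(11)]

# Option 1: stack along a new first axis, then append a length-1 axis at the end
out1 = np.expand_dims(np.stack(arrs, axis=0), axis=-1)

# Option 2: give each array shape (256, 256, 1) first, then stack
out2 = np.stack([a[..., np.newaxis] for a in arrs], axis=0)

print(out1.shape, out2.shape)  # (11, 256, 256, 1) (11, 256, 256, 1)
```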

Compare elements of one nested list with another nested list

I have two lists of lists:

l1 = [[1,2,3],[4,5,6],[7,8,9]]
l2 = [['a','b',4],['c','d',1],['e','f',12],['i','j',18]]

I would like to iterate over l1 and check whether the first element of each sublist i of l1 matches the third element of any sublist j of l2. If it does, the output should be [1, i[0], j[0]]; otherwise it should be [0, i[0], j[0]]. The output should be a single nested list (or list of tuples) with a result for each element of l1. Both lists can have different sizes.

I tried solving this with for-loop like:

output = list()
for i in l1:
    matched = 0
    for j in l2:
        if j[2] == i[0]:
            output.append([1, i[0], j[0]])
            matched = 1
    if matched == 0:
        output.append([0, i[0]])

This gives the correct output:

[[1, 1, 'c'], [1, 4, 'a'], [0, 7]]

However, I am looking for a more compact solution. Is it possible to solve this with a list comprehension or something similar that reduces the number of lines involved?

I tried a nested list comprehension but couldn’t make it work

out = [[(1,i[0],k[0]) if(k[2] == i[0]) else (0,i[0],k[0]) for k in l2] for i in l1]
print(out)
[[(0, 1, 'a'), (1, 1, 'c'), (0, 1, 'e'), (0, 1, 'i')], [(1, 4, 'a'), (0, 4, 'c'), (0, 4, 'e'), (0, 4, 'i')], [(0, 7, 'a'), (0, 7, 'c'), (0, 7, 'e'), (0, 7, 'i')]]

Solution:

It seems that you're not using all your elements. Anyway, I'd build a dict out of l2 for quick lookup and concision (a one-liner would probably be possible, but at the expense of readability and performance).

I'd follow that with a list comprehension containing a ternary that issues 2 or 3 elements depending on whether the item is found (so there's no need for something fancy like int(a in l2d), since we can issue 0 or 1 directly). Like this:

l1 = [[1,2,3],[4,5,6],[7,8,9]]
l2 = [['a','b',4],['c','d',1],['e','f',12],['i','j',18]]

l2d = {v[2]:v[0] for v in l2}  # not using v[1]

result = [[1,a,l2d[a]] if a in l2d else [0,a] for a,_,_ in l1]  # using only first element of each l1 triplet...

result:

[[1, 1, 'c'], [1, 4, 'a'], [0, 7]]

(Note that carrying the other, unused items around doesn't really help in understanding the issue.)
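One difference from the loop in the question: the dict keeps only the last l2 entry for each key, whereas the loop appends a row for every match. If l2 can contain repeated third elements, a sketch that keeps all matches groups them in lists first (the extra ['x', 'y', 4] row is made up to show the duplicate case):

```python
from collections import defaultdict

l1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
l2 = [['a', 'b', 4], ['c', 'd', 1], ['e', 'f', 12], ['i', 'j', 18], ['x', 'y', 4]]

# Map each third element of l2 to *all* first elements that carry it
l2d = defaultdict(list)
for first, _, key in l2:
    l2d[key].append(first)

result = [
    row
    for a, _, _ in l1
    # one [1, a, name] row per match, or a single [0, a] row when there is none
    for row in ([[1, a, name] for name in l2d[a]] if a in l2d else [[0, a]])
]
print(result)  # [[1, 1, 'c'], [1, 4, 'a'], [1, 4, 'x'], [0, 7]]
```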

pandas multiply using dictionary values across several columns

Given the following dataframe:

import pandas as pd
df = pd.DataFrame({
    'a': [1,2,3,4,5],
    'b': [5,4,3,3,4],
    'c': [3,2,4,3,10],
    'd': [3, 2, 1, 1, 1]
})

And the following dictionary of parameters:

params = {'a': 2.5, 'b': 3.0, 'c': 1.3, 'd': 0.9}

Produce the following desired output:

   a  b   c  d  output
0  1  5   3  3    24.1
1  2  4   2  2    21.4
2  3  3   4  1    22.6
3  4  3   3  1    23.8
4  5  4  10  1    38.4

I have been using this to produce the result:

df['output'] = [np.sum(params[col] * df.loc[idx, col] for col in df)
                 for idx in df.index]

However, this is a very slow approach and I’m thinking there has to be a better way using built-in pandas functionality.

I also thought of this:

# Line up the parameters
col_sort_key = list(df)
params_sorted = sorted(params.items(), key=lambda k: col_sort_key.index(k[0]))

# Repeat the parameters *n* number of times
values = [v for k, v in params_sorted]
values = np.array([values] * df.shape[0])

values
array([[ 2.5,  3. ,  1.3,  0.9],
       [ 2.5,  3. ,  1.3,  0.9],
       [ 2.5,  3. ,  1.3,  0.9],
       [ 2.5,  3. ,  1.3,  0.9],
       [ 2.5,  3. ,  1.3,  0.9]])

# Multiply and add
product = df[col_sort_key].values * values
product
array([[  2.5,  15. ,   3.9,   2.7],
       [  5. ,  12. ,   2.6,   1.8],
       [  7.5,   9. ,   5.2,   0.9],
       [ 10. ,   9. ,   3.9,   0.9],
       [ 12.5,  12. ,  13. ,   0.9]])

np.sum(product, axis=1)
array([ 24.1,  21.4,  22.6,  23.8,  38.4])

But that seems a bit convoluted! Any thoughts on a more native pandas approach?

Solution:

You can use assign + mul + sum:

df1 = df.assign(**params).mul(df).sum(1)
print (df1)
0    24.1
1    21.4
2    22.6
3    23.8
4    38.4
dtype: float64

Or use dot with the Series constructor:

df1 = df.dot(pd.Series(params))
print (df1)
0    24.1
1    21.4
2    22.6
3    23.8
4    38.4
dtype: float64
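To store the result as the output column from the question, assign it back. DataFrame.dot aligns the Series on its labels, so the order of keys in params doesn't matter, but it raises an error if the column labels and the Series index differ. A sketch that copes with a frame having extra columns (the column 'e' here is hypothetical) by selecting just the parameter columns first:

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [5, 4, 3, 3, 4],
    'c': [3, 2, 4, 3, 10],
    'd': [3, 2, 1, 1, 1],
    'e': ['x', 'y', 'z', 'w', 'v'],  # hypothetical extra column, not in params
})
params = {'a': 2.5, 'b': 3.0, 'c': 1.3, 'd': 0.9}

# Restrict to the parameter columns, then take the row-wise weighted sum
df['output'] = df[list(params)].dot(pd.Series(params))
print(df)
```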

Python Pandas: Assign Last Value of DataFrame Group to All Entries of That Group

In Python Pandas, I have a DataFrame. I group this DataFrame by one column and want to assign the last value of another column within each group to all rows of that group.

I know that I am able to select the last row of the group by this command:

import pandas as pd

df = pd.DataFrame({'a': (1,1,2,3,3), 'b':(20,21,30,40,41)})
print(df)
print("-")
result = df.groupby('a').nth(-1)
print(result)

Result:

   a   b
0  1  20
1  1  21
2  2  30
3  3  40
4  3  41
-
    b
a    
1  21
2  30
3  41

How would it be possible to assign the result of this operation back to the original dataframe so that I have something like:

   a   b b_new
0  1  20 21
1  1  21 21
2  2  30 30
3  3  40 41
4  3  41 41

Solution:

Use transform with last:

df['b_new'] = df.groupby('a')['b'].transform('last')

Alternative:

df['b_new'] = df.groupby('a')['b'].transform(lambda x: x.iat[-1])

print(df)
   a   b  b_new
0  1  20     21
1  1  21     21
2  2  30     30
3  3  40     41
4  3  41     41
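One behavioural difference between these two is worth knowing: 'last' returns the last non-missing value in each group, while x.iat[-1] returns whatever is in the final row, even NaN. A small sketch with a made-up NaN to show it (the column names last_valid and last_row are just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2], 'b': [20, np.nan, 30, 31]})

# 'last' skips NaN, so group 1 gets 20.0
df['last_valid'] = df.groupby('a')['b'].transform('last')
# iat[-1] takes the final row as-is, so group 1 gets NaN
df['last_row'] = df.groupby('a')['b'].transform(lambda x: x.iat[-1])
print(df)
```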

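Solution with nth and join (on pandas 2.0 and later, nth behaves as a filter and keeps the original row index rather than the group keys, so use .last() in place of .nth(-1) there):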

df = df.join(df.groupby('a')['b'].nth(-1).rename('b_new'), 'a')
print(df)
   a   b  b_new
0  1  20     21
1  1  21     21
2  2  30     30
3  3  40     41
4  3  41     41

Timings:

import numpy as np
import pandas as pd

N = 10000

df = pd.DataFrame({'a':np.random.randint(1000,size=N),
                   'b':np.random.randint(10000,size=N)})


def f(df):
    return df.join(df.groupby('a')['b'].nth(-1).rename('b_new'), 'a')

#cᴏʟᴅsᴘᴇᴇᴅ1
In [211]: %timeit df['b_new'] = df.a.map(df.groupby('a').b.nth(-1))
100 loops, best of 3: 3.57 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ2
In [212]: %timeit df['b_new'] = df.a.replace(df.groupby('a').b.nth(-1))
10 loops, best of 3: 71.3 ms per loop

#jezrael1
In [213]: %timeit df['b_new'] = df.groupby('a')['b'].transform('last')
1000 loops, best of 3: 1.82 ms per loop

#jezrael2
In [214]: %timeit df['b_new'] = df.groupby('a')['b'].transform(lambda x: x.iat[-1])
10 loops, best of 3: 178 ms per loop

#jezrael3
In [219]: %timeit f(df)
100 loops, best of 3: 3.63 ms per loop

Caveat

These timings use a single group count (up to 1000 groups for 10000 rows). For some of these solutions the number of groups affects performance a lot, so the ranking may differ on your data.