parse string for key, value pairs with a known key delimiter

How can I convert a string to a dict, if key strings are known substrings with definite delimiters? Example:

s = 'k1:text k2: more text k3:andk4: more yet'
key_list = ['k1','k2','k3']
(missing code)
# s_dict = {'k1':'text', 'k2':'more text', 'k3':'andk4: more yet'}  

In this case, keys must be preceded by a space, a newline, or the start of the string, and must be immediately followed by a colon; otherwise they are not parsed as keys. Thus in the example, k1, k2, and k3 are read as keys, while k4 is part of k3's value. I've also stripped surrounding white space, but consider this optional.

Solution:

You can use re.findall to do this:

>>> import re
>>> dict(re.findall(r'(?:(?<=\s)|(?<=^))(\S+?):(.*?)(?=\s[^\s:]+:|$)', s))
{'k1': 'text', 'k2': ' more text', 'k3': 'andk4: more yet'}

The regular expression takes a little trial and error; the breakdown below shows what each piece is doing.

Details

(?:          
   (?<=\s)   # lookbehind for a space 
   |         # regex OR
   (?<=^)    # lookbehind for start-of-line
)     
(\S+?)       # non-greedy match for anything that isn't a space
:            # literal colon
(.*?)        # non-greedy match
(?=          # lookahead (this handles the third key's case)
   \s        # space  
   [^\s:]+   # anything that is not a space or colon
   :         # colon
   |
   $         # end-of-line
)
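Note that the values come back with their leading whitespace intact (' more text'). To also strip whitespace as the question asks, one option is a dict comprehension over the pairs (a small sketch of the same regex):

```python
import re

s = 'k1:text k2: more text k3:andk4: more yet'
pairs = re.findall(r'(?:(?<=\s)|(?<=^))(\S+?):(.*?)(?=\s[^\s:]+:|$)', s)

# strip surrounding whitespace from each value, per the question's note
s_dict = {k: v.strip() for k, v in pairs}
print(s_dict)
# {'k1': 'text', 'k2': 'more text', 'k3': 'andk4: more yet'}
```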

How to count longest uninterrupted sequence in pandas

Let’s say I have a pd.Series like the one below

s = pd.Series([False, True, False,True,True,True,False, False])    

0    False
1     True
2    False
3     True
4     True
5     True
6    False
7    False
dtype: bool

I want to know how long is the longest True sequence, in this example, it is 3.

I tried it in a stupid way.

s_list = s.tolist()
count = 0
max_count = 0
for item in s_list:
    if item:
        count +=1
    else:
        if count>max_count:
            max_count = count
        count = 0
print(max_count)

It will print 3, but for a Series that is all True, it will print 0, since max_count is only updated when a False is encountered.
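That edge case has a one-line fix: take the max once more after the loop, so a run that reaches the end of the list is counted (a sketch of the same loop, not the pandas approach):

```python
s_list = [True, True, True]  # an all-True example that breaks the original loop

count = 0
max_count = 0
for item in s_list:
    if item:
        count += 1
    else:
        max_count = max(max_count, count)
        count = 0
max_count = max(max_count, count)  # account for a run ending at the last element
print(max_count)  # 3
```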

Solution:

Option 1
Use the series itself to mask the cumulative sum of its negation, then use value_counts:

(~s).cumsum()[s].value_counts().max()

3

explanation

  1. (~s).cumsum() is a pretty standard way to produce distinct True/False groups

    0    1
    1    1
    2    2
    3    2
    4    2
    5    2
    6    3
    7    4
    dtype: int64
    
  2. But you can see that the group we care about is represented by the 2s and there are four of them. That’s because the group is initiated by the first False (which becomes True with (~s)). Therefore, we mask this cumulative sum with the boolean mask we started with.

    (~s).cumsum()[s]
    
    1    1
    3    2
    4    2
    5    2
    dtype: int64
    
  3. Now we see the three 2s pop out and we just have to use a method to extract them. I used value_counts and max.


Option 2
Use factorize and bincount

a = s.values
b = pd.factorize((~a).cumsum())[0]
np.bincount(b[a]).max()

3

explanation
This is a similar explanation as for option 1. The main difference is in how I found the max. I use pd.factorize to tokenize the values into integers ranging from 0 to the total number of unique values. Given the actual values we had in (~a).cumsum(), we didn't strictly need this part. I used it because it's a general-purpose tool that could be used on arbitrary group names.

After pd.factorize I use those integer values in np.bincount which accumulates the total number of times each integer is used. Then take the maximum.


Option 3
As stated in the explanation of option 2, this also works:

a = s.values
np.bincount((~a).cumsum()[a]).max()

3
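For completeness, the same answer can be reached without pandas or NumPy at all; a sketch using itertools.groupby, which splits the list into runs of equal values:

```python
from itertools import groupby

s_list = [False, True, False, True, True, True, False, False]

# length of the longest run of True values; default=0 covers the no-True case
longest = max((sum(1 for _ in grp) for val, grp in groupby(s_list) if val),
              default=0)
print(longest)  # 3
```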

How can I store multiple functions as a value of a dictionary?

In the following code I try to store multiple functions as a value of the dictionary. This code doesn’t work: the two functions are stored as a tuple, and a tuple is not callable. I don’t want to iterate over the dictionary; I want to look up a key and have the dictionary run both functions.

from functools import partial

def test_1(arg_1 = None):
     print "printing from test_1 func with text:", arg_1

def test_2(arg_2 = None):
     print "printing from test_2 func with text:", arg_2

dic = {'a':(partial(test_1, arg_1 = 'test_1'),
            partial(test_2, arg_2 = 'test_2'))}

dic['a']()

Solution:

You can build a closure to do that like:

Code:

def chain_funcs(*funcs):
    """return a callable to call multiple functions"""
    def call_funcs(*args, **kwargs):
        for f in funcs:
            f(*args, **kwargs)

    return call_funcs

Test Code:

def test_1(arg_1=None):
    print("printing from test_1 func with text: %s" % arg_1)


def test_2(arg_2=None):
    print("printing from test_2 func with text: %s" % arg_2)


from functools import partial
dic = {'a': chain_funcs(partial(test_1, arg_1='test_1'),
                        partial(test_2, arg_2='test_2'))}

dic['a']()

Results:

printing from test_1 func with text: test_1
printing from test_2 func with text: test_2
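Alternatively, if defining a helper feels heavyweight, the tuple can stay as the value and the iteration can happen at call time; a sketch (the `run` wrapper is a hypothetical name, not part of the question):

```python
from functools import partial

def test_1(arg_1=None):
    print("printing from test_1 func with text: %s" % arg_1)

def test_2(arg_2=None):
    print("printing from test_2 func with text: %s" % arg_2)

# keep the tuple as the value, iterate over it when the key is looked up
dic = {'a': (partial(test_1, arg_1='test_1'),
             partial(test_2, arg_2='test_2'))}

def run(key):
    # call every function stored under the key, collecting return values
    return [f() for f in dic[key]]

run('a')
```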

How can I delete repeated dictionaries in a list?

For dynamic values, sometimes a value keeps repeating. Say I have a variable

table = [
    {'man':'tim','age':'2','h':'5','w':'40'},
    {'man':'jim','age':'4','h':'3','w':'20'},
    {'man':'jon','age':'24','h':'5','w':'80'}, 
    {'man':'tim','age':'2','h':'5','w':'40'},
    {'man':'tto','age':'7','h':'4','w':'49'}    
]

here the dictionary {'man':'tim','age':'2','h':'5','w':'40'} is repeated twice; these are dynamic values.

How can I prevent the repetition, so the list will not contain any repeated dictionaries before rendering it to templates?

edited: actual data

[{'scorecardid': 1, 'progress2': 'preview', 'series2': 'Afghanistan v Zimbabwe in UAE, 2018', 'Commentary1': '/Commentary1', 'commentaryid': 1, 'matchid2': '10', 'matchno2': '5th ODI', 'teams2': 'AFG vs ZIM', 'matchtype2': 'ODI', 'Scorecard1': '/Scorecard1', 'status2': 'Starts on Feb 19 at 10:30 GMT'}, {'six2': '0', 'scorecardid': 2, 'overs5': '4', 'fours1': '0', 'overs10': '20', 'Batting_team_img': 'images/RSA.png', 'wickets20': '5', 'wickets6': '1', 'Bowling_team_img': 'images/IND.png', 'maidens6': '0', 'Batting team': 'RSA', 'matchid2': '9', 'name6': 'Unadkat', 'teams2': 'RSA vs IND', 'wickets10': '9', 'desc10': 'Inns', 'runs5': '32', 'matchtype2': 'T20', 'Scorecard1': '/Scorecard2', 'runs1': '2', 'wickets5': '0', 'runs6': '33', 'runs2': '0', 'maidens5': '0', 'runs20': '203', 'name5': 'Bumrah*', 'progress2': 'complete', 'Commentary1': '/Commentary2', 'fours2': '0', 'series2': 'India tour of South Africa, 2017-18', 'name1': 'Junior Dala*', 'commentaryid': 2, 'matchno2': '1st T20I', 'six1': '0', 'overs6': '4', 'Bowling team': 'IND', 'balls2': '2', 'balls1': '3', 'name2': 'Shamsi', 'overs20': '20', 'runs10': '175', 'desc20': 'Inns', 'status2': 'Ind won by 28 runs'}, {'scorecardid': 3, 'overs5': '0.4', 'fours1': '0', 'overs10': '18.4', 'Batting_team_img': 'images/BAN.png', 'wickets20': '4', 'wickets6': '1', 'Bowling_team_img': 'images/SL.png', 'Batting team': 'BAN', 'matchid2': '6', 'name6': 'Shanaka', 'teams2': 'BAN vs SL', 'wickets10': '10', 'desc10': 'Inns', 'runs5': '3', 'matchtype2': 'T20', 'Scorecard1': '/Scorecard3', 'runs1': '1', 'wickets5': '2', 'runs6': '5', 'maidens5': '0', 'runs20': '210', 'progress2': 'complete', 'Commentary1': '/Commentary3', 'name5': 'Gunathilaka*', 'series2': 'Sri Lanka tour of Bangladesh, 2018', 'name1': 'Nazmul Islam', 'commentaryid': 3, 'matchno2': '2nd T20I', 'six1': '0', 'overs6': '1.5', 'Bowling team': 'SL', 'maidens6': '0', 'balls1': '1', 'overs20': '20', 'runs10': '135', 'desc20': 'Inns', 'status2': 'SL won by 75 runs'}, 
{'six2': '2', 'scorecardid': 4, 'overs5': '4', 'fours1': '1', 'overs10': '20', 'Batting_team_img': 'images/NZ.png', 'wickets20': '7', 'wickets6': '1', 'Bowling_team_img': 'images/ENG.png', 'maidens6': '0', 'Batting team': 'NZ', 'matchid2': '4', 'name6': 'Tom Curran', 'teams2': 'NZ vs ENG', 'wickets10': '4', 'desc10': 'Inns', 'runs5': '41', 'matchtype2': 'T20', 'Scorecard1': '/Scorecard4', 'runs1': '7', 'wickets5': '0', 'runs6': '32', 'runs2': '37', 'maidens5': '0', 'runs20': '194', 'name5': 'Chris Jordan*', 'progress2': 'complete', 'Commentary1': '/Commentary4', 'fours2': '2', 'series2': 'England, Australia, New Zealand T20I Tri-Series, 2018', 'name1': 'de Grandhomme*', 'commentaryid': 4, 'matchno2': '6th Match', 'six1': '0', 'overs6': '3', 'Bowling team': 'ENG', 'balls2': '30', 'balls1': '5', 'name2': 'Chapman', 'overs20': '20', 'runs10': '192', 'desc20': 'Inns', 'status2': 'Eng won by 2 runs'}, {'scorecardid': 5, 'overs5': '7.4', 'fours1': '3', 'runs20': '213', 'six2': '0', 'commentaryid': 5, 'Batting team': 'SAUS', 'matchid2': '18770', 'matchno2': '21st Match', 'wickets10': '3', 'overs10': '49.4', 'matchtype2': 'TEST', 'runs1': '26', 'overs6': '8', 'runs6': '39', 'runs2': '49', 'name1': 'Mennie*', 'name5': 'Daniel Fallins*', 'series2': 'Sheffield Shield, 2017-18', 'Commentary1': '/Commentary5', 'wickets6': '1', 'runs11': '281', 'six1': '0', 'runs10': '192', 'balls1': '58', 'overs11': '74.1', 'maidens5': '1', 'desc21': '1st Inns', 'status2': 'South Aus won by 7 wkts', 'runs5': '51', 'wickets11': '10', 'desc11': '1st Inns', 'desc20': '2nd Inns', 'wickets20': '10', 'wickets21': '10', 'teams2': 'NSW vs SAUS', 'balls2': '85', 'Scorecard1': '/Scorecard5', 'wickets5': '1', 'progress2': 'Result', 'runs21': '256', 'fours2': '6', 'desc10': '2nd Inns', 'name6': 'Stobo', 'maidens6': '1', 'Bowling team': 'NSW', 'name2': 'Ferguson', 'overs20': '68.4', 'overs21': '90.4'}, {'six2': '0', 'scorecardid': 6, 'overs5': '4', 'fours1': '0', 'overs10': '20', 'Batting_team_img': 
'images/RSA.png', 'wickets20': '5', 'wickets6': '1', 'Bowling_team_img': 'images/IND.png', 'maidens6': '0', 'Batting team': 'RSA', 'matchid2': '19166', 'name6': 'Unadkat', 'teams2': 'RSA vs IND', 'wickets10': '9', 'desc10': 'Inns', 'runs5': '32', 'matchtype2': 'T20', 'Scorecard1': '/Scorecard6', 'runs1': '2', 'wickets5': '0', 'runs6': '33', 'runs2': '0', 'maidens5': '0', 'runs20': '203', 'name5': 'Bumrah*', 'progress2': 'Result', 'Commentary1': '/Commentary6', 'fours2': '0', 'series2': 'India tour of South Africa, 2017-18', 'name1': 'Junior Dala*', 'commentaryid': 6, 'matchno2': '1st T20I', 'six1': '0', 'overs6': '4', 'Bowling team': 'IND', 'balls2': '2', 'balls1': '3', 'name2': 'Shamsi', 'overs20': '20', 'runs10': '175', 'desc20': 'Inns', 'status2': 'Ind won by 28 runs'}]

Solution:

Since your records do not appear to have a unique identifier to differentiate records, you will need to hash on all key-value pairs. This approach will work as long as you do not have nested mutable objects inside your dictionaries.

I’ll use an OrderedDict here to maintain order.

from collections import OrderedDict
list(
     map(
         dict, 
         OrderedDict.fromkeys(
             map(frozenset, map(dict.items, table)), None
         )
     )
)

[{'age': '2', 'h': '5', 'man': 'tim', 'w': '40'},
 {'age': '4', 'h': '3', 'man': 'jim', 'w': '20'},
 {'age': '24', 'h': '5', 'man': 'jon', 'w': '80'},
 {'age': '7', 'h': '4', 'man': 'tto', 'w': '49'}]

Here’s what’s going on:

  1. Convert each dictionary to a frozenset of tuples. frozensets are hashable.
  2. Hash each frozenset as a key into an OrderedDict. Duplicates are removed automatically.
  3. Retrieve keys and convert back into a list of dictionaries.

There are many ways to reproduce the algorithm described above. I've used the functional programming tool map, which Python offers.
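The same idea can be written without map by fingerprinting each dictionary with a sorted tuple of its items; a sketch (on Python 3.7+ insertion order is preserved, so a plain set plus list keeps the original ordering without OrderedDict):

```python
table = [
    {'man': 'tim', 'age': '2', 'h': '5', 'w': '40'},
    {'man': 'jim', 'age': '4', 'h': '3', 'w': '20'},
    {'man': 'jon', 'age': '24', 'h': '5', 'w': '80'},
    {'man': 'tim', 'age': '2', 'h': '5', 'w': '40'},
    {'man': 'tto', 'age': '7', 'h': '4', 'w': '49'},
]

seen = set()
deduped = []
for d in table:
    key = tuple(sorted(d.items()))  # hashable, order-insensitive fingerprint
    if key not in seen:
        seen.add(key)
        deduped.append(d)
print(deduped)
```

This keeps the first occurrence of each record, like the OrderedDict approach, and has the same limitation: it fails if the dictionaries contain nested mutable values.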

Count each group sequentially pandas

I have a df that I am grouping by two columns. I want to count each group sequentially, but the code below counts each row within a group sequentially. This seems like it should be easier than I think, but I can't figure it out.

df = pd.DataFrame({
    'Key': ['10003', '10009', '10009', '10009',
            '10009', '10034', '10034', '10034'], 
    'Date1': [20120506, 20120506, 20120506, 20120506,
              20120620, 20120206, 20120206, 20120405],
    'Date2': [20120528, 20120507, 20120615, 20120629,
              20120621, 20120305, 20120506, 20120506]
})


df['Count'] = df.groupby(['Key','Date1']).cumcount() + 1

Anticipated result:

    Date1       Date2       Key    Count
0   20120506    20120528    10003  1
1   20120506    20120507    10009  2
2   20120506    20120615    10009  2
3   20120506    20120629    10009  2
4   20120620    20120621    10009  3
5   20120206    20120305    10034  4
6   20120206    20120506    10034  4
7   20120405    20120506    10034  5

Solution:

You’re looking for groupby + ngroup:

df['Count'] = df.groupby(['Key','Date1']).ngroup() + 1
df

      Date1     Date2    Key  Count
0  20120506  20120528  10003      1
1  20120506  20120507  10009      2
2  20120506  20120615  10009      2
3  20120506  20120629  10009      2
4  20120620  20120621  10009      3
5  20120206  20120305  10034      4
6  20120206  20120506  10034      4
7  20120405  20120506  10034      5

ngroup simply gives each group a sequential label, numbering the groups themselves rather than the rows within them.
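The contrast with the question's cumcount attempt can be seen side by side; a sketch (the 'cumcount' and 'ngroup' column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Key': ['10003', '10009', '10009', '10009',
            '10009', '10034', '10034', '10034'],
    'Date1': [20120506, 20120506, 20120506, 20120506,
              20120620, 20120206, 20120206, 20120405],
})

g = df.groupby(['Key', 'Date1'])
df['cumcount'] = g.cumcount() + 1  # position of each row within its group
df['ngroup'] = g.ngroup() + 1      # sequential label of the group itself
print(df)
```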

Python – Sorting a list item by alphabet in a list of lists, and have other lists follow the swapping order

I am trying to sort a list of lists in Python by the first row (specifically without using NumPy; I know there are many NumPy solutions, but this question asks for a way without it).

Here is my list of lists:

listOfLists = [ ['m', 'e', 'l', 't', 's'],
                ['g', 'p', 's', 'k', 't'],
                ['y', 'q', 'd', 'h', 's'] ]

I am looking to sort this list 1) alphabetically BUT 2) only by the first list item, the vertical slices should just follow the order of the first list item. For example:

newListofLists = [ ['e', 'l', 'm', 's', 't'],
                   ['p', 's', 'g', 't', 'k'],
                   ['q', 'd', 'y', 's', 'h'] ]

The first item in listOfLists is ‘melts’, which is then sorted alphabetically to become ‘elmst’. The rest of the items in the list of list aren’t sorted alphabetically, rather they are ‘following’ the switch and sort pattern of the first item in the list.

I may be being ridiculous but I’ve spent hours on this problem (which forms part of a larger program). I have tried slicing the first item from the list of lists and sorting it alphabetically on its own, then comparing this to a slice of the first list in the list of lists that HASN’T been sorted, and comparing positions. But I just can’t seem to get anything working.

Solution:

You can transpose the list using zip, sort the transposed columns, and then transpose the result back to the original shape.

listOfLists = [ ['m', 'e', 'l', 't', 's'],
                ['g', 'p', 's', 'k', 't'],
                ['y', 'q', 'd', 'h', 's'] ]

print(list(zip(*sorted(zip(*listOfLists)))))
# [('e', 'l', 'm', 's', 't'), ('p', 's', 'g', 't', 'k'), ('q', 'd', 'y', 's', 'h')]

Edit:

As @StevenRumbalski points out in the comments, the above will completely sort the vertical slices (by first letter, then second letter, etc), instead of sorting them stably by first letter (sorting by first letter, then by relative order in the input). I’ll reproduce his solution here for visibility:

from operator import itemgetter
list(map(list, zip(*sorted(zip(*listOfLists), key=itemgetter(0)))))
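The two variants only differ when the first row contains ties; a sketch with a made-up input where 'b' appears twice in the first row:

```python
from operator import itemgetter

# hypothetical input with a tie in the first row ('b' appears twice)
lol = [['b', 'a', 'b'],
       ['z', 'y', 'x']]

full = [list(t) for t in zip(*sorted(zip(*lol)))]
stable = [list(t) for t in zip(*sorted(zip(*lol), key=itemgetter(0)))]

print(full)    # [['a', 'b', 'b'], ['y', 'x', 'z']] - tie broken by second row
print(stable)  # [['a', 'b', 'b'], ['y', 'z', 'x']] - tie keeps input order
```

Python's sort is stable, so with key=itemgetter(0) the two 'b' columns stay in their original relative order.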

.join() function not working in Python

I was wondering if someone could help me figure out why my few lines of code aren't working in Python. I am trying to create my own version of the Battleship game, but I can't seem to get the .join() function to work.

Here is my code:

board = []

for x in range(5):
    board.append(["O"*5])

def print_board (board_in):
    for row in board_in:
        print(" ".join(row))

print_board(board)

However, my output ends up being:

OOOOO
OOOOO
OOOOO
OOOOO
OOOOO

when I feel like it should be:

O O O O O
O O O O O
O O O O O
O O O O O
O O O O O

Any help is appreciated! Thank you!

Solution:

Your problem is here:

board.append(["O" *5 ])

Doing "O" * 5 doesn’t create a list of strings. It simply creates a single string:

>>> "O"*5
'OOOOO'
>>> 

Thus, what you are basically doing when using str.join is:

>>> ' '.join(['OOOOO'])
'OOOOO'
>>> 

This doesn't raise an error; it just returns the string unchanged, because the list passed into str.join has a single element, and str.join only inserts the separator between elements. From the documentation for str.join:

str.join(iterable)

Return a string which is the concatenation of the strings in iterable. A TypeError will be raised if there are any non-string values in iterable, including bytes objects. The separator between elements is the string providing this method.

What you need to do instead is create each row in your board with five 'O's:

board = [["O"] * 5 for _ in range(5)]
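With that one change, each row is a real list of five one-character strings, so join inserts a space between each pair:

```python
# each row is now a list of five separate "O" strings
board = [["O"] * 5 for _ in range(5)]

def print_board(board_in):
    for row in board_in:
        print(" ".join(row))

print_board(board)
# O O O O O  (five rows)
```

The comprehension also matters: writing [["O"] * 5] * 5 would give five references to the same row list, so marking one cell would mark it in every row.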

Regex using increasing sequence of numbers Python

Say I have a string:

teststring =  "1.3 Hello how are you 1.4 I am fine, thanks 1.2 Hi There 1.5 Great!" 

That I would like as:

testlist = ["1.3 Hello how are you", "1.4 I am fine, thanks 1.2 Hi There", "1.5 Great!"]

Basically, splitting only on increasing digits where the difference is .1 (i.e. 1.2 to 1.3).

Is there a way to split this with regex but only capturing increasing sequential numbers? I wrote code in python to sequentially iterate through using a custom re.compile() for each one and it is okay but extremely unwieldy.

Something like this (where parts1_temp is a given list of the x.x. numbers in the string):

parts1_temp = ['1.3','1.4','1.2','1.5']
parts_num =  range(int(parts1_temp.split('.')[1]), int(parts1_temp.split('.')[1])+30)
parts_search = ['.'.join([parts1_temp.split('.')[0], str(parts_num_el)]) for parts_num_el in parts_num]
#parts_search should be ['1.3','1.4','1.5',...,'1.32']

for k in range(len(parts_search)-1):
    rxtemp = re.compile(r"(?:"+str(parts_search[k])+")([\s\S]*?)(?=(?:"+str(parts_search[k+1])+"))", re.MULTILINE)
    parts_fin = [match.group(0) for match in rxtemp.finditer(teststring)]

But man is it ugly. Is there a way to do this more directly in regex? I imagine this is a feature that someone would have wanted at some point with regex, but I can't find any ideas on how to tackle it (and maybe it is not possible with pure regex).

Solution:

This method uses finditer to find all locations of \d+\.\d+, then tests whether each match is numerically greater than the previous one. If the test is true, it appends the match's start index to the indices list.

The last line uses list comprehension as taken from this answer to split the string on those given indices.

Original Method

This method ensures the previous match is smaller than the current one. This doesn't split sequentially; instead, it works based on number size. So assuming a string has the numbers 1.1, 1.2, 1.4, it would split on each occurrence, since each number is larger than the last.

See code in use here

import re

indices = []
string =  "1.3 Hello how are you 1.4 I am fine, thanks 1.2 Hi There 1.5 Great!"
regex = re.compile(r"\d+\.\d+")
lastFloat = 0

for m in regex.finditer(string):
    x = float(m.group())
    if lastFloat < x:
        lastFloat = x
        indices.append(m.start(0))

print([string[i:j] for i,j in zip(indices, indices[1:]+[None])])

Outputs: ['1.3 Hello how are you ', '1.4 I am fine, thanks 1.2 Hi There ', '1.5 Great!']


Edit

Sequential Method

This method is very similar to the original; however, in the case of 1.1, 1.2, 1.4, it wouldn't split on 1.4, since 1.4 doesn't follow sequentially given the .1 sequential separator.

The method below only differs in the if statement, so this logic is fairly customizable to whatever your needs may be.

See code in use here

import re

indices = []
string =  "1.3 Hello how are you 1.4 I am fine, thanks 1.2 Hi There 1.5 Great!"
regex = re.compile(r"\d+\.\d+")
lastFloat = 0

for m in regex.finditer(string):
    x = float(m.group())
    if (lastFloat == 0) or (x == round(lastFloat + .1, 1)):
        lastFloat = x
        indices.append(m.start(0))

print([string[i:j] for i,j in zip(indices, indices[1:]+[None])])

Can I access a nested dict with a list of keys?

I would like to access a dictionary programmatically. I know how to do this with a recursive function, but is there a simpler way?

example = {'a': {'b': 'c'},
           '1': {'2': {'3': {'4': '5'}}}}

keys = ('a', 'b')
example[keys] = 'new'
# Now it should be
#     example = {'a': {'b': 'new'},
#                '1': {'2': {'3': {'4': '5'}}}}


keys = ('1', '2', '3', '4')
example[keys] = 'foo'
# Now it should be
#     example = {'a': {'b': 'new'},
#                '1': {'2': {'3': {'4': 'foo'}}}}


keys = ('1', '2')
example[keys] = 'bar'
# Now it should be
#     example = {'a': {'b': 'new'},
#                '1': {'2': 'bar'}}

Solution:

This solution creates another dictionary with the same keys and then updates the existing dictionary:

#!/usr/bin/env python

from six.moves import reduce


def update2(input_dictionary, new_value, loc):
    """
    Update a dictionary by defining the keys.

    Parameters
    ----------
    input_dictionary : dict
    new_value : object
    loc : iterable
        Location

    Returns
    -------
    new_dict : dict

    Examples
    --------
    >>> example = {'a': {'b': 'c'}, '1': {'2': {'3': {'4': '5'}}}}

    >>> update2(example, 'new', ('a', 'b'))
    {'a': {'b': 'new'}, '1': {'2': {'3': {'4': '5'}}}}

    >>> update2(example, 'foo', ('1', '2', '3', '4'))
    {'a': {'b': 'new'}, '1': {'2': {'3': {'4': 'foo'}}}}

    >>> update2(example, 'bar', ('1', '2'))
    {'a': {'b': 'new'}, '1': {'2': 'bar'}}
    """
    new_dict = reduce(lambda x, y: {y: x}, reversed(loc), new_value)
    input_dictionary.update(new_dict)
    return input_dictionary

if __name__ == '__main__':
    import doctest
    doctest.testmod()

The loc argument accepts any iterable of keys: a string (of single-character keys), a list, or a tuple.
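One caveat with rebuilding the branch via reduce and then calling update: the new branch replaces the whole subtree under the first key, so any sibling keys that existed alongside the path are lost. A sketch that instead walks to the parent dict and assigns the leaf in place, preserving siblings (set_by_path is a hypothetical helper name):

```python
from functools import reduce

def set_by_path(d, keys, value):
    """Set d[k1][k2]...[kn] = value, mutating d in place."""
    parent = reduce(lambda acc, k: acc[k], keys[:-1], d)  # walk to the parent
    parent[keys[-1]] = value

example = {'a': {'b': 'c'},
           '1': {'2': {'3': {'4': '5'}}}}

set_by_path(example, ('a', 'b'), 'new')
set_by_path(example, ('1', '2'), 'bar')  # replaces the nested subtree
print(example)  # {'a': {'b': 'new'}, '1': {'2': 'bar'}}
```

This requires that every intermediate key already exists; for missing intermediate levels, a recursive defaultdict or setdefault chain would be needed instead.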