json.loads() returns a string

Why is json.loads() returning a string? Here is my code:

import json

d = """{
    "reference": "123432",
    "business_date": "2019-06-18",
    "final_price": 40,
    "products": [
        {
            "quantity": 4,
            "original_price": 10,
            "final_price": 40,
        }
    ]
}"""

j = json.loads(json.dumps(d))
print(type(j))

Output:

<class 'str'>

Shouldn’t it return a JSON object? What change is required here?

Solution:

Two points:

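  1. Your JSON has a trailing comma inside the products entry: "final_price": 40, should be "final_price": 40 (without the comma).
  2. d is already a JSON string, so json.dumps(d) encodes it a second time as a JSON string literal, and json.loads() then simply decodes that literal back into the original str. j should be json.loads(d).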

Output

dict

EDIT

The reasons why you cannot have a trailing comma in a JSON object are explained in this post: Can you use a trailing comma in a JSON object?

Unfortunately the JSON specification does not allow a trailing comma. There are a few browsers that will allow it, but generally you need to worry about all browsers.
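Both points can be checked with a minimal sketch. The document below is a shortened version of the one in the question with the trailing comma removed; the names round_tripped, parsed and trailing_comma_ok are only illustrative.

```python
import json

# Same structure as in the question, with the trailing comma removed.
d = """{
    "reference": "123432",
    "final_price": 40,
    "products": [
        {"quantity": 4, "original_price": 10, "final_price": 40}
    ]
}"""

# dumps() wraps the text in a JSON string literal; loads() unwraps it again,
# so the result is the original str.
round_tripped = json.loads(json.dumps(d))
print(type(round_tripped).__name__)

# Parsing the text directly yields a dict.
parsed = json.loads(d)
print(type(parsed).__name__)

# A trailing comma is rejected by the parser.
try:
    json.loads('{"final_price": 40,}')
    trailing_comma_ok = True
except json.JSONDecodeError:
    trailing_comma_ok = False
print(trailing_comma_ok)
```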

Keyboard shortcuts giving me errors in tkinter

I am trying to create a text editor with python 3 and tkinter. The text editor works great except for when I try to use my keyboard shortcuts. Whenever I use any of the shortcuts, I get an error that says this:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/tkinter/__init__.py", line 1699, in __call__
    return self.func(*args)
TypeError: newFile() takes 0 positional arguments but 1 was given

newFile() can be replaced with copySelected(), selectAll(), or whatever command I am trying to use. This only happens when I am trying to use the key bindings. It works just fine from the menu bar. The weird thing is that when I am cutting, copying, or pasting, I get the error but the operations still work in the app. Here is the key binding code:

textField.bind("<Command-n>", newFile)
textField.bind("<Command-N>", newFile)
textField.bind("<Command-o>", openFile)
textField.bind("<Command-O>", openFile)
textField.bind("<Command-s>", saveFile)
textField.bind("<Command-S>", saveFile)
textField.bind("<Command-n>", newFile)
textField.bind("<Command-n>", newFile)
textField.bind("<Command-z>", undo)
textField.bind("<Command-Z>", undo)
textField.bind("<Command-Shift-z>", redo)
textField.bind("<Command-Shift-Z>", redo)
textField.bind("<Command-x>", cutSelected)
textField.bind("<Command-X>", cutSelected)
textField.bind("<Command-c>", copySelected)
textField.bind("<Command-C>", copySelected)
textField.bind("<Command-v>", paste)
textField.bind("<Command-V>", paste)
textField.bind("<Command-a>", selectAll)
textField.bind("<Command-A>", selectAll)

I am currently testing the code on macOS, but I have already made the code OS-specific so that it will work on Windows and Linux as well. The Windows and Linux code is exactly the same, except that Command is replaced with Control. The error occurs on all three platforms.

Any help is greatly appreciated. Thanks!

Solution:

When you bind a key to the function, tkinter will automatically pass an object to the callback. This object represents the event that caused the callback to be called. It has information such as the widget that received the event, the x and y coordinate of the mouse, and other details unique to the event (mouse button, keyboard character, etc).

When you bind a function to an event, your function must be able to accept this parameter. For example:

def newFile(event):
    ...

Note that this is different than if you call the function via the command attribute of a widget. In that case no event object is passed. If you want to be able to call the function both via a binding and via a command attribute then you can make the parameter optional (and make sure that your function doesn’t actually attempt to use it, since it may not be present):

def newFile(event=None):
    ...
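The two calling conventions can be illustrated without opening a window. This is only a sketch: fake_event is a hypothetical stand-in for the event object tkinter would pass, and the calls list exists purely to record how the function was invoked.

```python
from types import SimpleNamespace

calls = []

def newFile(event=None):
    # event is supplied when a key binding fires and absent for a command= callback.
    calls.append("binding" if event is not None else "command")

# How a menu item configured with command=newFile invokes the function.
newFile()

# How textField.bind("<Command-n>", newFile) invokes it: with one event argument.
fake_event = SimpleNamespace(keysym="n")  # hypothetical stand-in for the tkinter event
newFile(fake_event)

print(calls)
```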

Purpose of multiprocessing.Pool.apply and multiprocessing.Pool.apply_async

See example and execution result below:

#!/usr/bin/env python3.4
from multiprocessing import Pool
import time
import os

def initializer():
    print("In initializer pid is {} ppid is {}".format(os.getpid(),os.getppid()))

def f(x):
    print("In f pid is {} ppid is {}".format(os.getpid(),os.getppid()))
    return x*x

if __name__ == '__main__':
    print("In main pid is {} ppid is {}".format(os.getpid(), os.getppid()))
    with Pool(processes=4, initializer=initializer) as pool:  # start 4 worker processes
        result = pool.apply(f, (10,)) # evaluate "f(10)" in a single process
        print(result)

        #result = pool.apply_async(f, (10,)) # evaluate "f(10)" in a single process
        #print(result.get())

Gives:

$ ./pooleg.py
In main pid is 22783 ppid is 19542
In initializer pid is 22784 ppid is 22783
In initializer pid is 22785 ppid is 22783
In initializer pid is 22787 ppid is 22783
In f pid is 22784 ppid is 22783
In initializer pid is 22786 ppid is 22783
100

As is clear from the output: 4 processes were created but only one of them actually did the work (called f).

Question: Why would I create a pool of > 1 workers and call apply() when the work f is done only by one process ? And same thing for apply_async() because in that case also the work is only done by one worker.

I don’t understand the use cases in which these functions are useful.

Solution:

First off, both are meant to operate on argument-tuples (single function calls), contrary to the Pool.map variants which operate on iterables. So it’s not an error when you observe only one process used when you call these functions only once.


You would use Pool.apply_async instead of one of the Pool.map versions when you need more fine-grained control over the individual tasks you want to distribute.

The Pool.map versions take an iterable and chunk it into tasks, where every task has the same (mapped) target function.
Pool.apply_async typically isn’t called only once with a pool of >1 workers. Since it’s asynchronous, you can iterate over manually pre-bundled tasks and submit them to several worker processes before any of them has completed. Your task list here can consist of different target functions, as you can see in this answer here. It also allows registering callbacks for results and errors, as in this example.

These properties make Pool.apply_async pretty versatile and a first-choice tool for unusual problem scenarios you cannot get done with one of the Pool.map versions.
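A minimal sketch of that pattern, assuming a hypothetical helper run_tasks and three standard-library callables chosen only because they are easy to verify: every task is submitted before any result is collected, and each task has its own target function.

```python
import math
import operator
from multiprocessing import Pool

def run_tasks(tasks, processes=2):
    """Submit (function, args) pairs with apply_async and collect the results."""
    with Pool(processes=processes) as pool:
        # apply_async returns immediately, so all tasks are queued up front.
        pending = [pool.apply_async(func, args) for func, args in tasks]
        # get() blocks until the corresponding task has finished.
        return [result.get(timeout=30) for result in pending]

if __name__ == "__main__":
    tasks = [
        (math.factorial, (5,)),   # 5!
        (pow, (2, 10)),           # 2 ** 10
        (operator.add, (3, 4)),   # 3 + 4
    ]
    print(run_tasks(tasks))
```

Collecting the AsyncResult objects in a list keeps the results in submission order, regardless of which worker finishes first.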


Pool.apply is indeed not widely useful at first sight (or second). You could use it to synchronize control flow in a scenario where you start up multiple tasks with apply_async first and then have a task which has to be completed before you fire up another round of tasks with apply_async.

Using Pool.apply could also just spare you from creating a single extra Process for an in-between task when you already have a pool that is currently idling.

How to convert list of list into structured dict, Python3

I have a list of lists, the content of which should be read and stored in a structured dictionary.

my_list = [
    ['1', 'a1', 'b1'],
    ['',  'a2', 'b2'],
    ['',  'a3', 'b3'],
    ['2', 'c1', 'd1'],
    ['',  'c2', 'd2']]

The 1st, 2nd, and 3rd columns in each row represent 'id', 'attr1', and 'attr2'. If 'id' in a row is not empty, a new object starts with this 'id'. In the example above, there are two objects. The object with 'id' '1' has 3 elements in both 'attr1' and 'attr2', while the object with 'id' '2' has 2 elements in both 'attr1' and 'attr2'. In my real application, there can be more objects, and each object can have an arbitrary number of elements.

For this particular example, the outcome should be

my_dict = {
    'id': ['1', '2'],
    'attr1': [['a1', 'a2', 'a3'], ['c1', 'c2']],
    'attr2': [['b1', 'b2', 'b3'], ['d1', 'd2']]}

Could you please show me how to write generic and efficient code to achieve this?

Thanks!

Solution:

Just build the appropriate dict in a loop with the right conditions:

d = {f: [] for f in ('id', 'attr1', 'attr2')}

for id, attr1, attr2 in my_list:
    if id:
        d['id'].append(id)
        d['attr1'].append([])
        d['attr2'].append([])
    d['attr1'][-1].append(attr1)
    d['attr2'][-1].append(attr2)

Python / Get unique tokens from a file with an exception

I want to find the number of unique tokens in a file. For this purpose I wrote the below code:

splittedWords = open('output.txt', encoding='windows-1252').read().lower().split()
uniqueValues = set(splittedWords)

print(uniqueValues)

The output.txt file is like this:

Türkiye+Noun ,+Punc terörizm+Noun+Gen ve+Conj kitle+Noun imha+Noun silah+Noun+A3pl+P3sg+Gen küresel+Adj düzey+Noun+Loc olus+Verb+Caus+PastPart+P3sg tehdit+Noun+Gen boyut+Noun+P3sg karsi+Adj+P3sg+Loc ,+Punc tüm+Det ülke+Noun+A3pl+Gen yay+Verb+Pass+Inf2+Gen önle+Verb+Pass+Inf2+P3sg hedef+Noun+A3pl+P3sg+Acc paylas+Verb+PastPart+P3pl ,+Punc daha+Noun güven+Noun+With ve+Conj istikrar+Noun+With bir+Num dünya+Noun düzen+Noun+P3sg için+PostpPCGen birlik+Noun+Loc çaba+Noun göster+Verb+PastPart+P3pl bir+Num asama+Noun+Dat gel+Verb+Pass+Inf2+P3sg+Acc samimi+Adj ol+Verb+ByDoingSo arzula+Verb+Prog2+Cop .+Punc 
Ab+Noun ile+PostpPCNom gümrük+Noun Alan+Noun+P3sg+Loc+Rel kurumsal+Adj iliski+Noun+A3pl 
club+Noun toplanti+Noun+A3pl+P3sg 
Türkiye+Noun+Gen -+Punc At+Noun gümrük+Noun isbirlik+Noun+P3sg komite+Noun+P3sg ,+Punc Ankara+Noun Anlasma+Noun+P3sg+Gen 6+Num madde+Noun+P3sg uyar+Verb+When ortaklik+Noun rejim+Noun+P3sg+Gen uygula+Verb+Pass+Inf2+P3sg+Acc ve+Conj gelis+Verb+Inf2+P3sg+Acc sagla+Verb+Inf1 üzere+PostpPCNom ortaklik+Noun Konsey+Noun+P3sg+Gen 2+Num /+Punc 69+Num sayili+Adj karar+Noun+P3sg ile+Conj teknik+Noun komite+Noun mahiyet+Noun+P3sg+Loc kur+Verb+Pass+Narr+Cop .+Punc 
nispi+Adj 
nisbi+Adj 
görece+Adj+With 
izafi+Adj 
obur+Adj 

With this code I can get unique tokens like Türkiye+Noun and Türkiye+Noun+Gen. But I want tokens such as Türkiye+Noun and Türkiye+Noun+Gen to count as a single token, using only the part before the + sign, i.e. only the Türkiye part. In the end, Türkiye+Noun and Türkiye+Noun+Gen need to be treated as the same unique token. I think I need to write a regex for this purpose.

Solution:

It seems the word you want is always the first in a list of '+'-joined words.

Split each whitespace-separated word at + and take element 0:

text = """Türkiye+Noun ,+Punc terörizm+Noun+Gen ve+Conj kitle+Noun imha+Noun silah+Noun+A3pl+P3sg+Gen küresel+Adj düzey+Noun+Loc olus+Verb+Caus+PastPart+P3sg tehdit+Noun+Gen boyut+Noun+P3sg karsi+Adj+P3sg+Loc ,+Punc tüm+Det ülke+Noun+A3pl+Gen yay+Verb+Pass+Inf2+Gen önle+Verb+Pass+Inf2+P3sg hedef+Noun+A3pl+P3sg+Acc paylas+Verb+PastPart+P3pl ,+Punc daha+Noun güven+Noun+With ve+Conj istikrar+Noun+With bir+Num dünya+Noun düzen+Noun+P3sg için+PostpPCGen birlik+Noun+Loc çaba+Noun göster+Verb+PastPart+P3pl bir+Num asama+Noun+Dat gel+Verb+Pass+Inf2+P3sg+Acc samimi+Adj ol+Verb+ByDoingSo arzula+Verb+Prog2+Cop .+Punc 
Ab+Noun ile+PostpPCNom gümrük+Noun Alan+Noun+P3sg+Loc+Rel kurumsal+Adj iliski+Noun+A3pl 
club+Noun toplanti+Noun+A3pl+P3sg 
Türkiye+Noun+Gen -+Punc At+Noun gümrük+Noun isbirlik+Noun+P3sg komite+Noun+P3sg ,+Punc Ankara+Noun Anlasma+Noun+P3sg+Gen 6+Num madde+Noun+P3sg uyar+Verb+When ortaklik+Noun rejim+Noun+P3sg+Gen uygula+Verb+Pass+Inf2+P3sg+Acc ve+Conj gelis+Verb+Inf2+P3sg+Acc sagla+Verb+Inf1 üzere+PostpPCNom ortaklik+Noun Konsey+Noun+P3sg+Gen 2+Num /+Punc 69+Num sayili+Adj karar+Noun+P3sg ile+Conj teknik+Noun komite+Noun mahiyet+Noun+P3sg+Loc kur+Verb+Pass+Narr+Cop .+Punc 
nispi+Adj 
nisbi+Adj 
görece+Adj+With 
izafi+Adj 
obur+Adj """

splittedWords = text.lower().replace("\n"," ").split()
uniqueValues = set( ( s.split("+")[0] for s in splittedWords))

print(uniqueValues)

Output:

{'imha', 'çaba', 'ülke', 'arzula', 'terörizm', 'olus', 'daha', 'istikrar', 'küresel', 
 'sagla', 'önle', 'üzere', 'nisbi', 'türkiye', 'gelis', 'bir', 'karar', 'hedef', '2', 
 've', 'silah', 'kur', 'alan', 'club', 'boyut', '-', 'anlasma', 'iliski', 
 'izafi', 'kurumsal', 'karsi', 'ankara', 'ortaklik', 'obur', 'kitle', 'güven', 
 'uygula', 'ol', 'düzey', 'konsey', 'teknik', 'rejim', 'komite', 'gümrük', 'samimi', 
  'gel', 'yay', 'toplanti', '.', 'asama', 'mahiyet', 'ab', '69', 'için', 
 'paylas', '6', '/', 'nispi', 'dünya', 'at', 'sayili', 'görece', 'isbirlik', 'birlik', 
 ',', 'tüm', 'ile', 'düzen', 'uyar', 'göster', 'tehdit', 'madde'}

You might need to do some additional cleanup to remove things like

',' '6' '/'

Split, then remove anything that consists only of digits or punctuation:

from string import digits, punctuation

remove=set(digits+punctuation)

splittedWords = text.lower().split()
uniqueValues = set( ( s.split("+")[0] for s in splittedWords))

# remove from set anything that only consists of numbers or punctuation
uniqueValues = uniqueValues - set ( x for x in uniqueValues if all(c in remove for c in x))
print(uniqueValues)

to get it as:

{'teknik', 'yay', 'göster','hedef', 'terörizm', 'ortaklik','ile', 'daha', 'ol', 'istikrar', 
 'paylas', 'nispi', 'üzere', 'sagla', 'tüm', 'önle', 'asama', 'uygula', 'güven', 'kur', 
 'türkiye', 'gel', 'dünya', 'gelis', 'sayili', 'ab', 'club', 'küresel', 'imha', 'çaba', 
 'olus', 'iliski', 'izafi', 'mahiyet', 've', 'düzey', 'anlasma', 'tehdit', 'bir', 'düzen', 
 'obur', 'samimi', 'boyut', 'ülke', 'arzula', 'rejim', 'gümrük', 'karar', 'at', 'karsi', 
 'nisbi', 'isbirlik', 'alan', 'toplanti', 'ankara', 'birlik', 'kurumsal', 'için', 'kitle', 
 'komite', 'silah', 'görece', 'uyar', 'madde', 'konsey'} 

Python, why is this lambda function not correct?

flight_data is a pandas DataFrame:

  for c in flight_data.columns:
      if ('Delay' in c):
          flight_data[c].fillna(0, inplace = True)

How do I do this in 1 line using lambda function?

map(lambda c: flight_data[c].fillna(0, inplace = True), list(filter(lambda c : 'Delay' in c, flight_data.columns)))

Why aren’t these two equivalent?

When printing out the data, NaN is not replaced by 0.

Solution:

Don’t use lambda

lambda only obfuscates logic here. Just specify in-scope columns and use fillna directly:

cols = df.filter(like='Delay').columns
df[cols] = df[cols].fillna(0)

How do I do this in 1 line using lambda function?

But to answer your question, you can do this without relying on side-effects of map or a list comprehension:

df = df.assign(**df.pipe(lambda x: {c: x[c].fillna(0) for c in x.filter(like='Delay')}))
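As for why the original one-liner had no effect: in Python 3, map returns a lazy iterator, so the lambda is never called unless something consumes the result. The sketch below shows this without pandas; the column names are hypothetical and a plain list stands in for flight_data.columns.

```python
columns = ["ArrDelay", "DepDelay", "Distance"]  # hypothetical column names
touched = []

# Same shape as the one-liner in the question: map over the filtered columns.
lazy = map(lambda c: touched.append(c), filter(lambda c: "Delay" in c, columns))
before = list(touched)
print(before)   # the lambda has not been called yet

list(lazy)      # consuming the iterator finally runs the lambda
print(touched)
```

Wrapping the map call in list() would make the original line run, but it relies on side effects, which is why the explicit column selection above is preferable.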

Find the dictionary from List which has key-pair 'isGeo':True

How do I find the dictionary in a list that has the key-value pair ‘isGeo’: True?

dimensions = [{'key': 2600330, 'id': 'location', 'name': 'Location', 'isGeo': True, 'geoType': 'region'}, {'key': 2600340, 'id': 'subject', 'name': 'Subject', 'isGeo': False, 'geoType': None}, {'key': 2600350, 'id': 'measure', 'name': 'Measure', 'isGeo': False, 'geoType': None}]

I want the result below:

{'key': 2600330, 'id': 'location', 'name': 'Location', 'isGeo': True, 'geoType': 'region'}

Solution:

Use next with a generator expression:

res = next((d for d in dimensions if d['isGeo']), None)

{'key': 2600330, 'id': 'location', 'name': 'Location', 'isGeo': True, 'geoType': 'region'}

Since you tagged pandas, you can also use it:

import pandas as pd

df = pd.DataFrame(dimensions)
res = df.loc[df['isGeo']].iloc[0].to_dict()

The above solutions assume you want only the first dictionary satisfying your condition. If you want a list of dictionaries use:

res = [d for d in dimensions if d['isGeo']]
res = df.loc[df['isGeo']].to_dict('records')

DynamoDB scan not returning desired output

I have a simple python script that is scanning a DynamoDB table. The table holds ARNs for all the accounts I own. There is one primary key “ARNs” of data type string. When I scan the table, I would like to only get the ARN string returned. I am having trouble finding anything in the boto3 documentation that can accomplish this. Below is my code, the returned output, and the desired output.

CODE:

import boto3

dynamo = boto3.client('dynamodb')

# Scans Dynamo for all account role ARNs 
def get_arns():

    response = dynamo.scan(TableName='AllAccountARNs')

    print(response)

get_arns()

OUTPUT:

{'ARNs': {'S': 'arn:aws:iam::xxxxxxx:role/custom_role'}},
{'ARNs': {'S': 'arn:aws:iam::yyyyyyy:role/custom_role'}},
{'ARNs': {'S': 'arn:aws:iam::zzzzzzz:role/custom_role'}}

DESIRED OUTPUT:

arn:aws:iam::xxxxxxx:role/custom_role
arn:aws:iam::yyyyyyy:role/custom_role
arn:aws:iam::zzzzzzz:role/custom_role

Solution:

Here’s an example of how to do this with a boto3 DynamoDB Client:

import boto3

ddb = boto3.client('dynamodb')

rsp = ddb.scan(TableName='AllAccountARNs')

for item in rsp['Items']:
  print(item['ARNs']['S'])

Here’s the same thing, but using a boto3 DynamoDB Table Resource:

import boto3

dynamodb = boto3.resource('dynamodb')
tbl = dynamodb.Table('AllAccountARNs')

rsp = tbl.scan()

for item in rsp['Items']:
  print(item['ARNs'])

Note that these examples do not handle large result sets. If LastEvaluatedKey is present in the response, you will need to paginate the result set. See the boto3 documentation.
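A pagination loop could look roughly like the sketch below. The helper scan_all_items is hypothetical; it accepts any scan callable and keeps calling it with ExclusiveStartKey set to the previous LastEvaluatedKey. The fake_scan function is only a stand-in that returns two pages, so the sketch runs without AWS access.

```python
def scan_all_items(scan, **scan_kwargs):
    """Call scan repeatedly until no LastEvaluatedKey is returned."""
    items = []
    while True:
        rsp = scan(**scan_kwargs)
        items.extend(rsp.get("Items", []))
        last_key = rsp.get("LastEvaluatedKey")
        if last_key is None:
            return items
        # Resume the next scan where the previous page stopped.
        scan_kwargs["ExclusiveStartKey"] = last_key


def fake_scan(ExclusiveStartKey=None):
    # Hypothetical stand-in for a real scan: two pages of one item each.
    pages = {
        None: {"Items": [{"ARNs": "arn-1"}], "LastEvaluatedKey": {"ARNs": "arn-1"}},
        "arn-1": {"Items": [{"ARNs": "arn-2"}]},
    }
    key = ExclusiveStartKey["ARNs"] if ExclusiveStartKey else None
    return pages[key]


print([item["ARNs"] for item in scan_all_items(fake_scan)])
```

With the Table resource from the example above, the call would presumably be scan_all_items(tbl.scan).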

For more information on Client vs. Resource, see here.

Difference between get and dunder getitem

I am reading Fluent Python and trying to get a deeper understanding of dictionaries.

So when I run the below, the results are easy to understand in that both get() and dunder getitem() return the same result

sample = {'a':1, 'b':2}
print(sample.__getitem__('a')) # 1
print(sample.get('a')) # 1

When I subclass dict with get(), I get a working instance

class MyDict(dict):
    def __missing__(self, key):
        return 0

    def get(self, key):
        return self[key]

d = MyDict(sample)
print(d['a']) # 1
print(d['c']) # 0

Now if I replace get() with dunder getitem() I get an error and I am unsure why.

class MyDict2(dict):
    def __missing__(self, key):
        return 0

    def __getitem__(self, key):
        return self[key]

d = MyDict2(sample)
print(d['a'])
print(d['c'])

error

RecursionError: maximum recursion depth exceeded while calling a Python object

So the question is, what is the difference between get and dunder getitem in this situation and why does this cause a recursion error?

Solution:

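That is because self[key] inside MyDict2.__getitem__ is equivalent to (i.e., calls) self.__getitem__(key), so the method keeps calling itself: infinite recursion. In MyDict, get() is a separate method, so self[key] there reaches dict’s built-in __getitem__, which falls back to __missing__ for absent keys.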
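If overriding __getitem__ is really wanted, one way to avoid the recursion is to delegate to the parent class. This is a sketch of one possible fix, not the only one; dict’s own __getitem__ still calls __missing__ for absent keys in a subclass.

```python
class MyDict2(dict):
    def __missing__(self, key):
        return 0

    def __getitem__(self, key):
        # Delegate to dict's implementation instead of evaluating self[key] again.
        return super().__getitem__(key)

d = MyDict2({'a': 1, 'b': 2})
print(d['a'])
print(d['c'])
```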

Get multiple values in an xml file

        <!-- someotherline -->
<add name="core" connectionString="user id=value1;password=value2;Data Source=datasource1.comapany.com;Database=databasename_compny" />

I need to grab the values of user id, password, Data Source, and Database. Not all lines are in the same format. My desired result would be (username=value1, password=value2, DataSource=datasource1.comapany.com, Database=databasename_compny)

The regex for this seems a bit complicated. Please explain your answer if possible.

I realised it’s better to loop through each line. This is the code I have written so far:

while read p || [[ -n $p ]]; do
  #echo $p
  if [[ $p =~ .*connectionString.* ]]; then
    echo $p
  fi
done <a.config

Now, inside the if, I have to grab the values.

Solution:

For this solution I am assuming:

  • Some lines can contain no data
  • No semi-colon ; is inside the data itself (nor field names)
  • No equal sign = is inside the data itself (nor field names)

A possible solution for your problem would be:

#!/bin/bash

while read p || [[ -n $p ]]; do

  # 1. Only keep what is between the quotes after connectionString=
  filteredLine=`echo $p | sed -n -e 's/^.*connectionString="\(.\+\)".*$/\1/p'`;

  # 2. Ignore empty lines (that do not contain the expected data)
  if [ -z "$filteredLine" ]; then
    continue;
  fi;

  # 3. split each field on a line
  oneFieldByLine=`echo $filteredLine | sed -e 's/;/\r\n/g'`;

  # 4. For each field
  while IFS= read -r field; do

    # extract field name + field value
    fieldName=`echo $field | sed 's/=.*$//'`;
    fieldValue=`echo $field | sed 's/^[^=]*=//' | sed 's/[\r\n]//'`;

    # do stuff with it
    echo "'$fieldName' => '$fieldValue'";

  done < <(printf '%s\n' "$oneFieldByLine")

done <a.xml

Explanations

General sed replacement syntax:

  • sed 's/a/b/' replaces whatever matches the regex a with the content of b
  • Step 1

    • The -n argument tells sed not to print lines by default, so only lines where the substitution succeeded (printed by the p flag) are output. This is what discards the lines that carry no data.
    • ^.* – anything at the beginning of the line
    • connectionString=" – literally connectionString="
    • \(.\+\)" – capturing group storing everything before the closing quote
    • .*$ – anything until the end of the line
    • \1 tells sed to replace the whole match with only the capturing group (which contains only the data between the quotes)
    • p tells sed to print out the result of the replacement
  • Step 3

    • Replaces ; with \r\n; this is equivalent to splitting on semicolons, because bash can loop over lines
  • Step 4 – field name

    • Replaces the literal = and the rest of the line with nothing (removing them)
  • Step 4 – field value

    • Replaces everything from the beginning of the field up to and including the first = with nothing ([^=] matches any character except =).
    • A second sed command removes the leftover line-break character by replacing it with nothing.
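For comparison, the same steps can be sketched in Python under the same assumptions (no ; or = inside names or values). The function name parse_connection_strings and the sample list are illustrative only and are not part of the bash solution.

```python
import re

# Step 1: capture what sits between the quotes after connectionString=
CONNECTION_RE = re.compile(r'connectionString="([^"]+)"')

def parse_connection_strings(lines):
    """Yield one dict of field name -> value per line that has a connectionString."""
    for line in lines:
        match = CONNECTION_RE.search(line)
        if match is None:
            continue  # Step 2: skip lines without the expected data
        fields = {}
        for field in match.group(1).split(";"):  # Step 3: one field per item
            if not field.strip():
                continue
            # Step 4: the name is before the first =, the value after it
            name, _, value = field.partition("=")
            fields[name.strip()] = value.strip()
        yield fields

sample = [
    '        <!-- someotherline -->',
    '<add name="core" connectionString="user id=value1;password=value2;'
    'Data Source=datasource1.comapany.com;Database=databasename_compny" />',
]
parsed_lines = list(parse_connection_strings(sample))
print(parsed_lines)
```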