Generate a dataframe with fake data
Python pandas exercise with dataset
The objective of this Python pandas exercise with dataset is to generate a fake Pandas dataframe with 5 columns and 10 rows (1 per client). The dataset will simulate the data of clients of an international stock broker. Your task is to create the table with the following columns:
- “client_id”: each client ID. It is unique. Create the column with the following IDs:
‘VC425621′,’A65455′,’A625592′,’UX740233’ ‘VC567122′,’A560358′,’XZ419261′,’UX752984’,
‘VC331481′,’XZ574391’. - “account”. The account id. The structure of all accounts id is “A”+6 integers.
Generate 1 fake account number for each client. For example: A856939. - “equity”. The number of shares each client has bought. The client who bought the less were 950 shares and the account with maximum shares has 52620.
Assign a random number of shares inside that range. - “ticker”. The symbol of the stock owned by every client. A ticker is a combination of letters and numbers that represent a particular stock or security listed on an exchange.
Assign randomly one of the following to each client:
BKR, COP, CTRA, CVX, DVN, FANG, HAL, HES, MMM, ABT, ACN, AMZN, AXP, KMI, MPC, MRO, PSX, PXD, SLB
AIG, ADI, AAPL, BBY. - “balance”. The cash balance of clients. Assign the following values to each client:
VC425621: 200 000,
A65455: 1560,
A625592: 58960
UX740233: 20100
VC567122: 5689
A560358: 98563
XZ419261: 12356,
UX752984: 2369,
VC331481: 7895,
XZ574391:21540 - “currency”: the currency of the “balance” amount. All clients have it in USD. Assign “USD” to all of them.
- “commissions”: the commissions rate depends on the balance amount. Calculate the rate applied to each client based on the following conditions:
– If balance < 10 000 USD –> 0.15%
– If balance between 10 000 and 50 000USD –>0.05%
– If balance >= 50 0000 USD –>0 - “comm_amount”: The commission amount paid. To calculate it, just multiply “balance” and “commissions”
Expected output
SOLUTIONS
## Import the libraries
import pandas as pd
import numpy as np
import random
## We create the dataframe and name it broker
broker = pd.DataFrame()
## 1) We create a list with clients IDs
client = ['VC425621','A65455','A625592','UX740233', 'VC567122','A560358','XZ419261','UX752984',
'VC331481','XZ574391']
## We create the first column
broker['client_id'] = client
## 2) Define a function to generate a random 6 digits number and concatenate it to letter "A"
def generate_account(broker):
random_digits = random.randint(100001,999999)
return 'A' + str(random_digits)
broker['account'] = broker.apply(generate_account,axis=1)
## 3) Generate 10 random values between the provided range.
broker['equity'] = np.random.randint(950,52620,size=10)
## 4) Assign a random ticker to every account
tikers = ['BKR', 'COP', 'CTRA', 'CVX', 'DVN', 'FANG', 'HAL', 'HES', 'MMM', 'ABT', 'ACN', 'AMZN', 'AXP', 'KMI', 'MPC', 'MRO', 'PSX', 'PXD', 'SLB','AIG', 'ADI', 'AAPL', 'BBY']
broker['ticker'] = np.random.choice(tikers,size=10)
## 5) Create a dictionary to assign the corresponding values to each client
cash = {'VC425621':200000,'A65455': 1560,'A625592': 58960,'UX740233': 20100,'VC567122': 5689,'A560358': 98563,'XZ419261': 12356, 'UX752984': 2369,'VC331481': 7895,'XZ574391':21540}
broker['balance'] = broker.client_id.map(cash)
## 6) Insert USD for all clients
broker['currency'] = 'USD'
## 7) Define a function to implment the commissions rule
def commiss(broker):
if broker['balance']<10000:
return 0.0015
elif broker['balance']>=10000 and broker['balance']< 50000:
return 0.0005
else:
return 0
broker['commissions'] = broker.apply(commiss,axis=1)
## 8) Multiply the commissions and the balance columns
broker['comm_amount'] = broker.balance * broker.commissions
If you wish to keep practicing data wrangling with Pandas, visit Practity’s real projects!