Active product sales analysis using matplotlib in Python

Matplotlib, a widely used Python plotting library, is well suited to visualizing product sales data. Online businesses analyze sales data to increase revenue and better understand customer behavior, and e-commerce companies in particular use sales and customer data to identify the trends, patterns, and insights that improve sales performance.

Python is a popular programming language for data analysis and visualization. In this article, we use Matplotlib, Pandas, and NumPy to perform an active product sales analysis on synthetic sample sales data.

Sample Sales Data Structure

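A typical sales dataset contains the columns below. The synthetic sample generated in the next section uses these names and includes every column except ADDRESS:

| Column | Description |
|---|---|
| ORDER_NUMBER | Unique identifier for each order |
| PRODUCT_TYPE | Category of the product |
| QUANTITY | Number of items ordered |
| PRICE_EACH | Price per unit |
| ORDER_DATE | Date and time of order placement |
| ADDRESS | Delivery address |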

Data Reading and Processing

First, let's generate a synthetic sales dataset to work with:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create sample sales data
np.random.seed(42)
dates = pd.date_range('2023-01-01', '2023-12-31', freq='D')
sample_dates = np.random.choice(dates, 1000)

sales_data = pd.DataFrame({
    'ORDER_NUMBER': range(1, 1001),
    'PRODUCT_TYPE': np.random.choice(['Electronics', 'Clothing', 'Books', 'Home'], 1000),
    'QUANTITY': np.random.randint(1, 10, 1000),
    'PRICE_EACH': np.random.uniform(10, 500, 1000).round(2),
    'ORDER_DATE': sample_dates
})

print("Sample Sales Data:")
print(sales_data.head())
The output begins with "Sample Sales Data:" followed by the first five rows, showing the ORDER_NUMBER, PRODUCT_TYPE, QUANTITY, PRICE_EACH and ORDER_DATE columns. The exact values depend on the random draws, but every QUANTITY lies between 1 and 9 (randint excludes its upper bound) and every PRICE_EACH lies between 10 and 500.

Data Preprocessing

Next, we convert ORDER_DATE to a datetime type, extract the month and year, and calculate the total sale value of each order:

# Convert ORDER_DATE to datetime and extract month/year
sales_data['ORDER_DATE'] = pd.to_datetime(sales_data['ORDER_DATE'])
sales_data['MONTH'] = sales_data['ORDER_DATE'].dt.month
sales_data['YEAR'] = sales_data['ORDER_DATE'].dt.year
sales_data['TOTAL_SALES'] = sales_data['QUANTITY'] * sales_data['PRICE_EACH']

print("Processed Data:")
print(sales_data[['ORDER_NUMBER', 'MONTH', 'YEAR', 'TOTAL_SALES']].head())
print(f"\nTotal Sales Range: ${sales_data['TOTAL_SALES'].min():.2f} - ${sales_data['TOTAL_SALES'].max():.2f}")
The output lists ORDER_NUMBER, MONTH, YEAR and TOTAL_SALES for the first five rows, followed by the smallest and largest order value. Because QUANTITY ranges from 1 to 9 and PRICE_EACH from 10 to 500, every TOTAL_SALES value lies between $10 and $4,500, and YEAR is always 2023.

Sales Analysis and Visualization

Monthly Sales Trend

Let's plot total sales per month as a line chart:

# Group data by month and calculate total sales
monthly_sales = sales_data.groupby('MONTH')['TOTAL_SALES'].sum().reset_index()

# Create line chart
plt.figure(figsize=(12, 6))
plt.plot(monthly_sales['MONTH'], monthly_sales['TOTAL_SALES'], 
         marker='o', linewidth=2, markersize=8)

plt.title('Monthly Sales Trend', fontsize=16, fontweight='bold')
plt.xlabel('Month', fontsize=12)
plt.ylabel('Total Sales ($)', fontsize=12)
plt.grid(True, alpha=0.3)
plt.xticks(range(1, 13))

# Format y-axis to show currency
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:,.0f}'))
plt.tight_layout()
plt.show()
[A line chart showing monthly sales trends with peaks and valleys across different months]

Product Category Performance

Next, let's find out which product categories generate the most revenue:

# Group by product type and calculate metrics
product_analysis = sales_data.groupby('PRODUCT_TYPE').agg({
    'TOTAL_SALES': 'sum',
    'QUANTITY': 'sum',
    'ORDER_NUMBER': 'count'
}).round(2)

product_analysis.columns = ['Total_Revenue', 'Total_Quantity', 'Order_Count']
product_analysis = product_analysis.sort_values('Total_Revenue', ascending=False)

print("Product Category Performance:")
print(product_analysis)

# Create bar chart
plt.figure(figsize=(10, 6))
bars = plt.bar(product_analysis.index, product_analysis['Total_Revenue'], 
               color=['#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4'])

plt.title('Revenue by Product Category', fontsize=16, fontweight='bold')
plt.xlabel('Product Category', fontsize=12)
plt.ylabel('Total Revenue ($)', fontsize=12)
plt.xticks(rotation=45)

# Add value labels on bars
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height + 1000,
             f'${height:,.0f}', ha='center', va='bottom')

plt.gca().yaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:,.0f}'))
plt.tight_layout()
plt.show()
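The printed table, headed "Product Category Performance:", has one row per category (Books, Clothing, Electronics, Home) with Total_Revenue, Total_Quantity and Order_Count columns, sorted by revenue in descending order. Each category is drawn with equal probability, so each receives roughly a quarter of the 1,000 orders, and the order of the rows changes with the random draws.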

[A bar chart showing revenue comparison across product categories]

Sales Distribution Analysis

Finally, let's combine several views into a single 2x2 dashboard:

# Create subplot dashboard
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))

# 1. Monthly sales trend
ax1.plot(monthly_sales['MONTH'], monthly_sales['TOTAL_SALES'], 
         marker='o', color='#FF6B6B', linewidth=2)
ax1.set_title('Monthly Sales Trend', fontweight='bold')
ax1.set_xlabel('Month')
ax1.set_ylabel('Sales ($)')
ax1.grid(True, alpha=0.3)

# 2. Product category pie chart
ax2.pie(product_analysis['Total_Revenue'], labels=product_analysis.index, 
        autopct='%1.1f%%', startangle=90, colors=['#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4'])
ax2.set_title('Revenue Distribution by Category', fontweight='bold')

# 3. Quantity vs Price scatter
ax3.scatter(sales_data['PRICE_EACH'], sales_data['QUANTITY'], 
           alpha=0.6, color='#45B7D1', s=30)
ax3.set_title('Price vs Quantity Relationship', fontweight='bold')
ax3.set_xlabel('Price Each ($)')
ax3.set_ylabel('Quantity')

# 4. Histogram of individual order values
ax4.hist(sales_data['TOTAL_SALES'], bins=30, color='#96CEB4', alpha=0.7, edgecolor='black')
ax4.set_title('Sales Value Distribution', fontweight='bold')
ax4.set_xlabel('Sale Value ($)')
ax4.set_ylabel('Frequency')

plt.tight_layout()
plt.show()

# Print summary statistics
print("\nSales Summary Statistics:")
print(f"Total Revenue: ${sales_data['TOTAL_SALES'].sum():,.2f}")
print(f"Average Order Value: ${sales_data['TOTAL_SALES'].mean():,.2f}")
print(f"Total Orders: {len(sales_data)}")
print(f"Best Performing Month: {monthly_sales.loc[monthly_sales['TOTAL_SALES'].idxmax(), 'MONTH']}")
[A 2x2 dashboard showing: line chart of monthly trends, pie chart of category distribution, scatter plot of price vs quantity, and histogram of sales distribution]

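Under the "Sales Summary Statistics:" heading, the output reports the total revenue, the average order value, the total number of orders (1000), and the number of the month with the highest revenue. With QUANTITY averaging 5 and PRICE_EACH averaging $255, the average order value should land near 5 × $255 = $1,275; the exact figures depend on the random draws.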

Key Insights

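| Metric | What the charts show | Business impact (on real data) |
|---|---|---|
| Product Performance | The bar and pie charts rank categories by revenue; in this synthetic sample the four categories are equally likely, so their shares stay close to 25% | Focus marketing on top categories |
| Monthly Trends | The line chart shows month-to-month variation; here order dates are drawn uniformly, so the peaks and dips are random rather than seasonal | Plan inventory for peak months |
| Price Distribution | The scatter plot and histogram show a wide range of price points and order values; prices here are uniform between $10 and $500 | Target diverse customer segments |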

Conclusion

Matplotlib combined with Pandas provides powerful tools for sales data analysis. We demonstrated how to visualize monthly trends, analyze product performance, and create comprehensive dashboards. These insights help businesses optimize inventory, target marketing efforts, and improve sales strategies.

Updated on: 2026-03-27T01:05:36+05:30
