How to Convert Excel to YAML in Python
Converting Excel data to YAML format is a common task when preparing configuration files or structured data for applications.
In this tutorial, you’ll learn how to convert Excel files to YAML using various Python libraries.
Using csv with PyYAML (Excel to CSV to YAML)
You can start by saving your Excel file as CSV format from Microsoft Excel and then use the csv and yaml libraries to convert it to YAML.
import csv
import yaml
with open('data.csv', 'r', encoding='utf-8') as csv_file:
reader = csv.DictReader(csv_file)
data_list = list(reader)
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)
Output:
Assuming your data.csv looks like this:
Name,Age,City Ahmed,30,Cairo Fatima,25,Alexandria Hassan,35,Giza
The generated data.yaml will be:
- Age: '30' City: Cairo Name: Ahmed - Age: '25' City: Alexandria Name: Fatima - Age: '35' City: Giza Name: Hassan
Using pandas with PyYAML
You can use pandas library to read Excel files directly and then convert the data to YAML.
import pandas as pd
import yaml
df = pd.read_excel('data.xlsx')
data_list = df.to_dict(orient='records')
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)
Output:
If your data.xlsx contains:
| Name | Age | City |
|——–|—–|————-|
| Ahmed | 30 | Cairo |
| Fatima | 25 | Alexandria |
| Hassan | 35 | Giza |
The resulting data.yaml will be:
- Age: 30 City: Cairo Name: Ahmed - Age: 25 City: Alexandria Name: Fatima - Age: 35 City: Giza Name: Hassan
This code reads the Excel file using pandas and writes the data to a YAML file.
Using openpyxl with PyYAML
To read Excel files without pandas, you can use openpyxl.
from openpyxl import load_workbook
import yaml
wb = load_workbook('data.xlsx')
sheet = wb.active
data_list = []
headers = [cell.value for cell in next(sheet.iter_rows(min_row=1, max_row=1))]
for row in sheet.iter_rows(min_row=2, values_only=True):
data = dict(zip(headers, row))
data_list.append(data)
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)
Output:
The data.yaml file will contain:
- Age: 30 City: Cairo Name: Ahmed - Age: 25 City: Alexandria Name: Fatima - Age: 35 City: Giza Name: Hassan
This code uses openpyxl to read the Excel file and converts it to YAML.
Using xlrd with PyYAML
Although xlrd no longer supports .xlsx files, you can still use it for .xls files.
import xlrd
import yaml
wb = xlrd.open_workbook('data.xls')
sheet = wb.sheet_by_index(0)
data_list = []
headers = sheet.row_values(0)
for row_idx in range(1, sheet.nrows):
row_values = sheet.row_values(row_idx)
data = dict(zip(headers, row_values))
data_list.append(data)
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)
Output:
The generated data.yaml will be:
- Age: 30.0 City: Cairo Name: Ahmed - Age: 25.0 City: Alexandria Name: Fatima - Age: 35.0 City: Giza Name: Hassan
Handle Multiple Sheets in Excel
If your Excel file contains multiple sheets, you can process each sheet individually.
import pandas as pd
import yaml
# Read all sheets
xlsx = pd.ExcelFile('data.xlsx')
all_data = {}
for sheet_name in xlsx.sheet_names:
df = pd.read_excel(xlsx, sheet_name=sheet_name)
data_list = df.to_dict(orient='records')
all_data[sheet_name] = data_list
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
yaml.dump(all_data, yaml_file, allow_unicode=True, default_flow_style=False)
Output:
The data.yaml file will have data organized by sheet names:
Sheet1:
- Age: 30
City: Cairo
Name: Ahmed
- Age: 25
City: Alexandria
Name: Fatima
Sheet2:
- Age: 40
City: Luxor
Name: Ibrahim
- Age: 28
City: Aswan
Name: Mona
This code reads all sheets from the Excel file and writes them to YAML with sheet names as keys.
Represent Data By Grouping
You can process hierarchical relationships in your Excel data to represent nested data.
import pandas as pd
import yaml
df = pd.read_excel('data.xlsx')
grouped = df.groupby('City')
data_dict = {}
for city, group in grouped:
data_list = group[['Name', 'Age']].to_dict(orient='records')
data_dict[city] = data_list
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
yaml.dump(data_dict, yaml_file, allow_unicode=True, default_flow_style=False)
Output:
Assuming data.xlsx contains:
| Name | Age | City |
|——–|—–|————-|
| Ahmed | 30 | Cairo |
| Fatima | 25 | Alexandria |
| Hassan | 35 | Cairo |
| Mona | 28 | Alexandria |
The resulting data.yaml will be:
Alexandria:
- Age: 25
Name: Fatima
- Age: 28
Name: Mona
Cairo:
- Age: 30
Name: Ahmed
- Age: 35
Name: Hassan
This code groups data by city and creates a nested YAML structure.
Filter Specific Columns Before Conversion
You can select specific columns from the Excel file before converting to YAML.
import pandas as pd
import yaml
# Read Excel file and select specific columns
df = pd.read_excel('data.xlsx', usecols=['Name', 'City'])
data_list = df.to_dict(orient='records')
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)
Output:
The data.yaml will contain:
- City: Cairo Name: Ahmed - City: Alexandria Name: Fatima - City: Giza Name: Hassan
This code reads only the ‘Name’ and ‘City’ columns and writes them to YAML.
Convert Excel with Date Formats
To handle date formats during the conversion, you can parse and format dates appropriately.
import pandas as pd
import yaml
df = pd.read_excel('data.xlsx', parse_dates=['JoinDate'])
# Convert dates to string format
df['JoinDate'] = df['JoinDate'].dt.strftime('%Y-%m-%d')
# Convert DataFrame to list of dictionaries
data_list = df.to_dict(orient='records')
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
yaml.dump(data_list, yaml_file, allow_unicode=True, default_flow_style=False)
Output:
Assuming data.xlsx contains:
| Name | JoinDate |
|——–|———–|
| Ahmed | 2020-05-15|
| Fatima | 2021-07-10|
| Hassan | 2019-09-20|
The generated data.yaml will be:
- JoinDate: '2020-05-15' Name: Ahmed - JoinDate: '2021-07-10' Name: Fatima - JoinDate: '2019-09-20' Name: Hassan
This code reads the ‘JoinDate’ column as dates, formats them, and writes the data to YAML.
Without converting the dates, you will get YAML like this:
- JoinDate: !!python/object/apply:pandas._libs.tslibs.timestamps._unpickle_timestamp - 1589500800000000000 - null - null - 10 Name: Ahmed - JoinDate: !!python/object/apply:pandas._libs.tslibs.timestamps._unpickle_timestamp - 1625875200000000000 - null - null - 10 Name: Fatima - JoinDate: !!python/object/apply:pandas._libs.tslibs.timestamps._unpickle_timestamp - 1568937600000000000 - null - null - 10 Name: Hassan
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.