import pandas as pdThe script starts by importing the pandas library, which is a powerful data manipulation and analysis tool in Python.
def load_and_rename_clinical_data(filepath='brca_metabric_clinical_data.csv'):
clinical_data = pd.read_csv(filepath)
clinical_data.rename(columns={
'Overall Survival (Months)': 'OS_MONTHS',
'Overall Survival Status': 'OS_STATUS',
'Pam50 + Claudin-low subtype': 'CLAUDIN_SUBTYPE'
}, inplace=True)
return clinical_data- Purpose: Load clinical data from a CSV file and rename specific columns for easier reference.
- Steps:
- Read CSV File:
pd.read_csv(filepath)loads the CSV file into a DataFrame. - Rename Columns:
rename(columns={...}, inplace=True)renames specific columns:- 'Overall Survival (Months)' to 'OS_MONTHS'
- 'Overall Survival Status' to 'OS_STATUS'
- 'Pam50 + Claudin-low subtype' to 'CLAUDIN_SUBTYPE'
- Return DataFrame: The modified DataFrame is returned.
- Read CSV File:
def load_and_rename_nsd1_data(filepath='NSD1_mRNA_expression.csv'):
nsd1_data = pd.read_csv(filepath)
nsd1_data.rename(columns={
'Sample ID': 'Sample_ID',
'NSD1: mRNA expression z-scores relative to all samples (log microarray)': 'NSD1'
}, inplace=True)
return nsd1_data- Purpose: Load NSD1 mRNA expression data from a CSV file and rename specific columns for easier reference.
- Steps:
- Read CSV File:
pd.read_csv(filepath)loads the CSV file into a DataFrame. - Rename Columns:
rename(columns={...}, inplace=True)renames specific columns:- 'Sample ID' to 'Sample_ID'
- 'NSD1: mRNA expression z-scores relative to all samples (log microarray)' to 'NSD1'
- Return DataFrame: The modified DataFrame is returned.
- Read CSV File:
def merge_data(clinical_data, nsd1_data):
merged_data = pd.merge(clinical_data, nsd1_data, left_on='Sample ID', right_on='Sample_ID')
return merged_data- Purpose: Merge the clinical and NSD1 data based on the sample IDs.
- Steps:
- Merge DataFrames:
pd.merge(...)merges the two DataFrames on the 'Sample ID' column from the clinical data and the 'Sample_ID' column from the NSD1 data. - Return Merged DataFrame: The merged DataFrame is returned.
- Merge DataFrames:
def save_data(data, filepath='merged_clinical_nsd1_data.csv'):
data.to_csv(filepath, index=False)- Purpose: Save the merged data to a CSV file.
- Steps:
- Save to CSV:
data.to_csv(filepath, index=False)saves the DataFrame to a CSV file without the index column.
- Save to CSV:
def clean_data(data):
cleaned_data = data.dropna(subset=['OS_MONTHS', 'OS_STATUS', 'CLAUDIN_SUBTYPE', 'NSD1'])
return cleaned_data- Purpose: Clean the data by removing rows with missing values in critical columns.
- Steps:
- Drop Missing Values:
data.dropna(subset=[...])removes rows that have missing values in any of the specified columns. - Return Cleaned DataFrame: The cleaned DataFrame is returned.
- Drop Missing Values:
def main():
clinical_data = load_and_rename_clinical_data()
nsd1_data = load_and_rename_nsd1_data()
merged_data = merge_data(clinical_data, nsd1_data)
cleaned_data = clean_data(merged_data)
save_data(cleaned_data, 'cleaned_clinical_nsd1_data.csv')
print("Cleaned data saved to 'cleaned_clinical_nsd1_data.csv'")- Purpose: Coordinate the loading, merging, cleaning, and saving of the data.
- Steps:
- Load Clinical Data:
load_and_rename_clinical_data()loads and renames the clinical data. - Load NSD1 Data:
load_and_rename_nsd1_data()loads and renames the NSD1 data. - Merge Data:
merge_data(clinical_data, nsd1_data)merges the two DataFrames. - Clean Data:
clean_data(merged_data)cleans the merged data by removing rows with missing values. - Save Data:
save_data(cleaned_data, 'cleaned_clinical_nsd1_data.csv')saves the cleaned data to a CSV file. - Print Confirmation:
print(...)prints a message confirming that the data has been saved.
- Load Clinical Data:
if __name__ == "__main__":
main()- Purpose: Ensure the
mainfunction is called when the script is run directly. - Steps:
- Check Execution Context:
if __name__ == "__main__":checks if the script is being run directly. - Call Main Function:
main()is called to execute the main data processing steps.
- Check Execution Context: