Breast cancer is one of the most common cancers worldwide. It is an aggressive, highly heterogeneous disease caused by changes in mammary epithelial cells that are driven by genetic and epigenetic alterations. Reversible epigenetic regulation enables cellular plasticity and thereby promotes heterogeneity both between tumors and among cells within the same tumor.
At the molecular level, tumors can be classified into subtypes based on two things: the histopathology of the primary tumor and the expression pattern of the hormone receptors estrogen receptor (ER) and progesterone receptor (PR), together with human epidermal growth factor receptor 2 (HER2). The proliferative capacity of the tumor cells, as measured by the Ki67 marker, also contributes to tumor classification. More recently, technological advances have added genomic and transcriptomic profiling to the classification.

Of the five main breast cancer subtypes, Luminal A and Luminal B express hormone receptors. Luminal A tumors have low Ki67 and are HER2 negative, whereas Luminal B tumors have high Ki67 and can be either HER2 positive or negative. Normal-like tumors express hormone receptors, are HER2 negative and have low Ki67, and are further defined by genomic and transcriptomic profiling [6]. HER2-positive tumors, as the name implies, overexpress HER2 (usually as a result of gene amplification) but do not express hormone receptors. Finally, basal-like breast cancer is defined by the lack of hormone receptor and HER2 expression [2]. It largely overlaps with triple-negative breast cancer (TNBC).
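The receptor- and Ki67-based rules above can be written as a small decision function. This is a hypothetical sketch that encodes only the rules stated in this paragraph. It assumes that "hormone receptor positive" means ER or PR positive. It is not how expression datasets such as METABRIC assign subtypes; there, subtypes are usually derived from gene-expression signatures such as PAM50.

```python
def surrogate_subtype(er_pos: bool, pr_pos: bool, her2_pos: bool, ki67_high: bool) -> str:
    """Map receptor status and Ki67 to a subtype using the rules described above.

    Hypothetical helper: hormone-receptor (HR) positive is assumed to mean ER+ or PR+.
    """
    hr_pos = er_pos or pr_pos
    if hr_pos:
        if ki67_high:
            # Luminal B: HR+, high Ki67, HER2 positive or negative
            return "Luminal B"
        if not her2_pos:
            # Luminal A and Normal-like share this profile; only genomic or
            # transcriptomic profiling separates them
            return "Luminal A / Normal-like"
        # HR+, HER2+, low Ki67 is not covered by the rules in the text
        return "unclassified"
    if her2_pos:
        # HER2-positive: HER2 overexpressed, hormone receptors absent
        return "HER2-positive"
    # HR- and HER2-: triple-negative, used here only as a proxy for basal-like
    return "Basal-like (TNBC proxy)"
```

The function returns "Luminal A / Normal-like" rather than choosing one of the two. As the text notes, those subtypes cannot be told apart from these four markers alone.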

Our understanding of how histone post-translational modifications (PTMs) relate to the capacity of mammary tumor cells to develop is limited. In this project, building on recent findings, I will explore two questions. The first is the role of a specific histone modification, H3K36me2, and of its central modulator, NSD1, in promoting breast cancer. The second is how this role differs between subtypes.
The nuclear receptor binding SET domain (NSD) protein family consists of NSD1, NSD2, and NSD3. These proteins participate in the regulation of tumor initiation and progression. However, the biological functions of the NSD family in breast cancer (BC) progression remain unclear. Here, I propose to explore the biological functions of NSD1 in BC progression and the mechanisms underlying them. We hypothesize that NSD1 strengthens drug resistance in BC cells and leads to poor prognosis in patients with BC.
The project has two aims:

- Explore the effect of NSD1 gene expression on the overall survival of BC patients with different subtypes (Luminal A, Luminal B and Basal-like).
- Investigate the cellular pathways affected by NSD1 levels in patients with different subtypes (Luminal A, Luminal B and Basal-like).
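The first aim compares overall survival between patients with low and high NSD1 expression within a subtype. The sketch below shows that comparison without dependencies. It has three steps:

- Select the bottom and top quartiles.
- Estimate a Kaplan-Meier curve per group.
- Compare the groups with a log-rank test.

It assumes per-patient lists of NSD1 expression, follow-up time and an event flag (1 = death observed, 0 = censored) for a single subtype. The function names are hypothetical and are not taken from the project's scripts.

```python
import math
import statistics
from collections import Counter


def quartile_groups(expression):
    """Indices of patients in the bottom (<= Q1) and top (>= Q3) expression quartiles."""
    # 'inclusive' interpolates linearly between order statistics, like pandas' default quantile
    q1, _, q3 = statistics.quantiles(expression, n=4, method="inclusive")
    low = [i for i, x in enumerate(expression) if x <= q1]
    high = [i for i, x in enumerate(expression) if x >= q3]
    return low, high


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct time with an observed death.

    events[i] is 1 if death was observed at times[i], 0 if the patient was censored.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # deaths + censorings at each time point
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # conditional probability of surviving past t, given at risk at t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-sided log-rank test p-value comparing the survival of groups A and B."""
    data = [(t, e, "a") for t, e in zip(times_a, events_a)]
    data += [(t, e, "b") for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n = sum(1 for s, _, _ in data if s >= t)
        n_a = sum(1 for s, _, g in data if s >= t and g == "a")
        d = sum(1 for s, e, _ in data if s == t and e)
        d_a = sum(1 for s, e, g in data if s == t and e and g == "a")
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # hypergeometric variance of d_a at this time point
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # chi-square (1 df) upper tail: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))
```

To use it for one subtype, `low, high = quartile_groups(nsd1_values)` selects the two groups. `kaplan_meier` and `logrank_p` are then applied to each group's times and event flags. In practice, lifelines' `KaplanMeierFitter` and `lifelines.statistics.logrank_test` provide the same estimates, with confidence intervals and plotting.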
To run these scripts, you need the following Python libraries:

- pandas
- openpyxl
- matplotlib
- numpy
- lifelines
- gseapy

You can install them from `requirements.txt` using pip:
```
pip install -r requirements.txt
```

1. Export data from cBioPortal:
   * Choose a database according to the cancer type. In this project, I investigated the METABRIC data.
   * Download the clinical data and the expression levels of the gene of choice (here, NSD1).
   * For each subtype separately, download survival and mRNA expression data for groups with different expression levels of the gene of interest. The groups can be defined by the median or by quartiles. I compared the bottom and top quartiles of NSD1 expression in Luminal A (LumA), Luminal B (LumB), and Basal-like (Basal) tumors.
2. Load, merge, clean, and save the data:
   - Execute the script `data_processing.py`:
     ```
     python data_processing.py
     ```
   - The script yields `cleaned_clinical_nsd1_data.csv`, which is used in the next step.
   - Detailed explanations and requirements can be found in `data_processing_explained.md`.
   - Testing the script:
     ```
     python -m pytest test_data_processing.py
     ```
3. Analyze the data:
   - Execute the script `data_analysis.py`:
     ```
     python data_analysis.py
     ```
   - The script yields a figure with two panels: a Kaplan-Meier plot and a bar plot of the enriched pathways.
   - Detailed explanations and requirements can be found in `data_analysis_explained.md`.
   - Testing the script:
     ```
     python -m pytest test_data_analysis.py
     ```
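The bar plot of enriched pathways summarizes a gene-set enrichment analysis. How `data_analysis.py` performs that analysis is not described here. gseapy offers both `enrichr`, which queries the Enrichr web service, and `prerank`, a preranked GSEA. As an offline illustration of the underlying idea, the sketch below scores each gene set with a one-sided hypergeometric test. The test measures how over-represented the gene set is among a list of genes of interest. The function names, the input format (a dict mapping gene-set name to genes) and the background size are hypothetical. The returned -log10(p) is a typical bar length in such plots.

```python
import math


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population size, successes in population, draws)."""
    upper = min(successes, draws)
    hits = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / math.comb(population, draws)


def over_representation(query_genes, gene_sets, background_size):
    """Rank gene sets by over-representation among query_genes.

    Returns (set name, overlap size, p-value, -log10 p) tuples, most significant first.
    Gene sets that share no genes with the query are skipped.
    """
    query = set(query_genes)
    results = []
    for name, members in gene_sets.items():
        members = set(members)
        overlap = len(query & members)
        if overlap == 0:
            continue
        # probability of an overlap at least this large by chance
        p = hypergeom_sf(overlap, background_size, len(members), len(query))
        results.append((name, overlap, p, -math.log10(p)))
    return sorted(results, key=lambda r: r[2])
```

When many gene sets are tested, the p-values should be adjusted for multiple testing before interpretation, for example with the Benjamini-Hochberg procedure.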