roisiegelman/NSD1-in-Breast-Cancer

Elucidating the role of NSD1 in breast cancer progression

Introduction

Breast cancer is one of the world's most common types of cancer. It is an aggressive, highly heterogeneous disease caused by changes in mammary epithelial cells that are driven by genetic and epigenetic alterations. Reversible epigenetic regulation promotes breast cancer heterogeneity, both between tumors and among the cells within a tumor, by enabling cellular plasticity.

On a molecular level, tumors can be classified into subtypes based on a combination of the histopathology of the primary tumor and the expression pattern of the hormone receptors estrogen receptor (ER) and progesterone receptor (PR) and of human epidermal growth factor receptor 2 (HER2). The proliferative capacity of the tumor cells, as measured by the Ki67 marker, also contributes to tumor classification. Recent technological advances have also added genomic and transcriptomic profiling to classification. Of the five main breast cancer subtypes, Luminal A and Luminal B express hormone receptors. Luminal A tumors display low Ki67 and are HER2 negative, whereas Luminal B tumors express high levels of Ki67 and can be either HER2 positive or HER2 negative. Normal-like tumors express hormone receptors but not HER2 or Ki67, and are further distinguished by additional genomic and transcriptomic profiling [6]. The HER2-positive subtype, as its name implies, shows amplified HER2 expression but does not express hormone receptors. Finally, basal-like breast cancer (by and large overlapping with triple-negative breast cancer, TNBC) is defined by a lack of expression of hormone receptors and HER2 [2].

Our understanding of the relationship between histone post-translational modifications (PTMs) and the capacity of mammary tumor cells to develop is limited. In this project, relying on recent findings, I will explore the role of a specific histone modification, H3K36me2, and its central modulator, NSD1, in promoting breast cancer, and how that role differs among the subtypes.

The nuclear receptor binding SET domain (NSD) protein family consists of NSD1, NSD2, and NSD3. These proteins participate in the regulation of tumor initiation and progression. However, the biological functions of the NSD family in breast cancer (BC) progression remain unclear. Here, I propose to explore the underlying mechanisms and biological functions of NSD1 in BC progression. I hypothesize that NSD1 strengthens BC cell drug resistance and leads to poor prognosis in patients with BC.

Main goals

  1. Explore the effect of NSD1 gene expression on the overall survival of BC patients with different subtypes (Luminal A, Luminal B and Basal-like)

  2. Investigate the cellular pathways affected by NSD1 levels within patients with different subtypes (Luminal A, Luminal B and Basal-like)

Requirements

To run these scripts, you need the following Python libraries installed:

  • pandas
  • openpyxl
  • matplotlib
  • numpy
  • lifelines
  • gseapy

You can install these libraries, which are listed in requirements.txt, using pip:

pip install -r requirements.txt

Technical steps:

1. Export data from cBioPortal

  • Choose a database according to the cancer type. In this project, I investigated the data from METABRIC.
  • Download clinical data and expression levels of the gene of choice. I chose NSD1.
  • Download survival and mRNA expression data for groups with different expression levels of the gene of interest, for each subtype separately. One might opt to compare data using either the median or quartiles. I chose to compare the bottom and top quartiles of NSD1 expression in Luminal A (LumA), Luminal B (LumB), and Basal-like (Basal).
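The quartile comparison described in step 1 can be illustrated with a short pandas sketch. This is not the repository's code: the column names (`patient_id`, `subtype`, `NSD1_expression`), the `nsd1_group` label, and the toy values are all assumptions made for illustration. It computes the quartile cut-offs within each subtype and keeps only the bottom and top quartiles.

```python
import pandas as pd


def split_by_quartile(df, column="NSD1_expression", group_col="subtype"):
    """Keep only bottom- and top-quartile rows, labeled 'low'/'high', per subtype.

    Column names are hypothetical; adapt them to the exported files.
    """
    parts = []
    for _, sub in df.groupby(group_col):
        q1 = sub[column].quantile(0.25)  # bottom-quartile cut-off within this subtype
        q3 = sub[column].quantile(0.75)  # top-quartile cut-off within this subtype
        parts.append(sub[sub[column] <= q1].assign(nsd1_group="low"))
        parts.append(sub[sub[column] >= q3].assign(nsd1_group="high"))
    return pd.concat(parts, ignore_index=True)


# Toy data (made up for illustration, not METABRIC values).
toy = pd.DataFrame({
    "patient_id": [f"P{i}" for i in range(1, 13)],
    "subtype": ["LumA"] * 8 + ["Basal"] * 4,
    "NSD1_expression": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0,
                        10.0, 20.0, 30.0, 40.0],
})

groups = split_by_quartile(toy)
print(groups[["patient_id", "subtype", "nsd1_group"]])
```

Using the median instead would amount to replacing both cut-offs with `sub[column].median()` and keeping every patient, which gives larger groups but a weaker contrast between them.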

2. Load, merge, clean, and save the data

  • Execute the script data_processing.py
python data_processing.py
  • The script will yield cleaned_clinical_nsd1_data.csv that will be used in the next part
  • Detailed explanations and requirements can be found in data_processing_explained.md
  • Testing the script:
python -m pytest test_data_processing.py
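For readers who want a feel for what a load, merge, and clean step looks like, here is a minimal pandas sketch. It is not the contents of data_processing.py (see data_processing_explained.md for that). The column names, the `load_merge_clean` helper, the in-memory CSV text, and the "0:LIVING" / "1:DECEASED" status encoding are assumptions chosen for illustration.

```python
import io

import pandas as pd

# In-memory stand-ins for the two exported files (hypothetical columns and values).
clinical_csv = io.StringIO(
    "patient_id,subtype,os_months,os_status\n"
    "P1,LumA,120.5,0:LIVING\n"
    "P2,LumB,34.0,1:DECEASED\n"
    "P3,Basal,,1:DECEASED\n"
    "P4,LumA,88.2,0:LIVING\n"
)
expression_csv = io.StringIO(
    "patient_id,NSD1\n"
    "P1,0.42\n"
    "P2,-1.10\n"
    "P3,0.75\n"
    "P5,1.30\n"
)


def load_merge_clean(clinical_src, expression_src):
    """Load both tables, keep patients present in both, drop incomplete rows."""
    clinical = pd.read_csv(clinical_src)
    expression = pd.read_csv(expression_src)
    # An inner join keeps only patients that have clinical AND expression data.
    merged = clinical.merge(expression, on="patient_id", how="inner")
    # Survival analysis needs a duration, an event status, and an expression value.
    cleaned = merged.dropna(subset=["os_months", "os_status", "NSD1"]).copy()
    # Convert the assumed "1:DECEASED" / "0:LIVING" strings into a 1/0 event flag.
    cleaned["event"] = cleaned["os_status"].str.startswith("1").astype(int)
    return cleaned.reset_index(drop=True)


cleaned = load_merge_clean(clinical_csv, expression_csv)
print(cleaned)
# Saving would then be a single call, for example:
# cleaned.to_csv("cleaned_clinical_nsd1_data.csv", index=False)
```

In this toy input, P3 is dropped for a missing survival time, while P4 and P5 are dropped because they appear in only one of the two tables.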

3. Analyze the data

  • Execute the script data_analysis.py
python data_analysis.py
  • The script will yield a figure with 2 panels: a Kaplan-Meier plot and a bar plot of the enriched pathways
  • Detailed explanations and requirements can be found in data_analysis_explained.md
  • Testing the script:
python -m pytest test_data_analysis.py
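To make the Kaplan-Meier panel easier to interpret, the sketch below computes the Kaplan-Meier survival estimate by hand in plain Python. It is an illustration of the estimator only, not the code in data_analysis.py, and it deliberately avoids the lifelines library listed in the requirements. The `kaplan_meier` function name and the toy durations and events are assumptions.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with at least one event.

    durations: follow-up time per patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)  # patients still under observation just before time t
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t (events and censored).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the fraction of at-risk patients surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy cohort (made up): follow-up in months; 1 = death observed, 0 = censored.
durations = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(durations, events)
for time, prob in curve:
    print(f"t={time}: S(t)={prob:.2f}")
```

Censored patients lower the at-risk count without producing a step in the curve, which is why the last patient (censored at 15 months) adds no point. Comparing the low and high NSD1 groups would mean computing one such curve per group within each subtype.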

This project was originally implemented as part of the Python programming course at the Weizmann Institute of Science taught by Gabor Szabo.
