Resume Parser Application

This is a resume parser application built with Streamlit, spaCy, Plotly, and the Google Serp API. It identifies the job title, skills, and experience of candidates from their resumes. It also provides a job search feature, based on job title and location, that uses the Google Serp API.

How to Start

Follow the steps below to set up and run the application:

1. Clone the Repository

Clone this repository to your local machine using the following command:

git clone <repository-url>

2. Install the dependencies

Switch to the project directory and install the requirements:

cd Resume-Reader
pip install -r requirements.txt

3. Run the application

streamlit run app.py

Training statistics

We fine-tuned two models: roberta-base for named entity extraction, and bert-base for generating a job title from the extracted skills, which are passed to it as input keywords.

Model Name	Training Time	GPU Used	Inference Time
roberta-base	1 hour	Tesla-T4 (Colab)	9.4 seconds
bert-base	1 hour	P-100 (Kaggle)	3.2 seconds

utils.py

This file defines a set of functions and configurations for parsing and processing text data from PDF, DOCX, and HTML files. Each component is explained below:

Parsing Functions:

parse_pdf(pdf_content):
- Uses the PyMuPDF (fitz) library to extract text from a PDF file.
- Iterates through each page of the document, extracts its text, and concatenates the results.
extract_text_from_resume(html_content):
- Parses HTML content using BeautifulSoup.
- Extracts the text from the HTML elements and removes extra whitespace.
parse_docx(docx_content):
- Uses the python-docx library to extract text from a DOCX file.
- Reads each paragraph of the document and concatenates them into a single string.
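
The three parsers described above could look roughly like the sketch below. It is not the repository's code: the function names follow the list above, but the input types (raw bytes for PDF and DOCX, a string for HTML) and the newline used to join DOCX paragraphs are assumptions. The third-party imports sit inside each function so that only the parser you call needs its library installed.

```python
import io
import re


def parse_pdf(pdf_content: bytes) -> str:
    """Concatenate the text of every page of a PDF given as bytes."""
    import fitz  # PyMuPDF

    pages = []
    with fitz.open(stream=pdf_content, filetype="pdf") as doc:
        for page in doc:
            pages.append(page.get_text())
    return "".join(pages)


def extract_text_from_resume(html_content: str) -> str:
    """Strip the markup from an HTML resume and collapse extra whitespace."""
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html_content, "html.parser")
    text = soup.get_text(separator=" ")
    return re.sub(r"\s+", " ", text).strip()


def parse_docx(docx_content: bytes) -> str:
    """Join the text of every paragraph of a DOCX file given as bytes."""
    from docx import Document  # python-docx

    document = Document(io.BytesIO(docx_content))
    # Assumption: paragraphs are joined with newlines.
    return "\n".join(paragraph.text for paragraph in document.paragraphs)
```

With Streamlit's file uploader, the bytes would typically come from the uploaded file's read() method. HTML uploads would need decoding to a string first.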

Text Processing Functions:

preprocess_text(text):
- Removes extra whitespace from the input text.
process_text(text):
- Splits the text into sentences based on full stops.
- Concatenates the sentences with a newline after each.
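
A minimal sketch of these two helpers, using only the standard library, is shown below. The function names come from the list above. The exact behavior is assumed: in particular, this version drops empty fragments and restores the full stop before each newline, which the original may or may not do.

```python
def preprocess_text(text: str) -> str:
    """Collapse runs of whitespace (spaces, tabs, newlines) into single spaces."""
    return " ".join(text.split())


def process_text(text: str) -> str:
    """Split the text on full stops and put each sentence on its own line."""
    sentences = [part.strip() for part in text.split(".") if part.strip()]
    # Assumption: the full stop removed by split() is added back.
    return "".join(sentence + ".\n" for sentence in sentences)


if __name__ == "__main__":
    raw = "Python   developer.\n5 years  of experience."
    print(process_text(preprocess_text(raw)))
```

Splitting on every full stop is simple but also breaks tokens such as "Node.js" or "3.5 years". A sentence segmenter from spaCy would avoid that at the cost of extra processing.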

Visualizations and Configuration:

colors: Defines a dictionary containing colors for different named entities.
options: Configuration for entity visualization using spaCy's displacy module.
- Specifies named entities (e.g., "JOB TITLE", "SKILLS") and their associated colors.
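
As an illustration, the configuration could take the shape below. The two labels are the ones named above. The hex colors and the render_entities helper are hypothetical and only show how such a dictionary is passed to displacy.

```python
# Assumed color values; only the label names come from the description above.
colors = {
    "JOB TITLE": "#ffd966",
    "SKILLS": "#9fc5e8",
}

# displacy's entity visualizer reads the labels to show from "ents"
# and the per-label background colors from "colors".
options = {"ents": list(colors), "colors": colors}


def render_entities(doc) -> str:
    """Return HTML markup highlighting the entities of a spaCy Doc."""
    from spacy import displacy

    return displacy.render(doc, style="ent", options=options, jupyter=False)
```

The returned markup can then be embedded in a page. Labels missing from "ents" are not highlighted.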

Libraries Used:

PyMuPDF (fitz): For PDF parsing.
BeautifulSoup (bs4): For HTML parsing.
python-docx: For parsing DOCX files.
spaCy: For named entity recognition and visualization.
plotly: For generating interactive visualizations like graphs and charts.

app.py

This file defines a Streamlit application that parses and visualizes resume data and runs job searches based on the extracted information. Each component is explained below:

Libraries Used:

streamlit: Framework for building interactive web applications.
spaCy: For named entity recognition and visualization.
plotly: For generating interactive visualizations like graphs and charts.
pandas: For data manipulation and analysis.

Components:

Introduction Page:
- Displays a welcome message and an about section in the sidebar.
- Provides a file uploader to upload resumes in PDF, DOCX, or HTML format.
- Processes the uploaded file to extract entities (Job Title, Skills, etc.).
- Visualizes the extracted entities using spaCy's displacy module.
Process Uploaded File:
- Parses the uploaded file based on its format (PDF, DOCX, HTML).
- Utilizes spaCy for named entity recognition on the parsed text.
- Renders the parsed text and named entities using Streamlit components.
Visualization Page:
- Displays visualizations of extracted entities from the resume.
- Generates a sunburst chart using Plotly Express to visualize entity categories and values.
Search Jobs Page:
- Allows users to search for job opportunities based on extracted skills or job titles.
- Users can enter a query (job title or keyword) and location for the job search.
- Performs a job search using predefined functions from the 'jobs' module.
- Logs the job search event.
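
The upload and visualization steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in app.py: every function name here is hypothetical, the ".htm" extension is an added guess, and the column names "category", "value", and "count" are chosen for the example. pandas and Plotly are imported only inside the charting function.

```python
from collections import Counter
from pathlib import Path

# Extensions accepted by the uploader, mapped to a format name.
# Assumption: ".htm" is treated the same as ".html".
FORMATS = {".pdf": "pdf", ".docx": "docx", ".html": "html", ".htm": "html"}


def detect_format(filename: str) -> str:
    """Pick the parser to use from the uploaded file's extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"Unsupported resume format: {suffix or filename!r}")
    return FORMATS[suffix]


def extract_entities(nlp, text: str):
    """Run a loaded spaCy pipeline and return (label, text) pairs."""
    doc = nlp(text)
    return [(ent.label_, ent.text) for ent in doc.ents]


def sunburst_rows(entities):
    """Count repeated entities and shape them as rows for a chart."""
    counts = Counter(entities)
    return [
        {"category": label, "value": value, "count": n}
        for (label, value), n in counts.items()
    ]


def sunburst_figure(rows):
    """Build a sunburst with entity categories inside and values outside."""
    import pandas as pd
    import plotly.express as px

    frame = pd.DataFrame(rows)
    return px.sunburst(frame, path=["category", "value"], values="count")
```

In a Streamlit page, the resulting figure would typically be shown with st.plotly_chart.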

Logging:

File Logging: Logs events such as resume uploads, entity extraction, visualization, and job searches to a log file named 'resume_parser.log'.
Timing: Records the time taken for model loading and inference in the same log.
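
One way to set up this kind of logging with the standard library is sketched below. The file name comes from the description above. The logger name, the message format, and the get_logger and timed helpers are assumptions made for the example.

```python
import logging
import time
from contextlib import contextmanager


def get_logger(log_path: str = "resume_parser.log") -> logging.Logger:
    """Return a logger that appends timestamped events to log_path."""
    logger = logging.getLogger("resume_parser")
    logger.setLevel(logging.INFO)
    # Streamlit reruns the script on every interaction, so attach the
    # file handler only once to avoid duplicated log lines.
    if not logger.handlers:
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


@contextmanager
def timed(logger: logging.Logger, label: str):
    """Log how long the enclosed block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info("%s took %.2f s", label, time.perf_counter() - start)
```

Wrapping model loading or inference in "with timed(logger, 'Model loading'):" writes one line with the elapsed time to the log file.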

Video Demonstration

Screen.Recording.2024-05-17.at.9.21.14.PM.2.mp4
