Resume Classification Using NLP

Project Overview

This project implements both statistical and deep learning models to classify resumes into predefined job roles. It uses Decision Trees, Random Forests, Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM) networks, with the latter two built on Word2Vec word embeddings. Comparing these models side by side shows the strengths of each approach and when each is the better fit for natural language processing tasks such as resume classification.


Objectives

  1. Compare Classification Approaches:
    • Statistical Models: Train Naive Bayes, Logistic Regression, and Random Forest classifiers on features extracted with Bag of Words and TF-IDF.
    • Embedding-Based Models: Explore deep learning models like CNN and LSTM with Word2Vec embeddings.
  2. Assess Model Performance:
    • Use metrics such as Accuracy, Precision, Recall, and F1-Score to compare models.
    • Identify the strengths and limitations of each approach with respect to dataset size and complexity.
  3. Contribute to Recruitment Efficiency:
    • Demonstrate scalable and accurate resume classification for large datasets.
    • Highlight practical NLP applications in recruitment, focusing on improved accuracy, time savings, and the potential to reduce bias.
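The four metrics listed above can be computed from true and predicted labels as in the following minimal sketch. It assumes macro averaging (every job category weighted equally), which this README does not specify; the function name `classification_metrics` and the sample labels are hypothetical. In practice a library function such as scikit-learn's `sklearn.metrics.classification_report` would more likely be used.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 for multi-class labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # a resume wrongly assigned to `pred`
            fn[true] += 1  # a resume of category `true` that was missed
    precision, recall, f1 = [], [], []
    for label in labels:
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precision) / n,  # macro average: every category weighted equally
        "recall": sum(recall) / n,
        "f1": sum(f1) / n,
    }


if __name__ == "__main__":
    y_true = ["Data Science", "HR", "HR", "Java Developer"]
    y_pred = ["Data Science", "HR", "Java Developer", "Java Developer"]
    for name, value in classification_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

Macro averaging keeps small job categories from being drowned out by large ones; a weighted average would instead reflect each category's share of the test set.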

Implementation

1. Exploratory Data Analysis (EDA)
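A first EDA pass typically checks how many resumes fall into each category (class imbalance affects which metrics are meaningful) and how long the resumes are. The sketch below assumes a hypothetical in-memory list of `(category, resume_text)` pairs; the real dataset's file format, column names, and categories are not given here. With pandas, the category count would usually be a `value_counts()` call on the category column.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy sample of (category, resume_text) pairs; the real dataset's
# schema and job categories are not specified in this README.
SAMPLE = [
    ("Data Science", "Python pandas scikit-learn machine learning statistics"),
    ("Data Science", "Deep learning TensorFlow NLP Python"),
    ("HR", "Recruitment onboarding payroll employee relations"),
    ("Java Developer", "Java Spring Boot microservices REST APIs SQL"),
]


def summarize(records):
    """Return resumes per category and mean resume length (whitespace tokens) per category."""
    counts = Counter(category for category, _ in records)
    lengths = {}
    for category, text in records:
        lengths.setdefault(category, []).append(len(text.split()))
    avg_len = {category: mean(values) for category, values in lengths.items()}
    return counts, avg_len


if __name__ == "__main__":
    counts, avg_len = summarize(SAMPLE)
    # most_common() lists categories from most to least frequent,
    # which makes class imbalance easy to spot.
    for category, n in counts.most_common():
        print(f"{category:<16} {n:>3} resumes, mean length {avg_len[category]:.1f} tokens")
```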

2. Data Preprocessing
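A minimal preprocessing sketch, assuming the common steps of lowercasing, URL removal, stripping non-letter characters, whitespace tokenization, and stopword removal; this README does not list the exact steps used. `STOPWORDS` is a small hypothetical list (a real pipeline would more likely use a library list), and `preprocess` is an illustrative name.

```python
import re

# Hypothetical, deliberately small stopword list for illustration only.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "for", "i", "in", "is",
    "my", "of", "on", "the", "to", "with",
}
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def preprocess(text, stopwords=STOPWORDS, min_len=2):
    """Lowercase, remove URLs and non-letters, split on whitespace, drop stopwords and very short tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # remove links before punctuation is stripped
    text = NON_LETTER_RE.sub(" ", text)  # digits and symbols become spaces
    return [tok for tok in text.split() if tok not in stopwords and len(tok) >= min_len]


if __name__ == "__main__":
    print(preprocess("Experienced Python developer with 5+ years in Django. Portfolio: https://example.com/me"))
    # -> ['experienced', 'python', 'developer', 'years', 'django', 'portfolio']
```

Stripping every non-letter character is aggressive: skills such as "C++" or "C#" collapse to "c" and are then dropped by the length filter, so a resume pipeline may need a whitelist for such tokens.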

3. Baseline Statistical Model
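This README does not say which model serves as the baseline. Assuming it is Multinomial Naive Bayes on Bag of Words counts (the first statistical model named in the Objectives), the sketch below shows how that model works: per-category word frequencies with add-alpha (Laplace) smoothing, combined with category priors in log space. The class name `BagOfWordsNB` and the toy data are hypothetical; with scikit-learn the same pipeline would normally be `CountVectorizer` followed by `MultinomialNB`.

```python
import math
from collections import Counter, defaultdict


class BagOfWordsNB:
    """Multinomial Naive Bayes over bag-of-words counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        """docs: list of token lists (e.g. preprocessed resumes); labels: one category per doc."""
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        class_counts = Counter(labels)
        # log P(category)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}
        # Bag of words: per-category token counts, word order discarded.
        word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            word_counts[label].update(doc)
        v = len(self.vocab)
        # log P(word | category), smoothed so unseen (word, category) pairs are not zero.
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(word_counts[c].values())
            self.log_likelihood[c] = {
                w: math.log((word_counts[c][w] + self.alpha) / (total + self.alpha * v))
                for w in self.vocab
            }
        return self

    def predict_one(self, doc):
        # Summing log probabilities avoids underflow from multiplying many small numbers.
        # Tokens outside the training vocabulary are ignored.
        scores = {
            c: self.log_prior[c]
            + sum(self.log_likelihood[c][w] for w in doc if w in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)

    def predict(self, docs):
        return [self.predict_one(doc) for doc in docs]


if __name__ == "__main__":
    train_docs = [["python", "pandas", "machine", "learning"], ["java", "spring", "sql"]]
    model = BagOfWordsNB().fit(train_docs, ["Data Science", "Java Developer"])
    print(model.predict([["python", "learning"]]))
```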

4. Traditional Statistical Models
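For the traditional models, TF-IDF reweights raw counts so that terms appearing in many resumes count for less than distinctive skills. A minimal sketch follows, assuming the smoothed IDF `ln((1 + N) / (1 + df)) + 1` with L2 normalization, similar to the defaults of scikit-learn's `TfidfVectorizer`; the function names are hypothetical. The resulting vectors would then serve as input features for classifiers such as Logistic Regression or Random Forest.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed IDF per term: ln((1 + N) / (1 + df)) + 1, where df = number of docs containing the term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf_vector(doc, idf):
    """Sparse TF-IDF vector (term -> weight), L2-normalized; terms not in `idf` are dropped."""
    tf = Counter(t for t in doc if t in idf)  # raw term counts
    weights = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


if __name__ == "__main__":
    corpus = [["python", "sql", "python"], ["java", "sql"], ["payroll", "recruitment"]]
    idf = fit_idf(corpus)
    print(tfidf_vector(corpus[0], idf))
```

Here "sql" appears in two of the three toy resumes, so its IDF is lower than that of "python", which appears in only one.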

5. Embedding-Based Models
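The embedding-based models replace sparse counts with dense Word2Vec vectors. The core operation of a text CNN is sketched below in plain Python: one filter slides over windows of consecutive word vectors, a ReLU is applied, and max-over-time pooling keeps the strongest response as a single feature. The 3-dimensional `EMBEDDINGS` table and the filter weights are hypothetical stand-ins; in the actual models, Word2Vec vectors would be learned from a corpus, many filters would be trained by backpropagation in a deep learning framework, and an LSTM would instead read the same vector sequence recurrently.

```python
# Hypothetical 3-dimensional vectors standing in for trained Word2Vec embeddings
# (real Word2Vec vectors typically have a few hundred dimensions, learned from a corpus).
EMBEDDINGS = {
    "python": [0.9, 0.1, 0.0],
    "machine": [0.7, 0.2, 0.1],
    "learning": [0.8, 0.1, 0.2],
    "payroll": [0.0, 0.9, 0.1],
}
DIM = 3


def embed(tokens, embeddings=EMBEDDINGS, dim=DIM):
    """Map tokens to vectors; out-of-vocabulary tokens become zero vectors."""
    return [embeddings.get(t, [0.0] * dim) for t in tokens]


def conv1d_max_pool(seq, kernel, bias=0.0):
    """Apply one convolutional filter over a sequence of word vectors.

    `kernel` holds `width` weight vectors, one per word position in the window.
    Each window of `width` consecutive words yields a dot-product activation,
    ReLU clips negatives to zero, and max-over-time pooling keeps the largest.
    """
    width = len(kernel)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the filter width; pad it first")
    responses = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        activation = bias + sum(
            w * x
            for w_vec, x_vec in zip(kernel, window)
            for w, x in zip(w_vec, x_vec)
        )
        responses.append(max(0.0, activation))  # ReLU
    return max(responses)  # max-over-time pooling


def cnn_features(tokens, filters):
    """One pooled feature per (kernel, bias) filter; a dense softmax layer would follow in a full model."""
    seq = embed(tokens)
    return [conv1d_max_pool(seq, kernel, bias) for kernel, bias in filters]


if __name__ == "__main__":
    tech_bigram = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]  # responds to the first embedding dimension
    print(cnn_features(["python", "machine", "learning"], [(tech_bigram, 0.0)]))
```

Max-over-time pooling makes each feature independent of where in the resume a matching phrase occurs, which suits resumes whose sections appear in varying order.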


Performance Insights


Key Findings

Model Suitability

Comparative Advantages


Contributions

  1. Improved Recruitment Processes:
    • Automating resume classification can improve accuracy in initial screening stages and may help reduce bias.
  2. Scalable Solutions:
    • Demonstrates an approach that scales to large resume datasets and can extend beyond recruitment.
  3. NLP Advancements:
    • Offers a framework that organizations can follow to apply similar NLP techniques in real-world applications.