Data Scientist

Projects
Jul 2024 - Aug 2024
- Developed a Convolutional Neural Network (CNN) model achieving 95% accuracy in classifying tomato plant diseases.
- Engineered a web application for real-time image classification, processing over 1,500 user-uploaded images.
- Improved disease diagnosis speed by 50%, significantly aiding agricultural productivity.
Jul 2024 - Aug 2024
- Achieved over 99% prediction accuracy in modeling the impact of review sentiment on rating scores and user engagement by developing a machine learning regression model trained on categorized review text and thumbs-up counts.
- Improved model performance (R² and MAE) by transforming user-generated content into sentiment scores (1–5) and performing extensive feature engineering and text preprocessing.
- Delivered key insights into user behavior by classifying review content as positive, neutral, or negative and visualizing trends with Python libraries such as Matplotlib and Seaborn.
Jun 2024 - Jul 2024
- Generated 10+ critical business insights aligned with ad hoc executive requests by writing and optimizing complex SQL queries to support decision-making for a leading hardware company.
- Improved data accessibility and clarity by structuring queries for sales trends, inventory analysis, and customer segmentation,
simulating real-world challenges posed by a data analytics director.
- Demonstrated both technical proficiency and business communication skills through the creation
of clear documentation and rationale behind each SQL query.
Jun 2024 - Aug 2024
- Achieved 92% classification accuracy in identifying 10+ celebrities from .png images by training a Convolutional Neural Network (CNN) on a dataset of over 5,000 labeled images.
- Improved model robustness and reduced overfitting by implementing data augmentation techniques and optimizing
with early stopping and dropout layers.
- Delivered a fully functional, user-friendly Streamlit web application that allows users to upload images and
receive real-time predictions, enabling interactive deployment of deep learning models.
Jun 2024 - Aug 2024
- Achieved 95%+ accuracy on test data in predicting diabetes by training and deploying a machine learning model that uses inputs such as age, BMI, and glucose levels.
- Enabled self-assessment for 100+ users by designing an intuitive Streamlit web interface for real-time, accessible predictions.
- Improved model reliability and reduced uncertainty by selecting key features and fine-tuning classification
algorithms, leading to more confident health risk assessments.
Jul 2024 - Sep 2024
- Implemented an NLP pipeline using advanced machine-learning techniques to classify news articles with over 98% accuracy.
- Reduced false positives by 35%, ensuring reliable detection of misinformation.
- Designed a web interface allowing users to analyze news content interactively, boosting engagement by 30%.
Sep 2025 - Dec 2025
- Analyzed UNICEF and World Bank data to evaluate how malaria prevention efforts, specifically insecticide-treated nets (ITNs), relate to child mortality across diverse regions and income groups.
- Created an interactive global dashboard with geospatial maps and income-stratified visualizations to communicate health disparities.
- Used Random Forest modeling and per-subgroup Brier scores as a fairness check to assess predictive reliability across income subgroups.
- Found only a weak direct correlation between ITN coverage and child mortality (r = 0.13), highlighting socioeconomic inequalities as key drivers.
- Provided evidence-based insights on equitable ITN distribution and health policy targeting.