A selection of my data science, machine learning, and computational humanities work: applied ML pipelines, NLP research, analytics, and AI systems. Featured projects link to public GitHub repositories.
Multimodal pipeline combining YOLO panel segmentation, OCR, and VLM/LLM-based reverse prompting to extract structured semantic representations from 26K–64K mid-20th-century German comic panels. Ongoing research at Stanford.
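The end product of a pipeline like this is a structured record per panel. A minimal sketch of such a schema, with purely illustrative field names (not the project's actual schema):

```python
from dataclasses import dataclass, asdict, field

# Hypothetical record for one extracted comic panel; every field name
# here is illustrative, not taken from the real project.
@dataclass
class PanelRecord:
    page_id: str
    panel_index: int
    bbox: tuple          # (x, y, width, height) from the panel detector
    ocr_text: str        # raw OCR output for speech bubbles / captions
    caption: str         # VLM-generated description of the panel
    reverse_prompt: str  # LLM-reconstructed "prompt" summarizing the scene
    entities: list = field(default_factory=list)

    def to_dict(self) -> dict:
        return asdict(self)

record = PanelRecord(
    page_id="example_page_12",
    panel_index=3,
    bbox=(40, 55, 310, 240),
    ocr_text="Vorwärts, Männer!",
    caption="A knight raises his sword and calls to his men.",
    reverse_prompt="medieval knight rallying soldiers, dynamic action panel",
)
```

Serializing each panel to a flat dict makes the corpus easy to load into a dataframe for downstream analysis.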
Custom German Romanticism corpus and NLP pipeline (spaCy, BERTopic, MALLET, R tidytext) tracking thematic evolution across early, transitional, and late Romanticism through topic modeling and cross-period comparative analysis.
End-to-end e-commerce analytics and ML project over 1.2M+ transaction records. Built clustering, regression, and tree-based models to surface profitability drivers, and delivered a polished R Markdown report with multi-panel analytics visualizations.
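The clustering step amounts to segmenting customers on scaled behavioral features. A sketch in Python with scikit-learn (the actual project is in R), on synthetic stand-in features:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-ins for per-customer features (spend, frequency,
# margin); the real project derives these from 1.2M+ transactions.
X = np.column_stack([
    rng.gamma(2.0, 50.0, 500),   # total spend
    rng.poisson(5, 500),         # order frequency
    rng.normal(0.2, 0.05, 500),  # average margin
])

# Standardize first so spend (large scale) doesn't dominate the distance.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(StandardScaler().fit_transform(X))
```

Profiling each cluster's mean margin then points to the profitability drivers.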
End-to-end multi-class credit tier prediction on 4M+ financial records. Improved accuracy by 4% and reduced false positives by 6% through ensemble modeling and feature engineering. Built during a volunteer engagement with the Microsoft Data Science team.
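A common form of the ensemble step is soft voting over heterogeneous base models. A minimal sketch with scikit-learn on synthetic multi-class data (not the real credit features or models):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a multi-tier classification problem.
X, y = make_classification(n_samples=2000, n_classes=4, n_informative=8,
                           n_features=12, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ],
    voting="soft",  # average predicted class probabilities
)
ensemble.fit(Xtr, ytr)
acc = ensemble.score(Xte, yte)
```

Soft voting tends to help when the base models make uncorrelated errors, which is the usual rationale for ensembling here.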
Independent ML project tackling severe class imbalance (500 frauds in 200K+ records). Benchmarked Random Forest, SGDClassifier, and MLP with PCA, downsampling, and class reweighting to identify the best algorithm for production fraud detection.
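Class reweighting is the simplest of the three rebalancing strategies to show. A sketch benchmarking two of the named model families with `class_weight="balanced"` on synthetic data at a similar imbalance ratio:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalance mimicking ~500 frauds in 200K rows (scaled down:
# roughly 50 positives in 20K).
X, y = make_classification(n_samples=20000, weights=[0.9975],
                           n_features=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" upweights the rare fraud class in the loss.
models = {
    "rf": RandomForestClassifier(class_weight="balanced", random_state=0),
    "sgd": SGDClassifier(class_weight="balanced", random_state=0),
}
results = {}
for name, m in models.items():
    m.fit(Xtr, ytr)
    results[name] = recall_score(yte, m.predict(Xte))
```

Recall on the minority class, rather than raw accuracy, is the metric that matters under this imbalance; a model predicting "not fraud" everywhere scores 99.75% accuracy and 0 recall.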
Two-phase AI engine for Gomoku (Five-in-a-Row): Phase 1 used Minimax with Alpha-Beta pruning and a custom evaluation function; Phase 2 extended it with self-play and Q-learning for adaptive competitive play.
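The Phase 1 search can be sketched generically: minimax with alpha-beta pruning over an abstract game tree, with the Gomoku-specific part confined to the evaluation and move-generation callbacks (here replaced by a toy tree):

```python
import math

def alphabeta(node, depth, alpha, beta, maximizing, evaluate, children):
    """Minimax with alpha-beta pruning over an abstract game tree."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    if maximizing:
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, evaluate, children))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cutoff: the opponent will avoid this branch
        return value
    value = math.inf
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                     True, evaluate, children))
        beta = min(beta, value)
        if alpha >= beta:
            break  # alpha cutoff
    return value

# Toy two-leaf tree standing in for a Gomoku position; the real engine
# plugs in a board evaluation scoring open rows of stones.
tree = {"A": ["B", "C"], "B": [], "C": []}
scores = {"B": 3, "C": 5}
best = alphabeta("A", 2, -math.inf, math.inf, True,
                 lambda n: scores.get(n, 0), lambda n: tree.get(n, []))
# best == 5: the maximizing player takes branch C
```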
LightGBM model predicting patient psychological risk levels from medical and behavioral indicators. Achieved AUC 0.87 on imbalanced healthcare data, with classification reports and confusion-matrix visualizations designed for clinical interpretability.
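The evaluation pattern is standard: fit a gradient-boosted model and report ranking quality via AUC. A sketch using scikit-learn's `GradientBoostingClassifier` as a stand-in for LightGBM, on synthetic imbalanced data rather than the clinical dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced stand-in data; sklearn's GBDT stands in for
# LightGBM, which shares the same fit/predict_proba interface shape.
X, y = make_classification(n_samples=3000, weights=[0.85], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

gbdt = GradientBoostingClassifier(random_state=0).fit(Xtr, ytr)
# AUC is computed on predicted probabilities, not hard labels, so it
# stays informative under class imbalance.
auc = roc_auc_score(yte, gbdt.predict_proba(Xte)[:, 1])
```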
Production-ready fraud detection pipeline on 800K+ Google Pay transactions. Reduced false positives by 8.7% and improved recall by 2.87% through feature engineering, careful model selection, and a tuned GBDT. Built during a volunteer engagement with Google engineers.