Hi, I’m Peter.
I’m a data scientist working at the intersection of applied machine learning and computational language research. My work spans fraud detection, credit risk, and large-scale ML pipelines on the applied side, and NLP, topic modeling, and multimodal analysis on the research side.
I’m currently pursuing two graduate degrees in parallel — an MS in Data Science at Johns Hopkins and an MA in German Studies at Stanford, with a research specialization in Computational NLP & Textual Data Science. The combination is deliberate: I care about building ML systems that are not only accurate but interpretable, defensible, and grounded in real-world context.
Education
- Johns Hopkins University — Master of Science in Data Science
  Jan 2026 – Dec 2026 · GPA 4.0
- Stanford University — Master of Arts in German Studies (Specialization in Computational NLP & Textual Data Science)
  Sep 2024 – Jun 2026 · GPA 3.6
- University of California, Santa Barbara — Bachelor of Arts in German Studies
  Sep 2019 – Sep 2023 · GPA 3.74
  Related coursework in Applied Mathematics and Data Science
What I work on
- Applied ML for risk and decisioning. End-to-end pipelines for credit tier prediction, fraud detection, and customer segmentation — including independent volunteer engagements with the Microsoft Data Science team and Google engineering team.
- NLP and topic modeling on historical text. Building corpora, OCR pipelines, and BERTopic / MALLET workflows to study thematic evolution in 19th-century German Romantic literature.
- Multimodal pipelines for cultural data. Combining YOLO-based panel segmentation, OCR, and VLM/LLM reverse prompting to extract structured semantic representations from tens of thousands of comic panels.
Skills
Programming: Python · R · SQL · Java · HTML/CSS
ML & Stats: Tree-based and regression models · SVM · unsupervised methods · model evaluation · feature engineering · MLOps
Data: pandas · NumPy · SciPy · scikit-learn · seaborn · Matplotlib · Plotly · Apache Parquet
Cloud & Tools: AWS (S3, SageMaker, EC2) · Google Cloud BigQuery · GitHub · Jupyter · Flask · Tableau
Specialties: NLP · multimodal analysis · feature engineering · MLOps
Languages: Mandarin (native) · English (fluent) · German (fluent)
Certifications
- IBM Data Science Professional Certificate — Fall 2025
- Google Data Analytics Professional Certificate — Summer 2025
What I’m looking for
I’m actively interviewing for full-time roles starting after graduation — Data Scientist, Data Analyst, Machine Learning Engineer, Business Analyst, and BI Analyst positions. I’m especially drawn to teams working on interpretable modeling, risk and decision-making, fraud and trust, or applied NLP where the work reaches a real user.
If you’re hiring, collaborating on something at the language/ML boundary, or just want to chat — my email is in the sidebar, and I’m easy to reach on LinkedIn.
Elsewhere
- CV — see the CV page for the full picture
- Projects — selected work lives on the Projects page
- Code — most of my open work is on GitHub