Hi, I’m Peter.

I’m a data scientist working at the intersection of applied machine learning and computational language research. My work spans fraud detection, credit risk, and large-scale ML pipelines on the applied side, and NLP, topic modeling, and multimodal analysis on the research side.

I’m currently pursuing two graduate degrees in parallel — an MS in Data Science at Johns Hopkins and an MA in German Studies at Stanford, with a research specialization in Computational NLP & Textual Data Science. The combination is deliberate: I care about building ML systems that are not only accurate but interpretable, defensible, and grounded in real-world context.

Education

  • Johns Hopkins University
    Master of Science in Data Science
    Jan 2026 – Dec 2026 · GPA 4.0

  • Stanford University
    Master of Arts in German Studies (Specialization in Computational NLP & Textual Data Science)
    Sep 2024 – Jun 2026 · GPA 3.6

  • University of California, Santa Barbara
    Bachelor of Arts in German Studies
    Sep 2019 – Sep 2023 · GPA 3.74
    Related coursework in Applied Mathematics and Data Science

What I work on

  • Applied ML for risk and decisioning. End-to-end pipelines for credit tier prediction, fraud detection, and customer segmentation, including independent volunteer engagements with the Microsoft Data Science team and the Google engineering team.
  • NLP and topic modeling on historical text. Building corpora, OCR pipelines, and BERTopic / MALLET workflows to study thematic evolution in 19th-century German Romantic literature.
  • Multimodal pipelines for cultural data. Combining YOLO-based panel segmentation, OCR, and VLM/LLM reverse prompting to extract structured semantic representations from tens of thousands of comic panels.

Skills

Programming: Python · R · SQL · Java · HTML/CSS
ML & Stats: Tree-based and regression models · SVM · unsupervised methods · model evaluation · feature engineering · MLOps
Data: pandas · NumPy · SciPy · scikit-learn · seaborn · Matplotlib · Plotly · Apache Parquet
Cloud & Tools: AWS (S3, SageMaker, EC2) · Google Cloud BigQuery · GitHub · Jupyter · Flask · Tableau
Specialties: NLP · multimodal analysis · feature engineering · MLOps
Languages: Mandarin (native) · English (fluent) · German (fluent)

Certifications

  • IBM Data Science Professional Certificate — Fall 2025
  • Google Data Analytics Professional Certificate — Summer 2025

What I’m looking for

I’m actively interviewing for full-time roles starting after graduation — Data Scientist, Data Analyst, Machine Learning Engineer, Business Analyst, and BI Analyst positions. I’m especially drawn to teams working on interpretable modeling, risk and decision-making, fraud and trust, or applied NLP where the work reaches a real user.

If you’re hiring, collaborating on something at the language/ML boundary, or just want to chat — my email is in the sidebar, and I’m easy to reach on LinkedIn.

Elsewhere

  • CV — see the CV page for the full picture
  • Projects — selected work lives on the Projects page
  • Code — most of my open work is on GitHub