German Romanticism Corpus Analysis (1780–1840) — NLP + Topic Modeling

Published: December 31, 2025

Role: Lead researcher · Affiliation: Stanford University · Period: Sep 2025 – Dec 2025

Overview

Designed and built a custom German Romanticism textual corpus (1780–1840), then ran a full NLP pipeline on it to study how aesthetic, philosophical, and poetic priorities shifted across the early, transitional, and late phases of the movement.

What I did

Constructed the corpus from scratch by collecting and cataloging literary works across five historical-linguistic “buckets” spanning 1780–1840, performing OCR extraction and text cleaning to standardize heterogeneous historical sources.
Applied multiple NLP frameworks (spaCy and BERTopic in Python, MALLET for LDA-based topic modeling, and R with tidytext to cross-check the results) to conduct topic modeling, topic-flow analysis, and cross-period comparison.
Built document–topic matrices, visualized topic transitions over time, and traced thematic evolution across the three Romantic periods.
Analyzed linguistic patterns, recurring motifs, and conceptual clusters to explore how philosophical and poetic priorities shifted — bridging humanities-based interpretation with quantitative ML methods.
Delivered an analytic report (GRC Analytic Report) documenting the methodology, results, and humanistic interpretation in one integrated document.
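
To make the topic-flow step concrete, here is a minimal sketch of how a document–topic matrix can be averaged per period to compare topic prominence over time. It is an illustration under stated assumptions, not the project's actual code: the period cut-off years, the function names `period_of` and `mean_topic_share_by_period`, and the toy matrix values are all hypothetical. In practice the matrix would come from a fitted topic model (for example MALLET's document–topic output) rather than being written by hand.

```python
from collections import defaultdict

# Hypothetical period boundaries: the write-up names three phases
# (early, transitional, late) but does not give their cut-off years.
PERIODS = [
    ("early", 1780, 1800),
    ("transitional", 1801, 1820),
    ("late", 1821, 1840),
]


def period_of(year):
    """Map a publication year to one of the three assumed periods."""
    for name, start, end in PERIODS:
        if start <= year <= end:
            return name
    raise ValueError(f"year {year} is outside 1780-1840")


def mean_topic_share_by_period(doc_years, doc_topic):
    """Average each topic's share over the documents in each period.

    doc_years: one publication year per document.
    doc_topic: one row per document, one topic proportion per column.
    """
    grouped = defaultdict(list)
    for year, row in zip(doc_years, doc_topic):
        grouped[period_of(year)].append(row)

    shares = {}
    for name, rows in grouped.items():
        n_topics = len(rows[0])
        shares[name] = [
            sum(row[k] for row in rows) / len(rows) for k in range(n_topics)
        ]
    return shares


# Toy data, invented purely for illustration (4 documents, 3 topics).
doc_years = [1785, 1798, 1810, 1835]
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]

shares = mean_topic_share_by_period(doc_years, doc_topic)
for name, _, _ in PERIODS:
    print(name, [round(value, 2) for value in shares[name]])
```

Averaging per period is only one simple way to summarize topic flow. Weighting documents by length, or plotting shares per year instead of per period, would be equally reasonable design choices.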

Why it matters

This is the kind of project that only works if you take both sides seriously — the literary-historical context that decides which periods, authors, and texts go into the corpus, and the computational rigor that makes the topic models reproducible and the trends defensible. The output is a quantitative view of Romanticism’s thematic arc that complements, rather than replaces, traditional close reading.

Tech stack

Python · spaCy · BERTopic · MALLET · R · tidytext · OCR pipelines · LDA · topic modeling · Jupyter

Peter Cheng
