E-Commerce Profitability Analytics — Amazon India Dataset

Published: October 31, 2025

Role: Independent project · Period: Spring 2025 – Oct 2025

Overview

An end-to-end e-commerce analytics and machine learning project on the Amazon India transaction dataset, carried out in R and R Markdown, with Google Sheets used only for manual category refinement. The deliverable was an R Markdown report that combines narrative interpretation with multi-panel visualizations, the kind of artifact a business analytics team would circulate.

What I did

- Cleaned and merged messy multi-source CSV files representing 1.2M+ transactions, standardized inconsistent labels, and validated dataset quality.
- Engineered features, including profit margin %, discount ratios, and category-level aggregates, that turned raw transaction rows into business-meaningful signals.
- Performed exploratory data analysis (EDA) with tidyverse and ggplot2 to identify revenue trends, high-performing product groups, and pricing anomalies.
- Built multiple predictive and segmentation models:
  - k-means clustering for customer/product segmentation
  - Linear regression for baseline profitability drivers
  - Tree-based models (CART, Random Forest, Gradient Boosting) for nonlinear interactions
- Applied the clustering and regression results to pricing strategy, generating recommendations on price bands and high-margin categories.
- Delivered an R Markdown report combining narrative, code, and visuals: not just a notebook, but a document a stakeholder can read.
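
To make the feature-engineering and segmentation steps above concrete, here is a minimal base-R sketch on a toy data frame. It is not the project's actual code: the column names (list_price, sale_price, unit_cost, category), the values, the margin and discount definitions, and the choice of two clusters are all assumptions made for illustration.

```r
# Toy data only: column names and values are hypothetical,
# not the schema of the real Amazon India dataset.
orders <- data.frame(
  category   = c("Electronics", "Electronics", "Apparel", "Apparel", "Home", "Home"),
  list_price = c(1000, 2000, 500, 800, 300, 400),
  sale_price = c(900, 1500, 450, 600, 300, 320),
  unit_cost  = c(700, 1200, 300, 500, 200, 280)
)

# Profit margin as a percentage of the realized sale price (assumed definition).
orders$profit_margin_pct <-
  100 * (orders$sale_price - orders$unit_cost) / orders$sale_price

# Discount ratio: the share of the list price given away (assumed definition).
orders$discount_ratio <-
  (orders$list_price - orders$sale_price) / orders$list_price

# Category-level aggregates: mean sale price, margin, and discount per category.
category_summary <- aggregate(
  cbind(sale_price, profit_margin_pct, discount_ratio) ~ category,
  data = orders,
  FUN = mean
)

# k-means segmentation on standardized features.
# centers = 2 is an arbitrary choice for this toy example.
set.seed(42)
features <- scale(orders[, c("profit_margin_pct", "discount_ratio")])
segments <- kmeans(features, centers = 2, nstart = 10)
orders$segment <- segments$cluster

print(category_summary)
print(orders[, c("category", "profit_margin_pct", "discount_ratio", "segment")])
```

Standardizing with scale() before kmeans() keeps the margin percentage (tens of points) from dominating the discount ratio (between 0 and 1) in the distance calculation. On real data the number of clusters would need to be chosen with a diagnostic such as an elbow plot rather than fixed in advance.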

Why it matters

E-commerce decision-making is dominated by pricing, product-mix, and promotion choices. This project shows how a single analyst can take messy multi-source data and produce both the predictive models and the readable report a category manager would use.

Tech stack

R · R Markdown · tidyverse · ggplot2 · Google Sheets · k-means · linear regression · CART · Random Forest · Gradient Boosting


Peter Cheng
