E-Commerce Profitability Analytics — Amazon India Dataset
Role: Independent project · Period: Spring 2025 – Oct 2025
Overview
An end-to-end e-commerce analytics and machine learning project on the Amazon India transaction dataset, built entirely in R and R Markdown, with Google Sheets used lightly for manual category refinement. The deliverable was an R Markdown report that pairs narrative interpretation with multi-panel analytics visualizations, the kind of artifact a business analytics team would actually circulate.
What I did
- Cleaned and merged messy multi-source CSV files representing 1.2M+ transactions, standardized inconsistent labels, and validated dataset quality.
- Engineered features including profit margin %, discount ratios, and category-level aggregates that turned raw transaction rows into business-meaningful signals.
- Performed comprehensive EDA using tidyverse and ggplot2 to identify revenue trends, high-performing product groups, and pricing anomalies.
- Built multiple predictive and segmentation models:
  - k-means clustering for customer/product segmentation
  - Linear regression for baseline profitability drivers
  - Tree-based models (CART, Random Forest, Gradient Boosting) for nonlinear interactions
- Applied clustering and regression to optimize pricing strategies, generating recommendations on optimal price bands and high-margin categories.
- Delivered a polished R Markdown report combining narrative, code, and visuals — not just a notebook, but a document a stakeholder can read.
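
To make the feature engineering and baseline modeling steps above concrete, here is a minimal sketch in base R. It is not the project's actual code: the data is synthetic, and every column name (`list_price`, `sale_price`, `unit_cost`, and so on), the three category labels, and the choice of three clusters are assumptions made for illustration. The project itself used tidyverse; base R is used here only to keep the example dependency-free.

```r
# Synthetic stand-in for cleaned transactions.
# All column names, categories, and value ranges are illustrative assumptions.
set.seed(42)
n <- 300
tx <- data.frame(
  category   = sample(c("Electronics", "Apparel", "Home"), n, replace = TRUE),
  list_price = runif(n, 200, 5000),
  stringsAsFactors = FALSE
)
tx$sale_price <- tx$list_price * runif(n, 0.5, 1.0)    # price after discount
tx$unit_cost  <- tx$list_price * runif(n, 0.45, 0.75)  # assumed cost per unit

# Row-level engineered features
tx$discount_ratio    <- (tx$list_price - tx$sale_price) / tx$list_price
tx$profit            <- tx$sale_price - tx$unit_cost
tx$profit_margin_pct <- 100 * tx$profit / tx$sale_price

# Category-level aggregates: total revenue, total profit, blended margin
cat_summary <- aggregate(
  cbind(sale_price, profit) ~ category,
  data = tx, FUN = sum
)
cat_summary$margin_pct <- 100 * cat_summary$profit / cat_summary$sale_price

# k-means segmentation on standardized features
# (k = 3 is an arbitrary choice for this sketch)
features <- scale(tx[, c("sale_price", "discount_ratio", "profit_margin_pct")])
km <- kmeans(features, centers = 3, nstart = 25)
tx$segment <- factor(km$cluster)

# Baseline linear regression for margin drivers
fit <- lm(profit_margin_pct ~ discount_ratio + list_price + category, data = tx)

print(cat_summary)
print(table(tx$segment))
print(round(coef(fit), 3))
```

Standardizing with `scale()` before `kmeans()` keeps the price column from dominating the distance calculation, and `nstart = 25` reruns the algorithm from several random starts to reduce sensitivity to initialization. On real data the number of clusters would need to be chosen deliberately, for example by comparing within-cluster sum of squares across several values of k.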
Why it matters
E-commerce decision-making is dominated by pricing, mix, and promotion choices. This project shows how a single analyst can take messy multi-source data and produce both the predictive models and the human-readable report a category manager would actually use.
Tech stack
R · R Markdown · tidyverse · ggplot2 · Google Sheets · k-means · linear regression · CART · Random Forest · Gradient Boosting