Customer Segmentation with SQL (RFM)
📊 Project Overview
This project uses the Online Retail II dataset (2009–2011) loaded into BigQuery.
The goal is to segment customers into actionable groups using Recency, Frequency, and Monetary (RFM) analysis.
Business Question:
How can we identify high-value customers (“Champions”), loyal buyers, new customers, and those at risk of churn?
🧹 Step 1. Data Cleaning
We start by cleaning the raw transactional data:
- Exclude cancellations (invoices starting with C
)
- Drop rows with missing Customer IDs
- Keep only positive prices and quantities
- Aggregate line items into invoice-level totals
``sql CREATE OR REPLACE TABLE ecom.orders_clean AS SELECT Invoice AS order_id, CAST(
Customer IDAS INT64) AS customer_id, DATE(InvoiceDate) AS order_date, ANY_VALUE(Country) AS country, SUM(Quantity * Price) AS total_amount FROM ecom.ecom_raw_online_retail_ii WHERE NOT STARTS_WITH(Invoice, 'C') AND
Customer ID` IS NOT NULL AND Quantity > 0 AND Price > 0 GROUP BY order_id, customer_id, order_date;
Next, we calculate Recency, Frequency, and Monetary value for each customer.
Recency = days since last order
Frequency = number of completed orders
Monetary = total spend
WITH bounds AS ( SELECT MAX(order_date) AS ref_date FROM ecom.orders_clean ) SELECT customer_id, DATE_DIFF((SELECT ref_date FROM bounds), MAX(order_date), DAY) AS recency_days, COUNT(*) AS frequency, SUM(total_amount) AS monetary FROM ecom.orders_clean GROUP BY customer_id;
We assign quintile scores (1–5) to each metric using NTILE.
Smaller recency = higher score
Higher frequency and spend = higher score
WITH q AS ( SELECT *, NTILE(5) OVER (ORDER BY recency_days ASC) AS r_raw, NTILE(5) OVER (ORDER BY frequency DESC) AS f_score, NTILE(5) OVER (ORDER BY monetary DESC) AS m_score FROM ecom.vw_rfm_base ) SELECT customer_id, (6 - r_raw) AS r_score, f_score, m_score, recency_days, frequency, monetary FROM q;
Based on RFM scores, we map customers into segments:
SELECT customer_id, r_score, f_score, m_score, CASE WHEN r_score>=4 AND f_score>=4 AND m_score>=4 THEN ‘Champions’ WHEN r_score>=4 AND f_score>=3 THEN ‘Loyal’ WHEN r_score>=3 AND f_score<=2 AND r_score<=3 THEN ‘At Risk’ WHEN r_score<=2 AND f_score<=2 THEN ‘Hibernating’ WHEN r_score>=4 AND f_score=1 THEN ‘New’ ELSE ‘Potential Loyalist’ END AS segment FROM ecom.vw_rfm_scored;
Finally, we summarize the customer and revenue mix by segment:
SELECT segment, COUNT(DISTINCT customer_id) AS customers, SUM(monetary) AS revenue, ROUND(100 * SUM(monetary) / NULLIF(SUM(SUM(monetary)) OVER (), 0), 1) AS revenue_pct FROM ecom.vw_rfm_segments GROUP BY segment ORDER BY revenue DESC;
New customers dominate revenue (62%) — likely because many are making first-time bulk purchases.
Potential Loyalists (17%) — big customer base but less spend per customer → huge opportunity to convert into Loyal or Champions.
At Risk (10%) — moderate revenue share but small base → worth targeting with win-back campaigns.
Loyal + Champions together <5% of revenue — unusual pattern compared to most businesses, suggests retention and repeat-purchase strategy is weak.
Hibernating (6%) — older, inactive customers; low impact but worth occasional reactivation efforts.
✅ Results
Segment | Customers | Revenue (USD) | Revenue % |
---|---|---|---|
New | 903 | 11,021,535 | 62.1% |
Potential Loyalist | 3,231 | 2,983,970 | 16.8% |
At Risk | 514 | 1,744,678 | 9.8% |
Hibernating | 358 | 1,128,319 | 6.4% |
Loyal | 548 | 767,687 | 4.3% |
Champions | 324 | 98,138 | 0.6% |
💡 Business Insights
- Retention challenge: Despite having >300 “Champions,” they drive <1% of revenue.
- Revenue concentration: 62% of revenue comes from new customers, showing high acquisition but poor retention.
- Growth opportunity: With 3,200+ Potential Loyalists, even a 10% uplift in their repeat purchases would double the Loyal/Champion revenue.
- Strategic action:
- Launch loyalty/reward programs to convert “Potential Loyalists” into repeat buyers.
- Win-back campaigns (email, discounts) for “At Risk” customers.
- Analyze why “Champions” spend less — are they one-time bulk buyers?
- Launch loyalty/reward programs to convert “Potential Loyalists” into repeat buyers.