Measuring the Impact of Recommendation Systems

· 3 min read
Eric Ngo
ML Engineer @dodo, ex-ML engineer @Meta/Samsung-Research/7-11

Building a recommendation system is only half the battle. To understand its true value, you must be able to measure its impact on both user behavior and business health. Measuring a RecSys requires a combination of online and offline metrics.

1. Business & North Star Metrics

These are the top-level indicators of whether your RecSys is actually helping the business grow.

  • Conversion Rate (CVR): The percentage of users who take a desired action (e.g., purchase, sign-up) after interacting with a recommendation.
  • Click-Through Rate (CTR): The share of recommendation impressions that users click. It measures the immediate relevance of your recommendations; a high CTR suggests your "Discovery" layer is working.
  • Average Order Value (AOV): The average revenue per order. Personalized cross-selling often leads users to add more items to their cart, which raises the value of each order.
  • Customer Lifetime Value (CLV): Long-term measurement of how personalized experiences improve retention and total spend over time.
  • Revenue Lift: The delta in revenue between a personalized experience and a baseline (e.g., popularity-based).
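As a rough illustration of how the ratio metrics above relate to raw event counts, here is a minimal Python sketch. The counts and revenue figures are made up, and the function names (`rate`, `relative_lift`) are hypothetical helpers rather than part of any library. It also assumes CVR is measured per click; some teams measure it per session or per user instead.

```python
def rate(numerator: float, denominator: float) -> float:
    """Safe ratio: returns 0.0 when there is nothing in the denominator."""
    return numerator / denominator if denominator else 0.0


def relative_lift(treatment: float, baseline: float) -> float:
    """Relative change of the treatment value over the baseline value."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a relative lift")
    return (treatment - baseline) / baseline


# Hypothetical counts for one recommendation widget over one week.
impressions, clicks, orders = 10_000, 400, 20
order_revenue = 900.0  # total revenue from those orders

ctr = rate(clicks, impressions)    # clicks per impression
cvr = rate(orders, clicks)         # orders per click (one possible definition)
aov = rate(order_revenue, orders)  # revenue per order

# Hypothetical revenue per user: personalized arm vs. popularity-based baseline.
lift = relative_lift(treatment=115.0, baseline=100.0)

print(f"CTR={ctr:.2%}  CVR={cvr:.2%}  AOV={aov:.2f}  lift={lift:.1%}")
```

In practice the lift would come from a controlled experiment (for example an A/B test), so that the difference can be attributed to the recommendations rather than to seasonality or traffic mix.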

2. Engagement Metrics

These help you understand how users are interacting with the system on a daily basis.

  • Time Spent on Platform: Personalized feeds (like TikTok or Netflix) are designed to maximize this metric.
  • Retention Rate: Are users coming back? High-quality recommendations are one of the strongest drivers of Day-7 and Day-30 retention.
  • Session Depth: How many items a user views or interacts with in a single session.
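The retention and session-depth metrics above can be sketched in a few lines of Python. This is only one possible definition: it counts a user as retained on Day N if they were active exactly N days after their first visit, whereas some teams use a window (for example "active at any point on or after day N"). The data layout and function names are assumptions made for this example.

```python
def day_n_retention(active_days_by_user: dict[str, set[int]], n: int) -> float:
    """Share of users who were active exactly n days after their first visit (day 0)."""
    if not active_days_by_user:
        return 0.0
    retained = sum(1 for days in active_days_by_user.values() if n in days)
    return retained / len(active_days_by_user)


def mean_session_depth(items_per_session: list[int]) -> float:
    """Average number of items viewed or interacted with per session."""
    if not items_per_session:
        return 0.0
    return sum(items_per_session) / len(items_per_session)


# Hypothetical cohort: days (relative to first visit) on which each user was active.
cohort = {
    "u1": {0, 1, 7, 30},
    "u2": {0, 7},
    "u3": {0, 2},
    "u4": {0},
}

d7 = day_n_retention(cohort, 7)    # 2 of 4 users
d30 = day_n_retention(cohort, 30)  # 1 of 4 users
depth = mean_session_depth([3, 5, 1, 7])

print(f"D7={d7:.0%}  D30={d30:.0%}  session depth={depth}")
```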

3. Machine Learning (Offline) Metrics

Before deploying a model, data scientists use these to evaluate how well it predicts held-out interactions in historical data.

  • Recall@K: Out of all the items a user actually interacted with, what fraction did the model place in the top $K$ results?
  • Precision@K: Out of the top $K$ items recommended, what fraction were actually relevant?
  • NDCG (Normalized Discounted Cumulative Gain): Measures the quality of the ranking. It rewards the model for placing the most relevant items at the very top of the list.
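  • MRR (Mean Reciprocal Rank): Focuses on the position of the first relevant item. It averages $1/\text{rank}$ of that item across users, so a first hit at position 1 scores 1 and a first hit at position 4 scores 0.25.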
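The four offline metrics above can be sketched for a single user as follows. This is a minimal illustration that assumes binary relevance (an item is either relevant or not) and the common $1/\log_2(\text{position} + 1)$ discount for NDCG; graded relevance and other discounts are also used in practice. The function names and the sample data are hypothetical.

```python
import math


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the list divided by DCG of an ideal ordering."""
    dcg = sum(
        1.0 / math.log2(pos + 2)  # pos is 0-based, so position 1 gets log2(2) = 1
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / position of the first relevant item, or 0.0 if none is found."""
    for pos, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / pos
    return 0.0


# Hypothetical user: the model ranked five items; the user interacted with b, d and z.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"b", "d", "z"}

p3 = precision_at_k(ranked, relevant, k=3)  # only "b" is in the top 3
r3 = recall_at_k(ranked, relevant, k=3)     # 1 of 3 relevant items retrieved
ndcg3 = ndcg_at_k(ranked, relevant, k=3)
rr = reciprocal_rank(ranked, relevant)      # first hit is at position 2

print(f"P@3={p3:.3f}  R@3={r3:.3f}  NDCG@3={ndcg3:.3f}  RR={rr:.3f}")
```

MRR is then the mean of `reciprocal_rank` over all evaluated users, and the other three metrics are usually averaged across users in the same way.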

4. Ecosystem Health Metrics

A healthy RecSys doesn't just show the same 5 popular items. It must maintain a diverse and fresh ecosystem.

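  • Coverage: The percentage of your total catalog that is actually being recommended to at least one user. High coverage gives "long-tail" items a chance to be discovered.
  • Novelty: Measures how "new" or "surprising" the recommendations are to the user. A common proxy is how unpopular the recommended items are, since items few users have seen are more likely to be new to any given user.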
  • Diversity: Ensures the list of recommendations isn't too repetitive (e.g., showing 10 identical black t-shirts).
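There is no single standard formula for these ecosystem metrics, so the sketch below picks one simple option for each: catalog coverage as the share of distinct items recommended, diversity as the fraction of item pairs in a list that belong to different categories, and novelty as the mean self-information $-\log_2 p$, where $p$ is the share of users who have interacted with the item. The function names, categories, and popularity values are assumptions for illustration only.

```python
import math
from itertools import combinations


def catalog_coverage(rec_lists: list[list[str]], catalog_size: int) -> float:
    """Share of the catalog recommended to at least one user."""
    recommended = set().union(*rec_lists)
    return len(recommended) / catalog_size


def intra_list_diversity(items: list[str], category_of: dict[str, str]) -> float:
    """Fraction of item pairs in one list that come from different categories."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    differing = sum(1 for a, b in pairs if category_of[a] != category_of[b])
    return differing / len(pairs)


def mean_novelty(items: list[str], popularity: dict[str, float]) -> float:
    """Mean self-information of the list; rarer items contribute more bits."""
    return sum(-math.log2(popularity[item]) for item in items) / len(items)


# Hypothetical data: two users' recommendation lists from a 10-item catalog.
rec_lists = [["tee_black_1", "tee_black_2"], ["tee_black_2", "sneaker_1"]]
category_of = {
    "tee_black_1": "t-shirt",
    "tee_black_2": "t-shirt",
    "sneaker_1": "shoes",
}
popularity = {"tee_black_1": 0.5, "tee_black_2": 0.25, "sneaker_1": 0.125}

coverage = catalog_coverage(rec_lists, catalog_size=10)
diversity = intra_list_diversity(["tee_black_1", "tee_black_2", "sneaker_1"], category_of)
novelty = mean_novelty(["tee_black_1", "tee_black_2"], popularity)

print(f"coverage={coverage:.0%}  diversity={diversity:.2f}  novelty={novelty:.2f} bits")
```

A list of ten identical black t-shirts would score 0.0 on this diversity measure, which is the failure mode the Diversity bullet describes.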

Conclusion

The most successful companies don't just pick one metric. They use a balanced scorecard that combines offline accuracy (NDCG) with online business impact (Revenue Lift) and long-term ecosystem health (Diversity).

To see how these metrics vary across different types of models, check out our next post.