System Design for a Modern Recommendation System

System Requirements

  • Design a recommender system that recommends the top 10 items to each user on an e-commerce platform
  • Provide a model retraining pipeline

Define the Business Goal

  • What is valuable to the e-commerce platform?

    • Most of an e-commerce platform’s income comes from taking a share of the cash flow between customers and sellers, so what we care about is increasing the platform’s total transaction amount.
  • Set the online evaluation metric based on our goal

    • GMV (Gross Merchandise Value) = Page View * Checkout Conversion Rate * Average Order Value
  • In the early stage of a recommender system, it is common to track several online evaluation metrics

    • Examples
      • checkout CVR
      • add-to-cart CVR
      • favorite CVR
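
The GMV decomposition above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `gmv` and all traffic numbers are hypothetical, chosen to show how a change in checkout CVR flows through to GMV when page views and average order value stay fixed.

```python
def gmv(page_views: int, checkout_cvr: float, average_order_value: float) -> float:
    """GMV = Page View * Checkout Conversion Rate * Average Order Value."""
    return page_views * checkout_cvr * average_order_value


# Hypothetical traffic numbers, chosen only to illustrate the arithmetic.
baseline = gmv(page_views=1_000_000, checkout_cvr=0.020, average_order_value=30.0)
treatment = gmv(page_views=1_000_000, checkout_cvr=0.022, average_order_value=30.0)

# Relative GMV lift of the treatment over the baseline.
relative_lift = (treatment - baseline) / baseline
print(f"baseline={baseline:,.0f} treatment={treatment:,.0f} lift={relative_lift:.1%}")
```

Because GMV is a product of three factors, a relative change in any single factor produces the same relative change in GMV, which is why the intermediate conversion rates are useful proxy metrics.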

Define the Offline Evaluation for the Ranking System

  • How to define the score for ranking the items?

    • score = p1^a * p2^b * p3^c * price of item
      • p1 = checkout conversion rate
      • p2 = add-to-cart conversion rate
      • p3 = click-through rate
      • a, b, c are tunable hyperparameters
  • Evaluation metrics

    • MAP (Mean Average Precision)
    • MRR (Mean Reciprocal Rank)
    • NDCG (Normalized Discounted Cumulative Gain)
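
As a rough illustration of the three offline metrics listed above, the sketch below computes them for ranked lists given as relevance labels in rank order. The function names are hypothetical. Two simplifying assumptions are made: average precision is normalized by the number of relevant items that appear in the list, and NDCG uses linear gain (some definitions use 2^rel - 1 instead).

```python
import math
from typing import List, Sequence


def reciprocal_rank(relevance: Sequence[float]) -> float:
    """1 / rank of the first relevant item, or 0 if none is relevant."""
    for position, rel in enumerate(relevance, start=1):
        if rel > 0:
            return 1.0 / position
    return 0.0


def average_precision(relevance: Sequence[float]) -> float:
    """Mean of precision@k over the positions k that hold a relevant item."""
    hits = 0
    precision_sum = 0.0
    for position, rel in enumerate(relevance, start=1):
        if rel > 0:
            hits += 1
            precision_sum += hits / position
    return precision_sum / hits if hits else 0.0


def dcg(relevance: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with linear gain and a log2 discount."""
    return sum(
        rel / math.log2(position + 1)
        for position, rel in enumerate(relevance[:k], start=1)
    )


def ndcg(relevance: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevance, reverse=True), k)
    return dcg(relevance, k) / ideal if ideal > 0 else 0.0


def mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


# One ranked list per user; 1 marks a relevant item (hypothetical data).
ranked_lists = [[0, 1, 0, 1], [1, 0, 0, 0]]
mrr = mean([reciprocal_rank(r) for r in ranked_lists])
map_score = mean([average_precision(r) for r in ranked_lists])
mean_ndcg = mean([ndcg(r, k=4) for r in ranked_lists])
print(mrr, map_score, mean_ndcg)
```

MRR only looks at the first relevant item, MAP rewards placing all relevant items early, and NDCG also supports graded relevance, so it fits best when the label is a continuous score.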

Define the Row Structure of the Dataset

  • <user profile, item profile, score>
  • User profile
    • User ID
    • User preference
    • User behavior
    • Demographic property
  • Item profile
    • Item ID
    • Seller’s information
    • Item content
    • Item statistical features
      • user rating
      • like, click, and purchase rates over the last 7/30 days
  • Score
    • Calculated from user-item interactions
    • Score = p1^a * p2^b * p3^c * price of item
      • p1 = checkout conversion rate
      • p2 = add-to-cart conversion rate
      • p3 = click-through rate
      • a, b, c are tunable hyperparameters
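
One possible way to represent a dataset row and its score in code is sketched below. All class and field names are hypothetical and only mirror the outline above; the `price` field on the item is an assumption added because the score formula needs it. The exponents default to 1.0 purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserProfile:
    user_id: str
    preferences: List[str] = field(default_factory=list)
    behavior: List[str] = field(default_factory=list)
    demographics: Dict[str, str] = field(default_factory=dict)


@dataclass
class ItemProfile:
    item_id: str
    seller_info: Dict[str, str]
    content: str
    price: float  # assumed field; needed by the score formula
    stats: Dict[str, float] = field(default_factory=dict)


@dataclass
class Row:
    user: UserProfile
    item: ItemProfile
    score: float


def ranking_score(p1: float, p2: float, p3: float, price: float,
                  a: float = 1.0, b: float = 1.0, c: float = 1.0) -> float:
    """score = p1^a * p2^b * p3^c * price of item."""
    return (p1 ** a) * (p2 ** b) * (p3 ** c) * price


# Hypothetical user, item, and rates.
user = UserProfile(user_id="u1", preferences=["shoes"])
item = ItemProfile(item_id="i1", seller_info={"seller_id": "s1"},
                   content="running shoes", price=10.0,
                   stats={"user_rating": 4.5})
row = Row(user=user, item=item,
          score=ranking_score(p1=0.5, p2=0.5, p3=0.5, price=item.price))
print(row.score)
```

Raising an exponent above 1 makes the score more sensitive to that rate, and lowering it toward 0 flattens its influence, which is how a, b, and c trade off the three signals.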

Baseline Ranking Algorithm

  • Is there an existing ranking service already running online?
  • Rule based
    • Rank items directly by their like, click, and purchase rates
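
A rule-based baseline of this kind can be very small. The sketch below assumes each item comes with precomputed like, click, and purchase rates and combines them with a weighted sum. The weighted sum, the function name `rule_based_top_k`, and the sample rates are all assumptions for illustration; the notes do not specify how the three rates are combined.

```python
from typing import Dict, List, Tuple

# item_id -> (like_rate, click_rate, purchase_rate)
ItemStats = Dict[str, Tuple[float, float, float]]


def rule_based_top_k(item_stats: ItemStats, k: int = 10,
                     weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> List[str]:
    """Rank items by a weighted sum of their rates and return the top k IDs."""
    def score(item_id: str) -> float:
        return sum(w * rate for w, rate in zip(weights, item_stats[item_id]))

    # Sort by descending score; break ties by item ID for a stable result.
    ranked = sorted(item_stats, key=lambda item_id: (-score(item_id), item_id))
    return ranked[:k]


# Hypothetical statistics.
stats = {
    "a": (0.10, 0.20, 0.05),
    "b": (0.30, 0.40, 0.10),
    "c": (0.00, 0.10, 0.01),
}
top_two = rule_based_top_k(stats, k=2)
print(top_two)
```

This baseline is not personalized: every user sees the same list. That makes it a useful floor against which any learned model should be compared.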

Ranking Model

  • Matrix factorization
  • Two-tower model
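
To make the first option concrete, here is a minimal matrix factorization sketch trained with plain SGD on (user, item, target) triples. It is written in pure Python for readability, and the hyperparameters and toy data are hypothetical. A two-tower model scores user-item pairs with the same kind of dot product, but produces the two vectors with neural networks over user and item features instead of looking them up by ID.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def train_mf(interactions: List[Tuple[int, int, float]], n_users: int, n_items: int,
             dim: int = 8, lr: float = 0.05, reg: float = 0.01,
             epochs: int = 200, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ target."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, target in interactions:
            err = target - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(dim):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P: Matrix, Q: Matrix, u: int, i: int) -> float:
    """Predicted score is the dot product of the user and item vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


# Toy data: user 0 likes item 0, user 1 likes item 1.
data = [(0, 0, 1.0), (0, 1, 0.0), (1, 0, 0.0), (1, 1, 1.0)]
P, Q = train_mf(data, n_users=2, n_items=2)
print(predict(P, Q, 0, 0), predict(P, Q, 0, 1))
```

Because the item vectors do not depend on the user, they can be computed offline and indexed, which is what makes this family of models suitable for fast retrieval.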

System Design Architecture

  • Vector database
    • Stores the item vectors produced by offline model inference
  • Kafka
    • Handles user events in real time
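
The role of the vector database can be illustrated with an in-memory stand-in. The class below is hypothetical and does a brute-force cosine-similarity search; a real vector database would typically use an approximate nearest-neighbor index instead, and its API would differ.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


class InMemoryVectorIndex:
    """Toy stand-in for a vector database: exact cosine-similarity search."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def upsert(self, item_id: str, vector: Sequence[float]) -> None:
        """Store a unit-normalized copy of the item vector."""
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("zero vector cannot be normalized")
        self._vectors[item_id] = [x / norm for x in vector]

    def search(self, query: Sequence[float], k: int = 10) -> List[Tuple[str, float]]:
        """Return the k items most similar to the query, best first."""
        q_norm = math.sqrt(sum(x * x for x in query))
        if q_norm == 0:
            raise ValueError("zero query vector")
        scored = (
            (sum(q * x for q, x in zip(query, vec)) / q_norm, item_id)
            for item_id, vec in self._vectors.items()
        )
        return [(item_id, score) for score, item_id in heapq.nlargest(k, scored)]


# Hypothetical item vectors, as if produced by offline model inference.
index = InMemoryVectorIndex()
index.upsert("shoe", [1.0, 0.0])
index.upsert("sock", [0.9, 0.1])
index.upsert("lamp", [0.0, 1.0])
neighbors = index.search([1.0, 0.0], k=2)
print(neighbors)
```

At request time the user vector is computed online and used as the query, so only the user side of the model has to run in the request path.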

Design Deep Dive - Multi-Stage Ranking System

  • Using only a two-tower model will likely hit a performance ceiling
  • There is usually a business need to insert particular items into the results
  • Phase 1: Retrieval

    • Purpose
      • First stage: generate the potential candidates
      • Reduce the computational load on the later stages
    • Multiple recall channels
      • Item-based collaborative filtering
      • Content-based filtering
        • Good for cold start
    • Model
      • Two-tower model
        • Store the item vectors in the vector database
  • Phase 2: Filter

    • Purpose
      • Filter out items the user has already bought
      • Filter out items with no remaining stock
      • Filter out items the user has scored 0
  • Phase 3: Rank

    • Purpose
      • Use a more complex model to capture the dependencies between the features and the output score, giving a more accurate ranking
    • Model
      • GBDT
      • Deep Cross Network
      • Factorization Machine
    • Cache the ranking results
  • Phase 4: Filter

    • Purpose
      • Filter by the user’s on-page selections
        • price
        • product type
  • Phase 5: Re-rank

    • Purpose
      • Avoid placing similar items next to each other
      • Insert a particular item after every N items for business purposes
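
The five phases can be read as a chain of small functions. The sketch below is a simplified illustration, not a production design: every name is hypothetical, the ranking model is replaced by an arbitrary `score_fn`, "similar items" is approximated by "same category", and promoted items are assumed not to overlap with the organic candidates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass(frozen=True)
class Item:
    item_id: str
    category: str
    price: float
    in_stock: bool = True


def retrieve(channels: List[List[str]], limit: int) -> List[str]:
    """Phase 1: merge the recall channels, de-duplicated, in channel order."""
    seen: Set[str] = set()
    merged: List[str] = []
    for channel in channels:
        for item_id in channel:
            if item_id not in seen:
                seen.add(item_id)
                merged.append(item_id)
    return merged[:limit]


def hard_filter(candidates: List[str], catalog: Dict[str, Item],
                purchased: Set[str], zero_scored: Set[str]) -> List[str]:
    """Phase 2: drop bought, out-of-stock, and zero-scored items."""
    return [i for i in candidates
            if i in catalog and catalog[i].in_stock
            and i not in purchased and i not in zero_scored]


def rank(candidates: List[str], score_fn: Callable[[str], float]) -> List[str]:
    """Phase 3: order by the ranking model's score (stand-in: score_fn)."""
    return sorted(candidates, key=score_fn, reverse=True)


def selection_filter(candidates: List[str], catalog: Dict[str, Item],
                     max_price: Optional[float] = None,
                     category: Optional[str] = None) -> List[str]:
    """Phase 4: apply the user's on-page price and product-type selections."""
    return [i for i in candidates
            if (max_price is None or catalog[i].price <= max_price)
            and (category is None or catalog[i].category == category)]


def rerank(candidates: List[str], catalog: Dict[str, Item],
           promoted: List[str], every_n: int) -> List[str]:
    """Phase 5: separate same-category neighbors, then insert promoted items."""
    remaining, diversified = list(candidates), []
    while remaining:
        prev = catalog[diversified[-1]].category if diversified else None
        # Best remaining item from a different category, else the best one.
        pick = next((i for i in remaining if catalog[i].category != prev), remaining[0])
        remaining.remove(pick)
        diversified.append(pick)
    result, promo = [], iter(promoted)
    for position, item_id in enumerate(diversified, start=1):
        result.append(item_id)
        if position % every_n == 0:
            extra = next(promo, None)
            if extra is not None:
                result.append(extra)
    return result


# Hypothetical catalog, recall channels, and model scores.
catalog = {
    "a": Item("a", "shoes", 50.0), "b": Item("b", "shoes", 60.0),
    "c": Item("c", "bags", 40.0), "d": Item("d", "bags", 45.0, in_stock=False),
    "e": Item("e", "hats", 20.0),
}
scores = {"a": 0.9, "b": 0.8, "c": 0.7}
candidates = retrieve([["a", "b", "c"], ["c", "d", "e"]], limit=500)
candidates = hard_filter(candidates, catalog, purchased={"e"}, zero_scored=set())
candidates = rank(candidates, lambda i: scores.get(i, 0.0))
candidates = selection_filter(candidates, catalog, max_price=100.0)
top_items = rerank(candidates, catalog, promoted=["promo"], every_n=2)[:10]
print(top_items)
```

Keeping each phase as a separate step lets the cheap stages shrink the candidate set before the expensive ranking model runs, and lets the business rules in the last phase change without retraining anything.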

Design Deep Dive - Training Pipeline

  • Item Index/Model update
    • Full update
      • Update the model and the item index every night at 00:00, when system concurrency is lowest
    • Incremental update
      • To achieve online learning
      • Update the model and the item index every 5 minutes or 1 hour
      • Only update items with new IDs
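
The difference between the two update modes can be sketched for the item index alone. In the example below the index is a plain dictionary and `embed` is a hypothetical stand-in for running the item side of the model; model retraining and job scheduling are deliberately left out.

```python
from typing import Callable, Dict, Iterable, List

Vector = List[float]


def full_update(embed: Callable[[str], Vector], catalog: Iterable[str]) -> Dict[str, Vector]:
    """Rebuild the whole index from scratch (the nightly job)."""
    return {item_id: embed(item_id) for item_id in catalog}


def incremental_update(index: Dict[str, Vector], embed: Callable[[str], Vector],
                       catalog: Iterable[str]) -> List[str]:
    """Embed only items whose IDs are not in the index yet; return those IDs."""
    new_ids = [item_id for item_id in catalog if item_id not in index]
    for item_id in new_ids:
        index[item_id] = embed(item_id)
    return new_ids


# Record every call so the cost of each mode is visible.
embed_calls: List[str] = []


def embed(item_id: str) -> Vector:
    """Hypothetical embedding function; a real one would run model inference."""
    embed_calls.append(item_id)
    return [float(len(item_id))]


index = full_update(embed, ["a", "b"])
added = incremental_update(index, embed, ["a", "b", "c"])
print(added, embed_calls)
```

The incremental path touches only new IDs, so it is cheap enough to run every few minutes, while the nightly full rebuild refreshes vectors for existing items after the model itself has changed.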