swelearn 0.1.0

Offline terminal-based SWE interview prep tool
---
title: "Design a Ride-Sharing Service (e.g., Uber)"
tags: [geospatial, real-time, matching, websockets, payments, location-tracking]
difficulty: hard
estimated_time: 60min
companies: [Uber, Lyft, Amazon, Apple, DoorDash]
---

## Problem Statement

Design a ride-sharing platform where riders can request rides and nearby drivers
are matched, with real-time location tracking throughout the trip.

## Clarifying Questions

Ask these before designing:

- Scale? (Assume 10M rides/day, 1M concurrent drivers broadcasting location)
- Matching radius? (Assume ~5 km, configurable)
- ETA accuracy required? (Best-effort; within ±1 min acceptable for MVP)
- Payments in scope? (High-level only — not full payment system design)
- Surge pricing? (Mention as a concept, don't fully design)

## Capacity Estimates

  Location updates: 1M active drivers * 1 update/4sec = 250,000 writes/sec
  Ride requests:    10M/day = ~115 req/sec (bursty at commute hours)
  Matching latency: must complete in < 5 seconds end-to-end

## High-Level Design

```
[Rider App]                  [Driver App]
     |                            |
     | Request ride               | Location update (every 4s)
     v                            v
[API Gateway / Load Balancer]
     |           |
     v           v
[Ride Service] [Location Service]
     |               |
     v               v
[Matching Engine] [Location DB (geospatial)]
     |               (Redis GEO / PostGIS)
     v
[Trip Service] ──> [Notification Service]
     |
     v
[Payments Service]
```

## Location Service

The highest-write component. Handles 250K updates/sec.

**Storage: Redis GEO commands**
  - `GEOADD drivers_online <lng> <lat> <driver_id>` on each update
  - `GEORADIUS drivers_online <lng> <lat> 5 km` for nearby driver lookup
  - Redis GEO uses a geohash under the hood (sorted set with score = geohash)
  - In-memory → sub-millisecond reads; perfect for real-time queries

**Geohash alternative for interviews:**
  - Divide world into a grid of cells at fixed precision (e.g., 6 chars ≈ 1.2km²)
  - Driver's location encoded to geohash cell; store set of drivers per cell
  - Query = check current cell + 8 neighboring cells for nearby drivers

Driver location updates are writes to Redis — durability is not critical (stale
location is replaced in 4 seconds). Append to a time-series DB (InfluxDB) only
if historical tracking is needed.

## Matching Engine

When a rider requests a ride:
1. Look up rider location → compute geohash or GEORADIUS
2. Find available drivers within radius, sorted by distance
3. Send push offer to top N candidates simultaneously (not sequentially)
4. First driver to accept → matched; others receive cancellation
5. If no acceptance within timeout → expand radius and retry

**State machine for a ride:**
  REQUESTED → MATCHING → ACCEPTED → DRIVER_EN_ROUTE → ARRIVED
  → IN_TRIP → COMPLETED → (RATED)

State stored in a relational DB (Postgres) for durability. Transitions trigger
events to Notification Service and Payments Service.

## Real-Time Communication

Both rider and driver need live updates:

- **Driver → Server:** HTTP polling every 4s is simple but wasteful
  Better: persistent WebSocket or long-polling for driver location streaming
- **Server → Rider/Driver:** push updates (match found, driver location, ETA)
  Use WebSocket connections managed by a dedicated Presence Service
  Or: server-sent events (SSE) for simpler one-way push

At scale, WebSocket connections are stateful — use a connection registry
(Redis key: user_id → server_node) to route server-originated pushes to the
correct node holding the open connection.

## Trip Service & Database

  trips
  ┌─────────────────┬────────────────────────────────┐
  │ trip_id         │ UUID        PRIMARY KEY         │
  │ rider_id        │ UUID                            │
  │ driver_id       │ UUID        NULLABLE            │
  │ status          │ ENUM(requested, ..., completed) │
  │ pickup_lat/lng  │ FLOAT                           │
  │ dropoff_lat/lng │ FLOAT       NULLABLE            │
  │ requested_at    │ TIMESTAMP                       │
  │ fare_cents      │ INT         NULLABLE            │
  └─────────────────┴────────────────────────────────┘

Write to Postgres with read replicas for trip history queries.

## Deep Dives

### ETA Calculation
  - Use a road graph (OpenStreetMap data + Dijkstra/A*) for routing
  - In practice, use a maps API or proprietary routing engine
  - Factor in real-time traffic (ingest traffic events, adjust edge weights)
  - Update ETA every ~30s during the trip

### Surge Pricing
  - Monitor supply (available drivers) vs demand (open requests) per geohash zone
  - When demand/supply ratio exceeds threshold → apply multiplier
  - Simple rule: if queue_length > available_drivers * 2 → surge = 1.5×
  - Communicate surge to riders before confirmation

### Driver Dispatch at Scale
  - Single region: Redis GEO handles millions of drivers
  - Multi-region: shard by geographic region (US East, US West, EU, etc.)
  - Each region is an independent cluster; cross-region trips are rare

### Payments
  - Fare calculated server-side (distance × rate + base fare)
  - Charge rider after trip completion via payment provider
  - Idempotency keys prevent double-charges on retry
  - Driver payout batched daily via bank transfer

## Key Decisions to Highlight

1. Redis GEO for driver location — sub-ms geospatial queries at high write throughput
2. Fan-out to N drivers simultaneously — faster match than sequential offers
3. WebSocket presence service — real-time bidirectional push without polling
4. Geohash grid for spatial queries — O(1) lookup by cell, simple to explain