Fantasy basketball analytics platform with ML predictions and live multiplayer drafts
Fantasy basketball players rely on gut instinct or basic stats to draft players. I wanted to explore whether machine learning could provide a competitive edge in NBA fantasy—something I am passionate about.
Build a fantasy basketball analytics platform with ML-powered predictions and real-time multiplayer draft functionality.
Architected ETL pipelines to normalize NBA player data into PostgreSQL
Engineered 50+ features to power custom XGBoost and Scikit-learn predictive models
Built low-latency multiplayer draft rooms using WebSockets
Created complex data visualizations synchronized across concurrent users
Integrated pgvector for similarity search on player stats
Platform supports ~150 active users per live draft, processing 450+ NBA players
ML pipeline outperformed standard fantasy drafting tools
15% increase in user retention during live drafts
Sub-second sync latency across 150+ concurrent users in draft rooms
MVP, stretch goals, and future vision
Development phases and milestones
ML-powered player predictions and draft tools
XGBoost models for player performance forecasting
Real-time draft recommendations with live updates
Side-by-side player analysis with visualizations
Trade analysis, lineup optimization, predictions game, and AI features
Evaluate trade proposals using prediction models
AI-powered optimal lineup suggestions
User predictions game with scoring and rankings
Real-time AI chat during live drafts
Continued development of ML models and prediction accuracy improvements
Ongoing refinement of prediction models
Common questions about this project, answered in STAR format
How did you build a real-time fantasy basketball draft system that handles concurrent users?
Key Takeaway: Real-time multiplayer systems require careful state management—optimistic updates improve UX but need robust reconciliation to handle race conditions.
How did you approach building ML-powered player projections for fantasy basketball?
Key Takeaway: Feature engineering matters more than model complexity—domain knowledge about basketball (back-to-backs, matchups) drove the biggest accuracy gains.
Tell me about a time you had to optimize database queries for a data-intensive application.
Key Takeaway: Performance optimization should start with measurement—profiling revealed that 80% of latency came from just two unoptimized queries that were easy fixes.
Quick answers to 'Why did you choose X?' questions
Unit tests for message handlers and state reconciliation logic. Integration tests with multiple mock WebSocket clients simulating concurrent users. Test race conditions: two users picking the same player simultaneously. Test reconnection: disconnect mid-draft, then verify state recovery. E2E tests for the complete draft flow.
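The race-condition test above can be sketched as a small concurrency check. `DraftArbiter` here is a hypothetical stand-in for the real server-side pick handler, not the actual implementation:

```python
import threading

class DraftArbiter:
    """Minimal server-side pick arbiter: first confirmed pick wins."""

    def __init__(self):
        self._lock = threading.Lock()
        self._taken = {}   # player_id -> user_id
        self._seq = 0

    def pick(self, user_id, player_id):
        with self._lock:
            if player_id in self._taken:
                return {"ok": False, "reason": "already_taken"}
            self._seq += 1
            self._taken[player_id] = user_id
            return {"ok": True, "seq": self._seq}

# Two users race for the same player; exactly one must win.
arbiter = DraftArbiter()
results = {}

def attempt(user):
    results[user] = arbiter.pick(user, player_id=23)

threads = [threading.Thread(target=attempt, args=(u,)) for u in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

wins = [u for u, r in results.items() if r["ok"]]
```

Whichever thread wins the race, the assertion that exactly one pick succeeded must hold on every run.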
Evaluation dataset with historical player performance and known outcomes. Test prediction accuracy against a holdout set. Monitor prediction drift over the season and retrain when accuracy drops. A/B test model versions with a subset of users. Log predictions vs. actuals for continuous evaluation.
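A minimal sketch of the holdout-accuracy and drift checks described above. The `accuracy_within` and `drift_detected` helpers, tolerance, and thresholds are illustrative, not the production code:

```python
def accuracy_within(preds, actuals, tolerance):
    """Fraction of predictions within `tolerance` fantasy points of the actual."""
    hits = sum(1 for p, a in zip(preds, actuals) if abs(p - a) <= tolerance)
    return hits / len(preds)

def drift_detected(rolling_accuracies, baseline, drop_threshold=0.05):
    """Flag retraining when recent accuracy falls below baseline by the threshold."""
    recent = sum(rolling_accuracies[-3:]) / 3
    return recent < baseline - drop_threshold

# Holdout check: predicted vs. actual fantasy points.
preds = [30.5, 18.2, 42.0, 25.1]
actual = [28.0, 20.0, 50.0, 24.0]
acc = accuracy_within(preds, actual, tolerance=3.0)

# Drift check: recent weekly accuracy dipping below the baseline.
drift = drift_detected([0.72, 0.68, 0.64], baseline=0.75)
```

The same structure extends to logging predictions vs. actuals per game and re-running the drift check on a schedule.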
In a live draft, picks happen every 30-90 seconds. Polling is either too slow or too frequent; WebSockets push updates instantly, and Supabase Realtime handles the infrastructure. It also enables presence, so users can see who is online.
For tabular data with 50+ features, XGBoost often outperforms neural nets. It is more interpretable (you can see feature importance), requires less data, and is easier to debug when predictions are wrong. The trade-off is that it may miss complex patterns, but interpretability matters for fantasy sports.
Every decision has costs — here's how I thought through them
Building WebSocket infrastructure from scratch would have taken weeks. Supabase Realtime gave me sub-second sync for 150+ concurrent users with minimal code.
For tabular data with 50+ engineered features, XGBoost consistently outperforms neural nets. Interpretability was also important for explaining predictions to users.
With 450 players, not millions, pgvector is fast enough. Keeping everything in Postgres simplified the architecture significantly.
A model trained on last season's data becomes less accurate by February. Rolling retraining kept prediction accuracy stable throughout the season.
In a live draft, every second matters. Waiting 500ms for server confirmation would feel broken. Optimistic updates make it feel instant.
For a 0-to-1 product, speed to market matters more than theoretical scale. Supabase gives me Postgres, Realtime, Auth, and Storage with zero DevOps. If I hit millions of users, I'd have resources to migrate - that's a good problem to have. The trade-off is explicit: ship fast now, accept migration cost later if successful.
For a solo developer, fewer moving parts is better. I don't need to manage EC2 instances, load balancers, or container orchestration. The trade-off is giving up flexibility for operational simplicity. Real-time is handled by Supabase Realtime, so the serverless limitation on WebSockets doesn't matter.
Most "state" in the app is server data (player stats, draft picks). React Query handles this perfectly. Zustand is only for local UI state like filter selections.
Users should only see their own leagues and drafts. RLS makes this impossible to mess up - even if I forget an auth check in code, the database blocks unauthorized access.
With 50+ hand-crafted features (usage rate, efficiency, matchup difficulty, rest days), XGBoost achieves 70% accuracy. A black-box neural net might reach 72%, but users cannot understand its predictions.
Draft queue, UI preferences, and filter state need to persist across sessions. Zustand with persist middleware handles this elegantly. Context would re-render too much.
Daily model retraining takes ~5 minutes. Vercel Cron handles this fine. A dedicated job server would be overkill for one scheduled task.
Player stat charts, projection comparisons, and draft analytics are standard chart types. Recharts handles these well. D3 would be overkill for bar charts.
For player similarity search on 450 players, the cost is negligible. Training custom embeddings would require labeled pairs of "similar players" - data I don't have.
Password management is a liability. Magic links are more secure and Supabase handles it. The small friction of checking email is worth not worrying about password breaches.
Fantasy basketball drafts happen a few times per year, not daily. A full mobile app would be overkill. Responsive web works fine for the use case.
The hardest problems I solved on this project
Monitored the connection lifecycle. Found that mobile devices and backgrounded browser tabs were disconnecting, and that network switching (wifi to cellular) also caused drops.
Implemented reconnection logic with exponential backoff. Used Supabase Realtime presence to detect connection state. Added optimistic UI updates so users see their picks immediately even if the connection briefly drops; the server reconciles state on reconnect. Added a visible connection-status indicator so users know whether they are connected.
Lesson: Real-time features must handle unreliable connections gracefully. Design for disconnection as a normal state, not an exception. Optimistic UI with server reconciliation provides best UX.
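The reconnection backoff could look roughly like this. The base delay, cap, and jitter factor are assumptions for illustration, not the actual tuning:

```python
import random

def backoff_delays(max_attempts=5, base=0.5, cap=30.0, seed=0):
    """Exponential backoff with jitter: base * 2^attempt, capped, plus 0-25% jitter
    so a room full of clients does not reconnect in lockstep."""
    rng = random.Random(seed)
    delays = []
    for attempt in range(max_attempts):
        raw = min(cap, base * (2 ** attempt))
        delays.append(raw + rng.uniform(0, raw * 0.25))
    return delays

delays = backoff_delays()
```

Each delay is strictly larger than the last (the jitter is at most 25% of a value that doubles every attempt), so clients back off rather than hammer a struggling server.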
Traced the issue to race conditions: two users picking at nearly the same time, each seeing the player as still available.
Implemented server-side validation and sequencing. All picks go through the server, which assigns sequence numbers. Client state is provisional, used only for optimistic UI; the server broadcasts confirmed picks with their sequence, and clients reconcile any optimistic updates that conflicted. Added pick locking: when you click a player, they are briefly locked while the server confirms.
Lesson: In multi-user real-time systems, the server must be the source of truth for state that can conflict. Optimistic UI is for responsiveness, but server arbitrates conflicts.
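The client-side reconciliation step might be sketched like this, with `reconcile` as a hypothetical helper: server-confirmed picks win, and any optimistic pick the server awarded to someone else gets rolled back locally:

```python
def reconcile(local_user, optimistic, confirmed):
    """Server-confirmed picks win. Returns the authoritative pick order plus
    the local optimistic picks this client must roll back."""
    confirmed_by_player = {p["player_id"]: p for p in confirmed}
    rollbacks = [
        pick for pick in optimistic
        if confirmed_by_player.get(pick["player_id"], {}).get("user_id") != local_user
    ]
    authoritative = sorted(confirmed, key=lambda p: p["seq"])
    return authoritative, rollbacks

# alice optimistically picked players 23 and 45; the server awarded 23 to bob.
optimistic = [{"player_id": 23, "user_id": "alice"},
              {"player_id": 45, "user_id": "alice"}]
confirmed = [{"player_id": 23, "user_id": "bob", "seq": 1},
             {"player_id": 45, "user_id": "alice", "seq": 2}]
authoritative, rollbacks = reconcile("alice", optimistic, confirmed)
```

Alice's UI rolls back player 23 (the server gave it to bob) but keeps player 45, and renders the draft board in the server's sequence order.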
Analyzed prediction accuracy over time. A model trained on last season's data became less accurate as the current season progressed: player roles change, injuries happen, trades affect stats.
Implemented rolling retraining with recent game data weighted more heavily. Retrain weekly using Vercel Cron. Added feature for current season context (games played, recent averages). Tracked prediction accuracy over time to catch model drift. Display confidence scores so users know when predictions are less certain.
Lesson: ML models in production need monitoring and retraining. Sports data changes constantly - a static model degrades. Build retraining into your infrastructure from the start.
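One plausible way to weight recent games more heavily during retraining is exponential decay on sample weights. The half-life value here is illustrative, not the value actually used:

```python
def recency_weights(games_ago, half_life=10.0):
    """Sample weights for retraining: a game `half_life` games in the past
    counts half as much as the most recent game."""
    return [0.5 ** (g / half_life) for g in games_ago]

# Most recent game weighs 1.0; older games decay smoothly toward 0.
weights = recency_weights([0, 10, 20])
```

Most gradient-boosting libraries accept per-sample weights directly at fit time, so this plugs into a weekly retraining job without changing the model itself.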
Player stats change after every game (daily during the season). But with 450+ players, hitting the stats API on every page view is expensive and slow.
ISR with 1-hour revalidation for player pages. The stats API is called during static generation and the result is cached for 1 hour. On-demand revalidation is triggered after games end (schedule-based webhook). During games, client-side polling updates live stats without a full page reload; after the game, ISR revalidates with final stats.
Lesson: Match revalidation frequency to how often data actually changes. Player stats change once per game, not continuously. Combine ISR for baseline with CSR polling for real-time updates during events.
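The revalidation decision itself reduces to a small predicate. This sketch mirrors the TTL-plus-webhook logic described above (the 1-hour TTL matches the text; the function and its parameters are illustrative):

```python
def needs_revalidation(cached_at, now, ttl_seconds=3600, game_ended_at=None):
    """Revalidate when the TTL expires, or immediately after a game ends
    (mirrors on-demand revalidation triggered by a post-game webhook).
    Times are seconds since an arbitrary epoch."""
    if game_ended_at is not None and game_ended_at > cached_at:
        return True
    return (now - cached_at) >= ttl_seconds
```

The TTL handles the steady-state case; the `game_ended_at` check is what keeps stats from being up to an hour stale right after a game finishes.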
Private draft rooms need real-time WebSocket updates. But public/completed drafts should be indexable for SEO (people searching for draft results).
Dynamic rendering based on draft state. Active drafts: CSR with WebSocket connection for real-time. Completed drafts: SSG with results frozen at completion time (never changes, perfect for static). Public leagues: SSR for SEO with meta tags for social sharing. Used generateStaticParams to pre-build popular completed drafts.
Lesson: The same "page" can have different rendering strategies based on state. A draft room is CSR when live, SSG when completed. Do not lock yourself into one strategy per route.
Direct calls to the NBA stats API were failing intermittently. Rate limits are strict and undocumented, and with no official public API, scraping is fragile.
Built a resilient fetching layer: retry with exponential backoff, a circuit breaker to stop hammering the API when it is down, and fallback to cached data. A nightly batch job fetches all player data at 4 AM, when the API is more stable, and stores it in my own database as the source of truth. Real-time game data comes from a more reliable source (the ESPN API) as a supplement.
Lesson: Never rely on a single unreliable data source. Build caching, fallbacks, and alternative sources. If an API is flaky, fetch during off-peak hours and cache aggressively. Own your data when possible.
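A stripped-down sketch of the fetching layer: a consecutive-failure circuit breaker plus cache fallback. The real implementation also retries with backoff; the class and function names here are hypothetical:

```python
class CircuitBreaker:
    """Open the circuit after `threshold` consecutive failures; callers then
    skip the flaky API entirely and fall back to cached data."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.threshold

    def record(self, success):
        self.failures = 0 if success else self.failures + 1

def fetch_with_fallback(fetch, breaker, cache):
    if breaker.open:
        return cache["players"]      # stop hammering a down API
    try:
        data = fetch()
        breaker.record(True)
        cache["players"] = data      # own the data: refresh cache on success
        return data
    except Exception:
        breaker.record(False)
        return cache["players"]      # serve stale data rather than fail

# Simulate a down API: after 3 failures the breaker opens and the
# cached roster is served without touching the API again.
def flaky():
    raise RuntimeError("NBA API down")

breaker = CircuitBreaker(threshold=3)
cache = {"players": ["cached roster"]}
for _ in range(4):
    result = fetch_with_fallback(flaky, breaker, cache)
```

Serving stale data beats serving an error page, and the open circuit gives the upstream API room to recover.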
Key sections I'd walk through in a code review
src/lib/draft/syncProtocol.ts: Handles the bi-directional sync between clients and the server. Clients send state hashes; the server compares and pushes full state on mismatch. Includes sequence numbers to detect and handle out-of-order messages.
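The hash-compare handshake might look like this in outline (shown in Python for brevity; the actual protocol lives in syncProtocol.ts, and these function names are illustrative):

```python
import hashlib
import json

def state_hash(draft_state):
    """Stable hash of draft state: canonical JSON, then SHA-256.
    Clients send this hash instead of the full state."""
    canonical = json.dumps(draft_state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def server_response(client_hash, server_state):
    """Push the full state only on mismatch; otherwise send a cheap ack."""
    h = state_hash(server_state)
    if client_hash == h:
        return {"type": "ack"}
    return {"type": "full_state", "state": server_state, "hash": h}

state = {"round": 3, "picks": [{"seq": 1, "player_id": 23}]}
in_sync = server_response(state_hash(state), state)
stale = server_response("some-stale-hash", state)
```

Hashing keeps the common case (client already in sync) to a few bytes on the wire, while guaranteeing a drifted client converges on the next exchange.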
ml/features/engineer.py: Takes raw NBA stats and computes 50+ features: rolling averages, opponent adjustments, rest days, home/away splits, usage rate when teammates are out, etc. Each feature is documented with its predictive value.
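Two of the simpler features sketched in isolation (rolling fantasy-point average and rest days; the function names are illustrative, not the actual engineer.py API):

```python
def rolling_average(points, window=5):
    """Mean fantasy points over the last `window` games.
    Early in the season, uses however many games exist."""
    recent = points[-window:]
    return sum(recent) / len(recent)

def rest_days(game_days):
    """Days between consecutive games (inputs are day numbers in season order).
    A back-to-back shows up as 1, which the model learns to penalize."""
    return [later - earlier for earlier, later in zip(game_days, game_days[1:])]

avg = rolling_average([10, 20, 30, 40, 50, 60])   # last 5 games
rests = rest_days([1, 2, 5])                       # back-to-back, then 3 days off
```

The interesting features (opponent adjustments, usage rate when teammates sit) follow the same shape: pure functions from raw game logs to one numeric column.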
Want to discuss this project?