Fantasy basketball analytics platform with ML predictions and live multiplayer drafts
Fantasy basketball players rely on gut instinct or basic stats to draft players. I wanted to explore whether machine learning could provide a competitive edge in NBA fantasy—something I am passionate about.
Build a fantasy basketball analytics platform with ML-powered predictions and real-time multiplayer draft functionality.
Architected ETL pipelines to normalize NBA player data into PostgreSQL
Engineered 50+ features to power custom XGBoost and Scikit-learn predictive models
Built low-latency multiplayer draft rooms using WebSockets
Created complex data visualizations synchronized across concurrent users
Integrated pgvector for similarity search on player stats
Platform supports ~150 active users per live draft, processing 450+ NBA players
ML pipeline outperformed standard fantasy drafting tools
15% increase in user retention during live drafts
Sub-second sync latency across 150+ concurrent users in draft rooms
MVP, stretch goals, and future vision
Development phases and milestones
ML-powered player predictions and draft tools
XGBoost models for player performance forecasting
Real-time draft recommendations with live updates
Side-by-side player analysis with visualizations
Trade analysis, lineup optimization, predictions game, and AI features
Evaluate trade proposals using prediction models
AI-powered optimal lineup suggestions
User predictions game with scoring and rankings
Real-time AI chat during live drafts
Continued development of ML models and prediction accuracy improvements
Ongoing refinement of prediction models
Common questions about this project, answered in STAR format
How did you build a real-time fantasy basketball draft system that handles concurrent users?
Key Takeaway: Real-time multiplayer systems require careful state management—optimistic updates improve UX but need robust reconciliation to handle race conditions.
How did you approach building ML-powered player projections for fantasy basketball?
Key Takeaway: Feature engineering matters more than model complexity—domain knowledge about basketball (back-to-backs, matchups) drove the biggest accuracy gains.
Tell me about a time you had to optimize database queries for a data-intensive application.
Key Takeaway: Performance optimization should start with measurement—profiling revealed that 80% of latency came from just two unoptimized queries that were easy fixes.
Quick answers to 'Why did you choose X?' questions
Unit tests for message handlers and state reconciliation logic. Integration tests with multiple mock WebSocket clients simulating concurrent users. Test race conditions: two users picking the same player simultaneously. Test reconnection: disconnect mid-draft, then verify state recovery. E2E tests for the complete draft flow.
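The race-condition test above can be sketched as a small concurrency check. `DraftArbiter` here is a hypothetical stand-in for the real server-side pick handler, not the actual implementation:

```python
import threading

class DraftArbiter:
    """Minimal server-side pick arbiter: first confirmed pick wins."""

    def __init__(self):
        self._lock = threading.Lock()
        self._taken = {}   # player_id -> user_id
        self._seq = 0

    def pick(self, user_id, player_id):
        with self._lock:
            if player_id in self._taken:
                return {"ok": False, "reason": "already_taken"}
            self._seq += 1
            self._taken[player_id] = user_id
            return {"ok": True, "seq": self._seq}

# Two users race for the same player; exactly one must win.
arbiter = DraftArbiter()
results = {}

def attempt(user):
    results[user] = arbiter.pick(user, player_id=23)

threads = [threading.Thread(target=attempt, args=(u,)) for u in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

wins = [u for u, r in results.items() if r["ok"]]
```

Whichever thread wins the race, the assertion that exactly one pick succeeded must hold on every run.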
Evaluation dataset with historical player performance and known outcomes. Test prediction accuracy against a holdout set. Monitor prediction drift over the season and retrain when accuracy drops. A/B test model versions with a subset of users. Log predictions vs. actuals for continuous evaluation.
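A minimal sketch of the holdout-accuracy and drift checks described above. The `accuracy_within` and `drift_detected` helpers, tolerance, and thresholds are illustrative, not the production code:

```python
def accuracy_within(preds, actuals, tolerance):
    """Fraction of predictions within `tolerance` fantasy points of the actual."""
    hits = sum(1 for p, a in zip(preds, actuals) if abs(p - a) <= tolerance)
    return hits / len(preds)

def drift_detected(rolling_accuracies, baseline, drop_threshold=0.05):
    """Flag retraining when recent accuracy falls below baseline by the threshold."""
    recent = sum(rolling_accuracies[-3:]) / 3
    return recent < baseline - drop_threshold

# Holdout check: predicted vs. actual fantasy points.
preds = [30.5, 18.2, 42.0, 25.1]
actual = [28.0, 20.0, 50.0, 24.0]
acc = accuracy_within(preds, actual, tolerance=3.0)

# Drift check: recent weekly accuracy dipping below the baseline.
drift = drift_detected([0.72, 0.68, 0.64], baseline=0.75)
```

The same structure extends to logging predictions vs. actuals per game and re-running the drift check on a schedule.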
In a live draft, picks happen every 30-90 seconds. Polling is either too slow or too frequent; WebSockets push updates instantly, and Supabase Realtime handles the infrastructure. It also enables presence, so users can see who is online.
For tabular data with 50+ features, XGBoost often outperforms neural nets. It is more interpretable (you can see feature importance), requires less data, and is easier to debug when predictions are wrong. The trade-off is that it may miss complex patterns, but interpretability matters for fantasy sports.
Every decision has costs — here's how I thought through them
Building WebSocket infrastructure from scratch would have taken weeks. Supabase Realtime gave me sub-second sync for 150+ concurrent users with minimal code.
For tabular data with 50+ engineered features, XGBoost consistently outperforms neural nets. Interpretability was also important for explaining predictions to users.
With 450 players, not millions, pgvector is fast enough. Keeping everything in Postgres simplified the architecture significantly.
A model trained on last season's data becomes less accurate by February. Rolling retraining kept prediction accuracy stable throughout the season.
In a live draft, every second matters. Waiting 500ms for server confirmation would feel broken. Optimistic updates make it feel instant.
For a 0-to-1 product, speed to market matters more than theoretical scale. Supabase gives me Postgres, Realtime, Auth, and Storage with zero DevOps. If I hit millions of users, I'd have resources to migrate - that's a good problem to have. The trade-off is explicit: ship fast now, accept migration cost later if successful.
For a solo developer, fewer moving parts is better. I don't need to manage EC2 instances, load balancers, or container orchestration. The trade-off is giving up flexibility for operational simplicity. Real-time is handled by Supabase Realtime, so the serverless limitation on WebSockets doesn't matter.
Most "state" in the app is server data (player stats, draft picks). React Query handles this perfectly. Zustand is only for local UI state like filter selections.
Users should only see their own leagues and drafts. RLS makes this impossible to mess up - even if I forget an auth check in code, the database blocks unauthorized access.
With 50+ hand-crafted features (usage rate, efficiency, matchup difficulty, rest days), XGBoost achieves 70% accuracy. A black-box neural net might reach 72%, but users cannot understand its predictions.
Draft queue, UI preferences, and filter state need to persist across sessions. Zustand with persist middleware handles this elegantly. Context would re-render too much.
Daily model retraining takes ~5 minutes. Vercel Cron handles this fine. A dedicated job server would be overkill for one scheduled task.
Player stat charts, projection comparisons, and draft analytics are standard chart types. Recharts handles these well. D3 would be overkill for bar charts.
For player similarity search on 450 players, the cost is negligible. Training custom embeddings would require labeled pairs of "similar players" - data I don't have.
Password management is a liability. Magic links are more secure and Supabase handles it. The small friction of checking email is worth not worrying about password breaches.
Fantasy basketball drafts happen a few times per year, not daily. A full mobile app would be overkill. Responsive web works fine for the use case.
The hardest problems I solved on this project
Monitored the connection lifecycle. Found that mobile devices and backgrounded browser tabs were disconnecting, and that network switching (wifi to cellular) also caused drops.
Implemented reconnection logic with exponential backoff. Used Supabase Realtime presence to detect connection state. Added optimistic UI updates so users see their picks immediately even if the connection briefly drops; the server reconciles state on reconnect. Added a visible connection-status indicator so users know whether they are connected.
Lesson: Real-time features must handle unreliable connections gracefully. Design for disconnection as a normal state, not an exception. Optimistic UI with server reconciliation provides best UX.
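The reconnection backoff could look roughly like this. The base delay, cap, and jitter factor are assumptions for illustration, not the actual tuning:

```python
import random

def backoff_delays(max_attempts=5, base=0.5, cap=30.0, seed=0):
    """Exponential backoff with jitter: base * 2^attempt, capped, plus 0-25% jitter
    so a room full of clients does not reconnect in lockstep."""
    rng = random.Random(seed)
    delays = []
    for attempt in range(max_attempts):
        raw = min(cap, base * (2 ** attempt))
        delays.append(raw + rng.uniform(0, raw * 0.25))
    return delays

delays = backoff_delays()
```

Each delay is strictly larger than the last (the jitter is at most 25% of a value that doubles every attempt), so clients back off rather than hammer a struggling server.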
Traced the issue to race conditions: two users picking at nearly the same time, each seeing the player as still available.
Implemented server-side validation and sequencing. All picks go through the server, which assigns sequence numbers. Client state is provisional, used only for optimistic UI; the server broadcasts confirmed picks with their sequence, and clients reconcile any optimistic updates that conflicted. Added pick locking: when you click a player, they are briefly locked while the server confirms.
Lesson: In multi-user real-time systems, the server must be the source of truth for state that can conflict. Optimistic UI is for responsiveness, but server arbitrates conflicts.
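The client-side reconciliation step might be sketched like this, with `reconcile` as a hypothetical helper: server-confirmed picks win, and any optimistic pick the server awarded to someone else gets rolled back locally:

```python
def reconcile(local_user, optimistic, confirmed):
    """Server-confirmed picks win. Returns the authoritative pick order plus
    the local optimistic picks this client must roll back."""
    confirmed_by_player = {p["player_id"]: p for p in confirmed}
    rollbacks = [
        pick for pick in optimistic
        if confirmed_by_player.get(pick["player_id"], {}).get("user_id") != local_user
    ]
    authoritative = sorted(confirmed, key=lambda p: p["seq"])
    return authoritative, rollbacks

# alice optimistically picked players 23 and 45; the server awarded 23 to bob.
optimistic = [{"player_id": 23, "user_id": "alice"},
              {"player_id": 45, "user_id": "alice"}]
confirmed = [{"player_id": 23, "user_id": "bob", "seq": 1},
             {"player_id": 45, "user_id": "alice", "seq": 2}]
authoritative, rollbacks = reconcile("alice", optimistic, confirmed)
```

Alice's UI rolls back player 23 (the server gave it to bob) but keeps player 45, and renders the draft board in the server's sequence order.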
Analyzed prediction accuracy over time. A model trained on last season's data became less accurate as the current season progressed: player roles change, injuries happen, trades affect stats.
Implemented rolling retraining with recent game data weighted more heavily. Retrain weekly using Vercel Cron. Added feature for current season context (games played, recent averages). Tracked prediction accuracy over time to catch model drift. Display confidence scores so users know when predictions are less certain.
Lesson: ML models in production need monitoring and retraining. Sports data changes constantly - a static model degrades. Build retraining into your infrastructure from the start.
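One plausible way to weight recent games more heavily during retraining is exponential decay on sample weights. The half-life value here is illustrative, not the value actually used:

```python
def recency_weights(games_ago, half_life=10.0):
    """Sample weights for retraining: a game `half_life` games in the past
    counts half as much as the most recent game."""
    return [0.5 ** (g / half_life) for g in games_ago]

# Most recent game weighs 1.0; older games decay smoothly toward 0.
weights = recency_weights([0, 10, 20])
```

Most gradient-boosting libraries accept per-sample weights directly at fit time, so this plugs into a weekly retraining job without changing the model itself.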
Player stats change after every game (daily during the season). But with 450+ players, hitting the stats API on every page view is expensive and slow.
ISR with 1-hour revalidation for player pages. The stats API is called during static generation and the result is cached for 1 hour. On-demand revalidation is triggered after games end (schedule-based webhook). During games, client-side polling updates live stats without a full page reload; after the game, ISR revalidates with final stats.
Lesson: Match revalidation frequency to how often data actually changes. Player stats change once per game, not continuously. Combine ISR for baseline with CSR polling for real-time updates during events.
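The revalidation decision itself reduces to a small predicate. This sketch mirrors the TTL-plus-webhook logic described above (the 1-hour TTL matches the text; the function and its parameters are illustrative):

```python
def needs_revalidation(cached_at, now, ttl_seconds=3600, game_ended_at=None):
    """Revalidate when the TTL expires, or immediately after a game ends
    (mirrors on-demand revalidation triggered by a post-game webhook).
    Times are seconds since an arbitrary epoch."""
    if game_ended_at is not None and game_ended_at > cached_at:
        return True
    return (now - cached_at) >= ttl_seconds
```

The TTL handles the steady-state case; the `game_ended_at` check is what keeps stats from being up to an hour stale right after a game finishes.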
Private draft rooms need real-time WebSocket updates. But public/completed drafts should be indexable for SEO (people searching for draft results).
Dynamic rendering based on draft state. Active drafts: CSR with WebSocket connection for real-time. Completed drafts: SSG with results frozen at completion time (never changes, perfect for static). Public leagues: SSR for SEO with meta tags for social sharing. Used generateStaticParams to pre-build popular completed drafts.
Lesson: The same "page" can have different rendering strategies based on state. A draft room is CSR when live, SSG when completed. Do not lock yourself into one strategy per route.
Direct calls to the NBA stats API were failing intermittently. Rate limits are strict and undocumented, and with no official public API, scraping is fragile.
Built a resilient fetching layer: retry with exponential backoff, a circuit breaker to stop hammering the API when it is down, and fallback to cached data. A nightly batch job fetches all player data at 4 AM, when the API is more stable, and stores it in my own database as the source of truth. Real-time game data comes from a more reliable source (the ESPN API) as a supplement.
Lesson: Never rely on a single unreliable data source. Build caching, fallbacks, and alternative sources. If an API is flaky, fetch during off-peak hours and cache aggressively. Own your data when possible.
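A stripped-down sketch of the fetching layer: a consecutive-failure circuit breaker plus cache fallback. The real implementation also retries with backoff; the class and function names here are hypothetical:

```python
class CircuitBreaker:
    """Open the circuit after `threshold` consecutive failures; callers then
    skip the flaky API entirely and fall back to cached data."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.threshold

    def record(self, success):
        self.failures = 0 if success else self.failures + 1

def fetch_with_fallback(fetch, breaker, cache):
    if breaker.open:
        return cache["players"]      # stop hammering a down API
    try:
        data = fetch()
        breaker.record(True)
        cache["players"] = data      # own the data: refresh cache on success
        return data
    except Exception:
        breaker.record(False)
        return cache["players"]      # serve stale data rather than fail

# Simulate a down API: after 3 failures the breaker opens and the
# cached roster is served without touching the API again.
def flaky():
    raise RuntimeError("NBA API down")

breaker = CircuitBreaker(threshold=3)
cache = {"players": ["cached roster"]}
for _ in range(4):
    result = fetch_with_fallback(flaky, breaker, cache)
```

Serving stale data beats serving an error page, and the open circuit gives the upstream API room to recover.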
Key sections I'd walk through in a code review
src/lib/draft/syncProtocol.ts: Handles the bi-directional sync between clients and the server. Clients send state hashes; the server compares and pushes full state on mismatch. Includes sequence numbers to detect and handle out-of-order messages.
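The hash-compare handshake might look like this in outline (shown in Python for brevity; the actual protocol lives in syncProtocol.ts, and these function names are illustrative):

```python
import hashlib
import json

def state_hash(draft_state):
    """Stable hash of draft state: canonical JSON, then SHA-256.
    Clients send this hash instead of the full state."""
    canonical = json.dumps(draft_state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def server_response(client_hash, server_state):
    """Push the full state only on mismatch; otherwise send a cheap ack."""
    h = state_hash(server_state)
    if client_hash == h:
        return {"type": "ack"}
    return {"type": "full_state", "state": server_state, "hash": h}

state = {"round": 3, "picks": [{"seq": 1, "player_id": 23}]}
in_sync = server_response(state_hash(state), state)
stale = server_response("some-stale-hash", state)
```

Hashing keeps the common case (client already in sync) to a few bytes on the wire, while guaranteeing a drifted client converges on the next exchange.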
ml/features/engineer.py: Takes raw NBA stats and computes 50+ features: rolling averages, opponent adjustments, rest days, home/away splits, usage rate when teammates are out, etc. Each feature is documented with its predictive value.
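Two of the simpler features sketched in isolation (rolling fantasy-point average and rest days; the function names are illustrative, not the actual engineer.py API):

```python
def rolling_average(points, window=5):
    """Mean fantasy points over the last `window` games.
    Early in the season, uses however many games exist."""
    recent = points[-window:]
    return sum(recent) / len(recent)

def rest_days(game_days):
    """Days between consecutive games (inputs are day numbers in season order).
    A back-to-back shows up as 1, which the model learns to penalize."""
    return [later - earlier for earlier, later in zip(game_days, game_days[1:])]

avg = rolling_average([10, 20, 30, 40, 50, 60])   # last 5 games
rests = rest_days([1, 2, 5])                       # back-to-back, then 3 days off
```

The interesting features (opponent adjustments, usage rate when teammates sit) follow the same shape: pure functions from raw game logs to one numeric column.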
Want to discuss this project?