AI-powered platform for diagnosing and resolving real-world maintenance issues
Homeowners and renters waste hours researching maintenance issues across Reddit, forums, YouTube, and retail sites. They struggle to diagnose problems, understand risks, find the right parts, and decide whether to DIY or hire a professional.
Build an AI-powered platform that automates the entire research and decision-making process for real-world maintenance and repair issues.
Built multimodal ingestion pipeline supporting voice notes (any language), photos, and video (dissected into frames)
Integrated OpenAI Vision and Whisper for analyzing diagnostic media
Integrated Firecrawl to crawl Reddit, forums, and retail sites for solutions and in-stock products
Created budget/income input system that weighs opportunity cost and time for personalized recommendations
Built email drafting and sending feature for contacting contractors
Implemented end-to-end encryption for all diagnostic media and personal data
Reduced time-to-decision by ~70% through automated diagnostics
Eliminated manual research across multiple sites
Enabled users to make informed fix-now vs. defer decisions based on their financial context
MVP, stretch goals, and future vision
Development phases and milestones
Photo-based issue detection and recommendations
Upload photos and detect maintenance issues using AI
OpenAI-powered analysis with actionable recommendations
Describe issues verbally for AI analysis
Location mapping and cost intelligence
Mapbox integration for property visualization
Budget-aware recommendations
Automated contractor communication
Mobile app, marketplace, and community
Connect homeowners with vetted contractors
Native mobile app for on-site diagnostics
Community platform for sharing solutions
Predictive maintenance scheduling based on home age and conditions
Connect with IoT devices for real-time monitoring
Common questions about this project, answered in STAR format
How did you approach building an AI-powered diagnostic system?
Key Takeaway: AI works best when you constrain it with domain knowledge rather than letting it hallucinate freely.
Tell me about a time you had to make a difficult technical decision.
Key Takeaway: The best technical decision is often the one that lets you learn faster, not the theoretically optimal one.
How did you handle the DIY vs. professional recommendation feature?
Key Takeaway: When building AI systems with real-world consequences, explicit guardrails are more important than model sophistication.
Quick answers to 'Why did you choose X?' questions
For MVP validation, free tiers suffice. OpenWeatherMap's free tier allows 1,000 calls/day - enough with caching. If the product gains traction, upgrading is trivial; the architecture (caching, error handling) stays the same. Do not pay for scale you do not have.
A more generous free tier (50k vs. 28k map loads), better custom styling, vector tiles that render faster than raster, and better React integration via react-map-gl. The trade-off is less familiarity, but for showing property locations Mapbox is sufficient.
Single database for all data - no sync between systems. For thousands of vectors, pgvector is fast enough. Keeps everything in Postgres - simpler architecture. Trade-off is scaling ceiling at millions of vectors, but that is a good problem to have.
Unit tests for parsing logic with fixture HTML files - test that selectors extract correct data. Integration tests hit a local mock server returning known HTML. E2E tests verify full pipeline from URL to database entry. Edge cases: malformed HTML, missing fields, rate limiting responses. Monitoring in production for scraper health.
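For example, a parsing test against a saved fixture could look like this minimal sketch - assuming Vitest and Cheerio; the selectors, file paths, and parseProduct helper are hypothetical:

```typescript
// scrapers/productParser.test.ts (hypothetical) - fixture-based test for the parsing logic
import { readFileSync } from "node:fs";
import { describe, it, expect } from "vitest";
import * as cheerio from "cheerio";

// Parser under test: extracts a name and price from a product page.
// The selectors are illustrative, not the real ones.
function parseProduct(html: string) {
  const $ = cheerio.load(html);
  return {
    name: $("h1.product-title").text().trim(),
    price: Number($('[data-testid="price"]').text().replace(/[^0-9.]/g, "")),
  };
}

describe("parseProduct", () => {
  it("extracts name and price from a saved fixture page", () => {
    const html = readFileSync("fixtures/product-page.html", "utf8");
    const result = parseProduct(html);
    expect(result.name).not.toBe("");
    expect(result.price).toBeGreaterThan(0);
  });

  it("handles missing fields without throwing", () => {
    const result = parseProduct("<html><body></body></html>");
    expect(result.name).toBe("");
    expect(result.price).toBe(0);
  });
});
```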
Created evaluation dataset with query-answer pairs. Test that relevant documents appear in top-k results. Measure retrieval accuracy and relevance scores. A/B test different chunking strategies. Monitor in production: log queries and which chunks were retrieved for manual review.
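A rough sketch of that evaluation loop - the EvalCase shape and the retrieve() signature are assumptions for illustration, not the actual pipeline:

```typescript
// rag/evalRetrieval.ts (hypothetical) - hit rate @ k over a labeled query/answer set
type EvalCase = { query: string; relevantDocIds: string[] };

// `retrieve` stands in for whatever the RAG pipeline uses to fetch ranked document IDs.
type Retriever = (query: string, k: number) => Promise<string[]>;

export async function hitRateAtK(cases: EvalCase[], retrieve: Retriever, k = 5): Promise<number> {
  let hits = 0;
  for (const c of cases) {
    const retrieved = await retrieve(c.query, k);
    // Count a hit if any known-relevant document shows up in the top-k results.
    if (c.relevantDocIds.some((id) => retrieved.includes(id))) hits++;
  }
  return hits / cases.length;
}

// Usage: run before and after changing chunking or ranking and compare the numbers.
// const score = await hitRateAtK(evalSet, retrieve, 5);
// console.log(`hit rate @5: ${(score * 100).toFixed(1)}%`);
```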
Every decision has costs — here's how I thought through them
Needed deep control over how product information, forum posts, and pricing were extracted. Generic APIs would not capture the specific data needed for accurate recommendations.
Training a custom model would require thousands of labeled images. OpenAI Vision works well enough for the diagnostic use case and allowed faster iteration.
For diagnosing static issues like cracks or damage, individual frames capture what is needed. Motion analysis would be overkill and more expensive.
Users are sharing photos and videos of their homes. Trust is essential. The privacy-first approach builds that trust even if it limits data collection.
A $500 repair means different things to different people. Financial context makes the "fix now vs. defer" recommendation actually useful.
Home improvement advice changes constantly - new products, updated prices, seasonal recommendations. RAG ensures recommendations are always current without costly model retraining.
For thousands of product embeddings, pgvector is fast enough. Keeping everything in Postgres eliminates sync complexity and reduces costs. If I hit millions of vectors, I can migrate - but that is a problem for later.
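The kind of query this enables, as a minimal sketch - the table and column names are illustrative, and it assumes the pg client with a pgvector embedding column:

```typescript
// search/similarProducts.ts (hypothetical) - cosine-distance search with pgvector
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

export async function similarProducts(queryEmbedding: number[], limit = 5) {
  // pgvector accepts the vector as a string literal like '[0.1,0.2,...]'.
  const vector = `[${queryEmbedding.join(",")}]`;
  const { rows } = await pool.query(
    `SELECT id, name, 1 - (embedding <=> $1) AS similarity
       FROM products
      ORDER BY embedding <=> $1
      LIMIT $2`,
    [vector, limit]
  );
  return rows;
}
```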
Users often hold camera still or pan slowly - 30 frames may contain only 8-10 unique views. Deduplication is essential for cost-effective video analysis at scale.
A 30-second video takes ~45 seconds to fully analyze. Blocking the UI that long is a terrible experience. Async processing with notification means users can continue their day and get results when ready.
For a 0-to-1 product, I need to validate the idea fast, not manage infrastructure. Supabase lets me focus on the product. If I hit scale limits, that means the product is working and I can afford to migrate.
For a web-only product with no mobile app, the simplicity of a unified Next.js app outweighs the flexibility of separate frontend/backend. Ship faster, refactor later if needed.
For home improvement product recommendations, the small model captures enough semantic meaning. The cost savings compound - every scraped page and every user query needs embeddings.
At scale, running Puppeteer for every page would be extremely expensive. 70% of pages are static HTML (forums, articles) and can use the fast path.
Scraping 10,000+ pages reliably requires proper job queue infrastructure. A cron job would fail silently, retry incorrectly, or overwhelm target sites. The queue adds reliability at the cost of complexity.
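A sketch of what that queue setup could look like, assuming BullMQ on Redis - queue names, retry counts, and rate limits are illustrative:

```typescript
// scraping/queue.ts (hypothetical) - scraping jobs with retries and rate limiting
import { Queue, Worker } from "bullmq";

const connection = { host: "localhost", port: 6379 };

export const scrapeQueue = new Queue("scrape", { connection });

// Enqueue a page with exponential backoff instead of silent, unbounded retries.
export function enqueuePage(url: string) {
  return scrapeQueue.add(
    "page",
    { url },
    { attempts: 3, backoff: { type: "exponential", delay: 5_000 }, removeOnComplete: true }
  );
}

// Bounded concurrency plus a rate limit so target sites are not overwhelmed.
new Worker(
  "scrape",
  async (job) => {
    const { url } = job.data as { url: string };
    // The real scraping entry point would be called here.
    console.log(`scraping ${url}`);
  },
  { connection, concurrency: 5, limiter: { max: 10, duration: 1_000 } }
);
```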
The hardest problems I solved on this project
Analyzed cost breakdown. Each video was being split into frames and every frame sent to the API. A 30-second video at 30fps = 900 API calls.
Implemented perceptual hashing (pHash) to detect similar frames. Extract frames at lower rate (2fps instead of 30fps). Compare each frame hash to previous - only send to API if significantly different. Result: 60-70% reduction in API calls per video while maintaining analysis quality.
Lesson: Before sending data to expensive APIs, ask what can be filtered locally. Deduplication and sampling can dramatically reduce costs without sacrificing quality.
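A simplified sketch of the dedup step - it uses an 8x8 average-hash variant built on sharp rather than a full pHash library, and the threshold is illustrative:

```typescript
// video/frameDedup.ts (hypothetical) - drop near-duplicate frames before calling the Vision API
import sharp from "sharp";

// 8x8 greyscale average hash: coarse, but enough to catch "camera held still" duplicates.
async function frameHash(frame: Buffer): Promise<bigint> {
  const pixels = await sharp(frame).resize(8, 8, { fit: "fill" }).greyscale().raw().toBuffer();
  const avg = pixels.reduce((sum, p) => sum + p, 0) / pixels.length;
  let hash = 0n;
  for (const p of pixels) hash = (hash << 1n) | (p > avg ? 1n : 0n);
  return hash;
}

function hammingDistance(a: bigint, b: bigint): number {
  let x = a ^ b;
  let count = 0;
  while (x) {
    count += Number(x & 1n);
    x >>= 1n;
  }
  return count;
}

// Keep a frame only if it differs enough from the last frame we kept.
export async function dedupFrames(frames: Buffer[], threshold = 10): Promise<Buffer[]> {
  const kept: Buffer[] = [];
  let lastHash: bigint | null = null;
  for (const frame of frames) {
    const hash = await frameHash(frame);
    if (lastHash === null || hammingDistance(hash, lastHash) > threshold) {
      kept.push(frame);
      lastHash = hash;
    }
  }
  return kept;
}
```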
Evaluated refresh strategies: full re-scrape (expensive), incremental updates (complex), or smarter caching.
Implemented tiered refresh strategy. High-traffic products refresh daily. Medium-traffic weekly. Low-traffic monthly. Price-sensitive data (deals, sales) gets priority refresh. Added staleness indicator in UI so users know data freshness. Background job queue handles refreshes without blocking user requests.
Lesson: Not all data needs the same freshness. Tiered caching based on access patterns and business importance saves resources while keeping important data current.
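A sketch of how the tiers could be expressed - the tier thresholds and intervals are illustrative, not the production values:

```typescript
// refresh/tiers.ts (hypothetical) - map access patterns to refresh intervals
type RefreshTier = "high" | "medium" | "low";

const REFRESH_INTERVAL_MS: Record<RefreshTier, number> = {
  high: 24 * 60 * 60 * 1000,       // daily for high-traffic products
  medium: 7 * 24 * 60 * 60 * 1000, // weekly
  low: 30 * 24 * 60 * 60 * 1000,   // monthly
};

type Product = { id: string; views30d: number; hasActiveDeal: boolean; lastScrapedAt: Date };

function tierFor(product: Product): RefreshTier {
  // Price-sensitive data (deals) and popular products get the fastest refresh.
  if (product.hasActiveDeal || product.views30d > 1_000) return "high";
  if (product.views30d > 100) return "medium";
  return "low";
}

export function isStale(product: Product, now = new Date()): boolean {
  const age = now.getTime() - product.lastScrapedAt.getTime();
  return age > REFRESH_INTERVAL_MS[tierFor(product)];
}
```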
Analyzed retrieval results. Found that semantic similarity was matching on surface-level words but missing context. Also chunks were too large, mixing multiple topics.
Improved chunking strategy: smaller chunks (500 tokens) with overlap. Added metadata to chunks (source, category, date). Implemented hybrid search: semantic similarity + keyword matching. Added re-ranking step using a smaller model to filter irrelevant results before sending to LLM.
Lesson: RAG quality depends heavily on chunking and retrieval strategy. Smaller chunks with metadata, hybrid search, and re-ranking dramatically improve relevance. Garbage in, garbage out still applies to AI.
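One way to merge the semantic and keyword result lists is reciprocal rank fusion; this sketch shows the idea, not necessarily the exact scoring used here:

```typescript
// rag/hybridSearch.ts (hypothetical) - reciprocal rank fusion over two ranked result lists
type Ranked = { id: string; score: number };

// Each input list is ranked best-first; RRF rewards documents that rank well in either list.
export function reciprocalRankFusion(lists: string[][], k = 60): Ranked[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Usage: the fused top results go to the re-ranker, then to the LLM.
// const fused = reciprocalRankFusion([semanticIds, keywordIds]).slice(0, 20);
```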
The map component was imported at page level, causing the entire Mapbox library to load before the page was interactive.
Dynamic import of map component with ssr: false (Mapbox requires window). Added loading skeleton while map loads. Lazy loaded map only when scrolled into view using Intersection Observer. Implemented tile caching and reduced initial zoom level to load fewer tiles. Result: page becomes interactive 2 seconds faster.
Lesson: Heavy third-party libraries like maps should be dynamically imported and lazy loaded. Do not block initial render with components that need the full library. Load on demand when user needs the feature.
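The dynamic-import pattern looks roughly like this - assuming the Next.js app router and the react-intersection-observer hook; the component and prop names are illustrative:

```tsx
// components/PropertyMapSection.tsx (hypothetical) - load Mapbox only when the user scrolls to it
"use client";

import dynamic from "next/dynamic";
import { useInView } from "react-intersection-observer";

// ssr: false because Mapbox needs `window`; show a skeleton while the chunk loads.
const PropertyMap = dynamic(() => import("./PropertyMap"), {
  ssr: false,
  loading: () => <div style={{ height: 300 }} aria-busy="true" />,
});

export function PropertyMapSection(props: { lat: number; lng: number }) {
  // triggerOnce: once the map has loaded, keep it mounted.
  const { ref, inView } = useInView({ triggerOnce: true, rootMargin: "200px" });

  return (
    <section ref={ref}>
      {inView ? <PropertyMap {...props} /> : <div style={{ height: 300 }} />}
    </section>
  );
}
```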
Each user request was calling the OpenWeatherMap API directly. Popular locations caused repeated identical calls.
Implemented server-side caching layer. Cache weather data by location with 30-minute TTL (weather does not change that fast). Used Redis for cache storage. First request hits API, subsequent requests for same location serve from cache. Added stale-while-revalidate - serve stale data immediately, refresh in background. Reduced API calls by 90%.
Lesson: External APIs should almost always have a caching layer. Identify what data can be shared across users and cache aggressively. Weather is a perfect example - same for everyone in a location.
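A sketch of that caching layer, assuming ioredis - the key format, TTLs, and fetch function are illustrative:

```typescript
// weather/cache.ts (hypothetical) - 30-minute weather cache with stale-while-revalidate
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");
const FRESH_TTL_S = 30 * 60;      // serve without refreshing for 30 minutes
const STALE_TTL_S = 24 * 60 * 60; // keep a stale copy around for a day

async function fetchWeatherFromApi(location: string): Promise<unknown> {
  // The real OpenWeatherMap call goes here; omitted in this sketch.
  return { location, fetchedAt: Date.now() };
}

export async function getWeather(location: string) {
  const key = `weather:${location.toLowerCase()}`;
  const cached = await redis.get(key);

  if (cached) {
    const { data, storedAt } = JSON.parse(cached);
    if ((Date.now() - storedAt) / 1000 > FRESH_TTL_S) {
      // Stale: return it immediately, refresh in the background.
      void fetchWeatherFromApi(location)
        .then((fresh) =>
          redis.set(key, JSON.stringify({ data: fresh, storedAt: Date.now() }), "EX", STALE_TTL_S)
        )
        .catch(() => { /* keep serving stale data if the refresh fails */ });
    }
    return data;
  }

  const fresh = await fetchWeatherFromApi(location);
  await redis.set(key, JSON.stringify({ data: fresh, storedAt: Date.now() }), "EX", STALE_TTL_S);
  return fresh;
}
```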
Recommendation results are personalized (based on user input) but also need to be shareable/bookmarkable. Pure CSR would hurt SEO and shareability. Pure SSR would mean no caching.
Hybrid approach: SSR the page shell and layout, CSR the personalized recommendations. URL contains encoded query params so results are shareable. Server renders a loading skeleton that hydrates with actual recommendations. This way shared links work, SEO gets the page structure, but recommendations are always fresh and personalized.
Lesson: Personalized content does not mean you cannot use SSR. Render the static parts server-side, hydrate personalized parts client-side. URL state makes personalized pages shareable.
Each recommendation needed data from weather API, product API, and pricing API. Sequential calls were taking 3+ seconds.
Parallelized API calls using Promise.all. For APIs that did not depend on each other, fire all requests simultaneously. Added timeout handling - if one API is slow, return partial results rather than waiting forever. Implemented background enrichment - show basic recommendation immediately, enhance with additional data as it arrives (streaming).
Lesson: Never make sequential API calls when you can parallelize. Use Promise.all for independent requests. Design for partial results when some data sources are slow or unavailable.
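A sketch of the parallel fetch with per-call timeouts and partial results - the fetch functions and timeout values are placeholders:

```typescript
// recommendations/gather.ts (hypothetical) - parallel, timeout-bounded calls with partial results
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    promise,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms)
    ),
  ]);
}

// Placeholders for the real API clients (weather, products, pricing).
declare function fetchWeather(location: string): Promise<unknown>;
declare function fetchProducts(ids: string[]): Promise<unknown>;
declare function fetchPricing(ids: string[]): Promise<unknown>;

export async function gatherRecommendationData(location: string, productIds: string[]) {
  // Independent calls run concurrently; allSettled lets a slow or failing source yield partial results.
  const [weather, products, pricing] = await Promise.allSettled([
    withTimeout(fetchWeather(location), 2_000),
    withTimeout(fetchProducts(productIds), 2_000),
    withTimeout(fetchPricing(productIds), 2_000),
  ]);

  return {
    weather: weather.status === "fulfilled" ? weather.value : null,
    products: products.status === "fulfilled" ? products.value : null,
    pricing: pricing.status === "fulfilled" ? pricing.value : null,
  };
}
```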
Key sections I'd walk through in a code review
src/lib/ingestion/multimodal.ts
This module handles voice, photo, and video input. Voice goes through Whisper for transcription, photos go directly to Vision, and videos get dissected into frames first. Each path normalizes the output into a common DiagnosticInput type that the analysis engine consumes.
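The common shape the three paths normalize into could look roughly like this - the field names are illustrative, not the actual type definition:

```typescript
// lib/ingestion/types.ts (hypothetical) - common output of the voice, photo, and video paths
type DiagnosticMedia =
  | { kind: "transcript"; text: string; language?: string }      // Whisper output for voice notes
  | { kind: "image"; description: string; frameIndex?: number }; // Vision output for photos and video frames

export interface DiagnosticInput {
  source: "voice" | "photo" | "video";
  media: DiagnosticMedia[]; // one entry per photo, transcript, or kept video frame
  capturedAt: Date;
  userId: string;
}
```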
src/lib/decision/riskEngine.ts
Takes the diagnostic result, user's budget/income, and urgency to calculate a recommendation score. Uses a weighted formula that considers: safety risk (highest weight), cost of delay, DIY feasibility, and financial impact. Returns a structured recommendation with confidence intervals.
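An illustrative sketch of that weighted scoring - the weights, field names, and normalization are stand-ins, not the production formula:

```typescript
// lib/decision/scoreSketch.ts (hypothetical) - weighted "fix now vs. defer" score
interface DecisionFactors {
  safetyRisk: number;      // 0-1, from the diagnostic result
  costOfDelay: number;     // 0-1, how quickly the problem worsens if deferred
  diyFeasibility: number;  // 0-1, 1 = easy DIY fix
  financialImpact: number; // 0-1, normalized here so 1 = affordable now given the user's budget
}

// Safety risk carries the highest weight; the values here are illustrative.
const WEIGHTS: Record<keyof DecisionFactors, number> = {
  safetyRisk: 0.4,
  costOfDelay: 0.25,
  diyFeasibility: 0.15,
  financialImpact: 0.2,
};

export function fixNowScore(f: DecisionFactors): number {
  // Higher score = stronger "fix now" recommendation.
  return (Object.keys(WEIGHTS) as (keyof DecisionFactors)[]).reduce(
    (sum, key) => sum + WEIGHTS[key] * f[key],
    0
  );
}
```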
Want to discuss this project?