ContentRot - Automated Outdated Content Detector for Websites and Documentation
SaaS that crawls your site daily, flags articles/docs that mention deprecated tech, outdated prices, or broken links, and generates a prioritized fix list so you don't leak authority and conversions.
Difficulty
beginner
Category
SEO Tools
Market Demand
High
Revenue Score
7/10
Platform
Web App
Vibe Code Friendly
⚡ Yes
Hackathon Score
🏆 7/10
What is it?
Every website has content rot. Blog posts about 'React 15 patterns' still rank for relevant keywords but confuse readers. Pricing pages list old tiers. Documentation references deprecated APIs. Old case studies feature clients who no longer exist. Search engines penalize stale content, and conversion rates drop because visitors read outdated info.

ContentRot crawls your site on a schedule, detects outdated content using Claude (semantic analysis of deprecation patterns, version numbers, and dates), flags broken links via a headless browser, and generates a ranked dashboard: 'Fix this first: your top-10 blog post links to deprecated Rails documentation (400 backlinks, $5k topical authority at risk).' For content-heavy companies (SaaS, developer tools, agencies), this prevents the slow bleed of organic traffic and customer trust.

Why it's 100% buildable right now: crawling tooling is stable (Cheerio, Playwright), Claude can detect outdated patterns with simple prompts, Supabase handles the data, and monthly crawls cost under $50. DevTools companies and SaaS founders are desperate for this; no product currently does automated content-rot detection at this price.
Why now?
Content SEO is hotter than ever (2025–2026 vibe is SEO renaissance). Claude's semantic understanding is now good enough for 'is this outdated?' detection. Playwright/Cheerio are mature and cheap. Every SaaS company has a 'content debt' problem they ignore until organic traffic drops.
- ▸Automated daily/weekly crawl of site with semantic analysis (Implementation: Playwright + Cheerio, Claude API for 'is this content outdated' prompt)
- ▸Link checker with 404 detection and redirect tracking
- ▸Dashboard with outdated content ranked by backlink count and traffic impact
- ▸Export fix list with suggested updates
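The 'is this content outdated' check behind the first feature can start as one prompt per batch of pages. A minimal sketch; the function name, the JSON response shape, and the model name in the comment are assumptions, not a finalized API:

```typescript
// Hypothetical prompt builder for the deprecation check. The
// requested response fields (url, reason, severity) mirror the
// OutdatedContentFlag fields described in the data model.
interface PageSnapshot {
  url: string;
  text: string; // extracted article text
}

function buildDeprecationPrompt(pages: PageSnapshot[]): string {
  const pageList = pages
    .map((p, i) => `--- Page ${i + 1}: ${p.url} ---\n${p.text.slice(0, 4000)}`)
    .join("\n\n");
  return [
    "You are auditing website content for staleness.",
    "For each page below, report any deprecated technology,",
    "outdated prices, or stale dates. Respond with a JSON array of",
    '{"url": string, "reason": string, "severity": 1-5} objects,',
    "and an empty array if nothing is outdated.",
    "",
    pageList,
  ].join("\n");
}

// Usage with the Anthropic SDK would look roughly like:
//   const msg = await anthropic.messages.create({
//     model: "claude-sonnet-4-5",  // model choice is an assumption
//     max_tokens: 2048,
//     messages: [{ role: "user", content: buildDeprecationPrompt(batch) }],
//   });
```

Asking for structured JSON keeps parsing trivial on the dashboard side; truncating each page to a few thousand characters keeps per-crawl token costs predictable.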
Target Audience
SaaS companies, developer tools, agencies, and documentation-heavy businesses with 200+ pages. Estimated 500,000 such companies globally. Initial target: US/EU SaaS and developer tools with 10+ blog posts and active documentation. ICP: content lead or engineering manager responsible for SEO and doc maintenance.
Example Use Case
Carlos, a content lead at a developer tools SaaS with 300 blog posts, uses ContentRot to crawl weekly. The system flags 12 articles mentioning deprecated API endpoints, 8 articles with 404-broken links, and 4 case studies with dead company URLs. ContentRot prioritizes: 'Top 5 post about Node.js performance has 2,000 backlinks and links to deprecated package — fix first.' Carlos fixes the worst 20 articles over 2 weeks. Organic traffic recovers by 18% in 6 weeks.
User Stories
- ▸As a content lead, I want to find all outdated references in my site automatically, so that I can prioritize fixes by impact and recover organic traffic.
- ▸As a developer, I want to know when my API docs mention deprecated endpoints, so that I can update them before users get confused.
- ▸As a SaaS founder, I want alerts when my pricing page gets stale, so that leads don't read outdated info and object to real prices.
Acceptance Criteria
Site Setup: done when user adds a website URL and the crawler validates connectivity. Crawl Execution: done when the crawler fetches 100+ pages without crashing and stores HTML in Supabase. Deprecation Detection: done when Claude flags 3+ types of outdated patterns (deprecated tech, old prices, stale dates) with 80%+ accuracy. Dashboard Display: done when the user views outdated flags sorted by impact, with correct URLs and reasons. Backlink Estimate: done when the dashboard shows estimated backlinks for each flagged page (can be mocked as count × 100 in v1).
Is it worth building?
$29/month for up to 500 pages × 60 customers = $1,740 MRR at month 3. $99/month for 2,000 pages × 40 customers = $3,960 MRR by month 5.
Unit Economics
CAC: $35 (ProductHunt launch + Reddit seeding + cold outreach). LTV: $290 (10-month average customer lifetime × $29/month; assumes roughly half of customers stay paid a full year). Payback: ~1 month. Gross margin: 80% (after API costs).
Business Model
SaaS subscription
Monetization Path
Free tier: crawl up to 50 pages, basic link checking. Paid at $29/month (500 pages, weekly crawls, AI analysis), $99/month (2,000 pages, daily crawls, priority alerts).
Revenue Timeline
First dollar: week 2 (beta upgrade). $500 MRR: month 2. $1.5k MRR: month 3. $3k MRR: month 5.
Estimated Monthly Cost
Claude API: ~$30 (200 pages × 4 weekly crawls × ~$0.04 per page). Playwright: $0 (self-hosted runs are cheap). Vercel: $20. Supabase: $25. Stripe: ~$10 in processing fees. Total: ~$85/month at launch.
Profit Potential
Full-time viable at $2k–$6k MRR. Potential exit value $2M–$5M (boring SaaS 2–3x ARR multiple).
Scalability
High — can expand to custom deprecation rules per industry, multi-language support, Slack/email digest alerts, competitor site monitoring.
Success Metrics
Week 2: 40 signups. Week 3: 8 paying customers. Month 2: 30 paying customers. Retention: 80% at 30 days.
Launch & Validation Plan
Survey 20 content leads and SaaS founders. Ask: 'How much organic traffic do you lose to outdated content yearly?' If 12+ say $5k+, build. Recruit 3 beta customers with real sites (100–500 pages).
Customer Acquisition Strategy
First customer: Post on r/Entrepreneur and r/startups with 'I lost 18% organic traffic because my docs got stale' angle. Offer free scan + 1-month free trial to first 10 signups. Then: ProductHunt launch, direct cold email to content leads at 100 US SaaS companies (LinkedIn hunter list), Slack communities (Growth, Content Marketing), SEO forums.
What's the competition?
Competition Level
Very Low
Similar Products
Ahrefs site audit, SEMrush content audit, Surfer SEO — all focus on keyword optimization and SEO metrics. None specifically flag deprecated content or outdated patterns.
Competitive Advantage
No existing product does this. Simple one-click setup. Claude-powered, not regex rules. Backlink-aware (prioritizes high-impact fixes). Priced 70% cheaper than hiring a content auditor.
Regulatory Risks
Low regulatory risk. No financial or health data. GDPR compliance required (privacy policy, data deletion endpoint, user consent for crawling third-party sites if applicable). No scraping of competitor sites without permission.
What's the roadmap?
Feature Roadmap
V1 (week 2): Crawler, Claude deprecation detection, link checker, dashboard with flags sorted by impact. V2 (month 2–3): Slack digest alerts, fix suggestions (rewrite options), backlink integration (Ahrefs API), multi-site dashboard. V3 (month 4+): Custom deprecation rules, competitor monitoring, API for programmatic access, team features.
Milestone Plan
Phase 1 (Week 1): Supabase schema, Playwright crawler, Cheerio parser, Claude integration working locally. Done when: solo founder can crawl 50 pages and get deprecation flags in 2 minutes. Phase 2 (Week 2): Dashboard UI, Stripe checkout, cron job trigger. Done when: founder can add a site, trigger crawl, and see flags on web dashboard. Phase 3 (Month 2): ProductHunt launch, scale crawling to handle 2,000-page sites, beta customer support. Done when: 10+ paying customers with zero major bugs.
How do you build it?
Tech Stack
Next.js for dashboard, Supabase for crawl data and reports, Claude API for content analysis, Playwright for link checking, Vercel for hosting — build with Cursor for backend crawl logic and Claude integration, Lovable for dashboard UI.
Time to Ship
2 weeks
Required Skills
Web crawling (Cheerio or Playwright), Claude API, Next.js dashboard, Supabase.
Resources
Playwright docs, Claude API docs, Supabase docs, Cheerio docs, Next.js tutorials.
MVP Scope
Supabase schema for sites, crawl results, outdated content flags. Next.js dashboard with flag list and backlink counts. Playwright crawler (triggered weekly via cron). Claude API integration for deprecation detection. Link checker via headless browser. Stripe billing. Must-have files: /lib/crawler.ts, /lib/deprecation-detector.ts, /pages/api/crawl, /pages/dashboard, /lib/backlink-fetcher (can mock for v1), database migrations.
Core User Journey
Sign up -> add website URL -> first crawl completes -> view outdated content list -> upgrade to paid for weekly alerts.
Architecture Pattern
Supabase cron job triggers crawler -> Playwright crawls site pages -> Cheerio parses HTML -> Claude analyzes for deprecation -> results stored in Postgres -> dashboard queries and sorts by impact (backlinks × severity).
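The final 'sorts by impact' step can be a pure function over stored flags. A sketch assuming the flag fields from the data model (severity 1–5, estimated backlinks):

```typescript
interface Flag {
  url: string;
  severity: number;           // 1 (minor) to 5 (critical)
  estimatedBacklinks: number;
}

// Impact = backlinks × severity, as described in the pipeline.
// Returns a new array so stored crawl results stay untouched.
function rankByImpact(flags: Flag[]): Flag[] {
  return [...flags].sort(
    (a, b) =>
      b.estimatedBacklinks * b.severity - a.estimatedBacklinks * a.severity
  );
}
```

Keeping this in application code (rather than SQL) makes it easy to tweak the weighting later, e.g. folding in traffic estimates once a backlink API is integrated.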
Data Model
User has many Websites. Website has many CrawlResults. CrawlResult has many OutdatedContentFlags. OutdatedContentFlag records URL, reason (deprecated API/price/date), severity, estimated_backlinks, suggested_fix.
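The same model expressed as TypeScript types; field names are assumptions that follow the description above:

```typescript
type FlagReason = "deprecated_api" | "outdated_price" | "stale_date";

interface OutdatedContentFlag {
  id: string;
  crawlResultId: string;
  url: string;
  reason: FlagReason;
  severity: number;            // 1-5
  estimatedBacklinks: number;
  suggestedFix: string | null;
}

interface CrawlResult {
  id: string;
  websiteId: string;
  crawledAt: string;           // ISO timestamp
  flags: OutdatedContentFlag[];
}

interface Website {
  id: string;
  userId: string;
  rootUrl: string;
  crawlResults: CrawlResult[];
}

// Example flag, as the dashboard would receive it.
const sampleFlag: OutdatedContentFlag = {
  id: "flag_1",
  crawlResultId: "crawl_1",
  url: "https://example.com/blog/rails-4-tips",
  reason: "deprecated_api",
  severity: 4,
  estimatedBacklinks: 1200,
  suggestedFix: null,
};
```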
Integration Points
Supabase for storage, Claude API for deprecation detection, Playwright for crawling and link checking, Stripe for payments.
V1 Scope Boundaries
V1 excludes: multi-language support, competitor monitoring, custom deprecation rules per user, mobile/app-specific crawling, JavaScript rendering for heavy SPAs (crawl static HTML only).
Success Definition
A content lead finds the product on ProductHunt, signs up, runs the first crawl, and sees real outdated content flagged without founder help, then purchases a paid plan.
Challenges
Accurately detecting 'outdated' is hard without domain knowledge. Prices, version numbers, and deprecation patterns vary wildly. False positives will kill trust. Crawling large sites (5,000+ pages) gets expensive fast. May hit rate limits on headless browser usage.
Avoid These Pitfalls
Do not crawl without user consent; check robots.txt before fetching. Do not send one Claude request per page; batch roughly 10 pages per request to control costs. Do not try to estimate backlinks with a headless browser; use the Ahrefs API (paid) or mocked data in v1.
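The robots.txt check above can start very small. A deliberately minimal sketch that only honors Disallow rules under `User-agent: *`; a production crawler should use a full parser (this ignores Allow rules, wildcards, and per-agent groups):

```typescript
// Returns false if the given path is disallowed for all crawlers.
// Only reads the `User-agent: *` group; Allow rules and wildcard
// patterns from RFC 9309 are intentionally out of scope here.
function isAllowedByRobots(robotsTxt: string, path: string): boolean {
  let inStarGroup = false;
  for (const raw of robotsTxt.split("\n")) {
    const line = raw.trim();
    if (/^user-agent:/i.test(line)) {
      inStarGroup = line.slice(11).trim() === "*";
    } else if (inStarGroup && /^disallow:/i.test(line)) {
      const rule = line.slice(9).trim();
      if (rule !== "" && path.startsWith(rule)) return false;
    }
  }
  return true;
}
```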
Security Requirements
Auth: Supabase Auth with email/GitHub OAuth. RLS: all queries filtered by user_id, users can only see their own sites. Rate limiting: 50 req/min per IP. Input validation: validate URLs with URL parsing library. GDPR: data deletion endpoint for users and their crawl history.
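The URL validation requirement can lean on the built-in WHATWG URL parser rather than regexes. A sketch (the function name is an assumption) that also restricts protocols:

```typescript
// Validate a user-submitted site URL. new URL() rejects malformed
// input; the explicit protocol check blocks javascript: and file:
// URLs that a naive regex might let through.
function parseSiteUrl(input: string): URL | null {
  let url: URL;
  try {
    url = new URL(input.trim());
  } catch {
    return null;
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return null;
  return url;
}
```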
Infrastructure Plan
Hosting: Vercel (Next.js dashboard). Database: Supabase (Postgres). Crawler: serverless function on Vercel (cron trigger or external scheduler). Playwright: run via Vercel serverless or BrowserStack API (paid, $20–50/month if scaling). CI/CD: GitHub Actions. Environments: local dev, staging (Vercel preview), prod. Monitoring: Sentry for errors, Vercel Analytics. Cost: Vercel $20, Supabase $25, Sentry $29, BrowserStack (optional) $0–50 = $74–124/month.
Performance Targets
Expected load at launch: 30 DAU. Target crawl: 500 pages/day, 10 pages/min. Crawler response time: under 30 seconds per page. Dashboard page load: under 2s. Caching: Supabase query caching, crawl results cached for 7 days.
Go-Live Checklist
- ☐Crawler tested on 5 real websites (100+ pages total)
- ☐Claude deprecation detection tested on 200+ articles, 80%+ accuracy confirmed
- ☐Link checker tested on 50+ URLs with known 404s
- ☐Stripe payment flow tested end-to-end
- ☐Error tracking (Sentry) live and alerts configured
- ☐Custom domain set up (e.g., contentrot.app) with SSL
- ☐Privacy policy and terms published (note: crawling disclosure)
- ☐3 beta teams tested and signed off
- ☐Rollback plan: disable crawler job, revert Supabase schema
- ☐ProductHunt, Reddit, and direct outreach launch posts drafted
How to build it, step by step
1. npx create-next-app content-rot --typescript.
2. npm install @supabase/supabase-js stripe @anthropic-ai/sdk playwright cheerio (the Supabase client package is @supabase/supabase-js; the supabase package is the CLI).
3. Set up a Supabase project and create tables (users, websites, crawl_results, outdated_flags).
4. Write /lib/crawler.ts using Playwright to fetch pages and Cheerio to parse.
5. Write /lib/deprecation-detector.ts with a Claude prompt to detect outdated patterns.
6. Create a /pages/api/crawl endpoint (triggered by Supabase cron or Vercel Cron Jobs).
7. Build /pages/dashboard with the outdated-flags list, sorted by impact.
8. Set up Stripe checkout in /pages/api/stripe.
9. Create a /pages/api/add-site endpoint to let users submit URLs.
10. Deploy to Vercel and test with one real site.
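The link checker woven into the crawl step reduces to classifying response status codes. A minimal sketch; the category names are assumptions:

```typescript
type LinkStatus = "ok" | "redirect" | "broken";

// Classify an HTTP status code for the 404/redirect-tracking
// feature. Fetching with { redirect: "manual" } exposes 3xx codes
// so redirect chains can be recorded instead of followed silently.
function classifyLinkStatus(status: number): LinkStatus {
  if (status >= 200 && status < 300) return "ok";
  if (status >= 300 && status < 400) return "redirect";
  return "broken";
}
```

Treating every non-2xx/3xx code as broken (including 401/403/5xx) is a deliberate v1 simplification; those pages still deserve a human look.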
Generated
March 24, 2026
Model
claude-haiku-4-5-20251001