What the Problem Knew

Tuesday, February 24, 2026 · 🌙 Evening · 23 min read · 9 stories

A solo developer running code on a laptop CPU just outscored Grok 4 on ARC-AGI-2, a benchmark designed to require genuine reasoning. Elsewhere tonight: a compliance analyst replaced an £85,000 platform with a £240-a-year subscription, a six-person team beat OpenAI at speech recognition, and someone turned an old Kindle into a live bus dashboard. The thread connecting all of it isn't resources. It's knowing what the right answer looks like.

Developer Beats Frontier LLMs on ARC-AGI-2 With $0 CPU-Only Engine

↗Hacker News (Newest)

ARC-AGI-2 was built to be the test that raw computing power couldn't pass. It requires genuine reasoning — every puzzle is novel, and guessing gets you zero points. A developer named @kofdai just scored 18% on it with a symbolic engine that uses no neural networks, no pre-training, and no cloud. It runs on a laptop in under half a second per task. Grok 4, which costs roughly $3.50 per attempt, scored about 16%. The assumption that harder problems require bigger models just took a specific, measurable hit.

Compliance Analyst Builds Open-Source KYC Plugin That Replaces £85K Platforms With £240/Year

↗Hacker News (Show HN)

Enterprise compliance software charges £85,000 a year to do something specific: pull data from public government databases and present it in a workflow. A compliance analyst who does KYC investigations every day built an open-source plugin that does the same thing for £240 a year. In a 30-day pilot across five analysts, case time fell from 95 minutes to 27, a reduction of about 72%. The expensive platforms weren't selling data access; they were selling orchestration. And the person who understood the workflow just rebuilt the orchestration.

Six-Person Startup Beats Whisper With Open-Weights Speech Recognition

↗Hacker News — Show HN (High Engagement: 25+ comments)

OpenAI's Whisper has been the default speech recognition model since its 2022 release. A team of six people, spending less than $100,000 a month on computing, just beat it on accuracy with a model that's one-sixth the size and eight times faster. The key: they didn't try to beat Whisper at what Whisper does well. They built a completely different architecture for the use case Whisper can't handle: live, real-time transcription on your own device, with no data leaving your phone.

Hacker Turns Old Kindle Into Live Bus Dashboard, Saves $140

↗Hacker News (Top)

A commercial device that shows live bus times costs $140. Marianne Feng's version cost nothing: just an old Kindle she wasn't using anymore, a public transit API, and a weekend. She documented the whole build.

One Developer Solves 'EV Charging Drama' at the Office With a Simple PWA

↗Reddit — r/MicroSaaS (New)

If your office has three electric vehicle chargers and ten people who drive electric cars, you already know this story. The Slack channel. The passive-aggressive emails. The person who leaves their car plugged in during a three-hour meeting. Enterprise charging software exists, but it costs more than the electricity and requires a hardware overhaul. A developer who lived through the drama built a booking layer that runs on a QR code and rotates access so nobody can hog the Monday morning slot.

YC Startup Cuts AI Agent Token Costs 70%

↗Hacker News (Show HN)

When an AI agent searches the web, it gets back the same mess a human would — duplicate results, rebranded companies, name collisions. Except the human can figure out that 'Nextera' and 'NextEra Energy' are the same company. The agent can't. Crustdata spent two years building a database that maps every search result to a verified entity before the agent ever sees it. The key difference: it was built for agents, not humans.

Developer Builds Cost Tracker After $197 Day of Untracked AI Agent Spending

↗Hacker News — Show HN (All)

One developer spent $197 on AI coding agents in a single day and couldn't tell you what any of it bought. That's the gap Vigilo fills — a local audit trail that logs every file read, command run, and edit made by your AI tools, encrypted on your machine, never sent anywhere. It's the management layer that the AI coding agent era forgot to build.

Discord CTO Builds Age Assurance System That Doesn't Require ID — And Explains Why

↗Hacker News (Newest)

Discord built an age assurance system that doesn't require ID. Over 90% of users will never need to verify anything. The system reads account signals — how long you've had the account, whether you have a payment method — and leaves you alone. If you do need to verify, any biometric check happens entirely on your phone. If you refuse, you keep your account, your servers, your friends, your messages. You just lose access to age-restricted content.

Childhood Friends Reunite at 30 to Build iOS App After 15 Years Apart

↗r/SideProject

They were 15 when they first built apps together. Life happened. Fifteen years later, one of them messaged the other: 'Hey, remember when we used to do this?' Now they're 30, building side by side again, and the app they made matters less than the fact that they're making it together.

🧵Developing Stories

The One-Person Infrastructure Firm

A solo developer's symbolic reasoning engine outscored Grok 4 on ARC-AGI-2 at zero cost, and a compliance analyst replaced £85K-a-year enterprise platforms with a £240-a-year Claude subscription. Both stories reinforce the pattern: the scarce resource is problem understanding, not compute or capital.

The Race to Sub-Penny Inference

Verantyx, the solo developer's symbolic engine, scores 18% on ARC-AGI-2 using pure symbolic reasoning on a laptop CPU: no neural networks, no API costs, 0.42 seconds per task. Meanwhile, Moonshine's 245M-parameter speech model beats OpenAI's 1.5B-parameter Whisper with roughly one-sixth the parameters. The compute needed to match frontier results keeps shrinking.

The Software Moat Collapse

A compliance analyst's open-source KYC plugin replaces £85K enterprise platforms with £240/year, using only free public data sources. The enterprise middleware wasn't selling data — it was selling orchestration. That orchestration layer is now a weekend project for someone who understands the workflow.

The Open-Source Toll Bypass

Moonshine AI ships open-weights speech recognition that beats Whisper Large v3 — runs entirely on-device, no API keys, no accounts, no data leaving the user's machine. The kyc-analyst plugin is MIT-licensed and replaces proprietary compliance platforms. Both stories: open beats closed when the builder knows the problem.

Most of tonight's stories share a quiet structural truth: the person closest to the problem built a better solution than the organization with a hundred times the budget. That pattern isn't slowing down. The question worth asking isn't whether this keeps happening. It's what happens to the organizations that still believe headcount and capital are the scarce resources.

This edition: 9 stories · $0.25 to produce

Generated 04:53 UTC · anthropic/claude-opus-4.6