Articles from 2024

17 articles published in 2024.

Every Failed AI Product Has the Same Root Cause

After 12 years in ML and AI, I keep seeing the same failure pattern: teams that ship fast and iterate on vibes instead of building systematic evaluation systems. Evals are not a nice-to-have — they are the core competency of any serious AI product team.

Product

The 6 Ways I've Watched GenAI Projects Fail (And How to Avoid Them)

After 12 years in ML and two years watching GenAI projects go sideways in healthcare — sometimes with real patient consequences — here are the six failure modes I see over and over again, and what to do instead.

Product

When to Look Beyond Standard LLMs (And When to Stop Overthinking It)

Most teams should use a frontier API and move on. But there are specific situations — extreme latency, long-context scale, cost walls, privacy constraints — where alternative architectures actually matter. Here's the decision framework I use.

Engineering

When Recommendations Meet Language: The LLM-RecSys Convergence

Most AI stacks treat the recommendation engine and the language model as two separate systems that hand off to each other. A new class of hybrid models eliminates that seam — and the implications for domain-specific AI are significant.

Engineering

Trading Speed for Quality: A Practical Guide to Inference-Time Scaling

Inference-time scaling lets you tune the latency-quality tradeoff at runtime rather than at training time. Here is a practical framework for deciding when to use Best-of-N sampling, beam search, iterative refinement, or one-shot generation — with real examples from clinical AI.

Engineering

Inside the Black Box: What Mechanistic Interpretability Means for Builders

Healthcare AI requires explainability — "the model said so" is not a clinical rationale. Mechanistic interpretability is the research field trying to change that. Here is what it actually offers practitioners today, what the gap still is, and what you can do in the meantime.

Engineering

How to Actually Test If Your AI Will Say Something Dangerous

Most teams treat jailbreak testing as a vibe check. StrongREJECT achieves 0.90 Spearman correlation with human judgment — which means automated safety evaluation is real, and there is no good excuse not to build it into your pipeline.

Engineering

The Attack Your LLM App Is Definitely Vulnerable To

Prompt injection is the #1 OWASP threat to LLM applications — and most teams are not taking it seriously. Here is what the attack looks like, why it is so hard to stop, and how to actually harden your system.

Engineering

The Honest Guide to LLM Evals: What Actually Works

Most teams skip real evals and wonder why their AI products degrade in production. Here is the framework that actually holds up — from 30-minute manual reviews to binary scoring to knowing when your eval suite is finally doing its job.

Engineering

5 Reasons to Solve for Adoption Before Building Your Digital Health Tool
6 min read

You built a great digital health product. Clinicians love the idea. But no one is buying. Why? Too many companies start with a great idea, build the tech first, and then struggle to get adoption.

Healthcare

Why Your LLM Evaluator Is Lying to You

LLM-as-judge evaluators feel like quality assurance but behave like rubber stamps. They fail hardest on the outputs that matter most — edge cases, safety-critical errors, domain-specific nuance. Here is what to do instead.

Engineering

Why I Stopped Using RAG for Coding Agents (And What I Do Instead)

The instinct when building a coding agent is "I need RAG to handle large codebases." The better instinct is giving the agent tools to explore code the way a senior engineer would — reading files, following imports, tracing execution.

Engineering

React Tooling 2024: Stop Using the Wrong Shit
6 min read

After building 20+ React apps this year, here's what actually works and what's a waste of time. Spoiler: You're probably overengineering.

React

The Neural Net Training Recipe That Actually Works

I spent months chasing architecture fixes when my real problem was bad debugging hygiene. The training recipe that works — start simple, visualize everything, tune last — is the unglamorous discipline that separates working models from expensive experiments.

Engineering

You Don't Need GPT-4 for That: Small Models and Edge Agents

The assumption that frontier models are required for agentic function calling is wrong — and for healthcare AI, it can also be a compliance liability. Here's when a fine-tuned 7B model is the right architecture, and when it isn't.

Engineering

Multi-Agent Orchestration in Practice: What I Learned Building Parallel Agent Systems

The orchestrator/worker pattern is the key mental model for multi-agent systems. Here is how to structure orchestrators, spawn and manage workers, aggregate results, and avoid the coordination failures that will sink you.

Engineering

What It Actually Takes to Build a Real LLM Agent

Everyone's talking about agents. Few people have actually built one that works in production. Here's what the architecture papers skip: the failure modes, the memory tradeoffs, the tool design decisions that actually matter.

React