Engineering
58 articles in Engineering.

Prompt Engineering Didn't Die. It Got Unrolled.
Everyone keeps announcing the death of prompt engineering. They are describing the symptom, not the shift. The loops you used to run by hand — refine, retry, verify, learn — moved out of your head and into infrastructure. Four of them, simultaneously.

The Two Rhythms of B2B Tech: What Palantir Gets Right That Most Companies Get Wrong
Palantir built a company on the idea that software alone is not enough — you need engineers embedded with customers. That model has a name, a cost, and a hidden technical debt time bomb that most B2B companies are quietly sitting on.

Penpal: Dispatch Tool Today, RPG Interface Tomorrow
I built a tool that turns GitHub issues into pull requests using a three-agent pipeline. That's the boring part. The interesting part is what happens when you stop thinking about AI agents as productivity tools and start thinking about them as a workforce — and you build a world for them to live in.

Mirth Connect, What Happened? Here's What Comes Next.
Mirth democratized healthcare integration. Then NextGen acquired it, and the world moved on. After 12 years of building on top of it, here's what the next generation of integration tooling looks like — and why it was inevitable.

Product Evals in Three Steps (That You'll Actually Do)
Most teams skip evals because the process feels overwhelming. Here is the three-step framework that makes eval-driven development achievable: label a small dataset, calibrate an LLM evaluator to human judgment, then iterate configs against the harness. No excuses left.

Table Stakes for Pragmatic Development Using LLMs
Updated for 2026: lessons from two years of using Claude Code in production. Context engineering, real eval frameworks, model economics, and agent workflows — what actually works.

Staff Engineer Layoff Survival Guide: Lessons from 2008, 2020, 2023 — and Now
I've survived three tech recessions. Lost my job in one, held the axe in another. Now it's 2026 and the AI boom changed the rules. Here's the updated playbook.

The Principal IC Playbook Nobody Shares With You
Reaching principal is the first rung of a new ladder, not the last rung of the old one. Here is what nobody told me about the ownership-autonomy paradox, how leverage actually works at this level, and the charter model I wish I had from day one.

The ESM Mess: JavaScript's Module System Is Still Broken and Here's Why
ES Modules have been the supposed future of JavaScript for nine years. Only 9-27% of the ecosystem has actually adopted them. Here's what's really going on, and how to survive until the ecosystem commits.

The Three Things Exceptional Engineering Leaders Do (And the One They Stop Doing)
Most engineering leaders excel at one of three pillars — providing direction, removing obstacles, or foreseeing change — and quietly fail at the other two. After 12 years and more management mistakes than I care to count, here's what I've learned about building strength across all three.

The Open-Weight LLM Landscape in 2026: What Engineers Actually Need to Know
The open-weight ecosystem has matured faster than most engineers realize. MoE proliferation, hybrid attention, and extended context windows are changing what's actually deployable on-premise — and that matters more than ever for healthcare AI.

Software 2.0 Is Here and It Changed How I Think About Programming
In 2017, a post called "Software 2.0" argued that neural networks would replace explicit logic as the dominant programming paradigm. Nine years later, that prediction has fully landed — and the implications for how we build software are bigger than most engineers want to admit.

Three Ways to Know If Your Career Is Actually Growing
Normal career metrics — title, pay, team size — tell you how you're doing relative to others. They don't tell you whether you're growing. Here are three that do.

RAG Isn't Dead. You're Just Using It Wrong.
The 'RAG is dead' narrative is wrong — but it's wrong in an interesting way. After building RAG systems in healthcare production, here's what actually kills LLM context quality and what to do about it.

Fine-Tuning a 70B Model on a Consumer GPU: The Q-LoRA Practical Guide
Q-LoRA + SFTTrainer + Flash Attention v2 means you can fine-tune a 70B parameter model on 24GB of VRAM. Here is what that actually looks like end-to-end, what it costs in quality, and when you should just use the API instead.

Time vs. Timing: The Career Framework I Wish I Had Earlier
I've made bets that paid off because of timing and bets that paid off because of compounding. Confusing the two is how careers stall.

The LLM Year in Review: What Actually Mattered in 2025 (And What Was Noise)
The prediction was: bigger models win. The reality was: DeepSeek R1 rewrote the rules in January and nothing was the same after that. Here is what 2025 actually taught us about reasoning, inference-time compute, and the changing economics of intelligence.

From Contractor to Consultant: The Mindset Shift That Changes Your Income
Contractors sell time. Consultants sell results. After 12 years in AI and ML, I learned that distinction the hard way — and the moment it clicked changed how I price, position, and pick clients.

What the Teams Actually Shipping Coding Agents Have Figured Out
Coding agents are the most economically viable AI in production today. Here are the patterns that Devin, Cline, Amp, and others converged on — and what they mean for anyone building or using agents seriously.

Stop Shipping Features: Why AI Products Need an Experiment Mindset
After shipping 12 features in a quarter and moving zero meaningful metrics, I learned the hard way that AI products are not software projects. The roadmap is a hypothesis board, not a delivery schedule.

Beyond Chunks: Why Faceted Context Is the Future of RAG
Chunk-based RAG returns results. Faceted context gives agents peripheral vision — an understanding of the information landscape that lets them navigate rather than just consume. Here is what that looks like in a domain where getting it wrong actually matters.

Context Engineering: The Skill That Replaced Prompt Engineering
After 12 years in ML and two years building production AI systems, I stopped obsessing over prompts. The engineers who ship better agents are not writing better instructions — they are designing better information spaces.

Every Service Is Going to Need an MCP Layer
REST APIs were designed for humans calling services through UIs. AI agents are not humans. Here is what breaks when you expose your existing APIs to agents, and what the right architecture actually looks like.

When English Became a Programming Language
v0 just proved that English plus AI can replace traditional web development for most apps. I've spent 12 years mastering this craft. Here's my honest take on what that means.

Fine-Tuning LLMs Without the RLHF Headache: The DPO Approach
RLHF is the right idea with the wrong implementation cost for most teams. DPO flips the math — here's how I'd use it to align a healthcare AI model on clinician feedback without burning a month on reward model engineering.

Agency Beats Intelligence: How I Now Hire (And Evaluate Myself)
Raw intelligence is abundant and cheap. The engineers thriving in the AI era are the ones with agency — the ability to set goals, act under uncertainty, and self-correct. Here is how I changed my interview process to find them.

You Are No Longer a Coder: The Shift from Execution to Direction
After 12 years of building systems by hand, I stopped writing most of my own code. Here is what changed, what I delegated to AI, what I found it cannot do, and why the hardest part of the transition had nothing to do with technology.

Context Rot: The Silent Performance Killer in Your LLM Application
Your LLM system works great in demos and degrades in production. The culprit is almost never the model. It's what you're feeding it. Here's how to diagnose and fix context rot before it kills your product.

Why Your AI Gets Smarter When You Let It Think Longer
Test-time compute is the most underused lever in production AI right now. Here is what chain-of-thought, best-of-N sampling, and process reward models actually mean for practitioners building real products — and when to use them vs. just grabbing a bigger model.

The Async-First Engineering Team: What Actually Works (And What Doesn't)
I went async-first with my remote engineering team. Productivity went up. Culture took a hit. Here's the honest accounting of what changed, what broke, and the specific practices that made it worth it.

What the AI Tool Ecosystem Is Actually Telling You
After wasting months on AI tools that sounded great and died within a year, I started reading the ecosystem differently. Here is the framework I use now — and why tool selection in healthcare AI is a compliance problem, not just a productivity one.

What Autonomous Vehicles Taught Me About Multi-Agent AI Design
BAIR researchers discovered that just 5% autonomous vehicle penetration can smooth all highway traffic — with no central coordination. That finding quietly reshapes how I think about building multi-agent AI systems.

Vision + Language: How Multimodal LLMs Actually Work (And When to Use Them)
Multimodal LLMs integrate vision through two fundamentally different architectures. Knowing which one you need — and why — is the decision that shapes every other technical choice in your build.

The Self-Healing Stack: What AI-Native Infrastructure Actually Means
The AI Cloud vision — where infrastructure monitors, optimizes, and repairs itself — is compelling. Some of it exists today. Most of it doesn't yet. Here's an honest breakdown of what self-healing infrastructure looks like in practice — and what engineers should actually be doing to prepare.

FHIR Meets Graph Databases: Exploring Healthcare's Natural Network Structure
How FHIR's interconnected resources map onto graph relationships, and what graph technologies could offer healthcare AI at Clarity Health Project.

The Tools I Dropped When AI Changed My Development Workflow
After 12 years of accumulating dev tools, AI coding assistants forced me to rethink every layer of my stack. Here's what I dropped, what I added, and the principle behind the whole thing.

From GPT-2 to DeepSeek: The Architectural Changes That Actually Mattered
I've been reading ML papers for 10 years. Most don't matter. These architectural choices did. RoPE, GQA, SwiGLU — each one solved a real scaling problem. Here's what practitioners need to know when a new model claims 'better architecture.'

Building a GenAI Platform That Doesn't Collapse Under Its Own Weight
Most GenAI platforms fail not because the models are bad, but because teams build everything at once. A practitioner's guide to layered GenAI architecture — from the minimal production-ready core to healthcare-grade guardrails and beyond.

Every Failed AI Product Has the Same Root Cause
After 12 years in ML and AI, I keep seeing the same failure pattern: teams that ship fast and iterate on vibes instead of building systematic evaluation systems. Evals are not a nice-to-have — they are the core competency of any serious AI product team.

The 6 Ways I've Watched GenAI Projects Fail (And How to Avoid Them)
After 12 years in ML and two years watching GenAI projects go sideways in healthcare — sometimes with real patient consequences — here are the six failure modes I see over and over again, and what to do instead.

When to Look Beyond Standard LLMs (And When to Stop Overthinking It)
Most teams should use a frontier API and move on. But there are specific situations — extreme latency, long-context scale, cost walls, privacy constraints — where alternative architectures actually matter. Here's the decision framework I use.

When Recommendations Meet Language: The LLM-RecSys Convergence
Most AI stacks treat the recommendation engine and the language model as two separate systems that hand off to each other. A new class of hybrid models eliminates that seam — and the implications for domain-specific AI are significant.

Trading Speed for Quality: A Practical Guide to Inference-Time Scaling
Inference-time scaling lets you tune the latency-quality tradeoff at runtime rather than at training time. Here is a practical framework for deciding when to use Best-of-N sampling, beam search, iterative refinement, or one-shot generation — with real examples from clinical AI.

Inside the Black Box: What Mechanistic Interpretability Means for Builders
Healthcare AI requires explainability — "the model said so" is not a clinical rationale. Mechanistic interpretability is the research field trying to change that. Here is what it actually offers practitioners today, what the gap still is, and what you can do in the meantime.

How to Actually Test If Your AI Will Say Something Dangerous
Most teams treat jailbreak testing as a vibe check. StrongREJECT achieves 0.90 Spearman correlation with human judgment — which means automated safety evaluation is real, and there is no good excuse not to build it into your pipeline.

The Attack Your LLM App Is Definitely Vulnerable To
Prompt injection is the #1 OWASP threat to LLM applications — and most teams are not taking it seriously. Here is what the attack looks like, why it is so hard to stop, and how to actually harden your system.

The Honest Guide to LLM Evals: What Actually Works
Most teams skip real evals and wonder why their AI products degrade in production. Here is the framework that actually holds up — from 30-minute manual reviews to binary scoring to knowing when your eval suite is finally doing its job.

Why Your LLM Evaluator Is Lying to You
LLM-as-judge evaluators feel like quality assurance but behave like rubber stamps. They fail hardest on the outputs that matter most — edge cases, safety-critical errors, domain-specific nuance. Here is what to do instead.

Why I Stopped Using RAG for Coding Agents (And What I Do Instead)
The instinct when building a coding agent is "I need RAG to handle large codebases." The better instinct is giving the agent tools to explore code the way a senior engineer would — reading files, following imports, tracing execution.

The Neural Net Training Recipe That Actually Works
I spent months chasing architecture fixes when my real problem was bad debugging hygiene. The training recipe that works — start simple, visualize everything, tune last — is the unglamorous discipline that separates working models from expensive experiments.

You Don't Need GPT-4 for That: Small Models and Edge Agents
The assumption that frontier models are required for agentic function calling is wrong — and for healthcare AI, it can also be a compliance liability. Here's when a fine-tuned 7B model is the right architecture, and when it isn't.

Multi-Agent Orchestration in Practice: What I Learned Building Parallel Agent Systems
The orchestrator/worker pattern is the key mental model for multi-agent systems. Here is how to structure orchestrators, spawn and manage workers, aggregate results, and avoid the coordination failures that will sink you.

What It Actually Takes to Build a Real LLM Agent
Everyone's talking about agents. Few people have actually built one that works in production. Here's what the architecture papers skip: the failure modes, the memory tradeoffs, the tool design decisions that actually matter.

Three Hard Truths About LLMs in Production Nobody Warned Me About
Twelve years in ML and I still got burned. Stochasticity is a systems design problem. Your target model will be deprecated. Silent failures are worse than loud ones. Here is what production healthcare AI actually taught me.

What AI Agents Actually Are (And What They Can't Do Yet)
Everyone is building 'agents.' Most are just APIs with a system prompt. Here's the precise definition, which components actually matter, the failure modes I've hit, and how to pick the right pattern for your problem.

CDS Hooks: What Nobody Tells You Before You Build
I spent months building a CDS Hooks integration that actually worked. Here's what the documentation doesn't mention.

The Food Truck Method: Building MVPs That Don't Suck
After building dozens of MVPs, I finally figured out why most turn into dumpster fires. Here's my framework for building fast without the regret.

HIPAA Consent: What Engineers Actually Need to Know
I spent 3 months drowning in HIPAA documentation. Here's what actually matters for engineers building healthcare apps.
