The GenAI Strategy Question You're Not Asking (But Should Be)

[Image: sketch of GenAI product strategy layers as geological strata]

The strategy layer most practitioners skip

A health system CTO pulled me into a strategy session last year. Her team had been tracking GenAI for months, had approved a budget, and had executive buy-in. She leaned across the table and asked the question I hear in some form at least once a month: "So how should we be using GenAI?"

It is a reasonable question. It is also the wrong question.

I have learned — through twelve years in ML and AI, through failed projects and a few that actually worked — that when organizations start with "how do we use GenAI," they almost always end up somewhere they did not intend. They end up with AI for its own sake: expensive automation of workflows that already functioned, language models wrapped around deterministic tasks, chatbots that add latency and a new failure mode to something a database query was handling fine.

The right question is harder to sit with. It is not "how do we use GenAI." It is "does GenAI's unique capability create new value here that we could not create before — or is this just a more expensive way to do something that already worked?"

Those are different questions. The second one changes everything about how you evaluate, prioritize, and build.


The Wrong Question and Where It Leads

When companies start with "how do we use GenAI," the implicit assumption is that the answer is yes — GenAI should be used, the question is just where. That assumption does real damage.

I watched a mid-sized healthcare organization spend eight months building a GenAI-powered prior authorization assistant. The pitch was compelling: natural language interface, clinicians describe the case in plain English, AI handles the rest. The demo looked great. In production, the approval rate did not budge, the time-to-decision barely improved, and the physicians found the interface more confusing than the forms it replaced.

The problem was not the model. The problem was that the bottleneck in prior authorization was not documentation — it was the payer's criteria logic and the back-and-forth with insurance reviewers. GenAI could not fix that. A better structured data form and a smarter rules engine for matching clinical evidence to payer criteria would have done more, cost less, and been faster to build.

Nobody asked the right question before the project started. They asked "how do we apply GenAI to prior auth" and worked backward from there.


What GenAI Actually Does That Nothing Else Does

To ask the right question, you need a clear model of where GenAI creates genuinely new value — not just where it automates, but where it makes things possible that were not possible before.

There are three areas where I have seen GenAI create real net-new value:

Synthesis across unstructured content at scale. A physician reviewing a patient's chart for a complex case might need to synthesize a decade of clinical notes, lab trends, specialist letters, and imaging reports. A human takes hours. GenAI takes seconds. This is not automation — it is a capability that was not practical before. The value is not that it replaces the physician's judgment; it is that it surfaces the relevant signal from a volume of text that would otherwise overwhelm the review window.

Natural language as an interface for complex data. Clinical data is notoriously hard to query. Most clinicians cannot write SQL. Most clinical data is locked behind rigid EHR interfaces that require knowing exactly what you are looking for before you look. A well-designed GenAI interface lets a care manager ask "show me patients on metformin whose last HbA1c was above 8 and who had a care gap in the last six months" and get an answer. That is new. That changes how clinicians interact with their data in a meaningful way. A sketch of what this pattern can look like appears at the end of this section.

Reasoning across ambiguous, open-ended problems. Some problems genuinely resist structure. Differential diagnosis from an incomplete clinical picture. Identifying a documentation pattern that might signal a compliance gap. Generating candidate hypotheses for why a quality metric is declining. These are tasks where the open-ended reasoning capacity of a large language model is genuinely useful, not just impressive.

If your use case does not fall into one of these categories — if you are using GenAI for structured data extraction with a fixed schema, discrete classification with clear categories, or deterministic rule evaluation — you are almost certainly paying a technology premium for worse reliability than simpler tools would give you.
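To make the second category concrete, here is a minimal sketch of a natural-language query layer, with heavy caveats: the schema, the table names, and the call_llm function are all hypothetical placeholders, and the model provider is deliberately left unwired. The pattern that matters is showing the model the schema, constraining it to a single read-only SELECT, and validating the generated SQL before it ever touches the database.

```python
import re
import sqlite3

# Hypothetical stand-in for any LLM completion call (OpenAI, Anthropic,
# a local model): takes a prompt string, returns generated text.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your model provider here")

# The schema the model is allowed to query, shown to it verbatim so it
# grounds its SQL in real column names instead of guessing.
SCHEMA = """
patients(patient_id, name, active_meds)        -- active_meds: drug names
lab_results(patient_id, test_name, value, taken_at)
care_gaps(patient_id, gap_type, opened_at)
"""

def nl_to_sql(question: str) -> str:
    prompt = (
        "Translate the question into ONE SQLite SELECT statement.\n"
        f"Schema:\n{SCHEMA}\n"
        f"Question: {question}\n"
        "Return only SQL, no commentary."
    )
    sql = call_llm(prompt).strip().rstrip(";")
    # Guardrail: generated SQL is untrusted input. Accept a single
    # read-only SELECT and nothing else before it reaches the database.
    if not re.fullmatch(r"(?i)select\b[^;]*", sql):
        raise ValueError(f"refusing to run non-SELECT SQL: {sql!r}")
    return sql

def answer(conn: sqlite3.Connection, question: str):
    return conn.execute(nl_to_sql(question)).fetchall()

# A care manager's question, phrased the way a person would ask it:
# answer(conn, "patients on metformin whose last HbA1c was above 8 "
#              "and who had a care gap in the last six months")
```

A production version adds more than a regex: a read-only database role, row-level access controls, and query review logging. But the architecture is the same shape, and it is small.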


A Framework for Deciding

Here is the decision structure I walk companies through. It is not complicated. It does require honesty.

Step one: Define the problem precisely. Not the AI problem — the human problem. What does a person need to do, how long does it take, what goes wrong, and what would success look like if the problem were solved well? Write this down before you touch anything technical.

Step two: Ask whether GenAI's unique capabilities are relevant. Does solving this problem require synthesis across unstructured text? Natural language reasoning over ambiguous inputs? Flexible generative output that cannot be pre-specified? If the answer to all three is no, stop here. The problem probably has a better solution.

Step three: Ask what you are giving up. GenAI introduces latency, cost, non-determinism, and new failure modes. In healthcare, it may also create explainability problems — if a model makes a recommendation and a physician or regulator asks why, "the transformer attended to these tokens" is not an acceptable answer. Tasks where explainability is legally required, where the cost of errors exceeds the benefit of automation, or where deterministic correctness is the only acceptable standard are poor fits for GenAI regardless of how good the demos look.

Step four: Validate before you build. This is where most teams skip ahead. They validate that the problem exists, then jump to building the solution. What they do not validate is whether users will actually change their behavior to engage with the AI solution. In healthcare, behavior change is the hardest part. A clinical decision support tool that physicians route around is not a product — it is a checkbox.

Run structured interviews with the people who will actually use the product. Prototype the interaction — not the model. Paper prototypes, mock outputs reviewed by real clinicians, Wizard of Oz simulations where a human plays the AI. You will learn more from two weeks of this than from two months of model development.
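For teams that want to try the Wizard of Oz approach, here is a minimal sketch of a session harness, under stated assumptions: it runs both roles in one terminal for brevity (a real test would put the wizard behind a shared screen or a second machine), and the log path is an arbitrary placeholder. No model is involved anywhere; that is the point.

```python
import json
import time

# Minimal Wizard-of-Oz harness: the clinician types into what looks like
# an AI assistant, while a human operator composes each "model" reply.
# The goal is to test the interaction, not the model.
def wizard_of_oz_session(log_path: str = "woz_transcript.jsonl"):
    turns = []
    print("Wizard-of-Oz session started. Blank user input ends it.\n")
    while True:
        user_msg = input("clinician> ").strip()
        if not user_msg:
            break
        # The wizard (a human) plays the AI and types the response.
        ai_msg = input("wizard (as AI)> ").strip()
        turns.append({"t": time.time(), "user": user_msg, "assistant": ai_msg})
    # Persist the transcript so the team can review where users hesitated,
    # rephrased, or abandoned the interaction.
    with open(log_path, "w") as f:
        for turn in turns:
            f.write(json.dumps(turn) + "\n")
    print(f"{len(turns)} turns logged to {log_path} for review.")

if __name__ == "__main__":
    wizard_of_oz_session()
```

Twenty sessions of this with real clinicians will tell you whether the interaction works before you have spent anything on models, prompts, or infrastructure.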


The Questions That Should Make You Pause

Not every use case passes this framework. Here are the categories where I consistently push back regardless of how excited the team is:

When determinism is required. Drug interaction checking, billing code validation, eligibility verification — these are rule-based problems with known correct answers. A language model that gets these right 97% of the time and hallucinates the other 3% is a liability, not a product. Use deterministic systems for deterministic problems.

When explainability is a legal or regulatory requirement. HIPAA, FDA guidance on AI/ML-enabled devices, and emerging state-level AI in healthcare regulations are all moving toward explainability requirements for clinical AI. If you cannot explain why the model produced a particular output in terms a regulator can evaluate, you have a compliance exposure, not just a technical limitation.

When the cost of errors is high and the volume is low. GenAI makes economic sense when it is handling high volume at acceptable error rates. For low-volume, high-stakes decisions — complex surgical planning, high-risk medication decisions, rare disease diagnosis — the error rate has to be effectively zero. Nothing in current GenAI gets you there. A specialist gets you there.

When you are just automating something that already worked. If the current process works and the primary motivation is cost reduction, do the math honestly. GenAI infrastructure, ongoing model costs, evaluation overhead, and the engineering time to maintain prompts as models change are all real costs. They frequently exceed the savings from automating a human task, especially at healthcare scale where the human doing the task is already part of a regulated workflow.
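Here is what doing that math honestly can look like as a back-of-envelope sketch. Every number below is a placeholder assumption, not a benchmark; the point is the shape of the calculation, especially the fixed overhead term that teams tend to omit.

```python
# Back-of-envelope GenAI vs. status-quo cost comparison.
# Every number is a placeholder assumption; substitute your own.

tasks_per_month        = 2_000      # volume of the task being automated
human_cost_per_task    = 4.00       # loaded labor cost, USD
llm_cost_per_task      = 0.15       # tokens in and out at current API pricing
error_rate             = 0.03       # share of outputs needing human rework
rework_cost_per_error  = 25.00      # cleanup costs more than doing it right
fixed_monthly_overhead = 12_000.00  # eval harness, prompt maintenance,
                                    # infra, compliance review

status_quo = tasks_per_month * human_cost_per_task
genai = (
    tasks_per_month * llm_cost_per_task
    + tasks_per_month * error_rate * rework_cost_per_error
    + fixed_monthly_overhead
)

print(f"status quo: ${status_quo:,.0f}/mo   GenAI: ${genai:,.0f}/mo")
# With these placeholder numbers: $8,000/mo vs. $13,800/mo. At low
# volume, the fixed overhead dominates and the automation loses.
```

The line most often missing from the pitch deck is fixed_monthly_overhead. Evaluation, prompt maintenance as models change, and compliance review do not go away after launch, and they do not shrink with volume.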


What This Looks Like in Practice

The health system CTO I mentioned at the start eventually landed on two use cases that passed this framework. Neither was the one her team had originally proposed.

The first was unstructured clinical note summarization for care transitions. Discharge summaries are notoriously difficult to parse — dense, inconsistently formatted, often written for the wrong audience. GenAI synthesis of the relevant clinical history for a receiving care team is genuinely new value. The receiving team could not practically do this themselves at the volume they needed.

The second was a natural language interface for their care management data warehouse. Care managers could ask questions about patient populations in plain English. Previously, getting answers required submitting requests to an analytics team. Turnaround was days. The barrier meant most questions never got asked. Now they get asked in real time.

Both use cases required synthesis or natural language reasoning over complex, unstructured data at a scale where human effort alone was not practical. Both created new capabilities rather than automating existing ones. Both had clearly defined success metrics before a line of code was written.

The prior authorization assistant she had originally pitched? We shelved it. Not because GenAI could not have built something impressive — it could have. But because the problem was not one where GenAI's unique capabilities were the binding constraint. The binding constraint was payer policy logic, and no language model fixes that.


The Discipline That Actually Matters

The core question, framed cleanly: focus adoption on tasks where AI's unique capabilities — creativity, reasoning, synthesis — create new value, rather than merely automating existing solutions. Validate problem-solution fit before you build. Design the UX to handle uncertainty honestly. Build continuous improvement cycles into the product from the start.

All of that is right. In healthcare, I add one more layer: the cost of building the wrong AI is not measured only in wasted engineering cycles. It is measured in workflow disruption for people who are already operating at the edge of their capacity, in clinician trust that is hard to rebuild once broken, and sometimes in patient outcomes that nobody in the planning meeting wanted to think about.

That is worth slowing down for.

The organizations I have seen get this right share a habit: they resist the pressure to show GenAI activity and instead create space to ask whether GenAI is actually the right answer. In a technology moment defined by FOMO, that restraint is a competitive advantage.

Ask the hard question first. Everything else gets easier from there.