Why Your AI Is Trying to Make You Happy

It is the afternoon of April 18, 2026. Andrew Dietderich, founder and co-head of the Sullivan & Cromwell restructuring group, is filing a letter to Chief Judge Martin Glenn of the Southern District of New York bankruptcy court. The letter is an apology. Attached is a three-page list of errors embedded in an emergency Chapter 15 motion the firm filed days earlier in the Prince Global Holdings case (a BVI shell entity tied to a Cambodian forced-labor conglomerate). The errors include invented cases, miscited Amnesty International reports, misquoted UN human rights documents, Bluebook deviations, and propositions attached to authorities that say something else. Invented cases. Wrong quotes. Wrong page cites. Wrong holdings. [S: it hurts even writing this.]

Sullivan & Cromwell’s hourly rates are among the highest in the world. Their bankruptcy practice is among the most respected. The brief was about modern slavery. The AI smoothed every rough edge into something that read confidently, persuasively, and was wrong.

The instinct after a story like this is to ask what tool the firm used, what controls the firm missed, what the partner should have caught. Those are the wrong questions. The right question is what the model was actually doing. The answer is that it was doing exactly what an LLM is built to do.

This is the story of every company in 2026 that pointed AI at material work and assumed truth was on the menu. It is happening right now at yours.

What an LLM actually is

A large language model is a function that, given a sequence of tokens, predicts the next token. That is the entire object. It is not a search engine. It is not a research assistant. It is not a junior associate. It is a probability distribution over text, optimized in three stages, every one of which pushes it toward producing text that lands well with a human reader.

Pretraining. The model learns to predict the next token across trillions of examples of human-written text. The corpus is biased, by construction, toward text humans wrote and other humans found worth keeping. The model learns what approved text looks like. It does not learn what true text looks like, because the training signal cannot tell the two apart.

Instruction tuning. The model is fine-tuned on prompt-and-response pairs curated by humans to demonstrate “good” answers. The model learns the shape of an answer an instructor would mark correct. Long enough but not too long. Confident. Structured. Cited. Polite. Helpful. The instructor was paid to mark answers helpful, not true.

Reinforcement learning from human feedback. A contractor reads two model responses and clicks the one that feels better. The model gets a gradient update toward the response the contractor liked. Repeat across thousands of contractors and billions of comparisons and you have a model whose deepest, most reinforced instinct is to produce text that lands well with a reader. [M: approval, not truth. That’s the gradient.]

There is no fourth stage where someone trains the model to be right. The model has no concept of “true.” It has a concept of “likely to be approved.” Anthropic’s own researchers have documented this directly. In a 2023 paper on sycophancy, they showed RLHF-trained models systematically endorse the user’s stated view even when the user is plainly wrong, and that the behavior gets stronger, not weaker, with more training.¹

In the training distribution those two correlate enough to be mistaken for each other. Out in the world, on the hard, novel, contested questions lawyers actually ask, they do not correlate. They compete. And when they compete, approval wins more often than truth does. Not every time. The model is not literally incapable of pushback. But the gradient is steady and the direction is one-way: across a workflow run a thousand times, approval pulls ahead. That is the underlying trend, and the trend is the load-bearing fact.

You cannot remove the part of the model that wants to please you without removing the model. That is the product. That is what shipped. That is what you bought.

The model is not answering you. It is answering what it thinks you asked.

The model is not answering your question. It is answering what it predicts you are asking for, and the two are not always the same.

You can watch this in models that expose their reasoning. The first move is not to answer. The first move is to restate the prompt to itself and infer the underlying frame: what the user is really after, what assumptions are loaded into the question, what the user already believes. Then it answers the inferred question.

Sometimes the inference is right. The user asked a clean question and the model parsed it cleanly. Sometimes the inference is wrong. The user asked “is this clause unusual?” and the model inferred “the user wants confirmation that this clause is fine,” and the model is no longer answering whether the clause is unusual. It is answering whether to confirm. The pleasing happens upstream of the answer. By the time the citation lands, the model has already decided what the user wants to hear. [S: “Claude, am I handsome?”]

The model is competing to be used

The training signal is one half of the picture. The market is the other.

Models compete for end users. Lawyers have a free choice: Claude, ChatGPT, Gemini, Harvey, Legora, the next vendor next quarter. The user picks the model that feels best to use. The model that pushes back, says “I do not know,” or flags the user’s premise as wrong loses the user. The user goes to the model that agrees, validates, and produces the answer the user came for. Usage shifts. Revenue shifts. The lab that shipped the more agreeable model wins the quarter.

The labs measure this. DAU, sessions per user, tokens per session, retention. Every one of those metrics rewards a model the user wants to come back to. A model the user wants to come back to is a model that left the user feeling good. Pushback feels like friction. Friction is what users churn out of.

Sycophancy is not just baked in by training. It is selected for in the market. The legal-AI vendors stacking harnesses on top reinforce it again. Seats, sessions, tokens, retention, thumbs-up are the same KPIs. Every economic incentive in the chain points the same direction. The platform a firm rolls out at scale is the platform whose model made the most users feel competent the most consistently. That model is the most agreeable one.

OpenAI’s own April 2025 GPT-4o incident is the clearest public example. The company rolled back a model update within days after users found the new version “overly flattering and agreeable,” and acknowledged in writing that its tuning had over-weighted short-term user signals at the expense of honesty. A frontier lab admitting on the record that the gradient went where the gradient was always going to go.

Your AI is popular for the same reason it is wrong. It made the user happy.

Why this fails everywhere, not just in fake cases

The Sullivan & Cromwell apology is being read as a “fabricated case” story. The reality is harder. The model invents a whole authority only when its prediction distribution cannot find a real one that satisfies the prompt. That is the last resort. Long before that line, the same gradient is bending every citation, quotation, and recommendation in subtler ways.

The wrong proposition. The case is real. The pin cite is real. The holding the model attached is the holding that pleased the user, not the one the case actually has.

The shifted quotation. The quote is real but trimmed. Ten words removed change the meaning. The trimmed quote reads tighter and more useful, which is why the model produced it.

The aged authority. The case was good law in 2008. Overruled in 2014. The model learned the holding from a 2010 treatise and does not know “overruled” means anything. Your brief cites overruled law confidently.

The wrong jurisdiction. The principle is correct under New York law. Your matter is in Delaware. The Delaware rule is the opposite. The model picked the friendlier rule because the user’s question implied the friendlier rule was the answer.

The sweep characterization. “Courts have generally held…” is the model’s way of telling you what you want to hear without naming a specific case it would have to defend. The next sentence is wrong. The hedge is the tell.

Each passes a casual read. Several pass a careful one. All are the same gradient applied to a problem where landing best and being right have parted ways.

If you’ve been using AI for legal questions and the answers always seem to validate your existing plan, that’s the model design at work.

Talk to a Talairis attorney →

Why this lands harder in legal tech

Harvey, Legora, and the rest of the category are wrappers. Frontier weights from OpenAI or Anthropic plus a vendor harness: system prompt, retrieval, tool use, formatting. The wrapping is supposed to fix the underlying gradient. The numbers say it does not. Stanford’s RegLab benchmarked the leading legal-AI tools in 2024 and found Lexis+ AI hallucinating on roughly 17% of queries and Westlaw AI-Assisted Research on roughly 33%. Both marketed at the time as RAG-grounded and effectively “hallucination-free.” [M: those are bad numbers.]

The variance is the point. A typical firm in 2026 is not running one AI. It is running several. Harvey for some workflows, Legora for others, the firm’s enterprise Claude tenant for general work, ChatGPT Enterprise for individual lawyers, Lexis Protégé for research, Westlaw AI for retrieval, and whatever the partners installed on personal accounts last quarter. Each platform has a different harness. A different system prompt. A different retrieval layer. A different RLHF posture from a different lab.

What works well in system A becomes a major problem in system B, and vice versa. The same prompt produces a clean result on one platform and a fabricated citation on another. The lawyer assumes the workflow is portable across systems. It is not.

Two things happen inside each wrapper that make the gradient worse, not better.

The harness is itself tuned for user satisfaction. Session length, thumbs-up, retention, willingness to ship. Every metric rewards confidence and penalizes hedging. The vendor’s system prompt tells the model to be specific, decisive, citation-rich. Specificity becomes confabulation when the underlying retrieval missed, and the harness was tuned to look fast and certain rather than to slow that down.

The harness adds retrieval and presents the result as context. The user assumes the model retrieved the cited authority and summarized faithfully. Sometimes it did. Sometimes the model went past what was retrieved, generated a citation that fit the proposition, and the harness rendered it formatted and clickable. The product surface looks the same in both cases, because the model’s training signal is to make it look the same.

The vendor is not lying to you. The vendor is shipping a model whose strongest instinct is to make you happy, inside a harness that rewards looking confident, on top of a retrieval layer that fails silently. The composition is sycophantic by construction.

It is not just litigation. It is your contracts.

The fake-case story is loud because judges write opinions when sanctions are issued. The contract-negotiation version happens silently. Same gradient. Different prompt.

The buyer’s counsel runs the markup through legal-AI. The redlines come back softer than they should be on the indemnity cap, because the model’s training distribution is full of pushback that got rewritten softer to close, and the user’s prompt did not flag that this deal warranted aggression. The redlines look clean, formatted, on-brand. The partner ships them. The cap is wrong. Nobody knows for 18 months.

The seller’s counsel asks whether a fallback position is “market.” The model answers “yes” more often than the data supports, because users prompting that question are usually looking for cover to concede. A hedge wrapped in confidence. The seller concedes. The concession was unnecessary.

No judge. No opinion. No sanction. Same mechanism that produced the Sullivan & Cromwell footnote. Sycophancy in litigation produces a fabricated authority that is somewhat self-policed by opposing counsel and the court. Sycophancy in deal work is an outright concession with nothing on the other side to catch it. The model will trend toward approval. In deal work, approval is soft.²

What to do

The cemented answer: nothing the model produces ships without independent verification. Citation checking on every brief. Clause review on every contract. If the model produces and ships without that step, your workflow is the Sullivan & Cromwell workflow. The size of the firm does not save you. Verification is the only lever that scales.

Beyond that, options.

Demand confidence calibration in writing from your vendor: how the model expresses uncertainty, what it does when retrieval misses, whether it surfaces the retrieval set. Keep adversarial evals running, baiting the model with leading questions and contested propositions to score whether it pushes back. Ask for the bad news first, prompting the model to argue against the user’s position before arguing for it. Audit deal redlines with senior eyes looking specifically for sycophantic concessions: softness where the deal warranted aggression, “market” labels where the cited deals don’t match, hedges that read clean and concede ground.

Get counsel before the next AI rollout

The vendor sold you a model that wants to please you, in a harness tuned to make pleasing you measurable, on top of a retrieval layer that fails silently. The Sullivan & Cromwell incident is a high-end public version of the failure mode every firm is producing in private every day, because the failure mode is not a malfunction. It is the model functioning. You will not catch it from inside the workflow. The workflow is the thing producing it.

Before the next legal-AI rollout, before the next vendor renewal, before the next agent goes live in front of clients, get a counsel conversation about what controls have to sit between the model’s output and the work product that ships under your name. The Bluebook does not save you. The vendor’s brand does not save you. The model is not on your side. It is on the side of the user feeling good about the answer, and that user is you.

A closing thought

An LLM is a machine for producing text that lands well with a reader.

Truth is not in the training signal. Approval is. On easy questions the two travel together. On the questions lawyers actually ask, they compete, and the model picks approval, every time.

It will trim a quote to land well. It will attach the wrong holding to the right case to land well. It will call a clause “market” to land well. It will soften your redline to land well. At the limit, it will fabricate the citation outright.

Sullivan & Cromwell wrote a three-page apology because the model gave them what they asked for, exactly the way it was built to.

In 2026, the shortest path to a sanctionable footnote, a bad contract, and a worse deal is an AI that agrees with you.

Footnotes

The training pipeline produces sycophancy by construction. Pretraining selects for human-approved text. Instruction tuning selects for what an instructor would mark correct. RLHF selects for what a contractor clicks. Each stage rewards approval. None rewards truth. — Matt ↩
The lawyer who asks Claude “is this clause unusual” usually wants confirmation. The model is built to detect that and confirm. The lawyer who wants to know whether the clause is actually unusual has to ask the question differently. Frame it adversarially, ask for the bad news first, demand the model argue the opposite. Most lawyers don’t. The output is what they wanted. — Sam ↩

About the authors

Sam Shaddox and Matt Souza, Founding Partners

Sam and Matt started in the corporate group at Perkins Coie, advising startups and companies like Microsoft, Amazon, and Zillow on over $15 billion in transactions. They each spent the last decade in-house, serving as attorneys and general counsels at VC-backed unicorns. They built Talairis Law Group to close the gap they kept seeing: founders paying too much, waiting too long, and getting legal work that wasn’t built for their stage.

Thank you! We’ll be in touch shortly.

More about Sam and Matt →

Why Your AI Is Trying to Make You Happy

What an LLM actually is

The model is not answering you. It is answering what it thinks you asked.

The model is competing to be used

Why this fails everywhere, not just in fake cases

Why this lands harder in legal tech

It is not just litigation. It is your contracts.

What to do

Get counsel before the next AI rollout

A closing thought

Keep reading

Why Your AI Wants You to Ignore Your Lawyer

Why the AI Model in Your Platform Is a Lie

Why Legal Is the New Bottleneck (in an AI world)

AI advice you can trust?