The delivery layer underneath AI: what it actually is, by example

Most enterprise GenAI programmes are not stalled by the model. They are stalled by the delivery layer underneath. Five organisational prerequisites, four engineering disciplines, and what they look like in public companies that got it right and ones that did not.

A practitioner deep-dive · Consulting Huber · 2026

Bernhard Huber

Interim Executive & Innovation Leader

The phrase, defined honestly

The picture in 2026 is consistent across the major surveys. McKinsey's 2025 State of AI puts 88% of large enterprises using AI in at least one business function. One in three has scaled it across the enterprise. Two in five can point to any EBIT impact at all. Roughly six per cent report meaningful EBIT impact — the cohort McKinsey calls AI high performers.

Most GenAI programmes are sitting on the wide step of that staircase, somewhere between pilot and production, and the standard explanation is that the model is not yet good enough. That is almost never the real cause. The same models that power the production wins also power the stalled pilots.

Across five primary surveys covering nearly ten thousand enterprise respondents — McKinsey, BCG, Deloitte, IBM and MIT NANDA — not one lists model quality as a top-ranked failure reason. Every top reason is structural. No clear ownership. No production-grade data. No target operating model. No governance that scales. No change capacity. Or, underneath any of those: no engineering-delivery discipline at all — no DORA metrics on the AI team, no pilot squad before scale, no engineering manager who owns the outcome, no cadence that connects exec, squad and risk in one room.

Google's 2025 DORA report, State of AI-Assisted Software Development, puts the same observation in one sentence:

"AI doesn't fix a team; it amplifies what's already there. Strong teams use AI to become even better and more efficient. Struggling teams will find that AI only highlights and intensifies their existing problems." — Google DORA, 2025 State of AI-Assisted Software Development

The delivery layer underneath AI is a working name for what that sentence describes. It is not data plumbing. It is not a target-operating-model slide. It is the daily and weekly cadence that turns operating-model intent into shipped value: five organisational prerequisites and four engineering disciplines that public reporting can verify, case by case.

This piece defines the layer, walks through it, and shows it in eight public cases: where the layer was missing, where it was visibly in place, and one where the company built it, scaled it further than the evidence supported, and then publicly walked it back.

The five organisational prerequisites

Before any engineering discipline matters, five organisational elements have to be present. Each one fails in a recognisable way. The names are the same five our companion piece on AI Value Creation identifies; the question this section answers is what each one looks like when it is missing.

1. A business owner with P&L accountability

Not a sponsor from IT. Not a steering-committee chair. Not the innovation lab. A named operator whose number on a quarterly P&L moves with the use case — the head of customer operations, the chief credit officer, the head of marketing. If the only person who shows up at the monthly review is from technology, the use case is a technology project, and a technology project rarely produces an EBIT line a CFO can name. McDonald's three-year IBM drive-thru pilot is the cleanest cautionary tale: no public success criteria, no post-mortem, no named owner who paid for the result. After it was switched off in July 2024, neither McDonald's nor IBM published metrics. A three-year programme produced no learnable outcome because no operator was on the hook for one.

2. Production-grade data where the use case actually runs

The hard part, almost always. The model can be excellent and the demo can be polished, but if the data that the production workflow needs lives in a system the AI team cannot reach, was last cleaned in 2019, or sits in a region the use case is not allowed to touch, the pilot ends at the pilot. The positive example is Bloomberg LP: BloombergGPT is a 50-billion-parameter model pre-trained on a 363-billion-token Bloomberg-proprietary financial corpus. The competitive advantage is not the parameter count. It is the corpus: decades of proprietary financial archive, in a structure the model can use. Most enterprises cannot match that on every use case. The ones that ship learn to spot, early, which use cases have a defensible data layer and which do not.

3. A target operating model the AI work actually fits inside

Product, data, platform, security and change have to coordinate on a delivery cadence. When they do not, the symptom is recognisable: model performance on a benchmark is fine; the production workflow is unbuildable because legal has not signed off on the data class, the platform team is on a different roadmap, and the change-management team learns about the deployment after the press release. The court-documented Air Canada chatbot case from February 2024 is exactly this pattern. The chatbot invented a bereavement-fare policy. The British Columbia Civil Resolution Tribunal held the airline liable in Moffatt v. Air Canada, rejecting the airline's argument — striking in retrospect — that the chatbot was "a separate legal entity responsible for its own actions". No operating model connected the model's knowledge base to live policy. Nobody owned that connection.

4. Governance that scales

From 2 August 2026, the high-risk obligations of the European Union's Regulation (EU) 2024/1689 apply to Annex III systems operated in the European market. Annex III is where most enterprise GenAI work is caught: employment screening, performance monitoring, credit and insurance risk decisions, biometric workflows, education assessment. Article 9 requires a continuous risk-management process across the system lifecycle. Article 12 requires automatic logging, and Article 26 requires deployers to retain those logs for at least six months. Article 13 requires the system to be interpretable enough that the deployer can use it appropriately. Article 14 requires designated human-oversight personnel who can recognise automation bias, interpret outputs, override decisions and halt the system. Article 26 also requires workers to be informed before workplace deployment and affected individuals to be notified when decisions about them are made using the system. Article 99 prices non-compliance at up to €35 million or 7% of total worldwide annual turnover, whichever is higher, for prohibited-AI violations. None of those obligations is met by a model that scored well on a benchmark. They are met by the operating cadence underneath: the cadence that keeps the oversight personnel staffed, the logging running, and the workforce-notification process repeatable. iTutorGroup's $365,000 EEOC settlement in August 2023, for an AI hiring tool that auto-rejected applicants above a fixed age threshold, is a preview of what the AI Act will price into the European market from 2026. The cost of skipping the governance layer is no longer a brand risk. It is a line item.

5. Change capacity

People who will use the tools, trust the outputs, and redesign their work around them. The failure mode is to remove the humans before the AI has proven the tail of the case distribution. NEDA, the National Eating Disorders Association, disbanded its human helpline and transitioned to the Tessa chatbot in May 2023. Within ten days Tessa was recommending calorie restriction and weight-loss targets to eating-disorder sufferers. The change layer — the trained counsellors who would have caught harmful outputs — had been eliminated before validation, not redesigned around the tool. The positive pattern looks different. Walmart's My Assistant rollout, launched in 2024 by Chief People Officer Donna Morris, scaled from 50,000 to 75,000 users across eleven countries on a "people-led, tech-powered" framing that kept human oversight in the operating model from day one.

The four delivery disciplines that make a layer

"Organisations that already organise for bounded agency in humans are well-suited to adopting AI effectively and humanely. Team Topologies offers Agentic AI clear boundaries, stable interfaces, aligned domains and collaborative ownership — the infrastructure for agency itself." — Matthew Skelton, Team Topologies as the Infrastructure for Agency with Humans and AI, QCon London, March 2026

The five prerequisites in the previous section answer the question "is the organisation ready?". The four disciplines in this section answer the question "is the delivery cadence real?". The cadence is the part that does not survive a slide. It is the part Bernhard's practice spends most of its time on.

Figure: The delivery layer underneath AI. Operating-model intent meets the real world only after passing through five organisational prerequisites and four engineering disciplines.

Operating-model intent: thesis · value levers · ambition

Five organisational prerequisites
1. Business owner with P&L accountability
2. Production-grade data where the use case runs
3. Target operating model · product · data · platform · security · change
4. Governance that scales · EU AI Act · model risk · audit
5. Change capacity · people who use, trust, redesign

The delivery layer: four engineering disciplines
1. DORA metrics on the AI delivery teams
2. One pilot squad before scale
3. Engineering managers mentored to run it
4. Exec / squad / risk in one room

Shipped AI value: EBIT impact a CFO can name

1. DORA metrics on the AI delivery teams, not industry benchmarks

The four DORA keys — deployment frequency, lead time for changes, change failure rate, time to restore service — were published in Accelerate (Forsgren, Humble and Kim, 2018) and have been the empirical backbone of software-delivery research ever since. The DORA programme's 2024 State of DevOps report introduced something the field did not have before: a measured AI-adoption variable. The finding was uncomfortable. A 25% increase in AI adoption on a team correlated with a 1.5% decrease in delivery throughput and a 7.2% decrease in delivery stability. The 2025 follow-up State of AI-Assisted Software Development found throughput had recovered, but stability remained negative. The single most quoted sentence from that report is the one already cited above: "AI doesn't fix a team; it amplifies what's already there." The implication is structural. The DORA numbers an AI team actually moves matter more than the model it uses. The other relevant number is from BCG's AI Radar 2025 survey of 1,803 C-level executives: 60% of companies fail to define and monitor any financial KPIs related to AI value creation. The DORA-on-AI-teams discipline begins where that 60% ends. It is the gate that catches silent decay before it shows up in business outcomes.
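The four keys named above can be computed from a team's own deployment log rather than read off an industry benchmark. What follows is a minimal sketch under stated assumptions, not DORA's reference implementation: the `Deployment` record shape, the field names, the use of medians and the sample values are all hypothetical, chosen only to show the arithmetic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment by the AI delivery team (hypothetical record shape)."""
    committed_at: datetime                   # first commit of the change
    deployed_at: datetime                    # change reached production
    failed: bool = False                     # deployment degraded service
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_keys(deployments: list[Deployment], window_days: int) -> dict[str, Optional[float]]:
    """Compute the four DORA keys over one reporting window."""
    if not deployments:
        raise ValueError("no deployments in the window")
    failures = [d for d in deployments if d.failed]
    restore_hours = [hours(d.restored_at - d.deployed_at)
                     for d in failures if d.restored_at is not None]
    return {
        "deployments_per_week": len(deployments) / (window_days / 7),
        "median_lead_time_hours": median(hours(d.deployed_at - d.committed_at)
                                         for d in deployments),
        "change_failure_rate": len(failures) / len(deployments),
        # None when nothing failed and was restored in the window
        "median_time_to_restore_hours": median(restore_hours) if restore_hours else None,
    }


# Invented sample: four deployments in a two-week window, one of which failed.
t0 = datetime(2026, 1, 5, 9, 0)
sample = [
    Deployment(t0, t0 + timedelta(hours=24)),
    Deployment(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=12)),
    Deployment(t0 + timedelta(days=7), t0 + timedelta(days=7, hours=48),
               failed=True, restored_at=t0 + timedelta(days=7, hours=50)),
    Deployment(t0 + timedelta(days=10), t0 + timedelta(days=10, hours=6)),
]
keys = dora_keys(sample, window_days=14)
print(keys)
```

Run weekly against the AI team's real deployment log, the same four numbers give the delivery review a trend line of its own, which is the point of measuring the team rather than quoting a benchmark.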

2. One pilot squad before scale

A single product area where leadership actually cares. Not the innovation lab. The Lean Startup pattern, the two-pizza-team rule, the original Lockheed Skunk Works — the lineage is long and the rule is the same. Goldman Sachs codified it as institutional infrastructure: the GS Innovation Center, established in 2022, is the sandbox every GS AI initiative passes through before bank-wide rollout. The GS AI assistant launched firmwide in January 2025 went through that path. The pattern is visible at JPMorgan too: an opt-in internal rollout of LLM Suite to CIB before client-facing exposure. ING's COO Marnix van Stiphout has been explicit about the same discipline: "strict governance that focused all exploration in AI on five areas, and only under the control of the COO". ING reports 90% of its pilots reach production, against an industry average closer to 30%. The number is not a function of better models. It is a function of fewer simultaneous bets.

3. Engineering managers mentored to run it after the consultants leave

A transformation that still depends on an outside firm in year three is owned by no one. The discipline is to land the work inside the client's own engineering-management line. Camille Fournier's The Manager's Path (O'Reilly, 2017) and Will Larson's An Elegant Puzzle (Stripe Press, 2019) are the canonical references for what that role looks like at scale. The pattern that fails is the platform team named on a slide and unfunded on the budget. The pattern that succeeds is named, accountable, and paid out of the client's own engineering line. Mercado Libre's GitHub Copilot rollout to its 9,000+ developer base ran through a named SVP of Technology (Sebastian Barrios), a two-month developer onboarding bootcamp, and GitHub Advanced Security wired into the CI pipeline. That is what "the engineering manager owns it" looks like in public reporting.

4. Exec, squad and risk in one room, on a real cadence

Weekly for delivery. Monthly for value. Quarterly for the value-creation plan itself. The anti-pattern Marty Cagan and Chris Jones named in Empowered (Wiley, 2020) is the "Puppet Master" — leaders who impose solutions while pretending to empower teams. The empowered model assigns problems, not solutions, and uses the governance cadence to enforce accountability without micromanagement. Klarna's two-phase customer-service AI story is what a working cadence looks like even when the underlying bet has to be partly reversed. Phase one, in February 2024, scaled an OpenAI-powered customer-service agent across 23 markets with measured outcomes (67% of chats handled without a human, resolution time from 11 minutes to under 2). Phase two, fifteen months later, walked the substitution back: CEO Sebastian Siemiatkowski told Bloomberg the firm was rehiring humans for complex, fraud and hardship cases because the original eval framework had over-weighted speed and cost against tail-case quality. That is the governance cadence working as designed — including the part where it changes course.

Worked examples — when the layer is missing

Four public cases, each anchored to primary-source reporting, each mapped to the prerequisite or discipline that was structurally absent. The pattern is consistent: in each case the model was fine. The layer underneath was not.

Air Canada chatbot · February 2024

The British Columbia Civil Resolution Tribunal heard Moffatt v. Air Canada in February 2024. Mr Moffatt had asked the airline's chatbot about bereavement fares after his grandmother's death. The chatbot told him he could apply retroactively for a discounted rate within ninety days. No such policy existed. Air Canada denied the refund and, in tribunal, argued that the chatbot was "a separate legal entity responsible for its own actions". The tribunal disagreed, found negligent misrepresentation, and awarded CAD 812.02 in damages, interest and tribunal fees. The case is small in dollar terms and large in structural terms. The prerequisite that was absent was the target operating model. Nobody owned the link between the chatbot's knowledge base and live fare policy. Product, legal and operations were not coordinating on a delivery cadence. The discipline that was absent was the governance cadence: there was no human-review path for novel policy queries and no audit trail that could have caught the hallucinated answer before it reached a grieving customer. The tribunal ruling makes the cost of a missing operating model legible in a way that no consulting deck does.

NYC MyCity chatbot · March 2024

The Markup's March 2024 investigation of New York City's official small-business chatbot — a Microsoft Azure deployment announced by Mayor Adams in October 2023 — found it advising landlords to refuse Section 8 housing vouchers, employers to pocket worker tips, and businesses to refuse cash. Each of those answers is, on its face, a violation of city law. The city declined to take the chatbot offline after the reporting; it stayed live for months. The prerequisite that was missing was governance that scales. No model-risk review against applicable NYC law. No human oversight on outputs before go-live. No audit trail. The Department of Small Business Services, the city's legal team and the technology owners were running in disconnected lanes. The case is a preview of what the EU AI Act will price into the European market from 2 August 2026 — with the difference that the New York taxpayer pays the bill regardless, and the European deployer will pay it under Article 99.

iTutorGroup AI hiring tool · EEOC settlement, August 2023

The US Equal Employment Opportunity Commission announced its first-ever AI workplace-discrimination settlement in August 2023. iTutorGroup's automated recruitment screener had auto-rejected over two hundred US-based applicants on the basis of age alone: women 55 and over, men 60 and over. The discrimination was discovered when a single applicant submitted two identical applications differing only in date of birth and received different outcomes. The settlement: $365,000, mandatory anti-discrimination training, five-year EEOC monitoring, and an obligation to re-invite every rejected applicant. The prerequisite that was missing was, again, governance that scales: no disparate-impact testing, no model-risk framework, no audit trail, none of the routine employment-law due diligence that a human screening process would carry. The discipline that was missing was the governance cadence: no exec, no engineering manager and no legal-risk owner ever met in one room about the system before deployment. The EEOC case is now the United States' template for AI-employment liability; from August 2026 the EU AI Act's Article 26 will require workers to be informed before workplace deployment and affected individuals to be notified when the system is used in decisions about them. Same failure, two regulators, two priced exposures.

Klarna customer-service agent, phase two · May 2025 rollback

In February 2024 Klarna's OpenAI-powered customer-service agent went live across 23 markets, handled 2.3 million conversations in its first month, and cut average resolution time from eleven minutes to under two. By May 2025, CEO Sebastian Siemiatkowski publicly walked the deployment back: humans were being rehired for complex, fraud and hardship cases. The substitution had, in his words, gone too far. Phase one looks like a textbook success. Phase two is the failure pattern. The prerequisite that was absent in phase one was change capacity. Klarna had eliminated the human workforce, the layer that would have caught tail-case degradation, before the AI had proven it could handle the full distribution of cases. The discipline that was absent was the engineering-delivery discipline of measuring the right thing. Klarna's acceptance criteria measured average resolution time. They did not measure customer-satisfaction outcomes on complex emotional or fraud-related tickets, which is exactly where the AI quietly decayed. The Klarna case is a paired example: phase one shows what the case looks like with three of the prerequisites and, at least in part, three of the disciplines visibly present, and the next section returns to it from that angle. Phase two shows what happens when the missing parts catch up.
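The gap between an average and the tail can be shown with a small acceptance gate that checks every ticket segment, not just the overall mean. This is a sketch under loud assumptions: the tickets, segment names, satisfaction scale and 3.5 threshold are invented for illustration and are not Klarna's data or Klarna's evaluation framework.

```python
from collections import defaultdict
from statistics import mean

# Invented illustrative tickets: (segment, resolution_minutes, csat on a 1-5 scale)
tickets = [
    ("refund", 1.5, 4.6), ("refund", 2.0, 4.4), ("refund", 1.0, 4.8),
    ("returns", 2.5, 4.5), ("returns", 1.5, 4.3),
    ("fraud", 3.0, 2.1), ("hardship", 4.0, 2.4),
]


def csat_by_segment(rows):
    """Mean satisfaction score per ticket segment."""
    buckets = defaultdict(list)
    for segment, _minutes, csat in rows:
        buckets[segment].append(csat)
    return {segment: mean(scores) for segment, scores in buckets.items()}


def failing_segments(rows, min_csat):
    """Segments whose mean satisfaction falls below the acceptance threshold."""
    return sorted(seg for seg, score in csat_by_segment(rows).items() if score < min_csat)


avg_minutes = mean(minutes for _seg, minutes, _csat in tickets)
avg_csat = mean(csat for _seg, _minutes, csat in tickets)
failing = failing_segments(tickets, min_csat=3.5)

print(f"average resolution: {avg_minutes:.1f} min, average csat: {avg_csat:.2f}")
print("segments below threshold:", failing)
```

In this invented sample the averages pass while the fraud and hardship segments fail. A gate that reads only the first line would ship; a gate that reads the second would hold back those two segments for human handling.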

Three further cases corroborate the patterns above without anchoring the section. McDonald's three-year IBM drive-thru voice AI, whose termination was announced in June 2024, illustrates the missing business owner: no named operator, no public success criteria, no metrics, no post-mortem. NEDA's Tessa chatbot, taken offline in June 2023 after recommending calorie restriction to eating-disorder sufferers, illustrates missing change capacity at its extreme. DPD's UK delivery chatbot, which in January 2024 swore at a customer and wrote derogatory poetry about its own employer after a routine system update, illustrates the missing engineering-delivery discipline: no staging gate, no regression test, no red-team between update and live traffic.

Worked examples — when the layer is visibly in place

Four public cases where the layer is visibly present in the public reporting. Named owners. Production data. Pilot squads before scale. Engineering managers who own the outcome. Governance cadence connecting exec, squad and risk. The numbers below are the ones the operators themselves report on the record.

ING Bank · the centrepiece

ING is, on the public evidence, the cleanest current example of what the layer looks like in production. Operating from the Netherlands with a global retail and wholesale footprint, ING has put its entire GenAI portfolio — the customer chatbot, KYC and customer due diligence, transaction monitoring, the developer copilot, the agentic mortgage pilot — through a centralised platform under personal COO ownership. Chief Operating Officer Marnix van Stiphout owns the programme. Chief Technology Officer Daniele Tonella, on the record with Computer Weekly, describes the operating principle in one sentence: "strict governance that focused all exploration in AI on five areas, and only under the control of the COO". The five-areas rule is the pilot-squad discipline made institutional — not five hundred experiments, five. The headline outcome number is the one that should make every CFO and PE operator stop scrolling: 90% of ING's pilots reach production, against an industry average closer to 30%. The supporting numbers are consistent. 75% of customer queries handled autonomously across the retail chatbot footprint. KYC compressed from days or weeks to seconds. 10,000 daily transaction-monitoring alerts filtered down to ~500 relevant ones for compliance analysts. Five thousand employees trained on data fluency and GenAI. 140 distinct AI risks vetted under an EU AI Act compliance framework that already maps to the 2 August 2026 deadline. ING is the closest public case to the diagram in the previous section. Every box is staffed.

BBVA · phased rollout to 120,000 employees

BBVA's GenAI rollout is the textbook phased pilot-to-scale pattern. The Spanish bank started in May 2024 with 3,300 ChatGPT Enterprise licences. By late 2024 the rollout was at 11,000. In December 2025 the bank announced expansion to its full ~120,000-employee workforce, with Bloomberg corroborating independently. The programme is owned by Elena Alfaro as Global Head of AI Adoption, with Ricardo Martín Manjón as Global Head of Data and chairman Carlos Torres Vila signing the OpenAI strategic alliance. The outcome metrics carry a specificity boards rarely see: 83% of licence holders engaged weekly per BBVA's own AI Adoption tracking, 2.8 to 3 hours saved per employee per week, more than 4,800 custom GPTs built by employees with around 700 curated in an internal GPT Store. The most precise number comes from one function: BBVA's Legal Services GPT automated more than 9,000 bastanteo queries annually and delivered 26% of the Legal Services division's annual savings KPI. That kind of KPI-attribution precision rarely makes a press release. When it does, it is because the operating model produced it, not the model.

JPMorgan Chase LLM Suite

JPMorgan Chase has put its GenAI work on the record in Jamie Dimon's 2025 Annual Report letter and a long McKinsey interview with Chief Analytics Officer Derek Waldron. LLM Suite, the bank's model-agnostic internal GenAI platform, runs more than 450 use cases in production against an annual technology budget of around $18 billion. The platform reached more than 65,000 active CIB users and ~200,000 employees firmwide in eight months: opt-in rollout, employee-facing before any client-facing exposure, governed under a three-pillar architecture (OmniAI ML factory plus LLM Suite plus fundamental research). The disciplines visible in the public reporting are the named CAO owning the programme, the de-risk-before-scale rollout philosophy, and the staged employee adoption that surfaced its own use cases. The headline productivity figure operators quote on the record: investment-banker pitch decks built in roughly thirty seconds that previously took hours, with three to six hours saved per CIB user per week.

Goldman Sachs · GS AI Platform and the Innovation Center

Goldman's discipline is institutional. Chief Information Officer Marco Argenti is publicly the named owner. The GS Innovation Center, established in 2022, is the pilot squad before scale — every GS AI initiative passes through it before bank-wide rollout. GitHub Copilot was rolled out to all 12,000 of the firm's developers; the public productivity number from Argenti, quoted by American Banker and Fortune, is roughly 20% — the equivalent of adding 2,400 developers to the existing headcount. In January 2025 GS AI, a model-agnostic assistant covering GPT, Gemini and Claude, was extended to the full 46,000-person workforce. By mid-2025 the bank reported about a million prompts per month firmwide. The governance controls are documented publicly: automated monitoring, hallucination reduction, information-protection guardrails, prompt-content flagging, AI benchmarked against human performance rather than absolute accuracy. The point is not that those controls are exotic. The point is that they are public, named and standing.

Three further cases corroborate the patterns above. The peer-reviewed GitHub Copilot productivity study (Communications of the ACM, March 2024, n=95 professional developers) remains the field's gold-standard evidence: Copilot users completed an identical task 55.8% faster than the control group, with 78% versus 70% task completion. Mercado Libre's Latin American GitHub Copilot rollout to its 9,000+ developer base illustrates the engineering-manager-ownership discipline: a named SVP (Sebastian Barrios), a two-month bootcamp, GitHub Advanced Security wired into the CI pipeline, and ~100,000 pull requests per day as a deployment-frequency DORA proxy. BloombergGPT, a 50-billion-parameter model pre-trained on a 363-billion-token Bloomberg proprietary financial corpus, is the cleanest example of the production-grade-data prerequisite turned into competitive moat. Decades of structured financial archive are the data layer. The model is the layer that sits on top of it.

Klarna belongs back in this section too. Phase one of its customer-service AI (the period from February 2024 through early 2025) visibly carried three of the five prerequisites: a named CEO operator on the P&L, production data in a real workflow, and an operating model that integrated AI into transactional refunds and returns. It also carried a governance cadence that eventually detected and acted on the quality decay. The two missing elements were the engineering-delivery discipline that would have caught the tail-case degradation earlier (the wrong acceptance metric was being watched), and the change-capacity buffer that should have been preserved during scale. Phase two is what an honest governance cadence looks like when those gaps surface. The course-correction was public, fast and on the record. That is closer to success than failure even when the headline outcome reverses, because the operating model produced the change.

The diagnostic: five conditions, four disciplines

The grid below maps the eight worked cases against the nine structural elements. Read across each row: a filled dot (●) means the element was visibly present in public reporting; an empty circle (○) means it was visibly absent or the case turned on its absence; a tilde (~) means partial. The legend is below the grid.

Columns Owner to Change are the five organisational prerequisites; columns DORA to Cadence are the four delivery disciplines.

Case	Context	Owner	Data	Op model	Gov	Change	DORA	Pilot	EM	Cadence
Air Canada chatbot	Feb 2024 · BCCRT 149	~	~	○	○	~	○	~	○	○
NYC MyCity	Mar 2024 · The Markup	○	~	○	○	○	○	○	○	○
iTutorGroup AI hiring	Aug 2023 · EEOC settlement	○	~	~	○	~	○	○	○	○
Klarna AI · phase 2 rollback	May 2025 reversal	●	●	●	~	○	○	~	~	●
ING Bank	2024–2026 · COO-owned	●	●	●	●	●	●	●	●	●
BBVA	2024–2025 · phased to 120k	●	●	●	●	●	~	●	●	●
JPMorgan LLM Suite	2024–2026 · CAO-owned	●	●	●	~	●	~	●	●	●
Goldman Sachs GS AI	2022– · Innovation Center	●	●	●	●	●	~	●	●	●

Legend: ● visibly present in public reporting · ~ partial or ambiguous in reporting · ○ absent, or the case turned on its absence

Three things to notice. First, in the failure cases the absences cluster on the right side of the grid, not the left. In every failure case at least one of the four delivery disciplines is absent, usually the governance cadence, the engineering-manager owner, or both. The model was not the problem. The cadence was. Second, the success cases are not perfect. JPMorgan, BBVA and Goldman all carry a partial mark on DORA-on-AI-teams: the public reporting does not yet show formal DORA dashboards for their AI work, even though the rest of the layer is visibly in place. That is the field's current frontier; it is not a failure of those programmes, it is a gap in the practice. Third, ING is the only case in the grid with all nine boxes filled. Other programmes probably belong alongside it. ING is the one whose Chief Operating Officer has put the operating principle on the record (five authorised areas, only under the control of the COO) and whose 90% pilot-to-production rate is publicly verifiable.

How to run the grid on your own programme. Take the most strategically important GenAI use case in flight today. For each of the nine columns, answer one question: is there a named human, on the record, who owns this column for this use case?

Owner: the operator whose number on the P&L will move.

Data: the engineer accountable for the freshness, classification and access of the data the model needs in production.

Op model: the person who can call product, data, platform, security and change into one room next week.

Governance: the legal-and-risk owner who has read Articles 9, 12, 13, 14 and 26 of Regulation (EU) 2024/1689 and signed off against them.

Change: the change-management owner who has not been disbanded.

DORA: the engineer who can quote your AI team's lead time and change failure rate this morning.

Pilot: the squad that owns the production target before anyone else gets the system.

EM: the engineering manager who will still be running this in two years when the consultants are gone.

Cadence: the weekly delivery review, monthly value review and quarterly value-creation review, with attendance.

The pass mark, in our experience, is roughly four-of-nine to start a serious programme and seven-of-nine to expect production value at scale. Below four-of-nine the work is not a delivery problem yet; it is a leadership problem — and a delivery layer is not the first thing the organisation needs.
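The nine-column check and the two pass marks can be written down as a short scoring routine. This is a minimal sketch, assuming a yes/no answer per column; the function name, verdict wording and example answers are hypothetical, and the thresholds are the rough four-of-nine and seven-of-nine marks stated above, not a calibrated model.

```python
COLUMNS = ["Owner", "Data", "Op model", "Gov", "Change", "DORA", "Pilot", "EM", "Cadence"]

START_THRESHOLD = 4   # "roughly four-of-nine" to start a serious programme
SCALE_THRESHOLD = 7   # "seven-of-nine" to expect production value at scale


def run_grid(named_owner: dict[str, bool]) -> tuple[int, str]:
    """Score one use case: True means a named human owns that column."""
    unanswered = [c for c in COLUMNS if c not in named_owner]
    if unanswered:
        raise ValueError(f"unanswered columns: {unanswered}")
    score = sum(named_owner[c] for c in COLUMNS)
    if score >= SCALE_THRESHOLD:
        verdict = "expect production value at scale"
    elif score >= START_THRESHOLD:
        verdict = "enough to start a serious programme"
    else:
        verdict = "leadership problem before delivery problem"
    return score, verdict


# Hypothetical answers for one use case in flight.
answers = {
    "Owner": True, "Data": True, "Op model": False, "Gov": False, "Change": True,
    "DORA": False, "Pilot": True, "EM": False, "Cadence": True,
}
score, verdict = run_grid(answers)
print(f"{score}/9: {verdict}")
```

The routine refuses to score a grid with unanswered columns on purpose: a column nobody can answer is itself a finding, and it should not be silently counted as a no.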

How to read this if you're the buyer

If you are a CEO, board member, transformation officer or private-equity operator, the GenAI debate has split into four distinct buyer situations. The framing below cuts through the noise faster than any scoring matrix.

Situation 1 — the board asks "which model should we bet on?". It is, almost always, the wrong question. The right one is which use case has the layer underneath. A board that spends a meeting on Claude versus GPT versus Gemini is spending an hour on the part that is least decisive. The same hour on the four delivery disciplines — who owns the DORA numbers, where the pilot squad sits, which engineering manager runs it after the consultants leave, when exec, squad and risk meet in one room — will move the AI investment more than any model selection ever did. The answer to the model question is "it changes every quarter, and it does not matter; pick the one that fits the data layer you already have".

Situation 2 — CEO with stalled pilots. The grid above is the diagnostic order. Start with the rightmost four columns (the disciplines), not the leftmost five (the prerequisites). If the disciplines are absent, no amount of fixing the prerequisites will produce production value — the prerequisites pile up as readiness statements and the work does not ship. If the disciplines are in place but the prerequisites are not, the work ships into a vacuum — production deployment with no operator who owns the P&L, no governance that survives an audit, no change layer that uses the output. Three months of disciplined cadence on one use case beats twelve months of pilots on six.

Situation 3 — private equity due diligence. The commercial and tech assessment now needs an AI-delivery section. Three questions cut through the management deck. First, name the production GenAI workload that moved a line on the most recent quarterly P&L; if it does not exist, the AI claim is theatre. Second, name the operator who owns it; if the answer is a CIO or a Chief Innovation Officer rather than a P&L head, the workload is technology, not value. Third, ask to see the AI team's DORA dashboard for the last six months — if the dashboard does not exist, the disciplined delivery layer is not in place and the value-creation thesis on AI in the hold period should be discounted. A clean answer on the three questions adds turns of EBITDA visibility; an unclean answer should reprice the asset.

Situation 4 — post-merger integration. Two AI portfolios, almost always, with overlapping use cases and disconnected operating models. The temptation is to merge tooling. The right move is to merge the layer. One business owner per consolidated use case. One operating-model owner per integrated function. One governance cadence across the new perimeter, mapped to Article 26 of the EU AI Act if either entity operates in the European market. The model and tooling questions can wait six months; the layer cannot.

Three questions cut through a vendor pitch faster than any RFP scorecard. "Show me the named operator who will own this use case on Monday." "Show me the engineering manager on the client side who will still be running this in two years." "Show me the AI-team DORA numbers from your last engagement." If a firm cannot answer all three, what you are buying is enablement, not delivery.
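For readers who have not seen the two DORA numbers this piece keeps asking for, the arithmetic behind them is short. The sketch below assumes a simple deployment log with a commit time, a production deploy time and a failure flag. The `Deployment` record, the `dora_summary` function and the sample dates are hypothetical. Using the median for lead time is a choice made here, and a real dashboard would pull these fields from the team's own delivery tooling.

```python
from datetime import datetime
from statistics import median
from typing import NamedTuple


class Deployment(NamedTuple):
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when it was running in production
    failed: bool            # True if it needed a rollback, hotfix or patch


def dora_summary(deployments: list[Deployment]) -> dict[str, float]:
    """Return median lead time (hours) and change failure rate (0 to 1)."""
    if not deployments:
        raise ValueError("no deployments in the reporting window")
    lead_times_h = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600
        for d in deployments
    ]
    failures = sum(d.failed for d in deployments)
    return {
        "median_lead_time_hours": median(lead_times_h),
        "change_failure_rate": failures / len(deployments),
    }


# Invented sample window of four production deployments.
log = [
    Deployment(datetime(2026, 1, 5, 9), datetime(2026, 1, 5, 15), False),
    Deployment(datetime(2026, 1, 6, 10), datetime(2026, 1, 7, 10), True),
    Deployment(datetime(2026, 1, 8, 8), datetime(2026, 1, 8, 20), False),
    Deployment(datetime(2026, 1, 9, 9), datetime(2026, 1, 9, 11), False),
]

summary = dora_summary(log)
print(summary)
```

On this invented log the lead times are 6, 24, 12 and 2 hours, so the median is 9 hours, and one failed deployment in four gives a change failure rate of 0.25. A team that cannot produce the underlying log is, in practice, a team without the dashboard.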

Where Consulting Huber fits

Consulting Huber is a practitioner firm. We do not compete on the SAFe-certified bench size of a Big Four, the global delivery footprint of an MBB, or the volume of named flagship cases that comes with a thousand-consultant payroll. We compete on the opposite problem: CEOs, boards, transformation officers and PE operators who want the delivery layer of a large firm, delivered directly by senior practitioners, with the capability transferred to the client's own engineering management by the end of the engagement.

In practice that means: a named business owner identified for each use case before code is written; DORA metrics installed on the AI delivery team in the first six weeks; the pilot squad placed in the product area where leadership actually cares; the engineering manager who will run it after we leave named on day one and mentored across the engagement; a governance cadence that puts exec, squad and risk in the same room every week, every month and every quarter. The engagement model is not platform lock-in. It is the opposite. We work to be unnecessary by the end of the engagement, and we let the client keep the right to fire us at the end of any cycle. The full shape of that work (engineering discipline, team design, delivery metrics) sits in our agile engineering and delivery practice.

If you are an operator looking at one of the four buyer situations above and want a direct conversation about how the layer would land in your specific case, the calendar link below is the fastest way to start.

Sources consulted

The delivery-layer foundations

Forsgren, Humble & Kim, Accelerate: The Science of Lean Software and DevOps (IT Revolution, 2018; 2nd ed. 2025), ISBN 978-1-942788-33-1, itrevolution.com/product/accelerate · Google DORA, Accelerate State of DevOps 2024 · Google DORA, 2025 State of AI-Assisted Software Development · Skelton & Pais, Team Topologies (IT Revolution, 2019; 2nd ed. 2025), ISBN 978-1-942788-81-2, itrevolution.com/product/team-topologies · Matthew Skelton, Team Topologies as the Infrastructure for Agency with Humans and AI, QCon London, March 2026 · Fournier, The Manager's Path (O'Reilly, 2017), ISBN 978-1-491973-89-9 · Larson, An Elegant Puzzle (Stripe Press, 2019), ISBN 978-1-732265-18-9 · Cagan & Jones, Empowered (Wiley, 2020), ISBN 978-1-119691-29-7 · CNCF, Platforms White Paper and Platform Engineering Maturity Model · Humanitec, State of Platform Engineering Vol. 3 (2024).

Pilot-to-P&L and the failure-rate evidence base

McKinsey QuantumBlack, State of AI: How Organizations Are Rewiring to Capture Value (March 2025, n=1,993) · BCG, AI Radar 2025: Closing the AI Impact Gap (n=1,803) · BCG, The Widening AI Value Gap (September 2025) · IBM Institute for Business Value, 2025 CEO Study (May 2025, n=2,000) · Deloitte, State of Generative AI in the Enterprise Q4 2024 (n=2,773) · MIT NANDA, The GenAI Divide: State of AI in Business 2025 (July 2025, used as directional bracket alongside McKinsey) · RAND Corporation, Why AI Projects Fail and How They Can Succeed (August 2024).

Regulation

European Parliament and Council, Regulation (EU) 2024/1689 (the AI Act) · artificialintelligenceact.eu article-level explorer · Articles 5, 6, 9, 10, 12, 13, 14, 26, 50, 51, 53, 55, 99, 113 and Annex III cited in §2.

Failure cases (primary sources)

Moffatt v. Air Canada, 2024 BCCRT 149 — CanLII full ruling · EEOC v. iTutorGroup, Inc., 1:22-cv-02565 (E.D.N.Y.) — EEOC press release and case record · The Markup, "NYC's AI Chatbot Tells Businesses to Break the Law" (March 2024) · Klarna press release, "AI assistant handles two-thirds of customer service chats" (Feb 2024) · Fortune, "Klarna AI humans return on investment" (May 2025) · Restaurant Dive, "McDonald's ends IBM drive-thru voice order test" (June 2024) · NPR, NEDA Tessa coverage (June 2023) · Fox Business, DPD chatbot coverage (January 2024).

Success cases (primary sources)

ING: Computer Weekly, "How ING reaps benefits of centralising AI"; McKinsey, interview with COO Marnix van Stiphout. BBVA: BBVA-OpenAI strategic alliance announcement (Dec 2025); Bloomberg coverage. JPMorgan: 2025 Annual Report; McKinsey interview with CAO Derek Waldron. Goldman Sachs: CNBC firmwide launch (January 2025); Fortune interview with CIO Marco Argenti. GitHub Copilot: Peng et al., arXiv:2302.06590 (February 2023); Communications of the ACM (March 2024); Mercado Libre customer story. BloombergGPT: Wu et al., arXiv:2303.17564 (March 2023); Bloomberg press release.

Book a 30-min call, or send a brief on your situation