AI didn’t just make our engineers faster. It made them different
Hryhorii Tatsyi · 17 min read
A twelve-month field study of AI at a European bank
Internal study, ~900-engineer IT organization, May 2025 — April 2026. Hryhorii Tatsyi, CTO, Raiffeisen Bank Ukraine.
In December I wrote about reaching 62% adoption. This piece looks at what’s behind that number — and what changed in a way no adoption percentage can capture.
The headline I’d lead with on a slide
Over twelve months our IT team shrank by 75 people — and 64 of them were engineers, the people who push code daily. On a smaller engineer base we shipped more code, fewer incidents, better security, and at least seven products that didn’t exist before.
None of that should happen on paper.
That’s the finding, not the methodology. The rest of this piece is the answer to the question I had to answer for myself first: what does this actually mean about how the organization works now?
If you read nothing else, take this: stop measuring AI as percentage gains in metrics you already had. The sharpest signal is what your engineers built that didn’t exist before.
Why one number isn’t enough
Our headline metric is real. The Copilot Enterprise license count roughly doubled. Engagement rose from 62% to 83%. 68% of engineers now get at least half of their code with AI assistance. These are correct numbers. We achieved them. They look right on a slide.
The question is what they mean for the work of the organization.
“83% of engineers use AI” and “AI changed how our IT works” are different statements. The first is necessary but incomplete. The second is what costs millions of euros over five years.
I tried single-number answers first. “AI users ship 16 PRs a month, non-AI ship 6.” “Cycle time dropped 20%.” They look good. None survives a basic question about selection bias, baselines, or parallel changes. At enterprise scale, one number can’t honestly answer “what changed in how the organization works.”
So I spent three weeks with the raw data — every Jira changelog, every GitHub PR and review, every Copilot seat record, every incident — and looked for findings that survive scrutiny. The answer comes in three layers, plus one half of the picture that matters more than all the metrics combined.
The cohort: an IT team in motion
Twelve months of AI rollout, and four things happened simultaneously to the IT team.
The team got smaller — a net 75 people, around 8% — and a quarter of the headcount cycled through. But the contraction wasn’t uniform. It was concentrated almost entirely on engineers.
Without AI, new hires couldn’t have replaced leavers fast enough, and throughput would have dropped. More importantly, we couldn’t have absorbed a ~9% reduction in the engineer base without losing productivity.
I measured what “new hires get productive faster” actually means. Time to first PR for 37 new coding engineers vs. 157 hired in 2023–2024 told a clean story.
AI didn’t make fast engineers faster. The fast ones were already fast. AI lifted the people who used to take a long time to get warm — up to baseline.
Before 2024 we had engineers taking 60–90 days to their first PR — normal for a bank with an unfamiliar stack, internal SDKs, and compliance gates. Among new hires those cases have nearly disappeared.
In December I cited Anthropic research: time to tenth PR for new engineers using AI fell from 86 days to 39. Ours: 82 to 40. Almost identical, measured independently. This is not unique to one bank — it’s a repeating industry pattern.
These are new hires. Senior engineers transformed in a different way (more on that below). Together the two effects more than absorbed the contraction. This isn’t “AI grew our headcount.” It’s “AI made a smaller, more productive team possible.” For a bank in 2026 — technology demand rising, the engineering labor market tightening — that may matter more than direct feature acceleration.
Findings I — Output, incidents, security, debt
One thing before the numbers. None of these changes was made by AI. The team did the work — engineers, CloudOps, product leads, security, the postmortem culture built over years. What AI did was make specific actions cheap that used to be too expensive: close an old CVE, add missing tests, update an architecture diagram, follow a postmortem action through, parse a production log. When those actions got cheap, the team chose to do them. Read every number below as “the team did X because AI made Y cheaper” — not “AI did X.”
The cleanest cohort comparison: AI users vs. their non-AI peers
The corporate proxy gives me a clean classification. Any traffic to claude.ai / api.anthropic.com (including personal subscriptions) or an active Copilot seat → AI cohort. Non-AI peers had zero AI traffic for the entire period. Same projects, same backlog, same definition of done.
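To show how mechanical that split is, here is a minimal sketch, assuming flattened CSV exports of the proxy log and the Copilot seat list. The file layout and column names (user, host, seat_status) are hypothetical, not our real schema; the real classification runs on the corporate proxy and the GitHub Enterprise seat records.

```python
import csv

AI_HOSTS = {"claude.ai", "api.anthropic.com"}

def ai_cohort(proxy_log: str, copilot_seats: str) -> set[str]:
    """Engineers with any AI proxy traffic or an active Copilot seat."""
    cohort: set[str] = set()
    with open(proxy_log, newline="") as f:
        for row in csv.DictReader(f):        # assumed columns: user, host
            if row["host"] in AI_HOSTS:
                cohort.add(row["user"])
    with open(copilot_seats, newline="") as f:
        for row in csv.DictReader(f):        # assumed columns: user, seat_status
            if row["seat_status"] == "active":
                cohort.add(row["user"])
    return cohort

# Non-AI peers = the full engineer roster minus this set, re-checked for
# zero proxy/seat traces across the entire period.
```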
Twenty-one percentage points isn’t a productivity bonus. It reflects the kind of work each cohort took on. The next finding shows what shifted.
Incidents fell. Resolution times shrank.
In a bank, incidents aren’t a secondary metric. A blocker-class production incident costs more than a month of team capacity on new features. And while feature velocity rose, stability and security improved at the same time.
This is team work — postmortem culture, SRE practice, engineers keeping the landscape running every day. AI made cheap the actions that had been accumulating in the backlog for decades.
- Postmortem culture predates AI. What changed: PIR action items (“add monitoring,” “cover this case in tests,” “rewrite that old service”) used to stall in the backlog. They’re now closed in batches. That’s where the −70% on blockers comes from.
- The freed capacity went into stability, not just features. Feature velocity didn’t slip, the additional capacity reinforced the existing platform.
- Engineers expanded coverage with monitoring, tests, logs. Reading old code with AI is an order of magnitude easier; adding an alert or trace is an evening’s work, not a week’s.
- SRE diagnoses critical incidents faster. One tool: a Kubernetes AI Support agent in DevPortal (more on that below). It walks through pods, deployments, logs, ingresses, and events in about 30 seconds and gives a first working hypothesis. The difference between “you dig for hours” and “you have direction in two minutes.” That’s where the −68% on critical resolution time lives.
The −82% on new secrets in code comes from the Shift-left Security Plugin: it catches them in the IDE before commit. The +155% on high-severity closures — Claude reads the changelog, runs the tests, suggests adaptive fixes. Long-standing piles of Dependabot alerts get drained in batches.
AI helps repay debt we’d been carrying for years
Why put so much weight on incidents, security, and architecture diagrams kept in sync with code, instead of on a heroic feature-acceleration number?
Because of where a ticket actually spends its life.
This reframes the question. The backlog isn’t inefficiency. It’s a signal that demand exceeds team capacity. Active coding is a small slice of a ticket’s calendar life (under a tenth), so speeding it up by 50% gives only 4.5% on a single ticket’s cycle time; the quick arithmetic is after the list below. But it lifts throughput by roughly the same amount, and throughput is what eventually drains the queue. AI works on two levels:
- Cycle time per ticket — accelerates every execution phase, not only coding: code review, QA, discovery, dependency updates. Together about 35% of ticket life. AI helps in each.
- Throughput and decision-making — closing a ticket as no-longer-relevant becomes cheaper, moving one through the pipeline becomes faster. That’s where the −48% on cancelled resolutions comes from: less noise, more real decisions.
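The arithmetic behind that 4.5%, spelled out. The ~9% share of active coding in a ticket’s calendar life isn’t stated above; it is back-derived from the 50% → 4.5% claim, under the simple reading that a 50% speedup halves coding time.

```python
coding_share = 0.09   # active coding as a fraction of a ticket's calendar life
speedup = 0.50        # "speeding up active coding by 50%" = coding time halves

single_ticket_saving = coding_share * speedup
print(f"{single_ticket_saving:.1%}")  # -> 4.5% of one ticket's cycle time
```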
The next step is therefore clear: adapt the entire product development lifecycle for AI — not only coding, but discovery, refinement, review, QA, deployment, postmortems. The question is no longer “will it accelerate?” but “how do we redesign each phase so AI is the load-bearing structure, not an add-on?”
The strategic bet
In one to two years, the industry will mature to a formal AI SDLC — a new wave of transformations where discovery, code review, QA, deployment, ownership models, and security processes all need to be rewritten. That will be expensive. I’m deliberately not rushing into it now. First, while the AI capacity dividend lasts, the old debt has to come off — otherwise we walk into the AI SDLC era carrying the same accumulation that drowned European banks in the 2010s during cloud migration.
The bet: repay 10–15 years of debt in one year, without losing delivery speed, and arrive at the next, harder transformation in a position that doesn’t require an emergency restructuring.
No dashboard captures this; on a strategic horizon, it will be the main result.
What ties it together
Reduced to one line: AI expanded our production possibility frontier, and we deliberately allocated the freed capacity — partly into feature velocity, partly into stability, partly into security and debt repayment.
This is not a trade-off magically erased. It’s capacity we allocated deliberately. We could have funneled all the freed capacity into features and shown an even higher percentage. That’s a short game; as a bank, we don’t play it.
Findings II — The AI stack shapes the form of work
Common assumption: “Copilot makes an engineer N% faster.” That’s half-true — and the half that misleads. The shift isn’t that they get faster at the same work; it’s that they take on a different kind of work. The kind of work, not the speed, is what determines AI’s strategic value.
I compared engineers within the same role across five dimensions — PR count, story points, reviews, repository breadth, cycle time — and split them into cohorts. Three archetypes show up cleanly.
Copilot-only writes code in the same radius, just faster. PR count up 10–25%, story points roughly flat vs. no-AI, repository breadth unchanged. Accelerates motion, doesn’t change direction. This is the baseline Copilot Enterprise effect.
Multi-tool closes much larger units of work — cross-repo, cross-domain — that previously required cross-team coordination or didn’t fit a sprint at all. Story points 1.5–3× higher than no-AI, repository breadth +50–80%, longer cycle time. An engineer who can now take work that used to require a team.
Claude over the corporate stack — the DevOps case. PR count similar to Copilot-only, but story points multiply and repository breadth expands radically. Doesn’t ship more — ships at a different scale, where cycle time is longer and value-per-PR is multiplicatively higher.
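For concreteness, the bucketing logic reduced to a sketch. The thresholds below are illustrative stand-ins for the real per-role comparison across all five dimensions; don’t read them as our actual cut-offs.

```python
from dataclasses import dataclass

@dataclass
class Delta:                # post/pre ratios for one engineer, same role
    pr_count: float
    story_points: float
    repo_breadth: float

def archetype(d: Delta) -> str:
    # Hypothetical cut-offs, for illustration only.
    if d.repo_breadth >= 2.0 and d.pr_count < 1.3:
        return "claude-over-corporate-stack"  # similar PRs, multiplied scope
    if d.story_points >= 1.5 and d.repo_breadth >= 1.5:
        return "multi-tool"                   # larger cross-repo units of work
    if d.pr_count >= 1.1 and abs(d.story_points - 1.0) <= 0.25:
        return "copilot-only"                 # same radius, just faster
    return "no clear shift"
```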
Findings III — Senior engineers who transformed
AI doesn’t lift one number. It lifts different numbers for different engineers.
Lead engineer on the IBM stack, nine years at the bank. PR count grew roughly 4×, the breadth of the codebase he contributed to also grew about 4×, reviews doubled. Same engineer now reviews and ships at four times the radius. The senior with one of the longest tenures in my dataset showed the largest movement — which kills the “AI is for juniors” narrative outright.
Data engineer, five years on the team. PR count fell 60%, volume of code grew 4.5×. Stopped patching schemas, started rewriting pipelines. By PR count, regression. By scope and type of work, one of the biggest gains in the data.
Architect with the longest tenure in the dataset. Less than a handful of PRs in the six months before AI → dozens in the six months after. A senior who had drifted away from active development for years returned to the codebase. This pattern repeats across about twenty engineers with 5, 10, 20+ years of tenure.
PR count as the only metric would have flagged the data engineer as a regression and missed the architect’s reactivation entirely. Reading “did productivity go up?” through a single metric is the most expensive measurement mistake on the list.
In honesty: these are three vivid stories from the dataset. There are also seniors who didn’t move — AI doesn’t act uniformly on everyone. I’m not claiming every senior transforms. I’m claiming that for those who did, the form of work changed radically — and no single-number metric would have shown it.
Findings IV — Seven things that didn’t exist a year ago
The strategically most interesting half of AI’s impact: it allows things to be built that otherwise wouldn’t be. Seven products at our bank illustrate it.
Below: what each of the seven actually does, and why it wouldn’t have shipped a year ago.
Service Knowledge Hub — a knowledge base that documents itself
A developer in one of our product departments was reorganizing a platform of 57 BFF microservices. Questions like “what does this service depend on, which Vault secrets does it need, how critical is it?” used to take an hour bouncing between Confluence, Kubernetes YAML, and MS Teams.
He built an internal tool that parses Kubernetes manifests, federates 3,105 objects from a Jira Service Catalog, and produces wiki documentation per BFF. Live preview, Excel/CSV export, and an SQL Knowledge Base REPL with natural-language queries:
> show all mission-critical Cards services!
→ SELECT name, tech_lead FROM sc_full
  WHERE domain='Cards' AND criticality='Mission Critical';

By April 2026: 83 tagged releases per month. In a “shippers” metric this looks flat (1 PR pre, 7 PR post). In reality: a 3,000-object knowledge platform managing 57 production microservices. One engineer became a one-person platform team on a few-tens-of-dollars-a-month subscription.
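The manifest-parsing step, reduced to a sketch. It assumes standard Deployment YAML and a secretKeyRef convention for Vault-backed secrets; the real tool also federates the Jira Service Catalog objects, which this omits.

```python
import yaml  # pip install pyyaml

def service_facts(manifest_path: str) -> dict:
    """Answer "what does this service depend on?" from one manifest."""
    with open(manifest_path) as f:
        doc = yaml.safe_load(f)
    containers = doc["spec"]["template"]["spec"]["containers"]
    return {
        "name": doc["metadata"]["name"],
        "images": [c["image"] for c in containers],
        # Assumes secrets are wired via secretKeyRef; conventions vary.
        "secrets": sorted({
            env["valueFrom"]["secretKeyRef"]["name"]
            for c in containers
            for env in c.get("env", [])
            if "secretKeyRef" in env.get("valueFrom", {})
        }),
    }
```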
Mobile Android — a workflow redesigned around AI
The Mobile Android tribe formalized an AI-driven flow. An AI label on a Jira ticket triggers a CI Plan job that produces solution-design.md, technical-plan.md, code-map.md, ticket-summary.md. The developer reviews, marks "Ready for review," and CI launches an Implementation job — specialized sub-agents generate code. Optional QA testing and Request changes labels move the ticket forward or send AI back to address PR comments.
This is process redesign, not individual habit. Six months ago, formalizing CI artifacts through plan/implement/review/QA would have been a multi-quarter initiative requiring DevOps, QA, and platform team buy-in. The tribe did it in a single sprint cycle, on its own initiative.
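The label-to-job mapping, as a sketch. Label and artifact names come from the flow above; the dispatch mechanics (how CI watches Jira) are simplified and hypothetical.

```python
# Each Jira label maps to the CI job it should trigger, in pipeline order.
LABEL_TO_JOB = {
    "AI": "plan",                     # emits solution-design.md, technical-plan.md,
                                      # code-map.md, ticket-summary.md
    "Ready for review": "implement",  # specialized sub-agents generate code
    "QA testing": "qa",               # optional automated QA pass
    "Request changes": "revise",      # AI goes back to address PR comments
}

def next_job(labels: list[str]) -> str | None:
    # Later pipeline stages win if several labels are present at once.
    for label in reversed(list(LABEL_TO_JOB)):
        if label in labels:
            return LABEL_TO_JOB[label]
    return None
```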
AI Agent Portal — for everyone in the bank who isn’t an engineer
Internal AI Agent Portal: 2,085 registered users, 649 monthly active, 187 published agents, 70–110 daily users. From an empty repository to a bank-wide product in 87 days.
Why this was even possible. Not “one engineer with an AI subscription.” It’s a dividend from years of investment in platform and governance: an LLM Gateway through which every model request passes against corporate rules; an existing SSO and DLP stack without which security wouldn’t have approved production; an OpenAPI culture maintained by service teams for years; a base AI portal infrastructure built by other teams long before this product began. Without that layer the portal would be impossible regardless of how many AI tools you stacked on top.
Plus dozens of people who handled edge cases, supported users, tested. Like Service Knowledge Hub, like Mobile Android — a team story in which AI made certain actions cheap, not a solo act.
The architecturally most important feature: automatic agent connection to any API via an OpenAPI spec. The user provides a spec URL → the portal generates an MCP tool. Zero integration code. This shifts the twenty-year construction of “API → UI → user” into “API → agent → user” — for most internal use cases, six months of development become one hour. That’s a separate architectural thesis worth its own piece; I’ll write it.
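A minimal sketch of “spec URL in, MCP tool out”, using the public MCP Python SDK. The generator below handles only unauthenticated GET/POST operations; the portal’s real version adds auth, DLP, and routing through the LLM Gateway, none of which appears here.

```python
import requests
from mcp.server.fastmcp import FastMCP  # pip install mcp

mcp = FastMCP("openapi-bridge")

def register_tools(spec_url: str, base_url: str) -> None:
    """One MCP tool per OpenAPI operation, zero integration code."""
    spec = requests.get(spec_url, timeout=10).json()
    for path, ops in spec.get("paths", {}).items():
        for method, op in ops.items():
            if method not in ("get", "post"):
                continue

            def make(m: str, p: str):
                def call(params: dict | None = None) -> str:
                    resp = requests.request(m, base_url + p,
                                            params=params, timeout=10)
                    return resp.text
                return call

            fn = make(method, path)
            fn.__name__ = op.get(
                "operationId", f"{method}_{path.strip('/').replace('/', '_')}")
            fn.__doc__ = op.get("summary", f"{method.upper()} {path}")
            mcp.tool()(fn)  # tool schema is derived from the signature

if __name__ == "__main__":
    # Hypothetical spec URL; any OpenAPI endpoint works the same way.
    register_tools("https://example.internal/openapi.json",
                   "https://example.internal")
    mcp.run()
```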
What AI couldn’t do in the Portal: authentication and authorization. Integration into the bank’s identity stack requires people who know Keycloak and compliance. Pre-AI, this product wouldn’t have shipped — it would have required negotiation with a team that has its own roadmap. AI removed the negotiation.
Shift-left Security Plugin — vulnerabilities closed before commit
An internal AI Security plugin — a set of AI skills running in the dev environment that analyzes code for vulnerabilities before a commit exists. Classic shift-left in 2025 means “we moved SAST onto CI.” Our plugin shifts security validation further left, to the moment of code-writing in the IDE. The vulnerability never reaches git history. Time-to-fix drops from hours/days to minutes. Compliance audit gets a cleaner code history with no commits titled “fix vulnerability XYZ.”
The plugin runs in production. What used to require expensive contracts with external SAST vendors was built by one engineer in our AI culture, on the same Claude/Copilot stack. It’s one of the drivers behind the −82% on new secrets in code.
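The plugin’s internals aren’t public here, so the sketch below shows only the flavor of the idea: a hook that scans staged changes before the commit exists, so a secret never reaches git history. The patterns are a small illustrative subset.

```python
import re
import subprocess
import sys

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                        # AWS access key id
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    re.compile(r"(?i)(?:api[_-]?key|token)\s*[:=]\s*['\"][\w\-]{20,}"),
]

def staged_added_lines() -> list[str]:
    diff = subprocess.run(["git", "diff", "--cached", "--unified=0"],
                          capture_output=True, text=True, check=True).stdout
    return [l[1:] for l in diff.splitlines()
            if l.startswith("+") and not l.startswith("+++")]

if __name__ == "__main__":
    hits = [l for l in staged_added_lines()
            if any(p.search(l) for p in SECRET_PATTERNS)]
    if hits:
        print("Possible secret in staged changes; commit blocked:")
        print("\n".join(hits[:5]))
        sys.exit(1)  # non-zero exit makes git abort the commit
```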
DevPortal — Backstage with AI agents
The infrastructure team is building Backstage as our internal developer platform. By April 2026: a service catalog, deployment dashboards, and two AI agents in active use. Documentation search agent answers questions across internal docs. Kubernetes AI Support diagnoses production services: hover over any service in any prod environment — in about 30 seconds it walks through pods, deployments, logs, ingresses, events, and gives a first working hypothesis. One of the tools through which SRE pulled critical resolution time down by 68%.
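The evidence-gathering half of such an agent, sketched with the public kubernetes Python client. The `app={service}` label selector is an assumed convention, and the “first working hypothesis” half is one LLM call over the returned bundle, omitted here.

```python
from kubernetes import client, config  # pip install kubernetes

def gather_evidence(namespace: str, service: str) -> dict:
    config.load_kube_config()  # inside the cluster: load_incluster_config()
    v1 = client.CoreV1Api()
    evidence: dict = {"pods": [], "warnings": []}
    pods = v1.list_namespaced_pod(namespace, label_selector=f"app={service}")
    for pod in pods.items:
        statuses = pod.status.container_statuses or []
        evidence["pods"].append({
            "name": pod.metadata.name,
            "phase": pod.status.phase,
            "restarts": sum(s.restart_count for s in statuses),
            "log_tail": v1.read_namespaced_pod_log(
                pod.metadata.name, namespace, tail_lines=50),
        })
    for ev in v1.list_namespaced_event(namespace).items:
        if ev.type != "Normal":  # keep only Warning-class events
            evidence["warnings"].append(f"{ev.reason}: {ev.message}")
    return evidence  # an LLM prompt over this yields the first hypothesis
```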
Three of the platform’s top-10 contributors are engineers actively using Claude on top of (not instead of) the corporate stack. Without them, half the platform isn’t built. This layer of work didn’t replace the corporate team — it joined and built what the corporate team didn’t have capacity for.
DRAIF MCP — text-to-SQL for the Data Lake, on a model trained on our data
Our Data Lake holds 10,000 tables across 200 sources, 500 terabytes. A continuous stream of business questions hits it daily. A non-trivial question is complex SQL — dozens of joins and aggregations. Until this year that meant a ticket to a data analyst who hand-wrote and tuned every query. Off-the-shelf text-to-SQL tools didn’t solve it: they didn’t understand scd2_customer_dim or "curated layer," and banking abbreviations like KYC or DPD were noise.
The Data Lake team built an MCP server (Model Context Protocol — Anthropic’s standard for connecting external services to LLMs) with eight tools, plugged into Claude Code, GitHub Copilot, Cursor — any MCP-capable AI client. The user asks in their own words; the agent calls the MCP for relevant tables, pulls schemas, sees field relationships, validates the generated SQL, returns a working query. Without MCP, the LLM hallucinates plausible SQL that doesn’t run. With MCP it produces working queries on our real data, even queries spanning hundreds of lines of SQL.
Why this works specifically here. We fine-tuned the embedding model — the component that retrieves relevant tables for a query — on our own metadata via AWS SageMaker. By quality metrics, it beats OpenAI’s embedding model by 2×. A graph of relationships across ~2,000 of our most-used tables lets the agent stitch data across sources without asking the user what they probably don’t know.
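The retrieval step, reduced to a sketch: embed each table’s metadata once, embed the question, take the nearest tables by cosine similarity. The vectors stand in for the SageMaker-fine-tuned model’s output; the walk over the ~2,000-table relationship graph is a separate step this omits.

```python
import numpy as np

def top_k_tables(question_vec: np.ndarray,
                 table_vecs: np.ndarray,   # one row per table, L2-normalized
                 table_names: list[str],
                 k: int = 5) -> list[str]:
    """Nearest tables by cosine similarity over name+description+columns."""
    q = question_vec / np.linalg.norm(question_vec)
    scores = table_vecs @ q
    return [table_names[i] for i in np.argsort(scores)[::-1][:k]]

# The agent then pulls full schemas for these tables, adds join partners from
# the relationship graph, and only then drafts and validates the SQL.
```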
The distance from “I have a question about the data” to “I have an answer” shrank from a ticket-and-queue to a chat in Claude Code. Dozens of staff use DRAIF MCP daily — business and data engineers both. Hundreds of working-hours saved per month.
Call Evaluation — the product that won the group vote against all competitors
The sales and service channel generates thousands of calls a day. They used to be sampled by supervisors — a few percent of the traffic, manual evaluation, human bias, zero feedback into scripts. The rest passed any analytics by.
We built Call Evaluation — a product inside our perimeter that closes the full loop. It transcribes audio (STT accuracy >97%), analyzes every dialog against defined criteria (communication quality, script compliance, NPS triggers, cross-sell potential — evaluation precision 90%), gives recommendations to supervisors and individual operators, and — most importantly — the team uses the aggregated analysis to redesign the scripts themselves. Real-conversation data feeds back into communication design instead of sitting in reports. The platform is Vertex AI on Google Cloud, deliberately built without vendor lock-in: model T1-validated, the LLM is replaceable, integration with banking systems is deep.
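A sketch of the per-dialog evaluation step on Vertex AI. The project, region, model id, and prompt are illustrative; the article’s point is precisely that the LLM behind this call is replaceable.

```python
import json
import vertexai
from vertexai.generative_models import GenerativeModel

CRITERIA = ["communication quality", "script compliance",
            "NPS triggers", "cross-sell potential"]

def evaluate_transcript(transcript: str) -> dict:
    vertexai.init(project="my-gcp-project", location="europe-west1")  # hypothetical
    model = GenerativeModel("gemini-1.5-pro")  # replaceable by design
    prompt = (
        "Score this call transcript 0-100 on each criterion and give one "
        f"recommendation per criterion. Criteria: {', '.join(CRITERIA)}. "
        'Reply as JSON: {"criterion": {"score": int, "recommendation": str}}.'
        "\n\n" + transcript
    )
    resp = model.generate_content(prompt)
    return json.loads(resp.text)  # production adds schema validation and retries
```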
Each country in our group has its own call-analysis product — some built in-house, some bought from vendors. This year the group ran a vote among collections teams across all countries: which product would you want at home? Our Call Evaluation got the most votes — more than any other product, including ones built over years with bigger budgets and bigger teams.
This is not “we made something not worse than a vendor.” It’s a product the countries in our group want to buy from us, built by our own team on top of the AI stack in a timeframe that would have been unrealistic a year ago. Pre-AI, we would have entered the competition with a vendor contract on someone else’s SaaS — their models, their roadmap, our data in their cloud — and couldn’t have won, because we’d have had nothing to put on the table. Now: a product that has become an export.
And these are only the ones I chose to unpack
Seven is the floor, not the ceiling. Outside this list, more products are already running or near production: Business Context Engine (an internal layer giving AI agents context about our products, domains, and processes — separate piece coming); Business Tinder, which the compliance platform and business teams built for their own needs; several others in pilot. I picked seven where I can show concrete numbers without revealing what hasn’t passed internal review yet. Engineers at the bank are building noticeably more on top of the AI stack than this article lists. That is the difference between “AI as an individual productivity tool” and “AI as a platform for building new things.”
Discussion
In December I closed with: “The future doesn’t belong to those who adopt AI fastest. It belongs to those who build a culture of working with it every day.”
I still believe that. Six months later I’d add one more thing.
Stop measuring AI impact only as percentage gains in metrics you already had. The sharpest signal is what your engineers built that didn’t exist before.
A dashboard will show how an old set of numbers moves. A list — “things shipped in the last year that wouldn’t have existed a year ago” — will show what AI actually does in your organization. Most CTOs in 2026 publish the first. Almost none publish the second.
I want more people doing this in a year. And I want all of us to enter the formal AI SDLC era five years from now without an accumulated mountain of debt — with a culture already used to working with AI every day, and teams capable of taking on work that used to require teams.
I know we’ll get there. I see it already: a team smaller than a year ago shipping more; engineers who were drowning in legacy rewriting it in an evening; products that used to require six months of planning showing up in days. That’s the foundation of a bank that doesn’t just adopt AI — it shapes how AI is used in financial services.
Methods
- Pre window: May 1 — November 1, 2025. Post window: November 1, 2025 — May 1, 2026.
- Story points: Jira, same projects in both windows.
- AI users: non-zero proxy traffic to claude.ai / api.anthropic.com, or an active Copilot Enterprise seat. Non-AI peers: zero proxy or seat traces for the entire period.
- Incidents: INC-project tickets, resolution time = created → resolutiondate.
- Engineer transformations: per-individual pre/post on PR count, code volume, repository breadth (GHE GraphQL), reviews.
- Security alerts: org-level GitHub Enterprise, secret-scanning resolved + Dependabot fixed/dismissed.
- Phase-time distribution: Jira changelog, largest retail project, ~1,200+ resolved tickets across twelve months.
- TTFPR: GHE GraphQL search (PRs by author, sorted ascending by createdAt). Pre cohort: 157 engineers hired in 2023–2024 in the IT business line on coding roles. Post cohort: 37 coding engineers hired May 2025 – April 2026.
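The TTFPR query as a runnable sketch against a GitHub GraphQL endpoint. Token handling and the hire-date join with HR data are simplified; the search qualifier sort:created-asc returns the author’s earliest PR first.

```python
from datetime import datetime, timezone
import requests

GQL = """
query($q: String!) {
  search(query: $q, type: ISSUE, first: 1) {
    nodes { ... on PullRequest { createdAt } }
  }
}
"""

def time_to_first_pr(login: str, hired: datetime,
                     endpoint: str, token: str) -> float | None:
    """Days from hire date to the engineer's earliest PR, or None if no PRs."""
    resp = requests.post(
        endpoint,
        json={"query": GQL,
              "variables": {"q": f"author:{login} is:pr sort:created-asc"}},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    nodes = resp.json()["data"]["search"]["nodes"]
    if not nodes:
        return None
    first = datetime.fromisoformat(nodes[0]["createdAt"].replace("Z", "+00:00"))
    return (first - hired.astimezone(timezone.utc)).total_seconds() / 86400
```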
Limitations & open questions
- No formal Difference-in-Differences specification with pre-period for both groups. That’s the next step.
- Defect rate on AI-touched PRs specifically is not yet measured. It should be, and it will be.
- Perception-survey cross-check is missing — METR’s 2025 study showed engineers were off by up to 40 points on whether AI had actually accelerated them.
- Some Findings II buckets are small (n=2–3 in places). Treated as pattern illustration, not statistically significant.
- The seven products in Findings IV don’t depend on any of the above. They shipped or didn’t.
Hryhorii Tatsyi · CTO, Raiffeisen Bank Ukraine · May 2026
Have questions I didn’t answer? Find me on LinkedIn.