I’ve been inconsistent with my writing here. Some weeks I publish, other weeks... nothing. Not because I lost interest in procurement, or ran out of things to say. I’ve been building instead of just talking about building.

Over the past several months, I’ve been building AI agents that review contracts, onboard vendors, and assess third-party risk. Not hypothetical agents. Not demo agents. Real systems processing real documents for real procurement teams. And what I’ve learned is this: we’re not talking about the future of procurement anymore. We’re living in it. Right now. Today.

The question isn’t whether AI will transform procurement. It’s whether you understand what’s already happening well enough to position yourself, and your team, for what comes next.

The Inflection Point No One Saw Coming

I’ve been optimistic about AI in procurement for years. Not because I’m easily impressed by technology hype (I’ve seen enough failed implementations to be appropriately sceptical), but because the underlying economics made sense. Procurement work involves massive volumes of structured and semi-structured data, clear decision frameworks, and repeatable processes. Perfect conditions for automation.

But optimism and reality are different things. For years, the technology simply wasn’t ready. AI tools in procurement were either:

  1. Glorified search engines that could find information but not act on it

  2. Rules-based automation dressed up with “AI” marketing labels

  3. Interesting demos that fell apart when confronted with real-world complexity

Then, in late 2024 and early 2025, something fundamental shifted.

The Technical Breakthrough

The shift wasn’t about one model or one company. It was about crossing a threshold of capability that made genuinely useful AI agents possible. Let me break down what actually changed, with real data:

Context Windows Expanded Dramatically

When ChatGPT launched in late 2022, it could handle roughly 4,000 tokens. By early 2024, models like Claude 3 offered 200,000 tokens. As of 2025, Claude 4 and Gemini 2 can process up to 1 million tokens in their context windows.

Practical impact: An AI agent can now review an entire vendor contract portfolio (dozens of contracts), understand your organisation’s complete contract standards documentation, cross-reference supplier policies, and make contextually informed decisions rather than decisions based on fragments of information. The difference between 4,000 tokens and 1 million tokens is the difference between reviewing a single contract summary and analysing an entire procurement function’s documentation simultaneously.
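To make the scale concrete, here is a rough back-of-the-envelope sketch. It assumes about 0.75 English words per token (a common rule of thumb; real tokenizers vary) and about 500 words per contract page. Both ratios and the function names are my illustrative assumptions, not figures from any model provider.

```python
import math

# Assumptions (illustrative only): rule-of-thumb ratios, not tokenizer output.
WORDS_PER_TOKEN = 0.75   # common heuristic for English text
WORDS_PER_PAGE = 500     # hypothetical average for a dense contract page


def estimate_tokens(pages: int) -> int:
    """Estimate the token count of a document from its page count."""
    return math.ceil(pages * WORDS_PER_PAGE / WORDS_PER_TOKEN)


def contracts_that_fit(context_window: int, pages_per_contract: int,
                       reserved_tokens: int = 0) -> int:
    """Count how many same-sized contracts fit in a context window after
    reserving room for instructions, standards, and the model's answer."""
    available = max(context_window - reserved_tokens, 0)
    return available // estimate_tokens(pages_per_contract)


if __name__ == "__main__":
    print(f"50-page contract: ~{estimate_tokens(50):,} tokens")
    for window in (4_000, 200_000, 1_000_000):
        fits = contracts_that_fit(window, 50, reserved_tokens=2_000)
        print(f"{window:>9,}-token window: {fits} contract(s)")
```

Under these assumptions a 4,000-token window cannot hold even one 50-page contract, a 200,000-token window holds 5, and a 1 million-token window holds 29.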

Hallucination Rates Dropped Dramatically

In early 2024, even advanced models hallucinated (made up facts or misrepresented information) at concerning rates. A 2024 medical study found GPT-4 hallucinated 28.6% of academic references it generated.

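By December 2024, the landscape had changed drastically. According to Vectara’s hallucination leaderboard, Google’s Gemini 2.0-Flash achieved a 0.7% hallucination rate, while GPT-4o showed 1.5%, and Claude 3.5 Sonnet demonstrated 4.6%. These benchmarks measure different tasks (fabricated citations in the medical study, faithfulness when summarising a source document on the leaderboard), so the figures are not directly comparable, but the direction of travel is clear.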

More importantly, some models showed up to a 64% reduction in hallucination rates during 2025. When combined with Retrieval-Augmented Generation (RAG), where the AI references specific source documents, hallucination rates dropped by an additional 71%.

[Figure: Hallucination Rate Improvements (2024-2025). Bar chart showing GPT-4 (early 2024): 28.6%; GPT-4o (late 2024): 1.5%; Gemini 2.0-Flash (2025): 0.7%. Source: Vectara Hallucination Leaderboard; Journal of Medical Internet Research. Up to 64% reduction in 2025; 71% further reduction with RAG.]

What this means practically: You can now deploy AI agents for contract review and risk assessment with confidence that they’re extracting accurate information, not fabricating clauses or misrepresenting terms. The reliability crossed the threshold where oversight shifts from “constant verification” to “exception handling.”
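One way to picture “exception handling” is a triage step that auto-accepts well-supported extractions and escalates the rest. The sketch below is a minimal illustration, not how any particular product works: the `Extraction` fields, the `confidence` score, and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Extraction:
    clause: str                  # e.g. "liability_cap"
    value: str                   # text the agent extracted
    source_page: Optional[int]   # citation back to the contract, if any
    confidence: float            # hypothetical 0-1 score reported by the agent


def needs_human_review(item: Extraction, threshold: float = 0.9) -> bool:
    """Escalate anything without a citation or below the threshold."""
    return item.source_page is None or item.confidence < threshold


def triage(items: List[Extraction],
           threshold: float = 0.9) -> Tuple[List[Extraction], List[Extraction]]:
    """Split extractions into (auto-accepted, escalated to a human)."""
    accepted: List[Extraction] = []
    escalated: List[Extraction] = []
    for item in items:
        if needs_human_review(item, threshold):
            escalated.append(item)
        else:
            accepted.append(item)
    return accepted, escalated


if __name__ == "__main__":
    sample = [
        Extraction("payment_terms", "Net 30", 12, 0.98),
        Extraction("liability_cap", "12 months of fees", 38, 0.72),
        Extraction("auto_renewal", "Renews annually", None, 0.95),
    ]
    accepted, escalated = triage(sample)
    print("accepted:", [e.clause for e in accepted])
    print("escalated:", [e.clause for e in escalated])
```

In this sample only the payment terms are accepted automatically. The liability cap is escalated for low confidence, and the auto-renewal clause is escalated because it has no source citation.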

Accuracy Reached Production Standards

For contract data extraction specifically, modern AI systems achieve accuracy rates above 95% for structured documents. At Gatekeeper, our LuminIQ agents are currently achieving 98% accuracy in clause extraction, exceeding human performance on many contract analysis tasks.

This isn’t hypothetical. These are production systems processing real contracts for real organisations right now.

I was at SaaStr’s AI event in London in December 2024. Artisan (probably the best-known AI SDR, or Sales Development Rep, company) mentioned something revealing: their product only started actually working recently. Not “working better” or “improving.” Actually working. Delivering value. Producing results customers would pay for.

That comment crystallised something I’d been experiencing in my own work. The agents I’d been building for months suddenly started behaving less like clever chatbots and more like competent junior analysts. They understood instructions. They handled edge cases. They knew when to escalate to humans instead of guessing.

The technology was finally ready.

What “Ready” Actually Means in Procurement

Let me ground this in specifics, because “AI is ready” means nothing without concrete applications.

At Gatekeeper, I work on product marketing and sit on our AI Council. We launched LuminIQ, an AI workforce for procurement, risk, and compliance teams, in late 2024. I’ve spent months testing what these agents can actually do versus what we hoped they could do.

Here’s what “ready” looks like in practice:

Contract Review Agent: Real Performance Data

Traditional Process:

  • Senior analyst reviews 50-page software vendor contract

  • Identifies key commercial terms, risk clauses, compliance requirements

  • Cross-references against company standards

  • Documents findings and recommendations

  • Time required: 2-4 hours

  • Consistency: Varies by analyst experience and current workload

AI Agent Process:

  • Agent ingests same 50-page contract

  • Extracts all commercial terms with source citations

  • Flags deviations from company standards

  • Identifies regulatory compliance gaps

  • Highlights unusual or concerning language

  • Generates structured summary and risk assessment

  • Time required: 3-5 minutes

  • Consistency: Identical methodology applied every time

  • Accuracy: 98% in production (Gatekeeper LuminIQ)

The time savings are obvious. What’s less obvious but more important: the agent doesn’t get tired, doesn’t miss things because it’s Friday afternoon, doesn’t overlook a problematic clause buried on page 38 because it’s rushing to finish before a meeting. And with 98% accuracy on structured extraction tasks, the reliability is production-ready.

But here’s what the agent doesn’t do: make the final decision. It identifies issues, quantifies risks, and surfaces options. A human still decides whether a liability cap is acceptable, whether to negotiate a particular clause, and whether the overall risk profile fits the organisation’s tolerance.
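The division of labour can be sketched in a few lines: the agent compares extracted terms against company standards and cites the page, while the decision field stays empty until a person fills it in. The standards, term names, and data shapes below are hypothetical and simplified to “minimum acceptable value” checks.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical company standards: minimum acceptable value per term.
STANDARDS: Dict[str, float] = {
    "payment_terms_days": 30,
    "liability_cap_months": 12,
    "termination_notice_days": 30,
}


@dataclass
class Term:
    name: str
    value: float
    page: int   # source citation into the contract


@dataclass
class Flag:
    term: str
    found: float
    required: float
    page: int
    decision: Optional[str] = None   # left empty: a human decides


def flag_deviations(terms: List[Term],
                    standards: Dict[str, float]) -> List[Flag]:
    """Return one flag per term that falls below the company standard."""
    flags: List[Flag] = []
    for term in terms:
        required = standards.get(term.name)
        if required is not None and term.value < required:
            flags.append(Flag(term.name, term.value, required, term.page))
    return flags


if __name__ == "__main__":
    extracted = [
        Term("payment_terms_days", 30, 4),
        Term("liability_cap_months", 6, 38),
        Term("termination_notice_days", 90, 21),
    ]
    for flag in flag_deviations(extracted, STANDARDS):
        print(f"{flag.term}: found {flag.found}, standard {flag.required} "
              f"(page {flag.page}), decision: {flag.decision}")
```

Here the only flag raised is the six-month liability cap on page 38. Whether that is acceptable remains a human call, which is why `decision` starts as `None`.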

This is the crucial distinction. We’re not replacing procurement professionals. We’re eliminating the exhausting, time-consuming work that prevents them from doing strategic thinking.

Vendor Onboarding: Before and After

Traditional vendor onboarding at most mid-sized organisations:

  1. Request documents (insurance, financial statements, compliance certifications, etc.)

  2. Follow up repeatedly as vendors send incomplete packages

  3. Manually review each document for completeness and compliance

  4. Cross-reference against internal requirements

  5. Conduct risk assessment

  6. Route through approval workflows

  7. Update systems and notify stakeholders

Total time: 40-60 hours spread across 2-4 weeks

Bottlenecks: Document collection, manual review, waiting for approvals

Common failures: Missing documents, expired certificates, incomplete risk assessment

AI-powered vendor onboarding:

  1. Agent sends document requests with clear requirements

  2. Agent automatically verifies document completeness upon receipt

  3. Agent extracts key data points and assesses against requirements

  4. Agent flags risks and non-compliance issues

  5. Agent routes to appropriate approvers with full context

  6. Agent updates systems and triggers notifications

Total time: 4-8 hours, mostly waiting for vendor responses and approvals

Bottlenecks: Vendor response time, approval decisions

Common failures: Significantly reduced. Agents catch missing/expired documents immediately.

The 40-60 hour process becomes 4-8 hours. More importantly, those 4-8 hours are focused on judgment and decision-making, not administrative drudgery.
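The “catch missing and expired documents immediately” step is the easiest part to illustrate. Below is a minimal sketch of a completeness check run as soon as a vendor package arrives. The required-document list and the data shape are hypothetical; a real agent would first extract expiry dates from the documents themselves.

```python
from datetime import date
from typing import Dict, List, Optional

# Hypothetical list of documents every vendor must supply.
REQUIRED_DOCS = [
    "insurance_certificate",
    "financial_statements",
    "compliance_certification",
]


def check_package(received: Dict[str, Optional[date]],
                  today: date) -> Dict[str, List[str]]:
    """Classify a vendor's document package.

    `received` maps document name -> expiry date (None when the document
    does not expire). Required documents absent from it count as missing.
    """
    missing = [doc for doc in REQUIRED_DOCS if doc not in received]
    expired = [
        doc for doc in REQUIRED_DOCS
        if received.get(doc) is not None and received[doc] < today
    ]
    return {"missing": missing, "expired": expired}


def ready_for_approval(result: Dict[str, List[str]]) -> bool:
    """A package moves on to approvers only when nothing is outstanding."""
    return not result["missing"] and not result["expired"]


if __name__ == "__main__":
    package = {
        "insurance_certificate": date(2025, 1, 31),
        "financial_statements": None,
    }
    result = check_package(package, today=date(2025, 6, 1))
    print(result)
    print("ready:", ready_for_approval(result))
```

In this example the package is held back: the compliance certification is missing and the insurance certificate has expired, so the vendor can be told straight away rather than weeks later.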

Why This Time Is Different

I’ve watched multiple “AI will transform procurement” hype cycles come and go. Why is this different?

1. The Technology Actually Works Now

Previous generations of AI in procurement:

  • Required extensive training data specific to your organisation

  • Broke on edge cases

  • Needed constant human correction

  • Never quite delivered the promised ROI

Current generation:

  • Works out of the box with minimal configuration

  • Handles edge cases reasonably or escalates appropriately

  • Improves with feedback but doesn’t require it to function

  • Delivers measurable ROI in weeks, not months

2. The Cost-Benefit Equation Flipped

2022: Implementing AI procurement tools

  • Setup cost: $50K-150K

  • Training time: 3-6 months

  • Ongoing maintenance: Significant

  • Time to value: 9-12 months

  • Unclear ROI

2025: Implementing AI agent platforms

  • Setup cost: $10K-30K (often less)

  • Training time: Days to weeks

  • Ongoing maintenance: Minimal

  • Time to value: Weeks

  • Clear, measurable ROI

The barrier to entry dropped sharply: setup costs fell roughly fivefold and time to value shrank from the better part of a year to weeks, while the capability increased dramatically.
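If you want to test the cost-benefit claim against your own numbers, a simple payback calculation is enough. The sketch below uses the upper-end $30K setup cost from the list above and the midpoints of the onboarding times quoted earlier (50 hours before, 6 hours after). The hourly cost and monthly volume are hypothetical placeholders, so swap in your own.

```python
def monthly_savings(hours_before: float, hours_after: float,
                    onboardings_per_month: float, hourly_cost: float) -> float:
    """Labour cost saved per month from faster vendor onboarding."""
    return (hours_before - hours_after) * onboardings_per_month * hourly_cost


def payback_months(setup_cost: float, savings_per_month: float) -> float:
    """Months until cumulative savings cover the setup cost."""
    if savings_per_month <= 0:
        raise ValueError("savings_per_month must be positive")
    return setup_cost / savings_per_month


if __name__ == "__main__":
    # Hypothetical inputs: 5 onboardings a month at a $60 loaded hourly cost.
    savings = monthly_savings(hours_before=50, hours_after=6,
                              onboardings_per_month=5, hourly_cost=60)
    print(f"Monthly savings: ${savings:,.0f}")
    print(f"Payback: {payback_months(30_000, savings):.1f} months")
```

With these placeholder inputs the savings come to $13,200 a month and the setup cost is recovered in about 2.3 months. The point is the shape of the calculation, not the specific figures.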

3. The Talent Pool Understands This Now

In 2022, if you wanted to implement AI in procurement, you needed:

  • Data scientists

  • ML engineers

  • Integration specialists

  • Extensive IT support

In 2025, you need:

  • Someone who understands procurement processes

  • Basic technical literacy

  • Willingness to experiment

The skill gap closed. Procurement professionals can now implement and manage AI agents without becoming technical experts. You need to understand what the agents can do and how to oversee them, not how they work under the hood.

This democratisation matters. It means adoption will accelerate rapidly because the technology isn’t confined to organisations with deep technical resources.

What This Means for Your Career

Let me be direct about something: procurement roles are changing. Not disappearing, but changing. And professionals who understand how to work with AI agents will be significantly more valuable than those who don’t.

Here’s the uncomfortable truth: if your primary value is reviewing contracts, extracting data from documents, conducting standard risk assessments, or managing routine vendor interactions, your role is increasingly automatable. Not someday. Now.

But here’s the opportunity: if you can:

  • Design agent workflows

  • Interpret agent outputs and make decisions

  • Identify which processes benefit from agent automation

  • Oversee agent performance and improve their operation

  • Focus on strategic supplier relationships, complex negotiations, and policy development

...then you become more valuable, not less.

Think about it this way: a procurement pro who can oversee 10 AI agents doing the work of 40 people is worth far more than someone who does the work of one person really well.

The learning curve is real. Understanding agent capabilities, knowing when to trust their outputs, and recognising their limitations are all new skills. But they’re learnable skills. And learning them now, while most of your peers are still debating whether this matters, gives you a significant advantage.

Why I’m Committing to Consistency

I’ve been publishing sporadically because I wanted to make sure what I was saying was grounded in reality, not speculation. But the technology has crossed a threshold. The things I’m testing and building aren’t experiments anymore. They’re production systems delivering measurable results.

It’s time to document what’s actually working, systematically and consistently.

Over the next few months, I’ll be documenting what I’m learning:

  • What AI agents can actually do in procurement (with real demonstrations)

  • Where they fail and why

  • How to implement them effectively

  • What this means for procurement careers and organisations

  • The uncomfortable truths vendors won’t tell you

This isn’t theoretical. Everything I write about, I’m building or testing. Every claim about capability, I’ll back with demonstrations. Every assertion about ROI, I’ll support with data.

My goal isn’t to convince you that AI agents are perfect (they’re not). It’s to help you understand what’s actually possible now so you can make informed decisions about your team, your organisation, and your career.

Because here’s what I know: the procurement professionals who understand this technology will shape the next decade of our profession. Those who don’t will spend the next decade reacting to changes they didn’t see coming.

What’s Next

In my next post, I’ll answer a basic question: What actually is procurement in 2026? The answer is fundamentally different from what it was 24 months ago.

I’ll also be creating video content for my World of Procurement YouTube channel, showing real AI agents working on real procurement tasks. Subscribe there if you want to see what I’m describing here in action.

The inflection point happened. The technology is ready. The question isn’t whether this will transform procurement. It’s whether you’ll be leading that transformation or reacting to it.

What questions do you have about AI agents in procurement? What would you like me to cover in future posts? Drop a comment below. I read and respond to all of them.

If you found this valuable, share it with your procurement network. They’ll thank you later.

Sources & References

Context Windows & Model Capabilities

  1. IBM - What is a context window? (November 2024)
     https://www.ibm.com/think/topics/context-window
     Comprehensive overview of context window evolution from GPT-3 (4K tokens) through Claude 4 and Gemini 2 (1M tokens)

  2. Anthropic - Claude Documentation: Context Windows (2025)
     https://docs.claude.com/en/docs/build-with-claude/context-windows
     Technical specifications for Claude’s 1M token context window capability

  3. Skywork.ai - Claude 4.5 Context Length & Extended Memory Explained (September 2025)
     https://skywork.ai/blog/claude-4-5-context-length-extended-memory/
     Detailed analysis of Claude 4’s context window capabilities and practical applications

  4. Codingscape - LLMs with Largest Context Windows (September 2025)
     https://codingscape.com/blog/llms-with-largest-context-windows
     Comparative analysis of context windows across major LLM providers

Hallucination Rates & Reliability

  1. AllAboutAI - AI Hallucination Report 2026: Which AI Hallucinates the Most? (December 2025)
     https://www.allaboutai.com/resources/ai-statistics/ai-hallucinations/
     Comprehensive hallucination statistics showing Gemini 2.0-Flash at 0.7%, GPT-4o at 1.5%, up to 64% reduction rates in 2025

  2. Vectara - Introducing the Next Generation of Vectara’s Hallucination Leaderboard (May 2025)
     https://www.vectara.com/blog/introducing-the-next-generation-of-vectaras-hallucination-leaderboard
     Industry-standard hallucination benchmarking showing dramatic improvements in frontier models

  3. Lakera - LLM Hallucinations in 2025: How to Understand and Tackle AI’s Most Persistent Quirk (2025)
     https://www.lakera.ai/blog/guide-to-hallucinations-in-large-language-models
     Research on RAG reducing hallucinations by 71% and prompt-based mitigation cutting GPT-4o’s rate from 53% to 23%

  4. Journal of Medical Internet Research - Hallucination Rates and Reference Accuracy of ChatGPT and Bard (May 2024)
     https://www.jmir.org/2024/1/e53164/
     Peer-reviewed study showing GPT-4 hallucination rate of 28.6% in early 2024, establishing baseline for comparison

  5. Nature npj Digital Medicine - Framework to assess clinical safety and hallucination rates of LLMs (May 2025)
     https://www.nature.com/articles/s41746-025-01670-7
     Medical research demonstrating sub-human clinical error rates achievable through careful engineering

Contract Analysis & Extraction Accuracy

  1. V7 Labs - AI Document Analysis: The Complete Guide for 2025 (2025)
     https://www.v7labs.com/blog/ai-document-analysis-complete-guide
     Detailed analysis showing AI document analysis accuracy frequently above 95% for structured data extraction

  2. GEP - AI-Powered Contract Analysis: Benefits & Challenges (January 2025)
     https://www.gep.com/blog/technology/ai-powered-contract-analysis-benefits-challenges
     Procurement-focused analysis showing up to 80% reduction in contract review time with AI

AI in Contract Management Trends

  1. Legartis - Trends 2025: AI in Contract Analysis (May 2025)
     https://www.legartis.ai/blog/trends-ai-contract-analysis
     European perspective on AI contract analysis trends, including explainable AI and multilingual capabilities

Additional Technical References

  1. Wikipedia - Claude (language model) (January 2026)
     https://en.wikipedia.org/wiki/Claude_(language_model)
     Comprehensive overview of Claude’s development history and capabilities

  2. arXiv - HalluLens: LLM Hallucination Benchmark (April 2025)
     https://arxiv.org/html/2504.17550v1
     Academic benchmark showing many SOTA models with hallucination rates below 5% as of December 2024

  3. Frontiers in AI - Survey and analysis of hallucinations in large language models (August 2025)
     https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1622292/full
     Peer-reviewed academic survey on hallucination attribution and mitigation strategies

  4. GitHub - Vectara Hallucination Leaderboard (Updated continuously)
     https://github.com/vectara/hallucination-leaderboard
     Open-source hallucination evaluation framework with continuously updated model comparisons

Daniel leads Product and Customer Marketing at Gatekeeper and sits on their AI Council. He has nearly a decade of experience across procurement, risk management, and supply chain, and has been testing AI agent capabilities in production environments since early 2024. Views expressed are his own.
