I’ve been inconsistent with my writing here. Some weeks I publish, other weeks... nothing. Not because I lost interest in procurement, or ran out of things to say. I’ve been building instead of just talking about building.
Over the past several months, I’ve been building AI agents that review contracts, onboard vendors, and assess third-party risk. Not hypothetical agents. Not demo agents. Real systems processing real documents for real procurement teams. And what I’ve learned is this: we’re not talking about the future of procurement anymore. We’re living in it. Right now. Today.
The question isn’t whether AI will transform procurement. It’s whether you understand what’s already happening well enough to position yourself, and your team, for what comes next.
The Inflection Point No One Saw Coming
I’ve been optimistic about AI in procurement for years. Not because I’m easily impressed by technology hype (I’ve seen enough failed implementations to be appropriately sceptical), but because the underlying economics made sense. Procurement work involves massive volumes of structured and semi-structured data, clear decision frameworks, and repeatable processes. Perfect conditions for automation.
But optimism and reality are different things. For years, the technology simply wasn’t ready. AI tools in procurement fell into one of three categories:
Glorified search engines that could find information but not act on it
Rules-based automation dressed up with “AI” marketing labels
Interesting demos that fell apart when confronted with real-world complexity
Then, in late 2024 and early 2025, something fundamental shifted.

The Technical Breakthrough
The shift wasn’t about one model or one company. It was about crossing a threshold of capability that made genuinely useful AI agents possible. Let me break down what actually changed, with real data:
Context Windows Expanded Dramatically
When ChatGPT launched in late 2022, it could handle roughly 4,000 tokens. By early 2024, models like Claude 3 offered 200,000 tokens. As of 2025, Claude 4 and Gemini 2 can process up to 1 million tokens in their context windows.
Practical impact: An AI agent can now review an entire vendor contract portfolio (dozens of contracts), understand your organisation’s complete contract standards documentation, cross-reference supplier policies, and make contextually informed decisions rather than decisions based on fragments of information. The difference between 4,000 tokens and 1 million tokens is the difference between reviewing a single contract summary and analysing an entire procurement function’s documentation simultaneously.
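To put rough numbers on that difference, here is a minimal sketch that estimates how many contract pages fit in a context window. The words-per-token ratio and words-per-page figure are rule-of-thumb assumptions, not measurements; real tokenisers vary by model and by document.

```python
import math

# Assumption: ~0.75 English words per token is a common rule of thumb.
WORDS_PER_TOKEN = 0.75
# Assumption: a dense contract page holds roughly 500 words.
WORDS_PER_PAGE = 500


def estimate_tokens(pages: int) -> int:
    """Rough token count for a document of the given page length."""
    return math.ceil(pages * WORDS_PER_PAGE / WORDS_PER_TOKEN)


def pages_that_fit(context_tokens: int, reserve_for_output: int = 0) -> int:
    """How many pages fit in a context window after reserving output space."""
    usable = max(context_tokens - reserve_for_output, 0)
    return math.floor(usable * WORDS_PER_TOKEN / WORDS_PER_PAGE)


for label, window in [("4K", 4_000), ("200K", 200_000), ("1M", 1_000_000)]:
    print(f"{label} window: about {pages_that_fit(window)} pages")

print(f"50-page contract: about {estimate_tokens(50)} tokens")
```

Under these assumptions a 4K window holds about 6 pages, while a 1M window holds about 1,500 pages, or roughly thirty 50-page contracts before you add standards documentation and instructions.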
Hallucination Rates Dropped Dramatically
In early 2024, even advanced models hallucinated (made up facts or misrepresented information) at concerning rates. A peer-reviewed study published in the Journal of Medical Internet Research in 2024 found that 28.6% of the academic references GPT-4 generated were hallucinated.
By December 2024, the landscape had changed drastically. According to Vectara’s hallucination leaderboard, which measures how often a model introduces unsupported content when summarising a document, Google’s Gemini 2.0-Flash achieved a 0.7% hallucination rate, GPT-4o showed 1.5%, and Claude 3.5 Sonnet demonstrated 4.6%. The two sources measure different tasks, so the figures are not directly comparable, but the improvement over a matter of months was substantial.
More importantly, some models showed up to a 64% reduction in hallucination rates during 2025. When combined with Retrieval-Augmented Generation (RAG), where the AI references specific source documents, hallucination rates dropped by an additional 71%.

[Figure: Hallucination Rate Improvements (2024-2025). Bar chart showing GPT-4 (early 2024): 28.6%; GPT-4o (late 2024): 1.5%; Gemini 2.0-Flash (2025): 0.7%.]
Caption: Source: Vectara Hallucination Leaderboard, Journal of Medical Internet Research. Up to 64% reduction in 2025; 71% further reduction with RAG.
What this means practically: You can now deploy AI agents for contract review and risk assessment with confidence that they’re extracting accurate information, not fabricating clauses or misrepresenting terms. The reliability crossed the threshold where oversight shifts from “constant verification” to “exception handling.”
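One way to picture “exception handling” oversight is a grounding check: the agent must quote the passage behind each extracted term, and anything whose quote cannot be found in the contract goes to a human. The sketch below is illustrative only; the `Extraction` type, field names, and sample contract text are hypothetical, and it does not describe how any particular product works.

```python
import re
from dataclasses import dataclass


@dataclass
class Extraction:
    field: str  # e.g. "liability_cap"
    value: str  # what the agent says the contract states
    quote: str  # verbatim passage the agent cites as evidence


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so line breaks don't cause misses."""
    return re.sub(r"\s+", " ", text).strip().lower()


def split_by_grounding(contract_text, extractions):
    """Return (grounded, exceptions); exceptions have no matching quote."""
    haystack = normalise(contract_text)
    grounded, exceptions = [], []
    for item in extractions:
        quote = normalise(item.quote)
        if quote and quote in haystack:
            grounded.append(item)
        else:
            exceptions.append(item)
    return grounded, exceptions


# Hypothetical contract excerpt and agent output.
contract = """12.1 The Supplier's total liability shall not exceed
the fees paid in the preceding twelve (12) months."""
results = [
    Extraction("liability_cap", "12 months of fees",
               "total liability shall not exceed the fees paid in the "
               "preceding twelve (12) months"),
    Extraction("auto_renewal", "renews annually",
               "this Agreement shall renew automatically for successive "
               "one-year terms"),
]
grounded, exceptions = split_by_grounding(contract, results)
print("grounded:", [e.field for e in grounded])
print("needs review:", [e.field for e in exceptions])
```

Here the liability cap passes because its quote appears in the text, while the auto-renewal claim is routed to review. An exact-match check only proves the quote exists; it says nothing about whether the agent interpreted it correctly, so sampled human review still matters.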
Accuracy Reached Production Standards
For contract data extraction specifically, modern AI systems achieve accuracy rates above 95% for structured documents. At Gatekeeper, our LuminIQ agents are currently achieving 98% accuracy in clause extraction, exceeding human performance on many contract analysis tasks.
This isn’t hypothetical. These are production systems processing real contracts for real organisations right now.
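An accuracy figure like this is only meaningful relative to a labelled test set. As a minimal sketch of one way to measure field-level extraction accuracy (the contract IDs, field names, and values below are made up for illustration, and this is not a description of how the figures above were produced):

```python
def extraction_accuracy(predicted, expected):
    """Share of labelled fields where the extracted value matches the label."""
    total = 0
    correct = 0
    for contract_id, labels in expected.items():
        extracted = predicted.get(contract_id, {})
        for field, truth in labels.items():
            total += 1
            got = extracted.get(field, "")
            if got.strip().lower() == truth.strip().lower():
                correct += 1
    return correct / total if total else 0.0


# Hypothetical reviewer labels (ground truth) for two contracts.
expected = {
    "C-001": {"term": "36 months", "notice": "90 days",
              "governing_law": "England and Wales", "payment": "Net 30"},
    "C-002": {"term": "12 months", "notice": "30 days",
              "governing_law": "New York", "payment": "Net 45"},
}
# Hypothetical agent output: one wrong value in C-002.
predicted = {
    "C-001": {"term": "36 months", "notice": "90 days",
              "governing_law": "England and Wales", "payment": "Net 30"},
    "C-002": {"term": "12 months", "notice": "60 days",
              "governing_law": "New York", "payment": "Net 45"},
}

accuracy = extraction_accuracy(predicted, expected)
print(f"Field-level accuracy: {accuracy:.1%}")
```

With seven of eight fields matching, this prints 87.5%. Exact string matching is a deliberately strict choice; a real evaluation would need rules for equivalent phrasings such as “thirty days” versus “30 days”.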
I was at SaaStr’s AI event in London in December 2024. Artisan (probably the best-known AI SDR, or Sales Development Rep, company) mentioned something revealing: their product only started actually working recently. Not “working better” or “improving.” Actually working. Delivering value. Producing results customers would pay for.
That comment crystallised something I’d been experiencing in my own work. The agents I’d been building for months suddenly started behaving less like clever chatbots and more like competent junior analysts. They understood instructions. They handled edge cases. They knew when to escalate to humans instead of guessing.
The technology was finally ready.
What “Ready” Actually Means in Procurement
Let me ground this in specifics, because “AI is ready” means nothing without concrete applications.
At Gatekeeper, I work on product marketing and sit on our AI Council. We launched LuminIQ, an AI workforce for procurement, risk, and compliance teams, in late 2024. I’ve spent months testing what these agents can actually do versus what we hoped they could do.
Here’s what “ready” looks like in practice:
Contract Review Agent: Real Performance Data
Traditional Process:
Senior analyst reviews 50-page software vendor contract
Identifies key commercial terms, risk clauses, compliance requirements
Cross-references against company standards
Documents findings and recommendations
Time required: 2-4 hours
Consistency: Varies by analyst experience and current workload
AI Agent Process:
Agent ingests same 50-page contract
Extracts all commercial terms with source citations
Flags deviations from company standards
Identifies regulatory compliance gaps
Highlights unusual or concerning language
Generates structured summary and risk assessment
Time required: 3-5 minutes
Consistency: Identical methodology applied every time
Accuracy: 98% in production (Gatekeeper LuminIQ)
The time savings are obvious. What’s less obvious but more important: the agent doesn’t get tired, doesn’t miss things because it’s Friday afternoon, doesn’t overlook a problematic clause buried on page 38 because it’s rushing to finish before a meeting. And with 98% accuracy on structured extraction tasks, the reliability is production-ready.
But here’s what the agent doesn’t do: make the final decision. It identifies issues, quantifies risks, and surfaces options. A human still decides whether a liability cap is acceptable, whether to negotiate a particular clause, and whether the overall risk profile fits the organisation’s tolerance.
This is the crucial distinction. We’re not replacing procurement professionals. We’re eliminating the exhausting, time-consuming work that prevents them from doing strategic thinking.
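The split between what the agent flags and what a person decides can be sketched in a few lines. Everything here is hypothetical: the standards, field names, and thresholds are invented for illustration, and a real agent would work from far richer policy documents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Standard:
    field: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None


@dataclass
class Term:
    field: str
    value: float
    page: int  # source citation so a reviewer can jump to the clause


# Hypothetical company standards, for illustration only.
STANDARDS = [
    Standard("payment_terms_days", minimum=30),
    Standard("liability_cap_multiple_of_fees", minimum=1.0),
    Standard("auto_renewal_notice_days", maximum=90),
]


def flag_deviations(terms, standards):
    """Return flags for a reviewer; accept or negotiate stays a human call."""
    by_field = {t.field: t for t in terms}
    flags = []
    for std in standards:
        term = by_field.get(std.field)
        if term is None:
            flags.append(f"{std.field}: not found, needs manual review")
        elif std.minimum is not None and term.value < std.minimum:
            flags.append(f"{std.field}: {term.value} is below minimum "
                         f"{std.minimum} (page {term.page})")
        elif std.maximum is not None and term.value > std.maximum:
            flags.append(f"{std.field}: {term.value} is above maximum "
                         f"{std.maximum} (page {term.page})")
    return flags


# Hypothetical terms extracted from one contract.
terms = [
    Term("payment_terms_days", 14, page=7),
    Term("liability_cap_multiple_of_fees", 2.0, page=38),
]
flags = flag_deviations(terms, STANDARDS)
for flag in flags:
    print(flag)
```

The function only reports: 14-day payment terms fall below the 30-day standard, and the renewal notice period could not be found. It never approves or rejects anything, which mirrors the division of labour described above.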
Vendor Onboarding: Before and After
Traditional vendor onboarding at most mid-sized organisations:
Request documents (insurance, financial statements, compliance certifications, etc.)
Follow up repeatedly as vendors send incomplete packages
Manually review each document for completeness and compliance
Cross-reference against internal requirements
Conduct risk assessment
Route through approval workflows
Update systems and notify stakeholders
Total time: 40-60 hours spread across 2-4 weeks
Bottlenecks: Document collection, manual review, waiting for approvals
Common failures: Missing documents, expired certificates, incomplete risk assessment
AI-powered vendor onboarding:
Agent sends document requests with clear requirements
Agent automatically verifies document completeness upon receipt
Agent extracts key data points and assesses against requirements
Agent flags risks and non-compliance issues
Agent routes to appropriate approvers with full context
Agent updates systems and triggers notifications
Total time: 4-8 hours, mostly waiting for vendor responses and approvals
Bottlenecks: Vendor response time, approval decisions
Common failures: Significantly reduced. Agents catch missing/expired documents immediately

The 40-60 hour process becomes 4-8 hours. More importantly, those 4-8 hours are focused on judgment and decision-making, not administrative drudgery.
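The “catch missing or expired documents immediately” step is the easiest part to picture in code. A minimal sketch, assuming a hypothetical list of required document types and that expiry dates have already been extracted from each document:

```python
from datetime import date

# Hypothetical requirement list; real requirements depend on your policy.
REQUIRED_DOCUMENTS = {
    "insurance_certificate",
    "financial_statements",
    "compliance_certification",
}


def check_package(received, today):
    """received maps document type -> expiry date, or None if no expiry."""
    missing = sorted(REQUIRED_DOCUMENTS - set(received))
    expired = sorted(
        doc for doc, expiry in received.items()
        if doc in REQUIRED_DOCUMENTS and expiry is not None and expiry < today
    )
    return {
        "missing": missing,
        "expired": expired,
        "complete": not missing and not expired,
    }


# Hypothetical vendor submission.
received = {
    "insurance_certificate": date(2024, 12, 31),
    "financial_statements": None,
}
result = check_package(received, today=date(2025, 3, 1))
print(result)
```

For this submission the check reports a missing compliance certification and an expired insurance certificate on the day the package arrives, rather than weeks later during manual review.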
Why This Time Is Different
I’ve watched multiple “AI will transform procurement” hype cycles come and go. Why is this different?
1. The Technology Actually Works Now
Previous generations of AI in procurement:
Required extensive training data specific to your organisation
Broke on edge cases
Needed constant human correction
Never quite delivered the promised ROI
Current generation:
Works out of the box with minimal configuration
Handles edge cases reasonably or escalates appropriately
Improves with feedback but doesn’t require it to function
Delivers measurable ROI in weeks, not months
2. The Cost-Benefit Equation Flipped
2022: Implementing AI procurement tools
Setup cost: $50K-150K
Training time: 3-6 months
Ongoing maintenance: Significant
Time to value: 9-12 months
Unclear ROI
2025: Implementing AI agent platforms
Setup cost: $10K-30K (often less)
Training time: Days to weeks
Ongoing maintenance: Minimal
Time to value: Weeks
Clear, measurable ROI

The barrier to entry dropped sharply while the capability increased dramatically: setup costs fell roughly fivefold, and time to value shrank from most of a year to a matter of weeks.
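To see what those figures imply, here is a rough break-even sketch that combines the 2025 setup cost range with the vendor onboarding hours quoted earlier. The hourly staff cost is an assumption to replace with your own, and the calculation ignores licence fees and ongoing maintenance.

```python
import math

# Figures quoted in this post.
SETUP_COST_USD = (10_000, 30_000)  # 2025 setup cost range
MANUAL_HOURS = (40, 60)            # traditional vendor onboarding
AGENT_HOURS = (4, 8)               # AI-powered vendor onboarding

# Assumption: fully loaded hourly staff cost; replace with your own.
HOURLY_COST_USD = 60


def break_even_onboardings(setup_cost, hours_saved, hourly_cost):
    """Onboardings needed before saved labour covers the setup cost."""
    return math.ceil(setup_cost / (hours_saved * hourly_cost))


best_saving = MANUAL_HOURS[1] - AGENT_HOURS[0]   # 56 hours saved
worst_saving = MANUAL_HOURS[0] - AGENT_HOURS[1]  # 32 hours saved

best = break_even_onboardings(SETUP_COST_USD[0], best_saving, HOURLY_COST_USD)
worst = break_even_onboardings(SETUP_COST_USD[1], worst_saving, HOURLY_COST_USD)
print(f"Break-even after {best} to {worst} vendor onboardings")
```

At an assumed $60 per hour, the setup cost is covered after somewhere between 3 and 16 onboardings, depending on which end of each range applies.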
3. The Talent Pool Understands This Now
In 2022, if you wanted to implement AI in procurement, you needed:
Data scientists
ML engineers
Integration specialists
Extensive IT support
In 2025, you need:
Someone who understands procurement processes
Basic technical literacy
Willingness to experiment
The skill gap closed. Procurement professionals can now implement and manage AI agents without becoming technical experts. You need to understand what the agents can do and how to oversee them, not how they work under the hood.
This democratisation matters. It means adoption will accelerate rapidly because the technology isn’t confined to organisations with deep technical resources.
What This Means for Your Career
Let me be direct about something: procurement roles are changing. Not disappearing, but changing. And professionals who understand how to work with AI agents will be significantly more valuable than those who don’t.

Here’s the uncomfortable truth: if your primary value is reviewing contracts, extracting data from documents, conducting standard risk assessments, or managing routine vendor interactions, your role is increasingly automatable. Not someday. Now.
But here’s the opportunity: if you can:
Design agent workflows
Interpret agent outputs and make decisions
Identify which processes benefit from agent automation
Oversee agent performance and improve their operation
Focus on strategic supplier relationships, complex negotiations, and policy development
...then you become more valuable, not less.
Think about it this way: a procurement pro who can oversee 10 AI agents doing the work of 40 people is worth far more than someone who does the work of one person really well.
The learning curve is real. Understanding agent capabilities, knowing when to trust their outputs, and recognising their limitations are all new skills. But they’re learnable skills. And learning them now, while most of your peers are still debating whether this matters, gives you a significant advantage.
Why I’m Committing to Consistency
I’ve been publishing sporadically because I wanted to make sure what I was saying was grounded in reality, not speculation. But the technology has crossed a threshold. The things I’m testing and building aren’t experiments anymore. They’re production systems delivering measurable results.
It’s time to document what’s actually working, systematically and consistently.
Over the next few months, I’ll be documenting what I’m learning:
What AI agents can actually do in procurement (with real demonstrations)
Where they fail and why
How to implement them effectively
What this means for procurement careers and organisations
The uncomfortable truths vendors won’t tell you
This isn’t theoretical. Everything I write about, I’m building or testing. Every claim about capability, I’ll back with demonstrations. Every assertion about ROI, I’ll support with data.
My goal isn’t to convince you that AI agents are perfect (they’re not). It’s to help you understand what’s actually possible now so you can make informed decisions about your team, your organisation, and your career.
Because here’s what I know: the procurement professionals who understand this technology will shape the next decade of our profession. Those who don’t will spend the next decade reacting to changes they didn’t see coming.
What’s Next
In my next post, I’ll answer a basic question: What actually is procurement in 2026? The answer is fundamentally different from what it was 24 months ago.
I’ll also be creating video content for my World of Procurement YouTube channel, showing real AI agents working on real procurement tasks. Subscribe there if you want to see what I’m describing here in action.
The inflection point happened. The technology is ready. The question isn’t whether this will transform procurement. It’s whether you’ll be leading that transformation or reacting to it.
What questions do you have about AI agents in procurement? What would you like me to cover in future posts? Drop a comment below. I read and respond to all of them.
If you found this valuable, share it with your procurement network. They’ll thank you later.
Sources & References
Context Windows & Model Capabilities
IBM - What is a context window? (November 2024)
https://www.ibm.com/think/topics/context-window
Comprehensive overview of context window evolution from GPT-3 (4K tokens) through Claude 4 and Gemini 2 (1M tokens)
Anthropic - Claude Documentation: Context Windows (2025)
https://docs.claude.com/en/docs/build-with-claude/context-windows
Technical specifications for Claude’s 1M token context window capability
Skywork.ai - Claude 4.5 Context Length & Extended Memory Explained (September 2025)
https://skywork.ai/blog/claude-4-5-context-length-extended-memory/
Detailed analysis of Claude 4’s context window capabilities and practical applications
Codingscape - LLMs with Largest Context Windows (September 2025)
https://codingscape.com/blog/llms-with-largest-context-windows
Comparative analysis of context windows across major LLM providers
Hallucination Rates & Reliability
AllAboutAI - AI Hallucination Report 2026: Which AI Hallucinates the Most? (December 2025)
https://www.allaboutai.com/resources/ai-statistics/ai-hallucinations/
Comprehensive hallucination statistics showing Gemini 2.0-Flash at 0.7%, GPT-4o at 1.5%, up to 64% reduction rates in 2025
Vectara - Introducing the Next Generation of Vectara’s Hallucination Leaderboard (May 2025)
https://www.vectara.com/blog/introducing-the-next-generation-of-vectaras-hallucination-leaderboard
Industry-standard hallucination benchmarking showing dramatic improvements in frontier models
Lakera - LLM Hallucinations in 2025: How to Understand and Tackle AI’s Most Persistent Quirk (2025)
https://www.lakera.ai/blog/guide-to-hallucinations-in-large-language-models
Research on RAG reducing hallucinations by 71% and prompt-based mitigation cutting GPT-4o’s rate from 53% to 23%
Journal of Medical Internet Research - Hallucination Rates and Reference Accuracy of ChatGPT and Bard (May 2024)
https://www.jmir.org/2024/1/e53164/
Peer-reviewed study showing GPT-4 hallucination rate of 28.6% in early 2024, establishing baseline for comparison
Nature npj Digital Medicine - Framework to assess clinical safety and hallucination rates of LLMs (May 2025)
https://www.nature.com/articles/s41746-025-01670-7
Medical research demonstrating sub-human clinical error rates achievable through careful engineering
Contract Analysis & Extraction Accuracy
V7 Labs - AI Document Analysis: The Complete Guide for 2025 (2025)
https://www.v7labs.com/blog/ai-document-analysis-complete-guide
Detailed analysis showing AI document analysis accuracy frequently above 95% for structured data extraction
GEP - AI-Powered Contract Analysis: Benefits & Challenges (January 2025)
https://www.gep.com/blog/technology/ai-powered-contract-analysis-benefits-challenges
Procurement-focused analysis showing up to 80% reduction in contract review time with AI
AI in Contract Management Trends
Legartis - Trends 2025: AI in Contract Analysis (May 2025)
https://www.legartis.ai/blog/trends-ai-contract-analysis
European perspective on AI contract analysis trends, including explainable AI and multilingual capabilities
Additional Technical References
Wikipedia - Claude (language model) (January 2026)
https://en.wikipedia.org/wiki/Claude_(language_model)
Comprehensive overview of Claude’s development history and capabilities
arXiv - HalluLens: LLM Hallucination Benchmark (April 2025)
https://arxiv.org/html/2504.17550v1
Academic benchmark showing many SOTA models with hallucination rates below 5% as of December 2024
Frontiers in AI - Survey and analysis of hallucinations in large language models (August 2025)
https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1622292/full
Peer-reviewed academic survey on hallucination attribution and mitigation strategies
GitHub - Vectara Hallucination Leaderboard (Updated continuously)
https://github.com/vectara/hallucination-leaderboard
Open-source hallucination evaluation framework with continuously updated model comparisons
Daniel leads Product and Customer Marketing at Gatekeeper and sits on their AI Council. He has nearly a decade of experience across procurement, risk management, and supply chain, and has been testing AI agent capabilities in production environments since early 2024. Views expressed are his own.
