This is Part 4 of a four-part series on implementing practical AI governance using the NIST AI Risk Management Framework. We've established strategic governance (GOVERN), gained comprehensive visibility (MAP), and implemented meaningful measurement systems (MEASURE). Now we operationalize everything: moving from pilot purgatory to production scale while maintaining governance effectiveness. The MANAGE function transforms frameworks into sustainable operations that deliver consistent business value.
Most orgs get stuck between successful pilots and production deployment. Teams build governance frameworks that work perfectly for a few pilot projects but collapse under the operational complexity of managing dozens of AI systems in production. The frameworks look comprehensive, the pilots demonstrate clear value, but production deployment never happens.
Once again, this isn't a technology problem. Orgs succeed at pilot-scale governance (if there is such a thing) because they can dedicate specialized resources to each initiative. When scaling time arrives, they discover their governance approaches require manual oversight that doesn't scale with their AI portfolio ambitions. What works for a handful of pilots becomes a bottleneck for dozens of downstream system implementations.
Any focus on approval processes rather than operational management is a huge flag of the red variety. Just as in cybersecurity, ignoring daily operational realities will lead to heartache and worse.
The NIST AI Risk Management Framework addresses this directly through its MANAGE function [1]. The framework recognizes that governance effectiveness depends entirely on operational sustainability, but many orgs misinterpret the actual requirements when transitioning from the relatively controlled environment of pilots to the sometimes-stark reality of production operations.
Operational Challenges at Scale
We’ve seen a consistent pattern of trying to operationalize AI governance where the frameworks that seem airtight during controlled pilot phases often collapse once real-world pressures hit. What works in isolation may not in fact scale.
For one thing, resource constraints become immediately obvious. Ad hoc working-group or committee oversight that worked for pilot projects needs to retire once those projects move up from pilot status. Likewise, manual review processes that were sufficient for pilots become immediately insufficient in production.
Process complexity amplifies under operational pressure, as well. Approval workflows designed for careful, almost boutique evaluation become obstacles to routine maintenance, and change management as a whole becomes unworkable for even minor model updates and routine vendor upgrades. (Spoiler: There will be a lot of vendor upgrades to manage!)
But don’t lose hope! This is an opportunity for true integration of AI into existing governance operations. And no, AI doesn’t need “its own kind of governance”! As mentioned elsewhere in this series, it just demands that you expand governance scope to include the right variables and elements.
Research from multiple industries shows that successful operationalization requires building on existing frameworks rather than creating parallel systems. AstraZeneca's documented implementation leveraged existing pharmaceutical data governance structures, investing 2,000 person-hours over 12 months while maintaining approximately €20,000 (US$24,000 in April 2021) in certification costs [2]. This win cascaded from integration with established regulatory compliance processes rather than building AI-specific parallel systems.
The NIST MANAGE Function
The NIST MANAGE function defines four operational capabilities that enable sustainable AI governance at scale:
MANAGE 1: AI Risk Prioritization and Response
Establish systematic approaches to risk prioritization that integrate with existing operational risk management. This means adapting established incident response procedures to handle AI-specific issues rather than building in parallel. Orgs succeeding at scale extend their existing ITSM processes to include AI-related incidents and change management.
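To make the "extend your existing ITSM" idea concrete, here's a minimal sketch of AI-specific incident categories layered onto a generic ticketing model. The category names, queue names, and severity scale are my illustrative assumptions, not fields from any particular ITSM product.

```python
from dataclasses import dataclass

# Illustrative AI-specific categories added to an existing incident taxonomy.
AI_CATEGORIES = {"model_drift", "prompt_injection", "hallucination", "vendor_model_update"}

@dataclass
class Incident:
    id: str
    category: str
    severity: int  # 1 (critical) .. 4 (low), matching typical ITSM scales

def route_incident(incident: Incident) -> str:
    """Route through existing ITSM queues; high-severity AI incidents get an extra owner."""
    if incident.category in AI_CATEGORIES and incident.severity <= 2:
        # Same ticketing system, same SLAs -- just one additional queue.
        return "ai-governance-queue"
    return "standard-ops-queue"
```

The point of the sketch: AI incidents flow through the same pipeline as everything else, and only a routing rule changes, which is exactly the "extend, don't parallel-build" pattern.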
Operational Warning: Risk prioritization systems often fail when they rely too heavily on historical data patterns that don't account for AI's rapid evolution. New attack vectors, regulatory changes, and model capabilities can emerge faster than risk frameworks can adapt. And if you need proof of that, check your favorite news outlet.
The Red/Yellow/Green classification system we’ve been orienting around in this series then becomes operational infrastructure, but implementation requires careful attention to avoid oversimplification. And goodness knows, we’ve seen a great deal of simple thinking when it comes to “full steam ahead” implementation of GenAI.
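One way to keep the Red/Yellow/Green tiers from becoming oversimplified labels is to wire them directly into approval routing. A minimal sketch, with tiering rules, review paths, and recheck intervals that are all illustrative assumptions:

```python
# Each tier maps to an operational approval path, not just a color on a slide.
APPROVAL_PATHS = {
    "green":  {"review": "automated", "human_signoff": False, "recheck_days": 180},
    "yellow": {"review": "peer",      "human_signoff": True,  "recheck_days": 90},
    "red":    {"review": "committee", "human_signoff": True,  "recheck_days": 30},
}

def classify(handles_pii: bool, customer_facing: bool, autonomous_actions: bool) -> str:
    """Coarse, auditable tiering -- deliberately simple so it can't be argued around."""
    if autonomous_actions or (handles_pii and customer_facing):
        return "red"
    if handles_pii or customer_facing:
        return "yellow"
    return "green"

def approval_path(handles_pii: bool, customer_facing: bool, autonomous_actions: bool) -> dict:
    return APPROVAL_PATHS[classify(handles_pii, customer_facing, autonomous_actions)]
```

The recheck interval is what makes this operational infrastructure rather than a one-time label: every tier implies a cadence, not just a gate.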
MANAGE 2: AI System Benefit Maximization Strategies
Develop operational procedures that maximize AI system benefits while maintaining control. This requires clear understanding of how AI systems integrate with business processes and systematic approaches to optimizing performance without compromising governance.
Successful orgs focus on business outcomes rather than technical metrics. They measure deployment velocity improvements, resource efficiency gains, and stakeholder satisfaction rather than just technical performance indicators.
MANAGE 3: Third-Party AI Risk Management
Build systematic vendor management processes that scale with AI portfolio growth. I haven’t met a single org yet that didn’t have multiple vendors leading with AI. Because those vendors are often spread across different risk categories and business functions, this requires standardized approaches rather than treating each vendor relationship as unique.
Research demonstrates a clear case for such systematic approaches. Automated vendor assessment systems reduce human error and provide consistent evaluation criteria. Bloomberg Law research finds AI contract analysis tools achieve 94% accuracy in identifying contractual deviations, completing reviews in under 30 seconds versus hours for manual review [3]. That’s strong on paper, but the law firms I’ve spoken with still hesitate. For them, knowing how a decision was made is even more important than speed.
And now a familiar warning: Vendor automation creates new blind spots. At present, automated assessments may (read: will!) miss emerging risks, vendor relationship changes, or novel threat patterns. Vendors also push back against standardized assessments, arguing their solutions require custom evaluation approaches. That argument holds only “for now,” in my opinion; as standards mature beyond ISO 42001, AI customers will have firmer ground to insist on standardized evaluation.
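Here's what a standardized vendor scorecard can look like in practice. The criteria and weights below are illustrative assumptions, not a published rubric; the design point is that the rubric is identical for every vendor, and edge cases (missing answers, low scores) escalate to a human rather than being silently auto-approved.

```python
# Weighted criteria applied identically to every AI vendor (weights sum to 1.0).
CRITERIA = {
    "model_transparency":  0.30,
    "data_residency":      0.25,
    "change_notification": 0.25,
    "incident_sla":        0.20,
}

def score_vendor(answers: dict) -> tuple[float, bool]:
    """Return (weighted score 0..1, needs_human_review).

    Answers are 0.0..1.0 per criterion. Missing answers or low scores
    escalate -- automation handles the routine middle, not the edges.
    """
    missing = [c for c in CRITERIA if c not in answers]
    score = sum(CRITERIA[c] * answers.get(c, 0.0) for c in CRITERIA)
    needs_review = bool(missing) or score < 0.6
    return round(score, 2), needs_review
```

A vendor who answers everything well sails through; one who leaves "model_transparency" blank lands in a human's queue, which is exactly where the novel cases belong.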
MANAGE 4: AI Risk Treatment Documentation and Monitoring
Implement continuous monitoring and documentation systems for AI risk treatment decisions. This goes beyond initial assessment to systematic tracking of how risk treatment approaches perform over time and adjustment based on operational experience.
Orgs need automated monitoring systems that handle routine oversight while escalating exceptions to human review. OneTrust case studies demonstrate governance controls that automatically enforce policies in real-time, cutting oversight costs by over 99% through automation [4]. But checklist-oriented systems are not enough.
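The "automate the routine, escalate the exceptions" pattern is simple to express in code. A minimal sketch, with metric names and tolerance bands that are purely illustrative:

```python
# Exception-based oversight: readings inside tolerance are auto-handled,
# anything outside lands in a human review queue.
def triage_metrics(metrics: dict, tolerances: dict) -> dict:
    """Split metric readings into routine (auto-handled) vs. escalated."""
    routine, escalate = {}, {}
    for name, value in metrics.items():
        lo, hi = tolerances.get(name, (float("-inf"), float("inf")))
        (routine if lo <= value <= hi else escalate)[name] = value
    return {"routine": routine, "escalate": escalate}

result = triage_metrics(
    {"latency_ms": 120, "drift_score": 0.42, "refusal_rate": 0.03},
    {"latency_ms": (0, 500), "drift_score": (0, 0.2), "refusal_rate": (0, 0.1)},
)
# drift_score exceeds its tolerance band, so it alone is escalated
```

Humans never see the routine bucket; that's the 99% cost reduction. The governance question is whether your tolerance bands are honest, which is the checklist problem the next section takes up.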
The Limits of Checklist Governance
Traditional checklist management approaches prove insufficient for the emerging complexity of MCP-enabled, multi-agent systems. While automated monitoring systems like OneTrust* demonstrate significant cost reductions through policy enforcement, they operate within established, predictable frameworks that don't account for the distributed nature of modern AI deployments.
For example, the Model Context Protocol's architecture creates new governance blind spots that checklist approaches cannot address and are nontrivial to automate. Configuration drift occurs when authentication checks are inadvertently disabled or encryption requirements become inconsistent across distributed MCP servers, and the more complex that matrix of involved systems, the more likely that drift is to occur. Even when tools appear secure in isolation, their interactions can create unexpected data pathways, particularly where permission boundaries are misaligned or eroded over time. Security researchers identified multiple outstanding issues in April 2025, including prompt injection attacks and lookalike tools that can silently replace trusted ones.
*I’m really not singling out OneTrust, theirs just happens to be the research I referenced here!
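Configuration drift of the kind described above is detectable if you declare a security baseline and diff each server against it. A hypothetical sketch: the settings keys below are illustrative stand-ins, not actual MCP configuration fields.

```python
# Declared security baseline for all distributed servers (keys are illustrative).
BASELINE = {"auth_required": True, "tls": True, "tool_allowlist_enforced": True}

def find_drift(servers: dict) -> dict:
    """Map server name -> settings that diverge from (or are missing vs.) the baseline."""
    drift = {}
    for name, cfg in servers.items():
        diffs = {k: cfg.get(k) for k, v in BASELINE.items() if cfg.get(k) != v}
        if diffs:
            drift[name] = diffs
    return drift

drift = find_drift({
    "mcp-files":  {"auth_required": True, "tls": True, "tool_allowlist_enforced": True},
    "mcp-search": {"auth_required": False, "tls": True},  # auth silently disabled, allowlist missing
})
```

Note that a missing setting is reported as drift too; "inadvertently disabled" usually looks like an absent key, not an explicit `False`.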
Multi-agentic AI systems amplify these challenges through what researchers call "cascading hallucinations." When one AI agent generates inaccurate information, it gets reinforced through memory, tool use, and multi-agent interactions, amplifying misinformation across multiple decision-making steps. This creates self-reinforcing destructive behaviors that traditional monitoring systems cannot detect or prevent. Observability becomes a Russian nesting doll of causality.
The fundamental design principle of agentic AI creates direct conflict with governance oversight. These systems explicitly aim to take humans "out of the loop" through autonomous task execution, yet governance frameworks depend on human oversight as a foundational principle. When autonomous agents act independently across multiple systems, determining accountability becomes significantly more complex than current governance structures can handle.
We all need end-to-end automation that can monitor agent behavior, track multi-step workflows, and maintain accountability across distributed AI systems. This requires moving beyond checklist-based governance toward what researchers call "AgentOps", the idea of automated monitoring, oversight, and orchestration of multi-agent systems that can track actions, behaviors, tool usage, and impact in real-time.
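At its core, an AgentOps-style capability starts with an append-only audit trail: every agent action recorded with enough context to reconstruct a multi-step workflow after the fact. A minimal sketch; the field names are my assumptions, not a standard schema.

```python
import time

class AgentAuditLog:
    """Append-only record of agent actions for accountability review."""

    def __init__(self):
        self.events = []

    def record(self, agent: str, action: str, tool: str, inputs: dict) -> None:
        self.events.append({
            "ts": time.time(), "agent": agent, "action": action,
            "tool": tool, "inputs": inputs,
        })

    def trace(self, agent: str) -> list:
        """Reconstruct one agent's workflow, in order, from the shared log."""
        return [e for e in self.events if e["agent"] == agent]

log = AgentAuditLog()
log.record("planner", "invoke_tool", "web_search", {"query": "vendor SLA"})
log.record("executor", "invoke_tool", "file_write", {"path": "report.md"})
```

This only solves the *tracking* half of AgentOps; real-time intervention on a trace like this is where the hard research problems live.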
Practical Implementation
Rather than following rigid phase-based approaches, research supports adaptive implementation that responds to organizational context and operational experience. This still needs project management and goalposts, don’t get me wrong, but the nascent nature of AI governance will slow you down. We just can’t follow neatly staged approaches, as different concerns emerge dynamically rather than in a predictable way, especially as you peel back the layers of vendor-delivered solutions.
Phase 1: Integration Foundation
I’ll say it again: Start by integrating AI governance with existing operational infrastructure rather than building parallel systems. Effective AI governance platforms integrate seamlessly with existing infrastructure to avoid silos.
Focus on operational bottlenecks that prevent the move from pilot to production. Your existing approval pathways can handle routine changes without committee review, and automated monitoring systems generate exception-based alerts rather than requiring routine human oversight. It will take time to identify the right mechanism for oversight, whether that's log monitoring or an injection-based approach, and this is the time to figure that out.
Implementation Checklist:
✓ ITSM integration for AI-related incident response and change management
✓ Dashboard integration for operational monitoring within existing systems
✓ Vendor management integration leveraging established procurement and vendor diligence processes
✓ Change management integration for routine updates and maintenance procedures
Phase 2: Scaling Operations
Build systematic approaches to vendor relationship management that leverage existing procurement and contract management processes. This operationalizes the MANAGE 3 principles outlined above through specific implementation steps.
Evidence shows successful scaling requires moving beyond bespoke, one-off vendor management. McKinsey research identifies technical enablers including feature stores, code assets, standards and protocols, and MLOps technology that enable consistent vendor relationship management [5].
Scaling Checklist:
✓ Standardized evaluation criteria across all AI vendors
✓ Automated assessment processes for routine contract renewals
✓ Performance tracking integrated with existing vendor scorecards
✓ Escalation procedures for governance violations
Phase 3: Continuous Optimization
Continuous improvement remains just as important as initial design, and in implementing that you need to ground your feedback loops in operational experience rather than theoretical expectations. They should track approval steps that introduce friction without delivering material risk reduction, adjust alerting thresholds where monitoring produces false positives, and routinely assess governance overhead for effectiveness and efficiency. I’m all for speed, but if you have a bottleneck that doesn’t improve outcomes, you’ve got a problem.
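"Track approval steps that introduce friction without delivering material risk reduction" can be made measurable: a step that never rejects or modifies anything, yet adds days of delay, is pure friction. A sketch with illustrative numbers and an assumed threshold of two days:

```python
# Flag approval steps that are slow and have never changed an outcome.
def friction_report(steps: list[dict]) -> list[str]:
    """Return names of steps that add delay but never reject or modify anything."""
    flagged = []
    for s in steps:
        never_decisive = s["rejections"] == 0 and s["modifications"] == 0
        if never_decisive and s["median_delay_days"] >= 2:
            flagged.append(s["name"])
    return flagged

report = friction_report([
    {"name": "security_review", "rejections": 4, "modifications": 9, "median_delay_days": 3},
    {"name": "vp_signoff",      "rejections": 0, "modifications": 0, "median_delay_days": 5},
])
# security_review is slow but decisive; vp_signoff is slow and never decisive
```

A flagged step isn't automatically deleted; it's a candidate for the conversation about whether it still earns its delay.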
Research reveals some concerning problems with leaning too hard on automation for governance decisions, though. A study with 2,854 participants found no evidence that automation actually improves governance outcomes. Worse, automation led to "selective adherence": people followed the algorithmic advice only when it confirmed what they already believed [6].
You have to maintain real human oversight in automated systems to prevent these selective adherence problems. The Dutch Childcare Benefits Scandal demonstrates exactly what can go wrong. In that debacle, the Netherlands used automated systems to flag families for childcare benefit fraud, but the algorithms systematically targeted families with dual citizenship or foreign-sounding names. When human reviewers were removed from the process, the system flagged over 26,000 families as fraudulent based on algorithmic bias, devastating families financially before anyone realized the discrimination was fully baked into the system [7].
Operational Success Indicators
Research validates four operational indicators that demonstrate successful AI governance scaling:
Deployment Velocity: Strong evidence identifies time-to-deployment as a critical indicator for success. To this end, measure the time required to move from concept to production, including the duration of any governance-related approval steps, while ensuring that safety and compliance remain intact throughout the process.
Resource Efficiency: Moderate-at-best evidence supports tracking processing efficiency metrics and cost-value analysis, though comprehensive empirical validation remains limited. In general, orgs should be skeptical of claims about dramatic resource efficiency gains without careful measurement.
Incident Response Effectiveness: The strongest evidence supports measuring incident response capabilities. The NIST AI RMF emphasizes incident response time measurement, and multiple academic studies identify this as critical for operational success. Orgs should track mean time to resolution and incident severity trends, but the good news is that it’s just a configuration item or two in existing ITSM systems to track things like TTR, MTTR, etc.
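The MTTR arithmetic your ITSM configuration items would track is straightforward; a minimal sketch over (opened, resolved) timestamp pairs:

```python
from datetime import datetime, timedelta

def mttr(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Mean time to resolution over (opened, resolved) pairs."""
    total = sum((resolved - opened for opened, resolved in incidents), timedelta())
    return total / len(incidents)

avg = mttr([
    (datetime(2025, 1, 1, 9), datetime(2025, 1, 1, 13)),  # resolved in 4 hours
    (datetime(2025, 1, 2, 9), datetime(2025, 1, 2, 11)),  # resolved in 2 hours
])
```

The interesting governance signal is usually the trend of this number segmented by incident severity and by AI category, not the single aggregate.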
Stakeholder Confidence: Finally, emerging evidence supports measuring stakeholder satisfaction through surveys, trust metrics, and explainability scores, though measurement approaches vary significantly across orgs. That said, don't fall into the trap of relying on single metrics for customer satisfaction, treating complex AI interactions like Netflix likes or Uber ratings. Governance decisions affect real business outcomes, so you need feedback mechanisms that capture whether stakeholders actually believe the AI is helping rather than just being tolerated. For example, ask developers whether the AI code review suggestions save them time or create busywork, or survey business users about whether automated approval decisions feel fair and transparent rather than just fast.
Vendor Management at Scale
Operational AI governance requires systematic approaches to vendor relationships, though the tooling for automated compliance is still nascent, so you'll need some creativity and structure to make this work.
This is a great place to exercise those Red/Yellow/Green risk levels again. Successful orgs build consistent vendor management processes while maintaining flexibility for different vendor categories and risk levels. This includes standardized contract language that addresses AI-specific requirements like model transparency, data processing location, and change notification requirements.
The research shows that systematic approaches outperform bespoke vendor management, but the market's not quite there yet for full automation. So, for now at least, it’s important to maintain human judgment for complex decisions and avoid over-reliance on automated assessment systems that can't handle edge cases or novel vendor relationships.
Learning from Implementation Failures
Implementation failures in AI governance often follow predictable patterns when orgs don't properly operationalize the NIST MANAGE function. Systems get deployed without adequate testing of risk management processes. Governance frameworks that work in controlled pilots break down when exposed to the complexity of production environments with multiple stakeholders and competing priorities.
The research reveals a clear pattern, in that rigid frameworks break when they hit messy realities. You need governance that can adapt based on what you're actually seeing, not conceptual ideals.
Critical Implementation Warnings
Automation Bias: Studies show that people cherry-pick algorithmic advice, accepting recommendations that confirm their existing beliefs while ignoring contradictory guidance. You can't just set up automated systems and walk away.
Resource Efficiency Overselling: Be skeptical of dramatic efficiency claims. Research shows that AI efficiency gains can paradoxically increase overall resource consumption as usage grows, similar to how fuel-efficient cars led to more driving.
Standardization Rigidity: Cookie-cutter approaches often fail because every org has different constraints, cultures, and risk tolerances. What works for a financial services company won't necessarily work for a manufacturing operation.
Wrapping Up
That brings the series to a close! From early governance foundations through visibility, trust, and operational execution, each function of the NIST AI RMF builds on the last. We’ve seen the shift from theory to practice take time, but the orgs tracking to success are the ones starting with the right scaffolding in place. Good luck!
Connect with me to discuss your organization's operational AI governance challenges and scaling strategies.
Read the complete NIST AI RMF Implementation Series:
Part 4: Managing
References
[1] National Institute of Standards and Technology. "AI Risk Management Framework (AI RMF 1.0)" and "AI RMF Playbook." NIST, January 2023, updated July 2024. https://www.nist.gov/itl/ai-risk-management-framework
[2] Frontiers in Computer Science. "Challenges and Best Practices in Corporate AI Governance: Lessons from the Biopharmaceutical Industry." 2024. https://www.frontiersin.org/articles/10.3389/fcomp.2022.1068361/full
[3] Debevoise Data Blog. "Good AI Vendor Risk Management Is Hard, But Doable." September 2024. https://www.debevoisedatablog.com/2024/09/26/good-ai-vendor-risk-management-is-hard-but-doable/
[4] OneTrust. "AI Governance Solutions." 2025. https://www.onetrust.com/solutions/ai-governance/
[5] McKinsey & Company. "Scaling AI for Success: Four Technical Enablers for Sustained Impact." 2024. https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/tech-forward/scaling-ai-for-success-four-technical-enablers-for-sustained-impact
[6] Taylor & Francis Online. "Algorithmic Decision-Making and System Destructiveness: A Case of Automatic Debt Recovery." 2021. https://www.tandfonline.com/doi/full/10.1080/0960085X.2021.1960905
[7] Springer. "Toward AI Governance: Identifying Best Practices and Potential Barriers and Outcomes." Information Systems Frontiers, 2022. https://link.springer.com/article/10.1007/s10796-022-10251-y
CREDITS: base cover image from http://airc.nist.gov; Anthropic Claude Sonnet 4 for editorial review