NIST Guidance on Generative AI Security & Prompt Injection in 2026

Introduction

Generative AI is already running payroll systems, processing loan applications, and executing customer-facing workflows at scale. Security teams are defending that reality with tools built for a different one.

Traditional controls — firewalls, DLP, code scanners — operate on network traffic, data patterns, and static code. None of them see adversarial instructions embedded in natural language. With OWASP classifying prompt injection as LLM01:2025 — the top threat against large language models — deployed risk has outpaced available defenses.

NIST has responded. Between July 2024 and early 2026, the agency published or initiated four significant frameworks:

NIST AI 600-1 — generative AI risk guidance
NIST AI 100-2 (March 2025 update) — adversarial machine learning
NCCoE Agent Identity Concept Paper — identity management for AI agents
AI Agent Standards Initiative — emerging standards for agentic systems

Taken together, they give enterprises a threat taxonomy and a direction for action.

What follows breaks down what those documents actually say, how prompt injection behaves in agentic deployments, and the controls security teams need before the next incident — not after it.

Key Takeaways

NIST AI 100-2 E2025 (March 2025) expanded its adversarial ML taxonomy to cover autonomous AI agent vulnerabilities — indirect prompt injection, tool supply chain attacks, and multi-agent abuse
Prompt injection (OWASP LLM01:2025) is the primary vector NIST identifies against generative AI — and it's structurally more dangerous when agents act on retrieved data autonomously
NIST research clocked novel agent attacks at an 81% task-hijacking success rate versus 11% for known baselines — most enterprise defenses are calibrated for the wrong threat
COSAiS will extend SP 800-53 with AI deployment overlays; enterprises in regulated sectors should start gap analysis now
Runtime enforcement is required to operationalize NIST guidance; input filters reduce surface area but cannot fully stop prompt injection against stochastic LLMs

NIST's 2025–2026 Generative AI Security Landscape

NIST's AI security publications have moved from governance scaffolding to operational specifics over the past two years. Understanding which document does what is the first step to using them effectively.

The Core Document Stack

Document	Published	Primary Purpose
NIST AI 100-1 (AI RMF 1.0)	January 2023	Governance framework: Govern, Map, Measure, Manage
NIST AI 600-1	July 2024	GenAI risk profile — 12 risks unique to or exacerbated by generative AI
NIST AI 100-2 E2025	March 2025	Adversarial ML taxonomy — now includes agentic AI attack categories
NCCoE Agent Identity Concept Paper	February 2026	OAuth 2.0, SPIFFE/SPIRE, workload identity for AI agents

AI 600-1 is a companion to the AI RMF, not a standalone compliance document. Its 12 named risks include confabulation, data privacy, information integrity, and information security — the last of which covers the lowered barriers to offensive cyber that generative AI creates. Regulated industries should treat all 12 as audit-ready categories now, not after an examiner asks.

AI 100-2 E2025 closed a significant gap in the 2023 edition. The March 2025 update extended NIST's adversarial ML taxonomy to cover AI agent vulnerabilities — indirect prompt injection, agent tool supply chain attacks, and agentic deployment contexts — that the original release addressed only partially.

The AI Agent Standards Initiative

On February 17, 2026, NIST announced the AI Agent Standards Initiative (CAISI), built on three pillars: agent interoperability standards, open-source protocol development (including Model Context Protocol), and fundamental research in agent identity and authorization.

NIST's January 2026 RFI (NIST-2025-0035) drew 937 submissions, including formal responses from the OpenID Foundation and a financial services coalition comprising the American Bankers Association, Bank Policy Institute, and BITS. That depth of engagement — spanning identity standards bodies and major banking trade groups — means security teams can expect agent identity and authorization requirements to surface in regulatory guidance well before industry consensus is reached.

Prompt Injection: NIST's #1 Generative AI Threat

Prompt injection is the attack class where user prompts or external data alter LLM behavior in unintended ways. NIST AI 100-2 E2025 defines it as untrusted input combined with a higher-trust prompt to produce behavior the system was never designed to allow.

Direct vs. Indirect Injection

The distinction matters enormously in enterprise deployments:

Direct prompt injection — an attacker crafts malicious input in the prompt itself, requiring direct access to the user interface
Indirect prompt injection — adversarial instructions are embedded in documents, emails, web pages, or database records that the model retrieves and acts on, with no direct attacker access required

Indirect injection is the harder problem. The malicious instruction arrives through a legitimate retrieval channel: a PDF the agent was asked to summarize, a calendar entry it was asked to reschedule, a web page it pulled for research. Input validation at the query layer doesn't see it.

Why Prevention Alone Is Insufficient

NIST AI 100-2 E2025 is unusually direct on this point: many mitigations are empirical and lack theoretical guarantees. Model-based detection cannot prevent all impermissible outputs. This isn't a gap NIST expects to close with better prompting. It's a structural property of how LLMs process natural language.

Successful prompt injection in enterprise deployments can result in:

System prompt leakage and sensitive data disclosure
Unauthorized execution of connected-system commands
Privilege escalation across integrated services
In agentic systems: cascading tool calls, data exfiltration, or downstream agent manipulation

The NIST-Aligned Mitigation Hierarchy

No single control closes the gap, but layered controls reduce it substantially:

Constrain model behavior via system prompt controls and behavioral guardrails
Enforce least-privilege access so agents can't act beyond their intended scope
Implement semantic input and output filtering — not just pattern matching
Segregate and label untrusted external content before it enters the model's context
Require human approval for high-risk or irreversible agent actions
Red-team regularly against indirect injection vectors, using frameworks like NIST's open-source AgentDojo-Inspect toolchain

6-step NIST-aligned prompt injection mitigation hierarchy layered defense framework

The Multimodal Expansion

Those layered controls assume text-based input. Cross-modal injection removes even that assumption.

NIST AI 100-2 E2025 and OWASP both flag cross-modal injection as an emerging and under-defended threat category. Adversarial instructions can be hidden in images processed alongside text, bypassing text-focused filters entirely. Anthropic's November 2025 research on prompt injection in browser-use agents describes exactly this class of attack, where instructions hidden in web content hijack Claude Opus 4.5's browsing behavior.

Obfuscated variants using Base64 encoding, multilingual inputs, or adversarial suffixes complicate filter-based defenses further still.

Agentic AI and the Expanding Prompt Injection Attack Surface

Agentic architectures change the risk calculus. An autonomous agent doesn't just generate text — it reads from external data sources, writes back to them, invokes tools, chains tasks to subordinate agents, and accumulates context across sessions. A successful injection no longer manipulates a single model output — it can hijack an entire workflow.

The 81% Finding

NIST's January 2025 technical blog, "Strengthening AI Agent Hijacking Evaluations," reported a result that every security team running AI in production should internalize: novel attack techniques targeting agent-specific behavioral patterns achieved an 81% task-hijacking success rate, versus 11% for the strongest known baseline attack.

Three implications follow:

Defenses calibrated to known attack playbooks substantially underestimate attacker capability against real agent deployments
Single-shot evaluations are insufficient — multi-attempt testing (25 attempts per task in the NIST evaluation) raised success rates further and better reflects real attacker behavior
Per-task risk variation is wide; aggregate statistics mask operations where hijacking succeeds at near-100% rates

Agent Authorization: The Gap Nobody Fixed

NIST's NCCoE concept paper (February 2026) identified a structural problem that most enterprise AI deployments share: agents either inherit full user-level permissions (over-privileged) or operate under generic service account credentials (unauditable). Neither model supports least-privilege enforcement.

The concept paper proposes binding agent permissions to specific delegated tasks and time windows, using:

OAuth 2.0 scoping for task-specific permission grants
Pushed Authorization Requests for secure, server-side authorization initiation
SPIFFE/SPIRE workload identity for cryptographic agent authentication

AI agent least-privilege authorization model using OAuth SPIFFE and delegated permissions

This isn't a finished standard — it's a direction. Enterprises building agent deployments today should be designing authorization architectures that can accommodate these patterns when they mature.

COSAiS: The Compliance Trajectory

NIST's COSAiS project (Control Overlays for Securing AI Systems) will deliver dedicated SP 800-53 overlays for single-agent and multi-agent deployments, translating abstract security requirements into AI-specific implementation guidance.

NIST has not confirmed an official finalization timeline. Enterprises in regulated sectors shouldn't wait — beginning gap analysis against available COSAiS materials now positions teams ahead of the compliance curve rather than scrambling when overlays finalize.

Operationalizing NIST Guidance: What Security Teams Must Do Now

The four AI RMF core functions — Govern, Map, Measure, Manage — provide the operating structure. Here's what each requires in practice for AI security in 2026.

Step 1: Build an AI Agent Inventory

You cannot protect what you haven't enumerated. Every production agent deployment — commercial, vendor-provided, or internally built — needs documentation covering:

Scope of tool and API access
Data sources read from and written to
Human identities the agent acts on behalf of
The authorization model governing its actions

This inventory is the prerequisite for gap analysis against COSAiS overlays. Enterprises that cannot enumerate deployed agents cannot demonstrate compliance, regardless of which framework applies.

Step 2: Expand Red Team Coverage to Agent-Specific Attack Vectors

Most existing penetration testing programs test direct adversarial inputs. They do not test indirect prompt injection via documents, calendar entries, email, or retrieved web content — which is where real attackers will focus.

Red team exercises should explicitly scope these channels, using NIST's AgentDojo-Inspect toolchain or equivalent frameworks. Use multi-attempt testing protocols rather than single-shot evaluations. The NIST finding of 81% success over 25 attempts versus lower single-shot rates illustrates why single evaluations understate probabilistic LLM risk.

Step 3: Enforce Least-Privilege Authorization for Every Agent Action

Map current agent access patterns against the NCCoE's delegated authorization model. Agents running under full user permissions or generic service accounts represent unacceptable risk — not because of what they're doing now, but because a successful injection attack inherits whatever permissions the agent holds.

Narrowly scoped agents limit the blast radius of any attack that gets through. An agent with read-only access to a specific data source cannot exfiltrate from sources it was never authorized to touch, even if an adversarial instruction tries.

Step 4: Implement Runtime Detection as the Final Control Layer

Policy controls and input validation reduce surface area. They don't eliminate prompt injection risk — and NIST says so explicitly. Runtime enforcement is the layer that catches what pre-deployment controls miss.

This means detection at inference time: analyzing what the model receives and how it responds, validating tool calls before they execute, and enforcing per-action scope in real time.

PromptHalo's runtime security platform operates inline on every inference, tool call, and agent-to-agent handoff, making allow/restrict/challenge/deny/monitor decisions in under 100ms. ML-based detection achieves a stated catch rate above 95% at under 5% false positives, with no model retraining or code changes required. The platform deploys via API gateway, agent mode, or inline middleware in under a day.

Step 5: Generate Decision-Level Audit Trails

Application-level logging — recording that a request was made and a response returned — is not sufficient for AI compliance. Decision-level logging captures what the model received, what it decided, what tools were invoked, and under what authority, as a tamper-evident record at the per-inference level.

This distinction matters for two reasons. Anomaly detection requires the granularity to surface behavioral drift that aggregate logs would obscure. And regulatory evidence requires records that demonstrate control effectiveness at the decision level, not just system availability.

PromptHalo's append-only audit logs capture each decision with its reason, the acting agent identity, session and tenant context, and timestamp. Each record is structured for compliance export and mapped to NIST AI RMF, OWASP LLM Top 10, and the EU AI Act.

What This Means for Regulated Enterprises

The NIST AI RMF remains a voluntary framework. But the compliance landscape around it is anything but passive.

The financial services coalition's joint response to NIST's RFI — signed by the American Bankers Association, Bank Policy Institute, and BITS — called for voluntary, consensus-based guidance on agent deployments and secure automated integrations. The OpenID Foundation's response focused on agent identity, workload identity federation, and transaction tokens as foundational requirements.

These aren't abstract positions. They reflect where financial institutions are actively deploying and where regulatory conversations are heading.

The Audit Readiness Gap

Most existing audit programs derive from traditional IT frameworks. They cover network controls, access management, and data handling — but bias drift, retrieval poisoning, and indirect prompt injection fall entirely outside their scope. Security teams preparing for AI audits in 2026 need to:

Extend control libraries to include NIST AI 100-2 E2025 threat categories
Demonstrate that controls were tested against agent-specific attack scenarios
Produce evidence at the decision level, not just system-level logs

Three-step AI audit readiness checklist for enterprises aligning with NIST AI 100-2 requirements

The gap between what existing audit programs cover and what AI systems actually do is the single most common compliance risk in enterprise AI deployments right now.

Compliance as Competitive Differentiation

In regulated sectors — fintech, payments, healthcare — demonstrating a documented AI security posture is becoming a procurement prerequisite, one that directly affects which deals get closed. Enterprises that can show runtime controls, red team results, and audit-ready logs aligned to recognized frameworks close deals that competitors without that documentation cannot.

Treating NIST AI RMF alignment as a proactive program — rather than a reactive compliance exercise — produces measurable advantages:

Faster deployment: documented controls reduce internal approval cycles for new AI features
Lower regulatory friction: pre-mapped evidence shortens examiner review timelines
Procurement leverage: a verifiable AI security posture becomes a differentiator, not just a checkbox

Frequently Asked Questions

What is NIST AI 100-2 and how does it address prompt injection in 2025?

NIST AI 100-2 is NIST's authoritative taxonomy for adversarial AI attacks and mitigations. The March 2025 edition (E2025) expanded coverage to include autonomous AI agent vulnerabilities — including indirect prompt injection and supply chain attacks on agent tools — that were only partially addressed in the 2023 edition. It also acknowledges that current mitigations cannot provide fool-proof prevention.

Is NIST's generative AI security guidance mandatory for enterprises?

The AI RMF and companion documents (AI 600-1, AI 100-2) are voluntary frameworks. However, federal agencies, financial regulators, and enterprise procurement processes increasingly reference them. Enterprises in regulated sectors should treat alignment as a practical requirement, not optional guidance.

What is the difference between direct and indirect prompt injection?

Direct prompt injection involves an attacker crafting malicious input in the prompt itself, requiring direct access to the interaction. Indirect prompt injection embeds adversarial instructions in external data sources — documents, emails, web pages — that the LLM retrieves and acts on. In agentic systems, indirect injection is more dangerous because it arrives through legitimate retrieval channels, bypassing query-time input validation entirely.

What is COSAiS and why should enterprises monitor it?

COSAiS (Control Overlays for Securing AI Systems) is NIST's project to extend SP 800-53 with AI-specific implementation overlays, including for single-agent and multi-agent deployments. NIST hasn't confirmed a final publication date, but enterprises in regulated sectors should begin gap analysis against available materials now rather than waiting.

Why do traditional security tools fail to detect prompt injection?

Firewalls, DLP tools, and code scanners operate on network traffic, data patterns, and static code — none of which captures the semantics of adversarial instructions embedded in natural language or retrieved content. Detecting prompt injection requires inference-time analysis of what the model receives and how it responds, which is outside the visibility of conventional security stacks.

How can organizations use the NIST AI RMF to build defenses against generative AI threats?

The four core functions map directly to practical controls:

Govern: Establish AI risk ownership and policies covering GenAI threats
Map: Enumerate all agents and external data sources
Measure: Red team against NIST AI 100-2 E2025 attack categories using multi-attempt testing
Manage: Implement runtime enforcement and decision-level audit trails that demonstrate continuous control effectiveness