Agent & security testing

Arga includes security testing in every validation workflow and offers dedicated tools for testing AI agent behaviour. This page covers both.

Red-teaming: built into every run

Red-teaming is included when Arga generates validation plans for Runs, PR Checks, and Sandboxes. Arga includes adversarial scenarios alongside functional tests, so you get security and resilience coverage as part of the same validation flow.

For software

Arga fuzzes your application with adversarial inputs and unexpected interaction patterns to uncover bugs that normal testing misses. Because all external calls route through digital twins, this happens safely without affecting real services. What Arga looks for:

Unhandled error states and edge cases
Race conditions and state corruption
Unexpected behaviour under simulated failures (timeouts, rate limits, malformed responses)

For AI agents

AI agents present unique risks — they can take unexpected actions, chain tools in harmful ways, or leak sensitive information. Arga places agents in controlled environments and probes their behaviour with adversarial prompts and unusual scenarios. What Arga tests:

Prompt injection and jailbreak resistance
Correct handling of ambiguous or conflicting instructions
Appropriate boundaries on tool use and data access
Graceful degradation when external services fail

How red-teaming works

Start a validation run

Choose Runs, PR Checks, or Sandboxes. Enter your target URL or select a repo and branch. Arga probes the environment and provisions the appropriate twins.

AI planning generates adversarial scenarios

Arga’s planning agents create both functional and adversarial test scenarios. Adversarial scenarios are designed to push boundaries and expose weaknesses.

Review and approve the plan

You see the full plan — including red-team scenarios — before execution begins. Remove or adjust any scenarios that don’t apply.

Execute with twins

Scenarios run against your environment with digital twins intercepting external calls. Nothing touches production services.

Review findings

Arga surfaces failures, unexpected behaviours, and near-misses with detailed execution traces. Findings are categorized by severity (critical, high, medium, low, info) and attack category.

AI agent testing (private beta)

Beyond red-teaming, Arga validates AI agent behaviour by placing agents in controlled environments backed by digital twins. Agents interact with realistic API surfaces without causing real-world side effects.

Agent testing is currently in private beta. We’re working closely with early partners to refine the workflow before opening it up more broadly.

Why test agents differently?

AI agents are non-deterministic. The same prompt can produce different tool calls, API sequences, and outcomes. Traditional unit tests can’t cover the combinatorial space of agent behaviour. Arga addresses this by:

Observing behaviour in a sandbox — Agents run against digital twins that faithfully replicate external APIs, so you can see exactly what an agent would do in production without risk.
Tracking state transitions — Arga records every API call, state change, and decision point, giving you a complete audit trail of agent behaviour.
Validating against expected outcomes — Define what correct behaviour looks like, and Arga flags deviations.

Use cases

Pre-deployment validation

Before shipping a new agent version, run it through Arga’s sandbox to verify it handles common scenarios correctly and doesn’t exhibit unexpected behaviour.

Continuous monitoring

Connect Arga to your CI pipeline to test agent behaviour on every code change, catching regressions before they reach production.

Incident reproduction

When an agent misbehaves in production, use session replay to reconstruct the exact state and debug the root cause.

Interested in agent testing?

Book a 30-minute session to discuss agent testing for your use case.

Get started

Validate

Digital twins

SDKs

Setup

Reference

Agent & security testing

Red-teaming: built into every run

For software

For AI agents

How red-teaming works

AI agent testing (private beta)

Why test agents differently?

Use cases

Interested in agent testing?

Get started

Validate

Digital twins

SDKs

Setup

Reference

Documentation Index

​Red-teaming: built into every run

​For software

​For AI agents

​How red-teaming works

​AI agent testing (private beta)

​Why test agents differently?

​Use cases

Interested in agent testing?

Red-teaming: built into every run

For software

For AI agents

How red-teaming works

AI agent testing (private beta)

Why test agents differently?

Use cases