Arga includes security testing in every validation workflow and offers dedicated tools for testing AI agent behaviour. This page covers both.Documentation Index
Fetch the complete documentation index at: https://docs.argalabs.com/llms.txt
Use this file to discover all available pages before exploring further.
Red-teaming: built into every run
Red-teaming is included when Arga generates validation plans for Runs, PR Checks, and Sandboxes. Arga includes adversarial scenarios alongside functional tests, so you get security and resilience coverage as part of the same validation flow.For software
Arga fuzzes your application with adversarial inputs and unexpected interaction patterns to uncover bugs that normal testing misses. Because all external calls route through digital twins, this happens safely without affecting real services. What Arga looks for:- Unhandled error states and edge cases
- Race conditions and state corruption
- Unexpected behaviour under simulated failures (timeouts, rate limits, malformed responses)
For AI agents
AI agents present unique risks — they can take unexpected actions, chain tools in harmful ways, or leak sensitive information. Arga places agents in controlled environments and probes their behaviour with adversarial prompts and unusual scenarios. What Arga tests:- Prompt injection and jailbreak resistance
- Correct handling of ambiguous or conflicting instructions
- Appropriate boundaries on tool use and data access
- Graceful degradation when external services fail
How red-teaming works
Start a validation run
Choose Runs, PR Checks, or Sandboxes. Enter your target URL or select a repo and branch. Arga probes the environment and provisions the appropriate twins.
AI planning generates adversarial scenarios
Arga’s planning agents create both functional and adversarial test scenarios. Adversarial scenarios are designed to push boundaries and expose weaknesses.
Review and approve the plan
You see the full plan — including red-team scenarios — before execution begins. Remove or adjust any scenarios that don’t apply.
Execute with twins
Scenarios run against your environment with digital twins intercepting external calls. Nothing touches production services.
AI agent testing (private beta)
Beyond red-teaming, Arga validates AI agent behaviour by placing agents in controlled environments backed by digital twins. Agents interact with realistic API surfaces without causing real-world side effects.Agent testing is currently in private beta. We’re working closely with early partners to refine the workflow before opening it up more broadly.
Why test agents differently?
AI agents are non-deterministic. The same prompt can produce different tool calls, API sequences, and outcomes. Traditional unit tests can’t cover the combinatorial space of agent behaviour. Arga addresses this by:- Observing behaviour in a sandbox — Agents run against digital twins that faithfully replicate external APIs, so you can see exactly what an agent would do in production without risk.
- Tracking state transitions — Arga records every API call, state change, and decision point, giving you a complete audit trail of agent behaviour.
- Validating against expected outcomes — Define what correct behaviour looks like, and Arga flags deviations.
Use cases
Pre-deployment validation
Pre-deployment validation
Before shipping a new agent version, run it through Arga’s sandbox to verify it handles common scenarios correctly and doesn’t exhibit unexpected behaviour.
Continuous monitoring
Continuous monitoring
Connect Arga to your CI pipeline to test agent behaviour on every code change, catching regressions before they reach production.
Incident reproduction
Incident reproduction
When an agent misbehaves in production, use session replay to reconstruct the exact state and debug the root cause.
Interested in agent testing?
Book a 30-minute session to discuss agent testing for your use case.

