Python runs

Canonical resources

For new integrations, prefer client.test_runs, client.tests, client.sandbox_runs, and client.twin_runs.

Create an ad hoc test run

run = client.test_runs.create(
    prompt="Test the checkout flow",
    start_url="https://staging.myapp.com",
    twins=["stripe", "slack"],   # optional
    repo="myorg/myrepo",         # optional
    branch="feature/checkout",   # optional
)

detail = client.test_runs.wait(run.id, timeout=300)
print(detail.status)
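In a CI script, the final status usually has to become a pass/fail decision. The sketch below wraps the two calls above in one helper. `run_and_report` is a hypothetical name, `client` is assumed to be an already-configured SDK client, and treating "completed" as the only passing status is an assumption rather than documented behavior for `client.test_runs`.

```python
def run_and_report(client, prompt, start_url, timeout=300):
    """Create an ad hoc test run, wait for it, and return (passed, status).

    `client` is assumed to be an already-configured SDK client.
    Treating "completed" as the only passing status is an assumption.
    """
    run = client.test_runs.create(prompt=prompt, start_url=start_url)
    detail = client.test_runs.wait(run.id, timeout=timeout)
    return detail.status == "completed", detail.status
```

A caller could then exit non-zero when `passed` is false and log `status` for diagnosis.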

List, inspect, or rerun test runs:

runs = client.test_runs.list()
detail = client.test_runs.get("run-id")
rerun = client.test_runs.rerun("run-id")

Create a sandbox run

sandbox = client.sandbox_runs.create(
    repo="myorg/myrepo",
    branch="feature/checkout",
    twins=["stripe", "slack"],
    ttl_minutes=60,
)

detail = client.sandbox_runs.get(sandbox.sandbox_id)
logs = client.sandbox_runs.logs(sandbox.sandbox_id)

Create a twin run

result = client.twin_runs.create(
    ["stripe", "slack"],
    ttl_minutes=60,
    scenario_id="scenario-id",  # optional
)

status = client.twin_runs.get(result["run_id"])
client.twin_runs.extend(result["run_id"], ttl_minutes=30)
client.twin_runs.lock(result["run_id"])
client.twin_runs.teardown(result["run_id"])

Run a saved test

tests = client.tests.list(repo_full_name="myorg/myrepo")
test = client.tests.get("test-id")
run = client.tests.run("test-id", start_url="https://staging.myapp.com")
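To start every saved test for a repository, the list and run calls can be combined. This sketch assumes that `client.tests.list(...)` returns an iterable and that each item exposes an `id` attribute; neither is confirmed by this guide. `run_saved_tests` is a hypothetical helper name.

```python
def run_saved_tests(client, repo_full_name, start_url):
    """Start every saved test for a repository and return the started runs.

    Assumes each listed test exposes an `id` attribute; the exact shape
    of the listing is not shown in this guide.
    """
    runs = []
    for test in client.tests.list(repo_full_name=repo_full_name):
        runs.append(client.tests.run(test.id, start_url=start_url))
    return runs
```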

Legacy validation runs

Test a live URL with browser validation and optional digital twins.

run = client.runs.create_url_run(
    url="https://staging.myapp.com",
    prompt="Test the checkout flow",         # optional
    twins=["stripe", "slack"],               # optional
    credentials={"email": "test@example.com", "password": "pass"},  # optional
    runner_mode="visual",                    # optional
    session_id="abc-123",                    # optional
)
# The returned run exposes run.run_id, run.status, and run.session_id

Create a PR run

Validate code changes from a branch or pull request.

run = client.runs.create_pr_run(
    repo="myorg/myrepo",
    branch="feature/checkout",               # optional (or use pr_url)
    pr_url="https://github.com/org/repo/pull/42",  # optional
    twins=["stripe"],                        # optional
    context_notes="Changed the payment flow", # optional
    scenario_prompt="Customer with active subscription",  # optional
)

Agent run migration

The current Python package still exposes client.runs.create_agent_run(...), but that method targets the removed /validate/agent-run endpoint and should no longer be called. Use client.runs.create_url_run(...) for deployed URLs and client.runs.create_pr_run(...) for standard PR validation. If you specifically need the sandbox-style branch flow with run_type: "agent_run", call the raw POST /validate/pr-run API directly, because the current Python SDK does not expose a helper for it.
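One way to migrate existing call sites is a thin wrapper that routes each former agent-run call to one of the two supported helpers. This is a sketch only: `create_validation_run` is a hypothetical name, and it assumes the extra keyword arguments you pass match the parameters documented for `create_url_run` and `create_pr_run`.

```python
def create_validation_run(client, *, url=None, repo=None, **options):
    """Route a former agent-run call to the URL or PR helper.

    Exactly one of `url` or `repo` must be given. Extra keyword
    arguments (prompt, twins, branch, ...) are passed through unchanged.
    """
    if (url is None) == (repo is None):
        raise ValueError("pass exactly one of url= or repo=")
    if url is not None:
        return client.runs.create_url_run(url=url, **options)
    return client.runs.create_pr_run(repo=repo, **options)
```

The wrapper deliberately does not cover the run_type: "agent_run" branch flow, which still requires the raw API.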

Get run details

detail = client.runs.get("run-id")
print(detail.status)          # "completed", "running", "failed", etc.
print(detail.results_json)    # list of step results
print(detail.step_summaries)  # grouped summaries
print(detail.event_log_json)  # timeline events

Stream results

Stream live results via server-sent events:

for event in client.runs.stream_results("run-id"):
    print(event)

# Async: requires an async client and must run inside a coroutine
async def print_events(client):
    async for event in client.runs.stream_results("run-id"):
        print(event)

Wait for completion

Block until the run reaches a terminal status (completed, failed, or cancelled):

detail = client.runs.wait(
    "run-id",
    poll_interval=2.5,  # seconds between polls (default 2.5)
    timeout=600,         # max seconds to wait (default 600)
)
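For readers who want custom behavior between polls (logging, progress output), the same pattern can be written by hand on top of `client.runs.get`. The sketch below illustrates the polling idea with the terminal statuses and defaults listed above; it is not the SDK's implementation of `wait`, and `poll_until_terminal` plus its `sleep` and `clock` parameters are hypothetical names added to make the loop testable.

```python
import time

TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def poll_until_terminal(client, run_id, poll_interval=2.5, timeout=600,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll client.runs.get until the run reaches a terminal status.

    Raises TimeoutError if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        detail = client.runs.get(run_id)
        if detail.status in TERMINAL_STATUSES:
            return detail
        if clock() >= deadline:
            raise TimeoutError(
                f"run {run_id} still {detail.status!r} after {timeout}s"
            )
        sleep(poll_interval)
```

Using a monotonic clock keeps the deadline stable if the system clock is adjusted while the loop is running.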

Cancel a run

client.runs.cancel("run-id")
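A cleanup script may want to cancel only runs that are still in progress. The sketch below checks the status first, using the terminal statuses named under "Wait for completion". How the API responds to cancelling an already-finished run is not stated here, so the guard is a precaution; `cancel_if_active` is a hypothetical helper name.

```python
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def cancel_if_active(client, run_id):
    """Cancel the run unless it already finished; return True if cancelled."""
    detail = client.runs.get(run_id)
    if detail.status in TERMINAL_STATUSES:
        return False
    client.runs.cancel(run_id)
    return True
```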

client.runs is the legacy namespace over /validate/... and /runs/.... New work should prefer client.test_runs, client.tests, client.sandbox_runs, and client.twin_runs.
