Execution & Safety

AgentMD executes commands from AGENTS.md with safety checks and permission boundaries.

Deterministic Workflows

AgentMD is built on deterministic workflows for governance. Commands are parsed, validated, and run in a defined order with explicit permission checks. Outcomes are reproducible given the same input and environment, which makes runs easier to audit and debug and simplifies regulatory compliance. See Agentic AI Best Practices for the full IBM 2026 guidance.

Execution Flow

  1. Parse AGENTS.md and extract commands
  2. Check isCommandSafe() — block dangerous patterns
  3. Check isCommandAllowed(permissions) — enforce frontmatter
  4. Execute with timeout (default 60s)
  5. Return result (success, exitCode, stdout, stderr)
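
The five steps above can be sketched as a small pipeline. This is a minimal illustration, not AgentMD's implementation: the RunResult fields mirror those named in step 5, while Checks, Executor, runCommands, and blocked are hypothetical names. The checks and the executor are injected so that only the control flow is shown.

```typescript
// Hypothetical sketch of the execution flow; names are assumptions, not AgentMD's API.
interface RunResult {
  command: string;
  success: boolean;
  exitCode: number | null; // null when the command was blocked before running
  stdout: string;
  stderr: string;
}

interface Checks {
  isCommandSafe: (command: string) => boolean;
  isCommandAllowed: (command: string) => boolean;
}

// An executor runs one command with a timeout and reports the raw outcome.
type Executor = (
  command: string,
  timeoutMs: number,
) => { exitCode: number; stdout: string; stderr: string };

const DEFAULT_TIMEOUT_MS = 60 * 1000; // matches the documented 60s default

function runCommands(
  commands: string[], // step 1 output: commands already extracted from AGENTS.md
  checks: Checks,
  execute: Executor,
  timeoutMs: number = DEFAULT_TIMEOUT_MS,
): RunResult[] {
  return commands.map((command) => {
    // Step 2: dangerous patterns are rejected before anything else.
    if (!checks.isCommandSafe(command)) {
      return blocked(command, "blocked: dangerous pattern");
    }
    // Step 3: frontmatter permissions are enforced next.
    if (!checks.isCommandAllowed(command)) {
      return blocked(command, "blocked: not permitted");
    }
    // Steps 4 and 5: execute with a timeout and return the result.
    const { exitCode, stdout, stderr } = execute(command, timeoutMs);
    return { command, success: exitCode === 0, exitCode, stdout, stderr };
  });
}

function blocked(command: string, reason: string): RunResult {
  return { command, success: false, exitCode: null, stdout: "", stderr: reason };
}
```

Running the safety check before the permission check means a dangerous command is rejected even if a frontmatter allow list would have matched it.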

Blocked Patterns

These commands are never executed:

  • rm -rf /, rm -rf ~/, rm -rf ${...}
  • chmod -R 777, chown -R ... /
  • curl ... | sh, wget ... | bash
  • base64 -d ... | sh
  • eval "..."
  • nc ... -e, ncat ... --exec
  • mkfs.*, dd if=... of=/dev
  • > /etc/, > /usr/
  • sudo su, sudo -i, su -, su root (privilege escalation)
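
A pattern check of this kind could look roughly like the sketch below. The regular expressions are illustrative assumptions covering only part of the list above; they are not AgentMD's actual rules, and a real blocklist would need to handle many more spellings of each command.

```typescript
// Illustrative subset of the blocked patterns; these regexes are assumptions.
const BLOCKED_PATTERNS: RegExp[] = [
  /\brm\s+-rf\s+(\/(\s|$)|~\/|\$\{)/, // rm -rf /, rm -rf ~/, rm -rf ${...}
  /\bchmod\s+-R\s+777\b/, // chmod -R 777
  /\b(curl|wget)\b[^|]*\|\s*(sh|bash)\b/, // curl ... | sh, wget ... | bash
  /\bbase64\s+-d\b[^|]*\|\s*(sh|bash)\b/, // base64 -d ... | sh
  /\beval\s+["']/, // eval "..."
  /\bsudo\s+(su\b|-i\b)/, // sudo su, sudo -i
  /\bsu\s+(-|root\b)/, // su -, su root
];

// Returns false as soon as any blocked pattern matches the command.
function isCommandSafe(command: string): boolean {
  return !BLOCKED_PATTERNS.some((pattern) => pattern.test(command));
}
```

With these patterns, rm -rf ./dist passes while rm -rf / does not. Pattern matching is a coarse filter, so it works best combined with the permission boundaries described next.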

Permission Boundaries

When YAML frontmatter defines permissions.shell:

  • allow — Only listed commands run (supports * wildcards)
  • deny — Block even if in allow
  • default: deny — Require explicit allow list
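
The three rules can be read as a small decision procedure. The sketch below is one interpretation, not AgentMD's code: the ShellPermissions shape mirrors the permissions.shell keys above, the helper names are hypothetical, and the behavior when no permissions are defined (allow everything) is an assumption.

```typescript
// Mirrors permissions.shell from the frontmatter; the shape is an assumption.
interface ShellPermissions {
  allow?: string[];
  deny?: string[];
  default?: "allow" | "deny";
}

// Convert a pattern with * wildcards into an anchored regular expression.
function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`);
}

function matchesAny(command: string, patterns: string[] = []): boolean {
  return patterns.some((p) => wildcardToRegExp(p).test(command.trim()));
}

function isCommandAllowed(command: string, shell?: ShellPermissions): boolean {
  // Assumption: without permissions.shell there is no extra restriction.
  if (!shell) return true;
  // deny: block even if the command is also in allow.
  if (matchesAny(command, shell.deny)) return false;
  // allow: listed commands run.
  if (matchesAny(command, shell.allow)) return true;
  // An allow list means only listed commands run, so anything else is blocked.
  if (shell.allow !== undefined) return false;
  // default: deny requires an explicit allow list.
  return shell.default !== "deny";
}
```

For example, with allow set to pnpm test and pnpm run * and deny set to pnpm run deploy*, the command pnpm run lint is permitted while pnpm run deploy:prod is blocked.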

CLI

agentmd run .           # Run all commands
agentmd run . test      # Run only test-type commands
agentmd run . build lint   # Run only build- and lint-type commands

Security Best Practices

AgentMD follows established agent security principles: least privilege, explicit allowlists, and defense in depth.

  • Least privilege — Use permissions.shell.default: deny and an explicit allow list. Only permit commands the agent actually needs.
  • Explicit allowlists — Avoid broad wildcards. Prefer pnpm test over pnpm *.
  • Human-in-the-loop — Use policy rules with approval: always for sensitive operations (deploy, migrate, production changes).
  • Sandboxing — Use agentmd run . --sandbox for isolated runs. For production, use containerized execution.

See AI agent security (IBM) for the full threat landscape and countermeasures.

Evaluation Metrics

AgentMD tracks execution outcomes for governance and debugging. Each run returns:

  • Success/failure — Exit code and error output
  • Duration — durationMs per command
  • Output — stdout and stderr for audit

The dashboard aggregates execution history, success rates (execution-level and command-level), and audit logs. Planned: OpenTelemetry export for integration with Langfuse, Datadog, and other observability platforms.
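
To make the two success rates concrete, here is a minimal sketch of how they could be computed from per-command results. The CommandResult and Execution shapes and the successRates function are hypothetical, and the sketch assumes an execution counts as successful only when every one of its commands succeeded; the dashboard may define this differently.

```typescript
// Hypothetical result shapes based on the fields listed above.
interface CommandResult {
  success: boolean;
  exitCode: number;
  durationMs: number;
  stdout: string;
  stderr: string;
}

interface Execution {
  id: string;
  commands: CommandResult[];
}

function ratio(part: number, total: number): number {
  return total === 0 ? 0 : part / total;
}

function successRates(executions: Execution[]): {
  executionLevel: number;
  commandLevel: number;
} {
  let totalCommands = 0;
  let passedCommands = 0;
  let passedExecutions = 0;

  for (const execution of executions) {
    const passed = execution.commands.filter((c) => c.success).length;
    totalCommands += execution.commands.length;
    passedCommands += passed;
    // Assumption: an execution passes only if all of its commands passed.
    if (passed === execution.commands.length) passedExecutions += 1;
  }

  return {
    executionLevel: ratio(passedExecutions, executions.length),
    commandLevel: ratio(passedCommands, totalCommands),
  };
}
```

The two numbers can diverge: one failing command in a long execution barely moves the command-level rate but fails the whole execution.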

Dashboard Execution Modes

When you run executions from the dashboard, the worker supports two modes:

  • Mock (default) — Simulates execution with fixed step durations and outputs. No repo access. Use for demos or when the worker doesn't have AGENTMD_REAL_EXECUTION=1.
  • Real — Fetches AGENTS.md from agentsMdUrl, parses commands, clones the repo, and runs them. Requires AGENTMD_REAL_EXECUTION=1 on the worker. Only supports public GitHub repos.

The execution detail page shows a badge: Real execution or Mock execution. See deploy/worker/README.md for worker setup.
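
The mode selection described above amounts to a single environment check. The sketch below assumes the worker treats any value other than "1" as mock; the function names are hypothetical and the real worker may parse the variable differently.

```typescript
type ExecutionMode = "mock" | "real";

// Real execution is opt-in: only AGENTMD_REAL_EXECUTION=1 enables it (assumed parsing).
function resolveExecutionMode(env: Record<string, string | undefined>): ExecutionMode {
  return env["AGENTMD_REAL_EXECUTION"] === "1" ? "real" : "mock";
}

// Badge text as shown on the execution detail page.
function badgeLabel(mode: ExecutionMode): string {
  return mode === "real" ? "Real execution" : "Mock execution";
}
```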

Environment

Commands run with shell: true in the repo directory. Environment variables are inherited from the parent process, so any secrets in that environment are visible to the commands. For production, consider containerized execution.
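
For readers unfamiliar with these settings, the sketch below shows what running a command this way looks like with Node's child_process module. It is an assumption that AgentMD uses this module; runInRepo and its return shape are hypothetical, and the 60s timeout is taken from the execution flow above.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper: run one command through the shell in the repo directory.
function runInRepo(command: string, repoDir: string, timeoutMs: number = 60000) {
  const started = Date.now();
  const child = spawnSync(command, {
    shell: true, // the command string is interpreted by the system shell
    cwd: repoDir, // run inside the repo directory
    env: process.env, // inherit the parent's environment variables
    timeout: timeoutMs, // the child is killed if it runs longer than this
    encoding: "utf8", // return stdout and stderr as strings
  });
  // status is null when the process was killed by a signal, for example on timeout.
  const exitCode = child.status;
  return {
    success: exitCode === 0,
    exitCode,
    stdout: child.stdout ?? "",
    stderr: child.stderr ?? "",
    durationMs: Date.now() - started,
  };
}
```

Passing a reduced env object instead of process.env is one way to keep secrets away from commands when a container is not available.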

API

Use the Execution API for programmatic runs:

POST /api/execute
{
  "agentsMdUrl": "https://.../AGENTS.md",
  "agentId": "pr-labeler",
  "repositoryId": "repo_123"
}
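
A client could assemble this request as in the sketch below. The base URL https://agentmd.example is a placeholder, buildExecuteRequest is a hypothetical helper, and authentication headers are omitted because the section does not describe them; see the API Reference for the actual requirements.

```typescript
// Body fields taken from the request shown above.
interface ExecuteRequest {
  agentsMdUrl: string;
  agentId: string;
  repositoryId: string;
}

// Hypothetical helper that prepares the URL and fetch options for POST /api/execute.
function buildExecuteRequest(baseUrl: string, body: ExecuteRequest) {
  return {
    url: new URL("/api/execute", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

const executeRequest = buildExecuteRequest("https://agentmd.example", {
  agentsMdUrl: "https://.../AGENTS.md", // elided in the original; supply the real URL
  agentId: "pr-labeler",
  repositoryId: "repo_123",
});
// Send it with: await fetch(executeRequest.url, executeRequest.init)
```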

→ API Reference