Execution & Safety
AgentMD executes commands from AGENTS.md with safety checks and permission boundaries.
Deterministic Workflows
AgentMD is built on deterministic workflows for governance. Commands are parsed, validated, and run in a defined order with explicit permission checks. Outcomes are reproducible given the same input and environment — making it easier to audit, debug, and comply with regulations. See Agentic AI Best Practices for the full IBM 2026 guidance.
Execution Flow
- Parse AGENTS.md and extract commands
- Check `isCommandSafe()` — block dangerous patterns
- Check `isCommandAllowed(permissions)` — enforce frontmatter
- Execute with timeout (default 60s)
- Return result (success, exitCode, stdout, stderr)
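The two checks in this flow can be sketched as follows. The function names come from the list above, but the signatures, pattern list, and matching logic here are illustrative assumptions, not AgentMD's actual implementation:

```typescript
// Illustrative sketch of AgentMD's two safety checks; internals are assumptions.
interface ShellPermissions {
  allow?: string[];                  // e.g. ["pnpm test", "pnpm build*"]
  deny?: string[];                   // blocked even if matched by allow
  default?: "allow" | "deny";
}

// A small subset of the blocked patterns (not exhaustive).
const BLOCKED_PATTERNS: RegExp[] = [
  /rm\s+-rf\s+(\/|~\/|\$\{)/,        // rm -rf /, rm -rf ~/, rm -rf ${...}
  /(curl|wget)\b.*\|\s*(sh|bash)\b/, // curl ... | sh, wget ... | bash
  /sudo\s+(su|-i)\b/,                // privilege escalation
];

function isCommandSafe(cmd: string): boolean {
  return !BLOCKED_PATTERNS.some((p) => p.test(cmd));
}

// Translate an allow/deny entry with "*" wildcards into an anchored RegExp.
function toMatcher(pattern: string): RegExp {
  const parts = pattern
    .split("*")
    .map((s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  return new RegExp(`^${parts.join(".*")}$`);
}

function isCommandAllowed(cmd: string, perms?: ShellPermissions): boolean {
  if (!perms) return true;           // no frontmatter: unrestricted
  if (perms.deny?.some((p) => toMatcher(p).test(cmd))) return false;
  if (perms.allow?.some((p) => toMatcher(p).test(cmd))) return true;
  return perms.default !== "deny";
}
```

Deny is checked before allow, so a `deny` entry wins even when the same command also matches the allow list.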
Blocked Patterns
These commands are never executed:
- `rm -rf /`, `rm -rf ~/`, `rm -rf ${...}`
- `chmod -R 777`, `chown -R ... /`
- `curl ... | sh`, `wget ... | bash`
- `base64 -d ... | sh`
- `eval "..."`
- `nc ... -e`, `ncat ... --exec`
- `mkfs.*`, `dd if=... of=/dev`
- `> /etc/`, `> /usr/`
- `sudo su`, `sudo -i`, `su -`, `su root` (privilege escalation)
Permission Boundaries
When YAML frontmatter defines `permissions.shell`:
- `allow` — Only listed commands run (supports `*` wildcards)
- `deny` — Blocked even if also listed in `allow`
- `default: deny` — Require an explicit allow list
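Under these rules, frontmatter like the following would let only the listed commands run. The `permissions.shell` keys follow this page; the command values are illustrative:

```yaml
---
permissions:
  shell:
    default: deny        # anything not explicitly allowed is blocked
    allow:
      - "pnpm test"      # exact command
      - "pnpm build*"    # "*" wildcard matches e.g. "pnpm build --watch"
    deny:
      - "pnpm publish"   # blocked even though "pnpm *" patterns could match
---
```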
CLI
agentmd run . # Run all commands
agentmd run . test # Run only test-type commands
agentmd run . build lint # Run only build- and lint-type commands
Security Best Practices
AgentMD aligns with agent security principles from the industry: least privilege, explicit allowlists, and defense in depth.
- Least privilege — Use `permissions.shell.default: deny` and an explicit `allow` list. Only permit commands the agent actually needs.
- Explicit allowlists — Avoid broad wildcards. Prefer `pnpm test` over `pnpm *`.
- Human-in-the-loop — Use policy rules with `approval: always` for sensitive operations (deploy, migrate, production changes).
- Sandboxing — Use `agentmd run . --sandbox` for isolated runs. For production, use containerized execution.
See AI agent security (IBM) for the full threat landscape and countermeasures.
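As an illustration of the human-in-the-loop rule, a policy entry might look like this; the exact policy schema isn't documented on this page, so the `policies` and `match` field names here are hypothetical, and only `approval: always` comes from the text above:

```yaml
# Hypothetical policy rule: commands matching "deploy *" always require
# human approval before the agent may run them.
policies:
  - match: "deploy *"
    approval: always
```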
Evaluation Metrics
AgentMD tracks execution outcomes for governance and debugging. Each run returns:
- Success/failure — Exit code and error output
- Duration — `durationMs` per command
- Output — stdout and stderr for audit
The dashboard aggregates execution history, success rates (execution-level and command-level), and audit logs. Planned: OpenTelemetry export for integration with Langfuse, Datadog, and other observability platforms.
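Using the fields named above, a per-command record and a command-level success rate (as the dashboard aggregates it) could be modeled like this; the shapes are illustrative, not AgentMD's actual types:

```typescript
// Illustrative shape of one command's result, from the fields listed above.
interface CommandResult {
  success: boolean;
  exitCode: number;
  stdout: string;   // captured for audit
  stderr: string;   // captured for audit
  durationMs: number;
}

// Command-level success rate across a set of results (0..1).
function successRate(results: CommandResult[]): number {
  if (results.length === 0) return 0;
  return results.filter((r) => r.success).length / results.length;
}
```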
Dashboard Execution Modes
When you run executions from the dashboard, the worker supports two modes:
- Mock (default) — Simulates execution with fixed step durations and outputs. No repo access. Use for demos or when the worker doesn't have `AGENTMD_REAL_EXECUTION=1`.
- Real — Fetches AGENTS.md from `agentsMdUrl`, parses commands, clones the repo, and runs them. Requires `AGENTMD_REAL_EXECUTION=1` on the worker. Only supports public GitHub repos.
The execution detail page shows a badge: Real execution or Mock execution. See deploy/worker/README.md for worker setup.
Environment
Commands run with `shell: true` in the repo directory. Environment variables are inherited from the process. For production, consider containerized execution.
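A minimal sketch of one such execution step in Node, matching the environment just described (shell, repo cwd, inherited env, 60s default timeout). This is an illustration, not AgentMD's source:

```typescript
import { spawnSync } from "node:child_process";

// Run one command the way the environment above describes it.
function runCommand(cmd: string, repoDir: string, timeoutMs = 60_000) {
  const r = spawnSync(cmd, {
    shell: true,        // commands run with shell: true
    cwd: repoDir,       // in the repo directory
    env: process.env,   // environment inherited from the process
    timeout: timeoutMs, // kill the command if it exceeds the timeout
    encoding: "utf8",
  });
  return { exitCode: r.status ?? -1, stdout: r.stdout, stderr: r.stderr };
}
```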
API
Use the Execution API for programmatic runs:
POST /api/execute
{
  "agentsMdUrl": "https://.../AGENTS.md",
  "agentId": "pr-labeler",
  "repositoryId": "repo_123"
}