Security

Prompt injection and the permission boundary

Prompt injection is not a bug you patch once. Any agent that reads untrusted content — a web page, an email, a document, a tool result — can be handed instructions that conflict with yours. The uncomfortable truth: you cannot guarantee the model will refuse.

Stop defending only at the prompt

System-prompt hardening and input filtering help at the margin, but they are probabilistic. If your only defense is "the model will know better," then one successful injection equals one successful action. The goal is to make a hijacked instruction fail at execution, rather than hoping the model declines it.

The permission boundary does the containing

  • If the worker's role cannot see the dangerous tool, the injection has nothing to call.
  • If the action is gated, a human sees the request before it runs — the injection surfaces instead of executing.
  • If credentials are server-side and scoped, a tricked agent cannot exfiltrate a key it never held.
  • Whatever it attempts lands in the audit log, so the attempt is visible even if it fails.

Assume the model can be convinced. Design so that being convinced isn't enough.
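
A minimal sketch of how such a boundary might look in code. This is not Grantry's API: `Tool`, `Boundary`, the role names, and the tool names are all hypothetical, chosen only to illustrate the idea. It covers role visibility, approval gating, and the audit log, and leaves credential handling out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[dict], str]
    needs_approval: bool = False  # gated tools wait for a human decision


@dataclass
class Boundary:
    """Hypothetical enforcement layer between the model and its tools."""
    tools: dict[str, Tool]
    role_tools: dict[str, set[str]]  # role -> names of tools that role can see
    audit_log: list[dict] = field(default_factory=list)
    pending: list[dict] = field(default_factory=list)

    def visible_tools(self, role: str) -> set[str]:
        return self.role_tools.get(role, set()) & set(self.tools)

    def call(self, role: str, tool_name: str, args: dict) -> str:
        if tool_name not in self.visible_tools(role):
            outcome = "denied"  # the role cannot see this tool at all
        elif self.tools[tool_name].needs_approval:
            # Queue the request for a human instead of running it.
            self.pending.append({"role": role, "tool": tool_name, "args": args})
            outcome = "pending_approval"
        else:
            outcome = "allowed"
        # Every attempt is recorded, including the ones that were refused.
        self.audit_log.append(
            {"role": role, "tool": tool_name, "args": args, "outcome": outcome}
        )
        return self.tools[tool_name].run(args) if outcome == "allowed" else outcome


boundary = Boundary(
    tools={
        "search_docs": Tool("search_docs", lambda a: f"results for {a['query']}"),
        "delete_records": Tool("delete_records", lambda a: "deleted", needs_approval=True),
    },
    role_tools={"reader": {"search_docs"}, "ops": {"search_docs", "delete_records"}},
)

# An injected instruction asks a low-privilege worker to delete data.
injected = boundary.call("reader", "delete_records", {"table": "users"})
# The same request from a role that can see the tool is held for approval.
gated = boundary.call("ops", "delete_records", {"table": "users"})
# Ordinary work inside the role's permissions still runs.
normal = boundary.call("reader", "search_docs", {"query": "refund policy"})
```

The decision is made in `call`, outside the model, so it holds no matter what the prompt said. The denied attempt still appears in `audit_log`.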

Least privilege is the real mitigation

The blast radius of a successful injection is bounded by the worker's permissions and nothing more. That is why scoping, visibility filtering, and approval gates are security controls, not just conveniences. Grantry treats the permission boundary as the place where injection gets contained, so a compromised prompt cannot exceed the role.
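
One way to make the blast radius concrete is to enumerate it per role before deployment. The sketch below is illustrative only: `blast_radius`, the role names, and the tool names are assumptions, not part of any real product API.

```python
def blast_radius(
    role: str,
    role_tools: dict[str, set[str]],
    gated_tools: set[str],
) -> dict[str, list[str]]:
    """Summarize what a hijacked worker in `role` could reach."""
    visible = role_tools.get(role, set())
    all_tools = set().union(*role_tools.values())
    return {
        # Tools an injected instruction could run with no human in the loop.
        "runs_directly": sorted(visible - gated_tools),
        # Tools it could request, but only a human can release.
        "needs_approval": sorted(visible & gated_tools),
        # Tools that do not exist from this role's point of view.
        "unreachable": sorted(all_tools - visible),
    }


# Hypothetical role and tool names for illustration.
role_tools = {
    "support_reader": {"search_docs", "read_ticket"},
    "billing_ops": {"search_docs", "read_ticket", "issue_refund"},
}
gated_tools = {"issue_refund"}

reach = blast_radius("support_reader", role_tools, gated_tools)
```

If `runs_directly` contains anything you would not want an attacker to trigger, narrow the role or gate the tool.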

