AI Under Lock: How to Make an Agent Act Safely

The author rejected the trending OpenClaw because of the risks posed by autonomous AI without hard technical boundaries. Instead, he built Agent One: a system that combines freedom of action with architectural security, where rules are baked into the code, not just the prompts.

Building Agent One: secure autonomy instead of unrestricted access

Agentic AI is moving fast. What began as conversational systems is rapidly evolving into software that can take real action on behalf of users. In this landscape, the real challenge is no longer whether an AI agent can act. The question is how to let it act safely.

Agent One was designed as a practical answer to that question. It is a personal AI assistant built to operate with meaningful autonomy while remaining confined within strict technical boundaries. The goal is not unlimited power. The goal is controlled capability. It runs on an inexpensive VPS, communicates through Telegram, and performs real tasks without ever becoming uncontrollable. This project represents a deliberate alternative to early experimental agents such as OpenClaw, which demonstrated how powerful autonomous systems can be, but also exposed architectural risks when guardrails rely too heavily on prompts rather than infrastructure.

What Agent One Can Actually Do

Agent One is not a toy chatbot. It performs structured, practical work. It can research complex topics and produce organized reports. It drafts emails and sends them only after explicit user approval. It creates and updates Google Docs and Sheets. It processes user-provided files. It installs tools inside its controlled environment and improves its performance over time by referencing structured memory from previous sessions. These capabilities make it useful for real productivity workflows.

At the same time, several deliberate restrictions ensure it never exceeds its authority. The agent cannot access API keys directly. It cannot rewrite or disable its own guardrails. It cannot access folders that were not explicitly mounted for it. It cannot install or use unapproved tools. It cannot send emails without confirmation.

Importantly, these limits are not enforced through polite instructions inside prompts. They are enforced at the architectural level. Docker isolation, permission-scoped file mounting, and orchestrated execution flows ensure that the model physically cannot bypass its boundaries.
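As an illustration, a container launch that enforces boundaries of this kind could look like the fragment below. The image name, host paths, and exact flags are assumptions for the sketch, not the project's actual configuration.

```shell
# Hypothetical launch of the executor container.
# --network none    : no outbound network unless explicitly granted
# --read-only       : the container filesystem is immutable
# the two -v mounts : the only host folders the agent can see at all
docker run --rm --network none --read-only \
  -v /srv/agent/workspace:/workspace:rw \
  -v /srv/agent/inbox:/inbox:ro \
  agent-one-executor:latest
```

With a setup like this, "cannot access folders that were not explicitly mounted" is a property of the runtime, not a request made to the model.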

The Architecture Explained Clearly

The system is intentionally divided into distinct roles. This separation prevents the kind of uncontrolled behavior that can emerge when planning and execution are merged.

The Manager: Planning Without Direct Action

The Manager agent functions as the strategic layer. It interprets user requests, decomposes tasks into structured plans, and communicates progress back to the user. Crucially, the Manager has no direct access to files, scripts, or system-level operations. It cannot execute commands. It only receives summaries from execution layers. This design ensures that planning intelligence is separated from operational power. The philosophy behind this approach aligns with a broader idea often discussed in agent design: AI should extend human intent, not replace it. The Manager reasons. It does not act.
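The split between a reasoning Manager and a tool-holding Executor can be sketched in a few lines of Python. The class and method names here are illustrative, not the project's real interfaces; the point is that only the Executor type ever touches operational power, and only summaries flow back up.

```python
from dataclasses import dataclass

@dataclass
class Task:
    goal: str
    constraints: tuple

class Executor:
    """The only layer with operational power (files, tools, scripts)."""
    def run(self, task: Task) -> str:
        # ...do the actual work inside the sandbox...
        return f"done: {task.goal}"  # only a summary leaves this layer

class Manager:
    """Interprets requests and plans; holds no tools, runs no commands."""
    def __init__(self, executor: Executor):
        self._executor = executor

    def handle(self, request: str) -> str:
        plan = [Task(goal=request, constraints=("no unapproved tools",))]
        summaries = [self._executor.run(task) for task in plan]
        return "; ".join(summaries)  # report progress back to the user
```

Because the Manager only holds a reference to an Executor, removing a capability means changing the Executor, never persuading the Manager.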

Memory and Session Design Without Overengineering

Many agent frameworks rush toward complex vector databases and elaborate memory graphs. Agent One takes a simpler path. Manager memory holds user preferences, long-term project context, and corrections. Executor memory stores environment-specific knowledge, installed tools, and discovered workarounds. Sessions represent short-term working context. A session contains the original request, the execution plan, key assumptions, and a structured log of actions taken.

When workflows become complex, the system reloads only session data instead of the full conversation history. This prevents context bloat and reduces reasoning drift. Large agent ecosystems often suffer from noise accumulation over long chains of tool calls, an issue observed in experiments around autonomous systems like OpenClaw. Simplicity here is not a limitation. It is a deliberate performance and stability choice.
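A session of this shape can be as small as a dataclass serialized back into the model's context after a reset; the field names below are assumptions based on the description above, not the project's actual schema.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Session:
    request: str                                  # the original user request
    plan: list                                    # the execution plan
    assumptions: list = field(default_factory=list)
    actions: list = field(default_factory=list)   # structured log of actions

def reload_context(session: Session) -> str:
    """Rebuild working context from session data only; the raw
    conversation history is deliberately left behind."""
    return json.dumps(asdict(session), indent=2)
```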

The Ralph Wiggum Loop: Resetting for Clarity

Long-running tasks generate clutter. Tool outputs accumulate. Earlier assumptions become outdated. Models begin to reason over stale or irrelevant context. To counter this, Agent One periodically resets its reasoning context after major phases of execution. The Manager restarts with only structured session data, discarding conversational noise.

Simple tasks complete in a single pass. More complex workflows such as research followed by structured reporting and distribution are executed in clean iterations. Each cycle begins with clarity rather than historical baggage. This loop prevents drift, reduces hallucination risk, and keeps reasoning aligned with current objectives.
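The reset loop itself might look like this minimal sketch, in which each phase starts from a freshly rebuilt, structured context rather than the accumulated transcript. All names are illustrative.

```python
class Session:
    """Minimal session state that survives a context reset."""
    def __init__(self, request, plan):
        self.request = request
        self.plan = plan
        self.actions = []

def build_context(session):
    # Only structured facts survive a reset; conversational noise does not.
    return {"request": session.request, "plan": session.plan,
            "log": list(session.actions)}

def run_with_resets(phases, session):
    """Execute each major phase from a freshly rebuilt context."""
    for phase in phases:
        context = build_context(session)        # clean slate each cycle
        session.actions.append(phase(context))  # keep only the structured log
    return session.actions
```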

Nine Lessons From Building Agent One

1. Manage Through Goals, Not Scripts

Early prototypes relied on rigid, step-by-step execution instructions. This approach worked only in predictable environments. The moment an unexpected condition appeared, the system stalled or behaved incorrectly. Shifting to goal-driven task definitions changed everything. Executors were given clear outcomes to achieve along with explicit constraints. Instead of following fragile instructions, they adapted intelligently within defined boundaries. Autonomy worked better when it was guided by objectives rather than micromanaged through scripts.
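A goal-driven task definition, paired with a boundary check, can be very small. The paths and field names in this sketch are hypothetical.

```python
# A goal-driven task: an outcome plus hard boundaries, not a script.
task = {
    "goal": "produce a one-page summary of the mounted sales data",
    "constraints": ["read only under /workspace", "no new tool installs"],
}

def within_bounds(path: str, allowed_roots: list) -> bool:
    """Executors may choose their own steps, but every file access
    must fall under an explicitly allowed root."""
    return any(path.startswith(root) for root in allowed_roots)
```

The executor is free to decide *how* to reach the goal; the constraint check decides *where* it is allowed to operate while doing so.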

2. Hard Guardrails Outperform Prompt Rules

Telling a model to “always ask before sending an email” is not a security strategy. It is a suggestion. True safety emerged only when rules were enforced at the infrastructure level. If the agent cannot physically access the email API without passing through a confirmation gate, the rule becomes reliable. Architectural enforcement proved dramatically more robust than relying on prompt discipline.
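One way to build such a confirmation gate is a queue the agent can write to but cannot flush. The sketch below is an assumed design, not the project's actual code: the agent only ever calls `submit`, while `approve` lives in the user-facing layer, next to the credentials.

```python
class EmailGate:
    """Drafts queue here; the agent can submit but never send.
    Delivery happens only after explicit user approval, on a code
    path that holds credentials the agent cannot read."""

    def __init__(self):
        self._pending = {}
        self._next_id = 0

    def submit(self, draft: dict) -> int:
        """Called by the agent: queue a draft, return its ticket id."""
        self._next_id += 1
        self._pending[self._next_id] = dict(draft)
        return self._next_id

    def approve(self, draft_id: int) -> dict:
        """Called by the user-facing layer, never by the agent."""
        draft = self._pending.pop(draft_id)
        # real sending (with credentials) would happen here
        return draft
```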

3. Keep Coordination Contracts Minimal

At one stage, communication between the Manager and Executors became over-engineered. Too many structured fields and validation layers made the system slower and more fragile. Reducing the interface to three core elements (context, goal, and constraints) improved both clarity and resilience. Over-structuring creates brittleness. Simplicity creates durability.
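The resulting three-field contract fits in a frozen dataclass; this is a sketch of the idea, not the project's exact schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskContract:
    context: str        # what the executor needs to know
    goal: str           # the outcome to achieve
    constraints: tuple  # hard boundaries it may not cross
```

Freezing the dataclass keeps the contract immutable once handed to an executor, which removes a whole class of mid-task mutation bugs.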

4. Separation of Concerns Is Non-Negotiable

During testing, the Manager was temporarily granted execution privileges. The result was immediate architectural confusion. Planning and acting must remain separate responsibilities. Once reasoning layers gain operational power, boundaries blur and safety weakens. Strict role isolation is not an academic principle. It is a practical necessity.

5. AI Still Requires Human Oversight

Even advanced models failed to detect logical contradictions within the system’s own design. They executed instructions competently, but they did not reliably question flawed architecture. Human review remains essential at the system level. AI can optimize within a structure. It cannot yet guarantee that the structure itself is sound.

6. Prompts Must Be Treated as Versioned Code

Prompt engineering cannot be casual. Small wording changes created measurable shifts in behavior. Treating prompts as version-controlled assets, complete with testing and rollback capability, introduced discipline. Once prompts were managed like software components, stability improved significantly.
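Versioned prompts can be as simple as files named by explicit version, loaded together with a content hash for traceability. The directory layout and naming scheme here are assumptions for the sketch.

```python
import hashlib
import pathlib

PROMPT_DIR = pathlib.Path("prompts")  # e.g. prompts/manager.v3.txt

def load_prompt(name: str, version: int) -> tuple:
    """Load a prompt by explicit version and return it with a content
    hash, so a behavior change can be traced to an exact revision."""
    text = (PROMPT_DIR / f"{name}.v{version}.txt").read_text()
    digest = hashlib.sha256(text.encode()).hexdigest()[:12]
    return text, digest
```

Logging the digest alongside every model call means any behavioral regression can be pinned to the exact prompt revision that produced it.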

7. Start With Simple Memory Systems

There is a strong temptation to begin with vector databases and complex retrieval pipelines. In practice, structured tables handled early-stage memory needs effectively. Simple systems are easier to audit, easier to debug, and often entirely sufficient until scale demands something more advanced.
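For instance, a single SQLite table can cover both Manager and Executor memory in the early stages; the schema below is illustrative.

```python
import sqlite3

# One plain table covers early-stage memory: auditable with a single
# SELECT, debuggable without any retrieval pipeline.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE memory (
    scope TEXT,   -- 'manager' or 'executor'
    key   TEXT,
    value TEXT,
    PRIMARY KEY (scope, key)
)""")

def remember(scope: str, key: str, value: str) -> None:
    conn.execute("INSERT OR REPLACE INTO memory VALUES (?, ?, ?)",
                 (scope, key, value))

def recall(scope: str, key: str):
    row = conn.execute("SELECT value FROM memory WHERE scope = ? AND key = ?",
                       (scope, key)).fetchone()
    return row[0] if row else None
```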

8. Avoid Complexity Without Clear Benefit

Remote laptop control seemed powerful in theory. In reality, it introduced networking overhead, security considerations, and maintenance costs that outweighed its practical value for most workflows. Every new capability increases surface area. If it does not produce meaningful benefit, it should not be included.

9. Observability Is Critical

Detailed logging uncovered subtle failures that would otherwise have gone unnoticed. Incorrect parameters, malformed tool calls, and edge-case misinterpretations surfaced only because execution data was transparent. Agent systems are not black boxes. They must be observable, measurable, and diagnosable. Without that visibility, autonomy becomes guesswork.
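Structured, line-oriented logs make that visibility cheap. A sketch of one-JSON-line-per-tool-call logging follows; the function and field names are assumptions, not the project's real logging layer.

```python
import json
import time

def tool_call_record(tool: str, params: dict, result, ok: bool = True) -> str:
    """Serialize one tool call as a single JSON line; truncating the
    result keeps logs readable while leaving enough to diagnose."""
    return json.dumps({
        "ts": time.time(),
        "tool": tool,
        "params": params,
        "ok": ok,
        "result": str(result)[:200],
    })
```

One record per call means malformed parameters and edge-case misinterpretations can be found with a grep rather than a debugger.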
