A VPS is the natural home for autonomous AI agents. An agent running on a local machine stops the moment the laptop closes. On a VPS it runs around the clock, holds a stable IP address, and reaches external APIs without the bandwidth limitations of a home connection. Agentic workflows that have to run unattended tend to end up on a server for exactly these reasons.
What AI Agents Are and Why They Need a Server
An AI agent is a program that independently executes chains of actions to reach a goal. Unlike a chatbot that answers questions, an agent plans steps, calls tools, checks results, and adjusts its approach. It can browse a website, extract data, run it through a language model, send an email, and log the result to a database — without a human in the loop at any step.
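The plan, act, check, adjust cycle described above can be sketched as a small loop. This is an illustrative skeleton rather than any framework's API: `AgentState`, `plan_next_step`, `run_agent`, and the toy tools are hypothetical names, and the planner is a hard-coded stand-in for what would normally be a language-model call.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool type: each tool takes a string and returns a string.
Tool = Callable[[str], str]


@dataclass
class AgentState:
    goal: str
    history: list[tuple[str, str]] = field(default_factory=list)  # (tool, result)
    done: bool = False


def plan_next_step(state: AgentState) -> tuple[str, str] | None:
    """Stand-in for an LLM planning call: choose the next tool and its input.

    Returns None once the goal is considered reached.
    """
    used = [tool for tool, _ in state.history]
    if "fetch" not in used:
        return ("fetch", state.goal)
    if "summarize" not in used:
        return ("summarize", state.history[-1][1])
    return None


def run_agent(goal: str, tools: dict[str, Tool], max_steps: int = 10) -> AgentState:
    """Plan a step, call the tool, record the result, and repeat until done."""
    state = AgentState(goal=goal)
    for _ in range(max_steps):  # hard cap so a confused agent cannot loop forever
        step = plan_next_step(state)
        if step is None:
            state.done = True
            break
        tool_name, tool_input = step
        try:
            result = tools[tool_name](tool_input)
        except Exception as exc:  # record failures so the planner can react to them
            result = f"error: {exc}"
        state.history.append((tool_name, result))
    return state


# Toy tools standing in for a browser and a model call.
demo_tools = {
    "fetch": lambda query: f"raw data about {query}",
    "summarize": lambda text: text.upper(),
}
```

Calling `run_agent("vps pricing", demo_tools)` walks through both tools and stops. In a real agent the planner would inspect `state.history` through a model call, which is where the "adjusts its approach" part happens.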
Workloads that commonly run as persistent agents on a VPS:
- Competitor price monitoring with automatic price list updates
- Scraping job boards, news, or tender platforms with alerts
- Automated messaging across email and chat platforms
- Scheduled content generation and publishing
- Handling inbound support requests through CRM integrations
- Financial monitoring: transactions, exchange rates, anomaly detection
All of these require the agent to be running when the trigger fires — not when someone's laptop happens to be open.
Two Types of Agents, Two Sets of Requirements
Before picking a VPS configuration, you need to know which kind of agent you're running. The resource profiles are dramatically different.
API-based agents call external language models — OpenAI GPT-4o, Anthropic Claude, Google Gemini. The agent itself just orchestrates: calls the API, processes the response, calls the next tool. The heavy computation happens on the model provider's infrastructure. These agents barely touch server resources — 1-2 GB of RAM and a single vCPU handle even complex multi-agent chains.
Local model agents run the language model directly on the server through Ollama, llama.cpp, or vLLM. This changes the requirements entirely: a minimum of 8 GB RAM for small models (7B parameters), 16-32 GB for mid-size models (13B-30B), and realistically a GPU for usable generation speed. The tradeoff: complete data privacy and no per-token API costs.
Requirements by Workload Type
Lightweight agents (API models, automation, bots)
Frameworks: LangChain, LlamaIndex, AutoGPT, CrewAI, n8n, Flowise.
Minimum configuration:
- 1-2 vCPU
- 2-4 GB RAM
- 20-40 GB SSD
- Python 3.10+ or Node.js 18+
This covers the majority of agentic use cases when you're relying on external API calls. Starting from €5.77/month on THE.Hosting.
Mid-complexity agents (RAG, vector databases, multi-agent systems)
When an agent works with large document collections through RAG (Retrieval-Augmented Generation), runs a local vector database (Chroma, Qdrant, Weaviate), or coordinates multiple parallel agents, the requirements step up.
Recommended configuration:
- 4 vCPU
- 8-16 GB RAM
- 80-160 GB NVMe
Local models via Ollama
Ollama is the simplest way to run open-source language models on a VPS. It supports Llama 3, Mistral, Gemma, Phi, Qwen, and dozens of others. Without a GPU it runs on CPU — functional, but slow.
Requirements by model size:
| Model | Parameters | RAM | Generation speed (CPU) |
|---|---|---|---|
| Phi-3 Mini | 3.8B | 4 GB | ~8-12 tokens/sec |
| Llama 3.1 | 8B | 8 GB | ~4-6 tokens/sec |
| Mistral | 7B | 8 GB | ~4-6 tokens/sec |
| Llama 3.1 | 70B | 48 GB | ~0.5-1 tokens/sec |
For production use of local models, GPU is effectively required — CPU generation speed is too slow for most real-world workloads.
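A rough way to sanity-check RAM figures like those in the table is to multiply the parameter count by the bytes stored per weight and add some headroom. The sketch below assumes 4-bit quantized weights and a 20% allowance for context cache and runtime overhead. Both numbers are illustrative assumptions, not values taken from this article or from Ollama, and `estimate_model_ram_gb` is a hypothetical helper.

```python
def estimate_model_ram_gb(params_billion: float,
                          bits_per_weight: int = 4,
                          overhead: float = 0.2) -> float:
    """Rough RAM needed to load a model, in GB (1 GB = 1e9 bytes).

    bits_per_weight and overhead are assumptions; real usage depends on the
    quantization format, context length, and runtime.
    """
    weight_gb = params_billion * bits_per_weight / 8  # billions of params * bytes each
    return round(weight_gb * (1 + overhead), 1)


for name, size in [("Phi-3 Mini", 3.8), ("Mistral", 7), ("Llama 3.1", 8), ("Llama 3.1", 70)]:
    print(f"{name} {size}B: ~{estimate_model_ram_gb(size)} GB at 4-bit")
```

Under these assumptions the estimates land below the RAM column in the table, which leaves room for the operating system and the agent stack itself. Switching to `bits_per_weight=16` shows why unquantized models need several times more memory.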
Setting Up a VPS for an AI Agent
Install Docker and base environment
Docker is the standard way to deploy agent stacks. It isolates dependencies and makes service management straightforward.
apt update && apt install -y docker.io docker-compose
systemctl enable --now docker
Install Ollama for local models
curl -fsSL https://ollama.com/install.sh | sh
Pull a model and run it:
ollama pull llama3.1
ollama run llama3.1
Ollama exposes a REST API at http://localhost:11434 — agentic frameworks connect to it directly.
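As a minimal sketch of what "connecting directly" looks like, the following calls Ollama's `/api/generate` endpoint using only the Python standard library. It assumes Ollama is running on the default address above and that the `llama3.1` model has already been pulled; the helper names `build_generate_request` and `generate` are made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama address from the text above


def build_generate_request(prompt: str, model: str = "llama3.1") -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.1", timeout: float = 120.0) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["response"]  # with stream=False the full text arrives in one JSON object
```

With the server running, `generate("Summarize why agents run on servers in one sentence.")` returns the model's reply as a string. The generous timeout is deliberate, since CPU-only generation can take a while.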
Deploy n8n — visual agent orchestrator
n8n lets you build agentic workflows through a visual interface without writing code. It supports AI nodes for OpenAI, Anthropic, and local models via Ollama.
docker run -d \
--name n8n \
-p 5678:5678 \
-v n8n_data:/home/node/.n8n \
docker.n8n.io/n8nio/n8n
The web interface is available at http://your-ip:5678 after the container starts.
Deploy Flowise — visual LangChain agent builder
Flowise provides a drag-and-drop interface for building LangChain chains. Simpler than n8n for purely AI-centric tasks.
docker run -d \
--name flowise \
-p 3000:3000 \
flowiseai/flowise
Python agent via CrewAI
Set up in a virtual environment:
python3 -m venv agent-env
source agent-env/bin/activate
pip install crewai langchain-openai
A minimal two-agent crew:
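from crewai import Agent, Task, Crew

# Assumes OPENAI_API_KEY is set in the environment (CrewAI's default model provider is OpenAI).
researcher = Agent(
    role='Researcher',
    goal='Find current information on a given topic',
    backstory='Experienced data analyst',
    verbose=True
)
writer = Agent(
    role='Writer',
    goal='Produce a clear report based on the research',
    backstory='Technical writer with broad background',
    verbose=True
)

# Each Task needs an expected_output describing what a finished result looks like.
task1 = Task(
    description='Research VPS options for AI agents',
    expected_output='A bullet list of VPS options with key specs',
    agent=researcher
)
task2 = Task(
    description='Write a summary report from the research',
    expected_output='A short report summarizing the findings',
    agent=writer
)

# Tasks run in the order listed, so the writer works from the researcher's output.
crew = Crew(agents=[researcher, writer], tasks=[task1, task2])
result = crew.kickoff()
print(result)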
Auto-start with systemd
To keep the agent running across server restarts, create a systemd service:
nano /etc/systemd/system/ai-agent.service
[Unit]
Description=AI Agent Service
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=ubuntu
WorkingDirectory=/home/ubuntu/agent
ExecStart=/home/ubuntu/agent-env/bin/python main.py
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl enable ai-agent
systemctl start ai-agent
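For the unit above to behave well, `main.py` should exit cleanly when systemd stops it and exit with an error when something breaks, so that `Restart=on-failure` can act. The following is one possible shape for that entry point, not a required structure. `install_signal_handlers`, `run_until_stopped`, and the 60-second interval are assumptions made for illustration.

```python
import signal
import threading

stop_event = threading.Event()


def install_signal_handlers() -> None:
    """Ask the main loop to stop when systemd sends SIGTERM (systemctl stop/restart)."""
    def request_stop(signum, frame):
        stop_event.set()

    signal.signal(signal.SIGTERM, request_stop)
    signal.signal(signal.SIGINT, request_stop)  # Ctrl+C when run by hand


def run_until_stopped(do_work, interval_seconds: float = 60.0) -> int:
    """Call do_work repeatedly until a stop is requested; return the cycle count.

    An exception from do_work is deliberately not caught here: the process then
    exits with a non-zero status and Restart=on-failure brings it back after
    RestartSec.
    """
    cycles = 0
    while not stop_event.is_set():
        do_work()
        cycles += 1
        stop_event.wait(interval_seconds)  # sleeps, but wakes immediately on stop
    return cycles
```

In `main.py`, call `install_signal_handlers()` once at startup and then pass the agent's work function to `run_until_stopped`. Using `Event.wait` instead of `time.sleep` means `systemctl stop ai-agent` does not have to wait out the full interval.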
Agent Frameworks Worth Knowing
LangChain / LangGraph — the most widely used Python framework. Covers every major LLM provider and has the richest tool ecosystem. LangGraph extends it with stateful agents and cyclic execution graphs, which matter for complex reasoning workflows.
CrewAI — purpose-built for multi-agent systems. Clean API for defining agents with specific roles and delegating tasks between them. Lower barrier to entry than LangGraph for team-of-agents patterns.
AutoGPT — one of the first autonomous agent projects with an open codebase. Set a goal, the agent plans and executes. Requires an external API (OpenAI) and works best for research and exploration tasks.
n8n — visual workflow orchestrator with solid AI node support. Best choice when the agent needs to integrate with many external services through ready-made connectors.
Flowise — visual LangChain builder. Simpler than n8n for AI-only chains, less flexible for general automation.
Dify — a full platform for building and deploying AI applications. Includes an agent builder, RAG pipelines, an API gateway, and usage analytics. The most complete self-hosted option in the category.
Security Considerations
An AI agent with access to APIs, databases, and external services is an expanded attack surface. A few non-negotiable practices:
Never store API keys in code. Use environment variables loaded from a .env file with restricted permissions:
chmod 600 .env
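In practice the python-dotenv package's `load_dotenv` handles the loading step. As a dependency-free sketch of the same idea, the loader below also refuses to read a `.env` file that group or other users can access, which enforces the `chmod 600` rule at startup. The function names and the simple `KEY=VALUE` parsing are assumptions for this example and cover only the basic syntax.

```python
import os
import stat
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env file after checking its permissions."""
    env_path = Path(path)
    mode = stat.S_IMODE(env_path.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):  # any group/other permission bit set
        raise PermissionError(
            f"{path} is accessible to group/others (mode {oct(mode)}); run chmod 600"
        )

    values: dict[str, str] = {}
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env(values: dict[str, str]) -> None:
    """Copy loaded values into os.environ without overriding existing variables."""
    for key, value in values.items():
        os.environ.setdefault(key, value)
```

A typical startup sequence would be `apply_env(load_env_file())` followed by reading `os.environ["OPENAI_API_KEY"]`. Failing loudly on loose permissions is a design choice: it turns a silent misconfiguration into an error at deploy time.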
Run the agent as a non-privileged user — not as root. Create a dedicated account:
adduser --disabled-password agent-user
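Restrict the agent's outbound network access if it shouldn't be reaching arbitrary hosts. With UFW, set the default outgoing policy to deny and then allow only the destinations the agent needs. UFW rules match IP addresses and ports rather than hostnames, so allowlisting an external API means allowing the address ranges it is served from.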
Log everything the agent does. In production this is mandatory — you need to understand what the agent was doing when something goes wrong.
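One lightweight way to do this is an append-only log with one JSON object per agent action, which is easy to search and to ship elsewhere later. The sketch below uses only the standard `logging` module. The logger name, the file name, and the record fields are illustrative choices, not a standard format.

```python
import json
import logging
import time


def make_audit_logger(path: str = "agent-audit.log") -> logging.Logger:
    """Return a logger that appends one JSON line per agent action to a file."""
    logger = logging.getLogger("agent.audit")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep audit lines out of other log handlers
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = logging.FileHandler(path)
        handler.setFormatter(logging.Formatter("%(message)s"))
        logger.addHandler(handler)
    return logger


def log_action(logger: logging.Logger, tool: str, arguments: dict, outcome: str) -> dict:
    """Record which tool ran, with what arguments, and how it ended."""
    record = {
        "ts": time.time(),
        "tool": tool,
        "arguments": arguments,  # never pass API keys or passwords in here
        "outcome": outcome,
    }
    logger.info(json.dumps(record))
    return record
```

Calling `log_action(logger, "send_email", {"to": "ops@example.com"}, "ok")` after each tool call gives a timeline to replay when something goes wrong. The example address is a placeholder.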
Monitoring and Observability
Langfuse — open-source LLM observability platform. Traces every model call, records token usage and latency, lets you compare runs. Deploys on the same VPS via Docker.
LangSmith — cloud-hosted equivalent from the LangChain team. Easier to set up, but data leaves your infrastructure.
Portainer — web UI for managing Docker containers. If the agent runs in Docker, Portainer lets you restart it, view logs, and manage it through a browser without SSH.
Frequently Asked Questions
What's the minimum VPS needed to run an AI agent?
For API-based agents (OpenAI, Anthropic), 1 vCPU and 2 GB of RAM is genuinely enough. The agent only orchestrates calls — all computation happens at the model provider. For local models via Ollama, the minimum is 8 GB RAM for a 7-8B parameter model.
Can I run GPT-4 on a VPS?
No. GPT-4 is a proprietary model offered only as a hosted API, so there are no weights to download. What you can run locally on a VPS are open-weight models such as Llama 3.1, Mistral, Gemma, and Qwen, through Ollama or llama.cpp. They don't match GPT-4 across the board, but they're fully private and have no per-token cost.
Do AI agents need a GPU?
API-based agents don't need a GPU at all. For local models, a GPU is the difference between usable and painful: a 7B model on CPU generates around 4-6 tokens per second; on a modern GPU, 60-100 tokens per second. For production workloads with local models, GPU isn't optional.
How do I keep API keys safe on a server?
Store keys in environment variables loaded from a .env file. Set file permissions to 600 so only the owner can read it. Load them in Python with python-dotenv. Never commit .env to version control — add it to .gitignore from the start.
How many agents can I run on one VPS?
API-based agents are limited by your LLM provider's rate limits, not server resources — you can run many simultaneously on modest hardware. For local models, multiple agents can share a single Ollama instance without duplicating the model in memory, so RAM consumption doesn't multiply linearly with agent count.
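Because the provider's rate limit is the real ceiling, it helps to cap how many agents call the API at the same moment. The sketch below does this with an `asyncio.Semaphore`. `call_model` is a placeholder that only sleeps, and the limit of three concurrent calls is an arbitrary assumption; a concurrency cap is also only a rough proxy for a requests-per-minute limit.

```python
import asyncio


async def call_model(agent_id: int) -> str:
    """Placeholder for a real API call; sleeps briefly to simulate latency."""
    await asyncio.sleep(0.01)
    return f"agent-{agent_id}: done"


async def run_agents(agent_count: int, max_concurrent: int = 3) -> tuple[list[str], int]:
    """Run all agents, allowing at most max_concurrent calls in flight.

    Returns the results in agent order plus the peak concurrency observed.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def guarded(agent_id: int) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # waits here when the cap is reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await call_model(agent_id)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(i) for i in range(agent_count)))
    return list(results), peak
```

Running `asyncio.run(run_agents(10))` completes all ten agents while never exceeding three simultaneous calls. The same pattern applies to a shared Ollama instance, where limiting concurrency keeps per-request latency predictable.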
THE.Hosting runs NVMe-backed VPS across 50+ locations — a solid foundation for agentic workloads. Plans start at €5.77/month for API-based agents, from €15/month for configurations running local models. Deploys in 60 seconds. Get started or reach out to support 24/7 via Telegram.