The perfect cloud platform for development and production of RAG / AI apps

August 1, 2025 · 15 min read

Zerops is a cloud platform built on four core principles: developer experience, flexibility, scalability, and, of course, affordability. These principles, along with the platform's design, make Zerops the ideal platform for the entire lifecycle of RAG / LLM apps, from development to running in production.

Let's see how by creating:

a production and development environment for a RAG application
using a vector database for vectors, relational database for metadata, key-value database for cache, message broker for queues, and object storage for storage
with Python business API, Python worker and a dashboard app with CI/CD build and deploy pipeline
easily scalable and production grade (different location backups, high availability, observability)

Why? Because such an infrastructure setup, or a variation of it, is the baseline for the development workflow and running production for most projects, RAG applications included. If you are a developer without DevOps knowledge, you are probably thinking "not touching this with a ten-foot pole." If you are a developer with DevOps knowledge, you can imagine what a pain it would be to get this right and maintainable. And because of that - Zerops.

You can set up this entire infrastructure deterministically, and the cost per resource will be 2-3x cheaper than other popular cloud/PaaS options, with no artificial fees for seats or tiers of functionality. We encourage users to add all team members and use as many environments as they need for the best development practices. While you read the rest of the blog post, start deploying the production environment of the application with a single click. It’s free, no credit card required.

Deploy production environment of RAG example application to Zerops

To dive a little deeper, we'll set up the other environment, the development one, from scratch.

Creating a Zerops project and adding managed services
Deploying admin dashboard app, Python worker and API service
Production vs. development environment
Scaling services
Observability & debuggability
Cost
It doesn't stop there

Creating a Zerops project and adding managed services

Zerops offers two kinds of services. The first are runtimes (Node.js, Python, PHP, Golang, .NET, Java, Rust, Ruby, Bun, Deno, Elixir, Gleam), which run in system containers: full OS environments like VMs but with container efficiency, sitting between the single-process focus of Docker/OCI containers and the isolation of microVMs or full VMs. Each runtime comes with either Ubuntu or Alpine and the given technology preconfigured; we'll go into more detail in the following section.

The second kind are semi-managed services, such as relational databases (Postgres, MySQL), vector databases (Qdrant), key-value databases (open-source Redis-compatible Valkey), search engines (Elasticsearch, Typesense, Meilisearch), message queues (Kafka, NATS), and storage (S3-compatible object storage).

There are multiple options for creating a project and adding services: the web app, the CLI, or the REST API. We'll use import, by clicking the import button in the app menu in the top left corner. With the following YAML structure, we'll create the development project and the semi-managed services we chose for our infrastructure: Qdrant (vector db), Postgres (for metadata), Valkey (Redis-compatible cache), NATS (queue), and object storage (S3-compatible).

project:
  name: rag-dev
  # we set up project-wide variables that reference
  # env variables Zerops automatically generates
  # for managed services, these will be used by our
  # API and worker service
  envVariables:
    DB_HOST: ${db_hostname}
    DB_PORT: ${db_port}
    DB_NAME: ${db_dbName}
    DB_USER: ${db_user}
    DB_PASSWORD: ${db_password}

    AWS_ACCESS_KEY_ID: ${storage_accessKeyId}
    AWS_REGION: us-east-1
    AWS_BUCKET: ${storage_bucketName}
    AWS_ENDPOINT: ${storage_apiUrl}
    AWS_SECRET_ACCESS_KEY: ${storage_secretAccessKey}
    AWS_URL: ${storage_apiUrl}/${storage_bucketName}
    AWS_USE_PATH_STYLE_ENDPOINT: "true"

    NATS_URL: nats://${queue_hostname}:${queue_port}
    NATS_USER: ${queue_user}
    NATS_PASSWORD: ${queue_password}

    QDRANT_URL: ${qdrant_connectionString}
    QDRANT_API_KEY: ${qdrant_apiKey}

    REDIS_HOST: ${cache_hostname}
    REDIS_PORT: ${cache_port}

# creating managed services is as easy as
# setting hostname and selecting whether
# it should run in highly available mode
# or not (HA means the service will run
# on multiple containers for both load
# balancing and maximum uptime)
services:
  - hostname: db
    type: postgresql@16
    mode: NON_HA

  - hostname: cache
    type: valkey@7.2
    mode: NON_HA

  - hostname: queue
    type: nats@2.10
    mode: NON_HA

  - hostname: qdrant
    type: qdrant@1.12
    mode: NON_HA

  - hostname: storage
    type: object-storage
    objectStorageSize: 2
    objectStoragePolicy: public-read

And within 60 seconds, you'll have the entire structure spun up and ready to use.
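To make the `${...}` references in the import YAML above more concrete: each placeholder appears to name a service hostname and one of its generated variables, joined by an underscore. The sketch below only imitates that substitution locally. The `resolve_refs` helper and the sample values are hypothetical, and this is not a description of how Zerops implements it.

```python
import re

# Hypothetical sample of per-service generated variables, keyed as
# "<hostname>_<variable>" like the references in the import YAML.
GENERATED = {
    "queue_hostname": "queue",
    "queue_port": "4222",
    "db_hostname": "db",
    "db_port": "5432",
}

_REF = re.compile(r"\$\{([A-Za-z0-9_]+)\}")


def resolve_refs(template: str, generated: dict[str, str]) -> str:
    """Replace every ${name} in template with generated[name].

    An unknown reference raises KeyError instead of being left in place.
    """
    return _REF.sub(lambda match: generated[match.group(1)], template)


if __name__ == "__main__":
    # Mirrors the NATS_URL project variable from the import YAML.
    print(resolve_refs("nats://${queue_hostname}:${queue_port}", GENERATED))
```

Failing on an unknown reference is a deliberate choice in this sketch: a typo in a variable name surfaces immediately rather than ending up as a literal `${...}` inside a connection string.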

Along with the services you chose, Zerops also creates a dedicated "core project infrastructure" that serves as the entry point from and to the internet. It includes logger and statistics services, an L3 balancer with a firewall, and an L7 balancer that takes care of SSL termination, certificates, and load balancing on HTTP ports. These services are fully managed by Zerops and scale up as the load requires.

This internal setup creates a private network (VXLAN) that no one outside can access and within which all services can communicate securely. It also makes your project completely independent of any Zerops "global" services. Essentially, everything needed for your project to run is contained within it, and unless Zerops' physical servers burn, your application will keep running.

Deploying admin dashboard app, Python worker and API service

Zerops includes a built-in build & deploy pipeline that allows you to easily compile your apps, modify and cache container images, deploy with zero downtime, and roll back to previous versions. The pipeline can be triggered either by the CLI (installed on a developer's machine or in any CI/CD system, such as GitHub Actions or GitLab CI runners), or by connecting a service to a GitHub/GitLab repository. A simple zerops.yml file placed in your application root tells Zerops how to build, deploy, and start your applications.

To create our Python service and static service for our dashboard, we'll once again utilize the import functionality, this time importing services into our existing rag-dev project and making use of the buildFromGit param, which allows us to trigger a one-time build and deploy pipeline from any public git repository.

In our case, we'll link to our example starter repository, an app of roughly "hello-world" complexity that mainly shows how to use the broader Zerops platform functionality and can serve as a guide for integrating Zerops into your own apps. The repository includes all three apps (api, the worker in the processor/ folder, and dashboard); zerops.yml supports both single-app repositories and monorepos.

zerops-rag/
├── api/
├── dashboard/
├── processor/
└── zerops.yml

Before importing the services, let's take a closer look at the two most important parts. The first is zerops.yml, which describes how to build and deploy your apps. Here's an example of zerops.yml for a Python app, with explanations in comments:

zerops:
  # setup should match the hostname of the services
  - setup: api
    # Python apps are not compiled and their deps
    # are usually not in a vendor folder, so we are
    # just passing the source from api/ folder to the
    # runtime container
    build:
      # `~` is a wildcard for starting path
      # https://docs.zerops.io/zerops-yaml/specification#using-wildcards
      deployFiles: ./api/~
      # we want to bake in dependencies into
      # the runtime image so we don't have to
      # install them with each deploy
      addToRunPrepare:
        - ./api/requirements.txt
    run:
      base: python@3.11
      # we can choose either `alpine` or `ubuntu`
      # depending on what our apps require
      os: alpine
      # if we want to make the service publicly
      # accessible we need to tell Zerops
      # on which ports it will listen
      ports:
        - port: 8000
          httpSupport: true
      # modern Python libs like `uv` are pre-installed
      # we use the `requirements.txt` we passed from
      # the source to install deps on the container itself
      prepareCommands:
        - sudo uv pip install --system -r ./api/requirements.txt
      # we start the app (important to bind to 0.0.0.0)
      start: gunicorn main:app --bind 0.0.0.0:8000 --workers ${WORKERS:-1} --worker-class uvicorn.workers.UvicornWorker
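The `start` command above points gunicorn at `main:app`, which means main.py has to expose an ASGI callable named `app`. The starter repository's actual application is not reproduced here. As a rough, dependency-free sketch of what such a callable looks like, with a hypothetical `/status` route and a hypothetical `call` helper to exercise it without a server:

```python
import asyncio
import json


async def app(scope, receive, send):
    """Tiny ASGI callable; `gunicorn main:app` would import this name."""
    if scope["type"] != "http":
        return  # non-HTTP events are ignored in this sketch
    found = scope["path"] == "/status"  # hypothetical route
    payload = {"status": "ok"} if found else {"error": "not found"}
    body = json.dumps(payload).encode()
    await send({
        "type": "http.response.start",
        "status": 200 if found else 404,
        "headers": [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})


async def call(path):
    """Drive the app once in-process and return (status, parsed body)."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "method": "GET", "path": path}, receive, send)
    return sent[0]["status"], json.loads(sent[1]["body"])


if __name__ == "__main__":
    print(asyncio.run(call("/status")))
```

In a real project a framework would typically provide the `app` object; the point here is only the name and interface that the `start` command relies on.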

And the second part is the utilization of Zerops-generated environment variables from within the code to easily connect to the semi-managed services.

PostgreSQL - api/main.py#L81-L89

db_pool = await asyncpg.create_pool(
  host=os.getenv("DB_HOST"),
  user=os.getenv("DB_USER"),
  password=os.getenv("DB_PASSWORD")
)

Redis / Valkey - api/main.py#L113-L117

redis_client = redis.Redis(
    host=os.getenv("REDIS_HOST"),
    port=os.getenv("REDIS_PORT")
)

Object Storage (S3) - api/main.py#L100-L107

s3 = boto3.client(
    's3',
    endpoint_url=os.getenv("AWS_ENDPOINT"),
    aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID")
)

NATS Queue - api/main.py#L58-L62

nc = await nats.connect(
    os.getenv("NATS_URL"),
    user=os.getenv("NATS_USER"),
    password=os.getenv("NATS_PASSWORD")
)

Qdrant Vector DB - api/main.py#L194

response = await client.post(
    f"{os.getenv('QDRANT_URL')}/collections/documents/points/search",
    headers={"api-key": os.getenv("QDRANT_API_KEY")}
)

And that's it - we are making use of the environment variables we defined on the project when we imported the semi-managed services, which themselves reference the environment variables automatically generated for each service. Zerops handles all the infrastructure complexity; you just read environment variables and focus on building your apps.
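Since everything hinges on those variables being present, it can help to read them once at startup and fail early if one is missing. This is a minimal sketch, not code from the starter repository: `DbSettings` and `load_db_settings` are hypothetical names, and only the `DB_*` variable names come from the project import above.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import quote

# Variable names defined at project level in the import YAML.
REQUIRED = ("DB_HOST", "DB_PORT", "DB_NAME", "DB_USER", "DB_PASSWORD")


@dataclass(frozen=True)
class DbSettings:
    host: str
    port: int
    name: str
    user: str
    password: str

    @property
    def dsn(self) -> str:
        # PostgreSQL connection URI; user and password are percent-encoded
        # so special characters cannot break the URI.
        user = quote(self.user, safe="")
        password = quote(self.password, safe="")
        return f"postgresql://{user}:{password}@{self.host}:{self.port}/{self.name}"


def load_db_settings(env: Mapping[str, str] = os.environ) -> DbSettings:
    """Build settings from env, raising if any required variable is unset."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return DbSettings(
        host=env["DB_HOST"],
        port=int(env["DB_PORT"]),
        name=env["DB_NAME"],
        user=env["DB_USER"],
        password=env["DB_PASSWORD"],
    )
```

Passing the environment in as a mapping keeps the loader easy to test, and the same pattern would extend to the `NATS_*`, `QDRANT_*`, `REDIS_*` and `AWS_*` variables.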

services:
  - hostname: api
    type: python@3.11
    # we can use `buildFromGit` to one-time trigger
    # the build & deploy pipeline
    buildFromGit: https://github.com/zeropsio/recipe-rag-starter
    # we can enable *.zerops.app subdomain for previews
    enableSubdomainAccess: true
    priority: 10
    # we can use `envSecrets` to save tokens, passwords
    # or values that we want to change on the fly
    envSecrets:
      WORKERS: "1"

  # worker will be accessed by API / read from queue
  # so it doesn't need a public subdomain access
  - hostname: processor
    type: python@3.11
    buildFromGit: https://github.com/zeropsio/recipe-rag-starter
    priority: 10

  - hostname: dashboard
    type: static
    buildFromGit: https://github.com/zeropsio/recipe-rag-starter
    enableSubdomainAccess: true

When Zerops is done building and deploying your apps, our rag-dev project should be fully functioning and ready to use for development.

Production vs. development environment

One of the biggest advantages of using Zerops is environment parity - there are no differences between the core infrastructure of production and development projects. The difference comes down purely to how many resources you allocate to each service and whether it should run in high availability or not. This way, you can make sure you won't ever get into "but it works on my machine" situations. Let's see how we can make use of this.

Let's start by cloning the repository:

git clone https://github.com/zeropsio/recipe-rag-starter rag-starter
cd rag-starter

Install the Zerops CLI and log in:

# check https://docs.zerops.io/references/cli#manual-installation
# for other installation options
npm i -g @zerops/zcli

# generate token here
# https://app.zerops.io/settings/token-management
zcli login YOUR_TOKEN_HERE

# set scope to our dev project so all following
# commands are run with this context
zcli project scope rag-dev

To test everything works, let's quickly trigger a new build & deploy pipeline using the CLI:

# interactive mode will let you select the
# service to trigger the pipeline on; choose
# any you want, `api` for example
zcli push

Now that we have successfully learned how to trigger a new pipeline, which is what you'd typically do for your stage-like environments, let's take this a step further and use Zerops as a full-fledged Cloud Development Environment for local and remote development.

First, we start by using the built-in WireGuard VPN to connect to the private network of the project.

# will require wireguard to be installed on your system
# https://docs.zerops.io/references/vpn#prerequisites
zcli vpn up

This will allow you to locally access any service running inside your Zerops project simply by its Zerops hostname, e.g., postgres://db:5432. This means we can, among other things, offload running all the databases, storages, and even other pieces of the project like the worker to Zerops — only running development of a single piece of the project locally. Let's give it a try and start development of the api service.

# install uv if you don't have it yet
# https://docs.astral.sh/uv/getting-started/installation/
curl -LsSf https://astral.sh/uv/install.sh | sh

# use zcli to create .env with resolved env variables values
zcli project env --service api --user-only > api/.env

# install deps and start development
cd api
uv venv # if you haven't already
uv pip install -r requirements.txt # if you haven't already
uv run uvicorn main:app --reload

When we now visit http://127.0.0.1:8000/status, we should get a working JSON response.
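If the app does not pick up api/.env by itself, the values have to be loaded into the process environment before the connections are created. The sketch below assumes the file written by zcli contains plain `KEY=VALUE` lines, which is an assumption about its output format; `parse_env` and `load_env_file` are hypothetical helpers, not part of zcli or the starter repository.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env") -> None:
    """Copy parsed values into os.environ without overriding existing ones."""
    for key, value in parse_env(Path(path).read_text()).items():
        os.environ.setdefault(key, value)
```

Using `setdefault` means variables already exported in the shell take precedence over the file, which is convenient when temporarily pointing one service at a different host.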

How about we take this one step further and try developing fully remotely with a browser version of VS Code? Import this YAML inside your dev project.

services:
  - hostname: apiremote
    type: python@3.11
    buildFromGit: https://github.com/zeropsio/recipe-rag-starter
    # make the service run on a single container
    # with a little bit more RAM for code-server
    maxContainers: 1
    verticalAutoscaling:
      ram: 2
    # we can override the zerops.yml configuration inline
    # for long term solution you'd add it to the zerops.yml in repo
    zeropsYaml:
      zerops:
        - setup: apiremote
          run:
            base: python@3.11
            os: ubuntu
            prepareCommands:
              - curl -fsSL https://code-server.dev/install.sh | sh
            start: code-server --auth none --bind-addr 0.0.0.0:8080 /var/www

Now, with the VPN still connected, visiting http://apiremote:8080 gives us a VS Code server instance with a fully working terminal, where we can run the same start commands we would locally.

But that's not the end. Imagine what you can do when you install Claude Code or opencode.ai into the apiremote service — you get a sandboxed production-like infrastructure for your agent to operate in, with seamless hand-off to a human developer. Zerops is flexible; the options for utilizing it for human and agentic development are endless.

Scaling services

One thing you might need regardless of the environment is scaling. Zerops has built-in automatic horizontal and vertical scaling with granular configuration (1 CPU, 0.125 GB RAM). This means you can, for example, set your worker to run on 10 containers, each with 10 CPUs and 48 GB RAM, before a heavy job starts, and then scale it back down to 1 container with 0.125 GB RAM once it finishes. Zerops scales automatically within the selected range, but when you know how many resources your jobs will require, you can use the API or CLI to set resources manually before and after running the job.
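Assuming the 0.125 GB figure above is the step in which RAM can be configured, an estimated requirement would be rounded up to the next step before being used as a scaling limit. The helper below is purely illustrative: `snap_ram_gb` is a hypothetical name and is not part of the Zerops API or CLI.

```python
import math

# Step size taken from the text above; treating it as the RAM
# granularity is an assumption of this sketch.
RAM_STEP_GB = 0.125


def snap_ram_gb(requested_gb: float, step: float = RAM_STEP_GB) -> float:
    """Round a RAM estimate up to the next multiple of step (at least one step)."""
    if requested_gb <= 0:
        raise ValueError("requested_gb must be positive")
    return max(1, math.ceil(requested_gb / step)) * step


if __name__ == "__main__":
    for estimate in (0.1, 1.3, 48):
        print(estimate, "->", snap_ram_gb(estimate))
```

Rounding up rather than to the nearest step errs on the side of the job not running out of memory.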

Observability & debuggability

Zerops automatically gathers all logs from all services and gives you access through the web app, CLI, and REST API. This serves well for a quick view into what's happening inside your apps. For a proper log system, Zerops allows users to set up log forwarding to any syslog-ng compatible third-party system. This includes the ELK stack, which itself can be deployed to Zerops with a single click. Prometheus can also be deployed with a single click to Zerops for detailed resource dashboards.

For debuggability, Zerops allows you to SSH with root access into any service/container, simply by using its hostname:

ssh api

For convenience and quick access, the Zerops web app also includes a web terminal.

No matter the platform, debugging builds is always a pain in the ass - the endless cycle of triggering pipelines, waiting for the job to start and fail, and so on. And so Zerops has an option to stop the pipeline at certain points, allowing you to SSH/web terminal into the build container and run debug commands or manually run commands there before continuing.

Cost

A fully highly available, production-grade RAG infrastructure (all databases on 3 containers, API on 2 containers) might cost around $30/month, plus whatever the actual load of your workers adds, since Zerops charges by the minute. The development non-HA project costs around $10/month (or $2/month if you run it 8 hours a day to use as a cloud development environment).

Per resource, Zerops is 2-3x cheaper than most popular PaaS solutions, nearly rivaling fully DIY VPS solutions from the cheapest providers.

It doesn't stop there

As noted at the beginning, Zerops is a cloud platform for developers in general, not just for AI/LLM apps. It just so happens that the platform design fits AI/LLM apps extremely well. This means that Zerops can cover any of your other needs: deploy a headless CMS, marketing site, CRM, analytics software, or any other software your product or your development team requires.

If you have any questions or need help deploying anything to Zerops, reach out through our community Discord - our whole dev team is there!
