Building an AI Automation Stack on a Startup Budget

How we wire OpenAI, n8n, and Supabase together to ship customer-facing AI features in days, not months.

Nirmal Sankhla
March 28, 2026 · 11 min read
AI features are easy to demo and hard to ship. The gap is mostly plumbing — vector storage, observability, cost ceilings, prompt management — not model choice. Here's the lean stack we use to get from prototype to production in a couple of weeks.

The stack

  • OpenAI or Anthropic for the LLM calls
  • Supabase Postgres + pgvector for embeddings and chat history
  • n8n for back-office automations (or Vercel cron + Edge Functions if simpler)
  • Vercel AI SDK for streaming, tool calls, and structured output
  • PostHog or Helicone for usage + cost tracking

Total monthly cost for a small product with a few hundred daily AI requests: under $50, often under $20.

RAG, properly

If your AI feature answers questions from your own knowledge base, you need retrieval. The mistake we see: people store embeddings in a separate vector DB and end up with two sources of truth to keep in sync. Use pgvector instead. It's a column type in the same Supabase Postgres database, so each embedding lives next to the content it describes.

-- Schema for a simple RAG store on Supabase
create extension if not exists vector;

create table documents (
  id uuid primary key default gen_random_uuid(),
  content text not null,
  metadata jsonb,
  embedding vector(1536)  -- OpenAI text-embedding-3-small
);

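-- ivfflat learns its clusters from existing rows, so create this index
-- after loading a representative batch of documents, not on an empty table.
create index documents_embedding_idx
  on documents using ivfflat (embedding vector_cosine_ops)
  with (lists = 100);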

Then a similarity-search function the API can call:

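create or replace function match_documents(
  query_embedding vector(1536),
  match_count int default 5
)
returns table (id uuid, content text, similarity float)
language sql stable as $$
  -- <=> is cosine distance, so 1 - distance is cosine similarity.
  -- Columns are table-qualified so they can't be confused with the output column names.
  select documents.id, documents.content,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  order by documents.embedding <=> query_embedding
  limit match_count;
$$;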
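
To make the similarity column concrete, here is a minimal TypeScript sketch of the same ranking done in memory instead of in Postgres. pgvector's `<=>` operator is cosine distance, so `1 - distance` is cosine similarity. The `Doc` and `Match` types and the `cosineDistance` and `matchDocuments` helpers are illustrative names, not part of Supabase or pgvector. In production the SQL function does this work, and the ivfflat index makes it fast by searching approximately rather than scanning every row.

```typescript
interface Doc {
  id: string;
  content: string;
  embedding: number[];
}

interface Match {
  id: string;
  content: string;
  similarity: number;
}

// Cosine distance, the quantity pgvector's <=> operator returns:
// 1 - (a . b) / (|a| * |b|).
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory mirror of match_documents: order by distance ascending,
// keep the closest matchCount rows, report similarity = 1 - distance.
function matchDocuments(
  docs: Doc[],
  queryEmbedding: number[],
  matchCount: number = 5
): Match[] {
  return docs
    .map((d) => ({
      id: d.id,
      content: d.content,
      distance: cosineDistance(d.embedding, queryEmbedding),
    }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, matchCount)
    .map(({ id, content, distance }) => ({
      id,
      content,
      similarity: 1 - distance,
    }));
}
```

A similarity of 1 means the two vectors point in the same direction, and 0 means they are orthogonal. This exact scan is only practical for small sets, which is the reason to let the database and its index do the search.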

Guardrails that actually pay off

In rough priority order:

  • Per-user rate limits at the route level (Upstash, 5 req/min works for most)
  • Hard daily cost ceiling per workspace — pause new requests if breached, page yourself
  • Output validation — Zod schemas on every structured response
  • Refusal grounding for RAG — instruct the model to answer 'I don't know' if context is empty
  • Prompt logging that excludes PII (or excludes prompts entirely for sensitive products)
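
As one illustration of the list above, here is a minimal sketch of the daily cost ceiling per workspace. It assumes you can estimate the dollar cost of each request after it completes. The `DailyCostCeiling` class and its method names are hypothetical, and the in-memory `Map` stands in for whatever store you actually use (a Postgres table would fit this stack); in-memory state is lost on restart and is not shared across serverless instances.

```typescript
// Hypothetical helper: tracks spend per workspace per UTC day.
class DailyCostCeiling {
  private spend = new Map<string, number>();
  private ceilingUsd: number;

  constructor(ceilingUsd: number) {
    this.ceilingUsd = ceilingUsd;
  }

  // One bucket per workspace per UTC calendar day, e.g. "ws_1:2026-03-28".
  private key(workspaceId: string, now: Date): string {
    return `${workspaceId}:${now.toISOString().slice(0, 10)}`;
  }

  // True while today's recorded spend is still below the ceiling.
  canSpend(workspaceId: string, now: Date = new Date()): boolean {
    const spent = this.spend.get(this.key(workspaceId, now)) ?? 0;
    return spent < this.ceilingUsd;
  }

  // Call after each model request with its estimated cost.
  record(workspaceId: string, costUsd: number, now: Date = new Date()): void {
    const k = this.key(workspaceId, now);
    this.spend.set(k, (this.spend.get(k) ?? 0) + costUsd);
  }
}
```

In a route handler you would check `canSpend` before calling the model, return an error and alert yourself when it is false, and call `record` once the response's token usage is known. Because the check runs before the cost is known, a workspace can overshoot the ceiling by one request.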

Evals, not vibes

Track model quality the same way you'd track a regular API: with tests. Build a small set of input → expected-output pairs early, run them on every prompt change, and grade with a cheaper model. Don't ship prompt changes that drop your eval score, no matter how much better they 'feel'.

Vibes-based prompt engineering is the prompt equivalent of a one-off bug bash: it works once, then quality quietly degrades.
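
The eval loop described above can be sketched in a few lines of TypeScript. This is a minimal version under stated assumptions: `EvalCase`, `Model`, `Grader`, `runEvals`, and `shouldShip` are illustrative names, the model and grader are passed in as plain async functions so the harness does not depend on any SDK, and the grader returns a simple pass or fail.

```typescript
interface EvalCase {
  input: string;
  expected: string;
}

// Both are injected, so the harness works with any provider.
// In practice Model wraps your prompt + LLM call, and Grader
// wraps a call to a cheaper model that judges the answer.
type Model = (input: string) => Promise<string>;
type Grader = (expected: string, actual: string) => Promise<boolean>;

// Runs every case and returns the fraction that passed (0 to 1).
async function runEvals(
  cases: EvalCase[],
  model: Model,
  grade: Grader
): Promise<number> {
  if (cases.length === 0) {
    throw new Error("eval set is empty");
  }
  let passed = 0;
  for (const c of cases) {
    const actual = await model(c.input);
    if (await grade(c.expected, actual)) {
      passed++;
    }
  }
  return passed / cases.length;
}

// Gate for prompt changes: never ship a lower score than the baseline.
function shouldShip(baselineScore: number, candidateScore: number): boolean {
  return candidateScore >= baselineScore;
}
```

Store the baseline score alongside the prompt it belongs to, run `runEvals` in CI on every prompt change, and fail the build when `shouldShip` returns false.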

When to skip AI altogether

If a regex, a SQL query, or a five-line algorithm could do the job — do that. AI features add cost, latency, error surface, and review burden. They're worth it for tasks that genuinely need fuzzy understanding (summarisation, freeform Q&A, classification of messy text). They're not worth it for problems with a clean deterministic answer.
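
For a sense of scale, here is the kind of task that should never reach a model: pulling order IDs out of a support message. The ID format (`ORD-` followed by six digits) and the `extractOrderIds` helper are made up for illustration.

```typescript
// Hypothetical ID format: "ORD-" followed by exactly six digits.
const ORDER_ID = /\bORD-\d{6}\b/g;

// Returns every order ID found in the text, in order of appearance.
function extractOrderIds(text: string): string[] {
  return text.match(ORDER_ID) ?? [];
}
```

This runs in microseconds, costs nothing per call, and returns the same answer every time, none of which is true of an LLM call doing the same job.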

Nirmal Sankhla
Founder & Engineering Lead

Ten-plus years across Flutter, Node.js, and AI. Likes shipping, dislikes meetings.