the default advice for retrieval is to reach for a managed vector database. at scale, it is good advice. but a managed vector database is also a second bill, a second service to operate, and a second copy of your data to keep in sync with the real one.
the cheaper path
most projects are not at that scale. for a small or mid-sized corpus, pgvector running inside the postgres you already operate handles retrieval fine. one database, one backup story, one place the data lives.
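to make "handles retrieval fine" concrete: with no index, pgvector answers a query like `ORDER BY embedding <=> ... LIMIT k` by exact search, ranking every row by cosine distance (`<=>` is pgvector's cosine-distance operator, i.e. 1 minus cosine similarity). the python below is a minimal sketch of that same computation, not pgvector itself. the table and column names in the comment and the toy three-dimensional embeddings are hypothetical.

```python
import math

# hypothetical data: embeddings as plain lists of floats keyed by document id.
# in pgvector they would sit in a vector(n) column, and the same ranking is
#   SELECT id FROM items ORDER BY embedding <=> '[1,0,0]' LIMIT 2;
# (items / embedding / id are illustrative names, not from the post)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: list[float], rows: dict[str, list[float]], k: int) -> list[str]:
    """Exact nearest neighbours: score every row, keep the k closest."""
    ranked = sorted(rows, key=lambda row_id: cosine_distance(query, rows[row_id]))
    return ranked[:k]


if __name__ == "__main__":
    docs = {
        "refund-policy": [0.9, 0.1, 0.0],
        "shipping-times": [0.1, 0.9, 0.1],
        "returns-faq": [0.8, 0.2, 0.1],
    }
    print(top_k([1.0, 0.0, 0.0], docs, k=2))  # ['refund-policy', 'returns-faq']
```

exact search gives perfect recall by construction. adding an hnsw or ivfflat index in pgvector swaps this full scan for an approximate one, trading some recall for speed, which is why recall is worth measuring once an index is in place.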
we measured recall, latency, and cost against the usual managed options on a realistic corpus:
- recall stayed within a hair of the managed services at our corpus sizes
- query latency was fine for a chat loop, well under the model's own thinking time
- cost was the line that moved, because there was no second service to pay for
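one common way to measure recall is recall@k: for each query, the share of the exact top-k results that the search under test also returned, averaged over all queries. the sketch below assumes that definition. the ids are hypothetical; in practice the ground truth would come from an exact, unindexed scan and the candidate lists from the indexed or managed search being compared.

```python
def recall_at_k(exact: list[list[str]], approx: list[list[str]], k: int) -> float:
    """Mean fraction of each query's true top-k ids that the approximate search also returned.

    exact:  per-query ids from an exact scan (ground truth), each list non-empty
    approx: per-query ids from the search under test, in the same query order
    """
    if not exact or len(exact) != len(approx):
        raise ValueError("need at least one query and one approximate list per query")
    total = 0.0
    for truth, got in zip(exact, approx):
        truth_k = set(truth[:k])
        # overlap between the true top-k and the returned top-k for this query
        total += len(truth_k & set(got[:k])) / len(truth_k)
    return total / len(exact)


if __name__ == "__main__":
    # hypothetical results for two queries
    exact = [["a", "b", "c"], ["d", "e", "f"]]
    approx = [["a", "c", "x"], ["d", "e", "f"]]
    print(recall_at_k(exact, approx, k=3))
```

averaging per query, rather than pooling all hits, keeps one query with many results from dominating the score.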
when to graduate
the trade-off shifts as the corpus grows and query volume climbs. when pgvector starts straining, with latency or recall slipping on the same measurements as above, that is the signal to move, and by then you will have the traffic to justify the bill. start on what you already pay for. graduate when the numbers tell you to, not before.
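one way to make "when the numbers tell you to" concrete is to write the rule down: track tail latency from the chat loop and recall from the same evaluation, and flag when either crosses a line chosen in advance. the sketch below uses a nearest-rank p95; `should_graduate`, the latency budget, and the recall floor are hypothetical knobs for your own product, not values from our measurements.

```python
def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of latency samples, in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic, 1-based
    return ordered[rank - 1]


def should_graduate(
    samples_ms: list[float],
    recall: float,
    latency_budget_ms: float,
    recall_floor: float,
) -> bool:
    """True once tail latency exceeds the budget or recall drops below the floor."""
    return p95(samples_ms) > latency_budget_ms or recall < recall_floor


if __name__ == "__main__":
    # hypothetical measurements and thresholds, not figures from the post
    samples = [float(ms) for ms in range(1, 21)]  # 1..20 ms
    print(p95(samples))
    print(should_graduate(samples, recall=0.97, latency_budget_ms=50.0, recall_floor=0.95))
```

tail latency is used instead of the mean because a chat loop feels its slowest queries, and a mean can stay flat while the tail grows.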