
Local RAG without the cloud: sqlite-vec, Transformers.js, and one MCP server

May 31, 2026

RAG · MCP · sqlite-vec · Transformers.js · local-first · applied AI

You want an AI assistant that actually knows your stuff — your notes, your code, the decision you made three weeks ago and already forgot. For that to work, the assistant has to look things up before it answers, instead of confidently making something up.

The usual way to build that lookup takes all your private material and sends it off to a couple of outside companies, which charge you a little every time you search. I built Synaptic, the memory behind my AI coding setup, to do the entire lookup on your own computer instead. Nothing gets uploaded to be searched. The search never leaves the machine.

Here's how it works, in plain language, and the one idea that decides whether a thing like this is actually any good.

Why "on your own machine" matters

When the lookup happens on your computer, the trips to the internet that the cloud version makes simply disappear. Your data never gets handed to anyone. There's no per-search bill. And there's no outage somewhere else that takes your assistant down with it.

The only moment an outside AI gets involved is the very last step, when it writes the actual answer. All the finding — digging through your notes to grab the relevant bits — happens locally. For a tool that reads your private code and your personal notes, that isn't a nice bonus. It's the whole point.


The one idea that actually matters: two kinds of searching

Here's the part that decides everything, and it's genuinely simple once you see it. There are two completely different ways to search, and most tools only do one.

Searching by meaning. This is the clever, modern kind. Ask "how do we handle logins?" and it understands the idea, so it finds the right notes even if they say "authentication" and never use the word "login." Brilliant — until you go looking for an exact thing.

Searching by exact words. This is the old-fashioned kind, like Ctrl-F across all your files. Perfect when you know the precise term you want. Useless when your question is fuzzy.

Each one fails exactly where the other shines. Search by meaning for a specific function name and it hands you a lovely paragraph describing the code while completely missing the actual line. Search by exact words for a fuzzy idea and you get every file that happens to share a word — which is all of them.
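If you'd like to see the difference rather than take my word for it, here is a toy sketch in TypeScript. It is not Synaptic's code. The three notes are made up, and the three-number vectors are hand-written stand-ins for real embeddings, which have hundreds of numbers and come from a model. The names `searchByMeaning` and `searchByExactWords` are mine, purely for illustration.

```typescript
interface Note {
  id: string;
  text: string;
  vector: number[];
}

// Made-up notes. Each vector is a hand-written stand-in for a real embedding,
// loosely meaning [authentication, billing, deployment].
const notes: Note[] = [
  { id: "auth-design", text: "Authentication uses signed session cookies.", vector: [0.9, 0.1, 0.0] },
  { id: "token-code", text: "validateToken() rejects expired sessions.", vector: [0.7, 0.0, 0.3] },
  { id: "invoices", text: "Invoices are generated on the first of the month.", vector: [0.0, 1.0, 0.1] },
];

// Cosine similarity: close to 1 when two vectors point the same way, 0 when unrelated.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Searching by meaning: rank every note by how close its vector is to the query's.
function searchByMeaning(queryVector: number[]): string[] {
  return notes
    .slice()
    .sort((x, y) => cosine(queryVector, y.vector) - cosine(queryVector, x.vector))
    .map((note) => note.id);
}

// Searching by exact words: keep only the notes that literally contain the term.
function searchByExactWords(term: string): string[] {
  const needle = term.toLowerCase();
  return notes
    .filter((note) => note.text.toLowerCase().includes(needle))
    .map((note) => note.id);
}

// "How do we handle logins?" as a pretend vector that is all about authentication.
console.log(searchByMeaning([1, 0, 0])[0]);       // auth-design
console.log(searchByExactWords("login"));         // empty: no note uses that word
console.log(searchByExactWords("validateToken")); // only token-code
```

The meaning search finds the authentication note even though it never says "login", while the exact-word search for "login" comes back empty. Ask for `validateToken` by name and the exact-word search is the one that lands on the right note without any guessing.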

So you use both, and then you have to combine their results fairly. That turns out to be the tricky part, because the two methods rate things on totally different scales. It's like trying to average a movie's star rating with its runtime. The fix is to ignore the raw numbers and look only at the order each method ranked things in, then blend those two ordered lists. Do that well and a dead-simple setup on your laptop can match an expensive cloud one. Do it badly and no amount of fancy infrastructure saves you.

That blending step is the whole ballgame. Everything else is plumbing.
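For the curious, the blending method named in the closing note, Reciprocal Rank Fusion, fits in a few lines. What follows is a minimal TypeScript sketch of the general technique, not Synaptic's actual code. Each item earns 1 / (k + rank) from every list it appears in, and the scores are added up. The constant k = 60 is a commonly used default that I'm assuming here, and the note ids are hypothetical.

```typescript
// Reciprocal Rank Fusion: merge ranked lists using only each item's position,
// never the raw scores the individual searches produced.
// `k` dampens the gap between neighbouring ranks; 60 is an assumed default.
function reciprocalRankFusion(rankings: string[][], k: number = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first; ties are broken by id so the output is deterministic.
  return Array.from(scores.entries()).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
  );
}

// Hypothetical note ids, best match first in each list.
const byMeaning = ["auth-overview", "session-notes", "login-handler"];
const byExactWords = ["login-handler", "auth-overview", "changelog"];

const fused = reciprocalRankFusion([byMeaning, byExactWords]);
// Logs the ids in this order: auth-overview, login-handler, session-notes, changelog
console.log(fused.map(([id]) => id));
```

The two notes that both searches agree on rise to the top, and a note that only one search liked still makes the list further down. No raw score from either search is ever compared with a score from the other, which is what keeps the blend fair.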


The honest caveats

I'd rather you hear these from me:

Some of the parts are young software. They work well, but I keep a backup, the way you'd keep a spare key.
It needs a moment to warm up when it first starts, then it's instant.
It's built for short notes, not for searching ten-thousand-word documents. Different job, different tool.
One person at a time. Perfect for a personal tool; not built to serve a thousand people at once.
You still pay for the answer itself. Doing the search locally kills the lookup costs. The AI that writes the final reply still costs whatever it costs.

For a personal memory tool, every one of those is a fine trade — and what you get back is something far simpler to run, for basically the same quality as the cloud version.

Why simpler wins

The popular way to build this drags in extra moving parts: a second cloud service, a separate database to babysit, a stack of accounts and bills. My version is one small program and a single file on your disk. You could understand the whole thing in an afternoon.

For a tool that's supposed to quietly disappear into the background and just remember things for you, that simplicity isn't a compromise. It's the entire feature.

The takeaway

You can give an AI access to your own stuff without shipping it to the cloud. It's been possible for a couple of years, it's simpler to run than the paid version, and the thing that actually makes it good isn't expensive infrastructure — it's the boring, clever idea of blending two kinds of search fairly.

Look at the idea, not the machinery.

For the technically curious: the searchable vectors live inside a SQLite file via sqlite-vec; the text-to-vector step runs locally through Transformers.js (no embeddings API); exact-match keyword search is SQLite's built-in FTS5; the assistant reaches all of it through an MCP server exposing save / search / session tools; and the two result sets are merged with Reciprocal Rank Fusion. The code is on GitHub.