TurboVec Tutorial: Generate Your First Embedding in 5 Minutes
No vector database experience. No RAG knowledge. Just Python, five code snippets, and a working similarity search at the end. TurboVec compresses your embeddings 8–16× and, in the benchmarks quoted below, searches about as fast as FAISS or faster, with zero training required.
What Is TurboVec
TurboVec is a free tool that takes word or document "fingerprints" (called embeddings) and stores them so you can find similar ones instantly — using way less memory than existing tools.
Think of an embedding as a list of numbers (say, 1536 numbers) that represents what a piece of text "means." Two sentences about dogs will have similar lists of numbers. Two sentences about cooking will have different ones. TurboVec's job is to take millions of these number-lists, compress them to about one-eighth their original size, and still find the closest matches faster than the industry-standard tool (FAISS). It's built in Rust for speed, but you control it with simple Python. No cloud, no training step — you feed it vectors and it just works.
Why Use TurboVec (for Beginners)
You don't need to understand compression to get the benefit
TurboVec automatically compresses your embeddings 8–16× without you doing anything. If you generated embeddings for 10 million documents with OpenAI's text-embedding-3-small (1536 dimensions), they'd normally eat about 61 GB of RAM as float32 (10 million × 1536 × 4 bytes). TurboVec fits them in about 4 GB at 2-bit, or about 8 GB at 4-bit. For a beginner with a laptop, that means you can actually run real workloads without hitting "out of memory" errors.
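The sizes involved are easy to estimate yourself. The sketch below assumes raw storage only (no per-vector overhead that an index might add) and uses 1 GB = 1e9 bytes; `embedding_size_gb` is a helper invented for this illustration, not part of TurboVec.

```python
def embedding_size_gb(num_vectors: int, dim: int, bits_per_value: int) -> float:
    """Raw storage in gigabytes (1 GB = 1e9 bytes), ignoring index overhead."""
    total_bits = num_vectors * dim * bits_per_value
    return total_bits / 8 / 1e9


n, dim = 10_000_000, 1536  # 10 million vectors from a 1536-dimensional model
for label, bits in [("float32", 32), ("4-bit", 4), ("2-bit", 2)]:
    print(f"{label:>7}: {embedding_size_gb(n, dim, bits):.2f} GB")
# float32: 61.44 GB
#   4-bit: 7.68 GB
#   2-bit: 3.84 GB
```

Going from 32 bits per number to 4 or 2 bits is where the 8× and 16× compression figures come from.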
Running out of memory is a common blocker for beginners experimenting with embeddings at scale, and it is the problem TurboVec targets directly.
No training, no tuning, no manual
Most vector compression schemes, including FAISS's product-quantization indexes, need a training step first: you feed in sample data so the index can learn how to compress your specific vectors. If your data changes, you may need to retrain. TurboVec skips this entirely. Create an index, add vectors, search immediately.
That training step is extra setup before a FAISS user can run a first search. TurboVec removes the requirement: pip install and go.
Faster than FAISS — not just smaller
On Apple Silicon (M-series Macs), TurboVec searches 12–20% faster than FAISS. On Intel/AMD (x86), it matches or beats FAISS at 4-bit compression and stays within ~1% at 2-bit. Speed matters even for beginners: when you type a search query, you expect results instantly, not after a coffee break.
How To Generate Your First TurboVec Embedding
By the end of these five steps, you'll have generated real embedding vectors, stored them in a TurboVec index, and run a similarity search. You only need Python installed.
Step 1: Install TurboVec
TurboVec ships as a precompiled wheel (Rust compiled to native code), so you don't need Rust installed. It works on macOS (ARM/x86), Linux (x86/ARM), and Windows. Install it with pip alongside sentence-transformers, which gives you a free local embedding model (~90 MB download).
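Assuming the two packages are published on PyPI as `turbovec` and `sentence-transformers` (so the install command would be `pip install turbovec sentence-transformers`), a quick check like the one below can confirm that both are importable. The module names `turbovec` and `sentence_transformers` are assumptions, and `check_installed` is a helper invented for this sketch.

```python
import importlib.util


def check_installed(module_names):
    """Return {module_name: True/False} depending on whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


# Module names are assumptions; adjust them if your install differs.
for name, ok in check_installed(["turbovec", "sentence_transformers"]).items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

If either line prints "missing", rerun the install inside the same Python environment you use to run the script.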
Step 2: Generate Embeddings with a Model
TurboVec doesn't generate embeddings itself — it stores and searches them. First, you need vectors:
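A minimal sketch of this step, assuming the sentence-transformers package from Step 1 and its all-MiniLM-L6-v2 model (384 dimensions). The example sentences and the `embed_sentences` helper are illustrative and not part of either library.

```python
sentences = [
    "The cat sat on the mat",
    "A feline rested on the rug",
    "Python is a popular programming language",
    "Rust compiles to fast native code",
]


def embed_sentences(texts, model_name="all-MiniLM-L6-v2"):
    """Turn a list of strings into a 2-D array with one embedding per row."""
    # Imported here so the model library only loads when you actually embed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)  # first use downloads the model
    # normalize_embeddings=True scales each vector to length 1, so a dot
    # product between two vectors equals their cosine similarity.
    return model.encode(texts, normalize_embeddings=True)
```

Calling `embeddings = embed_sentences(sentences)` and printing `embeddings.shape` should give `(4, 384)`: four sentences, 384 numbers each.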
What just happened? The model converted each sentence into a "vector" — a list of 384 numbers. Sentences that mean similar things ("cat sat on mat" and "feline rested on rug") get similar numbers. Sentences about different topics (programming vs cats) get very different numbers. These 384 numbers are the dimensions — think of them like 384 different ways to describe what a sentence is about.
Step 3: Store Embeddings in TurboVec and Search
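A hedged sketch of indexing and searching. The names `TurboQuantIndex`, `dim`, `bit_width`, `search()`, `scores`, and `indices` all appear elsewhere in this tutorial. The `turbovec` import path, the `add()` method name, the `k` argument, and the `(scores, indices)` return order are assumptions to check against the TurboVec documentation. `build_and_search` is a helper invented here.

```python
def build_and_search(doc_vectors, query_vectors, k=2, bit_width=4):
    """Index doc_vectors, then return (scores, indices) for each query row."""
    import numpy as np
    from turbovec import TurboQuantIndex  # assumed import path

    docs = np.ascontiguousarray(doc_vectors, dtype=np.float32)
    queries = np.ascontiguousarray(query_vectors, dtype=np.float32)

    # dim must equal the number of values in each embedding (384 for MiniLM).
    index = TurboQuantIndex(dim=docs.shape[1], bit_width=bit_width)
    index.add(docs)  # assumed method name; no training step beforehand

    # Assumed return order: one row of scores and one row of indices per query.
    scores, indices = index.search(queries, k=k)
    return scores, indices
```

With the Step 2 embeddings, `scores, indices = build_and_search(embeddings, embeddings[:1])` searches for the first sentence. The slice `embeddings[:1]` keeps the query two-dimensional (one row), which is why the results are read as `indices[0]` and `scores[0]`.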
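What the output means: indices[0] tells you which documents matched the first query. scores[0] tells you how similar they are, as cosine similarity. Cosine similarity can range from -1 to 1, but for text embeddings it mostly falls between 0 (unrelated) and 1 (identical meaning). Because the index is compressed, the scores are close approximations rather than exact values.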
Step 4: Understand What You Just Made
You just built a mini semantic search engine. Here's what each piece does:
- Embeddings: 384-number fingerprints that capture meaning, not just keywords. "Cat sat on mat" and "feline rested on rug" matched even though they share almost no words.
- Dimensions (384): Each number captures some aspect of meaning. As a loose mental model, one might represent "is about animals," another "is about furniture," another "is about code." In real models, individual dimensions are rarely this easy to label.
- Similarity score: How close two vectors are in 384-dimensional space. Think of it like physical distance — two points close together = two sentences close in meaning.
- TurboVec index: The compressed storage that makes searching fast. Without an index, searching 10 million embeddings would compare your query against every single one. With TurboVec, it finds the closest matches much faster.
Step 5: Try a Simple Similarity Comparison (Bonus)
This is the "aha moment" — seeing semantic similarity in action:
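A minimal, dependency-free sketch of the comparison. The three-number vectors are made-up stand-ins so the snippet runs on its own; in practice you would pass rows of the embeddings generated in Step 2. `cosine_similarity` is a helper written for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (invented for illustration, not real model output).
cat = [0.9, 0.1, 0.0]
feline = [0.8, 0.2, 0.1]
code = [0.0, 0.1, 0.9]

print(f"cat vs feline: {cosine_similarity(cat, feline):.2f}")  # similar topic
print(f"cat vs code:   {cosine_similarity(cat, code):.2f}")    # different topic
```

With real sentence embeddings the contrast is less extreme than with these toy vectors.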
Expect high similarity (~0.6–0.9) for meaningful pairs and low (~0.1–0.3) for unrelated ones. This is the foundation of semantic search — finding results by meaning, not just keywords.
Key Features
Zero Training Needed
Create an index, add vectors, search immediately. No codebook training, no k-means, no rebuilds as data grows — the difference between "pip install and go" vs "read a paper on product quantization first."
8–16× Memory Compression
About 61 GB of float32 embeddings (10 million 1536-dimensional vectors) becomes ~4 GB at 2-bit, ~8 GB at 4-bit. This is what makes local RAG viable on a laptop. Compression is lossy, but TurboQuant's approach keeps quality loss minimal.
Filtered Search with Allowlists
Pass an allowlist of IDs to search() and TurboVec only searches within that subset. Combine keyword search (BM25/SQL) with semantic reranking — blocks with no allowed IDs are skipped entirely.
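To make the idea concrete, here is a conceptual sketch of allowlist filtering written as a plain brute-force search. It illustrates the behavior only; it is not TurboVec's implementation or API, and `filtered_search` is a hypothetical helper. Vectors are assumed to be unit length, so a dot product equals cosine similarity.

```python
def filtered_search(vectors, query, allowed_ids, k=2):
    """Return the top-k (id, score) pairs, scoring only ids in allowed_ids."""
    allowed = set(allowed_ids)
    scored = [
        (doc_id, sum(x * y for x, y in zip(vec, query)))
        for doc_id, vec in enumerate(vectors)
        if doc_id in allowed  # everything outside the allowlist is skipped
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy unit vectors (invented). Suppose a keyword filter kept ids 1 and 2.
vectors = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
results = filtered_search(vectors, query=[1.0, 0.0], allowed_ids=[1, 2], k=2)
print(results)  # id 0 is the best raw match but is excluded by the allowlist
```

The allowlist would typically come from a cheaper first pass, such as a SQL query or BM25 keyword match, with the semantic search ranking only what survived that pass.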
Python & Rust, Framework Integrations
Drop-in vector store replacements for LangChain, LlamaIndex, Haystack, and Agno. Swap one import and your existing pipeline runs on TurboVec instead of in-memory stores.
Save & Load Indexes to Disk
index.write("my_index.tq") saves your index. TurboQuantIndex.load("my_index.tq") loads it back. Persistent storage means no re-indexing every time you restart your script.
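A hedged sketch of the round trip. `index.write()` and `TurboQuantIndex.load()` are the calls named above; the `turbovec` import path is an assumption, and `save_and_reload` is a helper invented for this example.

```python
def save_and_reload(index, path="my_index.tq"):
    """Write an existing index to disk and return a freshly loaded copy."""
    from turbovec import TurboQuantIndex  # assumed import path

    index.write(path)                  # persist the compressed index
    return TurboQuantIndex.load(path)  # reload without re-adding vectors
```

In a real script you would usually write once after indexing, then call only the load half on later runs.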
CPU-Optimized, No GPU Required
SIMD-accelerated (NEON on ARM/Mac, AVX-512 on x86). Runs at full speed on any modern laptop CPU. No cloud costs, no GPU bills.
What You Can Do With It (4 Beginner-Friendly Use Cases)
Build a "Search My Notes" Tool
You have a folder of markdown notes, journal entries, or meeting summaries. Generate an embedding for each with a free model and index them in TurboVec. Now "what did I say about the Q2 budget?" finds relevant notes even if the word "Q2" isn't in them. For large collections, this used to be impractical on a laptop because the uncompressed embeddings would not fit in memory.
Local RAG on Your Machine
RAG means: search your documents for relevant context, then feed that context to an LLM to answer a question. TurboVec handles the "search your documents" part. Pair it with Ollama running locally and you have a fully offline QA system that never sends data to the cloud.
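The "feed that context to an LLM" half can be as simple as string assembly. Below is a sketch with an invented prompt format and helper name (`build_rag_prompt`); the retrieved chunks would come from a TurboVec search, and the resulting string would go to whichever local model you run.

```python
def build_rag_prompt(question, retrieved_chunks):
    """Combine retrieved text chunks and a question into one LLM prompt."""
    context = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Example chunks are made up; in practice they are your top search results.
chunks = ["The Q2 budget was approved in April.", "Travel spending is capped."]
prompt = build_rag_prompt("When was the Q2 budget approved?", chunks)
print(prompt)
```

Numbering the chunks makes it easier to ask the model to cite which passage it used.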
Deduplicate Similar Content
Got scraped articles, product listings, or support tickets with near-duplicates? Generate embeddings, store in TurboVec, search for nearest neighbors. Similarity score above 0.95 → flag as duplicate. Fast enough to run on hundreds of thousands of items.
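One way to sketch the thresholding step without any library is to compare every pair and flag those scoring above 0.95. This brute-force version is only practical for small collections (at scale, the nearest-neighbor lookup would come from the index), and `find_duplicates` plus the toy vectors are invented for illustration.

```python
import math
from itertools import combinations


def find_duplicates(vectors, threshold=0.95):
    """Return (i, j) index pairs whose cosine similarity exceeds threshold."""

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(vectors), 2)
        if cosine(a, b) > threshold
    ]


# Items 0 and 1 point in almost the same direction; item 2 does not.
items = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
print(find_duplicates(items))  # [(0, 1)]
```

The 0.95 cutoff is a starting point; check a sample of flagged pairs by eye and adjust it for your data.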
Content Recommendation ("If You Liked This...")
You have blog posts, products, or songs with descriptions. Embed them all. When a user views one item, use TurboVec to find the 5 most similar. It's the same similarity search as in Step 3 above, just at scale. It works offline, with no recommendation API service needed.
FAQ
Do I need a GPU to run TurboVec?
No. TurboVec runs entirely on CPU. It's optimized with SIMD instructions (NEON on ARM/Mac, AVX-512 on x86) for speed. For most beginners, a laptop CPU is more than enough for millions of vectors.
How is TurboVec different from a vector database?
TurboVec is a vector index: it handles the "store and search embeddings" part. Vector databases like ChromaDB or Pinecone add metadata filtering, persistence, access control, and APIs on top. TurboVec is the engine; vector databases are the whole car. Beginners often start with TurboVec for learning, then move to a full vector DB when they need metadata filtering.
Does TurboVec generate embeddings for me?
No. TurboVec indexes and searches embeddings; it doesn't create them. You must generate embeddings first using a separate model (sentence-transformers, OpenAI's API, Ollama with an embedding model, etc.). Then TurboVec takes over for storage and search.
What does the "dim" parameter mean?
"dim" is the number of dimensions in your embeddings: literally, how many numbers are in each vector. Match this to whatever embedding model you use. If your model outputs 384-dimensional vectors (like all-MiniLM-L6-v2), set dim=384. If you use OpenAI's text-embedding-3-small, use dim=1536. Mismatching dim will cause errors.
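A small guard like the following can catch a mismatch before it reaches the index. It is a sketch under the assumption that embeddings arrive as a list of rows; `check_dim` is a hypothetical helper, not a TurboVec function.

```python
def check_dim(vectors, expected_dim):
    """Raise ValueError if any vector's length differs from expected_dim."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(
                f"vector {i} has {len(vec)} dimensions, expected {expected_dim}"
            )
    return expected_dim


# Two placeholder vectors of the right length: passes silently.
check_dim([[0.0] * 384, [0.1] * 384], expected_dim=384)
```

Running the same check with `expected_dim=1536` on 384-number vectors raises a `ValueError` that names the first offending row.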
Should I use 2-bit or 4-bit compression?
Start with 4-bit (bit_width=4). It gives better search quality at about 8× compression. 2-bit gives 16× compression but slightly lower quality; use it when memory is extremely tight and perfect recall isn't critical. For a beginner with a few thousand to a million vectors, 4-bit is the safe default.
Which programming languages does TurboVec support?
TurboVec has Python bindings (via PyO3/maturin) and native Rust. There are no official .NET or JavaScript bindings yet. If you're not using Python or Rust, you may need to wait or contribute a binding.
What happens if I add vectors with different dimensions?
TurboVec will reject them. Every vector in a single index must have the same number of dimensions. The dim parameter is fixed at index creation. If you change embedding models (and therefore dimensions), create a new index.