EMBEDDINGS FOR BEGINNERS

TurboVec Tutorial: Generate Your First Embedding in 5 Minutes

No vector database experience. No RAG knowledge. Just Python, five code snippets, and a working similarity search at the end. TurboVec compresses your embeddings 8–16× and searches faster than FAISS — with zero training required.

What Is TurboVec

TurboVec is a free tool that takes word or document "fingerprints" (called embeddings) and stores them so you can find similar ones instantly — using way less memory than existing tools.

Think of an embedding as a list of numbers (say, 1536 numbers) that represents what a piece of text "means." Two sentences about dogs will have similar lists of numbers. Two sentences about cooking will have different ones. TurboVec's job is to take millions of these number-lists, compress them to about one-eighth their original size, and still find the closest matches faster than the industry-standard tool (FAISS). It's built in Rust for speed, but you control it with simple Python. No cloud, no training step — you feed it vectors and it just works.

TurboVec is a vector index — not a database. It handles the "store and search" layer. It doesn't generate embeddings, manage metadata, or provide an API endpoint. It's the fastest engine on CPU, not the whole car.

Why Use TurboVec (for Beginners)

You don't need to understand compression to get the benefit

TurboVec automatically compresses your embeddings 8–16× without you doing anything. If you generated 768-dimensional embeddings for 10 million documents, they'd normally take about 31 GB of RAM as float32 (10 million × 768 × 4 bytes). At 4-bit, TurboVec fits them in about 4 GB. With a 1536-dimensional model such as OpenAI's text-embedding-3-small, both figures double. For a beginner with a laptop, that means you can actually run real workloads without hitting "out of memory" errors.
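The arithmetic behind numbers like these is easy to check yourself. The sketch below is a back-of-the-envelope estimate only: the helper name index_size_gb is made up for this example, and it ignores any per-vector overhead a real index adds.

```python
def index_size_gb(num_vectors: int, dim: int, bits_per_value: int) -> float:
    """Approximate storage in GB (10**9 bytes), ignoring index overhead."""
    total_bits = num_vectors * dim * bits_per_value
    return total_bits / 8 / 1e9


n, dim = 10_000_000, 768  # 10 million vectors from a 768-dimensional model

print(f"float32: {index_size_gb(n, dim, 32):.2f} GB")  # 30.72 GB
print(f"4-bit:   {index_size_gb(n, dim, 4):.2f} GB")   # 3.84 GB
print(f"2-bit:   {index_size_gb(n, dim, 2):.2f} GB")   # 1.92 GB
```

Swap in your own vector count and your model's dimension to see whether a dataset is likely to fit in your laptop's RAM.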

The memory wall is the most common blocker for beginners experimenting with embeddings at scale — TurboVec is the direct fix.

No training, no tuning, no manual

Most vector compression tools (including FAISS's product-quantization indexes) need you to run a training step: feeding in sample data so the tool can learn how to compress your specific vectors. If your data changes, you may need to retrain. TurboVec skips this entirely. Create an index, add vectors, search immediately.

With FAISS, that training step means extra setup before your first search. TurboVec removes the requirement: pip install and go.

Faster than FAISS — not just smaller

On Apple Silicon (M-series Macs), TurboVec searches 12–20% faster than FAISS. On Intel/AMD (x86), it matches or beats FAISS at 4-bit compression and stays within ~1% at 2-bit. Speed matters even for beginners: when you type a search query, you expect results instantly, not after a coffee break.

How To Generate Your First TurboVec Embedding

By the end of these five steps, you'll have generated real embedding vectors, stored them in a TurboVec index, and run a similarity search. You only need Python installed.

Step 1: Install TurboVec

pip install turbovec
pip install sentence-transformers

TurboVec ships as a precompiled wheel (Rust compiled to native code) — you don't need Rust installed. Works on macOS (ARM/x86), Linux (x86/ARM), and Windows. sentence-transformers gives you a free local embedding model (~90 MB download).

Note: If you're on a very old CPU (pre-AVX2), TurboVec may fall back to a slower code path. It still works — just slower. You'll see a warning if this happens.

Step 2: Generate Embeddings with a Model

TurboVec doesn't generate embeddings itself — it stores and searches them. First, you need vectors:

from sentence_transformers import SentenceTransformer

# Load a free embedding model (downloads ~90 MB on first run)
model = SentenceTransformer('all-MiniLM-L6-v2')

# Your documents — sentences, paragraphs, product descriptions, etc.
documents = [
    "The cat sat on the mat",
    "A feline rested on the rug",
    "Python is a programming language",
    "JavaScript runs in the browser",
    "My cat loves tuna fish",
]

# Generate embeddings — each document becomes a list of 384 numbers
embeddings = model.encode(documents)

print(f"Generated {len(embeddings)} embeddings")
print(f"Each embedding has {len(embeddings[0])} dimensions")
# Output: Generated 5 embeddings
# Output: Each embedding has 384 dimensions

What just happened? The model converted each sentence into a "vector" — a list of 384 numbers. Sentences that mean similar things ("cat sat on mat" and "feline rested on rug") get similar numbers. Sentences about different topics (programming vs cats) get very different numbers. These 384 numbers are the dimensions — think of them like 384 different ways to describe what a sentence is about.

Step 3: Store Embeddings in TurboVec and Search

from turbovec import TurboQuantIndex
import numpy as np

# Create an index. dim=384 matches our model. bit_width=4 = good balance.
index = TurboQuantIndex(dim=384, bit_width=4)

# Add your embeddings
index.add(embeddings)

# Create a search query — same model, same process
query_text = "animals that sleep on furniture"
query_embedding = model.encode([query_text])

# Search for the 3 most similar documents
scores, indices = index.search(query_embedding, k=3)

print("Search results (index, similarity score):")
for i, (score, idx) in enumerate(zip(scores[0], indices[0])):
    print(f"  #{i+1}: [{idx}] \"{documents[idx]}\" — score: {score:.4f}")
# Example output (scores are illustrative; yours will differ):
# Search results (index, similarity score):
#   #1: [0] "The cat sat on the mat" — score: 0.5234
#   #2: [1] "A feline rested on the rug" — score: 0.4891
#   #3: [4] "My cat loves tuna fish" — score: 0.3127

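What the output means: indices[0] tells you which documents matched. scores[0] tells you how similar they are, as cosine similarity. In general cosine similarity ranges from -1 (opposite) through 0 (unrelated) to 1 (identical direction); for sentence embeddings, most scores land between 0 and 1.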
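To make the score less mysterious, here is cosine similarity written out by hand on tiny made-up vectors. This is the textbook formula in plain Python, not TurboVec's internal code, and scores from a compressed index are approximations of it.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Same direction (one is just a scaled copy of the other): close to 1
same = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# Perpendicular vectors share nothing: 0
unrelated = cosine_similarity([1.0, 0.0], [0.0, 1.0])

# Opposite directions: close to -1
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])

print(f"{same:.2f} {unrelated:.2f} {opposite:.2f}")
```

Note that the length of a vector does not matter, only its direction, which is why the scaled copy still scores as a perfect match.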

Step 4: Understand What You Just Made

You just built a mini semantic search engine. Here's what each piece does:

Embeddings: 384-number fingerprints that capture meaning, not just keywords. "Cat sat on mat" and "feline rested on rug" matched even though they share almost no words.
Dimensions (384): Each number captures a different aspect of meaning. One might represent "is about animals," another "is about furniture," another "is about code."
Similarity score: How close two vectors are in 384-dimensional space. Think of it like physical closeness, with one twist: a higher score means closer. Two vectors pointing in nearly the same direction = two sentences close in meaning.
TurboVec index: The compressed storage that makes searching fast. Without an index, searching 10 million embeddings would compare your query against every single one. With TurboVec, it finds the closest matches much faster.
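For comparison, this is roughly what "compare your query against every single one" looks like with no index at all. It is a plain-Python baseline on toy 2-dimensional vectors (the function name brute_force_search is made up here), useful for seeing what work an index is there to speed up.

```python
import heapq


def brute_force_search(query, vectors, k):
    """Score every stored vector against the query; return the top k
    as (score, position) pairs, highest score first."""
    scored = (
        (sum(q * v for q, v in zip(query, vec)), position)
        for position, vec in enumerate(vectors)
    )
    return heapq.nlargest(k, scored)


vectors = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.0]

top = brute_force_search(query, vectors, k=2)
print(top)  # [(1.0, 0), (0.8, 1)]
```

Every query touches every vector at full precision, so both time and memory grow with the size of the collection.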

Step 5: Try a Simple Similarity Comparison (Bonus)

This is the "aha moment" — seeing semantic similarity in action:

def compare_sentences(a, b):
    """Show how similar two sentences are."""
    vec_a = model.encode([a])
    vec_b = model.encode([b])
    tmp = TurboQuantIndex(dim=384, bit_width=4)
    tmp.add(vec_b)
    scores, _ = tmp.search(vec_a, k=1)
    return scores[0][0]

# Test pairs
pairs = [
    ("The weather is nice today", "It's a beautiful sunny day"),
    ("The weather is nice today", "I need to buy groceries"),
    ("Python programming tutorial", "How to write Python code"),
    ("Python programming tutorial", "Best pizza restaurants in Rome"),
]

for a, b in pairs:
    sim = compare_sentences(a, b)
    bar = "█" * int(sim * 20)
    print(f"{sim:.3f} {bar} \"{a}\" vs \"{b}\"")

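Expect higher similarity (roughly 0.6–0.9) for related pairs and lower (roughly 0.1–0.3, sometimes near zero) for unrelated ones. Exact values depend on the model, and because the comparison runs through a 4-bit index, the scores are approximations of the true cosine similarity. This is the foundation of semantic search: finding results by meaning, not just keywords.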

Key Features

Zero Training Needed

Create an index, add vectors, search immediately. No codebook training, no k-means, no rebuilds as data grows — the difference between "pip install and go" vs "read a paper on product quantization first."

8–16× Memory Compression

31 GB of float32 embeddings becomes ~4 GB at 4-bit (8×) and ~2 GB at 2-bit (16×). This is what makes local RAG viable on a laptop. Compression is lossy, but TurboQuant's approach keeps quality loss minimal.
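To get a feel for the size-versus-accuracy trade-off, the sketch below rounds each number in a random vector to a small set of evenly spaced levels. This is a deliberately naive uniform quantizer used only for illustration; it is not TurboQuant's algorithm, and the function names are made up for this example.

```python
import random


def quantize(values, bits, lo=-1.0, hi=1.0):
    """Snap each float to one of 2**bits evenly spaced levels in [lo, hi]."""
    levels = 2 ** bits - 1
    step = (hi - lo) / levels
    codes = [round((min(max(v, lo), hi) - lo) / step) for v in values]
    return [lo + code * step for code in codes]


def mean_abs_error(values, bits):
    """Average distance between the original and quantized numbers."""
    restored = quantize(values, bits)
    return sum(abs(v - r) for v, r in zip(values, restored)) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
vector = [rng.uniform(-1.0, 1.0) for _ in range(384)]

err_4bit = mean_abs_error(vector, 4)
err_2bit = mean_abs_error(vector, 2)

print(f"4-bit: {32 // 4}x smaller than float32, mean error {err_4bit:.3f}")
print(f"2-bit: {32 // 2}x smaller than float32, mean error {err_2bit:.3f}")
```

Fewer bits means fewer levels to round to, so each stored number drifts further from its original value. Real quantizers work hard to shrink that error, but the direction of the trade-off is the same.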

Filtered Search with Allowlists

Pass an allowlist of IDs to search() and TurboVec only searches within that subset. Combine keyword search (BM25/SQL) with semantic reranking — blocks with no allowed IDs are skipped entirely.
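The keyword-then-semantic pattern can be sketched in plain Python. This tutorial does not show the exact allowlist argument that search() takes, so the example below avoids TurboVec entirely and uses made-up helper names and toy vectors to show the idea: filter first, then score only what survived.

```python
def keyword_filter(documents, keyword):
    """Return the positions of documents whose text contains the keyword."""
    needle = keyword.lower()
    return {i for i, text in enumerate(documents) if needle in text.lower()}


def filtered_search(query, vectors, allowed_ids, k):
    """Score only the allowed positions; return the top k (score, position) pairs."""
    scored = [
        (sum(q * v for q, v in zip(query, vectors[i])), i)
        for i in allowed_ids
    ]
    return sorted(scored, reverse=True)[:k]


documents = ["cat on a mat", "cat eats tuna", "python tutorial"]
vectors = [[1.0, 0.0], [0.6, 0.8], [0.9, 0.1]]

allowed = keyword_filter(documents, "cat")  # positions 0 and 1
results = filtered_search([1.0, 0.0], vectors, allowed, k=2)
print(results)  # [(1.0, 0), (0.6, 1)]
```

Document 2 has a vector close to the query, but it is never scored because the keyword filter excluded it.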

Python & Rust, Framework Integrations

Drop-in replacements for LangChain, LlamaIndex, Haystack, and Agno. Swap one import and your existing pipeline runs on TurboVec instead of in-memory stores.

Save & Load Indexes to Disk

index.write("my_index.tq") saves your index. TurboQuantIndex.load("my_index.tq") loads it back. Persistent storage means no re-indexing every time you restart your script.
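One thing to keep in mind: in Step 3 the search results are positions that you look up in your own documents list, so it is reasonable to assume the saved index does not contain your text. A minimal sketch, assuming you keep the texts in a JSON file next to the index (the file name my_index.docs.json and both helper names are made up for this example):

```python
import json
import tempfile
from pathlib import Path


def save_documents(documents, path):
    """Write the list of texts to disk, in the same order they were indexed."""
    Path(path).write_text(json.dumps(documents, ensure_ascii=False), encoding="utf-8")


def load_documents(path):
    """Read the list of texts back in the original order."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Round trip in a temporary folder so the example leaves nothing behind
with tempfile.TemporaryDirectory() as folder:
    docs_path = Path(folder) / "my_index.docs.json"
    save_documents(["The cat sat on the mat", "My cat loves tuna fish"], docs_path)
    restored = load_documents(docs_path)

print(restored[1])  # My cat loves tuna fish
```

Save both files together and load both together, so position 1 in the index always means position 1 in the list.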

CPU-Optimized, No GPU Required

SIMD-accelerated (NEON on ARM/Mac, AVX-512 on x86). Runs at full speed on any modern laptop CPU. No cloud costs, no GPU bills.

Current limitations (honest): TurboVec is a vector index, not an embedding model — you bring your own embeddings. No built-in metadata filtering (combine with SQL or use the allowlist). The project is young (v0.6.0 as of June 2026). No GPU acceleration — designed for CPU workloads.

What You Can Do With It (4 Beginner-Friendly Use Cases)

Build a "Search My Notes" Tool

You have a folder of markdown notes, journal entries, or meeting summaries. Generate embeddings for each with a free model, index them in TurboVec. Now "what did I say about the Q2 budget?" finds relevant notes even if the word "Q2" isn't in them. For large collections, the compressed index is what keeps this within a laptop's memory.
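The first step is getting your notes into a Python list. A minimal sketch using only the standard library, assuming one note per .md file (the helper name load_notes is made up for this example):

```python
import tempfile
from pathlib import Path


def load_notes(folder, pattern="*.md"):
    """Return (filename, text) pairs for every non-empty matching file."""
    notes = []
    for path in sorted(Path(folder).rglob(pattern)):
        text = path.read_text(encoding="utf-8").strip()
        if text:  # skip empty files, they have nothing to embed
            notes.append((path.name, text))
    return notes


# Demo with a throwaway folder containing one real note and one empty file
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "budget.md").write_text("Q2 spending review", encoding="utf-8")
    (Path(folder) / "empty.md").write_text("", encoding="utf-8")
    notes = load_notes(folder)

print(notes)  # [('budget.md', 'Q2 spending review')]
```

From there, pass the texts to model.encode() as in Step 2 and keep the filenames in the same order so each search hit can be traced back to its file.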

Local RAG on Your Machine

RAG means: search your documents for relevant context, then feed that context to an LLM to answer a question. TurboVec handles the "search your documents" part. Pair it with Ollama running locally and you have a fully offline QA system that never sends data to the cloud.
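The "feed that context to an LLM" half is mostly string assembly. The sketch below shows one possible prompt layout, not a standard: the wording, the max_chars budget, and the function name build_prompt are all assumptions for illustration. The passages would be the documents your search returned.

```python
def build_prompt(question, passages, max_chars=2000):
    """Join retrieved passages into a numbered context block, then add the question."""
    context_parts = []
    used = 0
    for number, passage in enumerate(passages, start=1):
        if used + len(passage) > max_chars:
            break  # stop before the context outgrows the budget
        context_parts.append(f"[{number}] {passage}")
        used += len(passage)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "Where does the cat sit?",
    ["The cat sat on the mat", "My cat loves tuna fish"],
)
print(prompt)
```

The resulting string is what you would send to a local model. Numbering the passages makes it easier to ask the model which one it relied on.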

Deduplicate Similar Content

Got scraped articles, product listings, or support tickets with near-duplicates? Generate embeddings, store in TurboVec, search for nearest neighbors. Similarity score above 0.95 → flag as duplicate. Fast enough to run on hundreds of thousands of items.
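Two details trip people up here: every item's nearest neighbor is usually itself, and each duplicate pair shows up twice (A finds B, B finds A). A minimal sketch of the flagging step, assuming you have already collected each item's neighbors and scores from a search (the input layout and the name flag_duplicates are made up for this example):

```python
def flag_duplicates(neighbor_results, threshold=0.95):
    """neighbor_results maps item id -> list of (neighbor id, score).
    Returns each duplicate pair once, as (smaller id, larger id)."""
    duplicates = set()
    for item_id, neighbors in neighbor_results.items():
        for neighbor_id, score in neighbors:
            if neighbor_id != item_id and score > threshold:
                pair = (min(item_id, neighbor_id), max(item_id, neighbor_id))
                duplicates.add(pair)
    return sorted(duplicates)


# Toy results: items 0 and 1 are near-duplicates, item 2 is unique
neighbor_results = {
    0: [(0, 1.00), (1, 0.97)],
    1: [(1, 1.00), (0, 0.97)],
    2: [(2, 1.00), (0, 0.41)],
}

print(flag_duplicates(neighbor_results))  # [(0, 1)]
```

Treat 0.95 as a starting point and spot-check some flagged pairs by eye before deleting anything.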

Content Recommendation ("If You Liked This...")

Blog posts, products, or songs with descriptions. Embed them all. When a user views one item, use TurboVec to find the 5 most similar. It's the same similarity search from Step 3 above, just at scale. Works offline, with no recommendation API service needed.

FAQ

Q: Do I need a GPU to run TurboVec?

No. TurboVec runs entirely on CPU. It's optimized with SIMD instructions (NEON on ARM/Mac, AVX-512 on x86) for speed. For most beginners, a laptop CPU is more than enough for millions of vectors.

Q: What's the difference between TurboVec and a vector database like Pinecone or ChromaDB?

TurboVec is a vector index: it handles the "store and search embeddings" part. Vector databases like ChromaDB or Pinecone add metadata filtering, persistence, access control, and APIs on top. TurboVec is the engine; vector databases are the whole car. Beginners often start with TurboVec for learning, then move to a full vector DB when they need metadata filtering.

Q: Does TurboVec generate embeddings for me?

No. TurboVec indexes and searches embeddings; it doesn't create them. You must generate embeddings first using a separate model (sentence-transformers, OpenAI's API, Ollama with an embedding model, etc.). Then TurboVec takes over for storage and search.

Q: What does dim mean when I create TurboQuantIndex(dim=384)?

"dim" is the number of dimensions in your embeddings: literally, how many numbers are in each vector. Match this to whatever embedding model you use. If your model outputs 384-dimensional vectors (like all-MiniLM-L6-v2), set dim=384. If you use OpenAI's text-embedding-3-small, use dim=1536. Mismatching dim will cause errors.

Q: 2-bit vs 4-bit: which should I use?

Start with 4-bit (bit_width=4). It gives better search quality at about 8× compression. 2-bit gives 16× compression but slightly lower quality, so use it when memory is extremely tight and perfect recall isn't critical. For a beginner with a few thousand to a million vectors, 4-bit is the safe default.

Q: Which programming languages does TurboVec support?

TurboVec has Python bindings (via PyO3/maturin) and native Rust. There are no official .NET or JavaScript bindings yet. If you're not using Python or Rust, you may need to wait or contribute a binding.

Q: What happens if I add vectors with different dimensions to the same index?

TurboVec will reject them. Every vector in a single index must have the same number of dimensions. The dim parameter is fixed at index creation. If you change embedding models (and therefore dimensions), create a new index.
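If you would rather catch a dimension mix-up yourself, with a message that names the offending vector, a small check before adding vectors can help. This is a plain-Python sketch with a made-up helper name, not part of TurboVec:

```python
def check_dimensions(vectors, expected_dim):
    """Raise ValueError naming the first vector whose length is not expected_dim."""
    for position, vector in enumerate(vectors):
        if len(vector) != expected_dim:
            raise ValueError(
                f"vector {position} has {len(vector)} dimensions, expected {expected_dim}"
            )


# All vectors match: passes silently
check_dimensions([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], expected_dim=3)

# The second vector is too short: raises with a readable message
try:
    check_dimensions([[0.1, 0.2, 0.3], [0.4, 0.5]], expected_dim=3)
except ValueError as error:
    message = str(error)
    print(message)  # vector 1 has 2 dimensions, expected 3
```

Use the same number for expected_dim that you passed as dim when creating the index.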

What's Next

→ Build a Full RAG Pipeline. TurboVec + Ollama: search your docs, feed to a local LLM
→ TurboVec vs FAISS Deep Dive. Benchmarks, memory usage, and when to use which
→ TurboVec in Production. Persistent indexes, metadata filtering with SQL, API deployment