
Vector Database Self-Guided Course

This document is a self-guided course on vector databases. It is organized into four parts: conceptual foundations, the internals of vector search systems, hands-on Rust exercises with Turso and sqlite-vec, and real-world application pipelines. Each section is either a reading lesson or a hands-on Rust programming exercise. Sections marked 🚧 are stubs whose full content is tracked in an nbd ticket — follow the ticket ID to find the detailed learning objectives and instructions.


Table of Contents

Part 1 — Foundations

  1. What Is a Vector?
  2. Embeddings
  3. Vector Similarity

Part 2 — Vector Databases

  1. What Is a Vector Database?
  2. Under the Hood: ANN Algorithms

Part 3 — Turso + sqlite-vec Basics

  1. Setting Up
  2. Exercise 1 — Storing and Retrieving Vectors
  3. Exercise 2 — K-Nearest Neighbor Search

Part 4 — Real Applications

  1. Generating Embeddings in Rust
  2. Exercise 3 — Semantic Document Search
  3. Exercise 4 — Recommendation Engine
  4. Exercise 5 — Retrieval-Augmented Generation

Part 1 — Foundations

1. What Is a Vector?

A vector is an ordered list of numbers. That is the entire definition — nothing more exotic than a list where position matters. A two-element list [3.0, 4.0] is a vector; so is a 1 536-element list of floating-point values produced by a language model. What makes vectors useful is that the numbers have a geometric interpretation: each element is a coordinate along one axis of a space, and the vector as a whole names a point (or an arrow from the origin to that point) in that space.

Geometric intuition in two and three dimensions. Start with the familiar. A 2-dimensional vector [x, y] is a point in the plane — the kind you plot on graph paper. The vector [3.0, 4.0] sits three units to the right of the origin and four units up. An arrow drawn from [0, 0] to [3, 4] has a magnitude (length) of √(3² + 4²) = 5 and points in a specific direction. Magnitude and direction together completely characterise the vector; change either one and you have a different vector.

A 3-dimensional vector [x, y, z] extends this to physical space: three coordinates, three axes, one point. You can still compute a magnitude — √(x² + y² + z²) — and you can still talk about direction. Two 3D vectors point in the same direction if one is a positive scalar multiple of the other; they are perpendicular (orthogonal) if their dot product is zero.

High-dimensional spaces. Nothing in the definition of a vector limits it to two or three elements. A d-dimensional vector [x₁, x₂, …, x_d] is a point in d-dimensional space. The geometry extends perfectly: magnitude is √(x₁² + x₂² + … + x_d²), the dot product of two vectors is Σᵢ aᵢ · bᵢ, and you can compute angles and distances between points just as you would in 2D or 3D.
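These definitions map directly onto code. A minimal Rust sketch that works for any dimension d (the helper names magnitude and dot are illustrative, not from any library):

```rust
/// Magnitude (Euclidean norm): sqrt(x1^2 + x2^2 + ... + xd^2).
fn magnitude(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Dot product: multiply corresponding elements and sum the results.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    // The 2D example from the text: [3.0, 4.0] has magnitude 5.
    println!("{}", magnitude(&[3.0, 4.0])); // prints 5
    // Orthogonal vectors have dot product 0.
    println!("{}", dot(&[1.0, 0.0], &[0.0, 1.0])); // prints 0
}
```

The same two functions work unchanged whether d is 2 or 1 536.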

High-dimensional geometry is counterintuitive in subtle ways that are worth knowing:

  • The curse of dimensionality. In high-dimensional spaces, most of the volume of a hypersphere is concentrated near its surface rather than its interior. Two randomly chosen high-dimensional vectors from a standard distribution tend to be nearly orthogonal — their dot product is close to zero — even when you have not deliberately constructed them that way. This makes “nearest neighbour” in high dimensions a harder problem than it sounds: distances concentrate, so the gap between a query’s nearest and farthest stored points shrinks as d grows, and a single distance value carries less information about which points are genuinely similar.

  • Normalisation changes the geometry. A unit vector has magnitude exactly 1. Dividing a vector by its magnitude — normalisation — projects all vectors onto the surface of the unit hypersphere. On that sphere, distance and angle are equivalent measures of similarity, which simplifies many computations. Embedding models often output unit-normalised vectors precisely to exploit this equivalence.

  • Dimensions are not independent features. When people say a language model embeds words into a 768-dimensional space, they do not mean “dimension 42 encodes the concept of colour.” The axes of an embedding space are rarely interpretable on their own. Meaning is encoded in the relative positions of points — which vectors are close to which others — not in the values along any single axis.
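The normalisation point above can be checked directly in code. A small sketch (the normalise helper is illustrative, not from any library):

```rust
/// Divide a vector by its magnitude, projecting it onto the unit
/// hypersphere. Returns None for the zero vector, which has no direction.
fn normalise(v: &[f32]) -> Option<Vec<f32>> {
    let mag = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if mag == 0.0 {
        return None;
    }
    Some(v.iter().map(|x| x / mag).collect())
}

fn main() {
    let u = normalise(&[3.0, 4.0]).unwrap();
    // After normalisation the magnitude is 1, up to float rounding.
    let mag: f32 = u.iter().map(|x| x * x).sum::<f32>().sqrt();
    println!("{mag}"); // ≈ 1.0
}
```

On the unit sphere the dot product of two vectors equals their cosine similarity, which is why embedding models often emit unit-normalised output.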

Vectors as representations. The key insight that makes vector databases useful is that real-world objects — documents, images, audio clips, products, users — can be represented as vectors such that similarity in meaning or content corresponds to proximity in the vector space. Two documents that discuss the same topic will, if embedded well, produce vectors that are close together. Two documents on unrelated topics will produce vectors that are far apart.

This is not magic; it is the result of training a model to produce embeddings where similar inputs cluster near each other. Once you have such a model, every search or comparison problem reduces to a geometric problem: find the vectors closest to a query vector. The rest of this course is about how to do that efficiently at scale.

A note on notation. Throughout this course, vectors are written in bold or with subscripts: v, q, or v₁. The i-th element of a vector v is written v[i] or vᵢ. The magnitude of v is written |v| or ‖v‖. Dimension is written d and the number of stored vectors is written n.


2. Embeddings

What an embedding is. An embedding is a function — learned from data, not hand-crafted — that maps an input to a fixed-size vector of floating-point numbers. The input can be a word, a sentence, an image, a product listing, or anything else that can be fed into a neural network. The output is always the same shape: a Vec<f32> of some predetermined length d. The function is trained so that inputs with similar meaning produce vectors that are close together in the d-dimensional space, while unrelated inputs produce vectors that are far apart. Once you have such a function, comparing the meaning of two inputs reduces to comparing their vectors — which is exactly the geometric problem that vector databases are built to solve.

Word embeddings and a brief history. The idea that meaning can live in a vector took hold in 2013 when Mikolov et al. published Word2Vec. Word2Vec trains on raw text and assigns every word in its vocabulary a single static vector, typically of 100 to 300 dimensions. The striking result was that vector arithmetic captured semantic relationships: the vector for king minus the vector for man plus the vector for woman produced a vector closest to queen. GloVe (2014) and fastText (2016) refined the approach, but the core limitation remained — each word gets exactly one vector regardless of context. The word bank has the same embedding whether it refers to a riverbank or a financial institution. Static word embeddings are largely a historical curiosity today, but they established the foundational principle: meaning can be encoded as geometry.

Contextual embeddings from encoder models. Modern embedding models solve the polysemy problem by reading the entire input before producing a vector. Models such as those in the sentence-transformers library or OpenAI’s text-embedding-3-small take a full sentence (or paragraph) as input, process it through a transformer encoder, and output one vector that represents the whole input. Internally these models produce a vector for every token; the single sentence-level vector is obtained either by averaging all token vectors (mean-pooling) or by taking the vector at a special [CLS] token position. You do not need to understand transformer internals to use these models — the interface is simple: input is a string, output is a Vec<f32> of fixed length. Because the model sees the full context, the same word in different sentences yields different final embeddings, correctly distinguishing river bank from investment bank.

What makes a good embedding model. Embedding models are trained with a contrastive objective: given a pair of inputs known to be similar (a question and its answer, two paraphrases, a caption and its image), the loss function pulls their vectors closer together; given a dissimilar pair, it pushes them apart. The quality of the training data — how many pairs, how diverse, how accurately labelled — matters as much as model size. Models are evaluated on the MTEB (Massive Text Embedding Benchmark), which measures performance across retrieval, classification, clustering, and semantic similarity tasks. In general, larger models produce better embeddings but cost more compute per input and return higher-dimensional vectors that consume more storage.

Practical dimensionalities. Different models produce different vector sizes, and the choice affects speed, memory, and quality. Common dimensions include 384 (MiniLM — fast inference, model size around 80–130 MB, a good default for prototyping), 768 (BERT-base and many sentence-transformers models — the most common open-source default), 1 536 (OpenAI text-embedding-3-small — a strong hosted option balancing quality and cost), and 3 072 (OpenAI text-embedding-3-large — highest quality from OpenAI at roughly double the cost). Higher dimensionality is not always better: on small datasets or narrow domains, a 384-dimensional model may match or outperform a 1 536-dimensional one while using a quarter of the storage and running faster at query time. Choose based on your task, your latency budget, and empirical evaluation — not on the assumption that bigger is automatically better.

Embeddings for non-text data. Vectors are not limited to language. CLIP (Contrastive Language-Image Pretraining) trains a text encoder and an image encoder jointly so that their output vectors inhabit the same space — a photo of a dog and the sentence “a photograph of a dog” end up near each other, enabling text-to-image and image-to-text search with no modality-specific logic. Product embeddings can be learned from purchase co-occurrence: items frequently bought together are trained to have nearby vectors, powering recommendation engines. Audio, code, and molecular structures have their own embedding models. The vector database does not care what produced the floats — it stores arrays of f32 and computes distances. This modality-agnostic storage is one of the reasons vector databases have become a general-purpose building block in modern AI systems.


3. Vector Similarity

Once you have two vectors, how do you measure how alike they are? This section covers the three most common similarity and distance functions used in vector search — their formulas, geometric interpretations, and trade-offs — then works through a concrete example so the arithmetic is familiar before you encounter these functions in SQL.

Cosine similarity. The cosine similarity of two vectors a and b is defined as cos(θ) = (a · b) / (‖a‖ · ‖b‖), where a · b is the dot product and ‖a‖ is the magnitude of a. The result ranges from −1 to 1: a value of 1 means the vectors point in exactly the same direction, 0 means they are orthogonal (perpendicular), and −1 means they point in exactly opposite directions. The critical property of cosine similarity is that it measures only the angle between vectors, ignoring their magnitudes entirely. This makes it ideal for text embeddings: a short document and a long document on the same topic may produce vectors that differ in magnitude but point in nearly the same direction, and cosine similarity correctly identifies them as similar.

Cosine distance. Cosine distance is simply 1 − cosine_similarity. Its range is 0 to 2, where 0 means the vectors are identical in direction and 2 means they are fully opposite. This is what sqlite-vec’s vector_distance_cos function returns. Pay attention to the naming: the function name contains “cos” but it returns a distance, not a similarity — smaller values mean more similar vectors, not less. This is a common source of confusion when writing queries for the first time.

Dot product. The dot product of two vectors a and b is a · b = Σᵢ aᵢbᵢ — multiply corresponding elements and sum the results. For unit-normalised vectors (vectors whose magnitude is exactly 1), the dot product equals cosine similarity, because the denominator ‖a‖ · ‖b‖ = 1 · 1 = 1 and cancels out. For unnormalised vectors, the dot product conflates magnitude and direction: a longer vector will produce a larger dot product even if the angle is the same. Some embedding models are trained specifically for maximum inner product search (MIPS), meaning their vectors are not unit-normalised and the raw dot product is the intended similarity metric. The model’s documentation or model card will say so when this is the case.

Euclidean (L2) distance. The Euclidean distance between two vectors is ‖a − b‖ = √(Σᵢ (aᵢ − bᵢ)²) — the straight-line distance between two points in d-dimensional space. Its range is 0 to ∞, with 0 meaning the vectors are identical. Unlike cosine similarity, L2 distance is sensitive to vector magnitude: two vectors pointing in the same direction but with different lengths will have a non-zero L2 distance. L2 is most appropriate for low-dimensional geometric or tabular data where absolute coordinate values carry meaning — for example, geographic coordinates or sensor readings.

When to use each. For text and sentence embeddings, use cosine similarity (or equivalently, dot product if your model outputs unit-normalised vectors, which many do). When in doubt, follow the recommendation on the model card. For low-dimensional geometric features where absolute position matters, use L2 distance.

Worked example. Let a = [1, 0, 1] and b = [1, 1, 0]. Compute all three metrics by hand:

Dot product: a · b = (1)(1) + (0)(1) + (1)(0) = 1 + 0 + 0 = 1.

Magnitudes: ‖a‖ = √(1² + 0² + 1²) = √2 ≈ 1.414. ‖b‖ = √(1² + 1² + 0²) = √2 ≈ 1.414.

Cosine similarity: cos(θ) = 1 / (√2 · √2) = 1 / 2 = 0.5. The cosine distance is 1 − 0.5 = 0.5, which is what vector_distance_cos would return.

Euclidean distance: ‖a − b‖ = √((1−1)² + (0−1)² + (1−0)²) = √(0 + 1 + 1) = √2 ≈ 1.414.

These three numbers — dot product = 1, cosine similarity = 0.5, L2 distance ≈ 1.414 — describe different aspects of the relationship between a and b. In the exercises that follow, you will see these same computations expressed as SQL function calls over stored vectors.
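The same arithmetic, expressed as a small Rust program (helper names are illustrative; these are plain functions, not sqlite-vec calls):

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn magnitude(v: &[f32]) -> f32 {
    dot(v, v).sqrt()
}

/// cos(θ) = (a · b) / (‖a‖ · ‖b‖)
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    dot(a, b) / (magnitude(a) * magnitude(b))
}

/// ‖a − b‖ = sqrt(Σ (aᵢ − bᵢ)²)
fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

fn main() {
    let a = [1.0, 0.0, 1.0];
    let b = [1.0, 1.0, 0.0];
    println!("dot            = {}", dot(&a, &b));                        // 1
    println!("cosine sim     = {}", cosine_similarity(&a, &b));          // ≈ 0.5
    println!("cosine dist    = {}", 1.0 - cosine_similarity(&a, &b));    // ≈ 0.5
    println!("L2 distance    = {}", l2_distance(&a, &b));                // ≈ 1.414
}
```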


Part 2 — Vector Databases

4. What Is a Vector Database?

A vector database is a data store built around one core operation: given a query vector q, return the k stored vectors most similar to q. Every other feature — indexing, filtering, replication, APIs — exists to make that single operation fast, accurate, and convenient at scale. This section explains why that operation is hard, what problems it solves, and how vector databases compare to the data systems you already know.

The core operation. Given a query vector q and n stored vectors, find the k vectors most similar to q. This is the k-nearest-neighbour (KNN) problem. Exact KNN requires computing the distance from q to every stored vector — O(n · d) work per query. At n = 1 000 000 and d = 768, that is 768 million floating-point operations for a single query, far too slow for interactive use. Vector databases solve this by using approximate nearest-neighbour (ANN) algorithms (covered in §5) that trade a small accuracy loss for orders-of-magnitude speed gains. An ANN index can answer the same query in milliseconds by examining only a tiny fraction of the stored vectors.
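For intuition, here is what exact KNN looks like as code — a brute-force sketch that scans every stored vector, which is precisely the O(n · d) cost an ANN index avoids (the function name knn_exact is illustrative):

```rust
/// Exact k-nearest-neighbour search by brute force.
/// Returns indices of the k stored vectors with the smallest cosine
/// distance to the query.
fn knn_exact(stored: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    fn dot(a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }
    fn cos_dist(a: &[f32], b: &[f32]) -> f32 {
        1.0 - dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt())
    }
    let mut idx: Vec<usize> = (0..stored.len()).collect();
    // Compare the query against every stored vector — the full O(n · d)
    // scan (plus a sort) that ANN indexes are built to avoid.
    idx.sort_by(|&i, &j| {
        cos_dist(&stored[i], query)
            .partial_cmp(&cos_dist(&stored[j], query))
            .unwrap()
    });
    idx.truncate(k);
    idx
}

fn main() {
    let stored = vec![
        vec![0.9, 0.1, 0.2], // index 0: "cat"
        vec![0.1, 0.9, 0.1], // index 1: "car"
        vec![0.8, 0.2, 0.3], // index 2: "dog"
    ];
    // A query near the "animal" direction returns cat then dog.
    println!("{:?}", knn_exact(&stored, &[1.0, 0.0, 0.0], 2)); // prints [0, 2]
}
```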

Use cases. The ability to find “semantically similar” items powers a wide range of applications:

  • Semantic search: find documents that match the meaning of a query, not just its keywords — a search for “how to fix a flat tyre” retrieves results about “changing a punctured wheel” even though no words overlap.
  • Recommendation: given an item a user just viewed or purchased, return the k most similar items from the catalogue (§11), or surface content preferred by users with similar taste profiles.
  • Retrieval-Augmented Generation (RAG): retrieve the most relevant passages from a knowledge base before prompting a large language model, so the model’s answer is grounded in real documents rather than its training data alone (§12).
  • Duplicate and near-duplicate detection: identify items that are semantically identical or extremely close to a given item — useful for deduplicating support tickets, detecting plagiarism, or clustering similar product listings.
  • Anomaly detection: items whose vectors are far from all stored vectors are likely anomalous, enabling outlier detection without hand-crafted rules.
  • Multi-modal search: find images matching a text description, or vice versa, by storing CLIP-style joint embeddings where text and image vectors share the same space.

vs. relational databases. SQL WHERE clauses perform exact matches and range queries on scalar values — equality, greater-than, LIKE, IN. There is no built-in notion of “nearest” for an array of floats. You cannot write ORDER BY similarity(embedding, ?) in standard SQL because the concept does not exist in the relational model. Extensions like pgvector (PostgreSQL) and sqlite-vec (SQLite / Turso) add vector column types, distance functions, and ANN indexes to existing relational databases, letting you combine vector search with traditional filtering in a single query. This course uses sqlite-vec via the libsql crate, which means you get vector search without leaving the SQLite ecosystem you may already know.

vs. full-text search (BM25 / TF-IDF). Traditional keyword search scores documents by how often query terms appear, weighted by rarity across the corpus. It works well when users know the exact vocabulary of the documents they want, but it cannot handle synonymy — “car” and “automobile” are unrelated tokens unless you maintain an explicit synonym list — and it has no concept of sentence-level meaning. Vector search captures both synonymy and broader conceptual similarity because the embedding model learns those relationships from data. In practice, hybrid search — combining a BM25 keyword score with an ANN vector score — outperforms either method alone and is a common pattern in production systems.
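One simple, widely used way to fuse a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below is generic, not tied to any particular engine; the constant 60 is the conventional value from the RRF literature:

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: merge two ranked result lists (document ids,
/// best first), scoring each id by the sum of 1 / (60 + rank).
fn rrf(keyword: &[u32], vector: &[u32]) -> Vec<u32> {
    let mut score: HashMap<u32, f64> = HashMap::new();
    for list in [keyword, vector] {
        for (rank, &id) in list.iter().enumerate() {
            // rank is 0-based; RRF uses 1-based ranks.
            *score.entry(id).or_insert(0.0) += 1.0 / (60.0 + rank as f64 + 1.0);
        }
    }
    let mut ids: Vec<u32> = score.keys().copied().collect();
    ids.sort_by(|a, b| score[b].partial_cmp(&score[a]).unwrap());
    ids
}

fn main() {
    // Doc 7 ranks well in both lists, so it wins the fused ranking.
    println!("{:?}", rrf(&[7, 3, 1], &[2, 7, 3])); // prints [7, 3, 2, 1]
}
```

RRF needs no score calibration between the two systems — only ranks — which is why it is a popular first choice for hybrid search.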

Key metrics. When evaluating a vector database or an ANN index, four numbers matter:

  • Recall@k: the fraction of the true k nearest neighbours that the ANN algorithm actually returns. A recall@10 of 0.95 means 95 out of every 100 true top-10 results are found; the other 5 are replaced by slightly less similar vectors.
  • QPS (queries per second): how many queries the index can serve per second at a given recall target. Higher is better; this is the throughput you care about in production.
  • Index build time: the one-time cost paid to construct the search index from raw vectors. HNSW indexes, for example, require inserting each vector into a multi-layer graph, which can take minutes to hours for large datasets.
  • Memory footprint: HNSW stores graph edges in RAM alongside the vectors themselves, which limits how large the index can grow on a single machine. Quantisation and disk-backed indexes reduce memory at the cost of recall or latency.
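Recall@k is simple to compute once you have the exact top-k as ground truth — a sketch (the function name is illustrative):

```rust
use std::collections::HashSet;

/// Recall@k: the fraction of the true k nearest neighbours that the
/// approximate result list actually contains.
fn recall_at_k(true_knn: &[u32], ann_result: &[u32]) -> f64 {
    let truth: HashSet<u32> = true_knn.iter().copied().collect();
    let hits = ann_result.iter().filter(|id| truth.contains(id)).count();
    hits as f64 / true_knn.len() as f64
}

fn main() {
    // The ANN index found 9 of the 10 true neighbours: recall@10 = 0.9.
    let truth: Vec<u32> = (0..10).collect();
    let approx: Vec<u32> = vec![0, 1, 2, 3, 4, 5, 6, 7, 8, 42];
    println!("{}", recall_at_k(&truth, &approx)); // prints 0.9
}
```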

Where sqlite-vec and Turso fit. sqlite-vec is an excellent choice for embedded applications, local development, prototyping, and small-to-medium corpora — up to a few million vectors. It runs inside your application process with no separate server, and Turso adds cloud hosting, replication, and edge caching on top of the same SQLite foundation. For larger-scale deployments — tens of millions of vectors, multi-tenancy, complex filtered search, or distributed indexing — dedicated vector databases such as Pinecone, Qdrant, or Weaviate provide additional infrastructure. The concepts you learn in this course transfer directly: the same embeddings, distance functions, and query patterns apply regardless of which engine you choose.


5. Under the Hood: ANN Algorithms

Why not exact search? Brute-force KNN computes the distance from the query vector to every stored vector — O(n · d) work per query. At n = 1 000 000 vectors, d = 768 dimensions, and 1 000 queries per second, that is roughly 768 billion floating-point operations per second — infeasible on a commodity CPU. Approximate nearest-neighbour (ANN) algorithms find results in O(log n) or sub-linear time at the cost of occasionally missing a few true nearest neighbours. The two dominant families are HNSW and IVFFlat.

HNSW — Hierarchical Navigable Small World. HNSW is the dominant algorithm for in-memory ANN and is the algorithm used by sqlite-vec.

Imagine a multi-level skip list where each level is a proximity graph. The top level is sparse, containing only a small subset of nodes connected by long-range edges that enable fast coarse navigation across the dataset. Each subsequent level adds more nodes and shorter-range edges, increasing density. The bottom level contains every vector, connected to its nearest neighbours by short-range edges that enable precise local search. When a query arrives, the algorithm starts at an entry point on the top level and greedily moves to whichever neighbour is closest to the query vector. When no neighbour on the current level is closer than the current node, the algorithm descends one level and repeats the greedy walk with the denser graph. At the bottom level, it collects the k nearest candidates encountered during traversal and returns them as the result.
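The greedy walk at the heart of this traversal can be sketched for a single layer. This toy code models one proximity-graph layer only, not the full multi-layer HNSW algorithm:

```rust
/// Greedy walk over one proximity-graph layer: from `start`, repeatedly
/// move to the neighbour closest to the query; stop at a local minimum.
fn greedy_walk(
    vectors: &[Vec<f32>],
    neighbours: &[Vec<usize>], // adjacency list: neighbours[i] = edges of node i
    query: &[f32],
    start: usize,
) -> usize {
    // Squared L2 distance — the square root is monotonic, so comparisons
    // are unaffected and we can skip computing it.
    fn l2(a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
    }
    let mut current = start;
    loop {
        let best = neighbours[current].iter().copied().min_by(|&i, &j| {
            l2(&vectors[i], query)
                .partial_cmp(&l2(&vectors[j], query))
                .unwrap()
        });
        match best {
            Some(n) if l2(&vectors[n], query) < l2(&vectors[current], query) => current = n,
            _ => return current, // no neighbour is closer: local minimum
        }
    }
}

fn main() {
    // A chain 0 - 1 - 2 - 3 along one axis; starting at node 0, the walk
    // descends toward the node nearest the query at 2.9.
    let vectors = vec![vec![0.0], vec![1.0], vec![2.0], vec![3.0]];
    let neighbours = vec![vec![1], vec![0, 2], vec![1, 3], vec![2]];
    println!("{}", greedy_walk(&vectors, &neighbours, &[2.9], 0)); // prints 3
}
```

In real HNSW this walk runs per layer, restarting from the best node found on the layer above, and the bottom layer keeps a candidate list of size ef_search rather than a single current node.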

HNSW key parameters:

  • M — the number of bidirectional connections each node maintains per layer. Higher M improves recall (the algorithm has more paths to explore) but increases memory consumption and slows down inserts because more edges must be evaluated and updated. A typical default is 16.
  • ef_construction — the size of the dynamic candidate list used when inserting a new vector into the graph. Higher values produce a higher-quality index (better-connected graph) at the cost of slower index construction. A typical default is 200.
  • ef_search — the size of the candidate list used during query-time traversal. Higher values improve recall at the cost of higher query latency. This parameter is often set equal to k by default, but increasing it is the easiest way to trade latency for accuracy at query time.

HNSW supports incremental inserts with no full rebuild — each new vector is linked into the existing graph structure, which is why the CREATE INDEX ... USING libsql_vector_idx in §6 requires no separate training step. The memory cost of the graph is O(n · M · 4 bytes) on top of the vectors themselves.

IVFFlat — Inverted File with flat quantisation. IVFFlat is the dominant approach for disk-based or GPU-accelerated ANN and is used by default in systems like Faiss and pgvector.

The idea is to partition the dataset into nlist Voronoi cells using k-means clustering during a one-time training step. Each cell is defined by a centroid vector, and every stored vector is assigned to the cell whose centroid is closest. At query time, the algorithm computes the distance from the query to all nlist centroids, selects the nprobe nearest centroids, and then performs exact brute-force search only within those cells — skipping the vast majority of the dataset entirely.

IVFFlat key parameters:

  • nlist — the number of clusters (Voronoi cells). A common heuristic is to set nlist ≈ √n. More clusters mean each cell is smaller, so query-time search within a cell is faster, but training takes longer and very small cells increase the risk of a query’s true neighbours falling in an unsearched cell.
  • nprobe — the number of clusters examined at query time. Higher nprobe improves recall at the cost of higher latency. Setting nprobe = nlist degenerates to exact search; setting nprobe = 1 checks only the single most likely cluster.

Unlike HNSW, IVFFlat requires a training step (the k-means clustering) before any data can be inserted. Incremental inserts require assigning each new vector to an existing cluster, which can degrade quality over time as the data distribution drifts from the original centroids — periodic retraining is recommended for heavily updated datasets. IVFFlat uses less memory than HNSW for the same n because it does not store graph edges.
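The IVFFlat query path can be sketched as follows. This toy code assumes a prior k-means training step has already computed the centroids and assigned each vector to a cell:

```rust
/// IVFFlat query sketch: pick the `nprobe` nearest centroids, then
/// brute-force search only inside those cells.
fn ivf_query(
    centroids: &[Vec<f32>],
    cells: &[Vec<(usize, Vec<f32>)>], // cells[c] = (vector id, vector) pairs
    query: &[f32],
    nprobe: usize,
) -> Option<usize> {
    // Squared L2 distance; monotonic, so fine for comparisons.
    fn l2(a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
    }
    // Rank all nlist centroids by distance to the query.
    let mut order: Vec<usize> = (0..centroids.len()).collect();
    order.sort_by(|&i, &j| {
        l2(&centroids[i], query)
            .partial_cmp(&l2(&centroids[j], query))
            .unwrap()
    });
    // Exact search restricted to the nprobe nearest cells only —
    // the vast majority of stored vectors are never examined.
    order
        .iter()
        .take(nprobe)
        .flat_map(|&c| cells[c].iter())
        .min_by(|(_, a), (_, b)| l2(a, query).partial_cmp(&l2(b, query)).unwrap())
        .map(|(id, _)| *id)
}

fn main() {
    let centroids = vec![vec![0.0, 0.0], vec![10.0, 10.0]];
    let cells = vec![
        vec![(1, vec![0.5, 0.5]), (2, vec![-1.0, 0.0])],
        vec![(3, vec![9.0, 9.0]), (4, vec![11.0, 10.0])],
    ];
    // With nprobe = 1, only the nearest cell is searched.
    println!("{:?}", ivf_query(&centroids, &cells, &[10.5, 10.0], 1)); // prints Some(4)
}
```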

sqlite-vec uses HNSW. The libsql_vector_idx index type you will create in §6 builds an HNSW index — which is why rows can be inserted incrementally with no training step. The current sqlite-vec API does not expose M or ef parameters directly; sensible defaults are chosen for broad applicability.

Summary table.

Property                HNSW                           IVFFlat
Query time              O(log n)                       O(nprobe · n / nlist)
Insert                  Incremental                    Batch (requires training)
Memory                  Higher (graph edges)           Lower
Recall@10 at defaults   ~0.95+                         ~0.90+ (depends on nprobe)
Used by                 sqlite-vec, Qdrant, Weaviate   pgvector, Faiss

Part 3 — Turso + sqlite-vec Basics

6. Setting Up

This section walks through everything you need before writing a single SQL query: adding the right crates, opening a local Turso connection, and loading the sqlite-vec extension that gives SQLite vector-search superpowers.

What You Are Building

Turso is a SQLite-compatible database with built-in support for vector similarity search via the sqlite-vec extension. In local development you use a file-backed SQLite database; in production the same code points at a Turso cloud database. The libsql crate (the Rust client for Turso) speaks the Turso wire protocol and also handles local SQLite files transparently.

Cargo.toml

Create a new binary project and add the following dependencies:

cargo new vec-demo
cd vec-demo

Replace the [dependencies] section of Cargo.toml with:

[dependencies]
libsql = "0.9"
tokio = { version = "1", features = ["full"] }

libsql is the official Rust client for Turso / libSQL databases. It supports both local SQLite files and remote Turso connections with the same API, making it straightforward to develop locally and deploy to the cloud. tokio provides the async runtime — all libsql operations are async.

Add the release-build optimisation profile from the project conventions:

[profile.release]
opt-level = "z"
lto = true
strip = true
codegen-units = 1

Opening a Local Connection

Replace src/main.rs with the following:

use libsql::{Builder, Database};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let db: Database = Builder::new_local("vectors.db").build().await?;
    let conn = db.connect()?;

    // Verify the connection works
    let mut rows = conn.query("SELECT sqlite_version()", ()).await?;
    if let Some(row) = rows.next().await? {
        let version: String = row.get(0)?;
        println!("SQLite version: {version}");
    }

    Ok(())
}

Run it with cargo run. You should see output like:

SQLite version: 3.46.0

A file named vectors.db will appear in the current directory. This is a standard SQLite database — you can open it with any SQLite client to inspect its contents.

Enabling Vector Support with sqlite-vec

The libsql crate ships with sqlite-vec built in. No separate installation is required. Vector functions become available automatically once you use the right column types and functions in your SQL.

The key types and functions you will use throughout this course:

Construct                        Purpose
F32_BLOB(d)                      Column type for storing a d-dimensional float32 vector
vector(json_array)               Creates a vector from a JSON array literal
vector_extract(blob)             Converts a stored vector blob back to a JSON array
vector_distance_cos(a, b)        Cosine distance between two vectors (0 = identical, 2 = opposite)
libsql_vector_idx(col)           Index type for fast approximate nearest-neighbour search
vector_top_k(table, query, k)    Table-valued function: returns the k nearest rows to a query vector

Creating a Vector Table

Extend main to create a table that stores 3-dimensional float32 vectors:

conn.execute(
    "CREATE TABLE IF NOT EXISTS items (
         id        INTEGER PRIMARY KEY,
         label     TEXT NOT NULL,
         embedding F32_BLOB(3) NOT NULL
     )",
    (),
).await?;

F32_BLOB(3) declares a column that holds a 3-dimensional float32 vector stored as a binary blob. The 3 is the dimensionality — use the actual size of your embedding model’s output (e.g., F32_BLOB(768) for a 768-dimensional model) in real projects.

Creating a Vector Index

Without an index, nearest-neighbour search performs a full table scan — computing the distance from the query to every stored vector. For small tables this is fine; at scale you need an index:

conn.execute(
    "CREATE INDEX IF NOT EXISTS items_vec_idx
         ON items (embedding)
         USING libsql_vector_idx(embedding)",
    (),
).await?;

This creates an HNSW index over the embedding column. Queries that use vector_top_k will automatically use this index. The index is updated incrementally as rows are inserted or deleted — no manual rebuild is required.

Putting It Together

At this point your main.rs should look like this:

use libsql::{Builder, Database};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let db: Database = Builder::new_local("vectors.db").build().await?;
    let conn = db.connect()?;

    // Verify connection
    let mut rows = conn.query("SELECT sqlite_version()", ()).await?;
    if let Some(row) = rows.next().await? {
        let version: String = row.get(0)?;
        println!("SQLite version: {version}");
    }

    // Create vector table
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (
             id        INTEGER PRIMARY KEY,
             label     TEXT NOT NULL,
             embedding F32_BLOB(3) NOT NULL
         )",
        (),
    ).await?;

    // Create HNSW index
    conn.execute(
        "CREATE INDEX IF NOT EXISTS items_vec_idx
             ON items (embedding)
             USING libsql_vector_idx(embedding)",
        (),
    ).await?;

    println!("Database ready.");
    Ok(())
}

cargo run should print:

SQLite version: 3.46.0
Database ready.

You now have a working local vector database. Exercises 1 through 5 build on this foundation, adding data, querying it, and connecting the full embedding-to-search pipeline.


7. Exercise 1 — Storing and Retrieving Vectors

Goal: Insert 6 labelled 3-dimensional vectors into the items table created in §6, then SELECT all rows and print each label alongside its deserialised Vec<f32>.

The Dataset

We use a tiny hand-crafted set of 3D vectors so the results are easy to verify by inspection. The vectors are designed so that items in the same category cluster together — animals near [high, low, low], vehicles near [low, high, low], and programming languages near [low, low, high]:

id   label      embedding
1    “cat”      [0.9, 0.1, 0.2]
2    “dog”      [0.8, 0.2, 0.3]
3    “car”      [0.1, 0.9, 0.1]
4    “truck”    [0.2, 0.8, 0.2]
5    “python”   [0.15, 0.1, 0.95]
6    “rust”     [0.1, 0.05, 0.9]

In later exercises you will query these vectors to see how cosine distance naturally separates the three clusters.

Step 1 — Formatting a Vector for INSERT

sqlite-vec’s vector(?) SQL function accepts a JSON array string — for example "[0.9,0.1,0.2]". You pass this string as a text parameter and vector() converts it into the internal F32_BLOB format for storage.

A small helper keeps the conversion in one place:

fn vec_to_json(v: &[f32]) -> String {
    format!("[{}]", v.iter().map(|x| x.to_string()).collect::<Vec<_>>().join(","))
}

Calling vec_to_json(&[0.9, 0.1, 0.2]) returns the string "[0.9,0.1,0.2]", ready to bind as a SQL parameter.

Step 2 — Inserting Rows

Use INSERT OR IGNORE so the program is idempotent — running it twice does not produce duplicate-key errors or duplicate data:

INSERT OR IGNORE INTO items (id, label, embedding) VALUES (?, ?, vector(?))

Define the dataset as a Vec<(i64, &str, Vec<f32>)> and loop over it:

let data: Vec<(i64, &str, Vec<f32>)> = vec![
    (1, "cat",    vec![0.9,  0.1,  0.2]),
    (2, "dog",    vec![0.8,  0.2,  0.3]),
    (3, "car",    vec![0.1,  0.9,  0.1]),
    (4, "truck",  vec![0.2,  0.8,  0.2]),
    (5, "python", vec![0.15, 0.1,  0.95]),
    (6, "rust",   vec![0.1,  0.05, 0.9]),
];

for (id, label, embedding) in &data {
    conn.execute(
        "INSERT OR IGNORE INTO items (id, label, embedding) VALUES (?, ?, vector(?))",
        libsql::params![*id, *label, vec_to_json(embedding)],
    ).await?;
}
println!("Inserted {} rows.", data.len());

Step 3 — Selecting and Deserializing

Query all rows back out. The vector_extract function converts the stored F32_BLOB back into a JSON array string that you can parse in Rust:

SELECT id, label, vector_extract(embedding) FROM items ORDER BY id

Add serde_json to your Cargo.toml dependencies for JSON parsing:

serde_json = "1"

Then fetch and deserialize:

#![allow(unused)]
fn main() {
let mut rows = conn
    .query("SELECT id, label, vector_extract(embedding) FROM items ORDER BY id", ())
    .await?;

while let Some(row) = rows.next().await? {
    let id: i64 = row.get(0)?;
    let label: String = row.get(1)?;
    let json_str: String = row.get(2)?;
    let embedding: Vec<f32> = serde_json::from_str(&json_str)?;
    println!("{id:<3}{label:<10}{embedding:?}");
}
}

Step 4 — Expected Output

Running cargo run should print:

SQLite version: 3.46.0
Database ready.
Inserted 6 rows.
1  cat       [0.9, 0.1, 0.2]
2  dog       [0.8, 0.2, 0.3]
3  car       [0.1, 0.9, 0.1]
4  truck     [0.2, 0.8, 0.2]
5  python    [0.15, 0.1, 0.95]
6  rust      [0.1, 0.05, 0.9]

Every vector round-trips through the database intact: Rust Vec<f32> → JSON string → vector()F32_BLOB storage → vector_extract() → JSON string → serde_json → Rust Vec<f32>.
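The JSON legs of that round-trip can be sanity-checked without a database at all. The sketch below is std-only: a hand-rolled json_to_vec stands in for serde_json, and the vector()/vector_extract() legs are not exercised, so it runs with no external crates:

```rust
/// Serialize a float slice as a JSON array string (same helper as in this exercise).
fn vec_to_json(v: &[f32]) -> String {
    format!(
        "[{}]",
        v.iter().map(|x| x.to_string()).collect::<Vec<_>>().join(",")
    )
}

/// Minimal stand-in for serde_json::from_str::<Vec<f32>>: parse "[a,b,c]" back into floats.
fn json_to_vec(s: &str) -> Vec<f32> {
    s.trim_matches(|c: char| c == '[' || c == ']')
        .split(',')
        .map(|x| x.trim().parse::<f32>().expect("not a float"))
        .collect()
}

fn main() {
    let original = vec![0.9_f32, 0.1, 0.2];
    let json = vec_to_json(&original);
    let restored = json_to_vec(&json);

    // Rust formats floats as the shortest string that parses back to the same
    // value, so the serialize/parse legs of the round-trip are lossless.
    assert_eq!(json, "[0.9,0.1,0.2]");
    assert_eq!(restored, original);
    println!("{json} -> {restored:?}");
}
```

If the two asserts pass, any loss you observe in the full pipeline must come from the database legs, not the JSON conversion.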

Cargo.toml Additions

Your full [dependencies] section should now be:

[dependencies]
libsql = "0.9"
tokio = { version = "1", features = ["full"] }
serde_json = "1"

Reference Solution

Show full solution

Cargo.toml (dependencies only):

[dependencies]
libsql = "0.9"
tokio = { version = "1", features = ["full"] }
serde_json = "1"

src/main.rs:

use libsql::{Builder, Database};

/// Convert a float slice to a JSON array string for sqlite-vec's `vector()` function.
fn vec_to_json(v: &[f32]) -> String {
    format!(
        "[{}]",
        v.iter()
            .map(|x| x.to_string())
            .collect::<Vec<_>>()
            .join(",")
    )
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // --- Open database ---
    let db: Database = Builder::new_local("vectors.db").build().await?;
    let conn = db.connect()?;

    // Verify connection
    let mut rows = conn.query("SELECT sqlite_version()", ()).await?;
    if let Some(row) = rows.next().await? {
        let version: String = row.get(0)?;
        println!("SQLite version: {version}");
    }

    // --- Create table (from §6) ---
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (
             id        INTEGER PRIMARY KEY,
             label     TEXT NOT NULL,
             embedding F32_BLOB(3) NOT NULL
         )",
        (),
    )
    .await?;

    // --- Create HNSW index (from §6) ---
    conn.execute(
        "CREATE INDEX IF NOT EXISTS items_vec_idx
             ON items (libsql_vector_idx(embedding))",
        (),
    )
    .await?;

    println!("Database ready.");

    // --- Insert 6 labelled vectors ---
    let data: Vec<(i64, &str, Vec<f32>)> = vec![
        (1, "cat",    vec![0.9,  0.1,  0.2]),
        (2, "dog",    vec![0.8,  0.2,  0.3]),
        (3, "car",    vec![0.1,  0.9,  0.1]),
        (4, "truck",  vec![0.2,  0.8,  0.2]),
        (5, "python", vec![0.15, 0.1,  0.95]),
        (6, "rust",   vec![0.1,  0.05, 0.9]),
    ];

    for (id, label, embedding) in &data {
        conn.execute(
            "INSERT OR IGNORE INTO items (id, label, embedding) VALUES (?, ?, vector(?))",
            libsql::params![*id, *label, vec_to_json(embedding)],
        )
        .await?;
    }
    println!("Inserted {} rows.", data.len());

    // --- Select and deserialize ---
    let mut rows = conn
        .query(
            "SELECT id, label, vector_extract(embedding) FROM items ORDER BY id",
            (),
        )
        .await?;

    while let Some(row) = rows.next().await? {
        let id: i64 = row.get(0)?;
        let label: String = row.get(1)?;
        let json_str: String = row.get(2)?;
        let embedding: Vec<f32> = serde_json::from_str(&json_str)?;
        println!("{id:<3}{label:<10}{embedding:?}");
    }

    Ok(())
}

8. Exercise 2 — K-Nearest Neighbor Search

Goal: Given a query vector, use vector_top_k to find the 3 most similar items, join with the items table to retrieve labels and exact cosine distances, and display the results ranked by distance.

Step 1 — Introduce vector_top_k

vector_top_k is a table-valued function (TVF) that returns the row IDs of approximate nearest neighbours without performing a full table scan. It leverages the HNSW index created in §6 to navigate directly to the neighbourhood of the query vector. The syntax is:

SELECT i.id FROM vector_top_k('items_vec_idx', vector(?), ?) i

The three arguments are:

  1. Index name (string literal) — the name of the vector index to search (items_vec_idx from §6), not the table name.
  2. Query vector — passed through vector() as a JSON array string, just like when inserting data.
  3. k — the number of nearest neighbours to return.

The function returns a single id column holding rowid values — it does not return labels, embeddings, or distances. To access other columns you must JOIN the result back to the original table. This design keeps the TVF focused on index traversal and lets you choose exactly which columns to retrieve.

Step 2 — Full KNN Query

Combine the TVF with a JOIN and an exact distance computation to get labelled, ranked results:

SELECT items.id, items.label, vector_distance_cos(items.embedding, vector(?)) AS dist
FROM vector_top_k('items_vec_idx', vector(?), ?) AS knn
JOIN items ON items.rowid = knn.id
ORDER BY dist ASC

Notice that the query vector must be passed twice — once as the second argument to vector_top_k (for index traversal to find candidate rows) and once as the second argument to vector_distance_cos (for exact distance computation on those candidates). Both are the same JSON array string bound to separate SQL parameters.

Why two passes? vector_top_k uses the HNSW index to quickly identify which rows are likely nearest neighbours, but it does not return distance values. vector_distance_cos then computes the exact cosine distance for each candidate row, which you use for ranking and display.

Step 3 — Run Three Queries and Print Results

Define a helper function that runs the KNN query for a given query vector and prints the results:

#![allow(unused)]
fn main() {
async fn knn_query(
    conn: &libsql::Connection,
    query: &[f32],
    k: i32,
) -> Result<(), Box<dyn std::error::Error>> {
    let q = vec_to_json(query);
    let mut rows = conn
        .query(
            "SELECT items.id, items.label, vector_distance_cos(items.embedding, vector(?)) AS dist
             FROM vector_top_k('items_vec_idx', vector(?), ?) AS knn
             JOIN items ON items.rowid = knn.id
             ORDER BY dist ASC",
            libsql::params![q.clone(), q.clone(), k],
        )
        .await?;

    println!("Query: {q}");
    let mut rank = 1;
    while let Some(row) = rows.next().await? {
        let label: String = row.get(1)?;
        let dist: f64 = row.get(2)?;
        println!("  {rank}. {label:<10} dist={dist:.4}");
        rank += 1;
    }
    println!();
    Ok(())
}
}

Run three queries, each probing one of the three clusters from the §7 dataset:

#![allow(unused)]
fn main() {
// Animal cluster
knn_query(&conn, &[0.85, 0.15, 0.25], 3).await?;

// Vehicle cluster
knn_query(&conn, &[0.15, 0.85, 0.15], 3).await?;

// Language cluster
knn_query(&conn, &[0.1, 0.05, 0.92], 3).await?;
}

Expected output (exact distances depend on floating-point precision):

Query: [0.85,0.15,0.25]
  1. cat        dist=0.0023
  2. dog        dist=0.0089
  3. python     dist=0.1834

Query: [0.15,0.85,0.15]
  1. car        dist=0.0006
  2. truck      dist=0.0030
  3. cat        dist=0.3885

Query: [0.1,0.05,0.92]
  1. rust       dist=0.0003
  2. python     dist=0.0016
  3. dog        dist=0.2197

Each query correctly identifies the two items in its target cluster as the closest matches. The third result is always from a different cluster with a noticeably larger distance.

For the 6-row dataset used in these exercises, vector_top_k falls back to exact search — the HNSW index has too few nodes to offer a meaningful shortcut, so the algorithm examines every vector. The results are identical to brute-force KNN.

At scale — millions of rows — vector_top_k returns approximate results. The HNSW index navigates the graph greedily, which means some true nearest neighbours may be missed if they are poorly connected in the graph. This is the recall-vs-speed trade-off discussed in §5: the index answers queries in milliseconds instead of seconds, but recall@k is typically ~0.95 rather than 1.0.

vector_distance_cos, by contrast, always gives the exact cosine distance for any specific pair of vectors. It is a pure computation with no approximation. The approximation lives only in the selection of which candidates to evaluate — that is the job of the index.

In practice this means: trust vector_top_k for fast retrieval, but understand that at scale a small fraction of true nearest neighbours may not appear in the result set. If perfect recall is required, you can increase the index’s ef_search parameter (when exposed by the engine) or fall back to brute-force search over a filtered subset.
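For a dataset this small you can check the index's answers against brute force yourself. The std-only sketch below recomputes cosine distances for the §7 dataset and ranks them; it should reproduce the ordering reported above (the exact distance values may differ slightly from the engine's, since float precision and rounding vary):

```rust
/// Cosine distance: 1 - (a·b) / (|a| |b|). Lower means more similar.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (na * nb)
}

fn main() {
    // The six vectors from §7.
    let items: Vec<(&str, Vec<f32>)> = vec![
        ("cat",    vec![0.9,  0.1,  0.2]),
        ("dog",    vec![0.8,  0.2,  0.3]),
        ("car",    vec![0.1,  0.9,  0.1]),
        ("truck",  vec![0.2,  0.8,  0.2]),
        ("python", vec![0.15, 0.1,  0.95]),
        ("rust",   vec![0.1,  0.05, 0.9]),
    ];
    let query = [0.85_f32, 0.15, 0.25]; // the "animal cluster" query

    // Brute-force KNN: score every item, then sort ascending by distance.
    let mut ranked: Vec<(&str, f32)> = items
        .iter()
        .map(|(label, v)| (*label, cosine_distance(&query, v)))
        .collect();
    ranked.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());

    for (i, (label, dist)) in ranked.iter().take(3).enumerate() {
        println!("{}. {label:<10} dist={dist:.4}", i + 1);
    }
    // The two animal items come first, matching the indexed search.
    assert_eq!(ranked[0].0, "cat");
    assert_eq!(ranked[1].0, "dog");
}
```

At six rows this loop is instant; the index only starts to earn its keep when a full scan like this becomes too slow.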

Reference Solution

Show full solution

Cargo.toml (dependencies only):

[dependencies]
libsql = "0.9"
tokio = { version = "1", features = ["full"] }
serde_json = "1"

src/main.rs:

use libsql::{Builder, Database};

/// Convert a float slice to a JSON array string for sqlite-vec's `vector()` function.
fn vec_to_json(v: &[f32]) -> String {
    format!(
        "[{}]",
        v.iter()
            .map(|x| x.to_string())
            .collect::<Vec<_>>()
            .join(",")
    )
}

/// Run a KNN query and print the top-k results with labels and distances.
async fn knn_query(
    conn: &libsql::Connection,
    query: &[f32],
    k: i32,
) -> Result<(), Box<dyn std::error::Error>> {
    let q = vec_to_json(query);
    let mut rows = conn
        .query(
            "SELECT items.id, items.label, vector_distance_cos(items.embedding, vector(?)) AS dist
             FROM vector_top_k('items_vec_idx', vector(?), ?) AS knn
             JOIN items ON items.rowid = knn.id
             ORDER BY dist ASC",
            libsql::params![q.clone(), q.clone(), k],
        )
        .await?;

    println!("Query: {q}");
    let mut rank = 1;
    while let Some(row) = rows.next().await? {
        let label: String = row.get(1)?;
        let dist: f64 = row.get(2)?;
        println!("  {rank}. {label:<10} dist={dist:.4}");
        rank += 1;
    }
    println!();
    Ok(())
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // --- Open database ---
    let db: Database = Builder::new_local("vectors.db").build().await?;
    let conn = db.connect()?;

    // Verify connection
    let mut rows = conn.query("SELECT sqlite_version()", ()).await?;
    if let Some(row) = rows.next().await? {
        let version: String = row.get(0)?;
        println!("SQLite version: {version}");
    }

    // --- Create table (from §6) ---
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (
             id        INTEGER PRIMARY KEY,
             label     TEXT NOT NULL,
             embedding F32_BLOB(3) NOT NULL
         )",
        (),
    )
    .await?;

    // --- Create HNSW index (from §6) ---
    conn.execute(
        "CREATE INDEX IF NOT EXISTS items_vec_idx
             ON items (libsql_vector_idx(embedding))",
        (),
    )
    .await?;

    println!("Database ready.");

    // --- Insert 6 labelled vectors (from §7) ---
    let data: Vec<(i64, &str, Vec<f32>)> = vec![
        (1, "cat",    vec![0.9,  0.1,  0.2]),
        (2, "dog",    vec![0.8,  0.2,  0.3]),
        (3, "car",    vec![0.1,  0.9,  0.1]),
        (4, "truck",  vec![0.2,  0.8,  0.2]),
        (5, "python", vec![0.15, 0.1,  0.95]),
        (6, "rust",   vec![0.1,  0.05, 0.9]),
    ];

    for (id, label, embedding) in &data {
        conn.execute(
            "INSERT OR IGNORE INTO items (id, label, embedding) VALUES (?, ?, vector(?))",
            libsql::params![*id, *label, vec_to_json(embedding)],
        )
        .await?;
    }
    println!("Inserted {} rows.", data.len());

    // --- KNN queries ---
    // Animal cluster
    knn_query(&conn, &[0.85, 0.15, 0.25], 3).await?;

    // Vehicle cluster
    knn_query(&conn, &[0.15, 0.85, 0.15], 3).await?;

    // Language cluster
    knn_query(&conn, &[0.1, 0.05, 0.92], 3).await?;

    Ok(())
}

Part 4 — Real Applications

9. Generating Embeddings in Rust

Before you can search by meaning, you need a way to convert text into vectors. This section covers two approaches available in Rust: running a local embedding model with fastembed-rs (no API key, works offline, suited for smaller models) and calling an HTTP embedding API such as the OpenAI Embeddings endpoint (larger, higher-quality models at the cost of latency and a network dependency).

Option A — fastembed-rs (local, recommended for exercises). The fastembed crate wraps ONNX Runtime and ships pre-trained sentence-transformer models. No API key is required, it works fully offline after the first run, inference is CPU-only, and results are deterministic — all properties that make it ideal for the exercises in §10–§12. Add it to your project:

fastembed = "4"

The default model is BGE-Small-EN-v1.5, which produces 384-dimensional vectors. On first use, the model weights (~130 MB) are downloaded to a local cache directory (.fastembed_cache by default; configurable via InitOptions) and reused from there on subsequent runs. Here is the minimal code to embed two strings:

#![allow(unused)]
fn main() {
use fastembed::{TextEmbedding, InitOptions, EmbeddingModel};

let model = TextEmbedding::try_new(
    InitOptions::new(EmbeddingModel::BGESmallENV15)
        .with_show_download_progress(true),
)?;

let docs = vec!["hello world", "Rust is fast"];
let embeddings: Vec<Vec<f32>> = model.embed(docs, None)?;
// embeddings[0].len() == 384
}

Batch embedding matters. Passing multiple strings in a single model.embed() call is significantly more efficient than embedding one string at a time, because the runtime can batch tensor operations. Always collect your corpus into a Vec and embed it in one shot rather than looping with individual calls.

Option B — HTTP API (OpenAI-compatible). When you need a specific production-grade model — or your deployment already relies on an external embeddings service — you can call an OpenAI-compatible endpoint instead. You will need three additional crates:

reqwest = { version = "0.12", features = ["json"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"

Define request and response types that match the API schema:

#![allow(unused)]
fn main() {
#[derive(serde::Serialize)]
struct EmbedRequest {
    model: String,
    input: Vec<String>,
}

#[derive(serde::Deserialize)]
struct EmbedResponse {
    data: Vec<EmbedData>,
}

#[derive(serde::Deserialize)]
struct EmbedData {
    embedding: Vec<f32>,
}

async fn embed_texts(texts: Vec<String>) -> anyhow::Result<Vec<Vec<f32>>> {
    let api_key = std::env::var("OPENAI_API_KEY")?;
    let client = reqwest::Client::new();
    let res: EmbedResponse = client
        .post("https://api.openai.com/v1/embeddings")
        .bearer_auth(&api_key)
        .json(&EmbedRequest {
            model: "text-embedding-3-small".into(),
            input: texts,
        })
        .send()
        .await?
        .json()
        .await?;
    Ok(res.data.into_iter().map(|d| d.embedding).collect())
}
}

Choosing between them. For the remaining exercises in this course (§10–§12), use fastembed. It requires no API key, has no network dependency, and produces deterministic results — which means your assertions will be stable across runs. Inference is sub-100 ms per batch on a modern CPU, more than fast enough for the dataset sizes used here. Reach for the HTTP approach when you need a specific production-grade model, when your application already communicates with an embeddings service, or when you need multilingual support beyond what the local models offer.

Dimensionality note. The F32_BLOB(d) column type you define in your schema must match the model’s output dimension exactly — you cannot mix dimensions within a single column. The toy examples in §6–§8 used F32_BLOB(3) for hand-written 3-D vectors. Now that you are working with real models, change that declaration to F32_BLOB(384) for BGE-Small-EN-v1.5, F32_BLOB(768) for BGE-Base-EN-v1.5, or F32_BLOB(1536) for OpenAI’s text-embedding-3-small. If you change the dimension of an existing column, you must drop and recreate both the column and its associated vector index — sqlite-vec cannot reindex vectors whose dimensions have changed.
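Concretely, moving from the 3-D toy schema to the 384-D model output amounts to a drop-and-recreate. A sketch, using the table and index names from §6 (adjust to your own schema):

```sql
-- Remove the old 3-D index and table...
DROP INDEX IF EXISTS items_vec_idx;
DROP TABLE IF EXISTS items;

-- ...and recreate them at the model's output dimension
-- (384 for BGE-Small-EN-v1.5).
CREATE TABLE items (
    id        INTEGER PRIMARY KEY,
    label     TEXT NOT NULL,
    embedding F32_BLOB(384) NOT NULL
);
CREATE INDEX items_vec_idx ON items (libsql_vector_idx(embedding));
```

Any existing rows are lost in the process, so re-run your embedding and insert pipeline afterwards.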


10. Exercise 3 — Semantic Document Search

Goal: Embed a corpus of 15 short text passages with fastembed-rs, store the embeddings in Turso, then accept a natural-language query, embed it, and return the top-5 most semantically relevant passages — with no keyword matching.

Setup

Create a new project (or extend your existing vec-demo crate). Your Cargo.toml dependencies:

[dependencies]
libsql = "0.9"
fastembed = "4"
tokio = { version = "1", features = ["full"] }
serde_json = "1"

The table schema uses F32_BLOB(384) because BGE-Small-EN-v1.5 produces 384-dimensional embeddings:

CREATE TABLE IF NOT EXISTS docs (
    id        INTEGER PRIMARY KEY,
    passage   TEXT NOT NULL,
    embedding F32_BLOB(384) NOT NULL
)

Corpus

Use these 15 passages spanning three topics.

Rust programming (5):

  1. “Rust uses an ownership system to guarantee memory safety without a garbage collector.”
  2. “The borrow checker enforces that references do not outlive the data they point to.”
  3. “Cargo is Rust’s build system and package manager, used to manage dependencies and run tests.”
  4. “Rust’s trait system enables zero-cost abstractions and compile-time polymorphism.”
  5. “Async Rust uses futures and the tokio runtime to handle concurrent I/O efficiently.”

Astronomy (5):

  1. “A black hole is a region of spacetime where gravity is so strong that nothing can escape.”
  2. “The Milky Way galaxy contains an estimated 100 to 400 billion stars.”
  3. “Neutron stars are the collapsed cores of massive stars, with densities exceeding atomic nuclei.”
  4. “The cosmic microwave background is the thermal radiation left over from the early universe.”
  5. “Exoplanets are planets outside our solar system, detected via transit photometry or radial velocity.”

Cooking (5):

  1. “Maillard reaction gives browned foods their distinctive flavour through amino acid and sugar reactions.”
  2. “Sous vide cooking involves sealing food in vacuum bags and cooking at precise low temperatures.”
  3. “Emulsification combines two immiscible liquids, such as oil and water, using an emulsifier like lecithin.”
  4. “Fermentation converts sugars to acids or alcohol using microorganisms, used in bread, beer, and yogurt.”
  5. “Knife skills — julienne, brunoise, chiffonade — determine the surface area and cooking time of vegetables.”

Step 1 — Embed the corpus

Use fastembed::TextEmbedding with the default model (BGE-Small-EN-v1.5) to embed all 15 passages in a single model.embed() call. This returns a Vec<Vec<f32>> — one 384-dimensional vector per passage.

#![allow(unused)]
fn main() {
use fastembed::{TextEmbedding, InitOptions, EmbeddingModel};

let model = TextEmbedding::try_new(
    InitOptions::new(EmbeddingModel::BGESmallENV15)
        .with_show_download_progress(true),
)?;

let embeddings = model.embed(passages.clone(), None)?;
}

Step 2 — Insert into Turso

Loop over the passages and their corresponding embeddings. Convert each Vec<f32> to a JSON string so it can be passed to the vector(?) SQL function. Use INSERT OR IGNORE so re-runs are idempotent.

#![allow(unused)]
fn main() {
fn vec_to_json(v: &[f32]) -> String {
    let parts: Vec<String> = v.iter().map(|x| format!("{x}")).collect();
    format!("[{}]", parts.join(","))
}

for (i, (passage, emb)) in passages.iter().zip(embeddings.iter()).enumerate() {
    let json = vec_to_json(emb);
    conn.execute(
        "INSERT OR IGNORE INTO docs (id, passage, embedding) VALUES (?, ?, vector(?))",
        libsql::params![i as i64, passage.as_str(), json.as_str()],
    )
    .await?;
}
}

Step 3 — Embed the query and search

Embed the query string the same way you embedded the corpus — using model.embed() with a single-element vector. Then run vector_top_k('docs_idx', vector(?), 5) and join back to the docs table to retrieve the passage text and cosine distance.

#![allow(unused)]
fn main() {
let query = "memory safety in systems programming";
let q_emb = model.embed(vec![query.to_string()], None)?;
let q_json = vec_to_json(&q_emb[0]);

let mut rows = conn
    .query(
        "SELECT d.passage, v.distance
         FROM vector_top_k('docs_idx', vector(?), 5) AS v
         JOIN docs AS d ON d.rowid = v.id
         ORDER BY v.distance",
        libsql::params![q_json.as_str()],
    )
    .await?;
}

Step 4 — Run three queries and verify

Run the following queries and confirm the results cluster by topic:

Query                                      Expected top results
"memory safety in systems programming"     Rust passages
"stars and galaxies"                       Astronomy passages
"fermentation and cooking techniques"      Cooking passages

Print each result ranked by distance, showing the passage text and the cosine distance score:

#![allow(unused)]
fn main() {
let queries = vec![
    "memory safety in systems programming",
    "stars and galaxies",
    "fermentation and cooking techniques",
];

for query in &queries {
    println!("\n=== Query: \"{query}\" ===\n");
    let q_emb = model.embed(vec![query.to_string()], None)?;
    let q_json = vec_to_json(&q_emb[0]);

    let mut rows = conn
        .query(
            "SELECT d.passage, v.distance
             FROM vector_top_k('docs_idx', vector(?), 5) AS v
             JOIN docs AS d ON d.rowid = v.id
             ORDER BY v.distance",
            libsql::params![q_json.as_str()],
        )
        .await?;

    let mut rank = 1;
    while let Some(row) = rows.next().await? {
        let passage: String = row.get(0)?;
        let distance: f64 = row.get(1)?;
        println!("  {rank}. [{distance:.4}] {passage}");
        rank += 1;
    }
}
}

Reference Solution

Show full solution
// src/main.rs — Semantic Document Search (Exercise 3)

use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};
use libsql::Builder;

fn vec_to_json(v: &[f32]) -> String {
    let parts: Vec<String> = v.iter().map(|x| format!("{x}")).collect();
    format!("[{}]", parts.join(","))
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // ── 1. Connect to Turso (local file) ──
    let db = Builder::new_local("semantic_search.db").build().await?;
    let conn = db.connect()?;

    // ── 2. Create the docs table ──
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs (
            id        INTEGER PRIMARY KEY,
            passage   TEXT NOT NULL,
            embedding F32_BLOB(384) NOT NULL
        )",
        (),
    )
    .await?;

    // ── 3. Create the vector index ──
    conn.execute(
        "CREATE INDEX IF NOT EXISTS docs_idx ON docs (libsql_vector_idx(embedding))",
        (),
    )
    .await?;

    // ── 4. Define the corpus ──
    let passages: Vec<String> = vec![
        // Rust programming
        "Rust uses an ownership system to guarantee memory safety without a garbage collector.",
        "The borrow checker enforces that references do not outlive the data they point to.",
        "Cargo is Rust's build system and package manager, used to manage dependencies and run tests.",
        "Rust's trait system enables zero-cost abstractions and compile-time polymorphism.",
        "Async Rust uses futures and the tokio runtime to handle concurrent I/O efficiently.",
        // Astronomy
        "A black hole is a region of spacetime where gravity is so strong that nothing can escape.",
        "The Milky Way galaxy contains an estimated 100 to 400 billion stars.",
        "Neutron stars are the collapsed cores of massive stars, with densities exceeding atomic nuclei.",
        "The cosmic microwave background is the thermal radiation left over from the early universe.",
        "Exoplanets are planets outside our solar system, detected via transit photometry or radial velocity.",
        // Cooking
        "Maillard reaction gives browned foods their distinctive flavour through amino acid and sugar reactions.",
        "Sous vide cooking involves sealing food in vacuum bags and cooking at precise low temperatures.",
        "Emulsification combines two immiscible liquids, such as oil and water, using an emulsifier like lecithin.",
        "Fermentation converts sugars to acids or alcohol using microorganisms, used in bread, beer, and yogurt.",
        "Knife skills — julienne, brunoise, chiffonade — determine the surface area and cooking time of vegetables.",
    ]
    .into_iter()
    .map(String::from)
    .collect();

    // ── 5. Embed the corpus ──
    let model = TextEmbedding::try_new(
        InitOptions::new(EmbeddingModel::BGESmallENV15)
            .with_show_download_progress(true),
    )?;

    let embeddings = model.embed(passages.clone(), None)?;

    // ── 6. Insert passages + embeddings ──
    for (i, (passage, emb)) in passages.iter().zip(embeddings.iter()).enumerate() {
        let json = vec_to_json(emb);
        conn.execute(
            "INSERT OR IGNORE INTO docs (id, passage, embedding) VALUES (?, ?, vector(?))",
            libsql::params![i as i64, passage.as_str(), json.as_str()],
        )
        .await?;
    }

    println!("Inserted {} passages.\n", passages.len());

    // ── 7. Run three queries ──
    let queries = vec![
        "memory safety in systems programming",
        "stars and galaxies",
        "fermentation and cooking techniques",
    ];

    for query in &queries {
        println!("=== Query: \"{query}\" ===\n");
        let q_emb = model.embed(vec![query.to_string()], None)?;
        let q_json = vec_to_json(&q_emb[0]);

        let mut rows = conn
            .query(
                "SELECT d.passage, v.distance
                 FROM vector_top_k('docs_idx', vector(?), 5) AS v
                 JOIN docs AS d ON d.rowid = v.id
                 ORDER BY v.distance",
                libsql::params![q_json.as_str()],
            )
            .await?;

        let mut rank = 1;
        while let Some(row) = rows.next().await? {
            let passage: String = row.get(0)?;
            let distance: f64 = row.get(1)?;
            println!("  {rank}. [{distance:.4}] {passage}");
            rank += 1;
        }
        println!();
    }

    Ok(())
}

11. Exercise 4 — Recommendation Engine

Goal: Build an item-based recommendation engine. Store item feature vectors in Turso, then given a target item, find the k most similar items using KNN and exclude the query item from the results.

We will use hand-crafted 5-dimensional feature vectors for a product catalogue (no fastembed dependency — this keeps the focus on the recommendation logic itself). The five dimensions represent affinity scores for: [electronics, clothing, sports, food, books].

Catalogue (10 items):

id   name                 embedding
1    Laptop               [0.95, 0.0, 0.1, 0.0, 0.2]
2    Mechanical Keyboard  [0.85, 0.0, 0.0, 0.0, 0.1]
3    USB-C Hub            [0.9, 0.0, 0.0, 0.0, 0.0]
4    Running Shoes        [0.0, 0.6, 0.9, 0.0, 0.0]
5    Yoga Mat             [0.0, 0.2, 0.95, 0.0, 0.0]
6    Water Bottle         [0.1, 0.1, 0.7, 0.0, 0.0]
7    T-Shirt              [0.0, 0.95, 0.1, 0.0, 0.0]
8    Cookbook             [0.0, 0.0, 0.0, 0.6, 0.9]
9    Protein Bar          [0.0, 0.0, 0.3, 0.95, 0.0]
10   Novel                [0.0, 0.0, 0.0, 0.1, 0.95]

Step 1 — Schema

Create a products table and an HNSW vector index:

#![allow(unused)]
fn main() {
conn.execute(
    "CREATE TABLE IF NOT EXISTS products (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        embedding F32_BLOB(5) NOT NULL
    )",
    (),
)
.await?;

conn.execute(
    "CREATE INDEX IF NOT EXISTS products_idx
     ON products (libsql_vector_idx(embedding))",
    (),
)
.await?;
}

Step 2 — Insert items

Use the same pattern as Exercise 1: format each Vec<f32> as a JSON array string and insert with INSERT OR IGNORE:

#![allow(unused)]
fn main() {
let products: Vec<(i64, &str, Vec<f32>)> = vec![
    (1,  "Laptop",              vec![0.95, 0.0,  0.1,  0.0,  0.2]),
    (2,  "Mechanical Keyboard", vec![0.85, 0.0,  0.0,  0.0,  0.1]),
    (3,  "USB-C Hub",           vec![0.9,  0.0,  0.0,  0.0,  0.0]),
    (4,  "Running Shoes",       vec![0.0,  0.6,  0.9,  0.0,  0.0]),
    (5,  "Yoga Mat",            vec![0.0,  0.2,  0.95, 0.0,  0.0]),
    (6,  "Water Bottle",        vec![0.1,  0.1,  0.7,  0.0,  0.0]),
    (7,  "T-Shirt",             vec![0.0,  0.95, 0.1,  0.0,  0.0]),
    (8,  "Cookbook",            vec![0.0,  0.0,  0.0,  0.6,  0.9]),
    (9,  "Protein Bar",         vec![0.0,  0.0,  0.3,  0.95, 0.0]),
    (10, "Novel",               vec![0.0,  0.0,  0.0,  0.1,  0.95]),
];

for (id, name, emb) in &products {
    let emb_json = serde_json::to_string(emb)?;
    conn.execute(
        "INSERT OR IGNORE INTO products (id, name, embedding)
         VALUES (?, ?, vector(?))",
        libsql::params![*id, *name, emb_json.as_str()],
    )
    .await?;
}
}

Step 3 — Recommend function

Write a helper that retrieves recommendations for a given item:

#![allow(unused)]
fn main() {
async fn recommend(
    conn: &libsql::Connection,
    item_id: i64,
    k: usize,
) -> Result<Vec<(String, f64)>, Box<dyn std::error::Error>> {
    // 1. Get the query item's embedding as a JSON string.
    let mut stmt = conn
        .prepare("SELECT vector_extract(embedding) FROM products WHERE id = ?")
        .await?;
    let mut rows = stmt.query(libsql::params![item_id]).await?;
    let row = rows
        .next()
        .await?
        .ok_or("item not found")?;
    let query_vec: String = row.get(0)?;

    // 2. Use vector_top_k with k+1 to leave room for the query item itself.
    let sql = format!(
        "SELECT products.id, products.name,
                vector_distance_cos(products.embedding, vector(?1)) AS distance
         FROM vector_top_k('products_idx', vector(?1), {limit}) AS knn
         JOIN products ON products.rowid = knn.id
         WHERE products.id != ?2
         ORDER BY distance
         LIMIT ?3",
        limit = k + 1
    );
    let mut stmt = conn.prepare(&sql).await?;
    let mut rows = stmt
        .query(libsql::params![query_vec.as_str(), item_id, k as i64])
        .await?;

    // 3. Collect (name, distance) pairs.
    let mut results = Vec::new();
    while let Some(row) = rows.next().await? {
        let name: String = row.get(1)?;
        let distance: f64 = row.get(2)?;
        results.push((name, distance));
    }
    Ok(results)
}
}

The key ideas:

  1. Retrieve the query vectorvector_extract returns the stored embedding as a JSON string that can be passed straight back to vector_top_k.
  2. Over-fetch by one — request k + 1 candidates because vector_top_k will return the query item itself (distance ≈ 0). The WHERE products.id != ?2 clause filters it out.
  3. Cosine distancevector_distance_cos returns a value between 0 (identical) and 2 (opposite). Lower means more similar.
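Points 2 and 3 can be checked in isolation with a std-only sketch. Comparing a vector with itself gives distance ~0 — which is exactly why the query item tops its own candidate list and must be filtered out — while comparing it with its negation gives 2, the maximum:

```rust
/// Cosine distance: 1 - cosine similarity. Range: 0 (same direction) to 2 (opposite).
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (na * nb)
}

fn main() {
    let laptop = [0.95_f32, 0.0, 0.1, 0.0, 0.2]; // the "Laptop" vector from the catalogue
    let opposite: Vec<f32> = laptop.iter().map(|x| -x).collect();

    // Identical direction: distance ~ 0 — the query item ranks first in its own results.
    assert!(cosine_distance(&laptop, &laptop).abs() < 1e-6);
    // Opposite direction: distance ~ 2, the maximum.
    assert!((cosine_distance(&laptop, &opposite) - 2.0).abs() < 1e-6);

    println!(
        "self: {:.4}, opposite: {:.4}",
        cosine_distance(&laptop, &laptop),
        cosine_distance(&laptop, &opposite)
    );
}
```

Note that cosine distance ignores magnitude: scaling a vector by any positive constant leaves all its distances unchanged.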

Step 4 — Print recommendations

Request recommendations for three representative items and verify the clusters make sense:

let queries = vec![
    (1, "Laptop"),
    (4, "Running Shoes"),
    (8, "Cookbook"),
];

for (id, name) in &queries {
    let recs = recommend(&conn, *id, 2).await?;
    let rec_str: Vec<String> = recs
        .iter()
        .map(|(n, d)| format!("{n} ({d:.3})"))
        .collect();
    println!(
        "Customers who liked {name} also liked: {}",
        rec_str.join(", ")
    );
}

Expected output (distances are approximate):

Customers who liked Laptop also liked: Mechanical Keyboard (0.023), USB-C Hub (0.041)
Customers who liked Running Shoes also liked: Yoga Mat (0.019), Water Bottle (0.063)
Customers who liked Cookbook also liked: Novel (0.168), Protein Bar (0.397)

  • Laptop → electronics cluster (Mechanical Keyboard, USB-C Hub)
  • Running Shoes → sports cluster (Yoga Mat, Water Bottle)
  • Cookbook → food/books cluster (Novel, Protein Bar)

Show full solution
use libsql::Builder;

/// Find the k most similar products to the given item, excluding the item itself.
async fn recommend(
    conn: &libsql::Connection,
    item_id: i64,
    k: usize,
) -> Result<Vec<(String, f64)>, Box<dyn std::error::Error>> {
    // Retrieve the query item's embedding as a JSON string.
    let mut stmt = conn
        .prepare("SELECT vector_extract(embedding) FROM products WHERE id = ?")
        .await?;
    let mut rows = stmt.query(libsql::params![item_id]).await?;
    let row = rows.next().await?.ok_or("item not found")?;
    let query_vec: String = row.get(0)?;

    // KNN search: fetch k+1 to account for the query item appearing in its
    // own results, then filter it out.
    let sql = format!(
        "SELECT products.id, products.name,
                vector_distance_cos(products.embedding, vector(?1)) AS distance
         FROM vector_top_k('products_idx', ?1, {limit})
         JOIN products ON products.rowid = id
         WHERE products.id != ?2
         ORDER BY distance
         LIMIT ?3",
        limit = k + 1
    );
    let mut stmt = conn.prepare(&sql).await?;
    let mut rows = stmt
        .query(libsql::params![query_vec.as_str(), item_id, k as i64])
        .await?;

    let mut results = Vec::new();
    while let Some(row) = rows.next().await? {
        let name: String = row.get(1)?;
        let distance: f64 = row.get(2)?;
        results.push((name, distance));
    }
    Ok(results)
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let db = Builder::new_local(":memory:").build().await?;
    let conn = db.connect()?;

    // --- Schema ---
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products (
            id    INTEGER PRIMARY KEY,
            name  TEXT NOT NULL,
            embedding F32_BLOB(5) NOT NULL
        )",
        (),
    )
    .await?;

    conn.execute(
        "CREATE INDEX IF NOT EXISTS products_idx
         ON products (libsql_vector_idx(embedding))",
        (),
    )
    .await?;

    // --- Seed data ---
    let products: Vec<(i64, &str, Vec<f32>)> = vec![
        (1,  "Laptop",              vec![0.95, 0.0,  0.1,  0.0,  0.2]),
        (2,  "Mechanical Keyboard", vec![0.85, 0.0,  0.0,  0.0,  0.1]),
        (3,  "USB-C Hub",           vec![0.9,  0.0,  0.0,  0.0,  0.0]),
        (4,  "Running Shoes",       vec![0.0,  0.6,  0.9,  0.0,  0.0]),
        (5,  "Yoga Mat",            vec![0.0,  0.2,  0.95, 0.0,  0.0]),
        (6,  "Water Bottle",        vec![0.1,  0.1,  0.7,  0.0,  0.0]),
        (7,  "T-Shirt",             vec![0.0,  0.95, 0.1,  0.0,  0.0]),
        (8,  "Cookbook",            vec![0.0,  0.0,  0.0,  0.6,  0.9]),
        (9,  "Protein Bar",         vec![0.0,  0.0,  0.3,  0.95, 0.0]),
        (10, "Novel",               vec![0.0,  0.0,  0.0,  0.1,  0.95]),
    ];

    for (id, name, emb) in &products {
        let emb_json = serde_json::to_string(emb)?;
        conn.execute(
            "INSERT OR IGNORE INTO products (id, name, embedding)
             VALUES (?, ?, vector(?))",
            libsql::params![*id, *name, emb_json.as_str()],
        )
        .await?;
    }

    // --- Recommendations ---
    let queries = vec![
        (1, "Laptop"),
        (4, "Running Shoes"),
        (8, "Cookbook"),
    ];

    for (id, name) in &queries {
        let recs = recommend(&conn, *id, 2).await?;
        let rec_str: Vec<String> = recs
            .iter()
            .map(|(n, d)| format!("{n} ({d:.3})"))
            .collect();
        println!(
            "Customers who liked {name} also liked: {}",
            rec_str.join(", ")
        );
    }

    Ok(())
}

12. Exercise 5 — Retrieval-Augmented Generation

Goal: Build a retrieval-augmented generation (RAG) pipeline that:

  1. Stores the 15-passage corpus from §10 in Turso
  2. Accepts a natural-language question
  3. Retrieves the top-3 most relevant passages using vector KNN
  4. Injects the passages into a prompt as context
  5. Sends the prompt to an OpenAI-compatible LLM API
  6. Prints the grounded answer

Setup:

[dependencies]
libsql = "0.9"
fastembed = "4"
reqwest = { version = "0.12", features = ["json"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
tokio = { version = "1", features = ["full"] }

You will need an API key stored in the OPENAI_API_KEY environment variable. This exercise works with any OpenAI-compatible provider — OpenAI itself, Groq, Together AI, or a local Ollama instance (base URL http://localhost:11434/v1, model llama3.2). Adjust the base URL and model name accordingly if you are not using OpenAI.
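
One lightweight way to support those alternative providers is to make the base URL configurable. A minimal sketch, assuming an OPENAI_BASE_URL environment variable as the override convention (the variable name and helper are ours):

```rust
// Build the chat-completions endpoint from a configurable base URL so the
// same code can target OpenAI, Groq, Together AI, or a local Ollama server.
fn chat_completions_url(base: &str) -> String {
    // trim a trailing slash so both "…/v1" and "…/v1/" work
    format!("{}/chat/completions", base.trim_end_matches('/'))
}

fn main() {
    // Hypothetical convention: fall back to OpenAI when no override is set.
    let base = std::env::var("OPENAI_BASE_URL")
        .unwrap_or_else(|_| "https://api.openai.com/v1".to_string());
    println!("{}", chat_completions_url(&base));
    println!("{}", chat_completions_url("http://localhost:11434/v1/"));
}
```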

Step 1 — Retrieval function

Reuse the semantic search logic from §10. Write a function that embeds the query, runs a KNN search, and returns the top-k passage texts:

async fn retrieve(
    conn: &libsql::Connection,
    model: &TextEmbedding,
    query: &str,
    k: usize,
) -> Result<Vec<String>, Box<dyn std::error::Error>> {
    let q_emb = model.embed(vec![query.to_string()], None)?;
    let q_json = vec_to_json(&q_emb[0]);

    // vector_top_k returns row ids already ordered by ascending distance,
    // so no explicit ORDER BY is needed.
    let mut rows = conn
        .query(
            "SELECT d.passage
             FROM vector_top_k('docs_idx', vector(?), ?) AS v
             JOIN docs AS d ON d.rowid = v.id",
            libsql::params![q_json.as_str(), k as i64],
        )
        .await?;

    let mut passages = Vec::new();
    while let Some(row) = rows.next().await? {
        let passage: String = row.get(0)?;
        passages.push(passage);
    }
    Ok(passages)
}

Step 2 — Prompt construction

Build a prompt string that instructs the model to answer using only the retrieved context:

fn build_prompt(context_passages: &[String], question: &str) -> String {
    let mut prompt = String::from(
        "You are a helpful assistant. Answer the question using only the provided context.\n\
         If the context does not contain enough information, say so.\n\n\
         Context:\n",
    );

    for passage in context_passages {
        prompt.push_str(passage);
        prompt.push_str("\n\n");
    }

    prompt.push_str(&format!("Question: {question}\n\nAnswer:"));
    prompt
}

Step 3 — LLM API call

POST to the chat completions endpoint. Define request and response structs with serde, then send the prompt as a user message:

#[derive(serde::Serialize)]
struct ChatRequest {
    model: String,
    messages: Vec<Message>,
}

#[derive(serde::Serialize)]
struct Message {
    role: String,
    content: String,
}

#[derive(serde::Deserialize)]
struct ChatResponse {
    choices: Vec<Choice>,
}

#[derive(serde::Deserialize)]
struct Choice {
    message: ResponseMessage,
}

#[derive(serde::Deserialize)]
struct ResponseMessage {
    content: String,
}

async fn call_llm(
    client: &reqwest::Client,
    api_key: &str,
    prompt: &str,
) -> Result<String, Box<dyn std::error::Error>> {
    let request = ChatRequest {
        model: "gpt-4o-mini".to_string(),
        messages: vec![Message {
            role: "user".to_string(),
            content: prompt.to_string(),
        }],
    };

    let resp = client
        .post("https://api.openai.com/v1/chat/completions")
        .bearer_auth(api_key)
        .json(&request)
        .send()
        .await?
        .error_for_status()?
        .json::<ChatResponse>()
        .await?;

    Ok(resp.choices[0].message.content.clone())
}

Step 4 — Wire it together and run

Set up the database and corpus exactly as in §10, then run three example questions that exercise each topic cluster:

let questions = vec![
    "How does Rust ensure memory safety?",
    "What is a black hole?",
    "What is the Maillard reaction?",
];

let client = reqwest::Client::new();
let api_key = std::env::var("OPENAI_API_KEY")?;

for question in &questions {
    println!("=== Question: \"{question}\" ===\n");

    let passages = retrieve(&conn, &model, question, 3).await?;

    println!("Retrieved passages:");
    for (i, p) in passages.iter().enumerate() {
        println!("  {}: {p}", i + 1);
    }
    println!();

    let prompt = build_prompt(&passages, question);
    let answer = call_llm(&client, &api_key, &prompt).await?;

    println!("Answer: {answer}\n");
}

Each question should pull passages from the matching cluster — Rust passages for the first, astronomy for the second, and cooking for the third. The LLM’s answer will be grounded in those passages rather than relying on its own parametric knowledge.

Step 5 — Discussion: RAG patterns

Chunk size and overlap. The 15-passage corpus used here is already conveniently pre-chunked into single sentences, but real documents are rarely so tidy. In practice, long documents are split into overlapping chunks — typically 200–500 tokens with a 50–100 token overlap between consecutive chunks. The overlap ensures that sentences near a chunk boundary are not orphaned from their surrounding context, which would hurt retrieval quality. Choosing the right chunk size is a trade-off: smaller chunks yield more precise retrieval but lose broader context, while larger chunks retain context at the cost of noisier matches.
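
The trade-off above can be made concrete with a toy sliding-window chunker. It splits on whitespace words rather than counting model tokens, purely as an illustration; the `chunk` helper is ours:

```rust
/// Split `words` into chunks of `size` words, each overlapping the previous
/// chunk by `overlap` words. Sketch only: real pipelines count model tokens.
fn chunk(words: &[&str], size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < size, "overlap must be smaller than chunk size");
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + size).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break;
        }
        start += step;
    }
    chunks
}

fn main() {
    let words: Vec<&str> = "the quick brown fox jumps over the lazy sleeping dog"
        .split_whitespace()
        .collect();
    // size 4, overlap 1: each chunk repeats the last word of the previous one,
    // so no word is orphaned at a boundary.
    for c in chunk(&words, 4, 1) {
        println!("{c}");
    }
}
```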

Re-ranking. The ANN index returns approximate nearest neighbors quickly, but the ranking is based on a single embedding similarity score. A cross-encoder re-ranker — a model that takes (query, passage) pairs as input and produces a relevance score — can re-order the top-k candidates for significantly better precision. The typical pattern is to retrieve a larger set (e.g., top-20) with ANN and then re-rank to the final top-3 or top-5 with the cross-encoder.
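
That retrieve-then-rerank pattern can be sketched with a closure standing in for the cross-encoder; the word-overlap scorer below is a placeholder, not a real relevance model:

```rust
/// Keep the top `final_k` of a wide ANN candidate set after rescoring.
/// `score` stands in for a cross-encoder; any fn(query, passage) -> f64 fits.
fn rerank<'a>(
    query: &str,
    candidates: Vec<&'a str>,
    final_k: usize,
    score: impl Fn(&str, &str) -> f64,
) -> Vec<&'a str> {
    let mut scored: Vec<(&'a str, f64)> = candidates
        .into_iter()
        .map(|p| (p, score(query, p)))
        .collect();
    // Higher score = more relevant, so sort descending.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(final_k).map(|(p, _)| p).collect()
}

fn main() {
    // Placeholder scorer: count of query words that appear in the passage.
    let overlap = |q: &str, p: &str| {
        q.split_whitespace().filter(|w| p.contains(w)).count() as f64
    };
    let top = rerank(
        "rust memory safety",
        vec!["rust guarantees memory safety", "black holes bend spacetime"],
        1,
        overlap,
    );
    println!("{top:?}");
}
```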

Hybrid search. Semantic (ANN) search excels at matching meaning but can miss exact keywords, while keyword-based search (BM25) is great at exact term matching but blind to synonyms. Combining both — often called hybrid search — frequently outperforms either approach alone. A common fusion strategy is Reciprocal Rank Fusion (RRF), which merges the two ranked lists by summing the reciprocal of each result’s rank.
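
A minimal RRF sketch under the description above, using toy document ids and the conventional constant k = 60 (the `rrf` helper and the example lists are ours):

```rust
use std::collections::HashMap;

// Reciprocal Rank Fusion: score(doc) = sum over ranked lists of 1 / (k + rank),
// with 1-based ranks.
fn rrf(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, doc) in list.iter().enumerate() {
            *scores.entry((*doc).to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut ranked: Vec<(String, f64)> = scores.into_iter().collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ranked
}

fn main() {
    let semantic = vec!["a", "b", "c"]; // ANN ranking
    let keyword = vec!["b", "d", "a"]; // BM25 ranking
    // "b" ranks high in both lists, so fusion puts it first.
    for (doc, score) in rrf(&[semantic, keyword], 60.0) {
        println!("{doc}: {score:.5}");
    }
}
```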

Context window limits. The number of passages you can inject depends on the model’s context length and the average passage length. GPT-4o-mini supports 128k tokens, but stuffing the entire context window with retrieved passages introduces noise and increases latency and cost. A good heuristic is to inject only enough passages to cover the question — typically 3 to 5 short passages or 1 to 2 longer chunks — and to place the most relevant passages first.
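
Under that heuristic, context selection reduces to a simple budget loop. The 4-characters-per-token estimate below is a rough rule of thumb, not a tokenizer, and the helper is our own sketch:

```rust
/// Keep adding ranked passages until an estimated token budget is exhausted.
/// Assumes roughly 4 characters per token — a crude estimate, not a tokenizer.
fn select_context(passages: &[String], token_budget: usize) -> Vec<&String> {
    let mut used = 0;
    let mut picked = Vec::new();
    for p in passages {
        let est_tokens = p.len() / 4 + 1;
        if used + est_tokens > token_budget {
            break;
        }
        used += est_tokens;
        picked.push(p);
    }
    picked
}

fn main() {
    let passages = vec![
        "Rust uses an ownership system to guarantee memory safety.".to_string(),
        "The borrow checker enforces reference lifetimes.".to_string(),
        "Cargo manages dependencies and runs tests.".to_string(),
    ];
    // With a tight budget, only the most relevant (earliest) passages survive.
    for p in select_context(&passages, 20) {
        println!("{p}");
    }
}
```

Because passages arrive in relevance order, truncating at the budget automatically keeps the most relevant ones, matching the "most relevant first" heuristic above.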

Reference Solution

Show full solution
// src/main.rs — Retrieval-Augmented Generation (Exercise 5)

use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};
use libsql::Builder;

fn vec_to_json(v: &[f32]) -> String {
    let parts: Vec<String> = v.iter().map(|x| format!("{x}")).collect();
    format!("[{}]", parts.join(","))
}

/// Retrieve the top-k passages most relevant to `query` using vector KNN.
async fn retrieve(
    conn: &libsql::Connection,
    model: &TextEmbedding,
    query: &str,
    k: usize,
) -> Result<Vec<String>, Box<dyn std::error::Error>> {
    let q_emb = model.embed(vec![query.to_string()], None)?;
    let q_json = vec_to_json(&q_emb[0]);

    // vector_top_k returns row ids already ordered by ascending distance,
    // so no explicit ORDER BY is needed.
    let mut rows = conn
        .query(
            "SELECT d.passage
             FROM vector_top_k('docs_idx', vector(?), ?) AS v
             JOIN docs AS d ON d.rowid = v.id",
            libsql::params![q_json.as_str(), k as i64],
        )
        .await?;

    let mut passages = Vec::new();
    while let Some(row) = rows.next().await? {
        let passage: String = row.get(0)?;
        passages.push(passage);
    }
    Ok(passages)
}

/// Build a RAG prompt that instructs the model to answer from context only.
fn build_prompt(context_passages: &[String], question: &str) -> String {
    let mut prompt = String::from(
        "You are a helpful assistant. Answer the question using only the provided context.\n\
         If the context does not contain enough information, say so.\n\n\
         Context:\n",
    );

    for passage in context_passages {
        prompt.push_str(passage);
        prompt.push_str("\n\n");
    }

    prompt.push_str(&format!("Question: {question}\n\nAnswer:"));
    prompt
}

#[derive(serde::Serialize)]
struct ChatRequest {
    model: String,
    messages: Vec<Message>,
}

#[derive(serde::Serialize)]
struct Message {
    role: String,
    content: String,
}

#[derive(serde::Deserialize)]
struct ChatResponse {
    choices: Vec<Choice>,
}

#[derive(serde::Deserialize)]
struct Choice {
    message: ResponseMessage,
}

#[derive(serde::Deserialize)]
struct ResponseMessage {
    content: String,
}

/// Send the prompt to an OpenAI-compatible chat completions API.
async fn call_llm(
    client: &reqwest::Client,
    api_key: &str,
    prompt: &str,
) -> Result<String, Box<dyn std::error::Error>> {
    let request = ChatRequest {
        model: "gpt-4o-mini".to_string(),
        messages: vec![Message {
            role: "user".to_string(),
            content: prompt.to_string(),
        }],
    };

    let resp = client
        .post("https://api.openai.com/v1/chat/completions")
        .bearer_auth(api_key)
        .json(&request)
        .send()
        .await?
        .error_for_status()?
        .json::<ChatResponse>()
        .await?;

    Ok(resp.choices[0].message.content.clone())
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // ── 1. Connect to Turso (local file) ──
    let db = Builder::new_local("rag_search.db").build().await?;
    let conn = db.connect()?;

    // ── 2. Create the docs table ──
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs (
            id        INTEGER PRIMARY KEY,
            passage   TEXT NOT NULL,
            embedding F32_BLOB(384) NOT NULL
        )",
        (),
    )
    .await?;

    // ── 3. Create the vector index ──
    conn.execute(
        "CREATE INDEX IF NOT EXISTS docs_idx ON docs (libsql_vector_idx(embedding))",
        (),
    )
    .await?;

    // ── 4. Define the corpus ──
    let passages: Vec<String> = vec![
        // Rust programming
        "Rust uses an ownership system to guarantee memory safety without a garbage collector.",
        "The borrow checker enforces that references do not outlive the data they point to.",
        "Cargo is Rust's build system and package manager, used to manage dependencies and run tests.",
        "Rust's trait system enables zero-cost abstractions and compile-time polymorphism.",
        "Async Rust uses futures and the tokio runtime to handle concurrent I/O efficiently.",
        // Astronomy
        "A black hole is a region of spacetime where gravity is so strong that nothing can escape.",
        "The Milky Way galaxy contains an estimated 100 to 400 billion stars.",
        "Neutron stars are the collapsed cores of massive stars, with densities exceeding atomic nuclei.",
        "The cosmic microwave background is the thermal radiation left over from the early universe.",
        "Exoplanets are planets outside our solar system, detected via transit photometry or radial velocity.",
        // Cooking
        "Maillard reaction gives browned foods their distinctive flavour through amino acid and sugar reactions.",
        "Sous vide cooking involves sealing food in vacuum bags and cooking at precise low temperatures.",
        "Emulsification combines two immiscible liquids, such as oil and water, using an emulsifier like lecithin.",
        "Fermentation converts sugars to acids or alcohol using microorganisms, used in bread, beer, and yogurt.",
        "Knife skills — julienne, brunoise, chiffonade — determine the surface area and cooking time of vegetables.",
    ]
    .into_iter()
    .map(String::from)
    .collect();

    // ── 5. Embed the corpus ──
    // fastembed 4 constructs InitOptions with a builder API rather than a
    // struct literal.
    let model = TextEmbedding::try_new(
        InitOptions::new(EmbeddingModel::BGESmallENV15).with_show_download_progress(true),
    )?;

    let embeddings = model.embed(passages.clone(), None)?;

    // ── 6. Insert passages + embeddings ──
    for (i, (passage, emb)) in passages.iter().zip(embeddings.iter()).enumerate() {
        let json = vec_to_json(emb);
        conn.execute(
            "INSERT OR IGNORE INTO docs (id, passage, embedding) VALUES (?, ?, vector(?))",
            libsql::params![i as i64, passage.as_str(), json.as_str()],
        )
        .await?;
    }

    println!("Inserted {} passages.\n", passages.len());

    // ── 7. RAG pipeline ──
    let api_key = std::env::var("OPENAI_API_KEY")?;
    let client = reqwest::Client::new();

    let questions = vec![
        "How does Rust ensure memory safety?",
        "What is a black hole?",
        "What is the Maillard reaction?",
    ];

    for question in &questions {
        println!("=== Question: \"{question}\" ===\n");

        let context = retrieve(&conn, &model, question, 3).await?;

        println!("Retrieved passages:");
        for (i, p) in context.iter().enumerate() {
            println!("  {}: {p}", i + 1);
        }
        println!();

        let prompt = build_prompt(&context, question);
        let answer = call_llm(&client, &api_key, &prompt).await?;

        println!("Answer: {answer}\n");
    }

    Ok(())
}