Semantic search has become a key ingredient in making websites more accessible and discoverable. Instead of matching keywords, semantic search interprets meaning: it retrieves content that is similar in meaning to a query or another document. This article explains how I built an automatic "Similar Articles" recommender for a static blog built with Jekyll (the approach is generator-agnostic) - you can see the result of the recommender on every article page of this blog. I will go from the basics of embeddings and cosine similarity, through PostgreSQL with pgvector, to the actual pipeline and integration with Jekyll.

The above image shows an example set of recommendations generated by the script that is described in this article.
From Keywords to Semantics
Traditional search engines operated on keywords for a very long time. A query like "neural networks" will only match articles containing those exact words. Of course they also perform stopword removal, work with word triplets, assign statistical weights and so on - but in essence it is still keyword matching. Semantic search, in contrast, projects texts into a high-dimensional vector space where semantically similar texts lie close together. This is achieved using embeddings - dense, high-dimensional numerical representations of meaning. We have looked into those before.
Cosine Similarity
Let's quickly recall cosine similarity - one of the most common ways to measure similarity between embeddings. Each text embedding is a vector in an N-dimensional space. Cosine similarity measures the angle between two vectors: the smaller the angle, the more similar the texts.
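As a quick illustration (this helper is not part of the tool, which lets PostgreSQL compute distances via pgvector), cosine similarity can be implemented in a few lines of plain Python:

import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy example with made-up three-dimensional "embeddings"
print(cosine_similarity([0.2, 0.9, 0.1], [0.25, 0.8, 0.15]))  # close to 1.0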
Visualizing Embeddings
Before describing the implementation we can ask ourselves what this actually looks like. To demonstrate what embedding vectors are and what you can actually do with them, I render interactive visualizations of this blog's data here. These views let the reader pan, tilt, zoom, and rotate the projected point cloud, and they can also choose which principal vectors to display. When JavaScript is enabled the display is fully interactive; if it is disabled we fall back to static images.
I created two datasets for this: one in which every so-called chunk is shown and clustered into 24 clusters, and one in which only the article centroids are shown and clustered into 16 groups. In both cases I applied k-means clustering after dimensionality reduction so that the reader can immediately see how groups of semantically related content emerge. Hovering the cursor over any point reveals which article it corresponds to, so the semantic relation between posts becomes directly tangible.
The choice of 24 clusters for the chunk-level view and 16 clusters for the article-level view is deliberate: the larger number captures the greater variability between fine-grained text fragments, while the smaller number emphasizes broader themes when only article centroids are shown. These values were determined experimentally and provide a balance between readability and resolution - for a practical application you could use a metric such as the gap statistic to determine the optimal number of clusters, but this has not been done here. In this way the visualization goes beyond a static scatter plot and turns the abstract mathematics of embeddings into an intuitive, explorable landscape of the blog's content.
Chunks (2D projection, main principal components)
Chunks (3D projection, selectable principal components)
Articles (2D projection, main principal components)
Articles (3D projection, selectable principal components)
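The figures above were generated offline; the following is a minimal sketch of how such a projection and clustering can be produced with scikit-learn, assuming the chunk embeddings have already been exported to a NumPy array (the file name and shapes are illustrative, and the actual pipeline for the figures differs in details):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Hypothetical export: one row per chunk embedding, e.g. shape (n_chunks, 1024)
embeddings = np.load("chunk_embeddings.npy")

# Project onto the main principal components for a 2D/3D scatter plot
coords = PCA(n_components=3).fit_transform(embeddings)

# Cluster in the reduced space (24 clusters for chunks, 16 for article centroids)
labels = KMeans(n_clusters=24, n_init=10, random_state=0).fit_predict(coords)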
Choosing the Right Database Backend
There are many ways to store and query embeddings, and each option comes with its own strengths and trade-offs.
PostgreSQL with pgvector
PostgreSQL is a general-purpose relational database that has proven itself as extremely stable and standards-conformant. With the pgvector extension it can natively store embedding vectors and perform similarity queries. Because it is still a full SQL database, you can combine semantic queries with relational joins, full-text search, and even graph queries (for example via Apache AGE). This makes it an excellent choice if embeddings are only one part of a larger system that also needs traditional relational data management or graphs.
PostgreSQL has been in continuous development since 1986 (originally as the POSTGRES project at the University of California, Berkeley) and saw its first public release in the mid-1990s. It is widely regarded as one of the most stable and mature open-source databases. The project is actively maintained by a large global community and commercial contributors, with major releases coming regularly every year. PostgreSQL is released under the permissive PostgreSQL License, a liberal license similar to the MIT license, which makes it free and open source for both academic and commercial use. This long history and liberal license are key reasons why it is trusted for critical applications across industries.
Chroma
Chroma is a purpose-built vector database written in Python. It is very easy to integrate into machine learning workflows and prototypes. Developers can quickly spin up a local instance, insert embeddings, and query them with minimal code. The downside is that it is less suited for heavy production workloads, and support on less common platforms (such as FreeBSD) can be problematic. Still, it shines in research environments and small projects where fast iteration matters more than long-term operational stability.
Chroma is a relatively young project, having emerged in the 2020s, and is under active development with a fast-moving feature set. It is released under the Apache 2.0 open-source license. While this makes it attractive for experimentation and integration into AI projects, the rapid development pace means that breaking changes and evolving APIs should be expected.
Faiss
Faiss (Facebook AI Similarity Search) is a C++ library with Python bindings designed for extremely fast nearest-neighbor searches in high-dimensional vector spaces. It offers GPU acceleration and supports a wide range of indexing strategies (IVF, HNSW, product quantization, etc.), making it ideal for very large datasets where performance is critical. However, Faiss is a library, not a database - so you need to build your own storage, persistence, and metadata layers around it.
Faiss was first released by Facebook AI Research (FAIR) in 2017 and has since become one of the standard tools for large-scale similarity search. It is released under the MIT license, is stable and widely used in both academic and industrial settings, and continues to be actively maintained and extended by the open-source community.
The choice for this project
In my case I chose PostgreSQL with pgvector. This choice was rooted mainly in the fact that I want more than just vector search: standard SQL queries, relational joins, metadata storage, and overall database stability - at least in other projects. On top of that, stability and reliability as well as platform independence are key requirements for me, so the choice was easy.
The Pipeline
Gathering and Indexing Data
My tool iterates over all rendered HTML files generated by Jekyll, which are usually stored in the _site/ directory after a jekyll build command. For each file it first extracts the main content using BeautifulSoup, deliberately ignoring navigation, sidebars, ads or contact information so that only real article text is processed. This approach not only reduces unnecessary data but also makes it easier to detect whether an article's content has truly changed, as opposed to layout or metadata adjustments.
The extracted HTML is then converted into Markdown, a format that is simpler to handle and more natural for large language models and embedding transformers. The Markdown text is split into overlapping chunks of roughly 400 tokens each, with an overlap of about 100 tokens. This chunking is necessary because embedding models typically work best with limited context sizes, and the overlap helps compensate for random sentence cuts at the edges of chunks.
For every chunk, an embedding vector is generated using a transformer model. These embeddings capture the semantic meaning of the text and are later used for similarity comparisons. To avoid unnecessary recomputation, the system computes a SHA hash of each file's content and only regenerates embeddings when the actual content has changed.
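The change-detection hash can be as simple as the following sketch (the helper name and the choice of SHA-256 are illustrative; the tool only needs a stable digest of the extracted content):

import hashlib

def content_hash(markdown_text: str) -> str:
    # Hash only the extracted article text, so pure layout or navigation changes
    # do not trigger a costly re-embedding of the page.
    return hashlib.sha256(markdown_text.encode("utf-8")).hexdigest()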
Below are the core code fragments that implement this pipeline setup and extraction logic:
# Defaults relevant to crawling and chunking
DEFAULTS = {
    "site_root": "_site",
    "exclude_globs": ["tags/**", "drafts/**", "private/**", "admin/**"],
    "content_ids": ["content"],
    "chunk": {"max_tokens": 800, "overlap_tokens": 80},
}
def select_content_element(soup, ids_or_classes):
    """Return inner HTML of the first element matched by the provided list.
    Each entry can be an id, class, or any CSS selector."""
    for key in ids_or_classes:
        key = (key or "").strip()
        if not key:
            continue
        if key.startswith(("#", ".")) or any(ch in key for ch in " >:+~[]"):
            el = soup.select_one(key)
            if el:
                return el.decode_contents()
        el = soup.find(id=key) or soup.find(class_=key)
        if el:
            return el.decode_contents()
    return None
def html_to_markdown(inner_html: str) -> str:
    try:
        import markdownify
        return markdownify.markdownify(inner_html, heading_style="ATX")
    except Exception:
        soup = BeautifulSoup(inner_html, "lxml")
        for a in soup.find_all("a"):
            a.replace_with(f"{a.get_text(strip=True)} ({a.get('href','')})")
        return soup.get_text("\n", strip=True)
def chunk_markdown(md: str, max_tokens=800, overlap=80):
    # Rough heuristic: one word corresponds to about 1.3 tokens on average
    words = md.split()
    approx_ratio = 1 / 1.3
    max_words = int(max_tokens * approx_ratio)
    overlap_words = int(overlap * approx_ratio)
    out, i = [], 0
    while i < len(words):
        j = min(len(words), i + max_words)
        out.append(" ".join(words[i:j]))
        if j == len(words):
            break
        i = max(0, j - overlap_words)  # step back to create the overlap
    return out
Generating Embeddings
There are two embedding providers that my script currently supports, each with its own story and trade-offs.
Ollama is the default in my setup. It allows me to run models such as mxbai-embed-large locally through a simple REST API. This makes deployment straightforward, as the only Python dependency is the requests library. Under the hood, Ollama is capable of GPU acceleration, and because it exposes a network API, it is possible to distribute embedding generation across multiple machines in a cluster. Ollama is a fairly young project, but it has quickly become popular because it lowers the entry barrier to running strong embedding (and large language) models without the need to integrate a whole machine learning framework directly into your code. For me, the decisive advantages are that it avoids any external costs (remember that even modest costs add up very fast, especially once you have automated tasks), it scales out to multiple indexers, and it remains extremely simple to operate.
OpenAI, on the other hand, provides a cloud-based embedding service as part of their API platform. These embeddings are generated with high-quality transformer models - though for most applications the quality is comparable to what we can achieve with Ollama. Even on very powerful GPUs there may be a small performance edge, but in practice this is usually negated by the latency of sending requests over the internet. The most important difference is the cost structure: pricing is per token, which can accumulate quickly if you need to embed an entire large corpus (for a small blog you will certainly not notice it, even if you regenerate the index every day). In addition, using OpenAI ties your pipeline to an external service, which can be acceptable for some use cases but runs counter to my preference for self-hosting and independence - if you build databases that should last for decades or even longer, you cannot rely on an external cloud service to keep operating. In my opinion the advantage of using OpenAI's embeddings is negligible for this application - in contrast to large language models, where local execution of larger models is usually prohibitive and the cloud is the only economically sane solution.
In the end, I prefer Ollama for day-to-day work, while acknowledging that OpenAI's embeddings may be attractive for teams who value a managed service and are willing to pay for convenience and scalability - or who run serverless, on-demand scalable services.
It is worth mentioning BERT here as well. BERT, short for Bidirectional Encoder Representations from Transformers, was one of the first transformer models widely used for generating embeddings. When I started experimenting with semantic similarity I initially used BERT models, typically loaded through Hugging Face Transformers. They produce quality embeddings and have the advantage of being free and well documented. However, they also introduce a heavy dependency chain: you need to install large Python libraries, manage model weights, and often pull in GPU-specific (and operating-system-specific) tooling. For lightweight pipelines and tools meant to be distributed or run in many environments this becomes cumbersome. Because of these additional dependencies and operational complexity, I decided not to include BERT in the final tool, even though it was the starting point of my experimentation.
The embedding calls and provider abstraction are tiny and explicit:
def embed_texts_ollama(texts, model, url):
    embs = []
    for txt in texts:
        r = requests.post(url, json={"model": model, "prompt": txt}, timeout=(20, 600))
        r.raise_for_status()
        embs.append(r.json()["embedding"])
    return embs

def embed_texts_openai(texts, model, base, api_key_env):
    key = os.environ.get(api_key_env)
    if not key:
        raise RuntimeError(f"OpenAI API key missing (env {api_key_env})")
    headers = {"Authorization": f"Bearer {key}"}
    r = requests.post(base, json={"model": model, "input": texts}, headers=headers, timeout=600)
    r.raise_for_status()
    js = r.json()
    return [d["embedding"] for d in js["data"]]

def embeddings_for(texts, cfg):
    emb = cfg["embedding"]
    if emb["provider"] == "ollama":
        return embed_texts_ollama(texts, emb["model"], emb["ollama_url"])
    return embed_texts_openai(texts, emb["model"], emb["openai_api_base"], emb["openai_api_key_env"])

def detect_embedding_dim(cfg):
    vecs = embeddings_for(["dimension probe"], cfg)
    if not vecs or not vecs[0]:
        raise RuntimeError("Failed to detect embedding dimension from provider.")
    return len(vecs[0])
Database Schema
The storage layer is deliberately small and transparent. The pages table holds one row per rendered article, keyed by its URL-like path (primary key). I decided to use this path as the key since the pages I use this for all operate at the root of their domain, so it fits the URL scheme. Alongside this identifier we store a content_hash (a SHA over the content area) to detect real changes and avoid needless re-embedding, the optionally extracted OpenGraph metadata (title, description, image) for later presentation, and a centroid vector which is the arithmetic mean of all chunk embeddings for that page. Operational columns include updated_at, automatically set to the current timestamp on each upsert, and is_public, a boolean that lets the pipeline exclude drafts or private sections from recommendation generation. This flag is currently not supported - but may be used later on.
The chunks table contains the text fragments produced during chunking. Each row stores the parent path, an ord field that preserves the original order of chunks within the page, the chunk's Markdown text in text_md, and its high-dimensional embedding vector. The column path is declared as a foreign key to pages(path) with ON DELETE CASCADE, ensuring that removing a page also removes all of its chunks and keeping the database free of orphans. This deviates slightly from the scheme I would usually use (an artificially generated integer or UUID primary key, an index over path, and the foreign key referencing that artificial primary key).
To make similarity search efficient, we create conventional B-tree indexes where appropriate and vector indexes where they matter. There is an index on chunks(path) to fetch all chunks of a page quickly, and helper indexes on pages(path), pages(updated_at), and pages(is_public) to support frequent filters and maintenance queries. For vector retrieval we use pgvector IVF-Flat indexes on chunks.embedding and on pages.centroid, both created with USING ivfflat (… vector_cosine_ops) WITH (lists = 100). The cosine operator class gives us cosine distance as the ranking metric (smaller means more similar), while IVF-Flat provides approximate nearest-neighbor search that scales well for millions of vectors.
For readers who prefer seeing the schema as DDL, here is the exact creation snippet used by the tool:
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS pages (
    path TEXT PRIMARY KEY,
    content_hash TEXT NOT NULL,
    title TEXT,
    description TEXT,
    image TEXT,
    centroid vector(DIM),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    is_public BOOLEAN NOT NULL DEFAULT true
);

CREATE TABLE IF NOT EXISTS chunks (
    id BIGSERIAL PRIMARY KEY,
    path TEXT NOT NULL REFERENCES pages(path) ON DELETE CASCADE,
    ord INTEGER NOT NULL,
    text_md TEXT NOT NULL,
    embedding vector(DIM) NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_chunks_path ON chunks(path);
CREATE INDEX IF NOT EXISTS idx_pages_path ON pages(path);
CREATE INDEX IF NOT EXISTS idx_pages_updated_at ON pages(updated_at);
CREATE INDEX IF NOT EXISTS idx_pages_is_public ON pages(is_public);
CREATE INDEX IF NOT EXISTS idx_chunks_embedding ON chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
CREATE INDEX IF NOT EXISTS idx_pages_centroid ON pages USING ivfflat (centroid vector_cosine_ops) WITH (lists = 100);
In short, pages captures one vector summary per article together with presentation metadata and housekeeping flags, chunks stores the fine-grained embeddings tied back to their page via a strict foreign key, and a small set of carefully chosen indexes keeps both maintenance and recommendation queries fast.
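The centroid column mentioned above is simply the arithmetic mean of a page's chunk embeddings; here is a minimal sketch of how it can be computed and serialized for pgvector (the helper names are illustrative, the tool's internals may differ):

import numpy as np

def centroid_of(chunk_embeddings):
    # pages.centroid is the arithmetic mean of all chunk vectors of that page
    return np.mean(np.asarray(chunk_embeddings, dtype=float), axis=0).tolist()

def to_pgvector_literal(vec):
    # pgvector accepts a vector as a text literal of the form '[v1,v2,...]'
    return "[" + ",".join(str(x) for x in vec) + "]"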
Error Handling
The script retries failed embedding requests, ensures dimension consistency, and validates that embeddings fit the expected size. On mismatch, a resetdb command rebuilds the schema. The typical failure behaviour at the moment is simply crashing and raising an exception. This propagates as a non-zero exit code to the upstream application - in my case the Jenkins build system - which is sufficient for detecting and reacting to errors.
Two examples of explicit error raising are the missing OpenAI API key and an embedding dimension mismatch:
if not key:
raise RuntimeError(f"OpenAI API key missing (env {api_key_env})")
if any(len(e) != dim_probe for e in embs):
raise RuntimeError("Embedding dimension mismatch; run 'resetdb' after changing model/provider.")
Because the tool is invoked as a CLI, any uncaught exception will terminate the process with a non-zero exit status, which Jenkins (or any orchestrator or build automation tool) can interpret as a failed build step.
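The retry behaviour can be a thin wrapper around the embedding calls; a sketch assuming simple exponential backoff (the actual retry parameters in the tool may differ):

import time

def with_retries(fn, attempts=3, backoff=2.0):
    """Call fn(); on failure wait and retry, re-raising after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(backoff ** attempt)

# e.g. embs = with_retries(lambda: embeddings_for(chunk_texts, cfg))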
Generating Related Articles
Once indexed, the system generates a JSON file that maps each page to a curated set of neighbors. For every page I first fetch a pool of ksample candidates from the database, which are simply the closest articles according to cosine distance on their centroid embeddings. From this larger pool I then select k items that will actually appear as the "similar articles" recommendation. The reduction step is important because the nearest-neighbor set often contains many articles that are semantically close in a very narrow way; I do not want to present a monotonous list but rather a balanced sample that still respects similarity - and also add some feeling of dynamics to the otherwise static page with every page rebuild.
To achieve this balance I apply Boltzmann sampling. The intuition comes from statistical mechanics: each candidate with distance value $d$ is given a weight proportional to $e^{-\frac{d - d_{min}}{T}}$, where $d_{min}$ is the smallest distance in the pool and $T$ is the temperature parameter. At low temperatures (small $T$) the sampling distribution is sharply peaked, favoring the very best matches, while at higher temperatures the distribution flattens out, allowing more diversity. In practice we use $T = 0.7$, which is a good compromise between determinism and variety. This way the same article will not always be paired with exactly the same neighbors, but the selected ones remain recognizably close in meaning.
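To get a feeling for the temperature parameter, here is a small numeric illustration with made-up distances (not part of the tool):

import math

distances = [0.10, 0.15, 0.30, 0.50]  # cosine distances of four candidates (made up)
for T in (0.1, 0.7, 5.0):
    w = [math.exp(-(d - min(distances)) / T) for d in distances]
    probs = [round(x / sum(w), 2) for x in w]
    print(T, probs)
# Low T concentrates probability on the closest candidates,
# while higher T flattens the distribution and admits more diversity.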
The configuration knobs live in the defaults and are overridable in your JSON config:
"neighbors": {
"ksample": 16,
"k": 8,
"temperature": 0.7,
"pin_top": True,
"seed": None,
"seealso": 4
}
The candidate retrieval and Boltzmann sampling are implemented as follows (simplified to the essential parts):
-- within cmd_genrel():
SELECT path,
centroid <=> (SELECT centroid FROM pages WHERE path = %s) AS dist
FROM pages
WHERE centroid IS NOT NULL AND is_public AND path <> %s
ORDER BY dist
LIMIT %s;
def boltzmann_sample(cands, k, temperature=0.7, pin_top=True, seed=None):
    """Sample k paths from (path, distance) candidates using Boltzmann weights."""
    if not cands:
        return []
    k = min(k, len(cands))
    rng = random.Random(seed) if seed is not None else random
    chosen, rest = [], cands[:]
    if pin_top and k > 0:
        # Always keep the single closest candidate
        chosen.append(rest[0][0])
        rest = rest[1:]
        k -= 1
    if k == 0 or not rest:
        return chosen
    dmin = min(d for _, d in rest)
    T = max(float(temperature), 1e-6)
    weights = [math.exp(-(d - dmin) / T) for _, d in rest]
    items = list(zip([p for p, _ in rest], weights))
    selected = []
    for _ in range(k):
        total = sum(w for _, w in items)
        if total <= 0:
            idx = rng.randrange(len(items))
        else:
            # Roulette-wheel selection proportional to the Boltzmann weights
            r = rng.random() * total
            acc, idx = 0.0, len(items) - 1
            for i, (_, w) in enumerate(items):
                acc += w
                if acc >= r:
                    idx = i
                    break
        selected.append(items[idx][0])
        items.pop(idx)  # sample without replacement
    return chosen + selected
After sampling we exclude already chosen items and, when configured, add a handful of random suggestions labelled as "See also". These provide serendipitous entry points into other parts of the blog that are not semantically near the source article but may still be interesting for the reader. The final mapping is written into _data/related.json, which Jekyll can consume to render recommendation boxes on each article page.
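The "See also" selection is essentially a random draw from the remaining public pages; a sketch of the idea (the helper is hypothetical, not the tool's actual function):

import random

def pick_seealso(all_paths, path, related, n):
    # Exclude the page itself and the already selected related articles,
    # then draw a few random entry points from the rest of the blog.
    pool = [p for p in all_paths if p != path and p not in set(related)]
    return random.sample(pool, min(n, len(pool)))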
For completeness, the final packaging step constructs each entry's payload and writes the JSON:
def pack(p: str) -> Dict[str, str]:
    mi = meta.get(p, {})
    title = mi.get("title") or p.strip("/").split("/")[-1].replace("-", " ").title()
    desc = mi.get("desc") or ""  # Optionally one can fetch only a subset of characters to limit length
    img = mi.get("image") or ""
    return {"url": p, "title": title, "desc": desc, "image": img}

result[path] = {
    "related": [pack(p) for p in related],
    "seealso": [pack(p) for p in seealso]
}

out_file.write_text(json.dumps(result, ensure_ascii=False, indent=2), "utf-8")
print(f"Wrote {out_file}")
Jekyll Integration
In Jekyll templates (Liquid), I read the JSON and render a "Related Articles" section at the end of each post. The Liquid code below is what I currently use (provided via an include to different layouts):
{%- assign key = page.url | relative_url -%}
{%- assign lastchar = key | split: '' | last -%}
{%- if key == "/" -%}
{%- assign key = "/index.html" -%}
{%- elsif key contains ".html" -%} <!-- -->
{%- elsif lastchar == "/" -%}
{%- assign key = key | append: "index.html" -%}
{%- else -%}
{%- assign key = key | append: "/index.html" -%}
{%- endif -%}
{%- assign block = site.data.related[key] -%}
{% if block %}
{% if block.related and block.related.size > 0 %}
<div class="related">
<h2> Related articles </h2>
<div class="relatedgrid">
{% for it in block.related %}
<div class="relcard">
<a href="{{ it.url | relative_url }}">
{% if it.image and it.image != "" %}<img src="{{ it.image | relative_url }}" alt="">{% else %}<img src="/assets/images/png/unknownpage_small.png" alt="">{% endif %}
<h3>{{ it.title }}</h3>
{% if it.desc %}<p>{{ it.desc }}</p>{% endif %}
</a>
</div>
{% endfor %}
</div>
</div>
{% endif %}
{% if block.seealso and block.seealso.size > 0 %}
<div class="related">
<h2> Also on this blog </h2>
<div class="relatedgrid">
{% for it in block.seealso %}
<div class="relcard">
<a href="{{ it.url | relative_url }}">
{% if it.image and it.image != "" %}<img src="{{ it.image | relative_url }}" alt="">{% else %}<img src="/assets/images/png/unknownpage_small.png" alt="">{% endif %}
<h3>{{ it.title }}</h3>
{% if it.desc %}<p>{{ it.desc }}</p>{% endif %}
</a>
</div>
{% endfor %}
</div>
</div>
{% endif %}
{% endif %}
This template logic normalizes the current page URL into a key that matches the entries in _data/related.json. It then looks up the corresponding block of recommendations. If related articles exist, it renders them with their title, description, image and link. If configured, it also renders an additional section called "Also on this blog" that displays a few random suggestions. Each entry gracefully falls back to a placeholder image if no specific image is available. The result is that every article page automatically ends with a visually consistent block of recommendations that encourage further exploration.
Configuration
Before using the tool it is important to understand its configuration file. By default, blogsimi looks for a JSON configuration file at ~/.config/blogsimilarity.cfg. This file describes where the rendered site is located, how embeddings should be generated, and how the PostgreSQL database can be reached. It also defines parameters for chunk sizes and neighbor selection.
A minimal configuration might look like this:
{
  "site_root": "_site",
  "data_out": "_data/related.json",
  "embedding": {
    "provider": "ollama",
    "model": "mxbai-embed-large",
    "ollama_url": "http://127.0.0.1:11434/api/embeddings"
  },
  "db": {
    "host": "127.0.0.1",
    "port": 5432,
    "user": "blog",
    "password": "blog",
    "dbname": "blog"
  },
  "neighbors": {
    "ksample": 16,
    "k": 8,
    "temperature": 0.7,
    "seealso": 4
  }
}
Here site_root points to the rendered HTML directory, data_out specifies the JSON file that Jekyll will later consume, embedding configures the embedding provider and model, and db holds the connection details to PostgreSQL. The neighbors section tunes how many candidates are sampled and how many recommendations will be shown, as well as the temperature parameter for Boltzmann sampling and the number of random "see also" entries. You can also override the glob patterns that exclude certain directories (for example drafts/** or private/**) and adjust the chunking parameters (max_tokens and overlap_tokens) to control how the text is split before embedding.
If no configuration file is found, the tool falls back to its internal defaults. Every option can be overridden by supplying a different configuration file with the --config option.
Using the blogsimi CLI
This project ships as a single script (blogsimi) providing a small CLI. It is installable via PyPI; the source is available on GitHub. To install the package you can simply execute the usual pip command (assuming the package name on PyPI matches the script name):
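pip install blogsimi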
All commands read the configuration from ~/.config/blogsimilarity.cfg by default; use --config to point to a different file.
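For example, to run the indexer against a non-default configuration file (the path is only an illustration):

blogsimi index --config /path/to/alternative.cfg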
One-time database setup
A PostgreSQL superuser (or a role with sufficient privileges) must create the role and database and enable the pgvector extension; this requires superuser privileges:
-- as a PostgreSQL superuser
CREATE ROLE blog LOGIN PASSWORD 'blog';
CREATE DATABASE blog OWNER blog;
\c blog
CREATE EXTENSION IF NOT EXISTS vector; -- requires superuser or appropriate privileges
Then you can initialize the tables via the tool. When you change the embedding provider/model (and thus the vector dimension), recreate the tables (dropping all data) with the resetdb command, presumably invoked like this:
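blogsimi resetdb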
Index the rendered site
Point the indexer at your built HTML. Use --page to override the site root (defaults to site_root in the config). The indexer only re-embeds changed pages.
blogsimi index --page _site
Generate the recommendations JSON
Write the related/seealso mapping into your Jekyll data directory. Use --out to override the output path (defaults to data_out in the config).
blogsimi genrel --out _data/related.json
The CLI exits with a non-zero status on errors (e.g., DB connection issues, missing API keys). This makes it easy to wire into CI systems like Jenkins.
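In a CI job the two steps can simply be chained, so that a failing index run prevents stale recommendations from being written (this is just one possible wiring):

blogsimi index --page _site && blogsimi genrel --out _data/related.json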
Conclusion
Semantic search brings a new layer of discovery to static sites. By combining local embeddings, PostgreSQL with pgvector, and Jekyll integration, we can:
- Suggest related content that is truly relevant.
- Keep all data self-hosted and under control.
- Extend beyond blog posts into any content type.
The result: a richer and more engaging browsing experience, built on top of solid and transparent technology.