Pandoc and nbconvert recipes

04 May 2025 - tsp
Last update 04 May 2025
Reading time 10 mins

Introduction

This page collects the pandoc and jupyter nbconvert recipes that I use most often. It serves mainly as a personal repository of practical commands and tricks, in particular for converting Markdown files and Jupyter notebooks into self-contained HTML files. One focus is on extending nbconvert with custom Python preprocessors that modify the notebook content before export. These preprocessors can resize or recompress images embedded in notebook outputs or referenced from Markdown cells, which yields smaller and more portable HTML output. In addition there is a quick overview of nbconvert's Jinja2 templates, though only the absolute basics are covered.

Markdown to HTML (simple)

First, let's take a look at what we need to do to convert a simple Markdown file into a very simple HTML file. This is an excellent job for pandoc (easily installable via pkg install pandoc, e.g. on FreeBSD):

pandoc input.md -o output.html
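Without further options pandoc writes only an HTML fragment (no html or head element). For a file that can be handed around on its own, the --standalone and --embed-resources options are the usual additions. The following is a minimal sketch of how one might wrap this in Python; the function names pandoc_html_command and convert are made up for illustration, and --embed-resources assumes pandoc 2.19 or newer (older releases used --self-contained).

```python
import shutil
import subprocess


def pandoc_html_command(src, dst, title=None, embed=True):
    """Build the argument list for a standalone pandoc HTML export."""
    cmd = ["pandoc", str(src), "--standalone", "-o", str(dst)]
    if embed:
        # Assumption: pandoc >= 2.19. Older releases used --self-contained.
        cmd.append("--embed-resources")
    if title is not None:
        # Standalone HTML wants a title; set one explicitly if the
        # Markdown file carries no title metadata.
        cmd += ["--metadata", f"title={title}"]
    return cmd


def convert(src, dst, **kwargs):
    """Run pandoc if it is installed; raises CalledProcessError on failure."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    subprocess.run(pandoc_html_command(src, dst, **kwargs), check=True)
```

Calling convert("input.md", "output.html", title="My notes") would then run the equivalent of the one-liner above with the two extra options.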

Jupyter Notebook to Single Page HTML (simple)

One can do the same with a Jupyter notebook, though the nbconvert utility supplied by Jupyter is far more capable here. To create a single-page HTML file that includes all graphics (not as references but as base64-encoded embedded resources, so that one can distribute just the single HTML file) one can use the following one-liner:

jupyter nbconvert input.ipynb --embed-images --to html --output output.html

Jupyter Notebook to Single Page HTML (resizing images)

Now let's assume that you have embedded large images (photographs, renderings, etc.) but want to produce smaller HTML files that still include everything inside the cells. Then you can use a Python preprocessor that resizes images that are too large. Preprocessors are a feature of nbconvert that is rarely used directly, though it's extremely powerful. You can define a new Python file, in this example inline_markdown_images_preprocessor.py. This file contains our preprocessor (a subclass of Preprocessor from the nbconvert.preprocessors package); the exporter passes every cell of the notebook, code and Markdown alike, to it. We assume the HTML exporter is used (with --to html), which influences the command line we are going to use later on:

from nbconvert.preprocessors import Preprocessor
from traitlets import Integer
from PIL import Image
import base64
import os
import re
from io import BytesIO

class InlineMarkdownImagesPreprocessor(Preprocessor):
    max_size = Integer(600, help="Maximum width/height in pixels").tag(config=True)
    jpeg_quality = Integer(85, help="JPEG compression quality").tag(config=True)

    def process_img(self, path):
        # If the supplied image is not a filename (for example in case it's
        # already a base64 string or similar) we ignore it
        if not os.path.isfile(path):
            return None

        # Now try to open the image using Pillow
        # and utilize the thumbnail method. This only resizes in case the
        # image is larger along one of the supplied dimensions and
        # preserves the aspect ratio (in-place)
        try:
            img = Image.open(path).convert("RGB")
            img.thumbnail((self.max_size, self.max_size), Image.LANCZOS)
            buffer = BytesIO()
            img.save(buffer, format="JPEG", quality=self.jpeg_quality)
            encoded = base64.b64encode(buffer.getvalue()).decode('utf-8')
            return f'data:image/jpeg;base64,{encoded}'
        except Exception as e:
            # In case we had not been able to modify the image we keep the
            # original
            print(f"Failed to inline image {path}: {e}")
            return None

    # This is the overridden public method that is called by Preprocessor
    # once for every cell of the notebook
    def preprocess_cell(self, cell, resources, cell_index):
        # We only handle _markdown_ cells (this does not affect
        # the output of code cells for example).

        if cell.cell_type == "markdown":
            # Find Markdown-style and HTML-style image references
            # utilizing regular expressions - this is not proper
            # parsing of HTML though it should be sufficient for
            # Jupyter notebooks. If not this should be replaced with
            # proper markdown and HTML parsing.
            def replace_match(match):
                path = match.group(1) or match.group(2)
                b64 = self.process_img(path)
                if b64:
                    return f'<img src="{b64}" style="max-width:100%;">'
                else:
                    return match.group(0)

            # Regex for ![alt](file.jpg) or <img src="file.jpg">
            cell.source = re.sub(
                r'!\[.*?\]\(([^)]+)\)|<img\s+[^>]*src="([^"]+)"[^>]*>',
                replace_match,
                cell.source
            )

        return cell, resources
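To get a feeling for which references the regular expression actually catches, it can be tried outside of nbconvert. The sketch below reuses the pattern from the preprocessor; the helper image_paths and the sample text are made up for illustration. It also shows two limitations: a Markdown image with a title ends up with the title inside the captured path (so os.path.isfile fails and the reference is left untouched), and img tags with single-quoted src attributes are not matched at all.

```python
import re

# Same pattern as in the preprocessor above.
IMAGE_REF = re.compile(r'!\[.*?\]\(([^)]+)\)|<img\s+[^>]*src="([^"]+)"[^>]*>')


def image_paths(markdown_source):
    """Return the path captured for every image reference in a Markdown cell."""
    return [m.group(1) or m.group(2) for m in IMAGE_REF.finditer(markdown_source)]


# Hypothetical cell content covering the interesting cases.
sample = """
![A plot](figures/plot.png)
<img alt="photo" src="photo.jpg" width="300">
![With title](shot.png "Screenshot")
<img src='single.jpg'>
"""

for path in image_paths(sample):
    print(repr(path))
```

Note that the replacement in the preprocessor emits a fresh img tag, so alt texts and additional attributes of the original reference are dropped.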

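To use this preprocessor one simply passes it to nbconvert. Since we produce HTML output we register it as a preprocessor of the HTMLExporter. nbconvert imports the class by its dotted name, so the file has to be importable, for example by placing it in the directory from which nbconvert is run: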

jupyter nbconvert \
   input.ipynb \
   --embed-images \
   --to html \
   --output output.html \
   --HTMLExporter.preprocessors="['inline_markdown_images_preprocessor.InlineMarkdownImagesPreprocessor']"

Chaining preprocessors: compressing output images in addition to Markdown images

In case one also wants to compress overly large images in the output of code cells, one can use another preprocessor that accesses the cell outputs. Let's call this compress_image_preprocessor.py (one can of course put both classes into the same Python file):

from nbconvert.preprocessors import Preprocessor
from traitlets import Integer
from io import BytesIO
from PIL import Image
import base64

class CompressOutputImagesPreprocessor(Preprocessor):
    max_size = Integer(600, help="Maximum width/height in pixels").tag(config=True)
    jpeg_quality = Integer(85, help="JPEG compression quality").tag(config=True)

    def preprocess_cell(self, cell, resources, cell_index):
        # We only process the "output" produced by executed code cells.
        # Those outputs are actually lists of output objects so we check all
        # of them:
        if 'outputs' in cell:
            for output in cell['outputs']:
                # We will only process image/png output for now. Usually
                # this includes a base64 encoded image for Jupyter notebooks
                if 'image/png' in output.get('data', {}):
                    try:
                        # Decode base64 image
                        b64_data = output['data']['image/png']
                        image_bytes = base64.b64decode(b64_data)
                        img = Image.open(BytesIO(image_bytes)).convert("RGB")

                        # Resize if too large
                        img.thumbnail((self.max_size, self.max_size), Image.LANCZOS)

                        # Recompress as JPEG
                        buffer = BytesIO()
                        img.save(buffer, format='JPEG', quality=self.jpeg_quality)
                        buffer.seek(0)
                        jpeg_b64 = base64.b64encode(buffer.read()).decode('utf-8')

                        # Replace with compressed JPEG
                        output['data'].pop('image/png')
                        output['data']['image/jpeg'] = jpeg_b64
                    except Exception as e:
                        print(f"Image processing failed: {e}")

        return cell, resources
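Before recompressing anything it can help to see how much data the image/png outputs actually carry. A notebook file is plain JSON, so this can be checked with the standard library alone. The following is a rough sketch under that assumption; the helpers load_notebook and png_output_sizes as well as the demo notebook are made up for illustration.

```python
import base64
import json


def load_notebook(path):
    """Read an .ipynb file as a plain dictionary (no nbformat needed)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def png_output_sizes(notebook):
    """Yield (cell_index, decoded_size_in_bytes) for every image/png output."""
    for index, cell in enumerate(notebook.get("cells", [])):
        for output in cell.get("outputs", []):
            payload = output.get("data", {}).get("image/png")
            if payload is None:
                continue
            # On disk, multi-line strings may be stored as a list of lines.
            if isinstance(payload, list):
                payload = "".join(payload)
            yield index, len(base64.b64decode(payload))


# Hypothetical in-memory notebook with one fake 100 byte "image".
fake_png = base64.b64encode(b"\x89PNG" + bytes(96)).decode("ascii")
demo = {"cells": [
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "code", "source": "plot()", "outputs": [
        {"output_type": "display_data", "data": {"image/png": fake_png}},
    ]},
]}

print(list(png_output_sizes(demo)))
```

For a real file one would pass load_notebook("input.ipynb") instead of the demo dictionary.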

To run both preprocessors, just pass both of them in the preprocessors list (they are applied in the order given):

jupyter nbconvert \
   input.ipynb \
   --embed-images \
   --to html \
   --output output.html \
   --HTMLExporter.preprocessors="['inline_markdown_images_preprocessor.InlineMarkdownImagesPreprocessor', 'compress_image_preprocessor.CompressOutputImagesPreprocessor']"
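Both classes declare max_size and jpeg_quality as configurable traits (tag(config=True)), so it should be possible to override them per run with arguments of the form --ClassName.trait=value, which traitlets-based applications generally accept. The sketch below builds such a command line in Python, which also sidesteps the shell quoting of the preprocessor list; the function name nbconvert_command is made up for illustration and the override syntax is an assumption that I have not verified against every nbconvert version.

```python
import shlex

PREPROCESSORS = [
    "inline_markdown_images_preprocessor.InlineMarkdownImagesPreprocessor",
    "compress_image_preprocessor.CompressOutputImagesPreprocessor",
]


def nbconvert_command(notebook, output, max_size=None, jpeg_quality=None):
    """Build the nbconvert call from above, optionally overriding the traits."""
    cmd = [
        "jupyter", "nbconvert", notebook,
        "--embed-images", "--to", "html", "--output", output,
        # repr() of a list of strings yields the literal nbconvert expects
        f"--HTMLExporter.preprocessors={PREPROCESSORS!r}",
    ]
    for cls in ("InlineMarkdownImagesPreprocessor", "CompressOutputImagesPreprocessor"):
        if max_size is not None:
            cmd.append(f"--{cls}.max_size={max_size}")
        if jpeg_quality is not None:
            cmd.append(f"--{cls}.jpeg_quality={jpeg_quality}")
    return cmd


# Print a copy-and-paste ready, correctly quoted shell command.
print(shlex.join(nbconvert_command("input.ipynb", "output.html", max_size=800)))
```

The returned list can be handed directly to subprocess.run(cmd, check=True), for example in a loop over several notebooks.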

Working with nbconvert Templates

Template Basics and Jinja2

nbconvert uses the Jinja2 templating engine to render notebooks into different output formats like HTML, PDF, or Markdown. Templates allow you to fully control the look, structure, and behavior of the generated files.

Template files are typically located in directories like the following (jupyter --paths lists the data directories that are used on your system):

/usr/local/share/jupyter/nbconvert/templates/

Inside, you'll find one folder per template, such as lab, classic, or markdown. Each of these folders contains .j2 files; these are Jinja2 template files used to generate different parts of the final document. The Jinja2 syntax allows embedding Python-like expressions and logic into the HTML or other output formats.

A look at an example: The classic template

Let’s examine the structure of the classic HTML export template. This template combines configuration (conf.json), a Jinja2-based HTML scaffold (base.html.j2), and optional styles or resources (like CSS or JavaScript).

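The file conf.json provides metadata about the template. It can declare the template it inherits from, the supported MIME types, and preprocessors that should run for this template: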

{
  "base_template": "base",
  "mimetypes": {
    "text/html": true
  },
  "preprocessors": {
    "100-pygments": {
      "type": "nbconvert.preprocessors.CSSHTMLHeaderPreprocessor",
      "enabled": true,
      "style": "default"
    }
  }
}
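The base_template entry is what links templates together: each template names its parent, and the lookup continues there. The following sketch illustrates the idea by following these entries inside a single templates directory. It is only an approximation for illustration, since nbconvert itself searches several directories; the function template_chain and the template name custom are made up.

```python
import json
import tempfile
from pathlib import Path


def template_chain(templates_root, name):
    """Follow base_template entries from `name` up to the root template."""
    chain = []
    while name and name not in chain:
        chain.append(name)
        conf_file = Path(templates_root) / name / "conf.json"
        if not conf_file.is_file():
            break
        conf = json.loads(conf_file.read_text(encoding="utf-8"))
        name = conf.get("base_template")
    return chain


# Demo with a throw-away directory tree instead of a real installation.
with tempfile.TemporaryDirectory() as root:
    for template, base in (("custom", "classic"), ("classic", "base"), ("base", None)):
        folder = Path(root) / template
        folder.mkdir()
        conf = {"base_template": base} if base else {}
        (folder / "conf.json").write_text(json.dumps(conf), encoding="utf-8")
    demo_chain = template_chain(root, "custom")

print(demo_chain)
```

Pointing template_chain at a real templates directory shows which parents a given template pulls in.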

The base.html.j2 file is the heart of HTML layout logic. It defines how each notebook cell type is rendered by overriding blocks such as codecell, markdowncell and various output types.

The top of the file declares the inheritance and imports helper macros:


{%- extends 'display_priority.j2' -%}
{% from 'celltags.j2' import celltags %}
{% from 'cell_id_anchor.j2' import cell_id_anchor %}

Then blocks such as:


{% block codecell %}
<div {{ cell_id_anchor(cell) }} class="cell border-box-sizing code_cell rendered{{ celltags(cell) }}">
{{ super() }}
</div>
{%- endblock codecell %}

allow fine-grained control over how each kind of output (code, markdown, raw, stderr, stream, SVG, PNG, etc.) is transformed into HTML. You can even modify prompts like In[1]: and Out[1]:, inject custom styles, or hide outputs dynamically.

If present, a static/ folder can include style.css or other resources. The classic template inlines its stylesheet into the exported HTML through <style> tags, so the result stays a single file.

Writing a Custom HTML Template

To create your own template, you can first simply create an appropriate directory structure:

mkdir -p mytemplate
cd mytemplate

Now we generate a conf.json that inherits from classic:

{
  "base_template": "classic"
}

Now we generate an index.html.j2. Jinja2 ignores everything that a child template places outside of a block, so our customizations go into overridden blocks of the classic template (body_header and body_footer wrap the rendered cells):

{% extends 'classic/index.html.j2' %}

{% block body_header %}
{{ super() }}
<!-- We will see our custom template here! -->
{% endblock body_header %}

{% block body_footer %}
<!-- And we will also see this part of our custom template here! -->
{{ super() }}
{% endblock body_footer %}

We can apply this template when running the HTML exporter by supplying it on the command line (run from the directory that contains the mytemplate folder):

jupyter nbconvert \
    input.ipynb \
    --to html \
    --template=mytemplate \
    --embed-images

Creating a Markdown template for Jekyll

To generate Markdown posts for a Jekyll blog directly from notebooks, we can write a custom template that includes YAML front matter, proper formatting, and excludes unnecessary notebook metadata.

Again let’s first create a directory for our templates:

mkdir -p jekyll_md_template

In there we generate a conf.json to inherit from markdown:

{
  "base_template": "markdown"
}

Now let’s create our index.md.j2 template:

---
title: "{{ resources.metadata.title or 'Jupyter Notebook Post' }}"
date: {{ resources.metadata.date or 'Unknown publishing date' }}
layout: post
categories: [notebook, jupyter, nbconvert]
---

{% for cell in nb.cells %}
  {% if cell.cell_type == 'markdown' %}
{{ cell.source }}
  {% elif cell.cell_type == 'code' %}
```
{{ cell.source }}
```
    {% for output in cell.outputs %}
      {% if output.output_type == 'stream' %}
{{ output.text }}
      {% elif output.output_type == 'execute_result' and 'text/plain' in output.data %}
{{ output.data['text/plain'] }}
      {% endif %}
    {% endfor %}
  {% endif %}
{% endfor %}

This template loops through all notebook cells, adds Markdown as-is, wraps code in fenced code blocks, and appends the text output of each code cell. It also inserts a Jekyll-compatible YAML header at the top. As before we can simply apply our custom template:

jupyter nbconvert input.ipynb \
  --to markdown \
  --template=jekyll_md_template \
  --output output.md
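Two details of the YAML header deserve some care: a title that itself contains double quotes breaks the quoted string, and the fallback text for the date is not something Jekyll can interpret as a date. One option is to build the header in Python, for example in a small post-processing script. The following is a minimal sketch of that idea; the function front_matter is made up for illustration, and falling back to the current day is my assumption, not something the template above does.

```python
import json
from datetime import date


def front_matter(metadata, today=None):
    """Build a Jekyll YAML header from a notebook metadata dictionary."""
    title = metadata.get("title") or "Jupyter Notebook Post"
    # Assumption: fall back to the current day if no date is recorded.
    published = metadata.get("date") or (today or date.today()).isoformat()
    lines = [
        "---",
        # json.dumps yields a double-quoted, escaped string, which is
        # also a valid YAML scalar.
        f"title: {json.dumps(title)}",
        f"date: {published}",
        "layout: post",
        "categories: [notebook, jupyter, nbconvert]",
        "---",
    ]
    return "\n".join(lines) + "\n"


print(front_matter({"title": 'Notes on "nbconvert"', "date": "2025-05-04"}))
```

The resulting string can be prepended to the Markdown file that nbconvert produced with a template that omits the header.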

Conclusion

This mini guide outlines a practical and extensible workflow for converting Jupyter notebooks and Markdown files into efficient, portable documents. By leveraging pandoc for Markdown and nbconvert for Jupyter notebooks, one can easily create output suited for web publication, long-term archiving, or blogging. Beyond the basic conversion capabilities, I’ve shown how the true flexibility of nbconvert lies in its template system and support for custom preprocessors. These features allow for precise control over HTML and Markdown output, image handling, formatting, and content filtering. Whether you are preparing single-page HTML reports, Jekyll-compatible blog posts, or embedding lightweight exports into larger static sites, nbconvert provides a clean, programmable solution for document generation.


Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)
