04 May 2025 - tsp
Last update 04 May 2025
This page contains a quick overview of the pandoc and jupyter nbconvert recipes that I use most often; it serves mainly as a personal repository of practical commands and tricks. In particular, it includes techniques for converting Markdown files and Jupyter notebooks into self-contained HTML files. One focus is on extending nbconvert through custom Python preprocessors that modify the notebook content before export. These preprocessors can resize or recompress images that are embedded in notebook outputs or referenced from Markdown cells, which yields smaller and more portable HTML output. In addition, there is a quick overview of nbconvert Jinja2 templates, though only the absolute basics are covered.
First, let's take a look at what we need to do to convert a simple Markdown file into a very simple HTML file. This is an excellent job for pandoc (easily installable via pkg install pandoc):
pandoc input.md -o output.html
One can do the same with a Jupyter notebook, though the nbconvert utility supplied by Jupyter is far more advanced. To create a single-page HTML file that includes all graphics (not as references but as base64-encoded embedded resources, so one can distribute just the single HTML file), one can use the following one-liner:
jupyter nbconvert input.ipynb --embed-images --to html --output output.html
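To check whether an exported file really is self-contained, one can scan it for image references that are not data: URIs. The following is only a minimal sketch using the Python standard library; the helper names are made up for this example, and it only looks at img tags (not at CSS backgrounds or other resources).

```python
from html.parser import HTMLParser


class ImageSourceCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


def external_image_sources(html_text):
    """Return all image sources that are not embedded as data: URIs."""
    collector = ImageSourceCollector()
    collector.feed(html_text)
    return [src for src in collector.sources if not src.startswith("data:")]


# Small inline sample; for a real export one would read output.html instead.
sample = (
    '<p><img src="data:image/png;base64,iVBORw0KGgo="></p>'
    '<p><img src="figures/plot.png"></p>'
)
print(external_image_sources(sample))  # ['figures/plot.png']
```

An empty list means that every image found in the file is embedded.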
Now let's assume that you have embedded large images (photographs, renderings, etc.) but want to produce smaller HTML files that still include everything inside the cells. Then you can use a Python preprocessor that resizes images that are too large. Preprocessors are a feature of nbconvert that is rarely used directly, though it is extremely powerful. You can define a new Python file, in this example inline_markdown_images_preprocessor.py. This file contains our preprocessor (a subclass of Preprocessor from the nbconvert.preprocessors package), to which the exporter passes every code and Markdown cell of the notebook. We assume the HTML exporter is used (with --to html); this influences which command line we are going to use later on:
from nbconvert.preprocessors import Preprocessor
from traitlets import Integer
from PIL import Image
import base64
import os
import re
from io import BytesIO

class InlineMarkdownImagesPreprocessor(Preprocessor):
    max_size = Integer(600, help="Maximum width/height in pixels").tag(config=True)
    jpeg_quality = Integer(85, help="JPEG compression quality").tag(config=True)

    def process_img(self, path):
        # If the supplied image is not a filename (for example in case it's
        # already a base64 string or similar) we ignore it
        if not os.path.isfile(path):
            return None
        # Now try to open the image using Pillow
        # and utilize the thumbnail method. This only resizes in case the
        # image is larger along one of the supplied dimensions and
        # preserves the aspect ratio (in-place). Converting to RGB drops
        # any transparency since JPEG has no alpha channel.
        try:
            img = Image.open(path).convert("RGB")
            img.thumbnail((self.max_size, self.max_size), Image.LANCZOS)
            buffer = BytesIO()
            img.save(buffer, format="JPEG", quality=self.jpeg_quality)
            encoded = base64.b64encode(buffer.getvalue()).decode('utf-8')
            return f'data:image/jpeg;base64,{encoded}'
        except Exception as e:
            # In case we were not able to modify the image we keep the
            # original
            print(f"Failed to inline image {path}: {e}")
            return None

    # This is the overridden public method that is called by Preprocessor
    def preprocess_cell(self, cell, resources, cell_index):
        # We only handle _markdown_ cells (this does not affect
        # the output of code cells for example).
        if cell.cell_type == "markdown":
            # Find Markdown-style and HTML-style image references
            # utilizing regular expressions - this is not proper
            # parsing of HTML though it should be sufficient for
            # Jupyter notebooks. If not this should be replaced with
            # proper markdown and HTML parsing.
            def replace_match(match):
                path = match.group(1) or match.group(2)
                b64 = self.process_img(path)
                if b64:
                    return f'<img src="{b64}" style="max-width:100%;">'
                else:
                    return match.group(0)

            # Regex for ![alt](file.jpg) or <img src="file.jpg">
            cell.source = re.sub(
                r'!\[.*?\]\(([^)]+)\)|<img\s+[^>]*src="([^"]+)"[^>]*>',
                replace_match,
                cell.source
            )
        return cell, resources
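To get a feeling for what the regular expression picks up and what it misses, it can be tried on a sample cell without involving nbconvert at all. The following sketch only uses the standard library; the sample paths are made up.

```python
import re

# Same pattern as used by the preprocessor above
IMAGE_PATTERN = re.compile(r'!\[.*?\]\(([^)]+)\)|<img\s+[^>]*src="([^"]+)"[^>]*>')


def referenced_images(markdown_source):
    """Return what the pattern captures as image path for each match."""
    return [m.group(1) or m.group(2) for m in IMAGE_PATTERN.finditer(markdown_source)]


sample_cell = '''
![Setup](images/setup.jpg)
<img src="images/scope.png" width="400">
<img src='images/single_quoted.png'>
![With title](images/titled.jpg "A title")
'''

for captured in referenced_images(sample_cell):
    print(repr(captured))
# 'images/setup.jpg'
# 'images/scope.png'
# 'images/titled.jpg "A title"'
```

The single-quoted src attribute is not matched at all, and for a Markdown image with a title the title becomes part of the captured path. In the latter case the os.path.isfile check fails, so such references simply stay untouched instead of being inlined.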
To use this preprocessor one simply passes it to nbconvert. Since we produce HTML output, we register it as a preprocessor of the HTMLExporter:
jupyter nbconvert \
input.ipynb \
--embed-images \
--to html \
--output output.html \
--HTMLExporter.preprocessors="['inline_markdown_images_preprocessor.InlineMarkdownImagesPreprocessor']"
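When such a call is scripted, building the argument list in Python avoids shell quoting problems with the list literal. This is only a sketch: the function name is made up, and the overrides parameter assumes that traitlets accepts --ClassName.trait=value on the command line for the configurable max_size and jpeg_quality traits defined above.

```python
import shlex


def nbconvert_command(notebook, output, preprocessors, overrides=None):
    """Build the argument list for a jupyter nbconvert HTML export."""
    args = [
        "jupyter", "nbconvert", notebook,
        "--embed-images",
        "--to", "html",
        "--output", output,
        # repr() of a list of strings yields the list literal that is
        # passed on the command line above
        f"--HTMLExporter.preprocessors={preprocessors!r}",
    ]
    # Assumption: configurable traits can be overridden as --Class.trait=value
    for trait, value in (overrides or {}).items():
        args.append(f"--{trait}={value}")
    return args


command = nbconvert_command(
    "input.ipynb",
    "output.html",
    ["inline_markdown_images_preprocessor.InlineMarkdownImagesPreprocessor"],
    {"InlineMarkdownImagesPreprocessor.max_size": 800},
)
# Print a shell-safe version; subprocess.run(command, check=True) would execute it
print(shlex.join(command))
```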
If one also wants to compress overly large images from code cell output, one can use another preprocessor that accesses the cell outputs. Let's call this file compress_image_preprocessor.py (one can of course put both classes into the same Python file):
from nbconvert.preprocessors import Preprocessor
from traitlets import Integer
from io import BytesIO
from PIL import Image
import base64

class CompressOutputImagesPreprocessor(Preprocessor):
    max_size = Integer(600, help="Maximum width/height in pixels").tag(config=True)
    jpeg_quality = Integer(85, help="JPEG compression quality").tag(config=True)

    def preprocess_cell(self, cell, resources, cell_index):
        # We only process the "output" produced by executed code cells.
        # Those outputs are actually lists of output objects so we check all
        # of them:
        if 'outputs' in cell:
            for output in cell['outputs']:
                # We will only process image/png output for now. Usually
                # this includes a base64 encoded image for Jupyter notebooks
                if 'image/png' in output.get('data', {}):
                    try:
                        # Decode base64 image
                        b64_data = output['data']['image/png']
                        image_bytes = base64.b64decode(b64_data)
                        img = Image.open(BytesIO(image_bytes)).convert("RGB")
                        # Resize if too large
                        img.thumbnail((self.max_size, self.max_size), Image.LANCZOS)
                        # Recompress as JPEG
                        buffer = BytesIO()
                        img.save(buffer, format='JPEG', quality=self.jpeg_quality)
                        buffer.seek(0)
                        jpeg_b64 = base64.b64encode(buffer.read()).decode('utf-8')
                        # Replace with compressed JPEG
                        output['data'].pop('image/png')
                        output['data']['image/jpeg'] = jpeg_b64
                    except Exception as e:
                        print(f"Image processing failed: {e}")
        return cell, resources
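Before adding such a preprocessor it can be useful to see how much space the image outputs of a notebook actually take. The following sketch walks the same outputs/data structure on the raw notebook JSON using only the standard library; the function name and the sample notebook are made up for this example.

```python
import base64
import json


def image_output_sizes(notebook):
    """Yield (cell_index, mime_type, size_in_bytes) for every image output."""
    for index, cell in enumerate(notebook.get("cells", [])):
        for output in cell.get("outputs", []):
            for mime, payload in output.get("data", {}).items():
                if not mime.startswith("image/"):
                    continue
                # On disk, multi-line strings may be stored as lists of lines
                if isinstance(payload, list):
                    payload = "".join(payload)
                if mime == "image/svg+xml":
                    # SVG is stored as plain text, not as base64
                    size = len(payload.encode("utf-8"))
                else:
                    size = len(base64.b64decode(payload))
                yield index, mime, size


# Tiny in-memory stand-in for a notebook; a real one would be loaded with
# json.load(open("input.ipynb", encoding="utf-8"))
sample_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {
            "cell_type": "code",
            "outputs": [
                {
                    "output_type": "display_data",
                    "data": {
                        "image/png": base64.b64encode(b"x" * 300).decode("ascii"),
                        "text/plain": "<Figure>",
                    },
                }
            ],
        },
    ]
}
print(list(image_output_sizes(sample_notebook)))  # [(1, 'image/png', 300)]
```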
To invoke with both filters just pass both of them as preprocessors:
jupyter nbconvert \
input.ipynb \
--embed-images \
--to html \
--output output.html \
--HTMLExporter.preprocessors="['inline_markdown_images_preprocessor.InlineMarkdownImagesPreprocessor', 'compress_image_preprocessor.CompressOutputImagesPreprocessor']"
nbconvert uses the Jinja2 templating engine to render notebooks into different output formats like HTML, PDF, or Markdown. Templates allow you to fully control the look, structure, and behavior of the generated files.
Template files are typically located in directories like:
/usr/local/share/jupyter/nbconvert/templates/
Inside, you'll find one folder per template, such as lab, classic, or markdown. Each of these folders contains .j2 files; these are Jinja2 template files used to generate different parts of the final document. The Jinja2 syntax allows embedding Python-like expressions and logic into the HTML or other output formats.
Let's examine the structure of the classic HTML export template. This template combines configuration (conf.json), a Jinja2-based HTML scaffold (base.html.j2), and optional styles or resources (like CSS or JavaScript).
The file conf.json provides metadata about the template and can declare:
{
  "base_template": "base",
  "mimetypes": {
    "text/html": true
  },
  "preprocessors": {
    "100-pygments": {
      "type": "nbconvert.preprocessors.CSSHTMLHeaderPreprocessor",
      "enabled": true,
      "style": "default"
    }
  }
}
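To see what a template declares, such a file can be inspected with a few lines of Python. This is just a sketch with a made-up helper name; it embeds the JSON from above instead of reading the installed file, and it sorts the entries by key on the assumption that the numeric prefix (100-) is meant as an ordering hint.

```python
import json

CONF_JSON = """
{
  "base_template": "base",
  "mimetypes": {"text/html": true},
  "preprocessors": {
    "100-pygments": {
      "type": "nbconvert.preprocessors.CSSHTMLHeaderPreprocessor",
      "enabled": true,
      "style": "default"
    }
  }
}
"""


def declared_preprocessors(conf):
    """Return (key, class path, options) for each declared preprocessor, sorted by key."""
    result = []
    for key, entry in sorted(conf.get("preprocessors", {}).items()):
        # Everything except "type" is treated as an option of the preprocessor
        options = {name: value for name, value in entry.items() if name != "type"}
        result.append((key, entry["type"], options))
    return result


conf = json.loads(CONF_JSON)
print("inherits from:", conf["base_template"])
for key, class_path, options in declared_preprocessors(conf):
    print(key, class_path, options)
```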
base_template: declares inheritance from another template (in this case, the base template).
mimetypes: specifies the output MIME types supported.
preprocessors: includes built-in or custom Python-based preprocessors that alter notebook content before rendering, like the ones we have seen before. The CSSHTMLHeaderPreprocessor, for example, injects the CSS for Pygments syntax highlighting.
The base.html.j2 file is the heart of the HTML layout logic. It defines how each notebook cell type is rendered by overriding blocks such as codecell, markdowncell and various output types.
The top includes inheritance:
{%- extends 'display_priority.j2' -%}
{% from 'celltags.j2' import celltags %}
{% from 'cell_id_anchor.j2' import cell_id_anchor %}
Then blocks such as:
{% block codecell %}
<div {{ cell_id_anchor(cell) }} class="cell border-box-sizing code_cell rendered{{ celltags(cell) }}">
{{ super() }}
</div>
{%- endblock codecell %}
allow fine-grained control over how each kind of cell and output (code, markdown, raw, stderr, stream, SVG, PNG, etc.) is transformed into HTML. You can even modify prompts like In[1]: and Out[1]:, inject custom styles, or hide outputs dynamically.
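The way extends, block and super() interact can be tried in isolation with plain Jinja2, independent of nbconvert. The two templates in this sketch are made up and only mimic the structure of the block shown above.

```python
from jinja2 import DictLoader, Environment

templates = {
    # Stand-in for an inherited template that defines a block
    "base.j2": "{% block codecell %}<pre>{{ source }}</pre>{% endblock codecell %}",
    # Child template: wraps the inherited block content via super()
    "child.j2": (
        "{% extends 'base.j2' %}"
        '{% block codecell %}<div class="cell">{{ super() }}</div>{% endblock codecell %}'
        "<p>outside of any block</p>"
    ),
}

env = Environment(loader=DictLoader(templates))
rendered = env.get_template("child.j2").render(source="print(1)")
print(rendered)  # <div class="cell"><pre>print(1)</pre></div>
```

Note that the paragraph placed outside of any block in the child template does not show up in the output; a template that extends another one only contributes through the blocks it overrides.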
If present, a static/ folder can include style.css or other resources, which are copied or embedded into the output. The classic HTML template embeds its stylesheets into the output via <style> tags.
To create your own template, you can first simply create an appropriate directory structure:
mkdir -p mytemplate
cd mytemplate
Now we create a conf.json that inherits from classic:
{
  "base_template": "classic"
}
Now we create an index.html.j2. Jinja2 discards all markup that a template using extends places outside of a block, so the custom parts have to go inside an overridden block, here around the inherited content of body:
{% extends 'classic/index.html.j2' %}
{% block body %}
<!-- We will see our custom template here! -->
{{ super() }}
<!-- And we will also see this part of our custom template here! -->
{% endblock body %}
The document head, including the title and the inlined CSS, is inherited unchanged from the classic template.
We can simply apply this template while executing the HTML exporter by supplying it via the command line:
jupyter nbconvert \
input.ipynb \
--to html \
--template=mytemplate \
--embed-images
To generate Markdown posts for a Jekyll blog directly from notebooks, we can write a custom template that includes YAML front matter, proper formatting, and excludes unnecessary notebook metadata.
Again let’s first create a directory for our templates:
mkdir -p jekyll_md_template
In there we create a conf.json that inherits from markdown:
{
  "base_template": "markdown"
}
Now let's create our index.md.j2 template:
---
title: "{{ resources.metadata.title or 'Jupyter Notebook Post' }}"
date: {{ resources.metadata.date or 'Unknown publishing date' }}
layout: post
categories: [notebook, jupyter, nbconvert]
---
{% for cell in nb.cells %}
{% if cell.cell_type == 'markdown' %}
{{ cell.source }}
{% elif cell.cell_type == 'code' %}
```
{{ cell.source }}
```
{% for output in cell.outputs %}
{% if output.output_type == 'stream' %}
{{ output.text }}
{% elif output.output_type == 'execute_result' and 'text/plain' in output.data %}
{{ output.data['text/plain'] }}
{% endif %}
{% endfor %}
{% endif %}
{% endfor %}
This template loops through all notebook cells, adds Markdown as-is, and formats code and output using fenced code blocks. It also inserts a Jekyll-compatible YAML header at the top. As before we can simply apply our custom template:
jupyter nbconvert input.ipynb \
--to markdown \
--template=jekyll_md_template \
--output output.md
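The front matter can also be generated or checked outside of the template. The following sketch builds the same header from the raw notebook JSON; it assumes that the notebook carries title and date keys in its top-level metadata (these keys are hypothetical, notebook metadata is free-form), and the function name is made up.

```python
import json


def front_matter(notebook, default_title="Jupyter Notebook Post"):
    """Build a Jekyll YAML front matter block from notebook metadata."""
    meta = notebook.get("metadata", {})
    title = meta.get("title", default_title)
    # json.dumps produces a double-quoted, escaped string, which is also
    # a valid YAML double-quoted scalar
    lines = ["---", f"title: {json.dumps(title)}"]
    # Leave the date out entirely if the notebook does not provide one
    if "date" in meta:
        lines.append(f"date: {meta['date']}")
    lines += ["layout: post", "categories: [notebook, jupyter, nbconvert]", "---"]
    return "\n".join(lines) + "\n"


# Stand-in for json.load(open("input.ipynb", encoding="utf-8"))
sample_notebook = {"metadata": {"title": 'Plots and "quotes"', "date": "2025-05-04"}}
print(front_matter(sample_notebook))
```

In contrast to the template above, this sketch omits the date line when no date is known instead of writing a placeholder text; this is a design choice that avoids a value which cannot be read as a date.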
This mini guide outlines a practical and extensible workflow for converting Jupyter notebooks and Markdown files into efficient, portable documents. By leveraging pandoc for Markdown and nbconvert for Jupyter notebooks, one can easily create output suited for web publication, long-term archiving, or blogging. Beyond the basic conversion capabilities, I've shown how the true flexibility of nbconvert lies in its template system and support for custom preprocessors. These features allow for precise control over HTML and Markdown output, image handling, formatting, and content filtering. Whether you are preparing single-page HTML reports, Jekyll-compatible blog posts, or embedding lightweight exports into larger static sites, nbconvert provides a clean, programmable solution for document generation.
Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)
This webpage is also available via TOR at http://rh6v563nt2dnxd5h2vhhqkudmyvjaevgiv77c62xflas52d5omtkxuid.onion/