10 Aug 2025 - tsp
This guide explains (or rather serves as a note to myself) how to modify parameters for Ollama models - such as context length, temperature, and more - using a custom MODELFILE. Many models available via Ollama ship with relatively small context windows (often around 4k tokens) and fixed sampling parameters such as a temperature of 0.8. By creating your own model definition, you can override these defaults.
You can view the MODELFILE of any installed model with:
ollama show --modelfile <model-name>
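The output is itself a valid MODELFILE. For a typical model it looks roughly like the following (illustrative only - the actual blob path, template, and parameters differ per model):
# Modelfile generated by "ollama show" (illustrative sketch)
FROM /usr/share/ollama/.ollama/models/blobs/sha256-...
TEMPLATE "{{ .System }}{{ .Prompt }}"
PARAMETER stop "<|endoftext|>"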
A MODELFILE gives you control over various aspects of a model. You can set core parameters such as:
- num_ctx, which determines the context size in tokens
- temperature, which controls output randomness (with 0 producing deterministic responses and higher values making them more creative)
- seed, which fixes the random seed to produce reproducible output
You can also adjust sampler behavior through options like top_p, min_p, maximum prediction length (num_predict), and repetition penalty (repeat_penalty); a short example follows below. In addition, you can define prompts and adapters by specifying:
- TEMPLATE, which should match the training format
- SYSTEM, a prompt to shape the model's behavior
- ADAPTER, to attach LoRA adapters that expand or change the model's capabilities
For full parameter details, see Ollama's MODELFILE documentation.
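For instance, a minimal MODELFILE sketch that pins the sampler could look like this (the base model and all values here are illustrative assumptions, not recommendations):
# Illustrative sampler configuration - adjust values to taste
FROM llama3.1:8b
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER min_p 0.05
PARAMETER num_predict 512
PARAMETER repeat_penalty 1.1
PARAMETER seed 42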
Suppose you want to boost creativity and allow a much longer context on qwen2.5-coder:32b:
FROM qwen2.5-coder:32b
PARAMETER num_ctx 128000
PARAMETER temperature 1.2
Save this as MY_MODELFILE and run:
ollama create qwen2.5-coder:32b_128k --file MY_MODELFILE
This creates a new model variant locally without duplicating the base weights.
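The new variant behaves like any other installed model, so you can start it directly:
ollama run qwen2.5-coder:32b_128k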
You can also set a system prompt and add a LoRA adapter:
FROM llama3.1:70b
PARAMETER num_ctx 32768
SYSTEM "You are a concise assistant specializing in data analysis."
ADAPTER /path/to/finance-lora
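As with the first example, save this as a MODELFILE and register it under a new name with ollama create (the variant name below is just an example):
ollama create llama3.1:70b-analyst --file MY_MODELFILE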
Keep in mind:
- Increasing num_ctx massively increases VRAM/RAM usage and also increases inference time.
- When overriding TEMPLATE, ensure it matches the model's original formatting to avoid degraded performance. Usually you want to use the same template that has been used during training.
By customizing a MODELFILE, you can fine-tune how Ollama models behave for your specific tasks, balancing performance, creativity, and context size to your needs.