<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://www.tspi.at/atom.xml" rel="self" type="application/atom+xml" /><link href="https://www.tspi.at/" rel="alternate" type="text/html" /><updated>2026-04-12T19:18:22+02:00</updated><id>https://www.tspi.at/atom.xml</id><title type="html">tspi.at</title><entry><title type="html">Prediction Error vs Measurement Error in Model Fitting</title><link href="https://www.tspi.at/2026/04/11/fitandpredictionerror.html" rel="alternate" type="text/html" title="Prediction Error vs Measurement Error in Model Fitting" /><published>2026-04-11T00:00:00+02:00</published><updated>2026-04-11T14:33:38+02:00</updated><id>https://www.tspi.at/2026/04/11/fitandpredictionerror</id><content type="html" xml:base="https://www.tspi.at/2026/04/11/fitandpredictionerror.html"><![CDATA[<p>When fitting a model function to experimental data, one is often confronted with a subtle but important conceptual issue: the uncertainty of the fitted model is frequently much smaller than the apparent measurement uncertainty of the individual data points. At first glance, this may appear contradictory. How can a model, fitted to noisy data, exhibit smaller uncertainty than the data itself?</p>

<p>This apparent paradox often leads to misinterpretation. Observers may assume that the narrow confidence bands of the fitted model represent the measurement uncertainty, and consequently judge the data against these bands, leading to incorrect conclusions about data quality or model validity.</p>

<p>This article clarifies the distinction between:</p>

<ul>
  <li><a href="#measurement-error">Measurement error</a></li>
  <li><a href="#model-fit-uncertainty">Model (fit) uncertainty</a></li>
  <li><a href="#residuals-and-data-driven-variance">Residuals and data-driven variance (Measurement error)</a></li>
  <li><a href="#prediction-error">Prediction error</a></li>
</ul>

<p>We will demonstrate how these quantities arise via a <a href="#a-practical-example">simulated measurement</a>, how they should be interpreted, and how they can be computed in practice.</p>

<p>In the end we will provide a <a href="#conclusion-and-the-common-interpretation-pitfall">short summary and conclusion</a>.</p>

<h2 id="measurement-error-vs-model-uncertainty">Measurement Error vs Model Uncertainty</h2>

<h3 id="measurement-error">Measurement Error</h3>

<p>Measurement error describes the uncertainty associated with each observed data point. Formally, we write:</p>

[
\begin{aligned}
y_i &= f(x_i, \theta) + \epsilon_i
\end{aligned}
]

<p>Here</p>

<ul>
  <li>$y_i$ is the measured value</li>
  <li>$f(x_i, \theta)$ is the underlying model at the position $x_i$ and for the parameter set $\theta$</li>
  <li>$\epsilon_i$ is the random error term with a variance $\sigma_y^2$</li>
</ul>

<p>The error $\epsilon_i$ is typically determined by the measurement process itself (measurement noise, environmental fluctuations, discretization, etc.). This corresponds to the standard deviation of the measurement process. $y_i$ is typically the mean obtained from repeated measurements.</p>
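<p>To make this decomposition concrete, the noise model can be simulated in a few lines. A minimal sketch; the linear model and all numbers here are arbitrary illustrations, not taken from the example later in this article:</p>

```python
import numpy as np

rng = np.random.default_rng(42)

def f(x, a, b):
    # Hypothetical model function f(x; theta) with theta = (a, b)
    return a * x + b

x = np.linspace(0.0, 10.0, 50)
sigma_y = 0.5                           # standard deviation of the noise term
y = f(x, 2.0, 1.0) + rng.normal(0.0, sigma_y, size=x.size)

# The empirical scatter around the true model recovers sigma_y
print(np.std(y - f(x, 2.0, 1.0)))
```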

<h3 id="model-fit-uncertainty">Model (Fit) Uncertainty</h3>

<p>When fitting a model $f(x, \theta)$ to data, the parameters $\theta$ are estimated from <em>all</em> observations. The uncertainty of these parameters is given by the covariance matrix:</p>

[
C := \mathrm{Cov}(\theta)
]

<p>The matrix encodes how precisely the parameters are determined by the fitting / regression procedure. The uncertainty of the <em>model prediction</em> at a given point $x$ is obtained by propagating the covariance:</p>

[
\begin{aligned}
\sigma_f^2(x) &= \left(\nabla_\theta f(x, \theta)\right)^T C \nabla_\theta f(x, \theta)
\end{aligned}
]

<p>The quantity $\sigma_f$ represents the <strong>confidence band</strong> of the fitted model. The width of this band decreases with the number of data points (similar to the <a href="/2025/07/18/sesdstable.html">standard error of a measurement</a>). For well-conditioned problems with independent observations, the scaling can be <em>estimated</em> as:</p>

[
\sigma_f \sim \frac{\sigma_y}{\sqrt{N}}
]

<p>Even if individual measurements are noisy, the estimated parameters of the assumed model can be determined very precisely.</p>
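<p>The propagation formula above can be evaluated numerically for any model by approximating the parameter gradient with finite differences. A sketch, with a made-up linear model and covariance matrix for illustration:</p>

```python
import numpy as np

def f(x, theta):
    # Hypothetical model f(x; theta) = theta[0] * x + theta[1]
    return theta[0] * x + theta[1]

def sigma_f(x, theta, C, h=1e-6):
    """Propagate the parameter covariance C to the model prediction at x
    using a finite-difference approximation of the parameter gradient."""
    grad = np.array([
        (f(x, theta + h * np.eye(len(theta))[k]) - f(x, theta)) / h
        for k in range(len(theta))
    ])
    return float(np.sqrt(grad @ C @ grad))

theta = np.array([2.0, 1.0])
C = np.array([[0.01, 0.0],
              [0.0, 0.04]])             # assumed covariance from a fit

# For this linear model the band at x = 5 is sqrt(25 * 0.01 + 0.04)
print(sigma_f(5.0, theta, C))
```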

<blockquote>
  <p><strong>Fit uncertainty</strong> $\sigma_f$: How confident can we be about the fitted model.</p>
</blockquote>

<p>Note that a small $\sigma_f$ <strong>does not imply that the model is correct</strong>. You need to apply proper statistical tests on your hypothesis.</p>

<h2 id="residuals-and-data-driven-variance">Residuals and Data-Driven Variance</h2>

<p>The residuals quantify how well the model describes the observed data:</p>

[
r_i = y_i - \hat{y_i}
]

<p>Here $\hat{y_i} = f(x_i, \hat{\theta})$ is the prediction of the data value by the fitted model. From these residuals one can estimate the variance of the data around the model:</p>

[
\sigma_r^2 = \frac{1}{N-p} \sum_{i=1}^{N} r_i^2
]

<p>Here:</p>

<ul>
  <li>$N$ is the number of observations (measurement points)</li>
  <li>$p$ is the number of fitted parameters</li>
  <li>$N-p$ thus is the degrees of freedom</li>
</ul>

<p>The quantity $\sigma_r$ represents the <strong>intrinsic scatter</strong> of the data and is typically <strong>comparable to the measurement noise</strong> (but they are not equal). In the case of <em>correlated noise</em>, $\sigma_r$ <strong>underestimates</strong> the true uncertainty.</p>
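<p>In code this estimator is a one-liner. A toy example with made-up numbers, chosen only to make the arithmetic visible:</p>

```python
import numpy as np

def residual_sigma(y, y_hat, p):
    """Estimate the scatter of the data around the fitted model using
    N - p degrees of freedom (p = number of fitted parameters)."""
    r = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.sum(r**2) / (len(r) - p)))

# Four points, two fitted parameters, a single residual of 1:
# sigma_r = sqrt(1 / (4 - 2)) = sqrt(0.5)
print(residual_sigma([1, 2, 3, 5], [1, 2, 3, 4], p=2))
```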

<blockquote>
  <p><strong>Residual error / intrinsic scatter</strong> $\sigma_r$: How much does the measurement process scatter (typically comparable to the measurement noise, though $\sigma_r$ includes also the model mismatch, unmodeled systematics, etc.)</p>
</blockquote>

<h2 id="prediction-error">Prediction Error</h2>

<p>The prediction error describes where, at a given position $x$, you would expect the next measurement to reside with a given certainty. This must account for two contributions that are typically <em>treated as independent</em>:</p>

<ul>
  <li>Uncertainty of the model parameters (given by $\sigma_f$)</li>
  <li>Scatter of the data around the model (given by $\sigma_r$)</li>
</ul>

<p>This corresponds to the classical distinction between <strong>confidence intervals</strong>, the uncertainty of the fitted mean model, and <strong>prediction intervals</strong>, the uncertainty of individual observations.</p>

<p>Under the <strong>assumption of independence</strong> this yields a total error $\sigma$:</p>

[
\begin{aligned}
\sigma^2(x) &= \sigma_f^2(x) + \sigma_r^2 \\
\sigma(x) &= \sqrt{\sigma_f^2(x) + \sigma_r^2}
\end{aligned}
]

<blockquote>
  <p><strong>Prediction error</strong> $\sigma$: How well can the model predict a new measurement at position $x$</p>
</blockquote>

<p>Keep in mind that the assumption of independence breaks in case of heteroscedastic errors or correlated noise!</p>
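<p>Combining both terms is a one-liner. The illustrative numbers below show the typical situation where the intrinsic scatter dominates a narrow confidence band:</p>

```python
import math

def prediction_sigma(sigma_f_x: float, sigma_r: float) -> float:
    """Total prediction uncertainty under the independence assumption."""
    return math.sqrt(sigma_f_x**2 + sigma_r**2)

# With a narrow confidence band, sigma is almost entirely sigma_r
print(prediction_sigma(0.05, 0.70))
```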

<h2 id="a-practical-example">A Practical Example</h2>

<p>To illustrate the concepts, we simulate a derivative Lorentzian (Cauchy) shaped signal, add noise in both axes, perform a fit and then compute the parameter uncertainties, the model confidence band, the residual variance and the full prediction uncertainty.</p>

<p><img src="/assets/images/png/fiterrors001.png" alt="" /></p>

<p>In this plot one can see:</p>

<p>In this plot one can see, first, the <strong>blue (simulated) datapoints</strong>. The simulation assumes an amplitude of $A=120$, $x_0=400$, $\mathrm{FWHM}=3.0$ (i.e. $\gamma=1.5$), $\sigma_x = 0.3$ and $\sigma_y = 0.7 \cdot \mathrm{max}(y_i)$. On top of these we performed the <strong>orange fit</strong>, a least squares fit using the Levenberg-Marquardt algorithm against the same model function that was used to synthesize the data. This yields $\hat{x_0} = 399.892560 \pm 0.097864\,\mathrm{MHz}$ and $\mathrm{FWHM} = 2.369803 \pm 0.385163\,\mathrm{MHz}$. The narrow <strong>blue region</strong> around the orange fit function is the fit uncertainty $\sigma_f$. As one can see, this band is extremely narrow and does not reflect the scatter of the individual datapoints. If one compared new datapoints against this region alone, one would wrongly conclude that the measurements do not support the model hypothesis. Adding the residual measurement error $\sigma_r$ yields the total error $\sigma = \sqrt{\sigma_f(x)^2 + \sigma_r^2}$, shown as the <strong>orange band</strong>. This band is much wider and contains roughly 68 percent of all datapoints. The <strong>blue errorbar-like line</strong> on top of the points is again $\sigma_r$, the expected scatter of individual measurements.</p>

<p>As one can see, the <strong>confidence band of the model (blue region)</strong> is much narrower than the <strong>prediction region (orange band)</strong> for individual measurements.</p>
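<p>The structure of these bands can be reproduced with a numpy-only sketch. To stay self-contained it uses a quadratic model via <code>np.polyfit</code> as a stand-in for the derivative-Lorentzian fit; all numbers are illustrative:</p>

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthesize noisy data from a known quadratic (stand-in model)
x = np.linspace(-5, 5, 200)
theta_true = (0.5, -1.0, 2.0)
sigma_y = 2.0
y = np.polyval(theta_true, x) + rng.normal(0.0, sigma_y, x.size)

# Least-squares fit returning the parameter covariance matrix C
theta_hat, C = np.polyfit(x, y, 2, cov=True)

# Confidence band: propagate C through the gradient (x^2, x, 1)
G = np.vander(x, 3)                  # rows are gradients d f / d theta
sigma_f = np.sqrt(np.einsum("ij,jk,ik->i", G, C, G))

# Residual scatter with N - p degrees of freedom
r = y - np.polyval(theta_hat, x)
sigma_r = np.sqrt(np.sum(r**2) / (x.size - 3))

# Prediction band: combination of both contributions
sigma_pred = np.sqrt(sigma_f**2 + sigma_r**2)

# The confidence band is far narrower than the data scatter
print(sigma_f.max(), sigma_r)
```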

<h2 id="conclusion-and-the-common-interpretation-pitfall">Conclusion and the Common Interpretation Pitfall</h2>

<p>Comparing measurement data directly to the confidence band $\sigma_f(x)$ instead of the prediction interval $\sigma(x)$ is a common mistake and leads to systematic overestimation of discrepancies between model and data.</p>

<p>The correct interpretation is:</p>

<ul>
  <li>$\sigma_f(x)$: Confidence in the mean fitted model</li>
  <li>$\sigma_r$: Scatter of individual measurements</li>
  <li>$\sigma(x)$: Uncertainty of predicted future observations</li>
</ul>

<p>This implies that for sufficiently large datasets:</p>

[
\sigma_f(x) \ll \sigma_r
]

<p>This leads to the conclusions:</p>

<ul>
  <li>A fitted model can be known much more precisely than individual measurements</li>
  <li>The covariance matrix $\mathrm{Cov}(\theta)$ describes the <em>parameter uncertainty</em>, not measurement noise</li>
  <li>The model (fit) uncertainty $\sigma_f$ describes the confidence of the model. This corresponds to the <em>posterior uncertainty of the model</em>.</li>
  <li>Residuals $\sigma_r$ capture the <em>intrinsic data scatter</em> or the <em>measurement noise</em></li>
  <li>The correct uncertainty for <em>future predictions</em> is the combination of both effects $\sigma(x) = \sqrt{\sigma_f(x)^2 + \sigma_r^2}$. This corresponds to the <em>posterior predictive distribution</em>.</li>
</ul>]]></content><author><name>tsp</name></author><category term="Physics" /><category term="School math" /><category term="Math" /><category term="Basics" /><category term="Tutorial" /><category term="Statistics" /><category term="Measurements" /><summary type="html"><![CDATA[When fitting models to experimental data, a subtle but critical misunderstanding often arises: the uncertainty of a fitted model can be significantly smaller than the apparent measurement error of the data it is based on. This frequently leads to confusion, especially when narrow confidence bands are misinterpreted as representing the scatter of the measurements themselves. In practice, this can result in incorrect judgments about data quality or even the validity of the model. This article clarifies the distinction between measurement error, model uncertainty, residual variance, and prediction error, and explains how these quantities are related but fundamentally different. Using a practical simulated example, it demonstrates why fitted models can be precise and how to correctly interpret uncertainty as well as confidence when comparing models to experimental data or when predicting future observations.]]></summary></entry><entry><title type="html">ModBus in Practice: From RS485 Buses to Secure, Scalable Automation</title><link href="https://www.tspi.at/2026/04/07/modbus.html" rel="alternate" type="text/html" title="ModBus in Practice: From RS485 Buses to Secure, Scalable Automation" /><published>2026-04-07T00:00:00+02:00</published><updated>2026-04-07T00:00:08+02:00</updated><id>https://www.tspi.at/2026/04/07/modbus</id><content type="html" xml:base="https://www.tspi.at/2026/04/07/modbus.html"><![CDATA[<p>Modern automation systems often appear deceptively complex. Fieldbuses, industrial protocols, cloud integrations, and proprietary stacks suggest a level of complexity that is often unnecessary for many real-world applications. 
At the core of many reliable automation systems, however, lies a much simpler idea: a shared communication medium with deterministic request–response semantics.</p>

<p>One of the most enduring implementations of this idea is <a href="https://www.modbus.org/modbus-specifications">ModBus</a>, particularly in its RS485-based RTU variant. Despite its age, ModBus remains widely used in industrial control, laboratory environments, energy systems, and increasingly in small-scale automation such as homes, gardens, and greenhouses.</p>

<p>This article explores ModBus from a practical systems perspective. It focuses on my own RS485-based deployments and the challenges that arise when integrating such systems into modern software environments, and presents a set of tools I designed to bridge the gap between legacy fieldbus systems and contemporary infrastructure: a gateway service, an implementation for Atmel AVR microcontrollers and a software client for Python applications and scripts.</p>

<ul>
  <li><a href="#why-modbus-still-matters">Why ModBus Still Matters</a></li>
  <li><a href="#physical-layer-rs485-in-practice">Physical Layer: RS485 in Practice</a></li>
  <li><a href="#protocol-layer-modbus-rtu">Protocol Layer: ModBus RTU</a></li>
  <li><a href="#the-real-problem-multi-master-access">The Real Problem: Multi-Master Access</a></li>
  <li><a href="#architecture-a-central-modbus-gateway">Architecture: A Central ModBus Gateway</a>
    <ul>
      <li><a href="#my-implementation-modbusgw">My Implementation: modbusgw</a>
        <ul>
          <li><a href="#installation">Installation</a></li>
          <li><a href="#configuration">Configuration</a></li>
          <li><a href="#frontend-configurations">Frontend Configurations</a>
            <ul>
              <li><a href="#virtual-serial-ports-pty">Virtual Serial Ports (pty)</a></li>
              <li><a href="#modbus-ip-socket">ModBus IP TCP Socket</a></li>
            </ul>
          </li>
          <li><a href="#backend-configurations">Backend Configurations</a>
            <ul>
              <li><a href="#hardware-serial-ports">Hardware Serial Ports</a></li>
              <li><a href="#modbus-ip-via-tcp">ModBus IP via TCP</a></li>
            </ul>
          </li>
          <li><a href="#routing-configuration">Routing Configuration</a></li>
          <li><a href="#example-configuration-file">Example configuration file</a></li>
          <li><a href="#launching-and-controlling-the-gateway">Launching and Controlling the Gateway</a></li>
        </ul>
      </li>
    </ul>
  </li>
  <li><a href="#client-library">Client Library</a>
    <ul>
      <li><a href="#a-simple-example">A Simple Example</a></li>
    </ul>
  </li>
  <li><a href="#embedded-side-avr-framework">Embedded Side: AVR Framework</a></li>
  <li><a href="#practical-applications">Practical Applications</a>
    <ul>
      <li><a href="#my-favorite-hardware-devices">My Favorite Hardware Devices</a></li>
    </ul>
  </li>
  <li><a href="#security-considerations">Security Considerations</a></li>
  <li><a href="#design-philosophy">Design Philosophy</a></li>
  <li><a href="#conclusion">Conclusion</a></li>
  <li><a href="#references">References</a></li>
</ul>

<p><img src="/assets/images/png/modbus001.png" alt="" /></p>

<h2 id="why-modbus-still-matters">Why ModBus Still Matters</h2>

<p>What makes ModBus particularly interesting is not its feature set - but the absence of it. Its simplicity leads to <strong>robustness, debuggability and long-term stability</strong>. There are no hidden layers, no opaque negotiation steps, and no dynamic topology discovery. Everything is explicit and implementable by any manufacturer or hobbyist.</p>

<p>This simplicity extends all the way down to the physical and protocol layers, making ModBus exceptionally easy to implement even on very constrained hardware. Unlike <a href="https://de.wikipedia.org/wiki/Controller_Area_Network">CAN</a> or <a href="https://de.wikipedia.org/wiki/Ethernet">Ethernet</a>-based systems - which require more complex controllers, protocol stacks and often significantly more expensive interface hardware - ModBus RTU over RS485 can be realized on sub-Euro microcontrollers with minimal resources and very inexpensive transceiver ICs. This makes it particularly attractive for distributed sensing and control applications where cost, simplicity and reliability are more important than raw throughput.</p>

<p>Compared to more modern systems such as MQTT-based automation or high-performance fieldbuses like <a href="https://de.wikipedia.org/wiki/EtherCAT">EtherCAT</a>, ModBus trades flexibility and throughput for predictability, reliability and ease of implementation. In many environments - especially where timing constraints are moderate and reliability is critical - this tradeoff is highly desirable.</p>

<h3 id="limitations">Limitations</h3>

<p>While ModBus excels through its simplicity, this minimalism also imposes a number of practical limitations that must be considered when designing real-world systems.</p>

<p>One of the most prominent constraints is <em>throughput</em>. Especially in RS485-based RTU deployments, the achievable data
rate is relatively low. At 9600 baud, effective payload throughput stays well below one kilobyte
per second, and even at higher baud rates the request–response nature of the protocol introduces unavoidable
overhead. As the number of devices grows, polling cycles become longer, increasing latency for both control
and monitoring tasks. For most simple monitoring and control tasks that operate on timescales of
seconds to minutes, however, this does not matter.</p>

<p>Closely related to this is the strict master–slave model, which prevents concurrent access to the bus. All
communication must be serialized and initiated by the master, and even read-only operations cannot be
performed in parallel. This becomes increasingly problematic in modern systems where multiple independent
services require access to the same data. Without an additional coordination layer, such as the gateway
architecture presented later in this article, this leads to contention and non-deterministic behavior. In 
addition the requirement of the master initiating the communication prevents fast event notification from
sensors, the time constraint is defined by the polling interval by the master.</p>

<p>Another limitation lies in the lack of higher-level protocol features. ModBus provides no built-in mechanisms
for device discovery, configuration, or semantic description of data. Registers are purely numerical and their
meaning is defined externally, often in device-specific documentation. This makes integration straightforward
for simple systems, but increasingly complex as systems grow and heterogeneous devices are introduced.</p>

<p>From a security perspective, ModBus in its original form offers no authentication, no encryption, and no
integrity protection beyond basic checksums. While this is acceptable in isolated industrial networks,
it becomes a critical issue when systems are connected to larger infrastructures or exposed to untrusted
environments.</p>

<p>Finally, although often described as deterministic, real-world ModBus systems can exhibit variable latency
due to device response times, retries, and bus contention. Determinism exists primarily at the protocol
level, but system-level timing guarantees depend heavily on implementation details and network design.</p>

<p>These limitations do not diminish the value of ModBus - in many cases, they are the direct consequence
of its simplicity. However, they highlight the need for carefully designed system architectures when
integrating ModBus into modern, distributed environments.</p>

<h2 id="physical-layer-rs485-in-practice">Physical Layer: RS485 in Practice</h2>

<p>RS485 provides a differential signaling scheme that allows reliable communication over long distances and in electrically noisy environments. Unlike single-ended signaling, RS485 transmits the difference between two lines, making it highly resilient against common-mode noise. The following image shows a capture of the A line (yellow), the B line (turquoise) and the calculated difference (violet) of an RS485 transmission on a <a href="https://amzn.to/48tOLsA">cheap USB oscilloscope</a>:</p>

<p><img src="/assets/images/jpg/rs485_01.jpg" alt="Example oscilloscope trace showing an RS485 transmission" /></p>

<p>Another important characteristic is its physical reach. At relatively low baud rates such as 9600, cable lengths of up to roughly 1400 meters are achievable on standard twisted-pair copper cabling without requiring fiber optics (note that you are usually using $0.75 \mathrm{mm}^2$ cabling with 4 poles for A, B, ground and DC supply voltage between 5 and 36V). Even at higher data rates like 115200 baud, distances on the order of 400 meters are still realistic within a single segment. This makes RS485 particularly attractive for distributed installations such as gardens, greenhouses, industrial halls, or laboratory environments where devices are spread across medium scale areas.</p>

<p>Typical deployments use a linear bus topology with termination resistors at both ends. Correct termination ($120 \Omega$ resistors) and biasing are essential to avoid reflections and undefined bus states. In practice, many issues attributed to <em>protocol problems</em> are in fact caused by improper physical layer implementation.</p>

<p>The bus is usually operated in <strong>half-duplex mode</strong>, meaning that only one device can transmit at a time. This leads directly to one of the central architectural constraints of ModBus RTU systems: arbitration.</p>

<p>There are, however, practical limits. While the protocol allows addressing up to 255 devices, real-world deployments are usually constrained by electrical loading of the bus. In many cases, a single RS485 segment supports on the order of 32-128 devices, depending on transceiver characteristics, bus loading, termination quality, and topology. Careful design - such as using repeaters or segmenting the bus - may be required for larger installations.</p>

<h2 id="protocol-layer-modbus-rtu">Protocol Layer: ModBus RTU</h2>

<p><a href="https://www.modbus.org/modbus-specifications">ModBus RTU</a> operates on a strict master–slave model. A single master initiates all communication, while slaves only respond to requests addressed to them.</p>

<p>Frames are transmitted within strict timing constraints: mandatory silent intervals (typically 3.5 character times) delimit frames and are used to detect message boundaries. Each frame contains:</p>

<ul>
  <li>Device address</li>
  <li>Function code</li>
  <li>Payload</li>
  <li>CRC checksum</li>
</ul>

<p>Even though timing is part of the protocol and slave implementations must interpret silence periods carefully, the master side is more forgiving: typical implementations using hardware UARTs (for example in USB-to-RS485 adapters) do not impose strict timing constraints and are largely insensitive to operating system scheduling or buffering, which makes them easy to implement in software. Timing requirements are primarily relevant on the slave side, where frame detection depends on correct interpretation of inter-frame gaps.</p>
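<p>The CRC appended to every frame is the standard CRC-16/MODBUS (reflected polynomial 0xA001), simple enough to sketch directly. The example builds a read-holding-registers request; the CRC is transmitted low byte first:</p>

```python
def modbus_crc16(frame: bytes) -> int:
    """CRC-16/MODBUS: init 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in frame:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc

# Read-holding-registers request: slave 1, function 3, start 0, count 2
pdu = bytes([0x01, 0x03, 0x00, 0x00, 0x00, 0x02])
frame = pdu + modbus_crc16(pdu).to_bytes(2, "little")
print(frame.hex())
```

A useful property of this CRC: running it over a complete frame including its own CRC yields zero, which is how receivers verify integrity.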

<h2 id="the-real-problem-multi-master-access">The Real Problem: Multi-Master Access</h2>

<p>ModBus RTU assumes a single master. In practice, modern systems often require multiple independent software components to access the same bus. These software architectures are typically composed of multiple loosely coupled services - as of today often following microservice principles - to improve modularity, scalability and maintainability through separation of concerns. In such environments, it is common that different services require access to the same physical devices for control, monitoring or logging purposes. Techniques such as Command Query Responsibility Segregation (CQRS) further emphasize this separation by distinguishing between control paths (commands that modify system state) and read paths (queries used for monitoring and reporting), which may also operate under different security constraints.</p>

<p>Without coordination, this leads to collisions, corrupted frames and undefined system behavior. Even if collisions are avoided, interleaving requests from multiple sources can break assumptions about timing and state. This mismatch between the original design and modern usage patterns is one of the key challenges when integrating ModBus into contemporary systems.</p>

<p>I resolved this problem for my own deployments by developing <code class="language-plaintext highlighter-rouge">modbusgw</code>, presented in the next section.</p>

<h2 id="architecture-a-central-modbus-gateway">Architecture: A Central ModBus Gateway</h2>

<p>To resolve the multi-master problem, a gateway can be introduced. This gateway acts as the <strong>only physical master</strong> on the given RS485 bus, while exposing multiple logical interfaces to clients. Conceptually, the gateway acts as a serialization layer for bus access while exposing a parallel interface to clients.</p>

<p>The gateway performs:</p>

<ul>
  <li>Arbitration between competing requests</li>
  <li>Scheduling of bus access</li>
  <li>Mapping between different transport layers</li>
  <li>Remapping device IDs between different virtual bus representations and the real physical backends</li>
</ul>

<p>It allows multiple applications to interact with the same physical buses safely and deterministically.</p>

<h3 id="my-implementation-modbusgw">My Implementation: modbusgw</h3>

<p>The presented solution, <code class="language-plaintext highlighter-rouge">modbusgw</code>, implements this gateway architecture in a modular fashion.</p>

<p>Frontends allow clients to access the service:</p>

<ul>
  <li>Virtual serial ports (PTY) provide an interface that looks like a real physical UART-based device</li>
  <li>Unix domain sockets (UDS) for ModBus/TCP clients on the local machine, without exposing the service to the network.</li>
  <li>TCP sockets for remote access via control networks.
    <ul>
      <li>Optional TLS and mutual TLS for secure communication and client authentication for systems exposed to non-isolated networks.</li>
    </ul>
  </li>
</ul>

<p>Backends connect to actual devices:</p>

<ul>
  <li>RS485 buses via USB adapters</li>
  <li>Remote ModBus/TCP devices</li>
</ul>

<p>A routing layer in between allows mapping of device IDs and registers, as well as filtering requests. This enables the creation of <strong>security boundaries</strong> within the system and allows selective exposure of functionality to different sets of clients.</p>

<h4 id="installation">Installation</h4>

<p>The gateway has been implemented in Python and is <a href="https://github.com/tspspi/modbusgw">available on GitHub</a>. It can be installed via its <a href="https://pypi.org/project/modbus-gateway">PyPI package</a>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip install modbus-gateway
</code></pre></div></div>

<h4 id="configuration">Configuration</h4>

<p>The application is configured from a single JSON configuration file. When using the FreeBSD <code class="language-plaintext highlighter-rouge">rc.init</code>
script it resides by default at <code class="language-plaintext highlighter-rouge">/usr/local/etc/modbusgateway.cfg</code>; when executing the program from the command line, the default location
is <code class="language-plaintext highlighter-rouge">~/.config/modbusgateway.cfg</code>. The configuration file location can be overridden via the <code class="language-plaintext highlighter-rouge">--config</code> flag or
the <code class="language-plaintext highlighter-rouge">modbusgw_config</code> option in <code class="language-plaintext highlighter-rouge">/etc/rc.conf</code>.</p>

<p>The configuration is split into different sections:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">service</code> provides configuration of the main daemon</li>
  <li><code class="language-plaintext highlighter-rouge">bus</code> configures the internal message bus</li>
  <li><code class="language-plaintext highlighter-rouge">frontends</code> contains a list of frontend configurations over which clients
are capable of accessing the daemon</li>
  <li><code class="language-plaintext highlighter-rouge">backends</code> is the counterpart and defines the interfaces that are accessed
on behalf of the clients via the gateway.</li>
  <li><code class="language-plaintext highlighter-rouge">routes</code> provides a match-list based configuration on how to route messages 
between frontends and backends.</li>
</ul>

<p>The <code class="language-plaintext highlighter-rouge">service</code> section configures the PID file (used to prevent multiple running
instances), the state directory used for log- and tracefiles,
as well as the log level:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"service" : {
   "log_level" : "INFO",
   "pid_file" : "/var/run/modbusgw.pid",
   "state_dir" : "/var/modbusgw/",
   "reload_grace_seconds" : 5
}
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">bus</code> configuration configures the internal buffer for incoming requests
that are routed to various backends:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"bus" : {
   "request_queue_size" : 64,
   "response_timeout_ms" : 1500
}
</code></pre></div></div>

<p>Note that this timeout should be shorter than the application and frontend
timeouts.</p>

<h4 id="frontend-configurations">Frontend Configurations</h4>

<h5 id="virtual-serial-ports-pty">Virtual Serial Ports (pty)</h5>

<p>Virtual serial ports are directly accessible via <code class="language-plaintext highlighter-rouge">pyserial</code> and similar interfaces.
This allows existing legacy software to access the gateway with unmodified code, by simply
pointing it at the virtual serial port device path:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
   "id" : "virtual_serial_rtu",
   "type" : "serial_rtu_socket",
   "socket_path" : "/var/modbusgw/ttyBus0",
   "pty_mode" : "rw",
   "idle_close_seconds" : 600,
   "frame_timeout_ms" : 5.0
}
</code></pre></div></div>

<p>The shown configuration instantiates a virtual serial port at the specified <code class="language-plaintext highlighter-rouge">socket_path</code>,
allowing read-write transactions. The frame timeout handles incomplete messages on the
application side. The name <code class="language-plaintext highlighter-rouge">virtual_serial_rtu</code> is an arbitrarily chosen identifier that is
referenced in the routing configuration.</p>

<h5 id="modbus-ip-tcp-socket">ModBus IP TCP Socket</h5>

<p>A ModBus IP frontend speaks the ModBus/TCP protocol over a TCP socket (optionally
supporting TLS or mTLS for authenticated sessions). The following configuration exposes
unencrypted ModBus/TCP, applying only IP-subnet-based filters:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
   "id" : "frontend_tcp",
   "type" : "tcp_modbus_tcp",
   "host" : "192.0.2.1",
   "port" : 1234,
   "cidr_allow" : [
      "127.0.0.0/8",
      "192.0.2.0/24"
   ]
}
</code></pre></div></div>

<p>If TLS is desired the following configuration can be added to the frontend configuration
object:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>   "tls" : {
      "cert_file" : "/path/to/server.crt",
      "key_file" : "/path/to/server.key",
      "ca_file" : "/path/to/rootca.crt",
      "require_client_cert" : true,
      "client_dn_allow" : [
         "CN=ModbusGW Test Client"
      ]
   }
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">cert_file</code> and <code class="language-plaintext highlighter-rouge">key_file</code> establish the server identity. The <code class="language-plaintext highlighter-rouge">ca_file</code> is only
used when <code class="language-plaintext highlighter-rouge">require_client_cert</code> is set to <code class="language-plaintext highlighter-rouge">true</code> to enable client authentication. The
additional (optional) <code class="language-plaintext highlighter-rouge">client_dn_allow</code> list restricts which DNs from
valid certificates (after certificate validation) are allowed to access the frontend.</p>
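<p>The shape of such a DN filter can be pictured against the subject structure that Python’s <code class="language-plaintext highlighter-rouge">ssl</code> module reports via <code class="language-plaintext highlighter-rouge">getpeercert()</code>. This is an illustrative sketch of the check, not the gateway’s code:</p>

```python
def subject_cn(peercert):
    """Extract the commonName from an ssl.getpeercert()-style dict."""
    for rdn in peercert.get("subject", ()):
        for name, value in rdn:
            if name == "commonName":
                return value
    return None

def dn_allowed(peercert, client_dn_allow):
    """Check the certificate CN against the configured allow list."""
    cn = subject_cn(peercert)
    return cn is not None and ("CN=" + cn) in client_dn_allow

# Subject tuple shaped like ssl.SSLSocket.getpeercert() returns it
cert = {"subject": ((("commonName", "ModbusGW Test Client"),),)}
print(dn_allowed(cert, ["CN=ModbusGW Test Client"]))  # True
```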

<h4 id="backend-configurations">Backend Configurations</h4>

<h5 id="hardware-serial-ports">Hardware Serial Ports</h5>

<p>The <code class="language-plaintext highlighter-rouge">pyserial</code> backend uses the <a href="https://pypi.org/project/pyserial/">pyserial</a> library
to access a USB-to-RS485 interface. This is the simplest hardware interface
for DIY setups. The specified serial configuration is applied when accessing the backend.
Again, the arbitrary <code class="language-plaintext highlighter-rouge">id</code> is referenced in the routing configuration.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
   "id" : "hardware_serial",
   "type" : "pyserial",
   "device" : "/dev/ttyU0",
   "baudrate" : 9600,
   "parity" : "N",
   "stop_bits" : 1,
   "request_timeout_ms" : 1200
}
</code></pre></div></div>

<h5 id="modbus-ip-via-tcp">ModBus IP via TCP</h5>

<p>A TCP backend can be configured via the <code class="language-plaintext highlighter-rouge">tcp_modbus</code> backend:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
   "id" : "tcp_backend",
   "type" : "tcp_modbus",
   "host" : "127.0.0.1",
   "port" : 1234,
   "connect_timeout" : 2.0,
   "pool_size" : 2,
   "use_tls" : true,
   "tls" : {
      "ca_file" : "/path/to/root.crt",
      "cert_file" : "/path/to/client.crt",
      "key_file" : "/path/to/client.key"
   }
}
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">use_tls</code> flag and the <code class="language-plaintext highlighter-rouge">tls</code> block are optional and are only used when (m)TLS is
desired. The <code class="language-plaintext highlighter-rouge">root.crt</code> is used for server certificate validation, the client keys for authentication
via mTLS.</p>
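<p>The <code class="language-plaintext highlighter-rouge">pool_size</code> key presumably bounds the number of simultaneously open backend connections (an assumption on my part, based on the key name). The underlying mechanism can be sketched with a blocking pool built on the standard <code class="language-plaintext highlighter-rouge">queue</code> module:</p>

```python
import queue
from contextlib import contextmanager

class ConnectionPool:
    """Minimal blocking pool: at most pool_size connections exist at once."""

    def __init__(self, factory, pool_size=2):
        self._slots = queue.Queue(maxsize=pool_size)
        for _ in range(pool_size):
            self._slots.put(factory())

    @contextmanager
    def connection(self):
        conn = self._slots.get()  # blocks while all connections are in use
        try:
            yield conn
        finally:
            self._slots.put(conn)

# A hypothetical factory standing in for an actual TCP connect:
pool = ConnectionPool(factory=lambda: object(), pool_size=2)
with pool.connection() as conn:
    pass  # a request/response transaction would happen here
```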

<h4 id="routing-configuration">Routing Configuration</h4>

<p>The routing configuration is provided as a list of routing rules that are matched
against incoming requests from the frontends. The first match determines to which backend
a message is routed. The <code class="language-plaintext highlighter-rouge">backend</code> and <code class="language-plaintext highlighter-rouge">mirror_to_mqtt</code> keys are not used
for matching; all other fields apply:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
   "frontend" : "virtual_serial_rtu",
   "backend" : "hardware_serial",
   "match" : {
      "unit_ids" : [ "*" ],
      "function_codes" : [ "*" ]
   },
   "mirror_to_mqtt" : [ ]
}
</code></pre></div></div>

<p>The routing <code class="language-plaintext highlighter-rouge">match</code> block allows filtering on device IDs and function codes
as well as operations. For example, to allow only function code 1 (read coils)
for the virtual device <code class="language-plaintext highlighter-rouge">5</code>, redirecting the operation to the backend device ID <code class="language-plaintext highlighter-rouge">1</code>,
one would use:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
   "frontend" : "virtual_serial_rtu",
   "backend" : "hardware_serial",
   "match" : {
      "unit_ids" : [ 5 ],
      "function_codes" : [ 1 ],
      "operations" : [ "read" ]
   },
   "unit_override" : 1,
   "mirror_to_mqtt" : [ ]
}
</code></pre></div></div>

<p>Here the <code class="language-plaintext highlighter-rouge">match</code> block specifies conditions that <em>all</em> have to be fulfilled. The
optional <code class="language-plaintext highlighter-rouge">unit_override</code> replaces the device ID seen on the virtual frontend bus
with the given unit number before handing the request off to the backend device.
All fields can be used in arbitrary combinations.</p>
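<p>Putting these rules together - first match wins, <code class="language-plaintext highlighter-rouge">"*"</code> acts as a wildcard and <code class="language-plaintext highlighter-rouge">unit_override</code> rewrites the unit ID - the routing decision can be sketched as follows (illustrative only, not the gateway’s actual code):</p>

```python
def field_matches(allowed, value):
    """A field matches on the wildcard or on an exact entry."""
    return "*" in allowed or value in allowed

def route_request(routes, frontend, unit_id, function_code, operation):
    """Return (backend, effective_unit_id) for the first matching route."""
    for route in routes:
        m = route["match"]
        if (route["frontend"] == frontend
                and field_matches(m.get("unit_ids", ["*"]), unit_id)
                and field_matches(m.get("function_codes", ["*"]), function_code)
                and field_matches(m.get("operations", ["*"]), operation)):
            return route["backend"], route.get("unit_override", unit_id)
    return None  # no route matched: the request is rejected

routes = [{
    "frontend": "virtual_serial_rtu",
    "backend": "hardware_serial",
    "match": {"unit_ids": [5], "function_codes": [1], "operations": ["read"]},
    "unit_override": 1,
}]
print(route_request(routes, "virtual_serial_rtu", 5, 1, "read"))
# ('hardware_serial', 1)
```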

<h4 id="example-configuration-file">Example configuration file</h4>

<p>The following configuration exposes a single serial to RS485 interface
via a local virtual serial port as well as a ModBus IP socket available
via unencrypted TCP:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
   "service" : {
      "log_level" : "INFO",
      "pid_file" : "/var/run/modbusgw.pid",
      "state_dir" : "/var/modbusgw/",
      "reload_grace_seconds" : 5
   },
   "bus" : {
      "request_queue_size" : 64,
      "response_timeout_ms" : 1500
   },
   "frontends" : [
      {
         "id" : "virtual_serial_rtu",
         "type" : "serial_rtu_socket",
         "socket_path" : "/var/modbusgw/ttyBus0",
         "pty_mode" : "rw",
         "idle_close_seconds" : 600,
         "frame_timeout_ms" : 5.0
      },
      {
         "id" : "frontend_tcp",
         "type" : "tcp_modbus_tcp",
         "host" : "192.0.2.1",
         "port" : 1234,
         "cidr_allow" : [
            "127.0.0.0/8",
            "192.0.2.0/24"
         ]
      }
   ],
   "backends" : [
      {
         "id" : "hardware_serial",
         "type" : "pyserial",
         "device" : "/dev/ttyU0",
         "baudrate" : 9600,
         "parity" : "N",
         "stop_bits" : 1,
         "request_timeout_ms" : 1200
      }
   ],
   "routes" : [
      {
         "frontend" : "virtual_serial_rtu",
         "backend" : "hardware_serial",
         "match" : {
            "unit_ids" : [ "*" ],
            "function_codes" : [ "*" ]
         },
         "mirror_to_mqtt" : [ ]
      },
      {
         "frontend" : "frontend_tcp",
         "backend" : "hardware_serial",
         "match" : {
            "unit_ids" : [ "*" ],
            "function_codes" : [ "*" ]
         },
         "mirror_to_mqtt" : [ ]
      }
   ]
}
</code></pre></div></div>

<h4 id="launching-and-controlling-the-gateway">Launching and Controlling the Gateway</h4>

<p>The gateway can be executed in foreground mode on the command line:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ modbusgw
</code></pre></div></div>

<p>In addition, it supports running as a daemon. To control the daemon, the
command line client supports the usual commands:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ modbusgw start
$ modbusgw stop
$ modbusgw status
$ modbusgw restart
$ modbusgw reload
</code></pre></div></div>

<p>For usage on <a href="https://www.freebsd.org">FreeBSD</a>, my operating system of choice, one
can use an <code class="language-plaintext highlighter-rouge">rc.d</code> script stored in <code class="language-plaintext highlighter-rouge">/usr/local/etc/rc.d/modbusgw</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/sh
# PROVIDE: modbusgw
# REQUIRE: LOGIN
# KEYWORD: shutdown

. /etc/rc.subr

name="modbusgw"
rcvar="modbusgw_enable"

load_rc_config $name

: ${modbusgw_enable:="NO"}
: ${modbusgw_command:="/usr/local/bin/modbusgw"}
: ${modbusgw_config:="/usr/local/etc/modbusgateway.cfg"}
: ${modbusgw_user:="modbusgw"}
: ${modbusgw_group:="modbusgw"}
: ${modbusgw_pidfile:="/var/run/modbusgw.pid"}
: ${modbusgw_var_dir:="/var/modbusgw"}
: ${modbusgw_log_file:="${modbusgw_var_dir}/modbusgw.log"}
: ${modbusgw_timeout:="15"}
: ${modbusgw_flags:=""}

command="${modbusgw_command}"
pidfile="${modbusgw_pidfile}"
required_files="${modbusgw_config}"
extra_commands="reload restart status"
start_cmd="${name}_start"
stop_cmd="${name}_stop"
reload_cmd="${name}_reload"
restart_cmd="${name}_restart"
status_cmd="${name}_status"

modbusgw_ensure_var_dir()
{
	if [ ! -d "${modbusgw_var_dir}" ]; then
		install -d -o "${modbusgw_user}" -g "${modbusgw_group}" -m 0750 "${modbusgw_var_dir}"
	else
		chown "${modbusgw_user}:${modbusgw_group}" "${modbusgw_var_dir}"
	fi
}

modbusgw_build_cmd()
{
	_subcmd="$1"
	shift
	_cmd="${command} -c \"${modbusgw_config}\" ${_subcmd}"
	if [ -n "${modbusgw_log_file}" ]; then
		_cmd="${_cmd} --log-file \"${modbusgw_log_file}\""
	fi
	for _arg in "$@"; do
		_cmd="${_cmd} ${_arg}"
	done
	if [ -n "${modbusgw_flags}" ]; then
		_cmd="${_cmd} ${modbusgw_flags}"
	fi
	echo "${_cmd}"
}

modbusgw_run()
{
	_cmd=$(modbusgw_build_cmd "$@")
	if [ "$(id -un)" = "${modbusgw_user}" ]; then
		/bin/sh -c "${_cmd}"
	else
		su -m "${modbusgw_user}" -c "${_cmd}"
	fi
}

modbusgw_start()
{
	modbusgw_ensure_var_dir
	modbusgw_run start
}

modbusgw_stop()
{
	modbusgw_run stop --timeout "${modbusgw_timeout}"
}

modbusgw_reload()
{
	modbusgw_run reload
}

modbusgw_restart()
{
	modbusgw_stop
	sleep 1
	modbusgw_start
}

modbusgw_status()
{
	modbusgw_run status
}

run_rc_command "$1"
</code></pre></div></div>

<p>Configuration then happens as usual for this system via <code class="language-plaintext highlighter-rouge">/etc/rc.conf</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>modbusgw_enable="YES"
modbusgw_config="/usr/local/etc/modbusgateway.cfg"
modbusgw_user="modbusgw"
modbusgw_group="modbusgw"
modbusgw_pidfile="/var/modbusgw/modbusgw.pid"
modbusgw_var_dir="/var/modbusgw"
modbusgw_log_file="/var/modbusgw/modbusgw.log"
</code></pre></div></div>

<p>The gateway is then controlled via the following commands:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ /usr/local/etc/rc.d/modbusgw start
$ /usr/local/etc/rc.d/modbusgw stop
$ /usr/local/etc/rc.d/modbusgw status
$ /usr/local/etc/rc.d/modbusgw restart
$ /usr/local/etc/rc.d/modbusgw reload
</code></pre></div></div>

<h2 id="client-library">Client Library</h2>

<p>A corresponding client library, also available in the same <a href="https://github.com/tspspi/modbusgw/">GitHub repository</a> and
installable via a separate <a href="https://pypi.org/project/modbusgw-client/">PyPi package</a> <code class="language-plaintext highlighter-rouge">modbusgw-client</code>, provides a unified
interface across different transports to Python applications and scripts. It supports:</p>

<ul>
  <li>Serial (RTU via pyserial), which can also be used to connect to pty based frontends.</li>
  <li>ModBus/TCP over UDS</li>
  <li>ModBus/TCP over TCP
    <ul>
      <li>Secure variants using TLS and mTLS</li>
    </ul>
  </li>
</ul>

<p>This abstraction allows applications to switch between local and remote deployments without changes to application logic and
without exposure to the actual protocol encoding.</p>

<h3 id="a-simple-example">A simple example</h3>

<p>First, let’s install the package:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ pip install modbusgw-client
</code></pre></div></div>

<p>Now one can use the <code class="language-plaintext highlighter-rouge">TcpClient</code> or the <code class="language-plaintext highlighter-rouge">SerialClient</code> classes in a very simple fashion:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/usr/local/bin/python3

from modbusgw_client.tcp_client import TcpClient
from modbusgw_client.serial_client import SerialClient
from modbusgw_client.pdu import WriteSingleCoilRequest, ReadCoilsRequest, ReadHoldingRegistersRequest

# TCP backend

with TcpClient(host="192.0.2.2", port=1234, timeout=10) as client:
   client.execute(WriteSingleCoilRequest(
      unit_id = 2,  # The device ID on the virtual bus
      address = 5,  # "Coil" index
      value = True  # Coil status ("value" keyword name assumed)
   ))

# Serial backend

with SerialClient("/var/modbusgw/ttyBus0", baudrate=9600, timeout=10) as client:
   client.execute(WriteSingleCoilRequest(
      unit_id = 2,  # The device ID on the virtual bus
      address = 5,  # "Coil" index
      value = True  # Coil status ("value" keyword name assumed)
   ))

</code></pre></div></div>

<h2 id="embedded-side-avr-framework">Embedded Side: AVR Framework</h2>

<p>On the device side, I developed a lightweight AVR-based ModBus framework that allows implementation
of custom ModBus slaves on cheap, readily available Atmel ATMega microcontrollers.</p>

<p>The framework is particularly useful for:</p>

<ul>
  <li>Custom sensor readout</li>
  <li>Simple control systems</li>
  <li>Bridging analog signals into ModBus systems</li>
</ul>

<p>One of my current applications is the readout of an <a href="https://www.inficon.com/de/produkte/vakuummessgeraete-und-controller/heiss-ionisationsmessgeraete/pbr-260">Inficon PBR260 Pirani pressure gauge</a> with analog output, making it accessible via ModBus in a vacuum system, as well as the readout of <a href="https://amzn.to/4meSEqZ">ultrasonic water level sensors</a> for water management in a small garden setup.</p>

<p>The framework is available on <a href="https://github.com/tspspi/avrModBus">GitHub</a>. It is built with <code class="language-plaintext highlighter-rouge">avr-gcc</code> and targets
the Atmel <a href="https://amzn.to/48yDirN">ATMega328P</a> and <a href="https://amzn.to/41ho3Q8">ATMega2560</a>. It allows easy interfacing
with the coils, input registers, output registers and holding registers. Examples are provided in
the <a href="https://github.com/tspspi/avrModBus/blob/master/examples/basic/main.c">GitHub repository</a>.</p>

<h2 id="practical-applications">Practical Applications</h2>

<p>In practical deployments, a wide range of ModBus-capable devices can be integrated into a unified system.</p>

<p>Relay modules (e.g. 2, 8, or 32 channel units) can control loads such as pumps, valves, or lighting. Temperature and humidity
sensors provide environmental monitoring. Soil sensors measure moisture and nutrient levels (NPK), enabling
automated irrigation and fertilizing strategies.</p>

<p>Pulse counting modules allow integration of flow sensors, making it possible to monitor water usage as well as
valve operation and failure. Combined with relay-controlled valves, this enables fully automated irrigation systems.</p>
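<p>Converting such pulse counts into a flow rate is simple arithmetic. For illustration, assume a sensor K-factor of 450 pulses per litre - a common value for small hall-effect flow sensors, though the correct factor has to be taken from the sensor’s datasheet:</p>

```python
def flow_l_per_min(pulse_delta, interval_s, pulses_per_litre=450.0):
    """Flow rate from the pulse count difference over one polling interval."""
    litres = pulse_delta / pulses_per_litre
    return litres * 60.0 / interval_s

# 900 pulses in 10 seconds at 450 pulses/litre: 2 litres, i.e. 12 l/min
print(flow_l_per_min(900, 10.0))  # 12.0
```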

<p>In home and lab environments, ModBus is frequently used to monitor HVAC systems, control lights,
control and monitor cooling loops, and monitor power consumption via smart meters. In laboratory setups,
RS485-based systems are commonly used for devices like vacuum pumps and other slow control systems.</p>

<h3 id="my-favorite-hardware-devices">My Favorite Hardware Devices</h3>

<p>My favorite devices are:</p>

<ul>
  <li><a href="https://amzn.to/4sR16PL">Waveshare USB to RS485 interface modules</a></li>
  <li>Waveshare <a href="https://amzn.to/3PTKrwz">32 channel</a> and <a href="https://amzn.to/4sh4WAy">8 channel</a> RTU relay modules, providing control
of typical 230V and low voltage appliances</li>
  <li>Interface modules <a href="https://amzn.to/3Od1Cse">with digital inputs and outputs</a> that I use to interface to <a href="https://amzn.to/4cidMbE">anemometers</a></li>
  <li><a href="https://de.aliexpress.com/item/1005006429949294.html">X0 Pulse Counter</a> modules to interface to <a href="https://amzn.to/4ttOoX0">flow sensors</a></li>
  <li>Chinese soil <a href="https://de.aliexpress.com/item/1005005697940574.html">humidity and NPK sensors</a> providing direct ModBus readout</li>
  <li><a href="https://de.aliexpress.com/item/1005005471608120.html">Indoor SHTC3</a> and outdoor <a href="https://de.aliexpress.com/item/1005004870015772.html">SHT30</a>
temperature and humidity sensors</li>
  <li>GPIO <a href="https://de.aliexpress.com/item/1005003162434730.html">NPN and PNP boards</a></li>
  <li>The <code class="language-plaintext highlighter-rouge">R3DCB08</code> interface board to <a href="https://amzn.to/485D8rZ">DS18B20 onewire temperature sensors</a></li>
</ul>

<h2 id="security-considerations">Security Considerations</h2>

<p>Raw ModBus/TCP has no authentication, no encryption, and no concept of access control. Exposing it directly
to untrusted networks is inherently unsafe. <strong>ModBus/TCP should never be exposed outside of an isolated
automation network or VLAN</strong>.</p>

<p>A gateway-based architecture enables:</p>

<ul>
  <li>Wrapping ModBus communication in TLS, thus providing confidentiality</li>
  <li>Enforcing <strong>client authentication</strong> via mTLS</li>
  <li>Restricting access to specific devices and registers and thus providing
monitoring only access paths, enabling separation of control and monitoring
paths, aligning with CQRS architectures.</li>
</ul>
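<p>Such a monitoring-only access path maps directly onto the routing configuration shown earlier. For example, a route that only admits read operations could look like this (illustrative; function codes 1 through 4 are the standard ModBus read operations):</p>

```json
{
   "frontend" : "frontend_tcp",
   "backend" : "hardware_serial",
   "match" : {
      "unit_ids" : [ "*" ],
      "function_codes" : [ 1, 2, 3, 4 ],
      "operations" : [ "read" ]
   },
   "mirror_to_mqtt" : [ ]
}
```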

<p>Unix domain sockets provide an additional option for local communication with the gateway, avoiding
unnecessary network exposure.</p>

<h2 id="design-philosophy">Design Philosophy</h2>

<p>A key design principle is to keep the physical and protocol layers simple, while moving complexity into
controlled software layers. The RS485 bus remains deterministic, easy to debug and, above all,
cheap and easy to implement. All advanced features - security, multiplexing, abstraction - are implemented
in user space, where they can be maintained, audited, and evolved without affecting system stability.</p>

<p>This separation leads to systems that are both robust and flexible, combining the reliability of industrial
fieldbuses with the capabilities of modern software architectures.</p>

<h2 id="conclusion">Conclusion</h2>

<p>ModBus, and in particular its RS485-based RTU variant, demonstrates that simplicity is not a limitation
but a design strength. Its minimalism allows it to remain understandable, debuggable and implementable across
a wide range of systems - from industrial installations to small-scale home and laboratory setups.</p>

<p>At the same time, modern software architectures impose requirements that the original protocol was never
designed to address. Multiple independent services, distributed systems, and stricter security expectations
fundamentally conflict with the single-master assumption and lack of built-in protection mechanisms.</p>

<p>By introducing a central gateway layer, these two worlds can be reconciled. The physical bus remains
simple, deterministic and reliable, while higher-level concerns such as arbitration, access control,
transport abstraction and security are handled in user space. This separation allows systems to scale
without sacrificing the robustness of the underlying fieldbus.</p>

<p>In practice, this approach enables a wide range of applications - from garden irrigation and environmental
monitoring to laboratory instrumentation and energy management - using inexpensive hardware and straightforward
implementations.</p>

<p>Rather than replacing ModBus with more complex alternatives, it is often more effective to embrace its
simplicity and complement it with well-designed software layers. This combination provides a powerful
foundation for building reliable, secure, and maintainable automation systems.</p>

<h2 id="references">References</h2>

<ul>
  <li>My own software:
    <ul>
      <li>GitHub repositories:
        <ul>
          <li>The <a href="https://github.com/tspspi/modbusgw/">gateway and client implementation</a></li>
          <li><a href="https://github.com/tspspi/avrModBus">AVR ModBus client firmware</a></li>
        </ul>
      </li>
      <li>PyPi packages:
        <ul>
          <li>The <a href="https://pypi.org/project/modbus-gateway">modbus-gateway</a> gateway implementation</li>
          <li>The accompanying <a href="https://pypi.org/project/modbusgw-client">modbusgw-client</a> Python client implementation</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>Hardware:
    <ul>
      <li><a href="https://amzn.to/4vkvX9k">MAX485 transceivers</a> for custom components built around Atmel <a href="https://amzn.to/48yDirN">ATMega328P</a>
and <a href="https://amzn.to/41ho3Q8">ATMega2560</a> boards.</li>
      <li><a href="https://amzn.to/4ctpQrx">4 pole cabling</a></li>
      <li><a href="https://amzn.to/4sR16PL">Waveshare USB to RS485 interface modules</a></li>
      <li>Waveshare <a href="https://amzn.to/3PTKrwz">32 channel</a> and <a href="https://amzn.to/4sh4WAy">8 channel</a> RTU relay modules, providing control
of typical 230V and low voltage appliances</li>
      <li>Interface modules <a href="https://amzn.to/3Od1Cse">with digital inputs and outputs</a> that I use to interface to <a href="https://amzn.to/4cidMbE">anemometers</a></li>
      <li><a href="https://de.aliexpress.com/item/1005006429949294.html">X0 Pulse Counter</a> modules to interface to <a href="https://amzn.to/4ttOoX0">flow sensors</a></li>
      <li>Chinese soil <a href="https://de.aliexpress.com/item/1005005697940574.html">humidity and NPK sensors</a> providing direct ModBus readout</li>
      <li><a href="https://de.aliexpress.com/item/1005005471608120.html">Indoor SHTC3</a> and outdoor <a href="https://de.aliexpress.com/item/1005004870015772.html">SHT30</a>
temperature and humidity sensors</li>
      <li>GPIO <a href="https://de.aliexpress.com/item/1005003162434730.html">NPN and PNP boards</a></li>
      <li>The <code class="language-plaintext highlighter-rouge">R3DCB08</code> interface board to <a href="https://amzn.to/485D8rZ">DS18B20 onewire temperature sensors</a></li>
    </ul>
  </li>
  <li>The <a href="https://www.modbus.org/modbus-specifications">ModBus specification</a></li>
  <li>The <a href="https://amzn.to/48tOLsA">Hantek 6022BE USB oscilloscope</a> for debugging purposes</li>
</ul>]]></content><author><name>tsp</name></author><category term="Programming" /><category term="Electronics" /><category term="Hardware" /><category term="DIY" /><category term="RS485" /><category term="Microcontroller" /><category term="Home automation" /><category term="Automation" /><category term="ModBus" /><summary type="html"><![CDATA[ModBus RTU over RS485 remains one of the simplest and most reliable ways to build distributed automation systems - yet integrating it into modern software architectures is anything but straightforward. While the protocol itself is minimal and easy to implement even on low-cost microcontrollers, its strict single-master model clashes with todays multi-service environments and increasing security requirements. This article explores how to bridge that gap using a gateway-based architecture, utilizing a gateway developed by myself, that enables safe multi-client access, transport abstraction, and secure communication via TLS and mTLS. Along the way, it covers practical RS485 deployment considerations, real-world hardware setups for home, garden, and lab automation, and a lightweight AVR-based framework for building custom ModBus devices.]]></summary></entry><entry><title type="html">Bringing XPPen Tablets to FreeBSD: Reverse Engineering a USB Protocol</title><link href="https://www.tspi.at/2026/04/06/xppenfreebsd.html" rel="alternate" type="text/html" title="Bringing XPPen Tablets to FreeBSD: Reverse Engineering a USB Protocol" /><published>2026-04-06T00:00:00+02:00</published><updated>2026-04-06T18:01:10+02:00</updated><id>https://www.tspi.at/2026/04/06/xppenfreebsd</id><content type="html" xml:base="https://www.tspi.at/2026/04/06/xppenfreebsd.html"><![CDATA[<p>There is a certain kind of frustration that only appears when perfectly functional hardware refuses to cooperate with your operating system of choice. In my case, this happened with an <a href="https://amzn.to/41hn56p">XPPen graphics tablet</a>.</p>

<p><a href="https://www.storexppen.de/">XPPen tablets</a> are, in many respects, surprisingly capable devices. The passive models in particular offer excellent value: precise pen input, good pressure sensitivity, and a solid overall build quality - at a fraction of the cost typically associated with professional drawing tablets. While the <a href="https://amzn.to/4smGVbB">highly professional display-equipped variants</a> can become quite expensive (in my personal opinion totally worth the price), the simpler models are almost irresistible for hobby and development setups.</p>

<p>Out of the box, the tablet was detected at the USB and HID level but none of the usual tooling - neither the base system utilities nor common open-source drivers - managed to produce usable input events. The root cause quickly became apparent: the device did not behave like a standard HID tablet. Instead, it required a vendor-specific initialization sequence and produced non-standard event packets that needed translation before they could be consumed by applications such as GIMP.</p>

<p>At that point, there were two options: abandon the device or solve the problem. Hating unresolved problems and having zero acceptance for unsupported hardware, I chose the second route.</p>

<blockquote class="disclaimer">
  <p>⚠️ <strong>TL;DR</strong>: This application is in no way associated with the official manufacturer. It is available on <a href="https://github.com/tspspi/xppenfbsd">GitHub</a> and installable from <a href="https://pypi.org/project/xppenfbsd/">PyPi</a> via <code class="language-plaintext highlighter-rouge">pip install xppenfbsd</code>. It solves my problem with the Deco Mini7.</p>
</blockquote>

<ul>
  <li><a href="#why-wacom-works-and-xppen-does-not">Why Wacom Works (and XPPen does not)</a></li>
  <li><a href="#reverse-engineering-the-protocol">Reverse Engineering the Protocol</a></li>
  <li><a href="#from-usb-packets-to-usable-input">From USB Packets to Usable Input</a></li>
  <li><a href="#integrating-with-x11">Integrating with X11</a></li>
  <li><a href="#design-considerations">Design Considerations</a></li>
  <li><a href="#limitations-and-future-work">Limitations and Future Work</a></li>
  <li><a href="#conclusion">Conclusion</a></li>
  <li><a href="#references">References</a></li>
</ul>

<p><img src="/assets/images/png/xppen001.png" alt="" /></p>

<h2 id="why-wacom-works-and-xppen-does-not">Why Wacom Works (and XPPen does not)</h2>

<p>To understand why this problem exists at all, it is useful to look at how Wacom devices are typically supported.</p>

<p>Most Wacom tablets implement (or at least closely emulate) pseudo-standard HID interfaces. On Unix-like systems, these devices are handled by the generic input stack and then enhanced by specialized drivers such as the X11 <a href="https://github.com/linuxwacom/xf86-input-wacom">xf86-input-wacom</a> driver. These drivers understand additional semantics like pressure, tilt, tool types, and button mappings—but crucially, they operate on top of a well-defined input abstraction.</p>

<p>In other words: Wacom devices speak a language the operating system and the X11 framework already understand.</p>

<p>XPPen devices, in contrast, often rely on vendor-specific extensions of the HID protocol, requiring initialization sequences and custom report parsing on top of USB’s HID device class. While they expose HID endpoints, they typically <em>require an explicit activation or configuration sequence</em> before they start producing any data. Even then, the reported data does not directly match standard input expectations and must be interpreted and transformed.</p>

<p>This difference explains why Wacom devices tend to work out of the box, while XPPen devices appear <em>dead</em> without proprietary applications.</p>

<h2 id="reverse-engineering-the-protocol">Reverse Engineering the Protocol</h2>

<p>The approach I chose was pragmatic: observe a working system and replicate its behavior.</p>

<p>A Windows test machine was set up with the <a href="https://www.xp-pen.com/download/deco-mini7-v2.html">official XPPen application</a> performing the translation. Using <a href="https://github.com/desowin/usbpcap">USBPcap</a>, I recorded the raw USB traffic generated while interacting with the tablet. This included device initialization, pen movement, pressure changes, and button events.</p>

<p>The resulting capture was then analyzed using <a href="https://www.wireshark.org/">Wireshark</a>. By filtering out unrelated traffic and focusing on the relevant USB endpoints, it became possible to isolate the sequences responsible for device activation and continuous stylus data streaming.</p>

<p>To accelerate the decoding process, I first filtered the packets from the various phases (activation, data delivery, etc.) and used <a href="https://chatgpt.com/codex/">OpenAI’s Codex</a> to assist in identifying structural patterns and generating candidate parsers. While the initial suggestions required manual correction and validation, this significantly reduced the time required to move from raw captures to a working understanding of the protocol.</p>

<h2 id="from-usb-packets-to-usable-input">From USB Packets to Usable Input</h2>

<p>Once the protocol was sufficiently understood, the next step was to reproduce the behavior on <a href="https://www.freebsd.org">FreeBSD</a>. Using <a href="https://github.com/pyusb/pyusb">libusb via a Python interface</a>, I <a href="https://github.com/tspspi/xppenfbsd">implemented a userspace daemon</a> that performs three essential tasks:</p>

<p>First, it detects the tablet and performs the required initialization sequence. Without this step, the device remains silent.</p>

<p>Second, it continuously reads raw data packets from the stylus endpoint. These packets encode position, pressure, tilt, and button states in a vendor-specific format.</p>

<p>Third, it translates these packets into standard input events.</p>

<p>This translation layer is the core of the system. Instead of attempting to modify the kernel or introduce a custom driver, the daemon creates a virtual input device and re-injects events into the system again via an emulated HID device (<code class="language-plaintext highlighter-rouge">uinput</code> via the <code class="language-plaintext highlighter-rouge">evdev</code> compatibility layer). From the perspective of the operating system, this virtual device behaves like a normal stylus.</p>

<p>This design has several advantages. It keeps the implementation entirely in userspace and allows rapid iteration when refining the protocol understanding. It also makes the solution portable across systems that provide similar input injection mechanisms.</p>
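<p>To make the decoding step concrete, the following sketch shows the general shape of such a translation - with an entirely made-up packet layout chosen for illustration; the actual XPPen report format is the one implemented in the repository:</p>

```python
import struct
from typing import NamedTuple

class PenState(NamedTuple):
    x: int
    y: int
    pressure: int
    buttons: int

def decode_packet(raw: bytes) -> PenState:
    """Decode a hypothetical 8-byte report: report ID (1 byte), then
    x, y and pressure as little-endian u16, then a button bitmask."""
    _report_id, x, y, pressure, buttons = struct.unpack("<BHHHB", raw)
    return PenState(x, y, pressure, buttons)

pkt = bytes([0x02, 0x34, 0x12, 0x78, 0x56, 0xE8, 0x03, 0x01])
print(decode_packet(pkt))  # PenState(x=4660, y=22136, pressure=1000, buttons=1)
```

<p>In the real daemon, each decoded state is then mapped onto the corresponding events of the virtual <code class="language-plaintext highlighter-rouge">evdev</code> device.</p>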

<h2 id="integrating-with-x11">Integrating with X11</h2>

<p>Once the virtual device is available, it can be consumed by the existing X11 driver stack.</p>

<p>A minimal configuration binds the generated event device to the stylus driver, allowing applications to interpret the input correctly. The exact device node depends on the system’s input enumeration and should ideally be discovered dynamically, but even a static configuration is sufficient for initial setups.</p>

<p>In my case the input device always appeared as <code class="language-plaintext highlighter-rouge">event7</code> leading to the following configuration placed in <code class="language-plaintext highlighter-rouge">/usr/local/etc/X11/xorg.conf.d/11-wacom.conf</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Section "InputDevice"
       Identifier "XP-PEN"
       Driver "wacom"
       Option "Device" "/dev/input/event7"
       Option "Type" "stylus"
       Option "USB" "on"
EndSection

Section "ServerLayout"
        Identifier "Default Layout"
        InputDevice "XP-PEN" "SendCoreEvent"
EndSection
</code></pre></div></div>

<p>At this point, the entire pipeline is complete: vendor-specific USB HID protocol &gt; userspace translation &gt; standard input events &gt; X11 driver &gt; application.</p>

<p>Applications such as <a href="https://www.gimp.org/">GIMP</a> can use the tablet without modification, including pressure sensitivity and stylus buttons.</p>

<h2 id="design-considerations">Design Considerations</h2>

<p>One might consider implementing this functionality as a kernel driver. However, this would not be a natural fit for the problem.</p>

<p>The primary task here is not hardware access but protocol translation. The kernel already provides mechanisms for exposing input devices and injecting events, and duplicating this logic in kernel space would introduce unnecessary complexity.</p>

<p>More importantly, from a design perspective, the kernel should remain as small and minimal as possible due to its privileged nature. Moving complex parsing and translation logic into the kernel increases the risk surface and makes debugging significantly harder. By keeping this layer in userspace, failures are contained, iteration is faster, and the system remains more robust overall.</p>

<p>Separating this functionality into a dedicated userspace process is therefore the correct architectural choice - not only from a modularity standpoint but also from a security perspective. It allows the protocol handling to evolve independently while relying on the kernel only for what it does best: providing stable and well-defined interfaces.</p>

<p>Additionally, the solution integrates naturally with system facilities such as device event frameworks, allowing automatic discovery and activation when the tablet is connected.</p>

<h2 id="limitations-and-future-work">Limitations and Future Work</h2>

<p>The current implementation focuses on a single device and a specific model family. Extending it to support multiple tablets or additional models would require further protocol analysis.</p>

<p>There is also room for improvement in device discovery and configuration, particularly in dynamically resolving input device paths and integrating more tightly with desktop environments as well as optimizing the runtime (adding daemonization support, etc.).</p>

<p>Nevertheless, the core functionality is complete: the tablet behaves like a native input device and can be used in real-world applications without noticeable limitations.</p>

<h2 id="conclusion">Conclusion</h2>

<p>What started as a simple compatibility issue turned into a small reverse engineering project spanning USB protocols, driver behavior, and input subsystems. Interestingly, the entire process - from capturing USB traffic to having a fully working implementation - took essentially one sleepless night. This alone highlights how little effort would actually be required for a manufacturer to provide proper cross-platform support by simply releasing documentation (the manufacturers themselves could not feasibly cover all available operating systems due to QA, support, device generations, etc., but could rely on the community).</p>

<p>The key insight is that many unsupported devices are not inherently incompatible - they are simply <em>undocumented</em>. With the right tools and a structured approach, it is often possible to bridge that gap. However, it also raises the question why this gap exists in the first place.</p>

<p>In this case, the combination of USB traffic capture, protocol reconstruction, and userspace event translation resulted in a fully usable graphics tablet on FreeBSD without relying on proprietary drivers. The fact that this can be achieved so quickly strongly suggests that <em>publicly available protocol documentation</em> would make such integrations almost trivial.</p>

<p>It is therefore somewhat difficult to understand why manufacturers so often hesitate to <em>document</em> their hardware interfaces. Instead of enabling broad compatibility across platforms, effort is invested into maintaining proprietary applications for a limited set of operating systems (no vendor can sanely cover the whole ecosystem of operating systems on its own). From a technical standpoint, <em>publishing protocol specifications</em> would significantly reduce duplicated effort, empower the community, and improve the longevity and usability of the hardware.</p>

<p>For anyone encountering similar issues, this workflow provides a practical path forward: observe, decode, translate, and integrate.</p>

<h2 id="references">References</h2>

<p>My implementation is available on <a href="https://github.com/tspspi/xppenfbsd">GitHub</a> and directly installable via <a href="https://pypi.org/project/xppenfbsd/">PyPI</a>.</p>

<ul>
  <li>The passive <a href="https://amzn.to/41hn56p">XPPen graphics tablet</a> as well as its <a href="https://amzn.to/4smGVbB">professional counterpart</a></li>
  <li>The <a href="https://www.storexppen.de/">manufacturer</a> as well as the <a href="https://www.xp-pen.com/download/deco-mini7-v2.html">official XPPen application</a></li>
  <li><a href="https://www.wireshark.org/">Wireshark</a> and <a href="https://github.com/desowin/usbpcap">USBPcap</a></li>
  <li><a href="https://github.com/pyusb/pyusb">pyusb</a> as <code class="language-plaintext highlighter-rouge">libusb</code> frontend useable from Python</li>
  <li>The traditional <a href="https://github.com/linuxwacom/xf86-input-wacom">xf86-input-wacom</a> driver for Wacom and Wacom-compatible tablets</li>
  <li><a href="https://chatgpt.com/codex/">OpenAI’s Codex</a></li>
</ul>]]></content><author><name>tsp</name></author><category term="Programming" /><category term="System administration" /><category term="Hardware" /><category term="Python" /><category term="Vibe coding" /><category term="FreeBSD" /><category term="X11" /><category term="Reverse engineering" /><summary type="html"><![CDATA[When a graphics tablet works flawlessly on one system but appears completely lifeless on another, the problem is rarely the hardware itself. This article explores how an XPPen tablet - perfectly functional and excellent hardware, yet unusable on FreeBSD - was brought to life through a pragmatic reverse engineering approach. By capturing USB traffic, reconstructing the device’s initialization sequence and translating vendor-specific data into standard input events, a fully working userspace solution emerged. Rather than relying on proprietary drivers or kernel modifications, the implementation demonstrates how clean architecture and a structured workflow can bridge compatibility gaps. Along the way, it highlights not only the mechanics of USB protocol analysis and input subsystem integration, but also a broader question: why so many capable devices remain artificially limited by a lack of documentation - when making them work can, in some cases, be surprisingly straightforward.]]></summary></entry><entry><title type="html">Programmatic 3D Model Generation with the Tripo3D API</title><link href="https://www.tspi.at/2026/04/06/tripo3dapi.html" rel="alternate" type="text/html" title="Programmatic 3D Model Generation with the Tripo3D API" /><published>2026-04-06T00:00:00+02:00</published><updated>2026-04-06T16:54:26+02:00</updated><id>https://www.tspi.at/2026/04/06/tripo3dapi</id><content type="html" xml:base="https://www.tspi.at/2026/04/06/tripo3dapi.html"><![CDATA[<p>In recent months, a number of services have emerged that allow generating 3D assets from either text prompts or reference images.
While the web interfaces of these platforms are often polished and interactive, they are typically optimized for manual workflows and subscription-based usage. For engineering pipelines, reproducibility, and automation, however, what we really want is API access.</p>

<p>Beyond the purely technical perspective, this is also a rather fascinating shift: these systems dramatically lower the barrier for creating artistic 3D content. They do not replace skilled artists - and realistically they will not in the foreseeable future - but they act as powerful <em>amplifiers</em>. For experienced artists they can accelerate iteration and ideation, while at the same time enabling people without strong artistic 3D skills to finally materialize their ideas, worlds, and characters.</p>

<p>Things get even more interesting when combined with modern image generation systems (like <a href="https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0">Stable Diffusion</a> and similar approaches). With reasonably consistent prompt engineering, one can first generate coherent visual concepts and then lift them into 3D space. This opens the door to semi-automatically building consistent 3D scenes, asset libraries, or even entire worlds that share a unified style.</p>

<p>In my own workflow I primarily use these systems as a complement to traditional CAD. Most of my classical modeling work is engineering-focused (mechanical parts, devices, tooling), where parametric CAD (mostly in <a href="https://www.freecad.org/">FreeCAD</a>) is still the right tool; tools like the one presented here are not capable of proper technical design at this stage of development. However, for non-technical, artistic, or decorative objects - especially for 3D printing - these generative approaches are extremely valuable. They allow me to produce shapes and aesthetics that I would otherwise struggle to model manually.</p>

<div style="text-align: center">
    <img src="/assets/images/png/tripo3d_sample2_image.png" style="width:28.6%" alt="The input image generated via SD" />
    <img src="/assets/images/png/tripo3d_sample2_slicer.png" style="width:27.1%" alt="Robot mesh in slicer" />
    <img src="/assets/images/png/tripo3d_sample2_print.png" style="width:24.3%" alt="3D printed robot as a single color object" />
</div>

<p>In this article I will walk through a Python-based pipeline, the whole script is <a href="#the-complete-script">provided at the end of the article</a>, that uses the <a href="https://platform.tripo3d.ai/">Tripo3D API</a> to generate, process, and export 3D models in a fully automated fashion. The focus is not just on <em>getting a model</em>, but on building a structured pipeline that produces pseudo-deterministic outputs, metadata, and multiple export formats suitable for further processing (e.g., CNC, 3D printing, simulation, or game engines).</p>

<ul>
  <li><a href="#why-a-pipeline-instead-of-the-web-ui">Why a Pipeline Instead of the Web UI?</a></li>
  <li><a href="#high-level-pipeline-overview">High-Level Pipeline Overview</a></li>
  <li><a href="#core-design-ideas">Core Design Ideas</a>
    <ul>
      <li><a href="#treat-everything-as-a-task">Treat Everything as a Task</a></li>
      <li><a href="#metadata-as-first-class-output">Metadata as First-Class Output</a></li>
      <li><a href="#deterministic-file-naming">Deterministic File Naming</a></li>
    </ul>
  </li>
  <li><a href="#base-model-generation">Base Model Generation</a></li>
  <li><a href="#discovering-parts">Discovering Parts</a></li>
  <li><a href="#texturing-strategies">Texturing Strategies</a>
    <ul>
      <li><a href="#whole-model-texturing">Whole-Model Texturing</a></li>
      <li><a href="#per-part-texturing">Per-Part Texturing</a></li>
    </ul>
  </li>
  <li><a href="#export-system">Export System</a>
    <ul>
      <li><a href="#unified-export-function">Unified Export Function</a></li>
      <li><a href="#supported-formats">Supported Formats</a></li>
      <li><a href="#per-part-export">Per-Part Export</a></li>
    </ul>
  </li>
  <li><a href="#optional-rigging">Optional Rigging</a></li>
  <li><a href="#cli-interface">CLI Interface</a></li>
  <li><a href="#practical-observations">Practical Observations</a></li>
  <li><a href="#moving-from-models-to-reality">Moving from Models to Reality</a></li>
  <li><a href="#outlook">Outlook</a></li>
  <li><a href="#conclusion">Conclusion</a></li>
  <li><a href="#references">References</a>
    <ul>
      <li><a href="#useful-tools">Useful Tools</a></li>
    </ul>
  </li>
  <li><a href="#the-complete-script">The Complete Script</a></li>
</ul>

<h2 id="why-a-pipeline-instead-of-the-web-ui">Why a Pipeline Instead of the Web UI?</h2>

<p>The <a href="https://www.tripo3d.ai/">web frontend</a> is excellent for exploration, iteration, and interactive refinement, as known from typical artistic workflows. In technical environments, however, it has a few limitations: the lack of reproducible batch processing, limited control over export formats and intermediate steps, missing structured metadata capture, and poor integration into existing toolchains.</p>

<p>The script presented here addresses these issues by treating every operation as a task with persistent metadata, storing all intermediate results, supporting both <em>text-to-model</em> and <em>image-to-model</em> workflows, enabling per-part processing and exports, and enforcing deterministic naming and file organization. The key gain of this pipeline approach is <em>automation</em> and therefore <em>scalability</em> - there is no meaningful scaling without automation. Once the process is expressed as a pipeline, generating tens, hundreds, or thousands of assets becomes a straightforward extension rather than a manual effort. Additionally, using the API directly typically means usage-based billing instead of subscription-based time periods, which aligns much better with batch workloads and sporadic large-scale generation runs (a single high-quality model without texturing and rigging takes on the order of ten minutes).</p>

<h2 id="high-level-pipeline-overview">High-Level Pipeline Overview</h2>

<p>The pipeline is structured into several stages:</p>

<ol>
  <li><strong>Base model generation</strong> (text or image input)</li>
  <li><strong>Optional full-model texturing</strong></li>
  <li><strong>Optional per-part texturing</strong></li>
  <li><strong>Optional rigging</strong></li>
  <li><strong>Full model export (STL / 3MF / etc.)</strong></li>
  <li><strong>Per-part export</strong></li>
</ol>

<p>Each stage is implemented as an asynchronous task and stored together with its metadata.</p>

<p>Conceptually, the pipeline looks like this (violet marks your data, green the mandatory step, and yellow the optional steps):</p>

<p><img src="/assets/images/png/tripo_api_steps_001.png" alt="Just a  graphical representation of the steps mentioned above" /></p>

<h2 id="core-design-ideas">Core Design Ideas</h2>

<h3 id="treat-everything-as-a-task">Treat Everything as a Task</h3>

<p>The Tripo API internally works with tasks. Instead of hiding this abstraction, the script embraces it.</p>

<p>Every step:</p>

<ul>
  <li>Returns a <code class="language-plaintext highlighter-rouge">task_id</code></li>
  <li>Is polled until completion</li>
  <li>Is serialized into a JSON metadata file</li>
</ul>

<p>This is implemented via:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">wait_success</span><span class="p">(</span><span class="n">client</span><span class="p">:</span> <span class="n">TripoClient</span><span class="p">,</span> <span class="n">task_id</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">label</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
    <span class="n">task</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="p">.</span><span class="nf">wait_for_task</span><span class="p">(</span><span class="n">task_id</span><span class="p">,</span> <span class="n">polling_interval</span><span class="o">=</span><span class="mf">2.0</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
    <span class="n">status</span> <span class="o">=</span> <span class="nf">str</span><span class="p">(</span><span class="nf">getattr</span><span class="p">(</span><span class="n">task</span><span class="p">,</span> <span class="sh">"</span><span class="s">status</span><span class="sh">"</span><span class="p">,</span> <span class="sh">""</span><span class="p">)).</span><span class="nf">lower</span><span class="p">()</span>
    <span class="k">if</span> <span class="sh">"</span><span class="s">success</span><span class="sh">"</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">status</span><span class="p">:</span>
        <span class="k">raise</span> <span class="nc">RuntimeError</span><span class="p">(...)</span>
    <span class="k">return</span> <span class="n">task</span>
</code></pre></div></div>

<p>This pattern ensures robust error handling, since failures are detected explicitly and surfaced immediately, while also providing full traceability because every step is captured as a task with associated metadata. At the same time, it enables straightforward debugging and replay, as individual steps can be inspected, reproduced, or rerun without having to reconstruct the entire pipeline.</p>

<h3 id="metadata-as-first-class-output">Metadata as First-Class Output</h3>

<p>Instead of only saving meshes, the pipeline stores <em>everything</em> about each task:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">save_task_metadata</span><span class="p">(</span><span class="n">task</span><span class="p">:</span> <span class="n">Any</span><span class="p">,</span> <span class="n">out_dir</span><span class="p">:</span> <span class="n">Path</span><span class="p">,</span> <span class="n">stem</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Path</span><span class="p">:</span>
    <span class="n">meta</span> <span class="o">=</span> <span class="nf">task_to_dict</span><span class="p">(</span><span class="n">task</span><span class="p">)</span>
    <span class="n">path</span> <span class="o">=</span> <span class="n">out_dir</span> <span class="o">/</span> <span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">stem</span><span class="si">}</span><span class="s">.task.json</span><span class="sh">"</span>
    <span class="n">path</span><span class="p">.</span><span class="nf">write_text</span><span class="p">(</span><span class="n">json</span><span class="p">.</span><span class="nf">dumps</span><span class="p">(</span><span class="n">meta</span><span class="p">,</span> <span class="n">indent</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">ensure_ascii</span><span class="o">=</span><span class="bp">False</span><span class="p">))</span>
    <span class="k">return</span> <span class="n">path</span>
</code></pre></div></div>

<p>This is extremely useful when comparing different parameter settings, debugging failed generations, and building higher-level automation on top of the pipeline. In addition, the stored metadata makes it possible to resume operation with a model at any intermediate step.</p>
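<p>The <code class="language-plaintext highlighter-rouge">task_to_dict</code> helper used above is part of the complete script at the end of the article; conceptually it is a defensive serializer. A minimal sketch of such a helper (not necessarily identical to the real implementation) could look like this:</p>

```python
import json
from typing import Any

def task_to_dict(task: Any) -> Any:
    """Recursively convert an SDK task object into JSON-serializable data.

    Sketch only: primitives pass through, containers are walked, arbitrary
    objects are flattened via their __dict__, and anything else falls back
    to str() so serialization never fails.
    """
    if task is None or isinstance(task, (str, int, float, bool)):
        return task
    if isinstance(task, (list, tuple)):
        return [task_to_dict(v) for v in task]
    if isinstance(task, dict):
        return {str(k): task_to_dict(v) for k, v in task.items()}
    if hasattr(task, "__dict__"):
        return {k: task_to_dict(v) for k, v in vars(task).items()
                if not k.startswith("_")}
    return str(task)
```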

<h3 id="deterministic-file-naming">Deterministic File Naming</h3>

<p>Generated assets are renamed into a consistent scheme:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>01_base.glb
02_textured_full.glb
03_textured_part__001__wheel.glb
06_export_full_stl.stl
08_export_part_stl__003__handle.stl
</code></pre></div></div>

<p>This is handled via:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">dst</span> <span class="o">=</span> <span class="n">out_dir</span> <span class="o">/</span> <span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">stem</span><span class="si">}</span><span class="s">.</span><span class="si">{</span><span class="n">model_type</span><span class="si">}{</span><span class="n">src</span><span class="p">.</span><span class="n">suffix</span><span class="si">}</span><span class="sh">"</span>
</code></pre></div></div>

<p>Together with <code class="language-plaintext highlighter-rouge">sanitize_name()</code> this guarantees filesystem-safe naming.</p>
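<p><code class="language-plaintext highlighter-rouge">sanitize_name()</code> itself is included in the complete script; a minimal version of such a helper might look like this:</p>

```python
import re

def sanitize_name(name: str, max_length: int = 64) -> str:
    """Map an arbitrary part name to a filesystem-safe token (sketch only).

    Collapses any run of characters outside [A-Za-z0-9._-] into a single
    underscore, trims leading/trailing separators, and bounds the length.
    """
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name.strip())
    cleaned = cleaned.strip("._-")
    return cleaned[:max_length] or "unnamed"
```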

<h2 id="base-model-generation">Base Model Generation</h2>

<p>The entry point is either:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">text_to_model()</code></li>
  <li><code class="language-plaintext highlighter-rouge">image_to_model()</code></li>
</ul>

<p>Example:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">base_task_id</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="p">.</span><span class="nf">text_to_model</span><span class="p">(</span>
    <span class="n">prompt</span><span class="o">=</span><span class="n">args</span><span class="p">.</span><span class="n">prompt</span><span class="p">,</span>
    <span class="n">texture</span><span class="o">=</span><span class="nf">bool</span><span class="p">(</span><span class="n">args</span><span class="p">.</span><span class="n">texture</span><span class="p">),</span>
    <span class="n">face_limit</span><span class="o">=</span><span class="n">args</span><span class="p">.</span><span class="n">face_limit</span><span class="p">,</span>
    <span class="n">generate_parts</span><span class="o">=</span><span class="n">args</span><span class="p">.</span><span class="n">generate_parts</span><span class="p">,</span>
<span class="p">)</span>
</code></pre></div></div>

<p>Important parameters:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">texture</code>: Generate textures directly if set to <code class="language-plaintext highlighter-rouge">true</code>.</li>
  <li><code class="language-plaintext highlighter-rouge">face_limit</code>: Control mesh complexity by supplying the absolute face limit. Note that cost typically scales with this limit.</li>
  <li><code class="language-plaintext highlighter-rouge">generate_parts</code>: Ask the model to segment the object.</li>
  <li><code class="language-plaintext highlighter-rouge">smart_low_poly</code>: Useful for real-time applications - first generate a low-polygon representation and later expand to a high polygon count.</li>
</ul>

<div style="text-align: center">
    <img src="/assets/images/png/tripo3d_sample1_image.png" style="width:43.1%" />
    <img src="/assets/images/png/tripo3d_sample1_slicer.png" style="width:36.9%" />
</div>

<h2 id="discovering-parts">Discovering Parts</h2>

<p>One particularly interesting feature is automatic part discovery. Conceptually, you can think of the generative model not only producing a single mesh, but internally reasoning about the object as a composition of semantic substructures—wheels, handles, bodies, limbs, or decorative elements—very much like how a human would describe or sketch it. While the API does not expose this internal representation directly, traces of it appear in the task output, where parts may be listed explicitly or implicitly. By probing these structures defensively, the pipeline reconstructs a usable set of part identifiers.</p>

<p>This is powerful because it turns a monolithic generated mesh into something closer to a structured assembly. Once parts are identifiable, they can be processed independently: textured differently, exported separately, simplified or refined with different parameters, or even replaced downstream. In practical terms, this enables workflows that resemble classical CAD assemblies or game asset pipelines, but starting from a generative model rather than manual modeling.</p>

<p>What makes this particularly compelling is that it bridges a gap between purely artistic generation and engineering-style decomposition. Instead of treating the generated object as a static artifact, it becomes a manipulable system. For example, you can generate a complex object once, then iterate only on a specific component (e.g., retexturing just the “handle” or exporting only the “base” for printing). This selective control is where generative models begin to feel less like black boxes and more like cooperative tools that expose structure—imperfectly, but often sufficiently—to be integrated into real workflows.</p>

<p>Since the SDK does not clearly document where part names are stored, my script uses a defensive extraction strategy:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">discover_part_names</span><span class="p">(</span><span class="n">task</span><span class="p">:</span> <span class="n">Any</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">List</span><span class="p">[</span><span class="nb">str</span><span class="p">]:</span>
    <span class="n">candidates</span> <span class="o">=</span> <span class="p">[</span><span class="n">task</span><span class="p">,</span> <span class="nf">getattr</span><span class="p">(</span><span class="n">task</span><span class="p">,</span> <span class="sh">"</span><span class="s">output</span><span class="sh">"</span><span class="p">,</span> <span class="bp">None</span><span class="p">)]</span>
    <span class="bp">...</span>
</code></pre></div></div>

<p>This scans:</p>

<ul>
  <li>Raw task object</li>
  <li>Task output</li>
  <li>Serialized dictionary representations</li>
</ul>

<p>The result is a list of part names such as:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>["body", "wheel", "handle", "base"]
</code></pre></div></div>

<p>Note that this approach may break at any time: it worked while I wrote the script, but it relies on undocumented behaviour.</p>
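<p>In spirit, the defensive scan walks every candidate structure and collects anything that looks like a part name. A simplified, self-contained illustration of the idea (not the exact logic of the script):</p>

```python
from typing import Any, List

def discover_part_names(task: Any) -> List[str]:
    """Collect plausible part names from a task object (simplified sketch)."""
    names: List[str] = []

    def visit(node: Any) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                # Keys like "parts" / "part_names" tend to hold the list.
                if key in ("parts", "part_names") and isinstance(value, list):
                    for item in value:
                        if isinstance(item, str):
                            names.append(item)
                        elif isinstance(item, dict) and isinstance(item.get("name"), str):
                            names.append(item["name"])
                else:
                    visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)
        elif hasattr(node, "__dict__"):
            visit(vars(node))

    visit(task)
    # Deduplicate while preserving discovery order.
    return list(dict.fromkeys(names))
```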

<h2 id="texturing-strategies">Texturing Strategies</h2>

<p>Before looking at the concrete approaches, it is useful to briefly clarify what “texturing” actually means in this context. A generated 3D model typically consists of geometry (vertices, edges, faces) that define the shape, and separate surface information that defines how it looks. Texturing is the process of assigning image-based or procedurally generated information onto the surface of that geometry via UV mappings, effectively telling the renderer or downstream tool what color, roughness, metallic properties, and fine visual details each point on the surface should have. Without textures, most models look like uniform gray meshes; with textures, they become visually rich objects with materials such as wood, metal, fabric, or painted surfaces. In many pipelines this also includes PBR (physically based rendering) parameters, which control how light interacts with the surface. For purely functional workflows such as single-color 3D printing, however, textures are typically not required at all - the geometry alone is sufficient, and formats like STL intentionally ignore any surface appearance information.</p>

<p>There are two approaches implemented:</p>

<h3 id="whole-model-texturing">Whole-Model Texturing</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">texture_task_id</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="p">.</span><span class="nf">texture_model</span><span class="p">(</span>
    <span class="n">original_model_task_id</span><span class="o">=</span><span class="n">full_mesh_task_id</span><span class="p">,</span>
    <span class="n">texture</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
    <span class="n">pbr</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
    <span class="n">text_prompt</span><span class="o">=</span><span class="n">args</span><span class="p">.</span><span class="n">texture_prompt</span><span class="p">,</span>
<span class="p">)</span>
</code></pre></div></div>

<p>This produces a single coherent material.</p>

<h3 id="per-part-texturing">Per-Part Texturing</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">part_tex_task_id</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="p">.</span><span class="nf">texture_model</span><span class="p">(</span>
    <span class="n">part_names</span><span class="o">=</span><span class="p">[</span><span class="n">part_name</span><span class="p">],</span>
    <span class="bp">...</span>
<span class="p">)</span>
</code></pre></div></div>

<p>This enables:</p>

<ul>
  <li>Different materials per component</li>
  <li>Fine-grained control for asset pipelines</li>
</ul>

<h2 id="export-system">Export System</h2>

<p>The export stage is surprisingly powerful and still very simple.</p>

<h3 id="unified-export-function">Unified Export Function</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">export_one_format</span><span class="p">(...):</span>
    <span class="n">export_task_id</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="p">.</span><span class="nf">convert_model</span><span class="p">(</span>
        <span class="nb">format</span><span class="o">=</span><span class="n">fmt</span><span class="p">,</span>
        <span class="n">flatten_bottom</span><span class="o">=</span><span class="n">flatten_bottom</span><span class="p">,</span>
        <span class="n">pivot_to_center_bottom</span><span class="o">=</span><span class="n">pivot_to_center_bottom</span><span class="p">,</span>
        <span class="n">pack_uv</span><span class="o">=</span><span class="n">pack_uv</span><span class="p">,</span>
    <span class="p">)</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">flatten_bottom</code> option modifies the geometry such that the lowest region of the model is projected onto a plane, effectively creating a flat base. This is particularly useful for 3D printing because many printers require stable contact with the build plate. Without a flat surface, models may require support structures, which increase print time, material usage, and post-processing effort. By flattening the bottom, the model can often be printed directly, improving adhesion and reliability.</p>

<p>The <code class="language-plaintext highlighter-rouge">pivot_to_center_bottom</code> parameter adjusts the coordinate system of the model such that its origin is moved to the center of the base. This is not just a convenience for slicers, but fundamentally changes how the model is positioned and manipulated in downstream tools. With this pivot, rotations occur around a physically meaningful point (the contact surface), and placement into scenes or assemblies becomes more predictable. For printing workflows, this often means the object appears correctly aligned on the build plate without additional transformations.</p>
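<p>The service performs these transformations server-side, so the exact algorithms are unknown; the following standalone sketch merely illustrates what the two geometric options conceptually do to a vertex list:</p>

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]

def flatten_bottom(vertices: List[Vertex], tolerance: float = 0.5) -> List[Vertex]:
    """Snap every vertex within `tolerance` of the lowest Z onto that plane."""
    z_min = min(v[2] for v in vertices)
    return [(x, y, z_min if z - z_min <= tolerance else z)
            for x, y, z in vertices]

def pivot_to_center_bottom(vertices: List[Vertex]) -> List[Vertex]:
    """Translate so the XY center of the bounding box sits at the origin
    and the lowest point rests on Z = 0."""
    xs, ys, zs = zip(*vertices)
    cx = (min(xs) + max(xs)) / 2.0
    cy = (min(ys) + max(ys)) / 2.0
    z_min = min(zs)
    return [(x - cx, y - cy, z - z_min) for x, y, z in vertices]
```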

<p>The <code class="language-plaintext highlighter-rouge">pack_uv</code> parameter operates on the texture coordinate space rather than geometry. It reorganizes the UV layout to make more efficient use of available texture space. This reduces wasted texture area, improves resolution of surface details, and is especially relevant when exporting to formats used in rendering or game engines where texture memory and quality are important.</p>

<h4 id="supported-formats">Supported Formats</h4>

<p><code class="language-plaintext highlighter-rouge">STL</code> is the most basic and widely supported format for 3D printing. It encodes only the surface geometry as a triangle mesh and intentionally contains no information about colors, materials, or textures. This simplicity makes it robust and universally compatible, but also limits it to purely geometric workflows. The major advantage is that <code class="language-plaintext highlighter-rouge">STL</code> is <a href="https://github.com/tspspi/libstlio">extremely simple to implement</a> in comparison to the other alternatives.</p>

<p><code class="language-plaintext highlighter-rouge">3MF</code> can be seen as a modern replacement for <code class="language-plaintext highlighter-rouge">STL</code>. It supports not only geometry but also metadata such as colors, materials, multiple objects in a single file, and even printer-specific settings. For advanced printing workflows, especially with multi-material or color printers, <code class="language-plaintext highlighter-rouge">3MF</code> is often the better choice.</p>

<p><code class="language-plaintext highlighter-rouge">GLTF</code> and <code class="language-plaintext highlighter-rouge">FBX</code> are formats primarily used in rendering, simulation, and game engines. They support hierarchical scene structures, materials, textures, animations, and sometimes skeletal rigs. <code class="language-plaintext highlighter-rouge">GLTF</code> is designed as a modern, efficient, and open standard (often described as the <em>JPEG of 3D</em>), while <code class="language-plaintext highlighter-rouge">FBX</code> is older, widely supported, and deeply integrated into many commercial tools.</p>

<p><code class="language-plaintext highlighter-rouge">USDZ</code> is a format designed for augmented reality ecosystems, particularly in environments like mobile devices. It supports compact packaging of geometry, materials, and animations in a way that is optimized for real-time rendering and distribution, making it suitable for AR previews or interactive product visualization.</p>

<h3 id="per-part-export">Per-Part Export</h3>

<p>For manufacturing workflows, splitting models is often essential.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">export_parts_for_format</span><span class="p">(...):</span>
    <span class="k">for</span> <span class="n">idx</span><span class="p">,</span> <span class="n">part_name</span> <span class="ow">in</span> <span class="nf">enumerate</span><span class="p">(</span><span class="n">part_names</span><span class="p">,</span> <span class="n">start</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
        <span class="n">stem</span> <span class="o">=</span> <span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">base_stem</span><span class="si">}</span><span class="s">__</span><span class="si">{</span><span class="n">idx</span><span class="si">:</span><span class="mi">03</span><span class="n">d</span><span class="si">}</span><span class="s">__</span><span class="si">{</span><span class="nf">sanitize_name</span><span class="p">(</span><span class="n">part_name</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span>
</code></pre></div></div>

<p>This results in:</p>

<ul>
  <li>Individually printable components</li>
  <li>Clean naming</li>
  <li>Structured manifests</li>
</ul>
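
<p>The <code class="language-plaintext highlighter-rouge">sanitize_name</code> helper referenced in the snippet above is not shown; the following is only a minimal sketch of what such a function might look like (a hypothetical implementation, not the script's actual code):</p>

```python
import re

def sanitize_name(name: str) -> str:
    # Hypothetical helper: reduce an arbitrary part name to a
    # filesystem-safe stem by lowercasing, collapsing runs of
    # non-alphanumeric characters into single underscores and
    # trimming leading/trailing underscores.
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_") or "part"

print(sanitize_name("Left Arm (v2)"))  # left_arm_v2
```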

<h2 id="optional-rigging">Optional Rigging</h2>

<p>The pipeline also supports automatic rigging. In the context of 3D graphics, <em>rigging</em> refers to adding an internal skeleton (a hierarchy of bones or joints) to a static mesh, together with weights that define how each part of the surface deforms when those bones move. You can think of it as turning a rigid statue into something that can be posed or animated: bending an arm, rotating a head, or walking becomes possible because the mesh is now bound to this underlying structure. In practice, rigging also defines constraints, joint limits, and sometimes control handles that make animation easier to author. For many readers coming from CAD or printing workflows this concept may be unfamiliar, since purely geometric models are typically static; however, for game engines, simulation, or character animation, rigging is the essential step that converts geometry into something dynamic and controllable.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">rig_task_id</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="p">.</span><span class="nf">rig_model</span><span class="p">(</span>
    <span class="n">rig_type</span><span class="o">=</span><span class="nf">normalize_rig_type</span><span class="p">(</span><span class="n">args</span><span class="p">.</span><span class="n">rig_type</span><span class="p">),</span>
    <span class="n">spec</span><span class="o">=</span><span class="nf">normalize_rig_spec</span><span class="p">(</span><span class="n">args</span><span class="p">.</span><span class="n">rig_spec</span><span class="p">),</span>
<span class="p">)</span>
</code></pre></div></div>

<p>Supported rig types correspond to different anatomical or kinematic archetypes. A <strong>biped</strong> rig assumes two legs and typically two arms arranged around a vertical spine; it is the standard for humanoid characters and benefits from well-established conventions (for example Mixamo-style skeletons), making animation retargeting straightforward. A <strong>quadruped</strong> rig is optimized for four-legged locomotion with a horizontal spine and coordinated gait cycles; it better captures weight distribution and natural motion for animals like dogs or horses, but requires different animation clips and controllers than bipeds. An <strong>avian</strong> rig introduces wings and often tail articulation, with joints arranged to support flapping, folding, and gliding; it is useful for birds or winged creatures but can be more complex due to additional degrees of freedom and coupled motions. A <strong>serpentine</strong> rig represents elongated bodies composed of many segments; instead of discrete limbs, motion is produced by propagating waves along the body, which is ideal for snakes or tentacle-like structures but requires spline- or chain-based control schemes.</p>

<p>Each choice encodes <em>assumptions</em> about joint hierarchy, constraints, and typical motion patterns. The advantage is that the resulting skeleton is immediately compatible with common animation tools and libraries for that class, enabling reuse of existing animation data (retargeting) and predictable behavior in physics or IK solvers. The downside is that mismatching the rig type to the geometry can produce unnatural deformation or require additional cleanup.</p>

<p>In animation and game pipelines this is extremely valuable because it converts a static mesh into a controllable asset that can be posed, animated, and simulated in real time. Engines rely on skeletal animation for efficiency (skinning on the GPU), blending between clips (like idle, walk, run), inverse kinematics for interactions (feet on ground, hands on objects), and physics-driven secondary motion. With a suitable rig, the same model can be reused across scenes and behaviors, integrated into state machines, and driven by gameplay logic, turning a generated object into a fully interactive entity rather than a fixed piece of geometry, with minimal or no additional manual work.</p>

<h2 id="cli-interface">CLI Interface</h2>

<p>My script exposes the described features through a command line interface, making it easy to integrate into other tools.</p>

<p>Example usage:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./tripo.py \
  --mode text \
  --prompt "small steampunk robot with tracks" \
  --texture \
  --generate-parts \
  --export-stl \
  --out ./output
</code></pre></div></div>
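
<p>Internally, such a CLI can be wired up with <code class="language-plaintext highlighter-rouge">argparse</code>. The following is only a sketch covering the flags shown in the usage example above, not the script's full option surface:</p>

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Sketch of the CLI surface from the usage example; the real script
    # exposes more options (rig type, additional export formats, ...).
    parser = argparse.ArgumentParser(description="Tripo3D generation pipeline")
    parser.add_argument("--mode", choices=["text", "image"], required=True)
    parser.add_argument("--prompt", help="text prompt for --mode text")
    parser.add_argument("--texture", action="store_true")
    parser.add_argument("--generate-parts", action="store_true")
    parser.add_argument("--export-stl", action="store_true")
    parser.add_argument("--out", default="./output")
    return parser

args = build_parser().parse_args(
    ["--mode", "text", "--prompt", "small steampunk robot", "--texture"]
)
```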

<h2 id="practical-observations">Practical Observations</h2>

<p>A few interesting takeaways from using this setup:</p>

<ul>
  <li>The API becomes particularly powerful in the context of automated workflows. While the web UI remains the superior tool for artists - who can leverage it to iteratively refine and craft highly detailed results - the API excels when building fully automated pipelines, especially for users without artistic modeling skills. In these scenarios it enables reproducible generation, large-scale asset creation, and efficient scaling. Additionally, for many hobbyist use cases it can even be more cost-effective, as billing is typically based on actual usage rather than fixed subscription periods.</li>
  <li>In practice, the results are often surprisingly close to what one would intuitively expect from the prompt, frequently requiring little to no additional refinement.</li>
  <li>Part extraction is extremely useful but not always perfect. It generates consistent, matching parts that remain separated into individual objects.</li>
  <li>Texture alignment strongly influences visual quality, and in practice generating a consistent texture on the full model first and only then splitting it into parts tends to yield significantly more coherent results than designing or texturing each part individually.</li>
  <li>Export parameters (especially pivot and flattening) matter a lot for downstream workflows</li>
</ul>

<p>Most importantly: once this pipeline exists, it becomes trivial to generate hundreds of assets in a reproducible way.</p>

<h2 id="moving-from-models-to-reality">Moving from Models to Reality</h2>

<p>If you want to move from a virtual 3D object to a physical one with minimal effort, use the <a href="https://github.com/ultimaker/cura">Cura slicer</a> together with a simple 3D printer such as the <a href="https://amzn.to/4dwS1qB">Creality Ender3 V3 SE</a>, the newer <a href="https://amzn.to/4tvoZMQ">Creality Ender3 V3 KE</a> featuring a ceramic heater, or a more costly but capable multi-filament printer like the <a href="https://amzn.to/4sPUXmV">Creality K2 Plus</a> for multi-color prints. Most models created for decorative purposes are directly printable without further modification. If you do need to modify a model, use either the slicer's limited editing capabilities or move on to <a href="https://www.blender.org/">Blender</a>, an amazing tool with a steep but rewarding learning curve, before entering the CAD/CAM pipeline.</p>

<h2 id="outlook">Outlook</h2>

<p>There are several obvious extensions:</p>

<ul>
  <li>Integration into CAD/CAM pipelines (like automatic toolpath generation)</li>
  <li>Coupling with simulation environments</li>
  <li>Closed-loop optimization (generate, evaluate, refine patterns)</li>
  <li>Integration into larger automated pipelines (for example built in <code class="language-plaintext highlighter-rouge">n8n</code>), combining idea and description generation, consistent image generation via diffusion models, 3D asset creation, assembly into animated scenes, and optionally fully automated video production workflows</li>
</ul>

<p>From a systems perspective, this is where things get interesting: the moment generative models become just another pseudo-deterministic component in an engineering pipeline.</p>

<h2 id="conclusion">Conclusion</h2>

<p>Using the <a href="https://platform.tripo3d.ai/">Tripo3D API</a> directly allows transforming 3D generation from an interactive tool into a programmable system component. By structuring the workflow into tasks, capturing metadata, and enforcing pseudo-deterministic outputs, the script provides a solid foundation for integrating generative 3D models into real engineering processes. What makes this particularly compelling is not just the ability to generate individual assets, but to embed generation into larger systems: once automated, the process scales naturally, turning what would otherwise be manual, creative effort into a reproducible and extensible pipeline.</p>

<p>At the same time, this does not replace traditional modeling or artistic workflows, but complements them. Artists will continue to achieve higher-quality and more refined results using interactive tools, while API-driven approaches enable entirely different use cases: batch generation, rapid prototyping, dataset creation, and integration into simulation, robotics, or manufacturing pipelines. In this sense, generative 3D becomes less of a standalone tool and more of a building block that can be composed with other systems.</p>

<p>For practical applications—whether generating printable objects, populating virtual environments, or building automated content pipelines—the combination of automation, usage-based cost models, and flexible export capabilities makes this approach particularly attractive. It allows engineers and technically inclined users to access domains that previously required significant artistic effort, while still leaving room for manual refinement where needed.</p>

<p>If you are already working with CNC, simulation, robotics, or content pipelines, this is where the real shift begins: the moment 3D asset generation becomes just another programmable step in a larger system, rather than a separate, manual process.</p>

<div style="text-align: center">
    <img src="/assets/images/png/tripo3d_sample3_image.png" style="width:25.3%" alt="The image model generated via SD" />
    <img src="/assets/images/png/tripo3d_sample3_slicer.png" style="width:28.9%" alt="The mesh of the dragon curtain holder" />
    <img src="/assets/images/png/tripo3d_sample3_print.png" style="width:25.8%" alt="Printed and spray painted curtain holder" />
</div>

<h2 id="references">References</h2>

<ul>
  <li>The <a href="https://www.tripo3d.ai/">Tripo3D frontend</a></li>
  <li>The <a href="https://platform.tripo3d.ai/">Tripo3D API</a></li>
</ul>

<h3 id="useful-tools-and-resources">Useful Tools and Resources</h3>

<ul>
  <li><a href="https://amzn.to/3QlMDgk">Learning Blender</a> by Oliver Villar is a good resource to learn the basics of 3D rendering, animation and composition in <a href="https://www.blender.org/">Blender</a></li>
  <li><a href="https://github.com/ultimaker/cura">Cura slicer</a> to turn models into machine paths</li>
  <li>3D printers:
    <ul>
      <li>The cheap but high-quality <a href="https://amzn.to/4dwS1qB">Creality Ender3 V3 SE</a> or the newer <a href="https://amzn.to/4tvoZMQ">Creality Ender3 V3 KE</a>, both very beginner-friendly and easy to repair and tune yourself</li>
      <li>A more costly but way more capable multi filament printer like the <a href="https://amzn.to/4sPUXmV">Creality K2 Plus</a> for multi color prints.</li>
    </ul>
  </li>
  <li>In Austria <a href="https://www.3djake.at/">3DJake</a> is a very good and reliable source for filaments.</li>
</ul>

<h2 id="the-complete-script">The Complete Script</h2>

<p>The complete script is available as a <a href="https://gist.github.com/tspspi/a0dbffa5c95f48224f7100018b88614d">GitHub GIST</a>:</p>

<script src="https://gist.github.com/tspspi/a0dbffa5c95f48224f7100018b88614d.js"></script>]]></content><author><name>tsp</name></author><category term="Programming" /><category term="Python" /><category term="Tutorial" /><category term="3D printing" /><category term="CAD" /><category term="Machine learning" /><category term="How stuff works" /><category term="Automation" /><summary type="html"><![CDATA[Modern generative systems are beginning to reshape how 3D assets are created, lowering the barrier between idea and implementation. While most platforms focus on interactive web interfaces, their real potential emerges when treated as programmable components in an automated pipeline. By combining text or image based generation with structured processing, it becomes possible to create automated, scalable workflows that produce not just individual models, but entire libraries of assets. This article explores how to build such a pipeline using the Tripo3D API, focusing on task-based execution, metadata tracking, deterministic file organization, and flexible export strategies. Rather than replacing traditional CAD or artistic workflows, this approach complements them, bridging the gap between generative models and engineering processes, and turning 3D asset creation into a fully automatable system.]]></summary></entry><entry><title type="html">A Better Approach on Billing for Electricity</title><link href="https://www.tspi.at/2026/04/03/betterebilling.html" rel="alternate" type="text/html" title="A Better Approach on Billing for Electricity" /><published>2026-04-03T00:00:00+02:00</published><updated>2026-04-03T19:10:37+02:00</updated><id>https://www.tspi.at/2026/04/03/betterebilling</id><content type="html" xml:base="https://www.tspi.at/2026/04/03/betterebilling.html"><![CDATA[<ul>
  <li><a href="#introduction">Introduction</a>
    <ul>
      <li><a href="#why-variability-is-expensive">Why variability is expensive</a></li>
    </ul>
  </li>
  <li><a href="#current-payment-models-for-households">Current Payment Models for Households</a></li>
  <li><a href="#proposed-billing-model">Proposed Billing Model</a>
    <ul>
      <li><a href="#optional-higher-order-penalty">Optional Higher-Order Penalty</a></li>
      <li><a href="#the-total-bill">The Total Bill</a></li>
      <li><a href="#discrete-implementation">Discrete Implementation</a></li>
    </ul>
  </li>
  <li><a href="#conclusion">Conclusion</a></li>
  <li><a href="#appendix-how-to-reduce-your-load-variation-in-practice">Appendix: How to Reduce Your Load Variation in Practice</a>
    <ul>
      <li><a href="#avoid-sudden-large-load-steps">Avoid Sudden Large Load Steps</a></li>
      <li><a href="#use-smart-scheduling-for-flexible-loads">Use Smart Scheduling for Flexible Loads</a></li>
      <li><a href="#decouple-consumption-from-instantaneous-demand-buffering">Decouple Consumption from Instantaneous Demand (Buffering)</a></li>
      <li><a href="#limit-high-frequency-switching">Limit High-Frequency Switching</a></li>
      <li><a href="#coordinate-loads-within-the-household">Coordinate Loads Within the Household</a></li>
      <li><a href="#monitoring">Monitoring</a></li>
      <li><a href="#shift-thinking-from-total-energy-to-smooth-power">Shift Thinking from Total Energy to Smooth Power</a></li>
    </ul>
  </li>
  <li><a href="#references">References</a></li>
</ul>

<p><img src="/assets/images/png/powergrid001_small.png" alt="" /></p>

<h2 id="introduction">Introduction</h2>

<p>Electric power systems are not monolithic. They are a layered composition of generation assets, each optimized for a different role in time and variability.</p>

<p><strong>Baseload plants</strong> such as run-of-river, nuclear and coal power plants are designed to operate continuously at or near their nominal output. Their economics are dominated by high capital expenditure and very low marginal costs. These plants achieve their lowest levelized cost of electricity when running steadily. Any deviation, like ramping up and down, partial loading or cycling, reduces thermodynamic efficiency, increases mechanical stress and raises the effective cost per kWh.</p>

<p><strong>Load following plants</strong> like combined cycle gas turbines and reservoir hydro can adjust output on timescales of minutes to hours. They fill the gap between baseload and fast-response resources. Their marginal costs are higher than baseload, but they are far more flexible.</p>

<p><strong>Peaking and balancing resources</strong> like open cycle gas turbines, pumped hydro storage and batteries are designed for rapid response on time scales of seconds to minutes (which cannot be achieved by load-following or baseload plants). They are essential for:</p>

<ul>
  <li>Frequency regulation</li>
  <li>Contingency reserves (i.e. responding to sudden outages or sudden demands)</li>
  <li>Intra day balancing</li>
</ul>

<p>These resources have the highest marginal cost and often low utilization. Their costs must be covered by fewer operating hours which means they are the most expensive resources on the grid, but they cannot be replaced by baseload or load following plants.</p>

<p>The timescales the grid operates on can be separated into four categories, which roughly divide into a technical stage (primary and secondary control) and an economic stage (tertiary control and day-ahead or intraday markets):</p>

<ul>
  <li>The <em>primary control</em> happens on the scale of seconds. This is automatic frequency response and requires extremely fast ramping resources (network capacity, open cycle gas turbines, batteries)</li>
  <li>The <em>secondary control</em> happens on the scale of tens of seconds to minutes. Here centralized control restores frequency and balances control areas. This still requires fast ramping resources (open cycle gas turbines, batteries, and partially already combined cycle gas turbines and reservoir hydro).</li>
  <li><em>Tertiary control</em> happens on the scale of minutes to hours and handles economic dispatch and reserve replacement.</li>
  <li><em>Day-ahead</em> and <em>intraday markets</em> operate on the scale of hours to days. Here generation is scheduled and planned via economic rules (comparable to a stock market).</li>
</ul>

<h3 id="why-variability-is-expensive">Why variability is expensive</h3>

<p>For many people, energy cost is determined only by the total energy (in kWh) consumed. But just as important is the shape of the load curve. A perfectly constant load can be served almost entirely by baseload generation. A highly variable load, on the other hand, requires more reserves held online, more ramping of thermal units, increased cycling (and thus wear and maintenance costs) as well as higher reliance on fast-response (expensive) assets. Rapid household-level switching aggregates across millions of users into steep net-load ramps. This leads to transformer and distribution stress (wear and infrastructure cost), voltage deviations and increased reserve requirements. These integration and balancing costs add up massively at higher variability levels[<a href="#ref1">1</a>]. The same total energy volume can cost a retailer far more[<a href="#ref2">2</a>] if the load shape is spiky and variable, due to hedging costs, peak procurement and lost opportunities to use low-cost periods.</p>

<p>Frequency regulation markets see price spikes with higher variability or renewable penetration. For example, increasing wind from low penetration to 30% can raise regulation prices by 32%, and doubling regulation requirements can increase them by 84%. Flexible resources (hydro, batteries) help mitigate this, but the baseline cost of fast response is clearly elevated compared to steady operation[<a href="#ref3">3</a>]. Smoothing or shifting load (i.e. the opposite of rapid switching) lowers system costs by reducing peak demand, deferring infrastructure and dampening price volatility. This can cut operational expenses, improve reliability and avoid billions in distribution upgrades[<a href="#ref4">4</a>].</p>

<p><strong>Variable residential profiles are far more expensive to serve than steady commercial and industrial ones</strong>. In addition, the variability injected by renewables like solar and wind incurs the same kind of cost, and is the dominant cause of load variations at this point in time. Load variability also compresses grid margins through tighter reserves and more cycling of equipment[<a href="#ref2">2</a>].</p>

<h2 id="current-payment-models-for-households">Current Payment Models for Households</h2>

<p>Today, most residential billing is based almost exclusively on total energy consumption:</p>

[
\begin{aligned}
    E = \int_0^T P(t) \mathrm{d}t
\end{aligned}
]

<p>This model implicitly assumes that all kWh are equal, regardless of when or how they are consumed. In reality, however, a kWh drawn during a stable low-demand period can be extremely cheap due to the very low marginal cost of baseload energy. The real system cost driver for households is the variability they inject (suddenly powering an EV charger, a heat pump, an air conditioner or an induction stove on or off, plugging or unplugging chargers, switching devices on and off, etc.), which forces the grid operator to deploy fast-response resources (like gas peaking supply, pumped hydro storage, hydrogen storage and batteries) that are far more expensive per kWh than baseload.</p>

<p>Some tariffs already introduce demand charges:</p>

[
P_\mathrm{max} = \max_{t} P(t)
]

<p>However, this only captures the maximum level, not the dynamics that cause the bulk of the real costs.</p>

<p>Modern smart meters already sample power at intervals between 1 and 60 seconds (while only reporting on timescales of around 15 minutes). Therefore, the <em>temporal structure</em> of the consumption is already observable without changes to infrastructure; it is just not used in billing.</p>

<h2 id="proposed-billing-model">Proposed Billing Model</h2>

<p>In the following section we extend the billing model to include not only total energy but also the <em>temporal variation</em> of the load.</p>

<p>Let</p>

<ul>
  <li>$P(t)$: instantaneous power (kW)</li>
  <li>$T$: billing interval (for example $T=720h$ for one month)</li>
</ul>

<p>Then the model includes two key quantities:</p>

<p>The first is, as in the traditional model, the <strong>energy consumption</strong>. Its rate can be kept very small or even zero, reflecting the very low marginal cost of steady generation:</p>

[
E = \int_0^T P(t) \mathrm{d}t
]

<p>The second term is the <strong>load variation</strong>:</p>

[
V = \int_0^T \mid \frac{\mathrm{d}P}{\mathrm{d}t} \mid \mathrm{d}t
]

<p>This is the <em>total variation</em> of $P(t)$. Every time one turns a 2 kW load on and off again, $V$ increases by 4 kW, regardless of how long it was on. Rapid cycling drives $V$ up dramatically, while a steady load barely moves it. This term is <em>independent of duration</em> and <strong>reflects how aggressively the grid is stressed</strong>.</p>
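
<p>The duration independence is easy to verify numerically. A minimal sketch of the discrete total variation of a uniformly sampled power trace (values in kW):</p>

```python
def total_variation(samples):
    # Discrete total variation: sum of absolute sample-to-sample
    # changes of the power trace, in kW. Independent of duration.
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))

# A 2 kW load switched on and off contributes 2 + 2 = 4 kW,
# regardless of how long it stays on:
assert total_variation([0, 2, 0]) == 4
assert total_variation([0, 2, 2, 2, 2, 2, 0]) == 4

# Rapid cycling of the same load drives V up dramatically:
assert total_variation([0, 2] * 10 + [0]) == 40  # ten on/off cycles
```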

<p>An even more physical, but slightly more complicated, model would incorporate the frequency response of the system (i.e. provide a <em>frequency cost</em>):</p>

[
V_\mathrm{phys} = \int_0^T \mid \mathfrak{H}(\omega) P(\omega) \mid^2 \mathrm{d} \omega
]

<h3 id="optional-higher-order-penalty">Optional Higher-Order Penalty</h3>

<p>To penalize rapid fluctuations even more strongly, high-pass filtering and the square of the derivative can be used:</p>

[
V = \int_0^T \left( \frac{\mathrm{d}P}{\mathrm{d}t} \right)^2 \mathrm{d}t
]

<p>Such a term would in particular hit high frequency switching devices without proper filtering.</p>

<h3 id="the-total-bill">The Total Bill</h3>

<p>The monthly total bill $B$ is given by</p>

[
B = c_E E + c_V V + c_D \max_{t} P(t) + F
]

<p>The factors are given by:</p>

<ul>
  <li>$c_E$: The energy price, which could be set very low (even in the range of 1 cent per kWh).</li>
  <li>$c_V$: The variation rate charged on load swings. This layer recovers the balancing and infrastructure costs and has to be tuned so the utility covers its fixed and control costs.</li>
  <li>$c_D$ is an optional demand rate charging for the maximum available power. I personally would not introduce this.</li>
  <li>$F$ is a fixed customer charge handling the provisioning of metering infrastructure, the paperwork and support availability.</li>
</ul>

<h3 id="discrete-implementation">Discrete Implementation</h3>

<p>Due to the discrete sampling of the smart meters in units of $\Delta t$ the real implementation would of course be discrete:</p>

[
V \propto \sum_{i=1}^N \mid P_i - P_{i-1} \mid
]
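
<p>In code, the discrete billing computation could be sketched as follows. All rate constants here are illustrative placeholders, not proposed tariff values:</p>

```python
def monthly_bill(samples_kw, dt_s, c_e=0.01, c_v=0.05, c_d=0.0, fixed=5.0):
    # B = c_E * E + c_V * V + c_D * max(P) + F from uniformly sampled
    # power readings (kW) taken every dt_s seconds. The rate constants
    # are illustrative placeholders.
    hours = dt_s / 3600.0
    energy_kwh = sum(samples_kw) * hours                  # E = integral of P dt
    variation_kw = sum(abs(b - a)                         # V = total variation
                       for a, b in zip(samples_kw, samples_kw[1:]))
    peak_kw = max(samples_kw)
    return c_e * energy_kwh + c_v * variation_kw + c_d * peak_kw + fixed

# Same total energy (1 kWh), very different shape:
steady = [1.0] * 3600        # 1 kW for one hour, sampled every second
spiky  = [2.0, 0.0] * 1800   # 2 kW bursts averaging 1 kW
assert monthly_bill(steady, 1) < monthly_bill(spiky, 1)
```

Under this scheme the steady profile pays essentially only the energy and fixed charges, while the spiky one accumulates a large variation term despite consuming exactly the same energy.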

<h2 id="conclusion">Conclusion</h2>

<p>This system could be implemented easily with existing smart-meter infrastructure available in many parts of Europe and would give the customer a clear financial incentive to smooth their load. Such a system would provide behavioural incentives reflecting the real cost structure:</p>

<ul>
  <li><strong>Encouraged behaviour</strong>:
    <ul>
      <li>Slow ramping of loads (soft starts, no abrupt switching of large loads, etc.)</li>
      <li>Buffering via batteries or thermal storage</li>
      <li>Scheduling of flexible loads</li>
    </ul>
  </li>
  <li><strong>Discouraged behaviour</strong>:
    <ul>
      <li>Rapid on/off cycling of loads</li>
      <li>Synchronized switching across households</li>
      <li>High frequency load fluctuations</li>
      <li>Non-conforming devices leaking high-frequency switching noise into the grid</li>
    </ul>
  </li>
</ul>

<p>Industrial consumers already face complex tariffs reflecting demand peaks, power factor and time-of-use. Residential users currently do not, despite contributing significantly to variability. This model <strong>aligns individual cost with system impact</strong>, making pricing more economically efficient and fair. Importantly this model is <strong>immediately implementable</strong> using existing infrastructure.</p>

<h2 id="appendix-how-to-reduce-your-load-variation-in-practice">Appendix: How to Reduce Your Load Variation in Practice</h2>

<p>If billing starts to reflect not only <em>how much energy</em> is consumed but also <em>how it is consumed</em>, then reducing load variation becomes both economically and technically meaningful. Fortunately, many strategies are already available with relatively simple tools.</p>

<h3 id="avoid-sudden-large-load-steps">Avoid Sudden Large Load Steps</h3>

<p>The main driver of load variation is not energy usage itself, but rapid changes in power. Typical problematic patterns include:</p>

<ul>
  <li>Switching high-power devices abruptly (like EV chargers, heaters, induction stoves, etc.)</li>
  <li>Multiple devices turning on simultaneously</li>
  <li>Thermostatic systems oscillating between full on and off</li>
</ul>

<p>Whenever possible:</p>

<ul>
  <li>Prefer devices with soft-start or ramping behaviour</li>
  <li>Avoid manually switching multiple high loads at once</li>
  <li>Stagger the activation of large consumers by a few seconds</li>
</ul>

<p>Even small delays between devices can significantly reduce aggregated grid stress.</p>
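
<p>Note that staggering two identical loads does not change the plain total variation $V$, but it does reduce the optional squared-derivative penalty introduced earlier, which is exactly the term targeting steep simultaneous steps. A small sketch:</p>

```python
def tv(samples):
    # Plain total variation (kW): insensitive to how steep a step is.
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))

def tv_sq(samples):
    # Squared-difference penalty (proportional to the integral of
    # (dP/dt)^2 for a fixed sampling interval): punishes one large
    # simultaneous step more than several staggered smaller ones.
    return sum((b - a) ** 2 for a, b in zip(samples, samples[1:]))

# Two 2 kW devices, switched together vs. a few samples apart:
together  = [0, 4, 4, 4, 0]
staggered = [0, 2, 4, 4, 2, 0]

assert tv(together) == tv(staggered) == 8    # same plain variation
assert tv_sq(staggered) < tv_sq(together)    # 16 < 32: staggering pays off
```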

<h3 id="use-smart-scheduling-for-flexible-loads">Use Smart Scheduling for Flexible Loads</h3>

<p>Here it gets more interesting and technical. Some loads are inherently flexible in time:</p>

<ul>
  <li>EV charging</li>
  <li>Dishwashers</li>
  <li>Washing machines</li>
  <li>Water heating</li>
</ul>

<p>Instead of starting them immediately, shift them to periods where your total load is already stable. A simple strategy is:</p>

<ul>
  <li>Only start new loads when current consumption is low and stable</li>
  <li>Avoid starting devices during existing ramps (e.g. while heating systems are activating)</li>
</ul>

<p>Home automation systems can implement this with simple rules.</p>
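
<p>As a sketch of such a rule, a deferred load could be released only when the recent grid draw is both low and flat (all thresholds below are illustrative):</p>

```python
def can_start_load(recent_kw, low_kw=1.0, max_swing_kw=0.3):
    # Gate for a deferred flexible load: start only when the recent
    # power trace is both low and stable. Thresholds are illustrative.
    return (max(recent_kw) <= low_kw and
            max(recent_kw) - min(recent_kw) <= max_swing_kw)

assert can_start_load([0.4, 0.5, 0.45])      # low and stable: start
assert not can_start_load([0.4, 2.5, 0.6])   # mid-ramp: wait
assert not can_start_load([1.4, 1.5, 1.45])  # stable but not low: wait
```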

<h3 id="decouple-consumption-from-instantaneous-demand-buffering">Decouple Consumption from Instantaneous Demand (Buffering)</h3>

<p>This could be done in various ways:</p>

<ul>
  <li><strong>Electrical buffering</strong>: Small home batteries can absorb rapid changes, even modest capacities (1-5 kWh) are sufficient to smooth short spikes. Charging and discharging can be controlled to keep grid-facing power nearly constant.</li>
  <li><strong>Thermal buffering</strong>: Heat pumps with buffer tanks, and water boilers, store thermal energy. These systems can run at constant power and store energy for later use, transforming a spiky load into a smooth baseline.</li>
</ul>
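
<p>The electrical-buffering idea can be illustrated with a toy controller that draws a constant target power from the grid and lets the battery absorb the difference (lossless and rate-unlimited, with illustrative numbers):</p>

```python
def smooth_with_battery(load_kw, dt_h, target_kw, capacity_kwh, soc_kwh=0.0):
    # Toy controller: the grid should supply a constant target power;
    # the battery charges on surplus and discharges on deficit while
    # its state of charge allows. Lossless, rate-unlimited sketch.
    grid = []
    for p in load_kw:
        delta_kwh = (target_kw - p) * dt_h
        new_soc = min(max(soc_kwh + delta_kwh, 0.0), capacity_kwh)
        # Whatever the battery could not absorb or supply hits the grid:
        grid.append(p + (new_soc - soc_kwh) / dt_h)
        soc_kwh = new_soc
    return grid

# A spiky 0/3 kW load (average 1.5 kW), sampled every 0.1 h, becomes a
# flat 1.5 kW grid draw with even a small battery:
grid = smooth_with_battery([0.0, 3.0] * 4, 0.1, target_kw=1.5, capacity_kwh=1.0)
assert all(abs(g - 1.5) < 1e-9 for g in grid)
```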

<h3 id="limit-high-frequency-switching">Limit High-Frequency Switching</h3>

<p>Many modern devices (especially cheap power electronics) introduce rapid fluctuations. These include poorly filtered switching power supplies, fast PWM-controlled heaters and cheap motor controllers. While individually small, these effects can accumulate. To mitigate them, prefer higher-quality devices with proper filtering, add input filtering (LC filters) where applicable and avoid unnecessarily rapid on/off control loops. In a billing model sensitive to variation, these small fluctuations may become economically visible.</p>

<h3 id="coordinate-loads-within-the-household">Coordinate Loads Within the Household</h3>

<p>A household is not a single device but a combination of dynamic systems. With coordination, the total load variation can be reduced by ramping devices up and down in concert, keeping the total power approximately constant. Coordinating devices such as EV chargers, heating or washing machines in particular can drastically reduce load variations.</p>

<h3 id="monitoring">Monitoring</h3>

<p>Without monitoring one never knows what to optimize - or whether there is anything to optimize at all. Monitoring can be done very easily via</p>

<ul>
  <li><a href="https://amzn.to/3OmwqqF">Digitally readable smart meters</a> providing time resolved readings</li>
  <li><a href="https://amzn.to/4sQ8YB4">Energy monitors</a> providing the total consumed energy</li>
  <li>Logging systems based on digitally readable smart meters and switch status of home automation systems.</li>
</ul>

<p>This allows one to visualize the load profile $P(t)$ and the rate of load changes $\Delta P(t)$. When looking at the data, inefficiencies often become immediately obvious.</p>
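<p>Deriving the discrete rate of load change from such a log is straightforward; the sample data and the ramp threshold below are invented for illustration:</p>

```python
# Sketch: compute a discrete ΔP(t) from logged power samples and flag
# steep ramps. Sample values and the 1000 W threshold are illustrative.

def load_changes(samples_w):
    """Successive differences of a power log, i.e. a discrete ΔP(t)."""
    return [b - a for a, b in zip(samples_w, samples_w[1:])]

log = [500, 520, 2600, 2580, 700]             # W, one sample per interval
deltas = load_changes(log)
print(deltas)                                  # [20, 2080, -20, -1880]
ramps = [d for d in deltas if abs(d) > 1000]   # the expensive transitions
print(ramps)                                   # [2080, -1880]
```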

<h3 id="shift-thinking-from-total-energy-to-smooth-power">Shift Thinking from Total Energy to Smooth Power</h3>

<p>This is most likely the hardest one, since most of us learnt at school to turn off unneeded devices to conserve energy. The useful mental model is that the grid prefers a <em>constant request</em> over multiple small <em>bursty ones</em>. This shift in thinking would lead to fewer peaks, less cycling and lower system cost.</p>

<h2 id="references">References</h2>

<ul>
  <li>[<span id="ref1">1</span>] <a href="https://www.irena.org/-/media/Files/IRENA/Agency/Publication/2015/IRENA_Baseload_to_Peak_2015.pdf">From baseload to peak: Renewables provide a reliable solution</a></li>
  <li>[<span id="ref2">2</span>] <a href="https://www.gridcog.com/blog/volume-and-shape-understanding-the-cost-of-electricity-and-the-value-of-flexibility">Volume and Shape: Understanding the cost of electricity and the value of flexibility</a></li>
  <li>[<span id="ref3">3</span>] <a href="https://www.anl.gov/esia/prices-in-frequency-regulation-markets-impacts-of-natural-gas-prices-and-variable-renewable-energy">Prices in Frequency Regulation Markets: Impacts of Natural Gas Prices and Variable Renewable Energy</a></li>
  <li>[<span id="ref4">4</span>] <a href="https://www.esig.energy/demand-response-can-play-a-vital-role-in-ensuring-grid-reliability-and-dampening-price-volatility-in-wholesale-electricity-markets/">Demand response can play a vital role in ensuring grid reliability and dampening price volatility in wholesale electricity markets</a></li>
</ul>

<p><img src="/assets/images/png/vampire_electrician001.png" alt="" /></p>]]></content><author><name>tsp</name></author><category term="Opinion" /><category term="Basics" /><category term="Power grid" /><category term="Finance" /><summary type="html"><![CDATA[Rethinking electricity billing by aligning costs with load variability rather than just total consumption would allow a better distribution of caused costs and provide an incentive to reduce the strain on the network by reducing load variations.]]></summary></entry><entry><title type="html">Using Codex with Hardware In The Loop for Microcontrollers</title><link href="https://www.tspi.at/2026/03/24/hardwareloop.html" rel="alternate" type="text/html" title="Using Codex with Hardware In The Loop for Microcontrollers" /><published>2026-03-24T00:00:00+01:00</published><updated>2026-03-29T01:25:16+01:00</updated><id>https://www.tspi.at/2026/03/24/hardwareloop</id><content type="html" xml:base="https://www.tspi.at/2026/03/24/hardwareloop.html"><![CDATA[<p>When people talk about <em>“AI pair programmers”</em> they usually picture yet another autocomplete window. In this blog article I treated Codex as an embedded teammate with full access to a ModBus target bench: a single <a href="https://amzn.to/4sFikj5"><code class="language-plaintext highlighter-rouge">ATmega2560</code></a> fitted with a <a href="https://amzn.to/4bHhKtR"><code class="language-plaintext highlighter-rouge">MAX485</code></a> transceiver flashable via its main serial port and the Arduino bootloader, accessed on it’s secondary <code class="language-plaintext highlighter-rouge">UART</code> through a <a href="https://amzn.to/4bOt6wj">USB serial-to-RS485 adapter</a> Because <code class="language-plaintext highlighter-rouge">Codex</code> can compile, flash, and interrogate that modest setup as part of its inner loop, we ended up with a development cadence that felt like test-driven hardware bring-up instead of the usual edit-build-burn cycle.</p>

<p>This article very briefly describes a flow that was actually used this way, yielding a <a href="https://github.com/tspspi/avrModBus">workable library</a>; it is neither sugar-coated nor exaggerated. Keep in mind this was a small-scale project, so it was a very easy task for the agent to perform. Note that the conversations with the agent are trimmed down; the shown snippets should just provide a rough idea.</p>

<ul>
  <li><a href="#bootstrapping-the-collaboration-and-designing-the-architecture">Bootstrapping the collaboration and designing the architecture</a></li>
  <li><a href="#iterating-with-real-silicon-in-the-loop">Iterating with real silicon in the loop</a></li>
  <li><a href="#debug-automation-without-babysitting">Debug automation without babysitting</a></li>
  <li><a href="#future-proofing-with-formal-methods">Future-proofing with formal methods</a></li>
  <li><a href="#what-are-the-benefits-of-hardware-in-the-loop">What are the benefits of hardware in the loop</a></li>
  <li><a href="#references">References</a></li>
</ul>

<p><img src="/assets/images/png/vibecode001.png" alt="" /></p>

<h2 id="bootstrapping-the-collaboration-and-designing-the-architecture">Bootstrapping the collaboration and designing the architecture</h2>

<p>We started by writing <code class="language-plaintext highlighter-rouge">AGENTS.md</code> as an executable contract. It spells out that <code class="language-plaintext highlighter-rouge">Codex</code> must keep <code class="language-plaintext highlighter-rouge">Timer0</code> free unless an optional <code class="language-plaintext highlighter-rouge">sysclock</code> is enabled, ship <code class="language-plaintext highlighter-rouge">UART</code> ISRs with ring buffers, and update every artifact (<code class="language-plaintext highlighter-rouge">AGENTS</code>, <code class="language-plaintext highlighter-rouge">DESIGN_DOCUMENT</code>, <code class="language-plaintext highlighter-rouge">TODO</code>, user docs) whenever reality changes. That file also pins the toolchain (<code class="language-plaintext highlighter-rouge">avr-gcc</code>/<code class="language-plaintext highlighter-rouge">avr-libc</code>/<code class="language-plaintext highlighter-rouge">binutils</code>/<code class="language-plaintext highlighter-rouge">avrdude</code>/GNU make via <code class="language-plaintext highlighter-rouge">gmake</code>), the target MCUs (<code class="language-plaintext highlighter-rouge">ATmega328P</code> utilizing <code class="language-plaintext highlighter-rouge">UART0</code> and <code class="language-plaintext highlighter-rouge">ATmega2560</code> utilizing UART0 and UART1 even though only the 2560 rig was in the loop for now), and communication habits (line-referenced file mentions, short status updates, immediate blocker escalation). Having that behavior encoded up front was the equivalent of onboarding a senior engineer in writing: every later decision referenced back to it, and Codex kept it fresh whenever constraints shifted.</p>

<blockquote>
  <p>“You are a design architect and software developer. We are going to implement a ModBus Slave on an ATMega microcontroller. First we are creating a design document till all open questions are resolved and we have specified all technical details. We do this in a back and forth conversation, you do not take decisions. Present open questions and provide suggestions. The user decides which decisions to take. You do not decide yourself. Follow the design rules from DESIGNRULES.md when writing the architecture document. In the first stage we are writing <code class="language-plaintext highlighter-rouge">docs/DESIGN_DOCUMENT.md</code> as a detailed technical design document and <code class="language-plaintext highlighter-rouge">docs/TODO.md</code> as an [ ] open, [x] done, [-] rejected ToDo list that you keep up to date all the time. After we finished designing you are going to implement the project according to the ToDo list using avr-gcc, avr-libc, binutils and avrdude. You build using gmake. Use no other tools. You can flash the program to the microcontroller using <code class="language-plaintext highlighter-rouge">gmake flash</code> and access the serial port via <code class="language-plaintext highlighter-rouge">/dev/ttyU0</code> as well as the RS485 bus on <code class="language-plaintext highlighter-rouge">/dev/ttyU1</code>. First draft your <code class="language-plaintext highlighter-rouge">AGENTS.md</code> that explains your role. Only ever edit <code class="language-plaintext highlighter-rouge">~/projectdirectory</code>.”</p>
</blockquote>

<p>With the agent contract in place we drafted <code class="language-plaintext highlighter-rouge">docs/DESIGN_DOCUMENT.md</code>. Codex drove that conversation like an architecture review. It enumerated which registers must be memory-backed, how MAX485 control pins are abstracted, what the ISR boundary looks like, and even future knobs (<code class="language-plaintext highlighter-rouge">Timer0</code> vs <code class="language-plaintext highlighter-rouge">Timer1</code> tick sources). Whenever ambiguity popped up - How should holding register 255 trigger EEPROM commits? Should UUIDs live in callbacks or static blocks? - Codex paused, listed options, and asked for confirmation before touching code. That high-velocity Q&amp;A mirrored how a human architect would unblock a team, just without the context loss that happens when humans juggle too many requirements.</p>

<blockquote>
  <p>“Start writing the ModBus RTU slave architecture for ATmega2560 + MAX485 attached to UART1. Document UART buffers, gap timing and the reset behavior before you write firmware. Also honor proper timeout processing. We support input registers, coil outputs, holding registers and output registers. We need write single, write multiple, read single and read multiple commands for those registers. We implement a set of fixed registers for the device address (…), baud rate (…) as well as an UUID based identity register to identity the device and the firmware. Later we are also going to implement an RS485 capable bootloader so we are able to flash the device via the RS485 bus (this will be an independent project)”</p>
</blockquote>

<p>To keep implementation straight we used <code class="language-plaintext highlighter-rouge">docs/TODO.md</code> as both a kanban lane and a verification ledger. Items ranged from API scaffolding to tests with the single board RS485 link. The checklist style (explicit <code class="language-plaintext highlighter-rouge">[ ]</code> vs <code class="language-plaintext highlighter-rouge">[x]</code>) made it trivial to see which capabilities still needed either implementation or bench validation. Parallel to that we maintained <code class="language-plaintext highlighter-rouge">KB/index.md</code>, a placeholder knowledge base meant for any external ModBus or AVR timing references Codex might have had to fetch. Even when the KB stayed empty, the scaffolding reminded us that Codex could, on demand, go out to public documentation, store PDFs or markdown summaries under <code class="language-plaintext highlighter-rouge">KB/</code>, and cite them later (this was left out in the presented instructions above).</p>

<p>This is a pretty standard approach to using a coding agent. One will usually spend between an hour and half a day writing a design document this way, depending on the project scope, talking back and forth with the agent to resolve open questions, discuss feasibility as well as pros and cons, and take decisions. This phase feels like the meetings with human engineers during the design and architecture phase, though it is far more productive and involves less friction and social stress. And in contrast to the human world, dumb ideas always get harsh feedback.</p>

<p><img src="/assets/images/png/vibecode003.png" alt="" /></p>

<h2 id="iterating-with-real-silicon-in-the-loop">Iterating with real silicon in the loop</h2>

<p>Once the architecture felt solid Codex shifted into coding mode. It produced the UART/RS-485 hardware layer, the ModBus core parser, and the register handlers in digestible steps, performing unit tests as it went. It always followed the same pattern: update the TODO, write code, run <code class="language-plaintext highlighter-rouge">gmake</code>, run unit tests, flash the ATmega2560, and immediately exercise register reads and writes over the physical bus via on-the-fly written <code class="language-plaintext highlighter-rouge">pyserial</code> scripts. Because the MAX485 driver enable lines and UART ISRs were part of the same repo, Codex could inject temporary instrumentation (extra GPIO toggles, debug prints gated behind <code class="language-plaintext highlighter-rouge">#ifdef MODBUS_DEBUG</code>, CRC probes, etc.) without breaking the contract, test the hypothesis on hardware, and then strip the probes again - all inside a single loop. This resembled the <code class="language-plaintext highlighter-rouge">printf</code>-style debugging that junior engineers often use.</p>

<blockquote>
  <p>“Perform a sequence of reads and writes into and from the registers and dump debug messages on the AVRs serial port. Interact with the device via the RS485 bus and inspect validity of the reaction on the serial port. Create valid and invalid requests.”</p>
</blockquote>
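<p>The on-the-fly scripts themselves were throwaway artifacts, but a minimal sketch of what such a script computes might look as follows. The frame layout and the CRC16 polynomial (0xA001) come from the ModBus RTU standard; the slave address, register range and function names are illustrative, and actual transmission over <code class="language-plaintext highlighter-rouge">pyserial</code> is omitted:</p>

```python
import struct

# Sketch of the kind of throwaway bench script: build a ModBus RTU
# "read holding registers" (function 0x03) request frame.

def crc16_modbus(data: bytes) -> int:
    """Standard ModBus CRC16: init 0xFFFF, reflected poly 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc

def read_holding_registers(slave, first_reg, count):
    pdu = struct.pack(">BBHH", slave, 0x03, first_reg, count)
    crc = crc16_modbus(pdu)
    return pdu + struct.pack("<H", crc)    # CRC is sent low byte first

frame = read_holding_registers(0x01, 0x0000, 10)
print(frame.hex(" "))                      # 01 03 00 00 00 0a c5 cd
```

Feeding deliberately corrupted frames (flip one CRC byte) is exactly how the invalid-request paths mentioned in the prompt above were exercised.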

<p>A concrete example came up when we noticed that writing a new device ID took effect immediately, which violates ModBus expectations. Codex traced the bug by flashing diagnostic builds that logged both the pending and active IDs after every frame. It then restructured <code class="language-plaintext highlighter-rouge">modbus_core.c</code> to separate <code class="language-plaintext highlighter-rouge">PendingConfig</code> from <code class="language-plaintext highlighter-rouge">ActiveDeviceId</code>, staged new values in RAM only, and confirmed on the rig that the slave still answered on the old ID until the reset magic <code class="language-plaintext highlighter-rouge">0xAA55</code> forced a reboot. That entire investigation - code change, compilation, flashing, scripted ModBus transactions, and regression verification - ran autonomously while we observed the terminal output.</p>

<blockquote>
  <p>“I see a device ID bug. The device seems to automatically apply directly after writing into the respective register. Reproduce the bug, capture pending vs active IDs over UART, and refactor to fix so we only apply the change after rebooting the device via reboot magic.”</p>
</blockquote>

<p>Because hardware was always in the loop Codex could also stress scenarios that usually wait for the lab: half-duplex turnaround timing, bursts, deliberate line silence to test the 1.5/3.5 character gap watchdog, and watchdog-induced resets. Whenever a test exposed a weakness, Codex modified the source (sometimes inserting extra assertions or statistics counters), rebuilt, and reran the scenario minutes later. There was no <em>“hand code to a human to flash”</em> delay, so iteration speed approached software-only TDD despite touching real silicon.</p>
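<p>For reference, the inter-frame gap arithmetic those silence tests exercise is easy to sketch. The helper below is hypothetical; the 11-bit character (start bit, 8 data bits, parity, stop bit) and the fixed 750/1750 µs values above 19200 baud come from the ModBus RTU specification:</p>

```python
# ModBus RTU inter-character (t1.5) and inter-frame (t3.5) gaps.
# One RTU character is 11 bits on the wire; above 19200 baud the
# specification fixes the gaps instead of scaling them.

def rtu_gaps_us(baud):
    if baud > 19200:
        return 750.0, 1750.0
    char_us = 11.0 / baud * 1e6            # one 11-bit character, in µs
    return 1.5 * char_us, 3.5 * char_us

t15, t35 = rtu_gaps_us(9600)
print(round(t15), round(t35))              # 1719 4010
```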

<p><img src="/assets/images/png/vibecode002.png" alt="" /></p>

<h2 id="debug-automation-without-babysitting">Debug automation without babysitting</h2>

<p>The workflow never depended on someone manually driving a serial console. Instead we kept lightweight Python and shell utilities around to spray ModBus frames, capture responses, and reset boards via the watchdog harness. Codex could call those scripts, parse their output, and decide on the next change without waiting for human prompts. That made higher-level experiments feasible: for example, sweeping coil write bursts across dozens of registers while monitoring current draw, or verifying that ring buffer overruns stay cleared even when the main loop intentionally starves <code class="language-plaintext highlighter-rouge">ModBusService()</code> for a few milliseconds.</p>

<p>This autonomy extended to documentation and guardrails. Any time the behavior changed, Codex updated <code class="language-plaintext highlighter-rouge">AGENTS</code>, the design doc, <code class="language-plaintext highlighter-rouge">TODO</code>, and the example app notes. It also would have been trivial to pull protocol specs from the public internet, normalize them into <code class="language-plaintext highlighter-rouge">KB/*.md</code>, and cite them inline - handy when juggling RTU timing or EEPROM endurance data. The same mechanism can ingest errata sheets, ModBus application notes or even oscilloscope captures dumped via the <a href="https://github.com/tspspi/pylabdevs">pylabdevs</a> devices, giving future sessions instant context.</p>

<h2 id="future-proofing-with-formal-methods">Future-proofing with formal methods</h2>

<p>One of the underrated perks of this setup is how easily it can grow into formal assurance. The same Codex agent that compiles and flashes code can also emit <code class="language-plaintext highlighter-rouge">ACSL</code> annotations for critical routines. Feed those annotations plus the source into Frama-C, and you gain static guarantees (no runtime errors, preserved invariants) before the bits ever hit flash. Coupling Frama-C proofs with hardware-in-the-loop regression runs lets you blend mathematical confidence with empirical validation, a combination that is usually out of reach for small embedded teams.</p>

<p><img src="/assets/images/png/vibecode004.png" alt="" /></p>

<h2 id="what-are-the-benefits-of-hardware-in-the-loop">What are the benefits of hardware in the loop</h2>

<p>Putting Codex inside the hardware loop changed the economics of firmware work. Instead of queuing questions for a future lab slot, we answered them immediately with the actual boards. Instead of hoping a human remembered every constraint, we encoded expectations in <code class="language-plaintext highlighter-rouge">AGENTS</code> and the design doc so the assistant could enforce them relentlessly. Instead of deferring documentation, we kept the narrative up to date as part of every change. Most importantly, the assistant never tired: it could keep iterating—tweaking ISR latency, adjusting ModBus timing, mutating register maps, or running soak tests—long after a human would have walked away.</p>

<p>If you are building microcontroller firmware with tight loops, shared peripherals or embedded networks, wiring Codex into your hardware bench gives you the confidence of continuous validation with the speed of scripted development. Whether you need a minimally guided debugging partner or a fully autonomous regression runner, the same ingredients apply: define the agent contract, capture the architecture, keep <code class="language-plaintext highlighter-rouge">TODO</code> and <code class="language-plaintext highlighter-rouge">KB</code> artifacts honest, and hand the assistant access to your toolchain plus your boards. The result is a development flow that feels both methodical and fast—exactly what embedded projects need.</p>

<h2 id="references">References</h2>

<ul>
  <li><a href="https://chatgpt.com/codex">OpenAI codex</a></li>
  <li><a href="https://www.tspi.at/2021/02/10/rs485avr.html">RS485 communication using Atmel ATMega328P</a></li>
  <li>Used utilities:
    <ul>
      <li><a href="https://amzn.to/4sFikj5">Arduino Mega 2560</a></li>
      <li><a href="https://amzn.to/4bOt6wj">Waveshare USB to RS485 converter</a></li>
      <li><a href="https://amzn.to/4bHhKtR">MAX485 breakout board</a></li>
      <li><a href="https://amzn.to/4d2yq1l">Hantek 6022BE USB Digital Oscilloscope</a></li>
    </ul>
  </li>
  <li>The <a href="https://github.com/tspspi/avrModBus">resulting ModBus RTU slave toolkit for AVR</a></li>
</ul>

<p><img src="/assets/images/png/vibecode007.png" alt="" /></p>]]></content><author><name>tsp</name></author><category term="Programming" /><category term="Opinion" /><category term="Case study" /><category term="Artificial Intelligence" /><category term="Tutorial" /><category term="Hardware" /><category term="RS485" /><category term="Large Language Models" /><category term="ANSI C" /><category term="Microcontroller" /><category term="AVR" /><category term="Vibe coding" /><category term="ModBus" /><summary type="html"><![CDATA[What happens when an AI does not just suggest code, but actually compiles it, flashes it onto a microcontroller, and tests it against real hardware? In this article we explore a workflow where Codex is wired directly into a ModBus RS485 test bench, turning firmware development into a continuous loop of design, implementation, and validation on real silicon. Instead of the usual edit–build–flash cycle, the system behaves more like a self-driven engineer: asking architectural questions, updating documentation, running tests, and iterating until the behavior matches the specification. Using a concrete ATmega2560 + MAX485 setup, we walk through how design documents, TODO tracking, and an explicit agent contract enable this workflow—and what it feels like to debug firmware when the assistant can probe the system itself. 
From catching subtle protocol violations to restructuring configuration handling in real time, the result is a development process that blends structured engineering with rapid experimentation, all with hardware permanently in the loop.]]></summary></entry><entry><title type="html">LLMs Do Not Remember Facts, They Encode Patterns</title><link href="https://www.tspi.at/2026/03/10/llmsdontremember.html" rel="alternate" type="text/html" title="LLMs Do Not Remember Facts, They Encode Patterns" /><published>2026-03-10T00:00:00+01:00</published><updated>2026-03-10T22:44:48+01:00</updated><id>https://www.tspi.at/2026/03/10/llmsdontremember</id><content type="html" xml:base="https://www.tspi.at/2026/03/10/llmsdontremember.html"><![CDATA[<p>When people talk about large language models (LLMs), they often say that the model <em>“stores knowledge”</em> in its neural network weights. This sounds intuitive and convenient, but it is also deeply misleading. Treating an LLM as if it were a database full of facts leads to confusion about both its capabilities and its limitations.</p>

<p>A much more accurate picture emerges if we stop thinking about LLMs as knowledge containers and instead see them as pattern engines that have learned how ideas, statements, equations, and explanations tend to transform into each other.</p>

<p>To understand why this distinction matters, we need to look at what actually happens when a language model produces an answer.</p>

<ul>
  <li><a href="#why-an-llm-is-not-a-knowledge-database">Why an LLM Is Not a Knowledge Database</a></li>
  <li><a href="#even-scientific-formulas-are-patterns">Even Scientific Formulas Are Patterns</a></li>
  <li><a href="#what-actually-lives-inside-the-model">What Actually Lives Inside the Model</a></li>
  <li><a href="#why-external-knowledge-systems-are-necessary">Why External Knowledge Systems Are Necessary</a>
    <ul>
      <li><a href="#vector-retrieval-finding-similar-text">Vector Retrieval: Finding Similar Text</a></li>
      <li><a href="#graph-retrieval-recovering-structure">Graph Retrieval: Recovering Structure</a></li>
    </ul>
  </li>
  <li><a href="#why-this-matters">Why This Matters</a></li>
</ul>

<p><img src="/assets/images/png/llmsdontlearn001.png" alt="" /></p>

<h2 id="why-an-llm-is-not-a-knowledge-database">Why an LLM Is Not a Knowledge Database</h2>

<p>A classical knowledge system stores information explicitly. A database entry might look like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(country="Austria", capital="Vienna")
</code></pre></div></div>

<p>If we ask the system for the capital of Austria, it simply performs a lookup and returns the stored value.</p>

<p>A language model does something fundamentally different. It does not retrieve a stored record. Instead it predicts the most probable continuation of a text sequence based on statistical patterns learned during training.</p>

<p>At the outermost level, the system produces tokens by sampling from a probability distribution over possible continuations. Mathematically this is often written as</p>

\[
P\left(w_t \mid w_1, w_2, ..., w_{t-1}\right)
\]

<p>which describes how likely a particular token is given the tokens that came before it. During training the model learns this probability distribution by adjusting its internal parameters to minimize a loss function. In practice this is typically the cross-entropy loss between the predicted probability distribution and the actual next token in the training data. Gradient descent is then used to update billions of parameters so that the model gradually becomes better at predicting the next token in a sequence. Over many training iterations this process shapes the internal representations of the network so that useful linguistic and conceptual patterns emerge. The model is therefore not explicitly programmed with rules or facts. Instead its internal structure is optimized purely through exposure to vast amounts of text.</p>
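<p>The training signal described above can be made concrete with a toy example. The vocabulary and probabilities below are invented; real models compute this loss over tens of thousands of sub-word tokens and average it across huge batches:</p>

```python
import math

# Toy illustration of the cross-entropy training signal: the loss is the
# negative log of the probability the model assigned to the token that
# actually followed in the training text.

def cross_entropy(predicted_probs, actual_token):
    return -math.log(predicted_probs[actual_token])

# Invented next-token distribution after "The capital of Austria is"
probs = {"Vienna": 0.90, "Graz": 0.05, "Linz": 0.03, "Salzburg": 0.02}

print(cross_entropy(probs, "Vienna"))  # small loss: confident and correct
print(cross_entropy(probs, "Graz"))    # large loss: drives a big update
```

Gradient descent pushes the parameters in whatever direction shrinks this number, which is the entire sense in which "Vienna" gets encoded: not as a stored record, but as a distribution the network learns to produce.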

<p>However, this formula alone can be misleading. If language models were <em>only</em> performing classical statistical estimation over token frequencies, they would behave much more like sophisticated n-gram models or Bayesian predictors. Such systems <em>can</em> reproduce <em>local</em> statistics, but they cannot <em>generalize</em> well and they cannot discover deeper structures in language.</p>

<p>The crucial difference is the neural network itself. A modern transformer model contains many layers of nonlinear transformations and attention heads that dynamically route information across the sequence. These mechanisms allow the network to detect relationships between words, concepts, and symbolic expressions that may be far apart in the text.</p>

<p>The key mechanism that enables transformers to capture long-range relationships is called <em>attention</em>. Instead of processing tokens strictly one after another like earlier neural networks, the model dynamically decides which parts of the input sequence are relevant when interpreting a particular token. In practice each token generates queries that search for relevant keys among all other tokens in the sequence. The resulting weighted combinations of information allow the model to connect words that may be far apart in the text. This mechanism is what allows modern language models to track references, follow arguments across paragraphs, and relate mathematical symbols to explanations appearing elsewhere in the context.</p>
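<p>A stripped-down, single-head version of this mechanism can be sketched in a few lines. The two-dimensional token vectors below are invented purely for illustration; production models use high-dimensional embeddings, learned projection matrices for queries, keys and values, and many attention heads in parallel:</p>

```python
import math

# Minimal single-head scaled dot-product attention, to illustrate the
# mechanism only. Real transformers learn the query/key/value projections.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    d = len(keys[0])
    out = []
    for q in queries:
        # each query scores every key: which tokens matter for this one?
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)           # weights sum to 1 per query
        # the output is a weighted mix of value vectors, so information
        # flows between arbitrarily distant positions in the sequence
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

toks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three invented token vectors
print(attention(toks, toks, toks))
```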

<p>The probability distribution above is therefore only the final sampling interface of the system. Behind it lies a very large nonlinear pattern recognition machine. During training, the neural network learns internal representations that capture regularities in language, mathematics, explanations, and reasoning patterns. Crucially, what the model learns are patterns, not explicit facts. The training process does not insert statements like “Vienna is the capital of Austria” into a memory structure. Instead it adjusts billions of parameters so that certain regions of a very high‑dimensional representation space correspond to recurring conceptual relationships observed in the training data.</p>

<p>When the model answers a question like <em>“What is the capital of Austria?”</em> it does not retrieve Vienna from a memory table. Instead the network transforms the prompt through these learned representations until the sequence of tokens corresponding to the word “Vienna” becomes overwhelmingly likely under the learned patterns. In practice most tokenizers do not even operate on full words, but on sub‑word fragments, so the model is assembling the answer piece by piece according to the patterns it has learned.</p>

<p>The difference might sound subtle, but it has deep consequences. Databases store facts explicitly. Language models instead learn structures in which certain statements naturally follow from certain contexts.</p>

<p><img src="/assets/images/png/llmsdontlearn_dbvspattern.png" alt="" /></p>

<h2 id="even-scientific-formulas-are-patterns">Even Scientific Formulas Are Patterns</h2>

<p>This becomes even clearer if we look at something that appears to be very precise: a physics equation.</p>

<p>Consider the equation</p>

\[
F = \frac{\mathrm{d}p}{\mathrm{d}t}
\]

<p>At first glance it might seem as if the model simply memorized this formula, much like a bad student who has learned a line from a textbook without really understanding what it means. But that interpretation is misleading. The equation itself is not the knowledge. It is only a symbolic representation of a deeper concept: force describes the change of momentum over time.</p>

<p>To see why this matters, it helps to compare two kinds of understanding. A student who memorizes the formula $F = \frac{\mathrm{d}p}{\mathrm{d}t}$ may be able to reproduce the symbols on an exam, but the expression itself is just a sequence of characters to them. A physicist, in contrast, does not think primarily about the letters or the notation. For them the equation activates a much richer conceptual structure.</p>

<p>When a physicist sees this expression, it immediately connects to a broader pattern of how the universe behaves. Ideas about dynamics, momentum, and interaction come into play. In modern physics this also touches deeper principles: symmetries of space and time, conservation laws, and the structures described by symmetry groups. The equation is only one compact way of encoding these relationships. Mathematics is essentially the language we use to describe those patterns precisely.</p>

<p>In other words, the formula is not an isolated statement. It is a symbolic gateway into a network of concepts describing how physical systems evolve.</p>

<p>An LLM learns something somewhat analogous on the linguistic level. It does not store the equation as a static mathematical object. Instead it learns the linguistic and symbolic patterns connecting force, momentum, change, Newtonian dynamics and the notation used to express those relations.</p>

<p>Because these models are trained on an enormous portion of written human knowledge, they are exposed to a vast range of explanations, arguments, analogies, and reasoning styles. What the network therefore absorbs are not individual statements, but recurring patterns of thinking: how humans explain physics, how they derive formulas, how they reason about systems, and how concepts connect to each other. Over time the training process shapes a high‑dimensional representation space that reflects many of the cognitive patterns present in human discourse.</p>

<p>When it writes $F = \frac{\mathrm{d}p}{\mathrm{d}t}$ it is reproducing a learned mapping between these representational forms. The model has internalized the pattern linking them, not the equation as an isolated fact.</p>

<p>This is why language models are surprisingly good at rewriting equations into explanations and explanations back into equations. Because they have learned the structural patterns connecting these representations rather than memorizing individual statements, they can often generalize those patterns to new situations. When faced with a problem they have never seen before, the model can still apply similar reasoning structures it encountered during training, which is why LLMs are sometimes capable of solving entirely new problems that were never explicitly present in their training data.</p>

<p><img src="/assets/images/png/llmsdontlearn001_formulas.png" alt="" /></p>

<h2 id="what-actually-lives-inside-the-model">What Actually Lives Inside the Model</h2>

<p>Inside the neural network there are no explicit facts, rules, or entries. Instead there is a high‑dimensional parameter space that encodes regularities of language, concepts, and symbolic relations. One can think of this space as a vast hyperspace in which related meanings, explanations, equations, and narratives occupy nearby regions. During training the model gradually shapes this hyperspace so that patterns that frequently appear together in human communication become geometrically aligned.</p>

<p>Importantly, the intermediate layers of the neural network learn these structures largely on their own during training. No human explicitly tells the model which internal neurons should represent which concepts. Instead the network discovers useful internal patterns because doing so improves its ability to predict the next token. These internal features therefore do not necessarily correspond to the neat conceptual categories humans might use to organize knowledge. The model may capture many subtle correlations present in the training data — sometimes meaningful conceptual relationships, sometimes statistical associations that humans would not consciously describe.</p>

<p>Modern architectures are also intentionally designed to prevent the model from simply memorizing the training data. Neural networks contain narrow information pathways and compression steps that force the system to represent information efficiently. If the network could simply memorize every sentence it saw, it would fail to generalize. This problem is known as overfitting: the model would reproduce training examples perfectly but perform poorly on new inputs.</p>

<p>Because of these architectural constraints, the model has little choice but to learn reusable patterns instead of storing individual facts. In other words, the structure of the network itself encourages the discovery of general relationships rather than direct memorization.</p>

<p>A useful mental model is that the weights define a transformation landscape. Certain prompts push the internal state of the network into regions where specific continuations become highly probable. If a prompt mentions <em>“capital”</em> and <em>“Austria”</em> the internal representation of the prompt moves into a region of this hyperspace where the continuation corresponding to the word <em>“Vienna”</em> becomes highly probable, most likely activating zones representing cities, capitals, governmental systems, vacations, etc. on the way. But this is not a <em>discrete</em> memory. It is more like an <em>attractor in a probability field</em>.</p>

<p>One of the most striking discoveries in modern AI research is that the capabilities of these models follow relatively predictable scaling laws. As the size of the neural network, the amount of training data, and the available computation increase, the performance of the model improves in a smooth and often surprisingly regular way. Larger models tend to discover richer internal representations and capture increasingly subtle patterns in language and reasoning. At certain scales new capabilities appear that were not obvious in smaller systems. This phenomenon, sometimes described as <a href="/diy/gpusizeestimatellm.html">emergent abilities</a>, is one reason why very large models can perform tasks that smaller models struggle with, even though they are trained with the same fundamental objective of next-token prediction.</p>

<p>The model therefore behaves less like a database and more like a system that has learned how concepts tend to follow each other.</p>

<p><img src="/assets/images/png/llmsdontlearn001_whatsinside.png" alt="" /></p>

<h2 id="why-external-knowledge-systems-are-necessary">Why External Knowledge Systems Are Necessary</h2>

<p>Because LLMs operate through pattern reproduction rather than fact retrieval, they are not ideal sources of authoritative knowledge.</p>

<p>The model can generate extremely plausible statements that were never true in the first place if those statements match the typical reasoning or explanation patterns found in human language. In other words, the model can produce answers that <em>sound exactly like something a knowledgeable human might say</em> even when the underlying statement is incorrect or incomplete.</p>

<p>Interestingly, something very similar happens in human thinking. People sometimes believe they understand a topic because they can reproduce the usual explanation pattern associated with it. Only when they try to verify the statement or derive the result do they discover that their understanding was incomplete. In that sense, the failure mode of LLMs is not entirely foreign - it mirrors a common limitation of human reasoning as well.</p>

<p>For applications where factual accuracy matters, the model therefore needs access to external information sources. This is the motivation behind <strong>Retrieval Augmented Generation (RAG)</strong>.</p>

<p>In a RAG system, the language model does not rely solely on its internal patterns. Instead, it receives relevant documents retrieved from an external knowledge base and reasons over them while generating the answer. The architecture then becomes conceptually simple. A retrieval system finds relevant information, and the language model acts as a reasoning engine that interprets and synthesizes that information.</p>
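<p>As a sketch, that control flow can be reduced to a few lines. The names below (<code class="language-plaintext highlighter-rouge">answerWithRAG</code>, <code class="language-plaintext highlighter-rouge">retrieve</code>, <code class="language-plaintext highlighter-rouge">generate</code>) are hypothetical placeholders, not any specific framework's API; in a real system <code class="language-plaintext highlighter-rouge">retrieve</code> would query a vector store and <code class="language-plaintext highlighter-rouge">generate</code> would call an LLM:</p>

```javascript
// Minimal sketch of the RAG control flow. All names are hypothetical:
// retrieve() stands in for a vector-store query, generate() for an LLM call.
function answerWithRAG(question, retrieve, generate) {
  const documents = retrieve(question);          // 1. find relevant passages
  const prompt = [
    'Answer using ONLY the following sources:',  // 2. ground the model
    ...documents.map((doc, i) => `[${i + 1}] ${doc}`),
    `Question: ${question}`,
  ].join('\n');
  return generate(prompt);                       // 3. let the LLM reason over them
}
```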

<p>This division of labor <em>mirrors how humans work</em>. A scientist does not memorize every paper ever written. Instead they consult references and then reason about the information they find. Humans routinely perform their own form of <em>retrieval augmented reasoning</em>: they look up articles in encyclopedias or on Wikipedia, consult textbooks or lexica, and use formal tools such as mathematics to verify whether a statement is actually correct.</p>

<p>Another remarkable property of large language models is <em>in-context learning</em>. Even though the model's weights remain fixed after training, it can temporarily adapt its behavior based on examples provided directly in the prompt. If a prompt includes several demonstrations of how a task should be performed, the model often continues the pattern correctly for new inputs. In effect the model performs a form of short-term learning inside the context window. The internal representations inferred from the prompt guide the generation process without requiring any permanent update to the model parameters. This ability further illustrates that the model operates by reproducing patterns rather than retrieving stored rules.</p>

<h3 id="vector-retrieval-finding-similar-text">Vector Retrieval: Finding Similar Text</h3>

<p>Most RAG systems rely on vector embeddings to retrieve relevant documents.</p>

<p>Text passages are converted into vectors in a high‑dimensional space. When a user asks a question, the system computes the embedding of the query and searches for passages whose vectors are nearby.</p>

<p>What does “nearby” mean in this context? During training, embedding models learn to place pieces of text that are used in similar contexts close to each other in this space. The geometry of the space therefore begins to encode meaning. Sentences that talk about related ideas tend to end up in neighboring regions, even if they use different words. At the same time, this space often captures stylistic and rhetorical patterns as well. Technical explanations cluster differently from casual descriptions, and scientific writing occupies different regions than narrative text. You can spot the difference in the vector embeddings of <a href="/2025/09/25/simi.html">this blog</a>.</p>

<p>In other words, the high‑dimensional embedding space simultaneously encodes aspects of semantics, style, and conceptual associations. Similarity between two vectors is typically measured using cosine similarity or related metrics, which essentially check whether two vectors point in a similar direction in that space. Performing a nearest neighbor search in this space yields all semantically similar statements inside the knowledge base (I use this, for example, <a href="/2025/09/25/simi.html">for the suggested articles at the bottom of every page</a>).</p>
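<p>As a small illustration of that similarity computation (a toy sketch with made-up three-dimensional vectors, not any particular embedding model's output - real embeddings have hundreds or thousands of dimensions):</p>

```javascript
// Cosine similarity: the dot product of two vectors divided by the product
// of their magnitudes. Values near 1 mean the vectors point in nearly the
// same direction, i.e. the texts occupy nearby regions of the space.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional "embeddings" for three sentences:
const viennaCapital = [0.9, 0.1, 0.2];  // "Vienna is the capital of Austria"
const austrianCity  = [0.8, 0.2, 0.1];  // "Vienna is a large Austrian city"
const electronSpin  = [0.1, 0.9, 0.7];  // "Electrons carry a spin of 1/2"

const related   = cosineSimilarity(viennaCapital, austrianCity); // high
const unrelated = cosineSimilarity(viennaCapital, electronSpin); // much lower
```

<p>Related sentences score close to 1, unrelated ones noticeably lower - which is exactly what a nearest neighbor search exploits.</p>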

<p>It is also worth noting that embedding vectors themselves are often generated using transformer models very similar to the LLMs that later consume them. These models learn to map text into this geometric representation so that related meanings occupy nearby regions of the hyperspace.</p>

<p>This approach works very well when the answer is contained in text that is semantically similar to the query.</p>

<p>However, similarity is not the same as structure.</p>

<p>Many difficult questions depend on relationships between entities rather than simple textual similarity.</p>

<p>If the relevant information is spread across multiple documents that describe different parts of a system, vector retrieval may return fragments that are individually related to the question but fail to capture how those fragments connect to each other.</p>

<p>Typical backends are the <code class="language-plaintext highlighter-rouge">pgvector</code> extension for <a href="https://www.postgresql.org/">PostgreSQL</a> or dedicated vector database systems like <a href="https://www.trychroma.com/">ChromaDB</a>.</p>

<p><img src="/assets/images/png/llmsdontlearn_vector001.png" alt="" /></p>

<h3 id="graph-retrieval-recovering-structure">Graph Retrieval: Recovering Structure</h3>

<p>Graph‑based retrieval addresses this limitation by representing knowledge as a network of entities and relationships. Instead of storing only text chunks, the system builds a graph where nodes represent concepts or objects and edges represent relationships such as causation, dependency, or hierarchy. When a query arrives, the system retrieves a relevant subgraph rather than a collection of independent text fragments. This explicit structure makes complex reasoning easier. If the system already knows that</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>component A depends on component B
</code></pre></div></div>

<p>and that</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>component B failed during a redesign
</code></pre></div></div>

<p>then the reasoning path connecting those events is already encoded in the graph.</p>

<p>The language model can then focus on interpreting the structure rather than reconstructing it from scattered prose. The model can traverse such graph structures iteratively: it can follow relationships from one node to the next, interpret the intermediate results and then decide which connections to explore next. By repeating this process over multiple steps, the model can perform multi‑hop reasoning across the graph.</p>
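<p>A toy sketch of such multi‑hop traversal (the graph and the <code class="language-plaintext highlighter-rouge">findPath</code> helper below are illustrative only; a real system would query a graph database):</p>

```javascript
// Tiny in-memory knowledge graph as an adjacency list of labeled edges.
const graph = {
  'Component A': [{ rel: 'depends on', to: 'Component B' }],
  'Component B': [{ rel: 'failed during', to: 'redesign' }],
};

// Depth-first search for a chain of relationships from start to target.
function findPath(start, target, seen = new Set()) {
  if (start === target) return [start];
  seen.add(start);
  for (const edge of (graph[start] || [])) {
    if (seen.has(edge.to)) continue;
    const rest = findPath(edge.to, target, seen);
    if (rest) return [start, `-[${edge.rel}]->`, ...rest];
  }
  return null; // no connecting path found
}
```

<p>Here <code class="language-plaintext highlighter-rouge">findPath('Component A', 'redesign')</code> recovers the two‑hop chain through Component B - the reasoning path is already encoded in the structure.</p>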

<p>The relationships stored in such graphs often resemble what is known in the semantic web world as <a href="https://en.wikipedia.org/wiki/Semantic_triple">RDF triples</a>. These triples represent knowledge as simple subject‑predicate‑object statements, for example:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>("Vienna", "is capital of", "Austria")
("Electron", "has property", "charge")
("Component B", "failed during", "redesign")
</code></pre></div></div>

<p>When many such triples are connected together they form a rich knowledge graph that captures relationships between entities. Graph databases such as <a href="https://neo4j.com/">Neo4j</a> are commonly used to store and query these structures efficiently.</p>

<p>Interestingly, the extraction of these triples from unstructured text is often performed using the same kind of transformer models discussed earlier. LLMs can read documents, identify entities and relationships, and convert them into structured graph representations that can later be used for graph‑based retrieval and reasoning.</p>

<p><img src="/assets/images/png/llmsdontlearn001_graph.png" alt="" /></p>

<h2 id="why-this-matters">Why This Matters</h2>

<p>The deeper lesson is that modern AI systems work best when we separate three roles.</p>

<ul>
  <li>External systems store information.</li>
  <li>Retrieval systems locate relevant pieces of that information.</li>
  <li>Language models reason about the retrieved material and transform it into explanations, summaries, or decisions.</li>
</ul>

<p>When these components are combined, the language model becomes something closer to a cognitive engine operating on a structured information environment. In practice this means that asking a standalone LLM factual questions without any grounding is often the wrong way to use the technology. The model itself is not designed to be the authoritative storage location of knowledge. Its real strength lies in interpreting patterns, combining ideas, performing reasoning steps, and synthesizing information once the relevant data has been supplied by retrieval systems such as RAG or GraphRAG.</p>

<p>Interestingly, the much‑discussed phenomenon of <em>“hallucinations”</em> is closely related to this capability. The same mechanism that allows the model to generate plausible statements beyond its training examples is what enables creativity and generalization. If the system were restricted to reproducing only statements that appeared verbatim in its training data, it would behave like a database lookup and would be incapable of solving new problems or combining ideas in novel ways.</p>

<p>In that sense hallucinations are not purely a bug, they are a side effect of the very property that makes these systems powerful. When they appear problematic, it is often a sign that the system is being used without proper grounding. Once external retrieval systems provide the factual information and the LLM is used primarily for reasoning and interpretation, the architecture begins to resemble a much more robust cognitive system.</p>

<p>This same mechanism can deliberately be used as a feature. Because the model can generate plausible variations and speculative ideas, it can be used to explore creative solution spaces. If the system is connected to verification tools - for example a mathematical <a href="/2026/01/08/proofllm.html">proof assistant</a>, a symbolic solver, or a simulation toolkit - the model can propose candidate ideas while the external tool checks whether they are actually correct. In this way hallucination becomes a generator of hypotheses while external tools provide validation. This pattern is increasingly used in research systems where LLMs propose conjectures, derive candidate formulas, or sketch solution paths which are then verified automatically.</p>

<p>It is therefore helpful to think of the LLM as a machine that has learned how ideas move - how arguments unfold, how explanations are constructed, and how pieces of knowledge connect to each other. And once this reasoning engine is connected to reliable sources of information and verification tools, its ability to analyze, explore, and synthesize knowledge becomes extraordinarily powerful.</p>]]></content><author><name>tsp</name></author><category term="Artificial Intelligence" /><category term="Tutorial" /><category term="How stuff works" /><category term="Machine learning" /><category term="LLM" /><summary type="html"><![CDATA[Large language models are often described as systems that store knowledge, but this picture is misleading. In reality, modern AI models do not function like databases filled with facts. Instead they learn complex patterns that describe how ideas, explanations, and symbols tend to relate to each other. When an LLM answers a question, it is not retrieving a stored entry, it is generating the most plausible continuation of a pattern learned from enormous amounts of text. This article explains how those patterns emerge inside neural networks, why LLMs sometimes produce convincing but incorrect answers, and why systems such as RAG and knowledge graphs are essential for reliable AI applications. 
By understanding how these models actually work, we can stop treating them like encyclopedias and start using them as what they really are: powerful reasoning engines operating on external knowledge systems.]]></summary></entry><entry><title type="html">Using MySQL as a Tool for n8n Agents - Flexible Queries without SQL Injection</title><link href="https://www.tspi.at/2026/03/07/n8nmysql.html" rel="alternate" type="text/html" title="Using MySQL as a Tool for n8n Agents - Flexible Queries without SQL Injection" /><published>2026-03-07T00:00:00+01:00</published><updated>2026-03-07T09:34:35+01:00</updated><id>https://www.tspi.at/2026/03/07/n8nmysql</id><content type="html" xml:base="https://www.tspi.at/2026/03/07/n8nmysql.html"><![CDATA[<p>When I started using <a href="https://n8n.io/">n8n</a> in combination with the <a href="https://www.mysql.com/">MySQL</a> node
I somewhat struggled with the documentation. I wished there was a single clear simple <em>recipe</em> describing
how to give an <code class="language-plaintext highlighter-rouge">n8n</code> AI agent access to a MySQL database by providing a list of parameters, being able
to write some filter code and specify the SQL statement yourself.</p>

<p>Most examples either:</p>

<ul>
  <li>Come in video format and are thus not very accessible.</li>
  <li>Use the existing <code class="language-plaintext highlighter-rouge">SELECT</code>, <code class="language-plaintext highlighter-rouge">INSERT</code> and <code class="language-plaintext highlighter-rouge">UPDATE</code> methods, which are restricted with
respect to arbitrary table access, arbitrary ordering and similar operations.</li>
  <li>Allow completely free SQL (which is dangerous).</li>
  <li>Restrict things so much that the agent becomes useless.</li>
</ul>

<p>The pattern described here works reliably in practice and allows an agent to query any allowed table in
measurement databases flexibly while still maintaining proper boundaries.</p>

<ul>
  <li><a href="#basic-idea">Basic Idea</a></li>
  <li><a href="#create-a-minimal-database-user-least-privilege">Create a Minimal Database User (Least Privilege)</a></li>
  <li><a href="#write-a-strict-tool-description">Write a Strict Tool Description</a></li>
  <li><a href="#defining-parameters-with-fromai">Defining Parameters</a></li>
  <li><a href="#building-the-sql-query">Building the SQL Query</a></li>
  <li><a href="#always-add-a-limit">Always Add a LIMIT</a></li>
  <li><a href="#security-rules-you-should-actually-follow">Security Rules You Should Actually Follow</a></li>
  <li><a href="#useful-extensions">Useful Extensions</a></li>
  <li><a href="#conclusion">Conclusion</a></li>
</ul>

<p><img src="/assets/images/png/n8nmysql001.png" alt="" /></p>

<h2 id="basic-idea">Basic Idea</h2>

<p>The concept is simple:</p>

<ul>
  <li>Attach a <strong>MySQL Tool Node</strong> to an agent node.</li>
  <li>Configure the connection statically (specify server, username, password and database)</li>
  <li>Give the tool a <strong>detailed description</strong> so the model understands which tables exist, ideally
informing it about the meaning of the columns.</li>
  <li>Build the SQL query dynamically in <strong>Execute SQL mode</strong> using JavaScript expressions. Apply whitelists
to restricted identifiers such as table names and validate all other parameters. No matter what you try, there
is no safe method other than whitelisting.</li>
</ul>

<p>Never allow the LLM to generate raw SQL if you care about the integrity of your data (though it is interesting to
see how an LLM handles its own database when it gets arbitrary access; just make sure to sandbox the
environment properly).</p>

<h2 id="create-a-minimal-database-user-least-privilege">Create a Minimal Database User (Least Privilege)</h2>

<p>If you don’t yet have a dedicated database user for <code class="language-plaintext highlighter-rouge">n8n</code>, create one with minimal permissions. For example:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">USER</span> <span class="s1">'n8nuser'</span> <span class="n">IDENTIFIED</span> <span class="k">WITH</span> <span class="n">mysql_native_password</span> <span class="k">BY</span> <span class="s1">'REPLACE_ME'</span><span class="p">;</span>

<span class="k">GRANT</span> <span class="k">SELECT</span> <span class="k">ON</span> <span class="n">exampledb</span><span class="p">.</span><span class="o">*</span> <span class="k">TO</span> <span class="s1">'n8nuser'</span><span class="p">;</span>
<span class="k">GRANT</span> <span class="k">SELECT</span> <span class="k">ON</span> <span class="n">exampledb2</span><span class="p">.</span><span class="o">*</span> <span class="k">TO</span> <span class="s1">'n8nuser'</span><span class="p">;</span>
</code></pre></div></div>

<p>For most reporting or sensor databases <strong>SELECT is enough</strong>. Avoid granting statements with side effects
like <code class="language-plaintext highlighter-rouge">INSERT</code>, <code class="language-plaintext highlighter-rouge">UPDATE</code>, <code class="language-plaintext highlighter-rouge">DELETE</code>, etc. unless you really need them. This is the same principle as for any
web application - only grant the minimal required permissions.</p>

<h2 id="write-a-strict-tool-description">Write a Strict Tool Description</h2>

<p>You can let <code class="language-plaintext highlighter-rouge">n8n</code> auto-generate the tool description, but it is usually better to define it manually, especially
when access is not otherwise tightly restricted. A strict and explicit description helps the agent understand
the schema and also prevents it from hallucinating tables.</p>

<p>A short example could look like the following:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Execute a query in the historical measurement database (temperatures and humidities as well as information about the present sensors)

* table has to be one of the following table names (excluding the column descriptions):
   * humiditysensors(id, label, description)
   * temperaturesensors(id, label, description)
   * humidityvalues(ts, sensorid, humidity) are the humidity values of the sensors with sensor id sensorid at unix time ts.
   * temperaturevalues(ts, sensorid, temp) is the temperature of the sensor with sensorid (foreign key to temperaturesensors) at time ts (unix timestamp)
</code></pre></div></div>

<p>This does two things:</p>

<ul>
  <li>It teaches the agent which tables exist.</li>
  <li>It provides the basis for strict whitelisting later.</li>
</ul>

<h2 id="defining-parameters-with-fromai">Defining Parameters with <code class="language-plaintext highlighter-rouge">$fromAI</code></h2>

<p>This appears in nearly all tutorials, but it was never stated <em>explicitly</em> enough for me to grasp it immediately: to <em>define</em> a
parameter, you access it with the <code class="language-plaintext highlighter-rouge">$fromAI</code> method using:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">$fromAI</span><span class="p">(</span><span class="dl">'</span><span class="s1">parametername</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">description</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">datatype</span><span class="dl">'</span><span class="p">,</span> <span class="dl">'</span><span class="s1">default</span><span class="dl">'</span><span class="p">)</span>
</code></pre></div></div>

<p>The parameter is created the moment the expression is evaluated. Note that each parameter can only be defined <em>once</em>. This
means one cannot use <code class="language-plaintext highlighter-rouge">$fromAI</code> multiple times to access the same parameter. Instead one has to read it once, store it in a
variable and use that variable afterwards. Trying to reuse <code class="language-plaintext highlighter-rouge">$fromAI</code> directly in multiple places causes the node to fail.</p>
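<p>The pattern looks roughly like this. The <code class="language-plaintext highlighter-rouge">$fromAI</code> function below is a stand-in stub so the snippet is self-contained outside of n8n - inside an actual n8n expression you would of course use the real <code class="language-plaintext highlighter-rouge">$fromAI</code>:</p>

```javascript
// Stand-in for n8n's $fromAI so this sketch runs outside n8n. In n8n the
// first (and only) evaluation both DEFINES the parameter and returns the
// value the agent supplied; defining the same name twice fails the node.
const defined = new Set();
function $fromAI(name, description, type, fallback) {
  if (defined.has(name)) throw new Error(`Parameter ${name} defined twice`);
  defined.add(name);
  return fallback; // n8n would return the agent-provided value here
}

// Correct pattern: read the parameter ONCE, then reuse the variable.
const table = $fromAI('table', 'The table name', 'string', 'temperaturevalues');
const query = `SELECT * FROM ${table}`;    // reuse the variable...
const note  = `Agent asked for ${table}`;  // ...as often as you like
```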

<h2 id="building-the-sql-query">Building the SQL Query</h2>

<p>In the MySQL node you can enable <strong>Execute SQL</strong> and generate the query dynamically with JavaScript. This is powerful,
but it is of course also prone to SQL injection. The safe pattern is:</p>

<ul>
  <li>The agent provides only a <strong>key</strong> (for example <code class="language-plaintext highlighter-rouge">temperaturevalues</code>)</li>
  <li>The key is mapped to a <strong>hardcoded SQL identifier</strong></li>
  <li>Data is always safely escaped before being inserted into the statement (you can use the query parameter
setting and placeholders). An even better approach is to use prepared statements with bound parameters
(which is out of scope for this short article).</li>
</ul>

<p>The following shows an example implementation that allows the agent to access the tables mentioned above in an
arbitrary fashion. In addition it will order the results from tables that include timestamps by timestamp in
descending order:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="k">SELECT</span> 
<span class="p">{{</span>
  <span class="p">(()</span> <span class="o">=&gt;</span> <span class="p">{</span>
    <span class="o">//</span> <span class="k">Read</span> <span class="n">AI</span> <span class="k">parameters</span> <span class="k">only</span> <span class="n">once</span>
    <span class="n">const</span> <span class="n">t</span> <span class="o">=</span> <span class="n">String</span><span class="p">(</span><span class="err">$</span><span class="n">fromAI</span><span class="p">(</span><span class="s1">'table'</span><span class="p">,</span> <span class="s1">'The table name'</span><span class="p">,</span> <span class="s1">'string'</span><span class="p">,</span> <span class="s1">''</span><span class="p">)</span> <span class="o">||</span> <span class="s1">''</span><span class="p">)</span>
      <span class="p">.</span><span class="k">trim</span><span class="p">()</span>
      <span class="p">.</span><span class="n">toLowerCase</span><span class="p">();</span>

    <span class="o">//</span> <span class="n">Whitelist</span> <span class="n">mapping</span>
    <span class="n">const</span> <span class="n">allowed</span> <span class="o">=</span> <span class="p">{</span>
      <span class="n">humiditysensors</span><span class="p">:</span> <span class="s1">'`humiditysensors`'</span><span class="p">,</span>
      <span class="n">humidityvalues</span><span class="p">:</span> <span class="s1">'`humidityvalues`'</span><span class="p">,</span>
      <span class="n">temperaturesensors</span><span class="p">:</span> <span class="s1">'`temperaturesensors`'</span><span class="p">,</span>
      <span class="n">temperaturevalues</span><span class="p">:</span> <span class="s1">'`temperaturevalues`'</span><span class="p">,</span>
    <span class="p">};</span>

    <span class="n">const</span> <span class="n">tablesWithTs</span> <span class="o">=</span> <span class="k">new</span> <span class="k">Set</span><span class="p">([</span><span class="s1">'temperaturevalues'</span><span class="p">,</span> <span class="s1">'humidityvalues'</span><span class="p">]);</span>

    <span class="n">if</span> <span class="p">(</span><span class="o">!</span><span class="n">allowed</span><span class="p">[</span><span class="n">t</span><span class="p">])</span> <span class="p">{</span>
      <span class="n">throw</span> <span class="k">new</span> <span class="n">Error</span><span class="p">(</span><span class="nv">`Disallowed or unknown table key: ${t}`</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="n">const</span> <span class="n">orderClause</span> <span class="o">=</span>
      <span class="n">tablesWithTs</span><span class="p">.</span><span class="n">has</span><span class="p">(</span><span class="n">t</span><span class="p">)</span>
        <span class="o">?</span> <span class="s1">' ORDER BY `ts` DESC'</span>
        <span class="p">:</span> <span class="s1">''</span><span class="p">;</span>

    <span class="n">const</span> <span class="n">selector</span> <span class="o">=</span>
      <span class="n">tablesWithTs</span><span class="p">.</span><span class="n">has</span><span class="p">(</span><span class="n">t</span><span class="p">)</span>
        <span class="o">?</span> <span class="s1">'*, FROM_UNIXTIME(ts) AS ts_readable'</span>
        <span class="p">:</span> <span class="s1">'*'</span><span class="p">;</span>

    <span class="k">return</span> <span class="nv">`${selector} FROM ${allowed[t]}${orderClause}`</span><span class="p">;</span>
  <span class="p">})()</span>
<span class="p">}}</span>
<span class="k">LIMIT</span> <span class="mi">50</span><span class="p">;</span>

</code></pre></div></div>

<p>What this does:</p>

<ul>
  <li>The agent provides the table <strong>only as a logical key</strong>. It never gets passed into the
statement directly.</li>
  <li>The key must exist in the whitelist (you can of course also build the whitelist in n8n by first
executing a <code class="language-plaintext highlighter-rouge">SHOW TABLES</code> statement and passing along its result).</li>
  <li>Time series tables are automatically ordered by <code class="language-plaintext highlighter-rouge">ts DESC</code>.</li>
  <li>A readable timestamp (<code class="language-plaintext highlighter-rouge">ts_readable</code>) is generated using <code class="language-plaintext highlighter-rouge">FROM_UNIXTIME()</code>.</li>
</ul>

<p>This approach prevents the agent from injecting arbitrary SQL fragments.</p>

<h2 id="always-add-a-limit">Always Add a <code class="language-plaintext highlighter-rouge">LIMIT</code></h2>

<p>Never allow unlimited queries. If you do, you will regret it. Also do not make the limit configurable in
an unbounded fashion - not even when it seems to work at first. If you forget the <code class="language-plaintext highlighter-rouge">LIMIT</code>,
sooner or later the agent will try to read an entire table, which typically leads to:</p>

<ul>
  <li>MySQL running <em>very</em> long queries</li>
  <li><code class="language-plaintext highlighter-rouge">n8n</code> workers blocking</li>
  <li>the agent producing unusable responses due to context thrashing or <a href="https://research.trychroma.com/context-rot">context rot</a>.</li>
  <li>very large context windows with an enormous number of tokens, which you will spot on your bill.</li>
</ul>

<p>A safe default is something like:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LIMIT 50
</code></pre></div></div>

<p>For more advanced setups you can add pagination. Also do not forget to limit the maximum number of
iterations your agent can run in a loop. Your financial account will thank you.</p>
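<p>If you do make the limit configurable, clamp it to a hard upper bound. A minimal sketch (<code class="language-plaintext highlighter-rouge">safeLimit</code> is a hypothetical helper, not part of n8n):</p>

```javascript
// Clamp an agent-supplied limit to a safe range. Non-numeric or
// non-positive input falls back to a conservative default.
function safeLimit(raw, fallback = 50, max = 200) {
  const n = Number.parseInt(String(raw ?? ''), 10);
  if (!Number.isInteger(n) || n <= 0) return fallback;
  return Math.min(n, max);
}
```

<p>The resulting number can then be interpolated into the <code class="language-plaintext highlighter-rouge">LIMIT</code> clause, since it is guaranteed to be a bounded integer.</p>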

<h2 id="security-rules-you-should-actually-follow">Security Rules You Should Actually Follow</h2>

<p>Treat <em>LLM input exactly like user input</em>. This means:</p>

<ul>
  <li>Never insert raw LLM strings into SQL.</li>
  <li>Only allow whitelisted identifiers.</li>
  <li>Fail the request when validation fails.</li>
  <li>Use a least-privilege database user.</li>
</ul>

<p>LLMs are not malicious but they are <em>very</em> creative. And creativity plus databases without
guardrails tends to produce unpleasant surprises.</p>

<h2 id="useful-extensions">Useful Extensions</h2>

<p>Once the basic version works, you can extend the system safely. Typical improvements include:</p>

<ul>
  <li><strong>Filtering</strong>: In the above example, allow a numeric <code class="language-plaintext highlighter-rouge">sensorid</code> parameter and parse it strictly as an integer.</li>
  <li><strong>Time window queries</strong>: Allow queries such as <code class="language-plaintext highlighter-rouge">WHERE ts BETWEEN ... AND ...</code> but enforce maximum time spans.</li>
  <li><strong>Aggregation</strong>: Support queries like <code class="language-plaintext highlighter-rouge">AVG(temp) GROUP BY hour</code> again through explicit whitelists.</li>
</ul>
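<p>The first two extensions can be sketched as strict parsing plus a clamped time window. The seven-day ceiling and helper names here are illustrative assumptions:</p>

```python
from datetime import datetime, timedelta

MAX_SPAN = timedelta(days=7)  # example ceiling, not from the article

def parse_sensorid(raw) -> int:
    """Accept only a plain (possibly negative) integer; reject anything else."""
    if not str(raw).strip().lstrip("-").isdigit():
        raise ValueError(f"not an integer: {raw!r}")
    return int(raw)

def clamp_window(start: datetime, end: datetime) -> tuple[datetime, datetime]:
    """Enforce a maximum span for WHERE ts BETWEEN ... AND ... queries."""
    if end <= start:
        raise ValueError("end must be after start")
    if end - start > MAX_SPAN:
        end = start + MAX_SPAN  # shrink the window instead of failing
    return start, end
```
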

<p>This allows scaling the system without turning the SQL builder into a huge block of logic.</p>

<h2 id="conclusion">Conclusion</h2>

<p>With a MySQL tool node, strict tool descriptions, <code class="language-plaintext highlighter-rouge">$fromAI</code> parameters and a whitelist-based SQL builder you can
create a flexible agent-driven database interface without exposing your database to SQL injection. Just keep in mind:</p>

<p><strong>Never trust LLM input, treat it like user input.</strong></p>]]></content><author><name>tsp</name></author><category term="Programming" /><category term="Basics" /><category term="Artificial Intelligence" /><category term="Tutorial" /><category term="Web" /><category term="Large Language Models" /><category term="n8n" /><summary type="html"><![CDATA[Using large language models together with automation platforms like n8n often requires giving the agent controlled access to structured data. Simply allowing an LLM to generate arbitrary SQL queries is dangerous, but overly restrictive configurations quickly make database tools useless. This article presents a practical pattern that allows flexible database queries while still enforcing strict safety boundaries. The approach combines n8ns MySQL tool node, fromAI parameters and a whitelist based SQL builder to prevent SQL injection while still allowing agents to explore a measurement database intelligently. With a few simple rules - least-privilege database users, identifier whitelists and mandatory query limits - one can safely expose structured data to an AI agent without risking ones database.]]></summary></entry><entry><title type="html">The Many Faces of Coherence in Physics (and Beyond)</title><link href="https://www.tspi.at/2026/01/19/coherence.html" rel="alternate" type="text/html" title="The Many Faces of Coherence in Physics (and Beyond)" /><published>2026-01-19T00:00:00+01:00</published><updated>2026-02-15T17:29:21+01:00</updated><id>https://www.tspi.at/2026/01/19/coherence</id><content type="html" xml:base="https://www.tspi.at/2026/01/19/coherence.html"><![CDATA[<div style="text-align: center; width: 100%;">
    <iframe width="560" height="315" src="https://www.youtube.com/embed/7bUIEMtzJC0?si=tVQBrATEuBtj7WzH" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>
</div>

<p>The term coherence has multiple meanings across physics and philosophy, all centered on an underlying idea of parts <em>sticking together</em> or <em>acting in unison</em>. In everyday language and philosophy, coherence usually means <em>logical consistency and intelligibility</em> - a coherent argument is one whose parts fit together without contradiction. In physics, coherence more specifically describes <em>correlated behavior of waves</em> or <em>quantum states</em>. For example, coherent waves are in phase with each other (maintaining a fixed phase relationship), and quantum coherence refers to the definite phase relationships in a <em>quantum superposition</em>. Despite the varied contexts, these meanings share the notion of a unified, orderly relationship among components (whether phases of waves, quantum amplitudes, or propositions in an argument). In the sections below, I survey the diverse meanings of <em>“coherence”</em>, focusing primarily on physics (classical waves and quantum mechanics) and touching on philosophical usage.</p>

<ul>
  <li><a href="#historical-evolution-of-the-concept">Historical Evolution of the Concept</a></li>
  <li><a href="#coherence-in-classical-wave-physics">Coherence in Classical Wave Physics</a></li>
  <li><a href="#quantum-coherence-and-superposition">Quantum Coherence and Superposition</a>
    <ul>
      <li><a href="#coherent-states-in-quantum-mechanics">Coherent States in Quantum Mechanics</a></li>
      <li><a href="#quantum-decoherence">Quantum Decoherence</a></li>
      <li><a href="#coherent-control-and-manipulation-in-quantum-systems">Coherent Control and Manipulation in Quantum Systems</a></li>
    </ul>
  </li>
  <li><a href="#coherence-in-philosophy-and-logic">Coherence in Philosophy and Logic</a></li>
  <li><a href="#conclusion">Conclusion</a></li>
</ul>

<p><img src="/assets/images/png/coherence001.png" alt="" /></p>

<h2 id="historical-evolution-of-the-concept">Historical Evolution of the Concept</h2>

<p>The concept of coherence in physics emerged from 19th century studies of wave interference. Thomas Young’s famous <em>double-slit experiment</em> in 1801 implicitly required coherent light - using a single light source split into two paths - to produce stable interference fringes. Young and other physicists at that time recognized that two independent light sources - for example the sun and a lamp - generally do not produce visible interference because they lack fixed phase relations. In 1819, Fresnel and Arago formulated laws of interference, effectively noting conditions under which light waves cohere (e.g. same frequency and polarization) to produce fringes. By the late 19th century, techniques like Michelson’s interferometry further quantified coherence: observers noticed that white light interference disappeared beyond a certain path difference, hinting at a finite coherence length.</p>

<p>In the early 20th century, coherence was formalized in statistical optics. Pioneering work by Fritz Zernike in 1938 introduced the degree of coherence as a quantitative measure, defined via the fringe visibility between two points in a wavefield. Zernike’s work and the van Cittert–Zernike theorem showed how a source’s size and spectral bandwidth determine the partial coherence of light. The invention of the laser in 1960 provided a source of nearly fully coherent, highly monochromatic and phase-stable light, which revolutionized optics and validated these theories. Laser light can have coherence lengths of kilometers, whereas sunlight’s coherence length is only a few microns.</p>

<p>The quantum era brought new facets to coherence. In 1963, Roy Glauber developed the quantum theory of optical coherence, introducing coherent states of the electromagnetic field and correlation functions to describe photon statistics. Glauber’s work - awarded the Nobel Prize in 2005 - established how classical coherence concepts extend to quantum light. Meanwhile, physicists like Erich Joos, Dieter Zeh and Wojciech Zurek studied quantum decoherence from the 1970s to the 1990s - how interactions with the environment destroy coherence and make quantum systems appear classical. By the 21st century, quantum coherence became recognized as a <em>resource</em> for technologies like quantum computing, requiring careful preservation and manipulation.</p>

<p>In <em>philosophy</em>, on the other hand, the notion of coherence has an older pedigree in theories of truth and knowledge. <em>Coherence</em> as a criterion of truth was advanced by 19th-20th century idealist philosophers (e.g. Hegel, Bradley) and later formalized as the coherence theory of truth. According to this view, a proposition is true if it coheres with (i.e. is consistent with, or entailed by) a set of other accepted propositions. Early versions simply equated coherence with logical consistency, though more refined versions involve mutual explanatory support. Thus, the idea of <em>“coherence”</em> as internal consistency in a system of ideas has a long history in epistemology alongside its development in physics.</p>

<h2 id="coherence-in-classical-wave-physics">Coherence in Classical Wave Physics</h2>

<p>In classical optics and wave physics, coherence describes the ability of waves to exhibit stable interference due to fixed phase relationships. Optical coherence specifically refers to the capacity of a light wave (or two waves) to produce an interference pattern of alternating constructive and destructive fringes. If two light beams show no interference (no stable bright/dark pattern), they are said to be incoherent with each other; if they produce clear, high-contrast fringes (including complete destructive cancellation at some points), they are fully coherent. Intermediate cases (partial fringe visibility) indicate partial coherence.</p>

<p>In physics, one distinguishes two aspects of coherence for waves:</p>

<ul>
  <li><strong>Temporal Coherence</strong>: Correlation of a wave with itself at different times. This relates to the wave’s monochromaticity (single-frequency purity). A perfectly monochromatic (single-frequency) wave has infinite temporal coherence - its phase is predictable for any time shift, yielding interference even between widely separated time samples. In practice, real sources have finite <em>bandwidth</em>, so the phase drifts over time. The coherence time $\tau_c$ is defined as the maximum time interval over which the wave’s phase is predictable or correlated with itself. Equivalently, it is the time over which the field <em>“looks”</em> approximately sinusoidal with a stable phase. Beyond $\tau_c$, phase relationships effectively randomize and interference visibility drops to zero. The coherence length $L_c = c \tau_c$ is the propagation distance corresponding to the coherence time. For example, a stabilized laser with extremely narrow linewidth might have $\tau_c \sim 10^{-4}$ s and $L_c$ on the order of $30$ km, whereas broadband sunlight has $\tau_c$ on the order of $10^{-12}$ s and $L_c$ of a few micrometers. Temporal coherence is what a Michelson interferometer measures by varying path delay: fringe contrast diminishes as the delay exceeds the source’s coherence time. In summary, temporal coherence quantifies how well a wave maintains a stable phase over time - the narrower the spectrum (smaller bandwidth), the longer the coherence time.</li>
  <li><strong>Spatial Coherence</strong>: Correlation of a wave’s phase at different points in space across an extended wavefront or between separate beams. This is related to the spatial uniformity of phase across the wave. Spatial coherence determines whether two points on a wavefront can form interference fringes when combined. A wave is spatially coherent across a region if any two points within that region emit waves with a fixed phase relation. Young’s double-slit experiment is a test of spatial coherence: if the two pinholes (separated by some distance $d$) are illuminated by a source, clear interference fringes appear on a screen only if $d$ is within the source’s transverse coherence length. If the pinholes are too far apart, each sees a different phase of the source at a given time, and the fringe visibility degrades to nothing. The coherence area is defined as the area over which the light field is spatially coherent (e.g. for filtered sunlight it might be ~$4\times10^{-3}$ mm$^2$, meaning pinholes must lie within a ~0.06 mm distance to observe interference). In general, smaller or more distant sources (like a distant star approximating a point source) have higher spatial coherence than extended sources (like the sun’s disk). The van Cittert–Zernike theorem provides a quantitative link: the spatial coherence function across a plane is essentially the Fourier transform of the source’s intensity distribution - a large angular source yields low spatial coherence, and a point-like source yields high coherence.</li>
</ul>
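<p>The bandwidth-coherence relationship above can be sketched numerically. Note that $\tau_c \approx 1/\Delta\nu$ is only an order-of-magnitude rule - the exact prefactor depends on the spectral line shape - and the example bandwidths are illustrative round numbers:</p>

```python
# Coherence time from spectral bandwidth via tau_c ~ 1/dnu, and
# coherence length L_c = c * tau_c (order-of-magnitude estimates only).
C = 3.0e8  # speed of light in m/s

def coherence_length(bandwidth_hz: float) -> float:
    """Return the coherence length in meters for a given bandwidth in Hz."""
    tau_c = 1.0 / bandwidth_hz   # coherence time in seconds
    return C * tau_c

# Broadband sunlight: dnu ~ 3e14 Hz -> L_c ~ 1e-6 m (about a micrometer)
# Narrow-line laser:  dnu ~ 1e4 Hz  -> L_c ~ 3e4 m (tens of kilometers)
```
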

<p>In practice most light fields are neither fully coherent nor fully incoherent, but somewhere in between. The <strong>degree of coherence</strong> (first-order coherence) can be quantified by a complex coherence function or correlation coefficient $\gamma_{12}$ between two points or times. This is essentially the normalized cross-correlation of the wave’s electric field at the two points/times. If $\mid\gamma_{12}\mid = 1$, the fields are perfectly coherent (phase of one completely predicts the phase of the other) and interference contrast is maximal. If $\mid\gamma_{12}\mid = 0$, they are completely uncorrelated (incoherent) and no clear interference appears. Partially coherent light yields an intermediate $0&lt;\mid\gamma\mid&lt;1$, producing fringes of reduced contrast (washed-out interference). In summary, coherence in classical waves refers to the presence of stable correlations (in phase and amplitude) either over time or across space, enabling observable interference effects. Techniques like holography, interferometry, and optical coherence tomography all rely on manipulating and measuring these coherence properties of light.</p>
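<p>The normalized correlation $\gamma_{12}$ can be estimated directly from sampled fields. A sketch with synthetic signals (the sampling grid and frequencies are arbitrary choices for illustration):</p>

```python
import numpy as np

def degree_of_coherence(e1: np.ndarray, e2: np.ndarray) -> complex:
    """Normalized first-order correlation gamma_12 of two sampled fields."""
    num = np.mean(np.conj(e1) * e2)
    den = np.sqrt(np.mean(np.abs(e1) ** 2) * np.mean(np.abs(e2) ** 2))
    return num / den

t = np.linspace(0.0, 1.0, 2000)
field = np.exp(1j * 2 * np.pi * 50 * t)                   # monochromatic wave
shifted = field * np.exp(1j * 0.7)                        # fixed phase offset
noise = np.exp(1j * 2 * np.pi * np.random.rand(t.size))   # random phases

# A fixed phase relation gives |gamma| = 1 (full fringe contrast);
# uncorrelated random phases give |gamma| close to 0 (no fringes).
```
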

<p>In other wave phenomena beyond optics, coherence has analogous meanings. For example, in acoustics two sound waves are coherent if they have a constant phase difference; in radio communications, a coherent receiver maintains a reference phase to interfere the incoming signal. Coherence is a unifying wave concept signifying the presence of an underlying order or correlation in the wavefield.</p>

<h2 id="quantum-coherence-and-superposition">Quantum Coherence and Superposition</h2>

<p>In quantum physics, coherence refers to the existence of definite phase relationships between quantum states. A quantum state that is a superposition of two or more basis states is coherent if the relative phases are well-defined and stable, allowing for interference effects at the quantum level. In contrast, if those phase relationships are randomized or unknown (as in a statistical mixture), the superposition is incoherent. Quantum coherence is what differentiates a pure quantum superposition from a mere classical probabilistic mixture.</p>

<p>For example, an electron in a superposition $\frac{1}{\sqrt{2}}(\mid\uparrow&gt; + e^{i\phi}\mid\downarrow&gt;)$ has coherence between the spin-up and spin-down components - the phase $e^{i\phi}$ will lead to interference effects in experiments. If that phase is completely random or the state is an incoherent mixture $\tfrac{1}{2}(\mid\uparrow&gt;\langle\uparrow\mid + \mid\downarrow&gt;\langle\downarrow\mid)$, no single-particle interference can occur between the $\mid\uparrow&gt;$ and $\mid\downarrow&gt;$ outcomes. Thus, quantum coherence is essential for phenomena like single-particle interference (an electron or photon interfering with itself in a double-slit experiment) and is a prerequisite for <em>entangled correlations</em> between particles.</p>

<p>Formally, quantum coherence can be defined in terms of the density matrix of a system. In a chosen reference basis, the off-diagonal elements of the density matrix measure the coherence between the corresponding basis states. An incoherent state is one whose density matrix is diagonal in the reference basis (no superposition terms). Any state with nonzero off-diagonal entries is a coherent superposition in that basis. It possesses quantum coherence as a resource. For example, if we take the computational basis ${\mid 0&gt;,\mid 1&gt;}$ for a qubit, an incoherent state would be of the form $\rho = p\mid 0&gt;\langle 0\mid + (1-p)\mid 1&gt;\langle 1\mid$ (diagonal), whereas a state like $\mid\psi&gt; = \alpha\mid 0&gt; + \beta\mid 1&gt;$ has off-diagonal terms $\alpha\beta^*$ in its density matrix and hence is coherent. The magnitude of those off-diagonals relates to the visibility of interference one could observe between the states $\mid 0&gt;$ and $\mid 1&gt;$.</p>
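<p>The density-matrix picture can be made concrete with a few lines of NumPy. The $l_1$-norm used here is one common coherence monotone (the sum of the magnitudes of the off-diagonal entries):</p>

```python
import numpy as np

def l1_coherence(rho: np.ndarray) -> float:
    """Sum of |off-diagonal| entries of a density matrix."""
    return float(np.sum(np.abs(rho)) - np.trace(np.abs(rho)).real)

# Equal superposition (|0> + |1>)/sqrt(2): off-diagonals are 0.5 each.
psi = np.array([1.0, 1.0]) / np.sqrt(2.0)
rho_pure = np.outer(psi, psi.conj())

# 50/50 statistical mixture of |0> and |1>: diagonal, no coherence.
rho_mixed = np.diag([0.5, 0.5])
```

<p>The pure superposition carries one unit of $l_1$-coherence while the mixture carries none, even though both give 50/50 measurement statistics in the computational basis.</p>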

<p>In modern quantum information science, coherence is treated as a <strong>quantifiable resource</strong> (much like entanglement). There are measures, coherence monotones, that assign a number to how much coherence a given state has relative to a specified basis. Intuitively, this corresponds to how well the state can produce interference or be used in quantum algorithms. Coherence is <em>“consumed”</em> or <em>degraded</em> by interactions that cause decoherence and it can be partially converted into other quantum resources like entanglement under the right operations. Quantum computing relies on maintaining coherence in qubits throughout computational gate operations. The superposition of $\mid 0&gt;$ and $\mid 1&gt;$ in each qubit (and across multiple qubits) must remain coherent long enough to perform interference-based algorithms. If qubits lose coherence too quickly, quantum computation reverts to classical outcomes.</p>

<p>At a fundamental level, quantum coherence underlies phenomena like quantum interference (e.g. electron diffraction patterns require the electron wavefunction to remain coherent across the paths) and is linked to entanglement. Entanglement can be viewed as a kind of <em>coherence between subsystems</em>: an entangled pair of particles has no local coherence (each reduced state may be mixed) but has joint coherence in the global state. In summary, quantum coherence captures the <em>wavelike</em> aspect of quantum states - the ability of probability amplitudes to superpose and interfere.</p>

<h3 id="coherent-states-in-quantum-mechanics">Coherent States in Quantum Mechanics</h3>

<p>In quantum mechanics, the phrase <em>coherent state</em> has a specific technical meaning beyond just <em>“state with coherence”</em>. Coherent states typically refer to a special set of quantum states of a harmonic oscillator or field that most closely resemble classical oscillations. The canonical example is the Glauber–Sudarshan coherent state of the electromagnetic field - the quantum state of light that a stabilized laser outputs.</p>

<p>Mathematically, a coherent state $\mid \alpha&gt;$ (for a harmonic oscillator or single mode of the field) is defined as the eigenstate of the annihilation (lowering) operator $\hat{a}$.
These $\mid \alpha&gt;$ states form an overcomplete, non-orthogonal basis of the oscillator’s Hilbert space. They are often written as displaced vacuum states and have the minimum uncertainty allowed by quantum mechanics (equal uncertainties in $x$ and $p$), which is why they are sometimes called <em>“minimum uncertainty wavepackets”</em>. A coherent state exhibits Poissonian number statistics and, in many respects, behaves like a classical sinusoidal oscillation with amplitude $\mid \alpha\mid$ and phase $\arg(\alpha)$. For instance, the electric field expectation oscillates in time as a classical field would, and the probability of finding $n$ photons in $\mid \alpha&gt;$ is $P(n)=e^{-\mid\alpha\mid^2}\mid\alpha\mid^{2n}/n!$ (a Poisson distribution).</p>

<p>$
\begin{aligned}
 \hat{a} \mid\alpha&gt; &amp;= \alpha \mid\alpha&gt;
\end{aligned}
$</p>
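<p>The Poissonian photon statistics quoted above are easy to verify numerically - the mean photon number of a coherent state equals $\mid\alpha\mid^2$:</p>

```python
import math

def coherent_photon_prob(alpha_sq: float, n: int) -> float:
    """Poisson probability of finding n photons in |alpha>, given |alpha|^2."""
    return math.exp(-alpha_sq) * alpha_sq ** n / math.factorial(n)

# Truncating the sum at n = 59 is safe here: for |alpha|^2 = 4 the
# remaining tail mass is utterly negligible.
mean_n = sum(n * coherent_photon_prob(4.0, n) for n in range(60))
```
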

<p>Historically, coherent states were studied by Schrödinger (1926) as Gaussian wavepackets that remain localized in a harmonic potential without spreading. In the 1960s, Glauber and Sudarshan formally introduced them in the context of quantum optics to describe the output of a laser and to define what <em>classical light</em> means quantum-mechanically. In optical coherence theory, these coherent states are considered the most classical states of the field - any state that is a statistical mixture of coherent states is regarded as a classical light field, whereas states that cannot be expressed as such a mixture (e.g. squeezed states, Fock states) are nonclassical. In fact, one can define optical nonclassicality as the presence of quantum coherence that cannot be accounted for by a random classical field. Coherent states thus straddle the line between quantum and classical: they are quantum states with maximal coherence (in the sense that they saturate certain coherence measures) but they produce dynamics and statistics reminiscent of classical waves.</p>

<p>Coherent states play a role in coherent state quantization and path integrals - they provide a convenient basis to represent quantum dynamics (especially in quantum optics and many-body theory). In summary, a coherent state in quantum mechanics is a specific type of quantum state (especially of oscillators/fields) characterized by classical-like behavior and defined by eigenstate relations like $\hat{a}\mid\alpha&gt; = \alpha\mid\alpha&gt;$. It should not be confused with the broader notion of <em>“a state having coherence”</em>; rather, it is a term for these minimum-uncertainty wavepackets that remain as coherent as possible.</p>

<h3 id="quantum-decoherence">Quantum Decoherence</h3>

<p>Quantum decoherence is the process by which quantum coherence is lost - or apparently lost - due to a system’s interaction with its environment. When a quantum system is not perfectly isolated, the phase information that defines its coherent superposition can become entangled with environmental degrees of freedom. From the perspective of the system alone, the coherent superposition then appears to collapse into a mixture, as the relative phases are no longer observable (having <em>“leaked”</em> into the environment). Decoherence is the mechanism that explains how classical behavior emerges from quantum systems: it suppresses interference between a system’s quantum states by irreversibly correlating those states with different states of the environment.</p>

<p><em>“Quantum decoherence is the loss of quantum coherence”</em>, typically through <strong>loss of information from the system to the environment</strong>. During decoherence, no fundamental wavefunction collapse is assumed to occur; rather, the system-plus-environment evolves unitarily, but the system’s reduced state transitions from pure to mixed. The off-diagonal elements of the system’s density matrix decay towards zero as coherence is delocalized into the environment. A well-known analogy is <em>friction</em>: just as mechanical energy dissipates into environmental heat, quantum phase information dissipates into environmental degrees of freedom. The coherence is <em>not destroyed</em> per se but becomes <em>inaccessible</em> - stored as correlations between the system and environment that are practically irretrievable.</p>

<p><strong>Decoherence theory</strong>, developed by Zeh, Zurek and others, provides a resolution to the quantum measurement paradox in that it explains the apparent collapse of the wavefunction. For example, a state describing Schrödinger’s cat $\frac{1}{\sqrt{2}}(\mid\text{alive}&gt; + \mid\text{dead}&gt;)$ rapidly decoheres due to environmental interactions (air molecules, photons, etc.) that <em>measure</em> the cat’s state. The environment <em>gains information</em> about which branch (alive or dead) occurred, and interference between the branches becomes unobservable - the cat is effectively in a classical probabilistic mixture from any local standpoint. Decoherence does not produce a true wavefunction collapse on its own, but it makes interference between macroscopically distinct states vanish extraordinarily quickly, yielding the appearance of a definite outcome.</p>

<p>Quantitatively, decoherence can be characterized - analogously to the coherence time - by a <em>decoherence time</em> over which the off-diagonals decay. This can be exceedingly short for macroscopic differences - e.g. a dust particle can decohere in $10^{-20}$ seconds when hit by air molecules. Decoherence ties into the concept of <em>pointer states</em>: the basis in which the system remains robust (diagonal) under environmental interaction. Those pointer states (like the <em>alive</em> or <em>dead</em> cat) are the classical-like states that do not themselves get blurred by interference <strong>because the environment continually monitors them</strong>.</p>
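<p>A toy pure-dephasing model illustrates the decay of the off-diagonals. The exponential form assumes a simple Markovian environment; $T_2$ here is an arbitrary illustrative timescale:</p>

```python
import numpy as np

def decohered_rho(rho0: np.ndarray, t: float, t2: float) -> np.ndarray:
    """Pure dephasing: off-diagonals decay as exp(-t/T2), diagonals survive."""
    decay = np.exp(-t / t2)
    rho = rho0.copy()
    rho[0, 1] *= decay
    rho[1, 0] *= decay
    return rho

rho0 = np.array([[0.5, 0.5], [0.5, 0.5]])   # coherent superposition
late = decohered_rho(rho0, t=10.0, t2=1.0)  # ten decoherence times later
# late is almost diagonal: only the classical mixture diag(0.5, 0.5) remains.
```
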

<p>For emerging quantum technologies, decoherence is a critical enemy. Qubits in a quantum computer must be isolated from noise sources because any unmonitored interaction (thermal fluctuations, stray fields, etc.) can entangle with the qubit and collapse superpositions, thereby ruining quantum computations. Quantum error-correcting codes and decoherence-free subspaces are being developed to counteract this.</p>

<blockquote>
  <p>Decoherence is the process that destroys coherence: a coherent quantum superposition evolves into an incoherent mixture when the system’s phase information leaks into the environment. It bridges quantum and classical physics by explaining why large, open systems don’t usually display quantum interference, and it highlights why maintaining coherence (isolation or error correction) is essential in quantum experimentation and technology.</p>
</blockquote>

<h3 id="coherent-control-and-manipulation-in-quantum-systems">Coherent Control and Manipulation in Quantum Systems</h3>

<p>The term coherent is also used in the context of <em>controlling</em> physical systems, especially in quantum mechanics, to imply control that <em>preserves phase relations</em>. <strong>Coherent manipulation</strong> (or <strong>coherent control</strong>) refers to using interactions (like laser pulses, microwave fields, etc.) to steer a quantum system’s state in a deterministic, phase-preserving way. The system’s evolution is unitary and maintains quantum coherence throughout the process, as opposed to <em>incoherent</em> processes (like measurements or random thermal kicks) that induce decoherence.</p>

<p>For example, <a href="/2026/01/11/nmrexp01.html">one can apply a sequence of precisely timed magnetic resonance pulses to a spin qubit system</a> to rotate the spin state on the <a href="/2024/02/16/electricdipole01.html">Bloch sphere</a>. If these operations are done <em>faster</em> than the <em>decoherence time</em> and with precise <em>phase reference</em>, the spin is undergoing coherent manipulation. The ability to perform such operations is a prerequisite to any scalable quantum information platform - you must be able to apply coherent control to qubits in order to do quantum gates without losing the information to decoherence. Coherent manipulation typically implies that the system remains in a <em>pure state</em> during the operation.</p>

<p>In a broader sense, coherent control is a subfield of physics and chemistry where interference of quantum amplitudes is used to direct outcomes. For instance, in photochemistry, carefully shaped laser pulses have been used to coherently control <em>reaction pathways</em> by interfering different excitation pathways. The coherence here implies that the driving fields and the system’s response maintain a fixed phase relationship, producing constructive or destructive interference in the quantum transition amplitudes to favor a desired product.</p>

<p>A classic example is coherent population trapping or <em>STIRAP</em> (STImulated Raman Adiabatic Passage) in atomic physics, where two coherent lasers create interference that drives an atom from state $\mid A&gt;$ to $\mid B&gt;$ <em>without populating an intermediate state</em> $\mid C&gt;$. The success of this technique relies on maintaining phase coherence between the two laser fields and the atomic polarization. If the lasers were not phase-coherent with each other, the interference would average out and the coherent transfer would fail.</p>

<blockquote>
  <p>Describing an experimental technique as coherent (coherent manipulation, coherent spectroscopy, etc.) implies that phase coherence is preserved throughout the process. The system’s evolution is phase-correlated with the driving fields or between its own states. This is crucial in quantum computing (for executing logic gates on superpositions), in quantum optics (such as creating entangled photons via coherent pump processes), and in quantum sensing (where a phase-coherent superposition interacts with a field and accumulates a measurable phase shift). A large effort in modern physics is to extend the duration of coherent control (increase coherence times) by improving isolation, material purity, and using dynamical decoupling or error correction.</p>
</blockquote>

<h2 id="coherence-in-philosophy-and-logic">Coherence in Philosophy and Logic</h2>

<p>Outside of physics, coherence generally refers to a property of statements, beliefs, or arguments - essentially, a logical and orderly consistency. A coherent statement or theory is one whose parts are logically connected and free of contradictions, so that the whole <em>sticks together</em> conceptually. For example, we might say a scientific theory is coherent if its various hypotheses and observations support each other and form a unified explanation. In everyday terms, someone <em>“speaking coherently”</em> is expressing their thoughts clearly and consistently.</p>

<p>In epistemology and the theory of truth, coherence plays a prominent role through the coherence theory of truth and coherentism. The coherence theory of truth holds that the truth of a proposition consists in its coherence with a specified set of other propositions. In other words, a new claim is true if it fits harmoniously into a larger body of beliefs without contradiction and with mutual support. This is in contrast to the correspondence theory of truth, which says truth consists in correspondence to objective facts. Coherentist philosophers argue that an isolated proposition cannot be judged <em>true</em> or <em>false</em> except by seeing whether it coheres with an entire system of propositions believed to be true. Early versions of coherentism equated coherence with mere logical consistency, but more developed versions require a stronger entailment or explanatory relation among beliefs. For instance, one belief coheres with others not just by avoiding contradiction, but by perhaps being entailed by them or contributing to their overall support.</p>

<p>Coherentism in epistemology is the view that justification of beliefs lies in their coherence with all other beliefs one holds, rather than in some foundational <em>self-evident</em> truths. In this view, our knowledge is like a web where each strand supports the others, and the entire network is coherent if it contains no contradictions and is <em>mutually reinforcing</em>. A perfectly coherent belief system would be one where every belief is consistent with every other and perhaps each is derivable from the whole. While perfect coherence is an ideal, coherence theories allow that truth and justification come in degrees - a belief can be more or less coherent with the rest, and accordingly more or less justified or likely true.</p>

<p>To illustrate, consider a detective assembling a story of what happened at a crime scene. A coherent explanation is one where all pieces of evidence fit into a single consistent timeline and causation, with no inexplicable gaps or contradictions. If a new piece of evidence doesn’t contradict but instead is predicted by or entailed by the theory, it increases the coherence of the theory, and thus increases our confidence in its truth. If the evidence creates a contradiction, the theory becomes <em>incoherent</em> and must be revised or rejected.</p>

<blockquote>
  <p>In the realm of ideas, coherence means internal consistency and logical connectivity. A coherent argument has premises that support the conclusion in an organized way. A coherent policy plan is one where the measures align with each other and with the overall goals. The common thread is that nothing <em>sticks out</em> as inconsistent or unrelated - all parts contribute to a unified whole. This everyday/philosophical notion of coherence, while conceptually distinct from physical coherence, metaphorically resonates with the physics usage: just as coherent waves march in step to produce a clear signal, coherent thoughts align together to produce clear understanding.</p>
</blockquote>

<h2 id="conclusion">Conclusion</h2>

<p>Across physics and philosophy, <em>coherence</em> signifies a kind of unity or correlation that makes the behavior of a system (be it waves, particles, or ideas) orderly and intelligible. In physics, coherence underlies the patterns of interference and the delicate power of quantum superpositions - it marks the difference between random, uncorrelated phenomena and ones that act with a fixed relationship. In quantum systems, maintaining coherence is essential for harnessing quantum effects before decoherence sets in. In philosophy, coherence is the glue of rationality, binding beliefs and assertions into a consistent worldview. The many faces of coherence all reflect the original Latin root <em>cohaerere</em>, <em>“to stick together”</em>.</p>

<div style="text-align: center; width: 100%;"><video controls="" height="450" src="/assets/videos/mp4/novelvideo_doubleslit.mp4" type="video/mp4" width="600"></video></div>]]></content><author><name>tsp</name></author><category term="Physics" /><category term="Quantum mechanics" /><category term="Quantum optics" /><category term="Electrodynamics" /><category term="Philosophy" /><category term="Basics" /><summary type="html"><![CDATA[Coherence is one of those words that quietly carries very different meanings depending on context - logical consistency in philosophy, phase stability in optics, superposition in quantum mechanics, and fragile order in the face of decoherence. This article traces the concept across its historical roots and modern usage, starting from classical wave interference and moving through lasers, statistical optics, quantum superposition, coherent states, and decoherence theory. Along the way, it clarifies what physicists actually mean when they speak of temporal coherence, spatial coherence, and quantum coherence as a physical resource. Rather than collapsing these meanings into a single definition, the article shows how they are related by analogy rather than identity. In physics, coherence marks the presence of stable correlations that enable interference and control; in quantum systems, it is the delicate ingredient that makes superposition, entanglement, and quantum computation possible before the environment washes it away. 
By briefly contrasting this with philosophical notions of coherence as internal consistency, the article highlights why the term is so powerful - and why its meaning may differ when it is used across disciplines.]]></summary></entry><entry><title type="html">Schmitt Trigger - OpAmp based switches exhibiting hysteresis</title><link href="https://www.tspi.at/2026/01/12/schmitttrigger.html" rel="alternate" type="text/html" title="Schmitt Trigger - OpAmp based switches exhibiting hysteresis" /><published>2026-01-12T00:00:00+01:00</published><updated>2026-01-12T06:48:32+01:00</updated><id>https://www.tspi.at/2026/01/12/schmitttrigger</id><content type="html" xml:base="https://www.tspi.at/2026/01/12/schmitttrigger.html"><![CDATA[<p>A Schmitt trigger is a <em>comparator</em> with intentional <strong>positive feedback</strong>. Instead of a single switching threshold, it has two: one for rising inputs and one for falling inputs. This creates <em>hysteresis</em>. This is often needed when a signal is noisy or slowly varying. Without the hysteresis the output can chatter rapidly around a single threshold and cause digital inputs to flap.</p>

<p>In this post we derive the model behaviour of the two common operational amplifier (OpAmp) configurations (inverting and non-inverting) using the idealized amplifier model. We assume infinite input impedance of the operational amplifier, i.e. zero bias currents, as well as infinite open loop gain. The amplifier is assumed to swing to its saturation voltage - which is taken to be the rail voltage in this example. For a practical circuit you will have to check which output levels you actually need - the output will only change between the upper and lower saturation voltage. In addition you will have to verify that the input bias current is negligible.</p>

<p>In this blog article we are going to look into:</p>

<ul>
  <li>The <a href="#inverting-schmitt-trigger">inverting Schmitt trigger</a> and its characteristics</li>
  <li>The <a href="#non-inverting-schmitt-trigger">non inverting Schmitt trigger</a></li>
  <li>A short <a href="#conclusion">conclusion</a></li>
</ul>
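<p>The motivation for the hysteresis can be illustrated with a small simulation. The following is a minimal sketch (not from the derivations below): it feeds a noisy rising ramp through a plain single-threshold comparator and through a comparator with a hysteresis window, then counts the output transitions. All numeric values (noise level, thresholds) are hypothetical.</p>

```python
import random

def comparator(u_in, threshold):
    # Plain comparator: a single switching threshold
    return u_in > threshold

def schmitt(u_in, state, u_low, u_high):
    # Comparator with hysteresis: the state only flips
    # outside the window [u_low, u_high]
    if u_in > u_high:
        return True
    if u_in < u_low:
        return False
    return state  # inside the window: keep the previous state

def transitions(bits):
    # Number of output state changes
    return sum(a != b for a, b in zip(bits, bits[1:]))

random.seed(0)
# Slowly rising ramp from -0.5 V to +0.5 V with additive noise
signal = [t / 1000 - 0.5 + random.gauss(0, 0.05) for t in range(1000)]

plain = [comparator(u, 0.0) for u in signal]

state, hyst = False, []
for u in signal:
    state = schmitt(u, state, -0.2, 0.2)
    hyst.append(state)

# The plain comparator chatters around its single threshold;
# the hysteresis output switches far fewer times
```

<p>The wider the window relative to the noise amplitude, the fewer spurious transitions survive - at the cost of a larger dead band.</p>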

<h2 id="inverting-schmitt-trigger">Inverting Schmitt Trigger</h2>

<p><img src="/assets/images/png/opamp/schmitt_inverting001.png" alt="Inverting Schmitt Trigger Circuit" /></p>

<p>First let’s recall the output of an ideal operational amplifier. The open loop gain is assumed to be $G \to \infty$. This means that the operational amplifier will swing to its maximum or minimum output voltage (i.e. close to the supply rail) depending on the imbalance between the inverting input $U_{-}$ and the non inverting input $U_{+}$:</p>

[
\begin{aligned}
U_{out} &= \begin{cases}
U_{vcc} & \text{if } U_{+} \gt U_{-} \\
-U_{vcc}  & \text{if } U_{+} \lt U_{-} \\
\end{cases}
\end{aligned}
]

<p>The voltage on the inverting input is given by the input voltage $U_{-} = U_{in}$. The voltage on the non inverting input can be described via a simple voltage divider between the output voltage $U_{out}$ and the bias voltage $U_{b}$:</p>

[
\begin{aligned}
I_{g} &= \frac{U_{out} - U_b}{R_1 + R_2} \\
U_{+} &= R_2 I_g + U_b \\
 &= \frac{R_2}{R_1 + R_2} (U_{out} - U_b) + U_b \\
 &= \frac{R_2}{R_1 + R_2} U_{out} + \left( 1 - \frac{R_2}{R_1 + R_2} \right) U_b
\end{aligned}
]
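<p>The rewrite of $U_{+}$ as a weighted sum of $U_{out}$ and $U_b$ can be sanity-checked numerically. A minimal sketch, with all component values hypothetical:</p>

```python
# Hypothetical values: R1 = 10 kOhm, R2 = 2 kOhm, U_out = 12 V, U_b = 1 V
r1, r2 = 10e3, 2e3
u_out, u_bias = 12.0, 1.0

# Direct computation: current through the divider, then the node voltage
i_g = (u_out - u_bias) / (r1 + r2)
u_plus_direct = r2 * i_g + u_bias

# Superposition form derived above
k = r2 / (r1 + r2)
u_plus_formula = k * u_out + (1 - k) * u_bias
```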

<p>Taking a look at the two possible output cases one can determine the switchpoints of the circuit.</p>

<p>In the <strong>first case</strong> $U_{out} = U_{vcc}$ one gets:</p>

[
\begin{aligned}
U_{out} &= U_{vcc} \\
U_{+} &= \frac{R_2}{R_1 + R_2} U_{vcc} + U_b \left( 1 - \frac{R_2}{R_1 + R_2} \right) = U_A
\end{aligned}
]

<p>Switching happens when $U_{in} &gt; U_A$.</p>

<p>In the <strong>second case</strong> $U_{out} = -U_{vcc}$:</p>

[
\begin{aligned}
U_{out} &= -U_{vcc} \\
U_{+} &= -\frac{R_2}{R_1 + R_2} U_{vcc} + U_b \left( 1 - \frac{R_2}{R_1 + R_2} \right) = U_B
\end{aligned}
]

<p><img src="/assets/images/png/opamp/schmitt_inverting002.png" alt="Typical behaviour of an inverting Schmitt trigger" /></p>

<p>Under assumption of a symmetric supply voltage a look at the width of the hysteresis curve yields</p>

[
\begin{aligned}
\Delta U &= (U_A - U_B) \\
 &= 2 \frac{R_2}{R_1 + R_2} U_{vcc}
\end{aligned}
]

<p>The center of the hysteresis is set by the bias voltage $U_b$: for a symmetric supply it lies at $\frac{R_1}{R_1 + R_2} U_b$. As one can see, the width depends only on the resistor ratio and the supply voltage, not on the bias voltage.</p>
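<p>The derived expressions for $U_A$, $U_B$ and the width translate directly into code. A minimal sketch, with all component values hypothetical:</p>

```python
def inverting_thresholds(r1, r2, u_vcc, u_b=0.0):
    """Ideal inverting Schmitt trigger thresholds U_A and U_B."""
    k = r2 / (r1 + r2)                    # divider ratio R2 / (R1 + R2)
    u_a = k * u_vcc + (1 - k) * u_b       # threshold while the output is high
    u_b_thr = -k * u_vcc + (1 - k) * u_b  # threshold while the output is low
    return u_a, u_b_thr

# Example: R1 = 10 kOhm, R2 = 2 kOhm, symmetric +-12 V supply
u_a, u_b_thr = inverting_thresholds(10e3, 2e3, 12.0)
width = u_a - u_b_thr  # 2 * R2/(R1+R2) * U_vcc
```

<p>Calling the function with different bias voltages shifts both thresholds but leaves the width $U_A - U_B$ unchanged, matching the result above.</p>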

<h2 id="non-inverting-schmitt-trigger">Non Inverting Schmitt Trigger</h2>

<p><img src="/assets/images/png/opamp/schmitt_noninverting001.png" alt="Non Inverting Schmitt Trigger" /></p>

<p>The output of the operational amplifier is again determined by the imbalance between the inverting input $U_{-}$ and the non inverting input $U_{+}$. Again assuming infinite open loop gain $G \to \infty$, the amplifier simply swings to its maximum or minimum output voltage:</p>

[
\begin{aligned}
U_{out} &= \begin{cases}
U_{vcc} & \text{if } U_{+} \gt U_{-} \\
-U_{vcc}  & \text{if } U_{+} \lt U_{-} \\
\end{cases}
\end{aligned}
]

<p>Applying the voltage divider like in the inverting case again:</p>

[
\begin{aligned}
I_{g} &= \frac{U_{out} - U_{in}}{R_1 + R_2} \\
U_{+} &= R_2 I_{g} + U_{in} \\
 &= \frac{R_2}{R_1 + R_2} U_{out} + \left(1 - \frac{R_2}{R_1 + R_2} \right) U_{in}
\end{aligned}
]

<p>The switch points between the two states are now a bit more complex than in the inverting case due to the contribution of the input signal.</p>

<p>In the <strong>first case</strong> $U_{out} = U_{vcc}$ we get</p>

[
\begin{aligned}
U_{out} &= U_{vcc} \\
U_{+} &= \frac{R_2}{R_1 + R_2} U_{vcc} + \left(1 - \frac{R_2}{R_1 + R_2} \right) U_{in}
\end{aligned}
]

<p>The inverting input is held at the bias voltage, $U_{-} = U_b$, so the switch happens when $U_{+} \lt U_b$:</p>

[
\begin{aligned}
U_{+} &\lt U_b \\
\frac{R_2}{R_1 + R_2} U_{vcc} + \left(1 - \frac{R_2}{R_1 + R_2} \right) U_{in} &\lt U_b \\
U_{in} &\lt \underbrace{\frac{U_b - \frac{R_2}{R_1 + R_2} U_{vcc}}{\left(1 - \frac{R_2}{R_1 + R_2} \right)}}_{U_A}
\end{aligned}
]

<p>In the <strong>second case</strong> $U_{out} = -U_{vcc}$ we get</p>

[
\begin{aligned}
U_{out} &= -U_{vcc} \\
U_{+} &= -\frac{R_2}{R_1 + R_2} U_{vcc} + \left(1 - \frac{R_2}{R_1 + R_2} \right) U_{in}
\end{aligned}
]

<p>The switch happens when $U_{+} \gt U_b$:</p>

[
\begin{aligned}
-\frac{R_2}{R_1 + R_2} U_{vcc} + \left(1 - \frac{R_2}{R_1 + R_2} \right) U_{in} &\gt U_b \\
U_{in} &\gt \underbrace{\frac{U_b + \frac{R_2}{R_1 + R_2} U_{vcc}}{\left(1 - \frac{R_2}{R_1 + R_2} \right)}}_{U_B}
\end{aligned}
]

<p>Again we get the characteristic hysteresis shape we have already seen for the inverting Schmitt trigger.</p>

<p><img src="/assets/images/png/opamp/schmitt_noninverting002.png" alt="Typical behaviour of the non inverting Schmitt trigger" /></p>

<p>Under assumption of a symmetric supply voltage a look at the width of the hysteresis curve yields</p>

[
\begin{aligned}
\Delta U &= (U_B - U_A) \\
 &= 2 \frac{R_2}{R_1} U_{vcc}
\end{aligned}
]

<p>The center of the hysteresis is set by the bias voltage $U_b$: for a symmetric supply it lies at $\frac{R_1 + R_2}{R_1} U_b$. As one can see, the width again depends only on the resistor ratio and the supply voltage, not on the bias voltage.</p>
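<p>As before, the thresholds follow directly from the derived expressions. A minimal sketch, with all component values hypothetical:</p>

```python
def noninverting_thresholds(r1, r2, u_vcc, u_b=0.0):
    """Ideal non inverting Schmitt trigger thresholds U_A and U_B."""
    k = r2 / (r1 + r2)                     # divider ratio R2 / (R1 + R2)
    u_a = (u_b - k * u_vcc) / (1 - k)      # switch down when U_in < U_A
    u_b_thr = (u_b + k * u_vcc) / (1 - k)  # switch up when U_in > U_B
    return u_a, u_b_thr

# Example: R1 = 10 kOhm, R2 = 2 kOhm, symmetric +-12 V supply
u_a, u_b_thr = noninverting_thresholds(10e3, 2e3, 12.0)
width = u_b_thr - u_a  # 2 * R2/R1 * U_vcc
```

<p>Note that for the same resistor values the window is wider than in the inverting case, since the ratio $R_2/R_1$ replaces $R_2/(R_1 + R_2)$.</p>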

<h2 id="conclusion">Conclusion</h2>

<p>Both Schmitt trigger configurations implement the same idea: positive feedback creates two distinct switching points. In the <strong>inverting configuration</strong> the input goes straight into an OpAmp input, which in the ideal case is assumed to have infinite input impedance. Practically, the input current is the bias current. There is no intentional resistive path from the output to the input source and thus no direct back-action of the circuit onto the input.</p>

<p>In the <strong>non inverting configuration</strong> the input is connected through a resistive divider into a summing node that is also connected to the output. This means the output can source or sink current through the resistors causing a <em>back action</em> of the circuit onto the input. The input impedance seen by the source is finite and depends on the present output state.</p>

<p>For practical applications keep in mind that dedicated components implementing Schmitt triggers exist. These are usually faster than OpAmp based circuits and contain polysilicon resistors already on the chip die, which makes the circuit simpler and more compact.</p>]]></content><author><name>tsp</name></author><category term="Tutorial" /><category term="Electronics" /><category term="Basics" /><category term="OpAmp" /><summary type="html"><![CDATA[A Schmitt trigger is a comparator with positive feedback, yielding a well-defined hysteresis window. This makes it an essential building block whenever slow, noisy, or ambiguous signals must be converted into clean digital transitions. In this article, both the inverting and non-inverting Schmitt trigger configurations are derived step by step using an ideal operational amplifier model, making the underlying mechanism and assumptions transparent. Starting from first principles, the switching thresholds and hysteresis widths are calculated explicitly for both circuit topologies. The analysis shows how the feedback resistor ratios and supply voltage determine the threshold spacing and how the bias voltage sets the center of the hysteresis. The article concludes with a comparison of both configurations, highlighting their different input impedances and back-action characteristics, and places op-amp-based Schmitt triggers in context with dedicated comparator solutions used in practical designs.]]></summary></entry></feed>