The Efficient Lie: Minimum Description Length as a Measure of Narrative Engineering
When a party tells the truth, they are not choosing between competing explanations; there is simply one reality to describe. The parsimony of truth is not merely ethical but computational: a single, consistent account can be communicated efficiently because every statement constrains and reinforces every other. When a party fabricates, they must continually improvise new explanations to cover inconsistencies, each addition increasing the complexity of the overall story. Minimum Description Length, a formal framework rooted in information theory that brings the idea of Kolmogorov complexity to bear on model selection, provides the mathematical apparatus for measuring exactly this asymmetry between parsimonious truth and engineered narrative.
The Mathematical Foundation
Minimum Description Length formalises the intuition that the best explanation for data is the one that minimises the total length of the description needed to communicate both the explanation and the data it accounts for. Given a corpus of communications, MDL asks: what is the shortest possible description of this data, and what does the structure of that description reveal about how the data was generated?
The MDL principle distinguishes between two components in any description: the model that explains the data, and the residual information that the model does not capture. A simple model — say, a single set of core facts — can describe a large corpus if the corpus is generated from those facts with only minor variations. A complex model — a sprawling narrative with multiple contradictory threads — requires more bits to specify, even before accounting for the residuals.
Formally, MDL measures description length in bits: the bits needed to encode the model, plus the bits needed to encode the data given that model, with codelengths derived from negative log probabilities. The optimal trade-off between model complexity and fit to data is the one that minimises the total description length. In communication analysis, this translates to a direct comparison: how complex must our model of this party's knowledge be to explain their communications, and does that complexity grow disproportionately over time?
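In its standard two-part form the principle reads as follows, where M ranges over a class of candidate models, L(M) is the number of bits needed to encode a model, and L(D | M) the number needed to encode the data given that model (the notation is the conventional one, not drawn from elsewhere in this framework):

$$
L(D) \;=\; \min_{M \in \mathcal{M}} \bigl[\, L(M) + L(D \mid M) \,\bigr],
\qquad
L(D \mid M) \;=\; -\log_2 P(D \mid M).
$$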
Application to Narrative Analysis
The MDL approach differs from raw Kolmogorov complexity in a crucial way: it incorporates model selection. Where Kolmogorov complexity measures the intrinsic complexity of a string, MDL asks what model best explains the string and how parsimonious that model is. Applied to narrative analysis, this means we are not just measuring how hard a story is to compress; we are measuring how complex a generative model must be to produce it.
Good-faith communicators have low MDL because their narrative is generated by a simple model: the facts they know, expressed with contextual variation. The model is stable across time and across topics. Each new statement adds little to the description length because it is largely predicted by the existing model. The total description length grows roughly linearly with the number of statements, at a per-statement rate close to the entropy of the underlying facts.
Fabricators have high and growing MDL because their narrative requires a complex model — one that tracks multiple evolving explanations, covers contradictions, and accommodates the improvised patches applied to earlier fabrications. As time passes and more communications accumulate, the model must grow to accommodate the expanding web of claims. The description length does not grow linearly; it accelerates as the narrative becomes increasingly elaborate.
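True MDL is uncomputable, so any implementation works with a proxy. A minimal sketch of the growth measurement, assuming statements arrive in chronological order and using a general-purpose compressor as a crude stand-in for a proper model class (zlib here; the function names are ours):

```python
import zlib

def codelength_bits(text: str) -> int:
    """Approximate description length of text, in bits, via zlib."""
    return 8 * len(zlib.compress(text.encode("utf-8"), 9))

def incremental_costs(statements: list[str]) -> list[int]:
    """Bits each statement adds on top of everything said before it.

    codelength(prefix + s) - codelength(prefix) approximates the cost
    of s given the prior narrative: a statement the existing story
    predicts adds few bits; one that forces the implicit model to
    grow adds many.
    """
    costs: list[int] = []
    prefix = ""
    for s in statements:
        extended = prefix + "\n" + s if prefix else s
        costs.append(codelength_bits(extended) - codelength_bits(prefix))
        prefix = extended
    return costs
```

On this proxy, a stable narrative shows per-statement costs that flatten over time, while an improvised one shows costs that keep climbing.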
The Growth Rate Diagnostic
The key diagnostic is not absolute MDL but the rate of MDL growth as the corpus expands. Honest communicators exhibit sublinear growth in model complexity: each additional statement adds little or nothing to the model because it is largely captured by the existing structure. The per-statement description length stabilises to a constant that reflects the natural variability of human expression around a fixed underlying reality.
Fabricators exhibit superlinear growth: each new statement requires meaningful updates to the model because it introduces material that does not fit the existing structure. The per-statement description length does not stabilise; it increases as the narrative becomes more elaborate. At some point the model complexity exceeds what genuine memory could plausibly sustain: no one actually remembers a complex web of fabricated events with enough internal consistency to keep MDL growth low.
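One way to make the distinction operational is to fit a power law L(n) ≈ c·n^α to the cumulative description length and read off the exponent: α near 1 indicates steady growth, α noticeably above 1 indicates acceleration. A sketch under the same zlib-proxy assumption as above (an ordinary least-squares fit on log-log data; the interpretation thresholds are illustrative, not calibrated):

```python
import math

def growth_exponent(cumulative_bits: list[int]) -> float:
    """Least-squares slope of log L(n) against log n, i.e. the alpha
    in L(n) ~ c * n**alpha. Needs at least two data points."""
    xs = [math.log(n) for n in range(1, len(cumulative_bits) + 1)]
    ys = [math.log(b) for b in cumulative_bits]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Usage with the incremental_costs sketch above:
#   from itertools import accumulate
#   alpha = growth_exponent(list(accumulate(incremental_costs(statements))))
# alpha near 1.0 suggests steady growth; well above 1.0, acceleration.
```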
The crossover point, where MDL growth transitions from sublinear to superlinear, often corresponds to identifiable events in the dispute timeline. A sudden spike in the MDL growth rate may coincide with, for example, the receipt of legal advice: the moment when the fabrication strategy shifted from simple denial to elaborate counter-narrative.
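Locating the crossover is a change-point problem on the per-statement cost series. A deliberately simple sketch, scanning for the split that maximises the gap between mean cost before and after (a production analysis would use a proper change-point test; this is a stand-in):

```python
def crossover_index(costs: list[float]) -> int:
    """Index that best splits costs into a low-mean 'before' segment
    and a high-mean 'after' segment; a crude change-point estimate.
    Returns 0 if the series is too short to split."""
    best_k, best_gap = 0, float("-inf")
    for k in range(2, len(costs) - 1):
        before = sum(costs[:k]) / k
        after = sum(costs[k:]) / (len(costs) - k)
        if after - before > best_gap:
            best_gap, best_k = after - before, k
    return best_k
```

The returned index maps back to a date in the communication timeline, which can then be set against known events in the dispute.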
Cross-Party MDL Comparison
MDL enables direct comparison between parties in the same dispute with no elaborate normalisation for communication volume or style: take each party's communications over the same period, compute the MDL of each corpus, and compare the description length per statement. The party whose narrative can be explained by a simpler model, whose description length per statement is lower, is the party whose account is more parsimonious. This is not a judgment about truth or falsehood; it is a measurement of narrative efficiency that correlates strongly with honesty.
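Under the same compression proxy, the comparison reduces to a per-statement description length for each corpus over the same window; a sketch reusing codelength_bits from the earlier example (the ratio and the names are ours):

```python
def bits_per_statement(statements: list[str]) -> float:
    """Average approximate description length per statement."""
    return codelength_bits("\n".join(statements)) / max(len(statements), 1)

def parsimony_ratio(party_a: list[str], party_b: list[str]) -> float:
    """Above 1.0: party_a's narrative costs more bits per statement
    than party_b's, i.e. party_a's account is less parsimonious."""
    return bits_per_statement(party_a) / bits_per_statement(party_b)
```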
In practice, the MDL ratio between adversarial parties in the same dispute reveals an asymmetry that is difficult to explain through legitimate differences in perspective. Both parties may have genuine reasons for their positions, but only one party's narrative will show the characteristic signatures of engineering: high complexity, accelerating growth, and model structure that diverges from the natural compression patterns of honest communication.
Complementary Measurement
MDL operates on a different axis from compression-ratio analysis. Where raw compression measures the redundancy in a text, MDL measures the complexity of the generative model required to produce it. A text can be highly compressible (high surface redundancy) while still requiring a complex model, if the redundancy is mere repetition rather than structural coherence. Conversely, a text can compress poorly yet be generated by a simple model, if the variation is natural noise rather than engineering.
Combining MDL analysis with the other metrics in this framework — KL divergence for anchor erosion, Wasserstein distance for distributional shift, mutual information for temporal coherence — creates a multi-dimensional profile that captures different aspects of narrative structure. A story that fails all four tests — high MDL, accelerating drift, distributional restructuring, and decaying coherence — is structurally incoherent in ways that truthful accounts simply are not.
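To make the combination concrete, the four measurements can be gathered into a single profile per party. A minimal sketch; the metric computations are assumed to be supplied by the rest of the framework, and the field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class NarrativeProfile:
    """One profile per party: the four axes discussed in this framework."""
    mdl_growth_exponent: float   # alpha from the growth fit; above 1 is acceleration
    kl_anchor_drift: float       # KL divergence against early-period anchors
    wasserstein_shift: float     # distributional restructuring over time
    temporal_mutual_info: float  # coherence between adjacent time slices
```

Keeping the axes separate rather than collapsing them into a single score preserves the diagnostic detail: it shows which aspect of the narrative failed, not merely that one did.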