The Geometry of Deception: Mapping Information Theory to Forensic Communication Analysis

ai communication forensics information-theory nlp deception

In the modern landscape of digital communication, the primary challenge for forensic investigators, legal professionals, and behavioral psychologists has shifted. We no longer suffer from a lack of data; rather, we suffer from an inability to distinguish between substantive disclosure and strategic obfuscation. As Large Language Models (LLMs) become more adept at generating "plausible-sounding" narratives, the traditional methods of linguistic analysis—keyword frequency, pronoun usage, and sentiment tracking—are becoming increasingly obsolete.

A new frontier is emerging: the application of Information Theory and Topological Data Analysis (TDA) to communication records. This approach, exemplified by the "Indicia AI" framework, moves beyond the "what" of a conversation to analyze its "how"—specifically, the geometric and mathematical properties of ideas as they evolve in a high-dimensional vector space. By treating a narrative not as a string of words, but as a trajectory through semantic space, we can begin to quantify deception, gaslighting, and narrative drift with the same rigor we apply to physics or financial fraud detection.

1. The Stability of Truth: Cauchy Convergence

At the heart of any forensic investigation is the concept of consistency. Traditionally, this is a subjective "vibe check": does the story change over time? In a vector-based forensic model, this is replaced by the Cauchy Criterion. In mathematics, a Cauchy sequence is one where the elements become arbitrarily close to each other as the sequence progresses. In a communication context, a truthful narrative should "converge" toward a stable set of facts.

When a subject is telling the truth, even across multiple retellings or under pressure, the semantic distance between their statements should decrease. They are describing a fixed point in reality. Conversely, a deceptive narrative often exhibits high "oscillation." As the subject invents new details to cover logical holes or reacts to new evidence, their narrative "jumps" across the semantic manifold. By measuring the "Cauchy Convergence" of statement embeddings, we can mathematically flag narratives that are structurally unstable—a "failure to converge" that serves as a quantitative signature of fabrication.
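The convergence test above can be sketched numerically. This is a minimal illustration, assuming statements have already been embedded as fixed-length vectors by some sentence-embedding model; the `is_converging` heuristic and its window size are hypothetical choices for demonstration, not part of any established framework.

```python
import numpy as np

def convergence_profile(embeddings: np.ndarray) -> np.ndarray:
    """Distance between each consecutive pair of statement embeddings.

    A narrative that converges (Cauchy-like) shows these gaps trending
    toward zero; a fabricated one keeps oscillating.
    """
    diffs = np.diff(embeddings, axis=0)
    return np.linalg.norm(diffs, axis=1)

def is_converging(embeddings: np.ndarray, window: int = 3) -> bool:
    """Crude heuristic: is the mean of the last `window` gaps smaller
    than the mean of the first `window` gaps?"""
    gaps = convergence_profile(embeddings)
    if len(gaps) < 2 * window:
        return False
    return gaps[-window:].mean() < gaps[:window].mean()

# Toy example: a "truthful" trajectory, i.e. noisy retellings of one
# fixed point in reality, with the noise shrinking as details settle.
rng = np.random.default_rng(0)
target = rng.normal(size=8)
truthful = np.array([target + rng.normal(scale=1.0 / (t + 1), size=8)
                     for t in range(10)])
print(is_converging(truthful))
```

A deceptive trajectory would be simulated instead by re-sampling around a *moving* target, which keeps the gap sequence from shrinking.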

2. The Efficiency of Substance: Fisher Information and the Nat-Ratio

One of the most common tactics in cooperative obstruction is the "word salad": an overwhelming volume of text that contains a negligible amount of information. To combat this, we can borrow Fisher Information, which quantifies how much an observation constrains an unknown parameter. In this context, it measures how much a specific statement sharpens our estimate of a party's true position.

The breakthrough metric here is the Word-to-Information Ratio (words per nat). An honest, efficient communicator provides a high yield of information per word. A manipulator, seeking to appear responsive while revealing nothing, produces a "flat" cumulative Fisher information curve. They may speak for hours, but the "transport cost" of their ideas is zero; they haven't moved the needle on our understanding of the facts. Quantifying "communicative efficiency" allows us to distinguish between genuine detail and "strategic verbosity."
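Fisher information proper requires a parametric model of the party's position, so the sketch below uses a deliberately simple stand-in: each message's information yield is approximated by the summed surprisal (in nats) of its words under a running unigram model of the conversation so far. Repetitive filler re-uses familiar words and earns few new nats. The function name, smoothing scheme, and example messages are all invented for illustration.

```python
import math
from collections import Counter

def nats_per_word(messages: list[str]) -> list[float]:
    """Toy proxy for the word-to-information ratio.

    Each word's contribution is its surprisal -log(p) under an
    add-one-smoothed unigram distribution of everything said so far.
    Returns average nats per word for each message in sequence.
    """
    seen: Counter = Counter()
    total = 0
    out = []
    for msg in messages:
        words = msg.lower().split()
        nats = 0.0
        for w in words:
            p = (seen[w] + 1) / (total + len(seen) + 1)  # smoothed unigram prob
            nats += -math.log(p)
            seen[w] += 1
            total += 1
        out.append(nats / max(len(words), 1))
    return out

dense = ["the server crashed at nine after the failed deploy",
         "logs show a null pointer in the payment module"]
salad = ["well you know it is what it is you know",
         "well it is you know what it is well you know"]
print(nats_per_word(dense))
print(nats_per_word(salad))
```

The "word salad" sequence yields a markedly lower nats-per-word figure on its second message than the substantive one, which is exactly the "flat cumulative curve" signature described above.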

3. The Asymmetry of Deception: KL Divergence and Strategic Reframing

Narrative drift is natural. As we remember things or gain new context, our stories evolve. However, there is a fundamental difference between organic drift and strategic reframing. This difference is captured by the Kullback-Leibler (KL) Divergence, specifically its property of asymmetry (D_KL(P ∥ Q) ≠ D_KL(Q ∥ P)).

In a forensic setting, we treat the subject's first substantive statement as an "anchor" ($P$). As the conversation progresses ($Q$), we measure the divergence in both directions. If the "forward divergence" D_KL(P ∥ Q) (the extra information needed to reconstruct the anchor from the current story) is significantly higher than the "reverse divergence" D_KL(Q ∥ P), we have detected a targeted revision rather than symmetric noise.

Organic memory error tends to be symmetric or "noisy" across all semantic dimensions. Strategic reframing, however, is highly concentrated. A manipulator will keep the "Timeline" dimension stable to maintain a veneer of credibility, while drastically "drifting" the "Responsibility" or "Blame" dimensions. KL Divergence allows us to decompose narrative drift and identify exactly which parts of the story are being "rewritten" in real-time.
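This decomposition is easy to sketch once the narrative has been projected onto a handful of named semantic dimensions. The dimension names and the probability masses below are invented for illustration; the per-dimension terms of D_KL(P ∥ Q) show exactly where the drift is concentrated.

```python
import numpy as np

def kl(p: np.ndarray, q: np.ndarray) -> float:
    """D_KL(p || q) in nats, assuming strictly positive distributions."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical semantic mass over three narrative dimensions.
dims = ["timeline", "responsibility", "blame"]
anchor  = np.array([0.50, 0.30, 0.20])   # P: first substantive statement
current = np.array([0.50, 0.10, 0.40])   # Q: timeline held stable, blame inflated

forward = kl(anchor, current)
reverse = kl(current, anchor)
print(f"forward: {forward:.4f} nats, reverse: {reverse:.4f} nats")

# Per-dimension contributions to the forward divergence reveal WHERE
# the story is being rewritten: "timeline" contributes nothing, while
# "responsibility" and "blame" carry the entire drift.
contrib = anchor * np.log(anchor / current)
for d, c in zip(dims, contrib):
    print(f"{d}: {c:+.4f} nats")
```

Note the asymmetry: the two directions give different numbers for the same pair of distributions, which is what lets us distinguish "drifting away from" the anchor from "drifting back toward" it.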

4. Quantifying the "Move": Wasserstein Distance and Goalpost Shifting

We are all familiar with the colloquialism "moving the goalposts." In the past, proving this in a legal or interpersonal dispute was a matter of rhetoric. Using Wasserstein Distance (also known as Earth Mover's Distance), it becomes a matter of calculation.

Wasserstein distance measures the minimum "work" required to transform one probability distribution into another. If we map a party's communication themes into a distribution—e.g., 20% on Workload, 10% on Support, 5% on Conflict—we can track the "transport cost" of their narrative over time. If a party begins with a "Support-heavy" distribution and, over the course of an investigation, "moves" that semantic mass into a "Blame-heavy" distribution, the Wasserstein distance provides a concrete metric for that shift. It turns a subjective argument about "changing the subject" into an objective measurement of "narrative transport cost."
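A minimal sketch of that calculation follows. Placing themes on a 1-D ordinal axis with unit spacing is a simplification (a production pipeline would define a ground metric between theme embeddings, for which libraries like SciPy or POT are typically used); the theme names and weekly distributions are invented for illustration. In one dimension, the Earth Mover's Distance reduces to the area between the two cumulative distribution functions.

```python
import numpy as np

def emd_1d(p: np.ndarray, q: np.ndarray) -> float:
    """Earth Mover's Distance between two distributions on the same
    1-D grid with unit spacing: the area between their CDFs.
    (The final cumsum entry is ~0 since both distributions sum to 1.)"""
    return float(np.abs(np.cumsum(p - q)).sum())

themes = ["workload", "support", "process", "conflict", "blame"]

week_1 = np.array([0.20, 0.45, 0.20, 0.10, 0.05])  # support-heavy
week_8 = np.array([0.10, 0.05, 0.15, 0.25, 0.45])  # blame-heavy

cost = emd_1d(week_1, week_8)
print(f"narrative transport cost: {cost:.2f}")
```

A large cost means a lot of semantic mass was physically "moved" across the theme axis between the two snapshots, which is the quantitative form of "moving the goalposts."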

5. The Architecture of the Void: Persistent Homology

Perhaps the most sophisticated tool in this new forensic kit is Persistent Homology, a branch of Topological Data Analysis. While the other metrics look at what is there, Persistent Homology looks at what is missing.

A coherent, truthful narrative forms a dense "cloud" of points in semantic space. A deceptive or incomplete narrative, however, often contains "holes"—logical voids where a piece of information should be but isn't. These voids persist even when the subject tries to fill them with fluff. By analyzing the Betti numbers (the count of holes in different dimensions) of a narrative's topology, we can identify "logical discontinuities." These are the areas where the story doesn't just "not make sense"—it literally doesn't "hold together" mathematically.
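Full persistent homology is normally computed with dedicated libraries such as Ripser or GUDHI. The sketch below covers only dimension 0 (connected components, the zeroth Betti number), which already exposes one kind of gap: clusters of statements that refuse to link up across a wide band of scales. The point cloud is synthetic, and the thresholds are chosen for the example.

```python
import numpy as np

def betti0_curve(points: np.ndarray, thresholds: np.ndarray) -> list[int]:
    """Number of connected components of the Vietoris-Rips complex at
    each distance threshold (dimension-0 persistent homology), found
    with a simple union-find over the pairwise distance matrix."""
    n = len(points)
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)

    def components(eps: float) -> int:
        parent = list(range(n))
        def find(x: int) -> int:
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x
        for i in range(n):
            for j in range(i + 1, n):
                if dist[i, j] <= eps:
                    parent[find(i)] = find(j)
        return len({find(i) for i in range(n)})

    return [components(t) for t in thresholds]

# Two well-separated clusters of statement embeddings: the component
# count stays at 2 over a wide band of scales, a persistent "gap"
# where no statement bridges the two halves of the story.
rng = np.random.default_rng(1)
cloud = np.vstack([rng.normal(0, 0.1, (10, 2)),
                   rng.normal(5, 0.1, (10, 2))])
print(betti0_curve(cloud, np.array([0.05, 0.5, 1.0, 8.0])))
```

Components that are born early and die late in this curve are the "persistent" features; higher-dimensional holes (loops, voids) require the full boundary-matrix machinery that the dedicated libraries provide.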

Conclusion: From Interrogation to Information Interrogation

The era of "vibes-based" communication analysis is coming to a close. As we move into an age where human and AI-generated text are indistinguishable on a surface level, we must look deeper into the mathematical structures that underpin our ideas.

The "Indicia" approach—using Cauchy sequences to measure stability, Fisher information to measure density, and Wasserstein distance to measure transport—represents a paradigm shift. We are no longer merely "summarizing" data; we are "interrogating" its geometry. This doesn't just make analysis faster; it makes it deterministic. It provides a "Hard Math" foundation for the "Soft Science" of human behavior, allowing us to see the shape of a lie before the words are even fully spoken.