Exploring the theoretical and practical aspects of watermarking techniques for detecting AI-generated content, including trade-offs, failure modes, and information-theoretic limits
The rise of Large Language Models (LLMs) and multimodal models has changed the nature of digital content, making the boundary between human and machine authorship increasingly porous. While this capability offers utility, it also introduces risks, such as the spread of misinformation, academic dishonesty, and a general erosion of trust in digital communication.
The central question is whether text generated by an LLM (or, in the case of multimodal models, any content in the form of text, image, or audio) can be reliably distinguished from that written by a human. Several papers have proposed watermarking techniques, a notable number of them appearing in ICLR presentations and conference proceedings.
In the sections that follow, we aim to give readers a robust theoretical and practical understanding of watermarking, emphasizing the trade-offs and failure modes of these techniques. We also aim to tie in information-theoretic limits: how much signal one can embed without degrading the text, and what detection error bounds are acceptable.
Detection approaches can be roughly bifurcated into two distinct methods: detection methods that operate post-hoc on finished text, and watermarking, which represents a more proactive approach to establishing provenance. Watermarking aims to embed an imperceptible statistical signal into text during the generation process, thereby creating a verifiable link between an output and its source. This signal is not a secret message in itself but rather a detectable pattern that identifies the text as machine-generated.
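To make the idea concrete, the sketch below illustrates one well-known family of such schemes, the "green-list" logit-bias watermark: the previous token seeds a pseudorandom split of the vocabulary, and tokens in the green half receive a small boost to their logits before sampling. The vocabulary size, key string, and the `GAMMA`/`DELTA` parameters are illustrative assumptions, not values taken from any particular paper discussed in this post.

```python
# Toy sketch of a "green list" logit-bias watermark (illustrative values only).
import hashlib
import numpy as np

VOCAB_SIZE = 50_000      # illustrative vocabulary size
GAMMA = 0.5              # fraction of the vocabulary marked "green"
DELTA = 2.0              # logit boost added to green tokens
SECRET_KEY = "demo-key"  # illustrative watermark key

def green_list(prev_token: int) -> np.ndarray:
    # Derive the green set deterministically from the key and the previous
    # token, so a detector holding the same key can recompute it later.
    digest = hashlib.sha256(f"{SECRET_KEY}|{prev_token}".encode()).hexdigest()
    rng = np.random.default_rng(int(digest, 16) % 2**32)
    perm = rng.permutation(VOCAB_SIZE)
    return perm[: int(GAMMA * VOCAB_SIZE)]

def watermarked_sample(logits: np.ndarray, prev_token: int,
                       rng: np.random.Generator = np.random.default_rng()) -> int:
    # Boost green-token logits, renormalize, and sample as usual. Over many
    # tokens the output carries a detectable excess of green tokens.
    biased = logits.copy()
    biased[green_list(prev_token)] += DELTA
    probs = np.exp(biased - biased.max())
    probs /= probs.sum()
    return int(rng.choice(VOCAB_SIZE, p=probs))
```

A detector holding the same key can recompute the green lists for a given passage and test whether the observed fraction of green tokens is implausibly high for human-written text.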
AI-generated writing has two notable characteristics that make it seem a little too perfect (and a little less human). These two characteristics are perplexity and burstiness.
Perplexity - In language modeling, perplexity quantifies a model’s uncertainty or “confusion” when predicting the next token in a sequence. Mathematically, it is the exponential of the average negative log-likelihood per token.
\[\text{Perplexity}(x_{1:T}) = \exp\left(-\frac{1}{T} \sum_{t=1}^{T} \log p(x_t \mid x_{<t})\right)\]A lower perplexity score indicates that the model is more confident in its predictions, as it is effectively choosing from a smaller set of likely next words.
The goal of LLM training is to minimize perplexity on a corpus of human text. As a result, models tend to sample high-probability tokens, and their generated text often has a lower perplexity score, when evaluated by a language model, than typical human-written text.
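Perplexity is straightforward to compute with an off-the-shelf causal language model. Below is a minimal sketch using Hugging Face `transformers` with GPT-2 as the scoring model; the choice of model and library is an illustrative assumption, not something prescribed above.

```python
# Sketch: scoring a passage's perplexity with an off-the-shelf causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # Average negative log-likelihood per token; passing labels=input_ids
    # makes the model return the mean cross-entropy, which we exponentiate.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

print(perplexity("The cat sat on the mat."))              # typically low
print(perplexity("Cerulean walruses negotiate taxes."))   # typically higher
```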
Burstiness - While perplexity measures the average predictability of a text, burstiness measures its variance: how much the perplexity changes over the course of a document. One common way to quantify it is
\[B = \frac{\sigma - \mu}{\sigma + \mu}\]where \(B\) = burstiness, \(\sigma\) = the standard deviation of the per-segment (e.g., per-sentence) perplexities, and \(\mu\) = their mean. \(B\) ranges from \(-1\) (perfectly uniform) to \(+1\) (highly bursty).
Human writing is often characterized by “bursts” of high perplexity, where a writer uses a creative metaphor, a rare word, or an unconventional sentence structure. In contrast, LLM-generated text tends to maintain a more uniform level of perplexity, resulting in low burstiness.
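The sketch below computes this coefficient from a list of per-sentence perplexity scores (for instance, produced by the `perplexity()` helper sketched earlier). The \((\sigma - \mu)/(\sigma + \mu)\) form is one common operationalization, not necessarily the exact one used by any particular detector.

```python
# Sketch: burstiness coefficient over per-sentence perplexity scores.
import statistics

def burstiness(sentence_ppls: list[float]) -> float:
    # Compare the spread of per-sentence perplexities to their mean:
    # uniform machine text pushes B toward -1, uneven human text raises it.
    mu = statistics.mean(sentence_ppls)
    sigma = statistics.pstdev(sentence_ppls)
    return (sigma - mu) / (sigma + mu)

# Example: a flat perplexity profile vs. a "bursty" one.
print(burstiness([22.0, 24.0, 23.0, 21.0]))    # near -1: very uniform
print(burstiness([12.0, 95.0, 18.0, 240.0]))   # near 0: far burstier
```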
Beyond such raw statistics, it can also be observed that machine text exhibits noticeable stylistic patterns. Studies like