Can We Really Detect LLM-Generated Text?

This blog post explores the promises and limits of watermarking: a technique that embeds information in a model's output so its source can be verified, with the aim of mitigating the misuse of AI-generated content.

Introduction

The rise of Large Language Models (LLMs), and later multimodal models, has fundamentally altered digital content creation, in many cases making the boundary between human and machine authorship increasingly porous. This capability, while offering immense benefits, also introduces significant risks, including the spread of misinformation, academic dishonesty, and an erosion of trust in digital communication.

For instance, ask yourself this: how would you confidently determine whether the blog post you are reading right now was written by a human or is purely LLM-generated?

The central question is whether text generated by an LLM (or, in the case of multimodal models, any content in the form of text, images, or audio) can be reliably distinguished from text written by a human. Several papers have proposed watermarking techniques, notably among them work presented at ICLR.

In this blog post, we aim to give readers a robust theoretical as well as practical understanding of watermarking, while emphasizing its trade-offs and failure modes. We also tie in information-theoretic limits: how much signal one can embed without degrading the text, and what detection error bounds are acceptable. We then look at some of the emerging design space for watermarking and detecting LLM-generated text (see Learning to Watermark via RL). Lastly, we showcase some experiments from SOTA labs that focus on watermarking (DeepMind's SynthID).

Detection efforts can be roughly bifurcated into two distinct approaches: detection methods, which operate post hoc on finished text, and watermarking, which represents a more proactive approach to establishing provenance. Watermarking aims to embed an imperceptible statistical signal into text during the generation process, thus creating a verifiable link between an output and its source. This signal is not a secret message in itself, but rather a detectable pattern that identifies the text as machine-generated.
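To make the idea of a statistical signal concrete, here is a minimal sketch in the spirit of the widely cited "green list" scheme (Kirchenbauer et al.): at each step, a secret key and the previous token pseudo-randomly split the vocabulary, generation softly favors the "green" half, and a detector counts how many tokens landed on their green lists and reports a z-score. The function names, toy vocabulary size, and key below are illustrative, not any library's actual API:

```python
import hashlib
import math

def green_list(prev_token_id: int, vocab_size: int, key: str, fraction: float = 0.5) -> set:
    """Pseudo-randomly partition the vocabulary, seeded by the previous token and a secret key."""
    green = set()
    for token_id in range(vocab_size):
        digest = hashlib.sha256(f"{key}:{prev_token_id}:{token_id}".encode()).digest()
        if digest[0] < fraction * 256:  # deterministic, key-dependent coin flip per token
            green.add(token_id)
    return green

def detect(token_ids: list, vocab_size: int, key: str, fraction: float = 0.5) -> float:
    """z-score of green-list hits vs. the chance level expected for unwatermarked text."""
    hits = sum(
        1 for prev, cur in zip(token_ids, token_ids[1:])
        if cur in green_list(prev, vocab_size, key, fraction)
    )
    n = len(token_ids) - 1
    return (hits - fraction * n) / math.sqrt(n * fraction * (1 - fraction))
```

During generation the watermarker would add a small bias to the logits of green tokens, so the pattern is frequent enough to detect but subtle enough not to degrade the text; the detector needs only the key, not the model.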

AI-generated writing has two notable statistical characteristics that make it seem a little too perfect (and a little less human): perplexity and burstiness.

Perplexity - In language modeling, perplexity quantifies a model's uncertainty or "confusion" when predicting the next token in a sequence. Mathematically, it is the exponential of the average negative log-likelihood per token.

\[\text{Perplexity}(x_{1:T}) = \exp\left(-\frac{1}{T} \sum_{t=1}^{T} \log p(x_t \mid x_{<t})\right)\]

A lower perplexity score indicates that the model is more confident in its predictions, as it is effectively choosing from a smaller set of likely next words.

The goal of LLM training is to minimize perplexity on a corpus of human text. LLMs thus tend to sample high-probability tokens, and their generated text often has a lower perplexity score, when evaluated by a language model, than typical human-written text.

Note - Perplexity can be understood more intuitively through its geometric-mean formulation. The geometric mean of T numbers is the Tth root of their product, and perplexity is the geometric mean of the inverse token probabilities:

\[\text{Perplexity}(x_{1:T}) = \left(\prod_{t=1}^{T} \frac{1}{p(x_t \mid x_{<t})}\right)^{1/T}\]
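The two formulations are equal, which is easy to verify numerically. The token probabilities below are made-up values for illustration:

```python
import math

# Hypothetical next-token probabilities a model assigns to a 4-token sequence.
token_probs = [0.25, 0.5, 0.1, 0.8]
T = len(token_probs)

# Formulation 1: exponential of the average negative log-likelihood.
ppl_exp = math.exp(-sum(math.log(p) for p in token_probs) / T)

# Formulation 2: geometric mean of the inverse probabilities.
ppl_geo = math.prod(1.0 / p for p in token_probs) ** (1.0 / T)

print(ppl_exp, ppl_geo)  # both equal (4 * 2 * 10 * 1.25) ** 0.25
```

Intuitively, a perplexity of about 3.16 here means the model was, on average, as uncertain as if it were choosing uniformly among roughly three next tokens.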

Burstiness - While perplexity measures the average predictability of a text, burstiness measures its variance: the change in perplexity over the course of a document. Mathematically, \(B = \frac{\lambda - k}{\lambda + k}\)

where \(B\) is the burstiness score, \(\lambda\) is the mean inter-arrival time between bursts, and \(k\) is the mean burst length.

Human writing is often characterized by "bursts" of high perplexity, where a writer uses a creative metaphor, a rare word, or an unconventional sentence structure. In contrast, LLM-generated text tends to maintain a more uniform level of perplexity, resulting in low burstiness.
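Operationalizations of burstiness vary; one simple way to see the intuition is to compute per-segment perplexity across a document and compare its spread. The per-token probabilities below are hypothetical, chosen only to mimic "uniformly confident" machine text versus human text with one surprising sentence:

```python
import math
import statistics

def perplexity(probs):
    """Perplexity of one segment from its next-token probabilities."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

# Hypothetical per-token probabilities for three sentences of a document.
machine_doc = [[0.5, 0.6, 0.5], [0.5, 0.5, 0.6], [0.6, 0.5, 0.5]]   # uniformly predictable
human_doc   = [[0.5, 0.6, 0.5], [0.05, 0.1, 0.05], [0.6, 0.5, 0.5]]  # one high-perplexity "burst"

machine_spread = statistics.stdev([perplexity(s) for s in machine_doc])
human_spread = statistics.stdev([perplexity(s) for s in human_doc])

print(machine_spread, human_spread)  # the human document shows far more spread
```

The machine-like document keeps a nearly constant per-sentence perplexity, while the single creative "burst" in the human-like document dominates its variance; this spread is what burstiness-based detectors exploit.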

Beyond such raw statistics, machine text also exhibits noticeable stylistic patterns. Studies indicate that AI-generated text often adopts a more analytic and formal tone, may use a higher density of adjectives, and can be less readable than human-generated text. Human writing, on the other hand, shows much more variability and more sophisticated discourse patterns, reflecting a more dynamic, less uniform authoring process.

SynthID by DeepMind is a promising approach
