One Mistake per Billion: How DNA Polymerase Proofreads

## The Scale of the Problem

Consider the numbers. The human genome is about 3 billion base pairs long, and each cell division requires copying all of it. You start life as a single cell and end up with about 37 trillion cells. That's on the order of 37 trillion rounds of DNA replication, each copying about 3 billion bases.

The raw error rate for nucleotide insertion (how often DNA polymerase puts in the wrong base) is about 1 in 100,000. That sounds low. But copying 3 billion bases at that rate would introduce about 30,000 errors per replication event. Over 37 trillion cell divisions, the cumulative mutation load would be catastrophic.
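The 30,000 figure is a single division, sketched here with the round numbers from the text (the function and constant names are illustrative):

```python
# Round figures quoted in the text, not precise measurements.
GENOME_SIZE_BP = 3_000_000_000   # bases copied per replication
RAW_ERROR_ONE_IN = 100_000       # one wrong base per this many insertions


def expected_errors(genome_size_bp: int, one_error_in: int) -> float:
    """Expected number of misinserted bases in one full copy of the genome."""
    return genome_size_bp / one_error_in


print(expected_errors(GENOME_SIZE_BP, RAW_ERROR_ONE_IN))  # 30000.0
```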

The actual measured error rate in human cells is about 1 in a billion per base copied. Something between initial synthesis and the final product is catching and correcting about 99.99% of the initial errors, a 10,000-fold improvement. That something is a set of interlocking mechanisms that operate at the polymerase itself and after synthesis.

## Proofreading by the Polymerase

DNA polymerase III in bacteria, and Pol ε and Pol δ in eukaryotes, have a built-in 3'→5' exonuclease activity: a molecular "backspace key." Each newly added nucleotide is tested by how well it fits. A correctly paired base sits snugly in the active site. A mispaired base is geometrically awkward: it distorts the DNA and slows the polymerase.

When a mispair stalls synthesis, the 3' end of the new strand shifts into the exonuclease domain, where the mismatched nucleotide is cleaved off. The end then shifts back to the polymerase domain and synthesis tries again. This intrinsic proofreading reduces the error rate roughly 100-fold, from about 1-in-100,000 to about 1-in-10-million.

The mechanism is kinetic, not chemical. The polymerase doesn't "recognize" a mispair explicitly — it just moves more slowly when a mispair is present, which gives the exonuclease domain more time to act. Speed and accuracy are in tension: a faster polymerase proofreads less effectively.
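One way to picture this kinetic competition is as a race between two steps at the 3' end: extension (adding the next nucleotide) and excision (the exonuclease removing the last one). The sketch below is a toy model, not a description of any measured enzyme. The rate constants are made-up values chosen only to show the trend, and the function name is hypothetical.

```python
def excision_probability(k_extend: float, k_excise: float) -> float:
    """Chance the terminal nucleotide is removed before the next one is added.

    Treats extension and excision as two competing first-order steps,
    so the outcome depends only on the ratio of their rates.
    """
    return k_excise / (k_extend + k_excise)


# Illustrative rate constants (events per second). These are assumptions,
# not measured values for any real polymerase.
K_EXCISE = 1.0
K_EXTEND_MATCHED = 1000.0     # correct pair: extension is fast
K_EXTEND_MISMATCHED = 0.01    # mispair: extension stalls

p_matched = excision_probability(K_EXTEND_MATCHED, K_EXCISE)
p_mismatched = excision_probability(K_EXTEND_MISMATCHED, K_EXCISE)

print(f"matched end excised:    {p_matched:.4f}")
print(f"mismatched end excised: {p_mismatched:.4f}")

# A polymerase that extends mispairs 10x faster gives the exonuclease
# less time, so fewer mispairs are removed.
p_hasty = excision_probability(K_EXTEND_MISMATCHED * 10, K_EXCISE)
print(f"hasty polymerase:       {p_hasty:.4f}")
```

In this model the exonuclease never has to identify a mispair. The same excision rate applies to every 3' end, and the slow extension step after a mispair is what tips the balance.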

## Mismatch Repair: Catching What Proofreading Missed

After the polymerase has passed through a region, another system scans the newly synthesized DNA for remaining errors. This is mismatch repair (MMR).

The key proteins in bacteria are MutS (which recognizes mismatched bases), MutL (a mediator), and MutH (which nicks the newly synthesized strand near the mismatch). In eukaryotes, there are MutS and MutL homologs (MSH and MLH proteins) that perform equivalent functions without MutH. The eukaryotic system identifies the new strand by other means, likely the nicks left between Okazaki fragments and the free 3' ends of the growing strand.

The mismatch repair system has to solve a non-trivial problem: it needs to know *which* strand is the newly synthesized one (and therefore contains the error) versus the template strand. In bacteria, this distinction is made using methylation. E. coli methylates adenine at GATC sequences, but methylation lags slightly behind synthesis — the new strand is briefly unmethylated while the template is methylated. The MutH enzyme cuts preferentially in the unmethylated strand, directing repair to the new strand.
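The strand-discrimination logic can be sketched as a small decision rule. This is a simplified illustration, assuming each GATC site is described by two booleans (one per strand); the function names and the strand labels "A" and "B" are hypothetical, not part of any real tool.

```python
from typing import List, Optional


def gatc_sites(sequence: str) -> List[int]:
    """Return the 0-based start position of every GATC in the sequence.

    GATC reads the same on both strands (it is its own reverse
    complement), so each site can carry a methyl mark on either strand.
    """
    sites = []
    start = sequence.find("GATC")
    while start != -1:
        sites.append(start)
        start = sequence.find("GATC", start + 1)
    return sites


def presumed_new_strand(methylated_a: bool, methylated_b: bool) -> Optional[str]:
    """Pick the strand to nick at one GATC site, or None if there is no signal.

    Only a hemimethylated site (exactly one strand marked) says which
    strand is new: the unmethylated one.
    """
    if methylated_a and not methylated_b:
        return "B"
    if methylated_b and not methylated_a:
        return "A"
    return None  # fully methylated or fully unmethylated: no strand signal


print(gatc_sites("AAGATCTTGATCGATC"))      # [2, 8, 12]
print(presumed_new_strand(True, False))    # B
print(presumed_new_strand(True, True))     # None
```

The `None` case reflects the time limit built into this scheme: once methylation catches up on the new strand, the site no longer distinguishes old from new.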

Human MMR proteins preferentially repair the strand with nicks and gaps — features of newly synthesized DNA.

MMR reduces the final error rate another ~100-fold, to about 1-in-a-billion.
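Putting the stages side by side shows how two roughly 100-fold steps combine. The sketch below uses only the round error rates quoted in this article; the stage labels and function name are illustrative.

```python
GENOME_SIZE_BP = 3_000_000_000  # round figure from the text

# (stage, error rate expressed as "one error in N bases copied")
STAGES = [
    ("polymerase insertion alone", 100_000),
    ("plus exonuclease proofreading", 10_000_000),
    ("plus mismatch repair", 1_000_000_000),
]


def fidelity_table(genome_size_bp, stages):
    """Return (stage, one_in, fold gain over previous stage, errors per genome copy)."""
    rows = []
    previous = None
    for name, one_in in stages:
        fold = 1 if previous is None else one_in // previous
        rows.append((name, one_in, fold, genome_size_bp / one_in))
        previous = one_in
    return rows


for name, one_in, fold, errors in fidelity_table(GENOME_SIZE_BP, STAGES):
    print(f"{name:32s} 1 in {one_in:>13,}  x{fold:<4} ~{errors:,.0f} errors per genome copy")
```

With these numbers the expected errors per copy of the genome fall from about 30,000 to about 300 to about 3.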

## Why This Matters Clinically

Defects in mismatch repair genes cause Lynch syndrome, one of the most common hereditary cancer syndromes. Patients with Lynch syndrome have inherited a mutation in one copy of an MMR gene such as MLH1, MSH2, MSH6, or PMS2. Their cells still have working MMR, because one good copy is enough. But when the second copy is lost in a cell through a somatic event (a second mutation or loss of heterozygosity), that cell loses MMR entirely.

Without mismatch repair, replication errors accumulate much faster than normal. This shows up as microsatellite instability (MSI) — abnormal variation in the length of short tandem repeat sequences — which is a diagnostic hallmark of Lynch syndrome-associated tumors.
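The idea behind an MSI comparison can be sketched as counting repeat units at the same locus in normal and tumor DNA and flagging loci where the counts differ. This is a toy illustration only: the locus names and sequences are invented, and real clinical testing relies on validated marker panels and scoring criteria that are not modeled here.

```python
import re
from typing import Dict, List


def longest_repeat_run(sequence: str, unit: str) -> int:
    """Number of back-to-back copies of `unit` in the longest such run."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


def unstable_loci(
    normal: Dict[str, str], tumor: Dict[str, str], units: Dict[str, str]
) -> List[str]:
    """Loci whose repeat length differs between normal and tumor DNA."""
    return [
        locus
        for locus in normal
        if longest_repeat_run(normal[locus], units[locus])
        != longest_repeat_run(tumor[locus], units[locus])
    ]


# Invented example data: hypothetical locus names and sequences.
units = {"locus_1": "A", "locus_2": "CA"}
normal = {"locus_1": "GGC" + "A" * 10 + "TCC", "locus_2": "TT" + "CA" * 6 + "GG"}
tumor = {"locus_1": "GGC" + "A" * 8 + "TCC", "locus_2": "TT" + "CA" * 6 + "GG"}

print(unstable_loci(normal, tumor, units))  # ['locus_1']
```

Repeats are a sensitive readout because the polymerase tends to slip on them, leaving small insertion or deletion loops that mismatch repair would normally correct.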

The connection between MMR failure and cancer is direct: remove this post-replication repair step, accumulate mutations faster, and the odds of hitting a tumor suppressor gene or oncogene rise, making malignant transformation more likely. It's one of the clearer mechanistic links in oncology between a specific molecular defect and a clinical outcome.

The next chapter compares how bacteria and eukaryotes solve the same replication problem differently — particularly the challenge of scale.
