by Malcolm Rowe

Pangrammatic windows

Over on Language Log, there’s a post about pangrammatic windows, and a bot that searches Twitter posts for them. Pangrammatic windows are pangrams, pieces of text that use every letter of the (English) alphabet, that occur within otherwise naturally occurring text.

For example, the shortest known natural sequence is 42 letters, from Piers Anthony’s Cube Route, discovered in an article in Word Ways:

“We are all from Xanth,” Cube said quickly. “Just visiting Phaze. We just want to find the dragon.”
Piers Anthony’s Cube Route (pangrammatic window highlighted)

I thought it might be interesting to work out how you’d go about searching a given text for pangrammatic windows. A short chat at work and some quick hacking later, and I had a simple proof-of-concept, but no data to run against.

That was easily solved by downloading the Project Gutenberg April 2010 DVD image [1] and unzipping everything within. That gave me 11.6GB of text files, ranging in size from 336 bytes (one of the chapters of Moby Dick) to a single 43MB file comprising Webster’s Unabridged Dictionary.

I’ll post about the technical side separately, but suffice it to say that this search doesn’t exactly tax a modern PC: my laptop has enough RAM to load all of the Gutenberg text into memory, and even from a cold start, it takes only 80 seconds to search through it all.
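For a rough idea of what such a search can look like, here is a minimal Python sketch of one standard approach: a sliding window over the letters of the text. It is an illustration under assumptions, not the code used for the results in this post. The function name `shortest_pangrammatic_window` is made up, only the ASCII letters a-z are counted, and the window length is measured in letters (spaces and punctuation are ignored).

```python
from collections import Counter
import string


def shortest_pangrammatic_window(text):
    """Return the shortest run of letters containing all of a-z, or None.

    Non-letters are dropped first, so the result is a lowercase string of
    letters only. Ties go to the earliest window.
    """
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter()   # letter counts inside the current window
    distinct = 0         # how many different letters the window holds
    best = None          # (start, end) of the shortest window so far
    left = 0
    for right, ch in enumerate(letters):
        counts[ch] += 1
        if counts[ch] == 1:
            distinct += 1
        # While the window is a pangram, record it and shrink from the left.
        while distinct == 26:
            if best is None or right + 1 - left < best[1] - best[0]:
                best = (left, right + 1)
            counts[letters[left]] -= 1
            if counts[letters[left]] == 0:
                distinct -= 1
            left += 1
    if best is None:
        return None
    return "".join(letters[best[0]:best[1]])


quote = ('"We are all from Xanth," Cube said quickly. '
         '"Just visiting Phaze. We just want to find the dragon."')
window = shortest_pangrammatic_window(quote)
print(len(window), window)
# 42 fromxanthcubesaidquicklyjustvisitingphazew
```

Each letter enters and leaves the window at most once, so the scan is linear in the length of the text, which fits with a search like this not taxing a modern PC.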

So what did I find? Well, firstly, several thousand occurrences of “the alphabet”. In retrospect, that probably should have been obvious.

I did find another 42-letter sequence, but I don’t think it can really count, as it occurs in a discussion of pangrams: De Morgan (the mathematician), while snarking about numerology, writes about trying to construct a meaningful sentence that uses every letter except ‘v’ and ‘j’ exactly once:

There is a kind of Cabbala Alphabetica which the investigators of the numerals in words would do well to take up: it is the formation of sentences which contain all the letters of the alphabet, and each only once. No one has done it with v and j treated as consonants; but you and I can do it. Dr. Whewell and I amused ourselves, some years ago, with attempts. He could not make sense, though he joined words: he gave me Phiz, styx, wrong, buck, flame, quid.
Augustus De Morgan, A Budget of Paradoxes

The shortest sequence that seems to fit within the rules is the following 53-letter sequence, from The Life of Charles Dickens:

[…] there was a second reading to which the presence and enjoyment of Fonblanque gave new zest; and when I expressed to Dickens […]
John Forster, The Life of Charles Dickens

However, both this and a similar 56-letter sequence (“Köckeritz! Where is the king?”) in Napoleon and the Queen of Prussia still seem somewhat unnatural to me, since they depend upon proper names to work (and to be fair, the same is true of the Piers Anthony quote as well).

Given that, I think the contender for the shortest truly “natural” pangrammatic window in the Gutenberg corpus is the following 57-letter sequence, from Andre Norton’s YA-esque Civil War adventure, Ride Proud, Rebel!:

They had turned off the road, which was now filled with men, horses, men, artillery, and men, all slogging purposefully forward. They composed an army roused out before daylight, on the move toward another army holed in behind a breastworks and waiting. And over all, the exhausting blanket of mid-July heat which pressed to squeeze all the vital juices out of both man and animal.
Andre Alice Norton, Ride Proud, Rebel!, 1961

Funnily enough, one thing that I did expect to find, but didn’t, was any common example of a pangram. In fact, the word “pangram” does not appear (with that meaning) in the Gutenberg corpus at all! The closest I got were two near-misses: “the quick, brown fox jumped over the lazy dog” and “the swift brown fox jumps over the lazy dog”. The former is, I think, a misquote; the latter isn’t, as it’s called out in the text as an almost-pangram.
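Near-misses like these are easy to check mechanically. The following is a small sketch, using a hypothetical helper called `missing_letters` (not something from the search code itself), that reports which of the letters a-z a phrase lacks:

```python
import string


def missing_letters(text):
    """Return the letters a-z that do not occur in text, in alphabetical order."""
    present = set(text.lower())
    return "".join(c for c in string.ascii_lowercase if c not in present)


print(missing_letters("the quick, brown fox jumped over the lazy dog"))  # s
print(missing_letters("the swift brown fox jumps over the lazy dog"))    # ckq
```

The first phrase loses its only ‘s’ when “jumps” becomes “jumped”, while the second lacks the ‘q’, ‘c’ and ‘k’ that “quick” would normally supply.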

That’s it for this post. I also have a separate post that goes into a little detail about the code itself.

  1. Hey, 14-year-old me? Remember when you spent over an hour on the phone to download 150KB of BBS software on a 300 baud connection? I just took about the same time to download 8.4GB, and I have enough space to store an uncompressed copy too. The future rocks! But while we’re here: could you buy some Apple stock during 2002? Thanks!