Sunday, November 11, 2012

The language of phone numbers, thanks to GMSV and Language Log



What xkcd is getting at in its latest comic is a matter of syntax and semantics. I'll show you the syntax below, but as far as meaning is concerned, the point is that cell phone numbers have almost no semantics. The area code (the first three digits) used to function as a locational marker when phones sat in fixed locations in houses. But Americans tend to move every three years or so, they now take their phone numbers with them, and cell phone universality only really began to pick up in America five to ten years ago; so the area code really does tend to reflect a former abode. My cool son Calvin, for example, has a number which implies that he lives in Oakland, California; he doesn't. He does his video game programming in the Pacific Northwest.
And the rest of the number, the other seven digits? Space enough there for some real personal information, but it is not used. It functions merely as arbitrary material to distinguish one cell phone's location point in the information universe from all the others.
With 10-digit strings we can distinguish roughly 10,000,000,000 phones from each other. That assumes someone can have the number 000-000-0000, which is probably God's number; and sure, maybe Satan has laid claim to 666-666-6666, so it's not available; but we're only being approximate here. The bottom line is that there's enough space in principle for everyone in the USA to have 20 or 30 different cell phone numbers, if we use it efficiently.
But we don't. I have often stared at documents like gas bills and been amazed to see things like account numbers or other identification numbers as long as 18 or 20 digits. There are only about 7 × 10^9 people in the world. Some account numbers are so long you could give separate account numbers to every member of the population on a billion planets with populations like ours. Those numbers could record the addresses and ages and incomes of the customers instead of just being random digit strings. But we don't do that. The information society that people get so worried about (the world in which The Government knows all your details and tracks everything you do) hasn't arrived yet, and probably never will. We're not that organized as a species. We waste too much time and too many of our computational resources keeping track of pointless random digit strings and being unable to relate them to each other.
(You may say I'm not paranoid enough about government intrusion and snooping, but I say that a civilization in which it is all but impossible to get duplicate entries out of junk mail address databases is not a civilization that is going to be able to figure out how to correlate your pornography rentals, golf club memberships, gas bills, and political leanings. I'm not saying they aren't evil enough to do it; I'm saying they're too incompetent to do it.)
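The back-of-the-envelope figures above (ten-digit phone numbers shared out among Americans, and long account numbers shared out among whole planets) are easy to check. Here is a minimal Python sketch; the US population figure is an assumed round number for 2012, the world population is the 7 × 10^9 used in the text, and `distinct_strings` is just an illustrative helper name.

```python
# Assumed round figures, not taken from any official source.
US_POPULATION = 314_000_000       # roughly the 2012 US population (assumption)
WORLD_POPULATION = 7 * 10**9      # the approximate figure used in the text


def distinct_strings(digits):
    """How many different strings of decimal digits have this length."""
    return 10 ** digits


# Ten-digit phone numbers, counting 000-000-0000 and 666-666-6666 as available.
phone_numbers_each = distinct_strings(10) / US_POPULATION
print(f"10 digits: about {phone_numbers_each:.0f} numbers per US resident")

# Long account numbers, measured in planets with a population like ours.
for digits in (18, 19, 20):
    planets = distinct_strings(digits) / WORLD_POPULATION
    print(f"{digits} digits: enough for about {planets:.1e} planets like ours")
```

Under these assumptions the ten-digit space works out to about 32 numbers per US resident. An 18-digit account number covers roughly 140 million planets like ours, and the billion-planet mark is passed at 19 digits.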
Now for the syntax of cell phone numbers (I did promise). North American phone numbers are the only phone numbers you can enter in many databases, which is truly a curse for someone like me who commutes across the Atlantic. Their syntax can be described by a grammar in which the dictionary is {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -}, the "parts of speech" (syntactic categories for words and phrases) are {N, A, B, C, D, E, F}, and the rules say that a phone number consists of a 3-digit area code and a body with a separator between them; that a body consists of a 3-digit exchange code and a 4-digit local number with a separator between them; that the separator is a hyphen; and that the digits are {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. This is how it is done with precision:
  1. N → A C B
  2. A → D D D
  3. B → E C F
  4. C → -
  5. E → D D D
  6. F → D D D D
  7. D → 0
  8. D → 1
  9. D → 2
  10. D → 3
  11. D → 4
  12. D → 5
  13. D → 6
  14. D → 7
  15. D → 8
  16. D → 9
This is a rewriting system, what Noam Chomsky calls a generative grammar. You start by writing down an N, and then read the arrow as "may be replaced by". The grammar describes a set: the set of all and only those strings of digits and hyphens that you can arrive at through some series of string rewritings that the rules permit. (There are techniques for abbreviating such grammars, like writing the last ten rules as "D → 0|1|2|3|4|5|6|7|8|9", but I haven't bothered to use them.) If you like, you can think of the capital letters as having these mnemonic meanings:
  • N: Number considered as a whole
  • A: Area code (or rather, what was once an area code)
  • B: Body of the number
  • C: Character like hyphen or space for separating segments
  • D: Digit chosen from {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
  • E: Exchange (or what used to be the exchange back in 1950)
  • F: Four random digits with no pretense at any meaning at all
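The sixteen rules are small enough to write down as data and test strings against. What follows is a minimal Python sketch, not anything from the original post: the names `RULES`, `match`, and `is_phone_number` are invented for illustration, and the recognizer is a naive top-down one that only works because this grammar has no recursion. Since the grammar generates a finite set of strings, an ordinary regular expression describes the same set, and the last lines compare the two.

```python
import re

# The sixteen rules, keyed by category. Each right-hand side is a list of symbols.
RULES = {
    "N": [["A", "C", "B"]],
    "A": [["D", "D", "D"]],
    "B": [["E", "C", "F"]],
    "C": [["-"]],
    "E": [["D", "D", "D"]],
    "F": [["D", "D", "D", "D"]],
    "D": [[d] for d in "0123456789"],   # rules 7 to 16
}


def match(symbol, text, pos):
    """Return every position where `symbol` can end if it starts at `pos`."""
    if symbol not in RULES:
        # A dictionary item (a digit or the hyphen) must match one character.
        return {pos + 1} if text[pos:pos + 1] == symbol else set()
    ends = set()
    for rhs in RULES[symbol]:
        positions = {pos}
        for part in rhs:
            positions = {e for p in positions for e in match(part, text, p)}
        ends |= positions
    return ends


def is_phone_number(text):
    """True if the rules can rewrite N into exactly `text`."""
    return len(text) in match("N", text, 0)


# The same set of strings, written as a regular expression.
PATTERN = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")

for candidate in ("202-466-1033", "2024661033", "202-466-103"):
    assert is_phone_number(candidate) == bool(PATTERN.fullmatch(candidate))
    print(candidate, is_phone_number(candidate))
```

Only the first of the three candidates is accepted: the second has no separators, and the third is a digit short.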
A sequence of strings licensed by the rules is called a derivation. It's essentially a proof that a certain string is indeed a cell phone number. Here's a proof that shows, by building the number steadily from the left-hand side according to the rules, that 202-466-1033 is a possible cell phone number (though actually it's the landline number for The Chronicle of Higher Education). N is the first line; every derivation starts with that. Each subsequent line is formed from the previous line by applying just one of the rules to the leftmost capital letter.

N
A C B
D D D C B
2 D D C B
2 0 D C B
2 0 2 C B
2 0 2 - B
2 0 2 - E C F
2 0 2 - D D D C F
2 0 2 - 4 D D C F
2 0 2 - 4 6 D C F
2 0 2 - 4 6 6 C F
2 0 2 - 4 6 6 - F
2 0 2 - 4 6 6 - D D D D
2 0 2 - 4 6 6 - 1 D D D
2 0 2 - 4 6 6 - 1 0 D D
2 0 2 - 4 6 6 - 1 0 3 D
2 0 2 - 4 6 6 - 1 0 3 3
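
A derivation like the one above can also be produced mechanically. The sketch below is an illustration under stated assumptions rather than a general parser: `STRUCTURE`, `DIGITS`, and `leftmost_derivation` are made-up names, and the code leans on the fact that every category except D has exactly one rule, so the only real choice at each step is which digit to write.

```python
# Rules 1 to 6: every category except D has exactly one way to be rewritten.
STRUCTURE = {
    "N": ["A", "C", "B"],
    "A": ["D", "D", "D"],
    "B": ["E", "C", "F"],
    "C": ["-"],
    "E": ["D", "D", "D"],
    "F": ["D", "D", "D", "D"],
}
DIGITS = set("0123456789")   # rules 7 to 16 rewrite D as one of these


def leftmost_derivation(number):
    """Return the lines of a leftmost derivation of `number`, starting from N."""
    form = ["N"]
    lines = [" ".join(form)]
    while True:
        # Find the leftmost capital letter still waiting to be rewritten.
        index = next(
            (i for i, s in enumerate(form) if s == "D" or s in STRUCTURE), None
        )
        if index is None:
            break
        if form[index] == "D":
            # Everything to the left is already a single character, so `index`
            # is also the position of the wanted digit in the target string.
            digit = number[index:index + 1]
            if digit not in DIGITS:
                raise ValueError(f"expected a digit at position {index}")
            replacement = [digit]
        else:
            replacement = STRUCTURE[form[index]]
        form[index:index + 1] = replacement
        lines.append(" ".join(form))
    if "".join(form) != number:
        raise ValueError("the rules cannot produce this string")
    return lines


for line in leftmost_derivation("202-466-1033"):
    print(line)
```

For 202-466-1033 this prints the same eighteen lines as the derivation above, from N down to 2 0 2 - 4 6 6 - 1 0 3 3. A string the rules cannot produce raises a `ValueError` instead.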

It's probably possible to make a similar grammar for the whole of English. It might start (using S for "sentence", NP for "noun phrase", VP for "verb phrase", D for "determinative", N for "noun", etc.) like this:
S → NP VP
NP → D N
VP → V NP
. . .
And so on for the rest of English sentence structure (see The Cambridge Grammar of the English Language for an informal overview of the other details you'd have to cover). Piece of cake. A few more decades and we linguists will have it done.

Language Log » The language of phone numbers
