Why IPv6 Won’t Be Here By 2005

This message was posted on October 27, 2003 to a mailing list in response to a post claiming that IPv6 would be widespread by 2005 due to an IPv4 address shortage. Given that 2005 has come without IPv6 taking off, the post feels vindicated in its conclusion.

NATs, unfortunately, made switching over to IPv6 wholly unnecessary. Such a switchover will probably not happen for at least another ten years. Even ten years ago, we were “running out of” IPv4 space due to incredibly inefficient allocations under the “class-based addressing” method – by which your network was deemed likely to possess either 253 computers, 65,533 computers, or 16,777,213 computers. A specific network was identified by 24, 16, or 8 bits. (The more bits it takes to identify a network, the more networks can exist, but at the expense of having fewer unique addresses per network.)

This was quickly determined to be an inordinate waste of addresses, and as early as the early 90’s folks were predicting we’d rapidly run out of addresses. So class allocations changed a little: instead of giving an organization with 1,000 computers a class B (with 65,533 usable addresses), they’d give it four class C’s (with 1,012 usable addresses). This helped stem the tide for a bit and arguably saved the Internet’s ass, but it was clear that a more elegant system for identifying networks was needed.

After some backbone technology re-architecting, a new scheme called Classless Inter-Domain Routing, or CIDR, was introduced, which allowed bit-level granularity, meaning that a network was identified by exactly as many bits as it needed. Your network could possess 13 computers, or 16,381 computers, and the system could deal with that efficiently. CIDR definitely also helped save the Internet’s ass. But the demand for addresses kept on coming; that dang Internet was getting popular very quickly! Pundits started talking about The Great IPv6 Changeover, despite the fact that fewer than one person in 100 on the Internet had an IPv6-enabled operating system.
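To make the arithmetic concrete, here’s a quick sketch using Python’s ipaddress module (obviously not the tooling anyone was using back then; it’s just a convenient calculator, and the addresses are placeholders):

    import ipaddress

    # Classful allocation: an organization with ~1,000 hosts got either a
    # whole class B (/16) or a handful of class C's (/24).
    class_b = ipaddress.ip_network("172.16.0.0/16")
    class_c = ipaddress.ip_network("192.168.1.0/24")
    print(class_b.num_addresses)   # 65536 -- vastly more than needed
    print(class_c.num_addresses)   # 256 per class C

    # CIDR: carve out a prefix that's just big enough. A /22 is 1,024
    # addresses, a snug fit for a 1,000-host organization.
    print(ipaddress.ip_network("10.0.0.0/22").num_addresses)   # 1024

    # Bit-level granularity means any prefix length works, e.g. a /28
    # for a network of a dozen machines.
    print(ipaddress.ip_network("10.0.0.0/28").num_addresses)   # 16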

Then came NATs. While Network Address Translation had been used in many environments, it hadn’t really taken off tremendously. Then Linksys released a rather affordable cute little blue box. This piece of hardware let home users plug several computers into the blue box, configure it with a web interface, jack in their cable/DSL connection, and suddenly be sharing Internet access easily with everyone in the house, using one IP address and so fooling the ISP into thinking that there was only one computer using the Internet (many ISPs either don’t permit or don’t have the infrastructure to give out multiple addresses to a customer). These NATs had a secondary benefit, which was that by default, all incoming connections from the outside are dropped on the floor. I’m not sure Linksys had such “firewalling” in mind when originally designing the device – it’s purely a practical issue. I mean, if someone says to a NAT “here’s this piece of information” – to which of the four connected computers should the NAT send it? By default, the NAT will give up and just drop the sorry packet. This means that when you’re behind a NAT, you’re protected from a whole class of Internet attacks. This realization further drove adoption.
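To make that concrete, here’s a toy model of the bookkeeping a NAT does. The names and port numbers are hypothetical; this isn’t how Linksys actually implements it.

    # Toy model of NAT port translation -- illustrative only.
    nat_table = {}           # public port -> (private IP, private port)
    next_public_port = 50000

    def outbound(private_ip, private_port):
        """A machine on the LAN opens a connection; remember the mapping."""
        global next_public_port
        public_port = next_public_port
        next_public_port += 1
        nat_table[public_port] = (private_ip, private_port)
        return public_port   # the source port the outside world sees

    def inbound(public_port):
        """A packet arrives from the Internet addressed to our one public IP."""
        if public_port in nat_table:
            return nat_table[public_port]   # reply traffic: forward it inside
        return None   # unsolicited packet: no mapping, so drop it on the floor

    # A machine behind the NAT fetches a web page, and the reply finds its way back...
    port = outbound("192.168.1.10", 3345)
    print(inbound(port))      # ('192.168.1.10', 3345)

    # ...but a stranger's spontaneous connection attempt finds no mapping.
    print(inbound(12345))     # None -- hence the accidental "firewall"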

Companies with low IT budgets realized that they wouldn’t have to buy extra IP addresses from their ISP (which often came at a premium) and that they could have simple firewalling without a complex configuration. Neither companies nor individuals saw much inherent value in having each of their computers carry an Internet-deliverable address, and there was real value (protection) in NOT being addressable from the Internet.

This, again, saved the Internet’s ass. Instead of an organization of 1,000 needing a class B, wasting tens of thousands of IPs, or even four class C’s, that organization now needs only a single IP address to cover all of its desktops. Instead of thinking of IP addresses as computer addresses, we have started treating them as network addresses – which is to say, the WHOLE 32 BITS is the network identifier. While I am sure that there will rapidly be more than four billion network-connected devices (which would fill the entire IPv4 address space), I’m not convinced that there are going to be more than, say, 100 million individual *networks* in the next 5–10 years. The transition to NATs is going to completely obviate any near-term requirement for a changeover to IPv6.

There’s only one problem: this destroys one of the fundamental principles upon which the Internet was constructed – “Every node is born equal.” In theory, the servers that run HotMail should be no different from the computer on your desk. Sure, a HotMail computer is probably rackmounted next to dozens of other servers, and probably has a faster Internet connection, but your computer should be able to run a slow version of what runs at HotMail. This is the way networks used to work, and it’s what enabled everything from Yahoo’s and Google’s development, running off of nodes in dorm rooms, to modern P2P networks like RedSwoosh and Kazaa. None of these could operate properly in a NAT environment, because the outside world would have no way of making a spontaneous connection to a server behind the NAT. If the whole of the Stanford campus had been behind one IP, countless companies could never have sprung up, running custom web and email services in dorm rooms.

This rising dichotomy, coupled with the dramatically rising download/upload ratios of broadband (my current cable modem can download 10x faster than it can upload!), means that there are now really two classes of Internet citizens – ones with an IP address and a symmetric connection (servers, broadcasters, “true nodes”), and ones behind a NAT with very little upload capacity (consumers / plebeians). This may rapidly turn computers into advanced televisions instead of interactive information-sharing devices. Consider the inequality today – most broadband users can listen to Internet radio but can’t publish their own streams.

P2P also fundamentally stops working well with high download/upload
ratios. On a P2P network, the aggregate download speed is equal to the
aggregate upload speed. This means that if everyone on the network can
download ten times faster than they can upload, downloads off of a P2P
network will be ten times slower than downloads off a server
directly. This means that P2P CDNs cannot really succeed, which would
be a crying shame.
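A back-of-the-envelope calculation makes the point; the link speeds here are made up, roughly what cable/DSL offers:

    # Why asymmetric links cap P2P transfer rates. Numbers are illustrative.
    peers = 1000
    upload_kbps = 256          # typical upstream
    download_kbps = 2560       # 10x faster downstream

    # In a pure P2P swarm every downloaded bit was uploaded by some peer,
    # so aggregate download <= aggregate upload.
    aggregate_upload = peers * upload_kbps
    avg_p2p_download = aggregate_upload / peers

    print(avg_p2p_download)                   # 256 kbps per peer, best case
    print(download_kbps / avg_p2p_download)   # 10.0 -- ten times slower than a server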

IPv6 could resolve the addressing concerns, if not the disparity in
connection speeds. I personally think it would be great, especially
considering how it could potentially bring multicast to the
masses. But the adoption is just not there. I run a colocation site
and we’ve been asking our upstream ISP, who is one of the world’s
leading IPv6 providers (and who offers a free IPv6 tunnel broker), if
they would permit routing of IPv6 traffic over our existing
connection. “Any moment now” they’ve been saying. So deployment is
nearly non-existent. Implementations are, too – Microsoft only offered
an alpha-quality IPv6 stack for Windows 2000 from an obscure location
on the Microsoft Research site. The fact that it didn’t come standard
on Windows XP should speak volumes; but it is available on the
WindowsUpdate site to users of XP. (The only thing that bugs me is
that now it does TWO DNS resolves for every name – first for the AAAA
record, then for the A record!) Windows 98/ME and Windows 2000 users
almost assuredly can’t do IPv6, and only Windows XP customers who have
upgraded can, so I’m guessing that it’s still less than 1% of the
desktops out there that can do IPv6.
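For what it’s worth, here’s roughly what that double lookup looks like from an application’s point of view: a sketch using Python’s socket module with a placeholder hostname. On a dual-stack machine, getaddrinfo hands back both address families.

    import socket

    # Ask for both IPv6 (AAAA) and IPv4 (A) results for a placeholder host.
    results = socket.getaddrinfo("www.example.com", 80,
                                 socket.AF_UNSPEC, socket.SOCK_STREAM)
    for family, _, _, _, sockaddr in results:
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        print(label, sockaddr[0])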

All of this is a long way of saying “Don’t hold your breath for massive IPv6 deployment by 2005.” 🙂

Formats

For protocols there’s Ethernet, TCP, IP, UDP, AIM (in ICQ, OSCAR, and TOC flavors!), Yahoo! Messenger, MSN Messenger, IRC, SOCKS5, HTTP, HTTPS, IMAP, POP3, FTP, Shoutcast, SSH, and Kazaa…and that’s just what’s running on my desktop right now! Then there are umpteen formats for data storage: MBX, PBX, DOC, MP3, AVI, WAV, MPG, and so forth.

In the Open Source world there are many different programs that each implement these formats in their own way. Ethereal, tcpdump, GAIM, and Jabber all try to implement the AIM protocol, which is often getting revised. When a change comes out, all the developers for the respective projects have to go rush out and update their software. Seems like an awful lot of duplicated effort, especially for the poor sap who comes along and wants to build a new product that interfaces with some fair number of protocols.

What if there were a better answer? What if there were a repository of structured information about every structured bitformat out there – from files to network protocols? A developer could just grab the latest structured descriptions that s/he needed (perhaps from a web service) and, together with a parsing library, be good to go for juggling ten different protocols in no time.
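As a sketch of what I mean (a completely made-up toy header, not any real protocol), the description could be nothing more than data that a generic parser walks:

    import struct

    # A made-up, declarative description of a toy packet header. The point is
    # that the description is data, not code, so it could live in a shared
    # repository and be fetched by whoever needs it.
    TOY_HEADER = [
        ("version",   "B"),   # 1-byte unsigned int
        ("flags",     "B"),   # 1-byte unsigned int
        ("length",    "H"),   # 2-byte unsigned int, network byte order
        ("stream_id", "I"),   # 4-byte unsigned int, network byte order
    ]

    def parse(description, data):
        """Generic parser: walk the field list, pull each field out of the bytes."""
        fmt = "!" + "".join(code for _, code in description)
        values = struct.unpack_from(fmt, data)
        return dict(zip([name for name, _ in description], values))

    packet = struct.pack("!BBHI", 1, 0x02, 512, 31337)
    print(parse(TOY_HEADER, packet))
    # {'version': 1, 'flags': 2, 'length': 512, 'stream_id': 31337}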

Such a repository could also be used to automatically generate documentation for these protocols, everything from simple RFC-like descriptions to interactive Flash webpages to walk you through the intricacies of the protocol. It would be a boon to Internet development and would effectively “factorize” duplicated work from different Open Source projects.

I might just try and put it together when I get some “free” time.

Linux Unicode Support Sucks

Unicode is just about the coolest thing since sliced bread. It’s the kind of thing after which you wonder how things possibly worked previously. The idea is simple: one character set can represent any character in any language. There are a few different ways of encoding Unicode characters (there are more than 100,000 of them!) but the most popular is UTF-8, the 8-bit Unicode Transformation Format.

UTF-8 is not-very-coincidentally also a superset of 7-bit ASCII, meaning that most English documents are already valid UTF-8. The algorithms for dealing with UTF-8 are not overly complex – while characters can be multibyte, the leading bits of each byte tell you whether you’re at the start of a character or in the middle of one, so character counting is easy and does not require the insane juggling of some of the Asian character encodings to find out where you are in a string.
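For the curious, here’s a minimal sketch of that counting trick: every continuation byte starts with the bits 10, so you count characters by skipping those.

    def utf8_length(data):
        """Count characters in UTF-8 bytes by skipping continuation bytes.

        Continuation bytes look like 10xxxxxx; everything else (ASCII bytes
        and the lead bytes 110xxxxx, 1110xxxx, 11110xxx) starts a character.
        """
        return sum(1 for b in data if b & 0xC0 != 0x80)

    encoded = "Hello, 世界".encode("utf-8")   # mixed ASCII and Chinese
    print(len(encoded))          # 13 bytes on the wire
    print(utf8_length(encoded))  # 9 characters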

Windows 2000 & XP (and indeed, even NT!) seem to have pretty decent Unicode support. Internet Explorer lets me read UTF-8 pages just fine, including posting mixed Chinese & Hebrew comments on websites. 🙂 I can cut and paste between Notepad and IE without
difficulty. Yay!

There was one minor quirk: IE basically seemed to ignore the META tag on a page (the one specifying that the page was encoded in UTF-8) whenever a Content-Type header from the server contradicted it. So I needed to explicitly set the header to Content-Type: text/html; charset=UTF-8. Then everything seemed to render alright.
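If it’s useful to anyone, here’s a minimal sketch of getting the charset into the header itself, using Python’s built-in http.server purely for illustration (not what my actual server runs):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    PAGE = "<html><body>שלום 你好</body></html>".encode("utf-8")

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            # The charset in the HTTP header is what IE actually honors,
            # regardless of any META tag inside the page.
            self.send_header("Content-Type", "text/html; charset=UTF-8")
            self.send_header("Content-Length", str(len(PAGE)))
            self.end_headers()
            self.wfile.write(PAGE)

    if __name__ == "__main__":
        HTTPServer(("", 8000), Handler).serve_forever()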

So I flip on over to Linux to see what the current state of the international toolchain is, figuring it’s probably going to be incredibly robust, since, what, the whole of China is basically using Linux, right? Wrong. Hardly anything worked.

To be fair, I first needed to configure my Windows SSH client (PuTTY) to “assume” that the character encoding it was being fed was UTF-8; unfortunately, it seems that terminal protocols don’t have a good mechanism for indicating the client’s or server’s capacity for various encodings. So I also needed to set the LANG environment variable to en_US.UTF-8 from its default of en_US. This still wasn’t enough.

To get lynx (the terminal web browser) to work properly I needed to call it with --display_charset=utf-8. I finally saw Chinese over SSH. Yay.

I looked at quite a few editors (Emacs, Vim, XEmacs, QEmacs, and mined), of which only mined seemed to support UTF-8 editing in any sort of reasonable capacity. The next version of Emacs promises “real UTF-8 support”, but I seem to recall hearing those sorts of promises last year, too. I’m frankly distressed. Thankfully, basic tools like less seem to be fully UTF-8 compliant. Odd.

Anyhow, more rant later. 🙂