Mapping the Internet

With only about 50% of PBwiki’s traffic coming from North America and with preliminary benchmarks showing 3+ second page load times in Paris, I’ve been thinking a bit about how to make the PBwiki experience snappy for people around the world. We’ve experimented with using various CDNs, but I’ve actually yet to be blown away by any of them. Having our own nodes at the edge can provide a number of benefits, such as having a well-defined cache invalidation strategy, performing DNS closer to the edges of users’ networks, caching secure data, and performing SSL handshakes quickly for logins.

So a reasonable question to ask at this point is – where are the spots where we’d get the most bang for the buck adding a new server? Answering this question requires a basic understanding of Internet topology. Armed with VPS accounts in Singapore and The Netherlands and traceroute.org, I set out to get a basic feel for the current state of global networks.

My principal hoped-for finding turned out to be true – links are mostly additive. That is, if it takes 200ms to get from Singapore to California and 85ms to get from California to Virginia, it takes nearly 285ms to get from Singapore to Virginia. Direct routes were generally about 10% faster than the sum of the links, but never much faster than that. This was encouraging, because it suggested that latency is fairly consistently dominated by the speed of light.
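The additivity observation boils down to simple arithmetic – a minimal sketch, using the illustrative RTTs from the measurements above:

```python
# Rough latency model: links are mostly additive, with direct routes
# typically observed to be about 10% faster than the sum of the hops.
# The figures below are the illustrative RTTs from the text, in ms.

singapore_to_california = 200
california_to_virginia = 85

# Naive additive estimate for Singapore -> Virginia
additive = singapore_to_california + california_to_virginia  # 285 ms

# Direct routes shaved roughly 10% off the sum, but never much more
direct_estimate = additive * 0.9  # ~256 ms

print(f"additive estimate:     {additive} ms")
print(f"direct-route estimate: {direct_estimate:.0f} ms")
```

If direct routes had come in far under the additive estimate, it would have pointed to badly indirect per-hop routing rather than distance as the dominant cost.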

That said, there were some startling findings as to global connectivity – The Netherlands is only about 4,500 miles from India, but packets from Amsterdam consistently routed through Palo Alto on a 16,000+ mile journey the wrong way around the planet.

I also found that most of South America seems to route through Miami – even traffic within South America! And that traffic for South Africa often routes through New York, even from London, crossing the Atlantic twice. SAT-3 doesn’t seem to be doing its job.

David’s Tips on International Expansion Ordering:

  1. 1st cluster: To be most Net-accessible, your first cluster should probably be hosted in the US on either the West or East coast, depending on target demographic.
  2. 2nd cluster: Your second cluster should probably be on the other US coast – this will mean you’re within ~40ms of nearly all of North America, are under 100ms from Europe, and are under 200ms from Asia & Oceania.
  3. 3rd, 4th, 5th, and 6th clusters: Once you get into the swing of having a few clusters, the remaining spots that make sense seem to be Europe (Amsterdam & London are eminently reasonable choices), Australia or New Zealand, Japan, and Singapore or Hong Kong. It looks like the European ISPs have been peering reasonably well and are all under 50ms from London or Amsterdam. AU and NZ are ~30ms apart (Sydney from Auckland), as are Singapore and Hong Kong.
  4. Extra clusters: As needed, you can deploy in Brazil (which won’t help other South American traffic), South Africa (which won’t help other African traffic), India, or Israel (for Middle East acceleration).
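Since latency tracks the speed of light so closely, you can sanity-check a candidate cluster location with a physical lower bound. Here's a sketch (the city coordinates and the ~2/3 c figure for light in fiber are my assumptions, not from the measurements above) that computes the best-case RTT between two points over a great-circle fiber path:

```python
import math

# Lower-bound RTT between two points, assuming a great-circle fiber
# path and light traveling at ~2/3 c in glass. Real routes are longer
# (see the Amsterdam -> India detour above), so this is a floor only.

EARTH_RADIUS_KM = 6371.0
FIBER_SPEED_KM_PER_MS = 200.0  # ~2/3 of c (300,000 km/s in vacuum)

def min_rtt_ms(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula, doubled for RTT."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2)
    distance_km = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
    return 2 * distance_km / FIBER_SPEED_KM_PER_MS

# Example: San Francisco to Amsterdam (coordinates are approximate)
print(f"SF-Amsterdam RTT floor: {min_rtt_ms(37.77, -122.42, 52.37, 4.90):.0f} ms")
```

If a measured RTT is several times the floor, the route is probably taking a detour worth escalating to your providers; if it's within ~30%, adding a closer cluster is the only real fix.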

More later on how to expand into additional points of presence at a low cost.

Author: dweekly

I like to start things. :)

8 thoughts on “Mapping the Internet”

  1. Regarding the Amsterdam packets routing through Palo Alto – you may want to present your findings to PBwiki’s network connectivity providers. Such routing may be indicative of faulty BGP setups somewhere along their edge routers.

  2. Paul – thanks for the comment. PBwiki’s network was not involved in this case, however. A virtual machine located in Amsterdam and not on PBwiki’s production network traced connections to several machines located in India which are also not on PBwiki’s production network. It’s possible that there may have been some temporary BGP issues involved, but I find it more likely that the global network is still dealing with the FLAG fiber cuts. http://www.dailywireless.org/2008/01/30/oceanic-fiber-cut/

  3. As a further followup, I’m currently in a suburb of Amsterdam at a conference.

    A traceroute to the State Bank of India (sbi.co.in) is 27 hops, 358ms RTT, and routes Amsterdam->London->NYC->San Jose->Singapore->India.

By contrast, a traceroute to Reliance (www.rcom.co.in) is only 13 hops, 159ms and routes from London directly to Mumbai over Flag Telecom (n.b. a wholly owned subsidiary of Reliance telecom), showing that not everyone has bad routes. Then again, if one of the global network infrastructure leaders didn’t have a good route to their website in India, who would?

    As an interesting sidenote, it seems that some of India’s largest corporations do their hosting in the US. Tata and Aditya Birla both even host with the same SF ISP, Cybercon.
