Censored.

I found out from an anonymous email account that my website is being censored / blocked by the University of Central Florida, i.e., students at UCF can no longer read my site. I just sent the following letter to the UCF student government in response.

Hello. My name is David Weekly; I’m a student at Stanford University. I’ve just received an anonymous email from a UCF student telling me that my website has been censored by your institution, and after a cursory review that information seems to be correct. All Internet traffic between computers on my network and computers at UCF is blocked.


There are no illegal files to download on my website, nor does it consume an excessive amount of bandwidth. I do not have hate speech on my website nor do I incite people to disobey the law. I have absolutely no pornography, violent language, or any real quantity of epithets. In short, to the best of my knowledge, my website has never been deemed “offensive.”


They are censoring my website for the knowledge that it conveys. It is a pure Free Speech issue. Among other things, my website describes how one could get around a Napster blockade such as the one at UCF, while at the same time including commentary on a possible reason why one might not want to do this. This is most likely why my entire website (and indeed all of the computers on my network) has been blocked from the UCF network.


The key issue at hand is that UCF is a state-sponsored university. Therefore, the government of Florida is suppressing the dissemination of free speech. If they can censor this information, why not information about unpopular ideas, too?


Just my $0.02,

David E. Weekly
Stanford University

As a sidenote, I’m kind of psyched. It’s one of the greatest honors I could
think of to be censored. It’s one thing to be told you’re an idiot or brilliant,
or that what you’re doing is meaningless or full of purpose. It’s another thing
altogether for someone who disagrees with you to believe so strongly that you
could change things that they actively try to shut you up, lest you actually
succeed in making a difference.

The best compliments come from your enemies: the stronger their actions,
the more they respect you.


UPDATE, May 4, 2000:
It looks like they quietly unblocked the site as far as I can see (anyone
at UCF care to comment?). Nothing to see here, folks. Carry on, carry on.

Client as Server: A New Model

A new model is emerging from the Internet. It represents the culmination of years of incremental evolution in the structure of the network and the clients that feed upon it. It is based upon the same principles upon which the Internet was founded. It is this: the client is the server.

darpanet

The Internet was created as a distributed network. Originally conceived as the Defense Advanced Research Projects Agency’s network (DARPANET), it was designed to withstand a nuclear attack from the Russians. There could be no single point of failure in the system, and in this way it had to be different from most other networks yet conceived.

People had previously grown accustomed to the notion that there must be one central arbiter that oversees all transactions on a network: a Mainframe. This model has an obvious weakness: when the Mainframe goes down, the whole system is unusable. Then again, if there is only a single important point of failure, you can pay some people a lot of money to sit there and fix problems as soon as they happen (and, hopefully, to ensure that the problems never happen in the first place). Unfortunately, it’s difficult to do this with regard to a nuclear bomb. So a different model was needed.

DARPANET provided this model by removing the server. It’s sort of like replacing a model where everyone hands their mail to the post office with one where people pass a letter to a friend, who passes it to their friend, who passes it on to the recipient. While at first this might seem a little odd or maybe even a little inefficient, it means it would be a lot harder for someone to stop the flow of mail to you (or the flow of mail in general). Instead of simply bombing the post office, now they’ve got to assassinate each and every one of my friends to prevent me from getting mail. Going back to the real world, there would be no single point the Russians could bomb to take down our communications.

It was a revolutionary, if strange, way of thinking about things. To this day, some people don’t understand it and ask questions like “where is the server that runs the Internet?” or even “where is the Internet?” It’s hard to understand that every server that is on the Internet is a part of the Internet.

availability

These days, we are amidst an equally paradigmatic change, one that almost perfectly mirrors the first. Corporate servers, which distribute information and services to clients and participate in “e-business,” need to not crash. Companies like eBay whose computers crash often get a bad name and lose billions of dollars off their valuations, almost a worse fate than the actual millions of dollars in customer transactions that go out the door when the servers die.

A quick fix is to employ a large number of servers configured exactly the same, such that if one goes down, traffic is quickly diverted to the others. Work is equally distributed amongst these servers by use of a “load balancer.” This solves a few problems, but what if your server cluster is in California and the network link from California to New Zealand is getting bogged down? While the long-term answer is to invest in a faster connection to New Zealand, the short-term way to solve this problem is to put a server cluster in New Zealand. This sort of rapid expansion can quickly get expensive to deploy and manage. Some bright kids from MIT figured this out a few years ago and cobbled together what is now one of the fastest growing companies out there: Akamai. (Hawaiian for “cool” if you’re wondering.)
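
To make the mechanics concrete, here is a minimal sketch of the round-robin idea behind a load balancer: each request simply goes to the next healthy machine in an identically configured pool. The server names and the handle_request method are invented for illustration; this is not any particular vendor’s product.

```python
# A minimal sketch of round-robin load balancing across identical servers.
# Server names and handle_request() behavior are invented for illustration.
from itertools import cycle

class Server:
    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle_request(self, request):
        return f"{self.name} served {request}"

class LoadBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self.pool = cycle(self.servers)

    def dispatch(self, request):
        # Try each server at most once; skip any that are marked down.
        for _ in range(len(self.servers)):
            server = next(self.pool)
            if server.healthy:
                return server.handle_request(request)
        raise RuntimeError("no healthy servers available")

balancer = LoadBalancer([Server("ca-1"), Server("ca-2"), Server("nz-1")])
print(balancer.dispatch("GET /index.html"))  # ca-1 served GET /index.html
print(balancer.dispatch("GET /index.html"))  # ca-2 served GET /index.html
```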

Akamai has already gone through the trouble of buying several thousand servers and putting them in network closets all around the world. The idea is that you can farm off delivery of the parts of your site that don’t change much (the pictures, the movies, etc.) to Akamai, and they’ll take care of making sure that your readership can always quickly access your content. Cute idea. “Cool,” even.

Distributed services lead to higher data availability. The more machines distributing your content, and the more places they are in, the more people will be able to access your content quickly. It’s a straightforward idea. This notion of distributing work is also useful for distributing computation…

processing power

It’s expensive to build really fast chips. It’s really expensive. To make a chip twice as fast as today’s high-end consumer chips costs about ten times as much money. That’s largely because today’s consumer chips are, pretty much by definition, as fast as it is possible to make them and have them still be reasonably cheap. If this wasn’t the case, another company would have come along and made reasonably cheap, screaming fast processors and have swept the market away. But the tight competition in the chip manufacturing business has kept the “bang for the buck” ratio screaming up through the ceiling, much to the delight of consumers.

It’s important to note that making fast chips is expensive, because if you want ten times the processing power that comes in a top-of-the-line consumer PC, the best way to get it and save your money is not to buy a machine that’s ten times faster; it’s to buy ten top-of-the-line consumer PCs. People have understood this general concept for a long, long time: wire together a bunch of processors to get a very, very fast machine. It’s called massively parallel processing (MPP) and is pretty much how all of the supercomputers of yore (and of today!) work.

What’s new is that it’s now possible to do this with off-the-shelf PCs. Recently, software (such as Beowulf) has been developed to make it very easy to make a cluster of PCs act like one very fast machine. Sites that previously deployed very expensive custom supercomputer systems are actively investigating massively distributed commodity hardware to serve their computing needs. That would be remarkable as is, but this concept of distributing computing cycles has gone even further than clumps of commodity hardware: it’s gone into the home.
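
As a rough sketch of the general idea (not of Beowulf itself), here is a toy example where the “cluster nodes” are just local worker processes splitting one big computation into chunks. The workload and chunk sizes are invented.

```python
# A toy stand-in for a Beowulf-style cluster: split one large computation
# into chunks and farm them out to several commodity processors.
# Here the "nodes" are simply local worker processes.
from multiprocessing import Pool

def count_primes(bounds):
    """Count primes in [lo, hi) by trial division -- deliberately naive."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    chunks = [(i, i + 25_000) for i in range(0, 200_000, 25_000)]
    with Pool(processes=8) as pool:          # eight "nodes"
        total = sum(pool.map(count_primes, chunks))
    print(total)
```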

seti @ home

For roughly the last forty years, there has been a serious and conscientious effort to search for intelligent life in the universe by listening for patterns in radio transmissions. The process of analyzing incoming radio transmissions for patterns such as those that might be emitted by intelligent life forms is mind-bogglingly difficult, requiring vast numbers of computations. Though privately funded, the Search for ExtraTerrestrial Intelligence (SETI) didn’t have enough money to process all of the data it was getting. It did, however, have a sizeable fan base. (A number of people on this planet think it would be pretty cool / important to discover intelligent life out there.) So what did they do? They distributed the work.

Some clever programmers put together the software used for analyzing the data returned by the Arecibo antenna (the largest radio receiver on Earth), put some pretty graphics on it, got it to act as a screensaver, and put it on the web. Consequently, several hundred thousand people downloaded it and ran it as their screensaver. While they’re away from their computers, this pretty screensaver crunches through vast quantities of data, searching for patterns in the signals. In this way, the SETI project (as of this writing) has a “virtual computer” that is computing 13.5 trillion floating-point operations per second, thanks to the people running the “screen saver.” Individual computers can be used to distribute mathematical work!
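
The shape of the scheme, rather than SETI@home’s actual protocol, might be sketched like this: a coordinator slices the signal into work units, and each volunteer scans its unit for anything anomalous and reports back. The data, threshold, and function names below are all invented for the sketch.

```python
# A sketch of the volunteer-computing shape (not the real SETI@home protocol):
# a coordinator hands out work units, and each volunteer scans its unit for a
# "pattern" -- here, simply an unusually strong sample -- and reports back.
import random
from queue import Queue

def make_work_units(n_units, samples_per_unit=1000):
    q = Queue()
    for unit_id in range(n_units):
        signal = [random.gauss(0.0, 1.0) for _ in range(samples_per_unit)]
        q.put((unit_id, signal))
    return q

def volunteer(work_queue, results, threshold=4.0):
    while not work_queue.empty():
        unit_id, signal = work_queue.get()
        hits = [(i, x) for i, x in enumerate(signal) if abs(x) > threshold]
        results.append((unit_id, hits))

work = make_work_units(50)
results = []
volunteer(work, results)          # in reality, thousands of machines do this
interesting = [r for r in results if r[1]]
print(f"{len(interesting)} work units contained candidate spikes")
```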

(I feel I should also mention distributed.net, which spends its time using people’s computing power to crack cryptographic challenges. Their “virtual computer” is currently cracking a 64-bit RC5 cipher at the rate of 130 billion keys per second.)

data services

So it’s now clear that it’s advantageous to distribute computation and the serving of data across as many computers as possible. We’ve seen how a few projects have distributed computation across end users, but what projects have distributed data services?

Napster is one of the first and best examples of end-users acting as distributed servers. When you install Napster, it asks you where your MP3 files are. You tell it, and it goes out and makes a list of what MP3 files you have, how long each song is, and what quality the recording is. It then uploads this list (but not the songs) to a central server. In this way, the central server has a whole bunch of lists: it knows what music everyone running Napster has. You can ask the server who has got songs by Nirvana and then contact those other users (while your Beck tunes are possibly getting served to some Scandinavian with a predilection for American music). This model allows for information (in this case, MP3 files) to be rapidly and efficiently served to thousands of users.
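
A toy sketch of that architecture (the class and method names are mine, not Napster’s) shows how little the central server actually holds: only the lists, never the music itself.

```python
# A sketch of the Napster idea: clients upload only their file *lists* to a
# central index; searches return which peers to contact for the bytes themselves.
class CentralIndex:
    def __init__(self):
        self.listings = {}        # username -> list of song titles

    def register(self, username, song_titles):
        self.listings[username] = list(song_titles)

    def search(self, query):
        query = query.lower()
        return [(user, title)
                for user, titles in self.listings.items()
                for title in titles
                if query in title.lower()]

index = CentralIndex()
index.register("dave", ["Nirvana - Lithium.mp3", "Beck - Loser.mp3"])
index.register("lars", ["Nirvana - Come As You Are.mp3"])

for user, title in index.search("nirvana"):
    print(f"ask {user} directly for {title}")   # the MP3 itself flows peer-to-peer
```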

The problem with it is both technical and legal: there is a single point of failure, Napster’s servers. While there is more than one server (the client asks a “meta-server” which server it should connect to), they are all owned by Napster. These servers unfortunately do not share their file lists with each other, and as a result you can only share files with (and see the files of) others connected to the same server that you happen to have connected to. Napster is currently being sued by the RIAA for acting as a medium for distributing illegal MP3 files. While it is true that Napster can easily be used to distribute MP3 files illegally, they themselves don’t actually copy the bits for users: it’s a bit more like being a Kinko’s that happens to be used by subversives than like making the MP3 copies themselves.

If you are a Napster user, you should be worried about this lawsuit, because if the RIAA succeeds, they will probably want to come and shut down Napster’s servers, thus theoretically shutting down the whole Napster network. They could then quickly close down any Napster clones thanks to the legal precedent that the anti-Napster case would set. Boom. Game over, no more illegal music.

Theoretically.

a virtual internet

The RIAA’s mentality is one and the same as that of the Russians of yesteryear: a desire to stop the flow of information through the network. The answer to the RIAA is the same as the answer to the Russians: a completely distributed system. If every client on the network were connected to a handful of other clients, each of which in turn connected to others like some apocalyptically enormous online incarnation of Amway, then every person could have some connection to every other person through a chain of mutual acquaintances. It’s Six Degrees of Separation. (There exists a theory that says that, on average, you know someone who knows someone who knows someone who knows someone who knows someone who knows anyone in the world. That is to say, you are about six degrees from every human on the planet.)

This is a “virtual Internet” of sorts, where links are not physical (a wire from you to me) but logical (e.g., I know you). Data flows through this “web of friendship” in such a way that it looks as though you are only talking with your friends, when really you are talking to your friends’ friends, and so forth.
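
A hedged sketch of how a query might ripple through such a web of friendship: each peer asks its friends, who ask their friends, until a hop limit runs out. The peer names, file names, and hop limit below are illustrative only.

```python
# A sketch of passing a query through a "web of friendship": each peer asks its
# friends, who ask their friends, until a hop limit (TTL) runs out.
def flood_query(peers, start, query, ttl=6):
    """peers maps a name to (friends, files); returns who has a matching file."""
    seen, hits, frontier = {start}, [], [(start, ttl)]
    while frontier:
        node, hops = frontier.pop()
        friends, files = peers[node]
        hits += [(node, f) for f in files if query.lower() in f.lower()]
        if hops > 0:
            for friend in friends:
                if friend not in seen:
                    seen.add(friend)
                    frontier.append((friend, hops - 1))
    return hits

peers = {
    "me":    (["alice", "bob"], []),
    "alice": (["me", "carol"],  ["vacation.mov"]),
    "bob":   (["me"],           ["nirvana-lithium.mp3"]),
    "carol": (["alice"],        ["nirvana-live.mp3"]),
}
print(flood_query(peers, "me", "nirvana"))
# [('bob', 'nirvana-lithium.mp3'), ('carol', 'nirvana-live.mp3')]
```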

gnutella

The same rebellious college hacker genius who created the fabulously popular MP3 player, WinAMP (and was subsequently bought out by America Online, now America Online-Time/Warner-Netscape-EMI-And-Everything-Else-But-Microsoft) happily hacked out a program that allows for the free exchange of just about any kind of file over such a peered network. Unfortunately, his bosses discovered it halfway through development and quietly tried to erase the fact that the renegade project had ever existed in the first place. The name of the program? Gnutella. (Named after the delicious chocolate spread, Nutella.)

Since there’s no central server around which Gnutella revolves, AOL’s shutdown of the project didn’t actually stop Gnutella from working. A growing user base of several thousand souls (myself included) uses the product on a daily basis to share files of all types, from music to movies to programs. At last check, there were about 2,200 people using it, sharing 1.5 terabytes of information. Wow.

There’s no way to shut it down. There is no organization to sue to stop it. There is no server to unplug that would bring the network tumbling down. Because as long as at least two people are running the software, the network is up and running.

freenet

There are even more advanced projects in the works that build upon these notions to create an even more powerful incarnation of a peered network, one that incorporates notions of perfect anonymity, trust, secrecy, realtime communication, and even banking. Freenet is perhaps the furthest along in this, although it has a very long way to go as of this writing. If you’re interested you can read about my own scheme for a Secure + Anonymous File EXchange.

the future net

Akamai has shown that it is clearly advantageous to have content distributed by as many nodes as possible: companies are willing to pay good money to have their content on thousands of servers all over the world. Gnutella is showing that it is possible to create distributed networks that cannot be shut down, even in the face of legal and technical opponents. Napster shows that such networks can become popular and that people are willing to act as servers. SETI@home shows that people will even let others use their computing power for a “greater good.”

What is enabling this now? Well, computers are unsurprisingly getting faster every year. The average desktop that’s sold to Joe User for doing word processing, email, and web browsing can, when properly configured, deliver hundreds of thousands of email messages a day, serve millions of web pages, route Internet traffic for tens of thousands of users, or serve gigabytes of files a day. (Joe probably isn’t aware of this and will still kick it when Word takes five minutes to load.) His hard drive could store 100,000 websites of ten or so pages each, plus email for 1,000 users, and still have room for a few thousand of his favorite songs. Furthermore, if Joe has a DSL or a cable line to his house, he’s got a static IP (an address on the Internet that doesn’t change often, if at all), is almost always connected to the Internet, and is online at high speed.

In short, Average Joe’s computer resembles one of the best Internet servers of yesteryear.

If thousands of Joes end up running “community” applications like Gnutella, they can take advantage of their connectivity, disk space, and computing power. New “co-hosting” services will spring up like popcorn in the microwave. Here are a few possibilities in that direction:

the distributed future

Visualize, for a moment, sending out your website into a collective ether, to be served by hosts around the world: if one computer goes down, others spring up to serve it. Your page never goes down. Your friends send you email encrypted so only you can read it; it is stored on half a dozen of your friends’ computers, accessible to you from anywhere on the planet. All of this in exchange for setting aside a small chunk of your hard drive (100 megabytes or so) and a little bit of your bandwidth to serve web pages and people’s email. Any content that you consume (except for your personal email!) is instantly rebroadcast over the network: in this way, your computer helps content flow to where it is popular.
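
One way to sketch that “collective ether” (purely illustrative, not any particular system’s design) is to name content by the hash of its bytes, push copies to several peers, and fetch it back from whichever replica is still reachable. The peer names here are invented.

```python
# A sketch of the "collective ether" idea: name a piece of content by the hash
# of its bytes, push copies to several peers, and fetch it back from whichever
# replica happens to be reachable.
import hashlib

peers = {"attic-pc": {}, "dorm-box": {}, "office-mac": {}}   # peer name -> store

def publish(content: bytes, replicas=3):
    key = hashlib.sha256(content).hexdigest()
    for store in list(peers.values())[:replicas]:
        store[key] = content
    return key                      # the key is all a reader needs

def fetch(key: str):
    for name, store in peers.items():
        if key in store:            # any surviving replica will do
            return store[key]
    raise KeyError("no reachable replica holds this content")

key = publish(b"<html>my homepage</html>")
del peers["attic-pc"]               # one host vanishes...
print(fetch(key))                   # ...the page is still served
```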

requiem server

In the future, there will be no need for centralized services. All content will be available on this peered network. Strategies for “partial consumption,” such as letting people read the first few paragraphs and charging for the full story, or letting them hear the low-quality song and charging for the audiophile version, will be adopted along with anonymous payment schemes. It will be possible to send out intelligent agents to this network to search for books, music, or other merchandise: clients (such as Amazon.com, CDNow, eBay, your neighbor, etc.) that have a match for the merchandise will communicate with you through the peered network, preserving your anonymity. You will be able to make an anonymous payment (or merely a secure payment if you prefer) and your goods will be on their way to you. No more URLs. No more servers that crash, email that is unavailable, websites that you can’t get to, or data that you can’t find. It will truly be the end of the server, as the line between what it means to be a “client” and a “server” on the network becomes blurred to the point of indistinguishability.



a footnote on wireless

It is worth pointing out that wireless Internet access may well become democratized as well. Since high-speed wireless Internet is taking so long to reach America, citizens may well do it themselves. Apple recently popularized the IEEE 802.11 standard for wireless Ethernet by including AirPort in the iBook. Wireless Ethernet cards are now available for PC and Mac desktops and portables. More exciting yet, people have been working on extending the range of the AirPort from a couple hundred feet to tens of miles. You can now imagine a future, a few years away, where one person on every city block has a base station and everyone else jacks in. Some have postulated this as the ultimate incarnation of the communality and free spirit of the Internet.


ADDENDUM: This article was posted on freshmeat, and several readers posted very interesting comments on the essay (scroll to the bottom of that page). I also posted a reply to some of the comments there. Among other notes, Akamai means both “smart” and “cool.” =)

Why XML Will Fail

There has been a fair amount of hype surrounding XML, or the eXtensible Markup Language, over the past year and the hype is slowly but surely growing. I decided it would be a prudent thing to think about, and chewed it over. I came to the following conclusion: XML will fail before it has even started on its most basic premise: openly structured data.

Now careful here, before you take out your pitchforks and hot tar, I’m not saying that XML cannot be useful in a number of arenas. In the backend of websites and in certain limited academic exchanges it may prove the perfect sort of thing to use. But I don’t think that the web will move from HTML to XML.

Why? Because very few people with content want that content to be well-structured externally. One example often given is a recipe for cookies. The idea is that if you structure your recipe (author, title, ingredients, directions), then other people can parse that data in a useful way and you could automatically be entered into, say, a recipe database.
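
For instance, a recipe marked up this way could be pulled apart by a few lines of Python using the standard library’s XML parser; the tag names and recipe text below are invented for the example.

```python
# A sketch of the recipe example: once the markup is structured, anyone can
# strip out the fields programmatically. The tag names are made up.
import xml.etree.ElementTree as ET

recipe_xml = """
<recipe>
  <title>Chocolate Chip Cookies</title>
  <author>D. Weekly</author>
  <ingredients>
    <item>2 cups flour</item>
    <item>1 cup chocolate chips</item>
  </ingredients>
  <directions>Mix everything and bake at 350F for 12 minutes.</directions>
</recipe>
"""

recipe = ET.fromstring(recipe_xml)
print(recipe.findtext("title"))
print(recipe.findtext("author"))
for item in recipe.find("ingredients"):
    print("-", item.text)
# ...and nothing above required visiting the page that hosted the recipe.
```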

While that may sound very cool technologically, it’s a really dumb idea in the bigger picture. If you produce content, you do not want other people to be able to automatically take the content without the context! The majority of content-producing sites on the Net (e.g., Yahoo!, news.com, and Suck) are advertiser-supported. If you could just grab the author, date, and content from all of their articles and put together a virtual daily newspaper for yourself with none of their advertisements, you would have deprived them of their sole source of revenue. Since anybody in their right mind who could turn the advertising off would, the site would have no way of sustaining itself and would be forced to shut down. More likely, the site would realize that making its data easy to export would be an awfully bad idea in the first place. As long as the data is hard to extract (but relatively easy to search for), there is a reason to come to the page and see the design, the advertisements, the other articles, and all the site has to offer. It gives them a chance to create a website, a homestead, and a destination, not just small chunks of universally amalgamated data.

So what is the answer? I think it is something like slashdot, in which people point other people to neat and useful things going on on the Net (usually to URLs on other sites that are advertiser supported). In this way, people get access to information they find useful, the sites get plenty of traffic and advertising impressions, and everyone walks out happy.

It’s not just advertiser impressions, though. Copyright is important, too. Although I value openness, it is nice to know how and where your work is being used, and XML would be the ultimate plagiarist’s tool. So I think that most people who are actually producing content, from the small-scale, non-commercial folks (like me!) to the commercial information service sites, will resist a move to structured data and XML. If websites don’t move over to XML, browsers will have no incentive to work hard to support it. It will be an unimportant issue, like asking whether the latest release of your product shipped with a Xhosa language pack. XML will be dead on its very premise: content producers do not want their content to be openly structured!

Why SDMI Will Fail

Hounded by open formats like MP3 that encourage free copying of music, record companies have been praying for an escape: a high-quality, secure digital audio format that could be sold to consumers everywhere online without risk of piracy. Unsatisfied with the proprietary solutions offered by Liquid Audio or AT&T, they have banded together to forge a new standard that would be ultimately secure and powerful. They dubbed this the Secure Digital Music Initiative, or SDMI. SDMI was hailed as the future savior of the record industry. This was a Bad Move.

Why? It does seem perfectly reasonable for record companies to want a way to make sure their billions of dollars of assets are protected from thieves. The answers are not obvious. One: digital intellectual property, especially digital audio, is insecure by its very nature. Two, a little more obviously: good standards take a long time to iron out in a committee of multinational corporations.

The first point is a purely technical one, but one that is so straightforward that I wonder at the sheer ignorance of those promoting secure standards. When a computer program, say RealPlayer or WinAMP, wants to play some music for the user, it has to send that data in raw form to the sound card. It is a trivial matter to write a piece of software that pretends to be a sound card! This software can then capture all of the music that was intended for the actual sound card and store it away. In this manner, it is possible to render any secure music format insecure and copyable.
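
To make the argument concrete, here is a hedged sketch (invented class names, not a real driver API) of why that last step is so hard to secure: anything that looks like a sound card to the player can simply keep a copy of the decoded audio.

```python
# A sketch of the "fake sound card" argument: if a player hands decoded PCM to
# anything that looks like an output device, that device can simply keep a copy.
# The classes here are invented for illustration, not a real driver interface.
class SoundCard:
    def write_pcm(self, samples: bytes) -> None:
        pass                        # pretend this makes the speakers move

class CapturingSoundCard(SoundCard):
    def __init__(self, real_card: SoundCard, dump_path: str):
        self.real_card = real_card
        self.dump = open(dump_path, "wb")

    def write_pcm(self, samples: bytes) -> None:
        self.dump.write(samples)            # keep a perfect copy of the audio
        self.real_card.write_pcm(samples)   # the user still hears the song

# Any "secure" player that decrypts internally still has to do this eventually:
card = CapturingSoundCard(SoundCard(), "captured.raw")
card.write_pcm(b"\x00\x01" * 1024)
```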

Mind you, I’m not out of my mind on this. GoodNoise is a new Internet-based record label that signs artists and promotes them online, selling some portion of their repertoire on its website. One strange thing about GoodNoise is that none of the songs on their website are encrypted. A cursory glance might allow you to dismiss this as naiveté or stupidity. A closer look at the company reveals the astonishing fact that the vast majority of those in charge came from Pretty Good Privacy (PGP), one of the world’s leading encryption companies! In other words, the people who know the most about how to encrypt data are just about the only people not pushing to encrypt music!

The blunt truth of it all is that the software industry learned this lesson 15 years ago, the hard way. A number of companies spent millions of dollars and tens of man-months protecting their software, only to find it cracked weeks after release. Software companies tried fancy manuals, cryptography, dongles… the works! All to no avail. Most of these schemes did not deter the crackers but proved annoying to legitimate customers. The crackers were just excited by the prospect of “a difficult crack,” since it gave more prestige to the one who managed to break the program’s codes. Software companies eventually gave up and decided to spend their money and time on making good software products instead of protecting their intellectual property. Today, less than 1% of software ships with any significant form of copy protection.

A second and equally practical reason why SDMI is going to fail is that it will not be here fast enough. When I talked with the chairman of a certain large record label today, he was expecting SDMI to be out and finalized before June. Boy, did I have news for him! According to multiple inside sources, the work on SDMI is going amazingly slowly: even slower than the meetings on DVD. The tech companies are despairing at the persistent (and often conflicting!) desires of the labels to have an efficient, convenient, and infinitely secure format specced, ironed out, tested, debugged, and implemented in a matter of a few months.

The fact of the matter is, MP3 is here now. Yes, it is not as high quality. Yes, it has a reputation for piracy and illegal copying. But you know what? Tens of millions of consumers are using it today. Excellent software infrastructure has been developed for creating and distributing MP3s in streaming and downloadable formats. Portable MP3 players are popping up everywhere, and more exciting hardware is just around the corner. It is too late for SDMI. Even if SDMI were finalized in a year, which is optimistic, the range of MP3 applications for digital audio encoding, distribution, and listening would prove far more compelling to the consumer than an investment in new technology for the explicit purpose of restricting their access to music. Microsoft, with a very capable format already released (read my review!), is going to have a very hard battle as it is convincing the consumer to use MS Audio instead of MP3. SDMI would hardly stand a chance now, much less in a year.

Ultimately, it’s about the consumers. Right now, they have music in their ears. They’re clicking away, making playlists of their favorite 200 swing music songs or sending their favorite U2 single to their buddies online. You will have a very hard time taking these joys away from them. How about this for a radical change: why not work for your consumers instead of against them?

On Collaborative Filtering

A recent set of technologies has been devised to help websites learn about their users and take intelligent actions accordingly. These technologies, called “recommendation engines” or “collaborative filtering,” examine a user’s past viewing habits and compare them with those of other users who have similar interests. If your interests are found to parallel those of another group of users, then the system can start making suggestions: suppose you normally never listen to any country music, but you like bands X, Y, and Z a whole lot. Now if a whole bunch of other people who don’t normally like country, and who also like X, Y, and Z, suddenly are all listening to (and loving) one particular country band, the system might suggest it to you and be relatively confident that you’ll like it.
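
A minimal sketch of how such an engine might work is below; the ratings, user names, and band names are invented, and the simple cosine-similarity approach is illustrative rather than any particular vendor’s algorithm.

```python
# A minimal sketch of user-based collaborative filtering: find the listeners
# whose ratings look most like yours, then suggest what they liked that you
# haven't heard. Ratings and band names are invented.
from math import sqrt

ratings = {
    "you":   {"X": 5, "Y": 4, "Z": 5},
    "ann":   {"X": 5, "Y": 5, "Z": 4, "CountryBand": 5},
    "bob":   {"X": 4, "Z": 5, "CountryBand": 4},
    "carla": {"Polka": 5, "Yodeling": 4},
}

def similarity(a, b):
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    return dot / (sqrt(sum(v * v for v in a.values())) *
                  sqrt(sum(v * v for v in b.values())))

def recommend(user, k=2):
    others = [(similarity(ratings[user], ratings[o]), o)
              for o in ratings if o != user]
    scores = {}
    for sim, other in sorted(others, reverse=True)[:k]:
        for item, rating in ratings[other].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("you"))   # ['CountryBand'] -- suggested by your nearest neighbors
```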

This technology is neither brand new nor obscure: Amazon.com uses it
extensively on their website to recommend books to buyers. Indeed,
Firefly applied recommendation engine technology to CD purchases on
the web many years ago. Unfortunately for them, they took too long to
license out their technology (they wanted to be the only people on the
‘Net with that technology) and were subsequently steamrolled over as
companies like Net Perceptions came to market swiftly with
sophisticated engines. Microsoft quietly picked Firefly off with what
was rumored to be a humiliatingly cheap acquisition.

But it surprises me that no online digital music provider has yet
openly embraced this technology: it is absolutely key to the success
of online audio. Why? Because new
publishing and distribution infrastructures will make it very easy for
artists to publish profusely on the web. Like the 500-channel
television, the diversity of content is appreciated, but the sheer
quantity of music on the Net could prove so overwhelming as to
discourage listeners (and potential buyers) from seeking out the music
they would enjoy. Techie geeks refer to this as “the signal-to-noise
ratio problem:” if you only hear one band you like (signal) for every
twenty you don’t (noise), you won’t want to spend your time poking
around for that one band.

The record industry had a fairly effective technique for increasing
the signal-to-noise ratio for music: the original point of those
practicing A&R, or “Artists and Repertoire,” at record labels was to
seek out the good bands that the majority of the population would
enjoy. But the Internet offers us what no A&R man could – the
potential for individuals to have access to the bands they love both
big and small, from all around the world. Recommendation engines make
this possible, and improve the signal-to-noise ratio by presenting
music that, based on your prior listening tastes, you’re likely to
enjoy.

Ultimately, this obviates A&R: once somebody has heard a band that she
deems pleasant to listen to, it will be recommended to those of
similar taste – if they like it, it may get recommended to their
friends, etc. In this way, the popularity of music is decided more
by the taste of the people than by the marketing push of a major label. A
small folk artist in Oklahoma could become vastly popular in Northern
India; who knows? Everybody benefits from this technology: artists,
who get better exposure; consumers, who hear more music that they
enjoy; and sites, which have more satisfied customers than before.

To date, people have argued that online audio sites have not yet
adopted this technology because of a paucity of content: when there
are only 15 artists on a site, a recommendation engine is hardly
appropriate. But with the rising tide of acceptance of online
distribution, floods of artists have been flocking to centralized
music portals like eMusic, MP3.Com, and Audio Explosion. This newfound
influx has left the sites unable to provide tasteful experiences for
their users, who instead find themselves awash in a flood of “exotic” (to
phrase it kindly) amateur music from around the world. The sites have
reached the size and maturity to move to collaborative filtering.

And move they must: the stakes are large in this brave new world and
the listeners plentiful. The winners will quickly adopt and manage
these new technologies, and those less nimble will be left wondering
why more people didn’t come listen to their 15,000 artists.