URLs Should be Easy

In the first of what I hope to be many posts on the topic of computer usability for the masses ™, tonight I wish to address my concerns with the web and URLs.

The web has been around for 10+ years. Heck, I’ve been on the web for a good 10+ years. It’s hard to imagine really. Things have changed. I remember a time when getting some new plugin to work with Netscape 2 was a pain. On a 14.4 Kbps modem, it was never fun downloading the latest Real Player (in whatever naming incarnation it came in at the time) so I could get a sneak peek at some random stream I decided to watch/listen to. More often than not, it turned out to be a big waste of time.

One thing that hasn’t changed is the bad use of URLs. Back in the day, there were but a few web document (and I italicise that for a reason; more in a minute) file extensions: (s)htm(l), pl, cgi, and perhaps a few others. Today there are dozens. We have php, asp, aspx, pl, py, etc. Then there is the plethora of other document types that people are accessing via the web these days, such as pdf. But, as I said, I’m talking about web document types. I’m not concerned with media files or anything else your browser doesn’t render directly as a page. So let’s forget about images, movies, sound files, style sheets, and pdfs. All I’m talking about herein are pages that are rendered in one way or another as html by your web browser.

These web documents are essentially the root URL one would use to access a given location. All the media files that come with them are rather extraneous to the end user’s cause, and users shouldn’t have to know about them (which they don’t, and that’s a good thing).

So why are so many URLs utterly unreadable, or at least painful to type, for humans? Of course we have things like bookmarks and search engines to help us avoid having to type them in. I’m not advocating that we should type them in, merely that we should be able to.

How many times have you read an article on some news site (perhaps a newspaper, perhaps a techie site) and the document URL contains some combination of indecipherable numbers, commas, and maybe even a shortened page title? To make matters worse, these are often followed by a “.aspx” extension or something else. Then there are pages that have seemingly simple URLs but are followed by complicated HTTP GET query strings that are longer than your browser’s address bar can display.

The good news about modern web servers is their support for URL rewriting, which is a method for a web server to take a clean URL and convert it into something uglier that the web server’s file system can understand.

For example, to view my blog archives from April of this year, the clean URL http://www.realbt.com/content/post/2006/04/ exists. But this isn’t a real path in my web server’s file system. Instead, it is converted to something like http://www.realbt.com/content/wp-archives.php?year=2006&month=04.
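To make the idea concrete, here is a minimal sketch in Python of the kind of mapping a rewrite rule performs. It is not how my server is actually configured; the names ARCHIVE_RULE and rewrite are made up for illustration, and the only thing taken from the example above is the shape of the two URLs.

```python
import re

# Hypothetical rule: /content/post/<year>/<month>/ maps to the archive script.
ARCHIVE_RULE = re.compile(r"/content/post/(\d{4})/(\d{2})/?")


def rewrite(path: str) -> str:
    """Return the internal path for a clean archive URL, or the path unchanged."""
    match = ARCHIVE_RULE.fullmatch(path)
    if match is None:
        return path  # not an archive URL, leave it alone
    year, month = match.groups()
    return f"/content/wp-archives.php?year={year}&month={month}"


if __name__ == "__main__":
    print(rewrite("/content/post/2006/04/"))
```

The visitor only ever sees the clean form; the ugly form stays inside the server.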

Which do you prefer to look at?

In my mind, URLs should be short and readable. We should be able to type them in. No document file type (html, php, asp, etc.) is necessary, and the only non-alphanumeric characters should be hyphens and slashes — maybe an underscore or two is okay if absolutely necessary. (But hyphens are easier to type.)
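If you wanted to check a site’s paths against that rule of thumb, something like the following rough Python sketch would do. The names CLEAN_PATH and is_clean are my own invention, and the pattern is just one reading of the rule: letters, digits, hyphens, and slashes, with underscores tolerated.

```python
import re

# One segment: letters, digits, hyphens, and (grudgingly) underscores.
SEGMENT = r"[A-Za-z0-9_-]+"

# A leading slash, then zero or more segments separated by single slashes,
# with an optional trailing slash. No dots, no query strings.
CLEAN_PATH = re.compile(rf"/(?:{SEGMENT}(?:/{SEGMENT})*/?)?")


def is_clean(path: str) -> bool:
    """Return True if the path uses only the characters suggested above."""
    return CLEAN_PATH.fullmatch(path) is not None


if __name__ == "__main__":
    for candidate in ("/services/blacktop", "/about.html", "/article.jsp?content=1"):
        print(candidate, is_clean(candidate))
```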

So to all those mass media web sites — get your act together. I’m not going to enumerate all the possible technical solutions to the problem of matching pretty names to ugly ones because there are simply too many. Anybody moderately familiar with their web server should be able to figure it out.

Another example.

A Macleans story about Harper’s new government has this lovely URL: http://www.macleans.ca/topstories/politics/article.jsp?content=20060417_125314_125314. This could be converted to simply http://www.macleans.ca/topstories/politics/article/20060417_125314_125314. Even though the trailing number is relatively meaningless, the result is still more human-friendly to look at. Long argument lists and document types are simply not necessary for humans to see.
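The conversion itself is mechanical. Here is a small Python sketch of it using the standard library’s URL parsing; I have no idea how the Macleans site is actually built, so treat the function name clean_article_url and the assumption that the key always lives in a “content” parameter as purely illustrative.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def clean_article_url(url: str) -> str:
    """Turn .../article.jsp?content=KEY into .../article/KEY.

    URLs that do not match that shape are returned unchanged.
    """
    parts = urlsplit(url)
    keys = parse_qs(parts.query).get("content")
    if not parts.path.endswith("/article.jsp") or not keys:
        return url
    base = parts.path[: -len(".jsp")]  # drop the file extension
    clean_path = f"{base}/{keys[0]}"
    # Empty query and fragment: everything now lives in the path.
    return urlunsplit((parts.scheme, parts.netloc, clean_path, "", ""))


if __name__ == "__main__":
    ugly = (
        "http://www.macleans.ca/topstories/politics/"
        "article.jsp?content=20060417_125314_125314"
    )
    print(clean_article_url(ugly))
```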

Most URLs of this form translate into some sort of key that is used by a script or application running on the web server to look something up (usually in a database). These are easily simplified. But what about web sites that are purely static? Allow me to use an example.

xyzpaving.com is a paving company. Their web site is structured as follows:

/index.html: the main page
/about.html: about

/services/index.html: the main services page
/services/blacktop.html: some page about blacktop paving
/services/cement.html: some page about cement paving

… you get the idea. The URLs are certainly concise and human readable, and anybody could type them in. But let’s say that in two years’ time they decide to change something about their site that requires moving their pages to php files. Suddenly all their URLs change and broken links emerge everywhere. Anyone who links to a specific page will have to update their URLs, and search engine results will point at dead pages until the site is indexed again.

This is another reason I advocate for dropping file extensions on web documents (as defined above). This should make the URL structure look like:

/: the main page
/about: about

/services: the main services page
/services/blacktop: some page about blacktop paving
/services/cement: some page about cement paving

Note that index entries (default documents for folders) do not need to have “index” referenced. Not only are these URLs cleaner, but they also provide a simple abstraction layer in case document types, and with them extensions, are changed. It will all happen transparently, and no one will be the wiser. This whole process is easily accomplished in modern web servers with a line or two in a configuration file. Nothing complicated is necessary. What a simple solution.
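In a real setup this lives in the web server’s configuration, but the lookup it performs can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name resolve, the EXTENSIONS tuple, and the order in which candidates are tried are all hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed extensions, tried in order. Moving the site to php later only
# means the files change; the public URLs stay the same.
EXTENSIONS = (".html", ".php")


def resolve(doc_root: Path, url_path: str) -> Optional[Path]:
    """Map an extensionless URL path to a file under doc_root, if one exists."""
    relative = url_path.strip("/")
    if ".." in relative.split("/"):
        return None  # refuse to climb out of the document root
    target = doc_root / relative
    candidates = []
    if relative:
        # /about -> about.html, about.php
        candidates += [target.with_name(target.name + ext) for ext in EXTENSIONS]
    # / -> index.html, /services -> services/index.html
    candidates += [target / f"index{ext}" for ext in EXTENSIONS]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

With the paving site above, /services/blacktop would find services/blacktop.html today and services/blacktop.php after the move, without the URL ever changing.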
