How to design good URLs
Friday, July 27th, 2007
One of the overlooked areas of the web is good URL design and maintenance. While a lot of effort goes into site development, search engine optimization (SEO), generating traffic and so on, little or no effort goes into designing and maintaining URLs. Poorly managed URLs produce more and more broken links as a website ages, a process known as link rot. This can result in lower search engine rankings, a poor website reputation, a poor user experience and lost business.
New technologies, SEO, security and usability all create reasons to change URLs. However, a bit of planning goes a long way toward extending the lifespan of URLs. This article looks at design practices that can help prevent link rot.
1. A URL should contain only WHAT, not HOW
A URL is an interface to a resource or a service provided by a website. As in any good design, the interface should be separated from the implementation, so that the implementation can change without affecting users; on the web, users include anyone who has linked to a URL on the site. The web is littered with URLs that violate this principle. Many of them depend on a particular piece of software, a directory structure, a taxonomy or a filename extension. All of these are prone to change, and when they do, the URLs break.
For example, the URL http://www.example.com/cgi-bin/catalog/list.pl ties the resource to a technology and a directory: a CGI Perl script in the catalog directory. A better approach is to use a generic URL such as http://www.example.com/catalog/list and map it to the current implementation on the server.
Even a URL like http://www.example.com/homepage.html is poor design, since it ties the resource to a markup language (however popular that may be) and to a static page.
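A minimal sketch of that mapping in Python, assuming a plain routing table inside the application; `ROUTES`, `dispatch` and the `list_catalog` handler are hypothetical names. Only the right-hand side of the table changes when the implementation does, so the published URL stays the same.

```python
from typing import Callable, Dict


def list_catalog() -> str:
    """Hypothetical handler: today it might wrap a legacy script, tomorrow something else."""
    return "catalog listing"


# Public URL -> current implementation. When the implementation changes,
# only the handler on the right is swapped; the published URL is untouched.
ROUTES: Dict[str, Callable[[], str]] = {
    "/catalog/list": list_catalog,
}


def dispatch(path: str) -> str:
    """Look up the handler for a public path and run it."""
    handler = ROUTES.get(path)
    if handler is None:
        raise LookupError(f"no route for {path}")
    return handler()
```

Requests for `/catalog/list` reach `list_catalog`, while implementation paths such as `/cgi-bin/catalog/list.pl` are never published in the first place.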
2. Keep URLs simple
URLs should be kept clean and simple, even when they contain no implementation details, for reasons of both security and usability.
If users are told that the secured area of a website always has a single point of entry, identified by a simple, easy-to-remember URL like https://www.example.com/login, they are less likely to be fooled by a spoofed URL than if they are used to cluttered URLs such as:
https://signin.example.com/ws/login?SignIn;ru=http://www.example.com/F_trksid=Dm37&_trksid=m37
3. Optimize URLs for search engines
Search engines are known for their dislike of dynamic URLs (URLs built from query-string parameters). So a URL like http://www.example.com/pages/2008/07/20 is more search engine friendly than http://www.example.com/pages?year=2008&month=07&date=20.
Setting up search-engine-friendly URLs this way from the start minimizes the need to change them in the future.
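The server still receives the same year, month and day; it simply reads them from the path instead of the query string. A minimal sketch, assuming the path layout from the example above; `parse_archive_path` and `archive_path` are hypothetical helper names.

```python
import re
from typing import Dict, Optional

# Path-style archive URLs such as /pages/2008/07/20.
ARCHIVE_URL = re.compile(r"/pages/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})")


def parse_archive_path(path: str) -> Optional[Dict[str, int]]:
    """Return the date parts encoded in the path, or None if the path does not match."""
    match = ARCHIVE_URL.fullmatch(path)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def archive_path(year: int, month: int, day: int) -> str:
    """Build the path-style URL used in links, instead of ?year=...&month=...&date=..."""
    return f"/pages/{year:04d}/{month:02d}/{day:02d}"
```

Generating links with `archive_path` and parsing them with `parse_archive_path` keeps both directions in one place, so the URL scheme is defined once.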
4. Use POST, not GET, for a service
GET is meant for safe operations such as a lookup, a query or a read. Any URL that performs a service (an action) that changes state on the server should only accept POST requests. A classic example is placing an order. Failing to do this can trigger unintended actions and is a potential security risk: a malicious website can hide the URL in an image tag or a piece of JavaScript, so that visitors execute the action without ever intending to.
Ignoring this rule also led to the fiasco when Google released Google Web Accelerator (GWA). GWA pre-fetched pages on behalf of the user, and some of that pre-fetching triggered actions without the user being aware of it. Search engine spiders also follow links, but they are less damaging than GWA pre-fetching, because GWA acted as a proxy for the user and so could reach pages behind password protection.
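A minimal WSGI sketch of the rule, assuming a hypothetical `/order` endpoint handled by `place_order_app`; a real application would usually get this check from its framework's routing. A GET, whether it comes from a link, an image tag or a pre-fetcher, gets a 405 response and changes nothing.

```python
from typing import Iterable, List

PLACED_ORDERS: List[str] = []  # in-memory stand-in for server-side state


def place_order_app(environ, start_response) -> Iterable[bytes]:
    """WSGI handler for a hypothetical /order endpoint that changes server state."""
    if environ.get("REQUEST_METHOD") != "POST":
        # Links, <img> tags and pre-fetchers issue GET requests; none of them
        # may place an order, so refuse anything that is not a POST.
        start_response(
            "405 Method Not Allowed",
            [("Allow", "POST"), ("Content-Type", "text/plain; charset=utf-8")],
        )
        return [b"Orders can only be placed with a POST request.\n"]
    PLACED_ORDERS.append("order")
    # Section 5 covers what to send back after a successful POST.
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Order placed.\n"]
```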
5. Use the Post-Redirect-Get (PRG) pattern for posts
When a user submits a form using POST, he or she may unintentionally repeat the action by refreshing the browser or by pressing the back and then the forward button. Since the POST changes state on the server, each repeat results in a double submit and unintended results.
This problem can be avoided by answering the POST with a redirect (for example, 303 See Other) to a GET request that displays the results of the POST. The PRG pattern is explained in detail in the "Redirect after post" article listed in the references.
Redirecting to a GET after a POST also gives users a complete URL to bookmark. If the results are displayed directly in the response to the POST, users bookmark a URL that, revisited later without the form parameters, produces a poor response page.
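A minimal WSGI sketch of PRG, assuming a hypothetical `/orders` resource kept in an in-memory dictionary. The POST handler stores the order and answers with 303 See Other; the browser then issues a GET for the new order's URL, which is safe to refresh and to bookmark.

```python
from typing import Dict, Iterable

ORDERS: Dict[int, str] = {}  # in-memory stand-in for the order database


def orders_app(environ, start_response) -> Iterable[bytes]:
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "/")

    if method == "POST" and path == "/orders":
        order_id = len(ORDERS) + 1
        ORDERS[order_id] = "new order"
        # Post -> Redirect: do not render the result here. 303 See Other makes the
        # browser follow up with a GET, so a refresh repeats only that GET.
        start_response("303 See Other", [("Location", f"/orders/{order_id}")])
        return [b""]

    if method == "GET" and path.startswith("/orders/"):
        # Redirect -> Get: a plain, bookmarkable URL that is safe to reload.
        key = path[len("/orders/"):]
        if key.isdigit() and int(key) in ORDERS:
            start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
            return [f"Order {key}: {ORDERS[int(key)]}\n".encode("utf-8")]

    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Not found\n"]
```

Reloading `/orders/1` any number of times only reads the order; it never creates another one.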
A related point is redirection when a user tries to access a secured resource. After the request is intercepted and the user is authenticated, the website should redirect the user to the originally requested resource, not to the default post-login landing page.
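One way to remember the requested resource is to carry it through the login page in a query parameter. A minimal sketch, assuming a hypothetical `next` parameter and `/login` path; the check that the target is a local path is an added precaution, so that the parameter cannot be used to send users to another site.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def login_url_for(requested_path: str) -> str:
    """Send an unauthenticated user to the login page, remembering where they were going."""
    return "/login?" + urlencode({"next": requested_path})


def after_login_redirect(login_url: str, default: str = "/") -> str:
    """Pick the post-login destination from the hypothetical 'next' parameter."""
    query = parse_qs(urlsplit(login_url).query)
    target = query.get("next", [default])[0]
    # Only follow same-site paths; "//host" and "/\host" can be treated by
    # browsers as links to another site, so those fall back to the default.
    if not target.startswith("/") or target.startswith(("//", "/\\")):
        return default
    return target
```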
6. Secure all the required pages
While almost every website uses HTTPS for submitting sensitive forms, an often overlooked problem is serving the form page itself over plain HTTP (even though the form may submit to HTTPS), often in the name of performance. This opens the door to a man-in-the-middle attack in which the attacker rewrites the form page itself, for example so that it submits to a different address.
Another bad practice is mixing HTTP and HTTPS content on the same page, which can also lead to man-in-the-middle attacks.
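Mixed content can be spotted by scanning an HTTPS page for resources referenced over plain http://. A rough sketch using Python's standard `html.parser`, assuming that `src` attributes, `<link href>` and `<form action>` are the attributes worth checking; it does not look inside CSS or scripts, and it ignores ordinary `<a>` links, which navigate away rather than load content into the page.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class MixedContentFinder(HTMLParser):
    """Collects plain-HTTP URLs that an HTTPS page would load or submit to."""

    def __init__(self) -> None:
        super().__init__()
        self.insecure: List[Tuple[str, str]] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        for name, value in attrs:
            if not value or not value.lower().startswith("http://"):
                continue
            # src loads a resource (img, script, iframe...), <link href> loads
            # stylesheets and the like, <form action> is where the data is sent.
            if name == "src" or (tag == "link" and name == "href") or (
                tag == "form" and name == "action"
            ):
                self.insecure.append((tag, value))


def find_mixed_content(html: str) -> List[Tuple[str, str]]:
    finder = MixedContentFinder()
    finder.feed(html)
    finder.close()
    return finder.insecure
```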
7. Handle moved and deleted resources
In spite of all precautions, resources will sometimes need to be moved or deleted. The best way to handle a moved resource is a search-engine-friendly 301 redirect (Moved Permanently), not a META refresh tag. For a resource that has to be deleted, it is best to update the page with the reason for the removal and perhaps point the user to other related or useful resources. A published URL should never lead to a generic 404 page.
Moved resources, handled incorrectly, can cause a 404 error at best and stale information at worst. A good example is a W3C "Hall of Flame" story, where a web page kept track of school closings due to snow. Later the website created another page to track the same information, leaving the original page orphaned. The result? Any links that pointed to the original page led to stale information.
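A minimal WSGI sketch, assuming the site keeps two hypothetical tables: one mapping moved paths to their new locations and one holding an explanation for each removed page. Answering removed pages with 410 Gone is one reasonable choice, not something the text prescribes; the point is that the page explains the removal rather than showing a generic 404. The sample paths loosely follow the school-closings story and are made up.

```python
from typing import Dict, Iterable

# Hypothetical tables, updated whenever a page is moved or removed.
MOVED: Dict[str, str] = {
    "/snow-closings": "/weather/school-closings",
}
REMOVED: Dict[str, str] = {
    "/catalog/widget-2006": "This product has been discontinued. "
    "Current products are listed at /catalog/list.",
}


def site_app(environ, start_response) -> Iterable[bytes]:
    """Placeholder for the rest of the site."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Regular page\n"]


def app(environ, start_response) -> Iterable[bytes]:
    path = environ.get("PATH_INFO", "/")
    if path in MOVED:
        # 301 tells browsers and search engines where the resource now lives,
        # so old links and bookmarks reach the current page, not a stale copy.
        start_response("301 Moved Permanently", [("Location", MOVED[path])])
        return [b""]
    if path in REMOVED:
        # Explain the removal instead of falling through to a generic 404 page.
        start_response("410 Gone", [("Content-Type", "text/plain; charset=utf-8")])
        return [REMOVED[path].encode("utf-8")]
    return site_app(environ, start_response)
```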
8. Set up error pages for invalid URLs and application errors
This is the catch-all scenario. Users may, intentionally or otherwise, link to or type in URLs that do not exist, and a website should handle this gracefully. A good 404 error page explains that the resource was not found and offers navigation links and/or a search box. Web applications also need a 500 error page, instead of sending PHP code, database information or a bare "Internal Server Error" back to the browser. As with some of the other issues above, exposing the details of the application is not only a usability problem but also a security risk.
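A minimal sketch of the catch-all as WSGI middleware, assuming the application signals a missing resource by raising a hypothetical `NotFound` exception. Details of unexpected errors go to the server log, and the browser only sees a generic page.

```python
import logging
import sys
from typing import Callable, Iterable

logger = logging.getLogger("site")


class NotFound(Exception):
    """Hypothetical exception the application raises for an unknown URL."""


NOT_FOUND_PAGE = b"""<html><body><h1>Page not found</h1>
<p>Sorry, the page you asked for does not exist.</p>
<p><a href="/">Home page</a></p>
<form action="/search"><input name="q"> <button>Search</button></form>
</body></html>"""

ERROR_PAGE = b"""<html><body><h1>Something went wrong</h1>
<p>Please try again later.</p></body></html>"""


def with_error_pages(app: Callable) -> Callable:
    """Wrap a WSGI app so users get friendly error pages instead of code or stack traces."""

    def wrapped(environ, start_response) -> Iterable[bytes]:
        # Only errors raised before the app returns its body are caught here.
        html = [("Content-Type", "text/html; charset=utf-8")]
        try:
            return app(environ, start_response)
        except NotFound:
            start_response("404 Not Found", html, sys.exc_info())
            return [NOT_FOUND_PAGE]
        except Exception:
            # Full details (code, SQL, stack trace) go to the server log only.
            logger.exception("Unhandled error for %s", environ.get("PATH_INFO"))
            start_response("500 Internal Server Error", html, sys.exc_info())
            return [ERROR_PAGE]

    return wrapped
```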
References:
W3C style guideline for URI design
URIs, Addressability, and the use of HTTP GET and POST
Redirect after post (PRG pattern)
TLS and SSL in the real world