How to Maintain Your Blog’s URL Consistency

The problem

URL, short for Uniform Resource Locator and also known as a web address, is a string that references a web resource.

The following URLs may lead to the same web page:

This is a bad practice. Each reference to the same resource (web page) must use an identical URL.

If more than one URL leads to the same page, you may:

  • Create duplicate entries in the Google (or other search engine) index, increasing the chance of split page rank
  • Lose Facebook likes, Tweets and similar social media rankings
  • Lose Disqus (or similar services) comments

This situation may affect the functionality of any service which uses the URL to identify a web page (resource). That’s why URL Consistency is so important.
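
To make the problem concrete, here is a small Python sketch of a service that counts likes per URL string. It is illustrative only: the URLs are hypothetical, and the counter merely stands in for any third-party service that uses the raw URL as its key.

```python
from collections import Counter

# Hypothetical URL variants that all serve the same page.
likes = [
    "http://example.com/post",
    "http://www.example.com/post",
    "http://www.example.com/post?goback=1",
    "http://www.example.com/post",
]

# A service keyed on the raw URL treats each variant as a separate page.
per_url = Counter(likes)
print(len(per_url))                            # 3
print(per_url["http://www.example.com/post"])  # 2
```

Four likes for one page end up split across three keys, and no single key shows the real total.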

The solution

It is a complex problem and possible solutions vary case by case. Here are some available solutions:

  • Use a single Canonical HostName
  • Strip unwanted query strings from incoming URLs

Use a single Canonical HostName

Most websites respond to their hostname both with and without the www prefix. That is fine. However, it is recommended to redirect the www hostname to the non-www one, or the opposite. Which one to select? There are arguments for each choice, and well-known sites are split:

  • redirect non-WWW to WWW: google, bing, baidu, qq, amazon, alexa, youtube, wikipedia, blogger, reddit, mozilla, facebook, linkedin, stumbleupon, microsoft, apple, tumblr, paypal, bbc
  • redirect WWW to non-WWW: twitter, wordpress, vimeo, github, jquery, sourceforge, pinterest, instagram, delicious

You can select whichever you prefer, but you have to stick with it permanently.

I prefer the non-WWW to WWW redirection. Here is how non-WWW is redirected to WWW using Apache configuration files (in Debian):

In addition to the main configuration file, which looks like:

        DocumentRoot /var/www/

another configuration file is created:

nano /etc/apache2/sites-available/

with the following content:

        Redirect /

If you don’t want to change the Apache configuration directly, you may use mod_rewrite. To redirect non-WWW to WWW, create an .htaccess file in the document root with the following content:

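RewriteEngine On
# Redirect any non-empty host other than www.example.com to the canonical host
RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^/?(.*) http://www.example.com/$1 [L,R=301,NE]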
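
The decision these rules make can be checked outside Apache. The following Python sketch mirrors the two conditions. `CANONICAL_HOST` and `canonical_redirect` are illustrative names, not part of Apache, the `http://` scheme is an assumption, and the function only models the redirect decision, not the HTTP handling.

```python
from typing import Optional

CANONICAL_HOST = "www.example.com"  # assumed canonical hostname


def canonical_redirect(host: str, path: str) -> Optional[str]:
    """Return the redirect target, or None if no redirect is needed."""
    # Mirrors: RewriteCond %{HTTP_HOST} !^$
    if not host:
        return None
    # Mirrors: RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
    if host.lower().startswith(CANONICAL_HOST):
        return None
    # Drop at most one leading slash, like the ^/? in the RewriteRule.
    rest = path[1:] if path.startswith("/") else path
    return "http://" + CANONICAL_HOST + "/" + rest


print(canonical_redirect("example.com", "/post"))      # http://www.example.com/post
print(canonical_redirect("WWW.Example.com", "/post"))  # None
```

A request that already uses the canonical host is left alone, so the redirect cannot loop.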

Strip unwanted query strings from incoming URLs

A query string is the part of a URL that contains data to be passed to web applications such as CGI programs. For example:

The part of the URL after the question mark (“?”) is the query string (id=1&category=sales).

Some websites (among them great services such as LinkedIn and FeedBurner) append query strings to incoming URLs of your website for tracking purposes (like “?goback=” etc). In most cases these query strings are unwanted and can be stripped.
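
As a quick illustration, Python's standard `urllib.parse` module can split a URL and decode its query string. The host and path below are hypothetical; only the query string comes from the text above.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL carrying the query string discussed above.
url = "http://www.example.com/report?id=1&category=sales"

parts = urlsplit(url)
print(parts.query)            # id=1&category=sales
print(parse_qs(parts.query))  # {'id': ['1'], 'category': ['sales']}
```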

Here is a solution for Apache web server. Similar solutions are available for Microsoft IIS and NGINX web servers.

I use an .htaccess file in the document root with the following content:

RewriteEngine On
RewriteCond %{QUERY_STRING} !=""
RewriteCond %{REQUEST_URI} !^/search.*
RewriteCond %{REQUEST_URI} !^/wiki.*
RewriteCond %{REQUEST_URI} !^/bbs.*
RewriteCond %{REQUEST_URI} !^/admin.*
RewriteRule ^(.*)$ /$1? [R=301,L]

  • Line 2: If a query string exists
  • Lines 3-6: Exclude the directories search, wiki, bbs, admin
  • Line 7: Remove the query string

Using RewriteCond you can exclude any QUERY_STRING or REQUEST_URI, according to your needs. Of course, never strip query strings that your own website relies on.
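
To test which URLs such rules would touch before deploying them, the same logic can be sketched in Python. This is a rough model under stated assumptions, not a replacement for the Apache rules: `strip_query` and `EXCLUDED_PREFIXES` are hypothetical names, and the example URLs are made up.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Path prefixes that keep their query strings (same list as the rules above).
EXCLUDED_PREFIXES = ("/search", "/wiki", "/bbs", "/admin")


def strip_query(url: str) -> Optional[str]:
    """Return the URL to redirect to, or None if the URL is left alone."""
    parts = urlsplit(url)
    if not parts.query:
        return None  # no query string, nothing to strip
    if parts.path.startswith(EXCLUDED_PREFIXES):
        return None  # this area relies on its query string
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))


print(strip_query("http://www.example.com/post?goback=1"))     # http://www.example.com/post
print(strip_query("http://www.example.com/search?q=apache"))   # None
```

Returning None for excluded paths corresponds to the RewriteCond lines failing, in which case Apache serves the request unchanged.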

WARNING: There is no universally valid solution. Read the Apache mod_rewrite documentation carefully and write your .htaccess file to suit your own environment.

For example, WordPress users might need the following lines:

RewriteCond %{QUERY_STRING} !^p=.*
RewriteCond %{REQUEST_URI} !^/wp-admin.*

  • Line 1: Allow post permalinks
  • Line 2: Exclude the admin directory

Use simple FeedBurner URLs

If you use FeedBurner to track detailed statistics for your feed, the URLs pointing to your website contain query strings such as utm_source and utm_medium. A FeedBurner URL looks like…

To make FeedBurner URLs identical to your site URLs, navigate to the Analyze tab, click on Configure Stats and deselect the Item link clicks checkbox.

Share your experience with other web servers (e.g. Microsoft IIS). Do you prefer WWW or non-WWW? Leave us a comment.