Session IDs
Even though the search engines can now index URL parameters, there is one type of parameter they avoid: session IDs. As you may know, the Web is a stateless environment. This means that each request for a page from a web site is handled independently of every request that came before it. Basic HTTP/HTML has no built-in knowledge of items previously added to a shopping cart, earlier successful logins, etc. Session IDs are one way a site keeps track of the state of a particular visitor. By assigning your browser a session ID, the site can remember what was in your shopping cart, or any other information it may want to store.
A session ID looks like this:
http://www.mysite.com/shoppingcart.asp?ID=C1537D0AECA6606406D7D3A
Though the parameter name can vary, it generally has "id" in it, e.g., osCsid (osCommerce) or PHPSESSID (PHP). The parameter value is dynamically generated and will vary, but generally consists of more than ten characters in the ranges a-z, A-Z and 0-9.
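The rule of thumb above can be turned into a quick check. The following is only a sketch in Python: the function name find_session_id_params is made up for illustration, and the test (a parameter name containing "id" plus a value of more than ten alphanumeric characters) is the heuristic from the text, not how any particular search engine actually detects session IDs.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Heuristic from the text: more than ten characters drawn from a-z, A-Z, 0-9.
_VALUE_RE = re.compile(r"[A-Za-z0-9]{11,}")


def find_session_id_params(url):
    """Return the (name, value) query parameters that look like session IDs.

    Hypothetical helper: a parameter is flagged when its name contains "id"
    (case-insensitive) and its whole value matches _VALUE_RE.
    """
    query = urlsplit(url).query
    return [
        (name, value)
        for name, value in parse_qsl(query, keep_blank_values=True)
        if "id" in name.lower() and _VALUE_RE.fullmatch(value)
    ]


if __name__ == "__main__":
    url = "http://www.mysite.com/shoppingcart.asp?ID=C1537D0AECA6606406D7D3A"
    print(find_session_id_params(url))
```

A heuristic like this will misfire on legitimate long identifiers, such as a product code of eleven or more characters in a parameter named "productid", so treat a match as a hint rather than proof.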
Why Are Session IDs a Problem?
The search engines are now indexing pages with session IDs, but the duplicate pages this produces can be a detriment both to the engines themselves and to the site owner.
Session IDs preserve state. Because basic HTTP is stateless, there is no link between the page you request now and the next one you might request. By preserving state, the web site knows what you did previously, such as adding something to the shopping cart. If that state contains sensitive data, such as a credit card number associated with your shopping cart, then any visitor who uses a URL containing your session ID will be able to see your credit card details.
Another problem is shared state. All visitors who arrive from the search engine through the same indexed URL share the same state, so when one of them adds something to a shopping cart, everyone using that session ID will see that item in their cart.
Further problems appear for the spiders when they index the same page under different session IDs: they could end up filling their search index with thousands of copies of the same page. Not only does this waste disk space for the search engines, it also wastes the site owner's server resources and bandwidth.
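To make the duplicate-page problem concrete, here is a minimal Python sketch of how a crawler might collapse such copies into one entry. The function names strip_session_ids and unique_pages are hypothetical, and the detection rule (a parameter name containing "id" with a value of more than ten alphanumeric characters) is an assumption based on the description of session IDs earlier in this article, not a documented search engine behavior.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed session ID shape: more than ten characters from a-z, A-Z, 0-9.
_SESSION_VALUE = re.compile(r"[A-Za-z0-9]{11,}")


def strip_session_ids(url):
    """Return the URL with parameters that look like session IDs removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not ("id" in name.lower() and _SESSION_VALUE.fullmatch(value))
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def unique_pages(urls):
    """Collapse URLs that differ only by session ID, keeping first-seen order."""
    seen = set()
    result = []
    for url in urls:
        canonical = strip_session_ids(url)
        if canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result


if __name__ == "__main__":
    crawled = [
        "http://www.mysite.com/shoppingcart.asp?ID=C1537D0AECA6606406D7D3A",
        "http://www.mysite.com/shoppingcart.asp?ID=9F2B7C4D1E8A5B6C3D0E7F1",
    ]
    print(unique_pages(crawled))
```

Both crawled URLs reduce to the same address once the session ID is dropped, so only one copy would need to be stored.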