The PC Guide - Web Robot Policy

The PC Guide is a very large web site: the main part of the site contains over 3,000 content pages, and the burgeoning Discussion Forums currently contain over 10,000 topics with 60,000 posts and growing every day. This material represents a wealth of information that is provided for free to anyone who has access to the Internet, subject to the terms of use specified in the Site Guide.

Most people are content to use the site on the Internet in the "normal" way in which it was originally intended: you go to a page, read the material there, and then go to the next page when you are done. Unfortunately, some people, seeing the large amount of material on the site, decide to use automated programs to download either the entire site or large sections of it. These programs go by different names such as web robots, spiders, or offline browsers. They work by having the program download one page, examine it for links, download all the linked pages and then repeat the process on those pages.

The problem with these web robots is that they generate a tremendous number of requests to The PC Guide server. Worse, some web robots are poorly written and generate thousands of spurious requests, and some people run them overnight where they chug away for hours. This is especially problematic when someone tries to download the Discussion Forums, since each page requires a CGI script and has dozens of links.

When web robots are used in this manner, they monopolize the server, slowing down or even locking out other users. I have seen a single user's web robot responsible for almost half the page requests of the entire site over a period of time. The end result is usually that other people get errors trying to use the site, and sometimes the whole server must be shut down and restarted. It is simply not acceptable for a handful of people to make the site difficult or impossible to use for thousands of others.
For this reason, the use of web robots is prohibited on The PC Guide. I mentioned this in the Statement on Copyright and Submissions, and also on the Topic Index, but the programs continue to be used and are causing real problems for me. So I decided to explain the issue in more detail and to link to this page prominently from many areas of the site. The bottom line is that, for the sake of the site as a whole, I cannot tolerate single users generating thousands of requests an hour to the server.

If you really want to view the entire site offline (except for the forums), the Disk Edition is an inexpensive option that you may want to consider. Otherwise, I would ask that everyone please give the server a break and just use the site manually, one page at a time. If you want to download a section for offline use, that's fine, but please download only a few pages at a time.

Finally, I write this only reluctantly, but in the spirit of full disclosure I must say the following to those who may wish to ignore what I am saying here. All page requests to the server are logged, and those who use web robots or in any other way degrade the usability of the site are subject to being denied access to the server in the future. I have already had to ask the nice folks at pair Networks (who host the site) to firewall several IP addresses that were attempting to grab the whole site with dozens of simultaneous requests, and I have even had to contact ISPs in the past to get this behavior to stop. I really hope I won't have to do it again.

Thanks for your understanding and cooperation.