Blocked Web Browser User Agents
If you were automatically directed to this page when you tried to view another page, you're using a Web browser (or other software) that sends one of the following text strings as the "User-Agent":
- BuckyOHare / hypefactors.com
- DTS Agent
- gsa-crawler (M2-AMCWPFAKDA6AS, T2-B9E742J9WQSAB, and S5-KRWBRM63Y6JJT)
- heritrix (when followed by an invalid URL like “+http://firstname.lastname@example.org”)
- Missigua Locator
- MJ12bot
- Morfeus F*cking Scanner
These connections have been identified as "abusive" by our technical staff.
If you're a legitimate user (that is, if you're a normal human being who has been redirected to this page), please contact us and mention that you're being “blocked based on the HTTP User-Agent when connecting from IP address 126.96.36.199”.
On the rest of this page:
- Can a site owner override this restriction?
- What do you mean by "abusive"?
- Why is “MJ12bot” included?
- Why are other bots included?
Can a site owner override this restriction?
If you’re the owner of a site hosted with us, and you want to allow connections that send one of the above “User-Agent” strings anyway, create an empty file named .tigertech-dont-block-user-agents at the top level of your site. Note that the filename begins with a dot, and that dot must be included.
Doing this is not recommended, because it can expose your site to connections that cause high load, excess CPU usage, or outages.
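If you have shell access to your site, one way to create the override file is with a short Python snippet (a sketch only — it assumes the current working directory is your site’s top level; the exact path depends on your hosting setup, and you can just as easily create the file with an FTP client or file manager):

```python
from pathlib import Path

# Create the empty override file at the top level of the site.
# Assumes the current working directory is the site's document root.
Path(".tigertech-dont-block-user-agents").touch()
```

The file’s contents don’t matter; only its presence (and its exact name, including the leading dot) does.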
What do you mean by "abusive"?
The main reason we consider a connection abusive is that it attempts to load every page on a site without spreading the requests over a reasonable time period, and without slowing down when it detects that script-based pages load slowly.
A well-written search engine spider/robot should spread the page requests over an extended period. For example, if you need to load 1000 pages from a site, those could be loaded over 24 hours, not over an hour.
It should also detect how long it takes to load a page, then "sleep" for at least ten times that period before loading a similar URL. This ensures that if a site uses script-based pages that consume large amounts of CPU time, the spider/robot won't increase the total site load by more than 10%.
In addition, a robot should never open more than one simultaneous connection to a particular site.
Finally, we also consider user agents abusive if they repeatedly try to index URLs that return 404 errors, 301 redirects, and so on.
Why is “MJ12bot” included?
MJ12bot claims to be a project to “spider the Web for the purpose of building a search engine”. The company that makes it asks volunteers to install the indexing software on their own computers, using the volunteers’ own bandwidth and CPU resources instead of the company’s.
The idea of a community-run search engine sounds great. However, the MJ12bot authors have not operated a search engine for many years; instead, they use the data that volunteers collect to sell SEO services on a different site.
The MJ12bot software often also requests malformed URLs that generate “404 not found” errors, increasing CPU usage on WordPress sites.
Because of this, and because the MJ12bot software is often one of the primary causes of site slowdowns and CPU overage fees for our customers, we’ve blocked it from sites we host.
If you want to allow MJ12bot to index your site anyway, you can use the trick described above: create an empty file named .tigertech-dont-block-user-agents at the top level of your site, which will bypass the restriction.
Why are other bots included?
Several other bots listed are also run by companies that sell SEO services or the like, including:
- BuckyOHare / hypefactors.com
These services are used by a tiny fraction (if any) of our customers, but the costs and slowdowns caused by these bots affect everyone.
“Data mining for profit” bots are fundamentally different from search engine indexers like Googlebot. Search engine bots may send future visitors to a site, which benefits the site owner, so it’s reasonable to allow those bots to consume site resources in exchange. Search engines also make money off the data, but it’s a symbiotic relationship in which both parties get something.
But most data mining bots don’t provide any benefit to the site owner at all. It’s a parasitic relationship, not symbiotic. We don’t go out of our way looking for parasitic bots, but when one of them causes such abnormal resource consumption that it would lead to overage fees for our customers or affect the speed of a site, we block it. It’s not reasonable for our customers to incur expenses for something that won’t benefit them.
If you’re a customer of ours and you want to allow these bots to index your site despite this, you can use the trick described above: create an empty file named .tigertech-dont-block-user-agents at the top level of your site, which will bypass the restriction.