Blocking odd bots from Search.aspx and Category.aspx

For general questions and discussions specific to the AbleCommerce GOLD ASP.Net shopping cart software.
efficiondave
Commander (CMDR)
Posts: 151
Joined: Tue Dec 02, 2008 10:20 am
Location: St. Louis Missouri

Post by efficiondave » Thu Nov 19, 2015 1:13 pm

In tracking down a serious performance issue that resulted from highly aggressive bots, I noticed the majority of pageviews come from odd bots that are nailing the search.aspx and category.aspx pages with all sorts of weird queries.

Are there recommended AbleCommerce best practices for the Robots.txt file to reduce the server load and errors that result from these bots? I definitely want to let Google, Bing, and Yahoo crawl as needed, but I want to reduce abuse by malicious bots.

Sample Bots that made the majority of my page views:
Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)
Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)
AppEngine-Google; (+http://code.google.com/appengine; appid: s~skawa-easyling)
Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)


Thanks,
David

Post by efficiondave » Thu Nov 19, 2015 5:24 pm

I've been looking into this a bit and the following seems to work:
If you are having intermittent performance issues, you'll want to look at your ac_PageViews table. It records every page that has been served, along with the ActivityDate, IP, UserAgent, UriQuery and more. If you are currently having issues, run this query and look at the UserAgent field to find the culprit:

SELECT TOP 100 * FROM ac_PageViews ORDER BY ActivityDate DESC

To find the worst offenders over time, run this query:

SELECT COUNT(UserAgent) AS TotalViews, UserAgent FROM ac_PageViews GROUP BY UserAgent HAVING COUNT(UserAgent) > 500 ORDER BY TotalViews DESC

You should expect to see GoogleBot, bingbot, Yahoo! Slurp, and other major search engines in there, and they are fine. You can control how often they visit in Robots.txt if needed, but proceed with caution.

1.) For the worst offenders, block their IP addresses at a firewall. You can use Windows Firewall to do this. To see which IPs they are using, run this:

SELECT COUNT(UserAgent) AS UserAgentCount, UserAgent, COUNT(RemoteIP) AS ipCount, RemoteIP
FROM ac_PageViews
GROUP BY UserAgent, RemoteIP
HAVING COUNT(UserAgent) > 500
ORDER BY UserAgentCount DESC

2.) You can also try blocking them by User Agent using IIS's Request Blocking functionality as described here:
https://moz.com/ugc/blocking-bots-based-on-useragent

Hope this helps someone else. I was at a loss on how to track down the problem when I noticed severe site performance issues with an AbleCommerce site.

Post by efficiondave » Thu Nov 19, 2015 5:52 pm

I should also note that you should probably only worry about unknown bots logging 10,000 or more hits in a day. The main offender that forced me to look into this logged 40,000 page views in a 3-hour period, and that slowed my server to a crawl.
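
To put that rule of thumb into practice, here is a rough Python sketch that counts hits per user agent per day and flags the unknown agents at or above a threshold. It assumes you have already pulled (UserAgent, ActivityDate) pairs out of ac_PageViews; the function name, the KNOWN_CRAWLERS list and the sample rows are made up for illustration and are not part of AbleCommerce.

```python
from collections import Counter
from datetime import datetime

# Substrings of crawlers we do not want to flag (illustrative list, adjust to taste).
KNOWN_CRAWLERS = ("googlebot", "bingbot", "yahoo! slurp")


def heavy_unknown_agents(rows, threshold=10_000):
    """Return (user_agent, day, hits) for unknown agents with hits >= threshold.

    rows is an iterable of (user_agent, activity_datetime) pairs, for example
    exported from ac_PageViews. Results are sorted with the busiest first.
    """
    hits = Counter()
    for user_agent, when in rows:
        lowered = user_agent.lower()
        if any(name in lowered for name in KNOWN_CRAWLERS):
            continue  # skip the major search engines
        hits[(user_agent, when.date())] += 1
    flagged = [(agent, day, n) for (agent, day), n in hits.items() if n >= threshold]
    return sorted(flagged, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    # Tiny made-up sample; a low threshold is used so something gets flagged.
    sample = [("ExampleBot/1.0", datetime(2015, 11, 19, 13, 0))] * 4
    sample += [("Mozilla/5.0 (compatible; bingbot/2.0)", datetime(2015, 11, 19, 13, 5))] * 6
    for agent, day, n in heavy_unknown_agents(sample, threshold=3):
        print(day, n, agent)
```

Keep in mind that a user agent string is whatever the client claims to be, so treat the known-crawler list as a convenience filter, not as proof of identity.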

AbleMods
Master Yoda
Posts: 5170
Joined: Wed Sep 26, 2007 5:47 am
Location: Fort Myers, Florida USA

Post by AbleMods » Fri Nov 20, 2015 3:47 am

Hey Dave, check your ac_Users table too. It could be bloated now.

I ran into this a few months back. The bot wasn't persisting the Able cookie between hits. So we were seeing 200,000+ new anonymous user records daily.

After three weeks, we had to clear 6,000,000+ user records - that took a while :)

The bot wasn't respecting robots.txt at all. And bots can easily change useragent strings. We wound up blocking the bot in IIS itself. It's a tough battle to fight with no simple long-term solution....
Joe Payne
AbleCommerce Custom Programming and Modules http://www.AbleMods.com/
AbleCommerce Hosting http://www.AbleModsHosting.com/
Precise Fishing and Hunting Time Tables http://www.Solunar.com

Post by efficiondave » Fri Nov 20, 2015 7:01 am

Thanks Joe. 57,000 anonymous users created yesterday. Ugh.

How did you block the bot with IIS?
Last edited by efficiondave on Fri Nov 20, 2015 7:27 am, edited 1 time in total.

Post by AbleMods » Fri Nov 20, 2015 7:09 am

I did a reverse IP lookup and determined the bot was from a security scan company. I then looked through their support pages and found the IP blocks they use.

Then I added those blocks to the IP restrictions on the website in IIS.
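
For anyone unsure what an "IP block" means in practice, here is a small Python sketch that checks whether an address from the logs falls inside a list of CIDR ranges. The ranges below are reserved documentation ranges used as placeholders, not the real blocks of any scanner, and the function name is made up for illustration.

```python
import ipaddress

# Placeholder CIDR blocks (reserved documentation ranges), NOT real bot ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # 203.0.113.0 - 203.0.113.255
    ipaddress.ip_network("198.51.100.0/25"),  # 198.51.100.0 - 198.51.100.127
]


def is_blocked(ip: str) -> bool:
    """Return True if ip falls inside any of the blocked CIDR ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for candidate in ("203.0.113.42", "198.51.100.200", "192.0.2.1"):
        print(candidate, "blocked" if is_blocked(candidate) else "allowed")
```

This is handy for checking the RemoteIP values from ac_PageViews against a published range before you add the range to the IIS IP restrictions.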

Rogue/illegal/DDoS bots are much tougher to block because they could be coming from anywhere. In most cases, the source IP resolves to an obscure country that the client will never, ever do business with. So I either add the country to the GeoIP filter in the client firewall (if it supports it), or look up that country's assigned IP blocks and block them manually.

Ugly way to do it, I know, but it is what it is. It's easiest if the client is hosted in my hosting business because my firewalls support GeoIP filters. I just add a rule "Block everything from Croatia going to www.clientsite.com" and that's it.

Works great on cutting down email spam too....

fiddycent
Lieutenant, Jr. Grade (LT JG)
Posts: 45
Joined: Tue Sep 03, 2013 12:30 pm

Post by fiddycent » Thu Apr 07, 2016 10:14 am

I am just setting up a new site and running into a similar issue. It is not on the same scale as the others in this thread (2,000 anonymous users, wishlists, and pageviews per day) but it is still very annoying. A few questions:

1. Is the only way to block these to filter by IP address ranges or use blocking at the IIS level? Or would configuring the robots.txt file help? I noticed the IP address varies from time to time, so I'm not sure how effective IP blocking would be.
2. If I do need to configure the robots.txt file, I am not sure where to start. Can anyone provide a sample suited to an AbleCommerce site?
3. If I block the Google crawler by IP address, does this prevent my pages from being indexed by Google? I definitely want my pages to show up in search results on Google and other major search engines. I just don't want it to be at the expense of the website performance. Any suggestions are appreciated.

Post by AbleMods » Thu Apr 07, 2016 10:41 am

Yes, IP blocking is the only effective way to stop rogue bots. It takes a good-quality firewall that handles GeoIP blocking to do it easily; otherwise it's a major hassle every time.

A rogue bot won't respect the robots.txt file, so you're just chasing ghosts if you go down that route. Robots.txt is not enforced by the server. It's simply a standard adopted by the (legitimate) crawler industry.

Definitely don't want to block Google. If your site performance suffers because Google is crawling your site, you need a better hosting provider.

2,000 anonymous user records a day is a drop in the bucket. Worry about it when you're getting 200,000 a day.
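
Since user agent strings can be faked, one way to avoid blocking the real Googlebot by accident is the reverse-then-forward DNS check that Google documents for verifying its crawler. Below is a hedged Python sketch of that idea. The function name and the injectable reverse/forward parameters are my own (they are only there so the logic can be tried without a network), and the hostname suffixes are the two I am sure of, so check Google's current documentation before relying on it.

```python
import socket

# Hostname suffixes accepted as Google-owned (assumption: confirm against Google's docs).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_genuine_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the hostname, then confirm it resolves back to ip."""
    try:
        hostname = reverse(ip)[0]          # (hostname, aliases, addresses)
    except OSError:
        return False                       # no PTR record or lookup failure
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward(hostname)[2]   # (hostname, aliases, addresses)
    except OSError:
        return False
    return ip in addresses


if __name__ == "__main__":
    # Performs real DNS lookups; the address is only an example to test with.
    print(is_genuine_googlebot("66.249.66.1"))
```

An address that claims to be Googlebot in its UserAgent but fails this check is a reasonable candidate for the firewall.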

Post by efficiondave » Thu Sep 01, 2016 4:56 am

Yesterday I noticed my server had overly high utilization (sustained 90%+ CPU), and when I ran the queries I posted above I saw that BingBot and Yandex were running lots of odd queries against Search.aspx. I looked in robots.txt and saw that search.aspx was blocked but not Search.aspx (capital S). Paths in robots.txt are matched case-sensitively, so the lowercase rule did not cover the capitalized URL. I added it and now things are much better.
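
The case-sensitivity issue is easy to reproduce with Python's standard-library robots.txt parser, which also compares paths case-sensitively. This is only a sketch: the rules are a minimal stand-in for a real robots.txt, the is_allowed helper is made up, and real crawlers may differ in other details.

```python
from urllib.robotparser import RobotFileParser

# Minimal stand-in rules; a real robots.txt will have more entries.
LOWERCASE_ONLY = """User-agent: *
Disallow: /search.aspx
"""

BOTH_CASES = LOWERCASE_ONLY + "Disallow: /Search.aspx\n"


def is_allowed(robots_txt: str, path: str, agent: str = "bingbot") -> bool:
    """Return True if the given rules let agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for label, rules in (("lowercase only", LOWERCASE_ONLY), ("both cases", BOTH_CASES)):
        for path in ("/search.aspx", "/Search.aspx"):
            verdict = "allowed" if is_allowed(rules, path) else "blocked"
            print(f"{label}: {path} is {verdict}")
```

With only the lowercase rule, /Search.aspx is still crawlable; listing both spellings closes the gap for crawlers that honor robots.txt.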
