This is a geeky post but I feel the need to give back to the community and help others who might stumble on a similar issue. Regular readers can feel free to skip this. Tomorrow’s post will return to more normal topics.
After moving my website to Azure and switching to WordPress, I noticed that my site was running pretty slowly. It kept getting worse to the point where a lot of users were seeing timeouts and errors. Bumping up the website to run on a bigger machine helped temporarily but the dual core CPU was still pegged at about 85%. For a site that gets a few hundred hits per day, this seemed ridiculous.
I stumbled onto a great Azure feature called DaaS (Diagnostics as a Service for Azure Web Sites). That tool helped me identify that a few clients were hitting a link that used to serve up data to the CascadeSkier applications. All of the clients should have been updated by the end of last year, and none of them should ever have been requesting that file 1,000 times per minute like these were. I don’t know if a few of my apps had gone wacky or if this was something more malicious. Either way, I had to solve it from my end.
Hitting a file that doesn’t exist shouldn’t take much CPU effort to respond to, but WordPress was configured at the root of the site, so every one of those requests went through WordPress. It ran a bunch of checks to determine that the URL was indeed invalid and then served up a fancy 404 page. To mitigate this, I checked in a very simple file at that URL that returned a blank page. This took the average CPU usage from 85% down to less than 5%! I was able to drop back down to the smaller single-core machine and save money.
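For anyone who wants to see the idea in code, here is a minimal sketch of the same short-circuit trick. It is not what I deployed (my fix was just a static file sitting in front of WordPress). It is written as Python WSGI middleware purely for illustration, and the path `/legacy/data.xml`, the `short_circuit` wrapper, and the `expensive_app` stand-in are all made-up names.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical path; the real URL is not given in this post.
LEGACY_PATH = "/legacy/data.xml"


def short_circuit(app, legacy_path=LEGACY_PATH):
    """Wrap a WSGI app so requests for legacy_path never reach it."""
    def wrapper(environ, start_response):
        if environ.get("PATH_INFO") == legacy_path:
            # Answer with an empty body right away, skipping the expensive app.
            start_response("200 OK", [("Content-Type", "text/plain"),
                                      ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)
    return wrapper


def expensive_app(environ, start_response):
    """Stand-in for the full application that builds a fancy 404 page."""
    body = b"<html>fancy 404 page</html>"
    start_response("404 Not Found", [("Content-Type", "text/html"),
                                     ("Content-Length", str(len(body)))])
    return [body]


def call(app, path):
    """Invoke a WSGI app for one path and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


app = short_circuit(expensive_app)
print(call(app, LEGACY_PATH))
print(call(app, "/some/other/page"))
```

The point is only that the cheap check happens before any of the heavy lifting, which is what the blank file accomplished for me.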
When I asked the Azure team about blocking specific IPs, they said that isn’t supported for the Azure Web Site product, but they do support blocking based on the number of concurrent requests and the number of requests in a period of time. I’ve set this up to help protect against potential related issues in the future.
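To give a feel for what "number of requests in a period of time" means, here is a rough sliding-window sketch in Python. This is not Azure's implementation and not my actual configuration. The class name, the idea of keying on a client identifier, and the thresholds are all assumptions for illustration.

```python
from collections import defaultdict, deque


class RequestRateLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of recently allowed requests, per client.
        self._hits = defaultdict(deque)

    def allow(self, client_id, now):
        """Return True if this request should be served, False if blocked."""
        hits = self._hits[client_id]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Illustrative thresholds only: 3 requests per 60 seconds.
limiter = RequestRateLimiter(max_requests=3, window_seconds=60.0)
for t in (0.0, 1.0, 2.0, 3.0, 60.0):
    print(t, limiter.allow("client-a", now=t))
```

In this sketch, blocked requests are not counted against the window, so a client gets served again as soon as its oldest allowed request ages out. A stricter policy could count the rejected ones too.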
The net result is that my little website is now consuming an appropriate amount of resources and the average time to serve a page has dropped dramatically. If you’ve been visiting the site over the last week, you can probably see the difference.