The Easy Way To Gather Web Statistics
Part 1 of the series Resurrecting a Dead Site
Motivation
When I started Control-Escape as a resource for the Linux hobbyist, it was partially from self-interest and partially from a desire to help others like myself. Eventually, though, I got comfortable using Linux on a daily basis, and many companies grew up to provide support around Linux systems, and I moved on to other things.
I had been hauling Control-Escape around from server to server, experimenting with various web technologies and basically treating it as a personal toy, without updating it or improving it, for a few years. I started to ask myself, is this site even useful to anyone anymore? And if it is, shouldn't I put some work into it to make it usable and timely again? The answer to my second question was obviously "yes", but how to answer the first?
Statistics and Analytics
What I needed was what web junkies call a "stats package", or in modern enterprise parlance, web analytics: some software that would tell me whether anyone was visiting my site, and if they were, who they were and what they wanted. A quick search on Google turned up some hideously expensive "enterprise solutions" (I cannot believe anyone pays that much money for web statistics; it's absurd), and a few open-source packages. You can guess which way I went.
I evaluated several packages, and if this is a problem you need to solve, I recommend you perform your own evaluations, since everyone's needs are different. Some of the programs I looked at include Webalizer, Analog, and AWStats. Ultimately, all of these programs just seemed like too much work for me. I would have had to deal with rotating and archiving log files, set up a cron job to process the logs at the correct time, and make the reports available while keeping my web site secure, all just to find out whether I even had a web site or just a collection of files that nobody looked at.
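To give a feel for what a log-based package does at its core, here is a minimal Python sketch that counts page views from web server access log lines. It is not how Webalizer, Analog, or AWStats are implemented. It assumes the server writes Common Log Format, and the function name, the sample lines, and the rule for what counts as a "page" (a successful GET for a directory URL or a .html file) are my own assumptions for illustration.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common Log Format line, e.g.
# 10.0.0.1 - - [10/Oct/2005:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_page_views(lines):
    """Count successful GET requests for HTML pages, keyed by path."""
    views = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that do not look like log entries
        if match["method"] != "GET" or match["status"] != "200":
            continue  # only count successful page fetches
        path = match["path"].split("?", 1)[0]  # drop any query string
        # Assumption: directory URLs and .html files are "pages";
        # images, stylesheets, and other files are ignored.
        if path.endswith("/") or path.endswith(".html"):
            views[path] += 1
    return views


# Hypothetical sample lines, made up for this example.
sample = [
    '10.0.0.1 - - [10/Oct/2005:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2005:13:56:01 -0700] "GET /logo.png HTTP/1.0" 200 512',
    '10.0.0.3 - - [10/Oct/2005:13:57:12 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.3 - - [10/Oct/2005:13:58:40 -0700] "GET /missing.html HTTP/1.0" 404 209',
]

for path, hits in count_page_views(sample).most_common():
    print(path, hits)
```

The counting itself is the easy part. The rotation, scheduling, and report hosting around it are what made the log-based route feel like too much work.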
Instead, I decided to go in another direction. Google had recently purchased an analytics company, Urchin, and started providing an analytics service for its AdWords customers. When I heard that Google had opened the service to all comers for free, branding it Google Analytics, I had to check it out.
Google Analytics works by inserting a small JavaScript tag into your site. When the client downloads the JavaScript and executes it, the script reports back to Google's servers all kinds of interesting information about the client. Of course, you have to get the JavaScript tag into every page, and if you maintain all your HTML files manually that might be tricky. Fortunately, I was using HTML::Mason to compose my pages at request time using templates (one of the many server-side technologies I have tried), so it was easy to make one change to the template that affected all pages.
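If your pages are static HTML files rather than templates, the same idea can be sketched as a small Python helper that inserts a tracking tag just before the closing body tag. This is not the HTML::Mason change described above, and it is not Google's actual snippet: the script URL, the constant names, and the function name are hypothetical placeholders, and the real tag is whatever your analytics account gives you.

```python
import re

# Hypothetical placeholder; substitute the tag supplied by your analytics service.
TRACKING_SNIPPET = '<script src="https://analytics.example.com/tracker.js"></script>'

# Matches a closing body tag regardless of letter case.
CLOSING_BODY = re.compile(r"</body\s*>", re.IGNORECASE)


def add_tracking(html, snippet=TRACKING_SNIPPET):
    """Return html with the snippet inserted before the last closing body tag."""
    if snippet in html:
        return html  # already tagged; do not insert twice
    matches = list(CLOSING_BODY.finditer(html))
    if not matches:
        # No closing body tag found; fall back to appending at the end.
        return html + "\n" + snippet + "\n"
    start = matches[-1].start()
    return html[:start] + snippet + "\n" + html[start:]


page = "<html><body><h1>Hi</h1></body></html>"
print(add_tracking(page))
```

Running the helper a second time on its own output leaves the page unchanged, so it is safe to apply to a whole directory of files more than once.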
Google Analytics is a pure JavaScript solution. This means that you can collect more and richer information on the client, but clients with JavaScript disabled will not be recorded at all, nor will impressions of images and other non-HTML files. The down side is that if you expect to have a lot of legitimate visitors who have JavaScript disabled, or if you need to track things other than page views, your statistics will be skewed. The up side is that it's really easy. Google hosts the reporting interface on its servers, and you don't have to deal with log files or cron jobs at all.
I decided to take the easy way out. I installed the Google Analytics script, and within hours I was seeing results. Not only did I discover that I was getting 300-400 page views each day, but I also found that 85% of my traffic comes from Google, because I have the number one search result for a couple of key phrases. I had my answer: the site did indeed have some value. The next step was to do something about it.
But that's another article.

