I seek to slightly redefine the purpose of this blog, from showing the meaning and significance of web traffic listings to also discussing my journey towards defining what I think is a valid way of measuring an advertiser's value, and the way I have implemented it.

Sunday, October 15, 2006

Hunting for the Best Web Analysis Program

In this post, I will tell you about the search by one of our clients for the best-fitting web analysis program for its website, which, incidentally, earns its primary income from adverts placed on the site because of its high traffic.


Background Information

Our client is a big newspaper company based in West Africa. They’ve had a website running for some time (its primary revenue comes from advertisements), and recently decided that they wanted a complete overhaul of the website and a change of web hosting and support service provider (because of unsatisfactory support services). After a rigorous selection process, we (the company I work with) were finally chosen to take over from the existing hosting and support service provider, and also to produce a fresh new look for their website.

The Launch

We didn’t have any problem producing a fresh, better look and user interface for their website. We also didn’t have any issue providing a good hosting solution for their needs (we use the services of hostmysite for our clients). When the site was launched, it was all smiles in all quarters. These are the marked differences in the new site:

Before

Server - Apache server

Content Management System - Zope content management system

Web statistics analyzer - Webalizer

Now

Server - IIS (Dedicated Server)

Content Management System – Custom created content management solution

Default Web statistics analyzer - SmarterStats

The Problems

  1. Issues began when the figures generated by SmarterStats were way above what our client was accustomed to seeing with Webalizer, with some figures more than 50 times what Webalizer reported on the former site. (The design of the new site caused a series of repeat requests to the server, which inflated the log files: 67GB of logs were generated in 3 months.)
  2. Another issue was that our client’s closest competitor uses our client’s former web host, and thus its web analysis tool, Webalizer, to report activity on its website. The competitor’s reported figures are within the range of our client’s former traffic reports as produced by Webalizer.
  3. And finally, SmarterStats produced some seemingly inexplicable figures, such as a record of about 147,000 hits for an IP address that was recorded as having 0 visits (this still baffles me).

The Search

Because of the perceived and actual problems above, our client was very reluctant to take the traffic report generated by SmarterStats into the market, and so the main source of income from their website was endangered. Then began the search for the best solution. Option one: make the figures more like what they had before and remove problem 3 (above) from the reports generated. Option two: change SmarterStats.

Option one was the logical first choice. We promptly began reprogramming the website and contacted our web host for an explanation of the SmarterStats errors (or seeming errors), but no satisfactory answers were forthcoming. We tried to reach the makers of SmarterStats, but met a brick wall (no response). Thus we were left with option two: change SmarterStats.

The question now was which program is better, or even the best, on the market. Urchin (now owned by Google) was recommended and set up.

For those of you who have worked with Urchin, you know that you can run it with or without its complementary JavaScript-based UTM (Urchin Traffic Monitor). The advantage of working with the UTM is that you get a more detailed report on visitors’ activities than you can get without it. The 'disadvantage' is that the UTM does not track as many pages as the web server log records, and does not count .js files as page views.

Thus when Urchin was first installed on our client’s website (without the UTM), the figures generated were very similar to those produced by SmarterStats, and the suspicion about the validity of the figures persisted. However, after we later configured the UTM into the site, many things suddenly changed. Firstly, all the records that Urchin had generated before the installation of the UTM were changed, some slightly (hits) and others very profoundly (page views = 0, sessions = 0). Secondly, some new figures in the report became ridiculously low (from about 200,000 page views to 45,000 page views a week), but the hits remained the same, producing gigantic hit-to-page-view and hit-to-session ratios. (I think Urchin bases its page view and session figures on the input from the UTM, while generating its hits from the web server log files.)

And still our client cannot publish the figures generated by their web traffic analyzer.

After a series of hotly debated issues during long management meetings, it was finally decided that Webalizer should be installed on the web server. But it didn’t work: Webalizer simply could not manage the volume of the log files, and eventually it too was discarded.

Present Solution

The present solution, unanimously agreed upon, is to run Urchin (without the UTM) and SmarterStats simultaneously. We can then objectively compare the reports generated by the two, be better informed about the true picture of the traffic on our client’s website, and also get a marketable, convincing-at-first-glance report that can be presented to potential customers.

Comments

It’s always better to run two or more web traffic analyzers simultaneously. If your income depends heavily on the traffic figures your website produces (as in the case of our client), I recommend running at least one web log file analyzer and two client-based (JavaScript) web traffic analyzers (try Google Analytics). You cannot possibly have perfect software, but you can get a near-perfect result by pooling the strengths of different strong but imperfect programs. So I would say: make the best use of available resources!
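To make the comparison between two analyzers concrete, here is a minimal sketch in Python. It is only an illustration: the function names and the 10% tolerance are my own assumptions, and the hits and visits figures are made up (only the two page view figures echo the numbers mentioned above).

```python
def relative_difference(a, b):
    """Return the difference between two figures as a fraction of the larger one."""
    larger = max(a, b)
    return abs(a - b) / larger if larger else 0.0

def compare_reports(report_a, report_b, tolerance=0.10):
    """List the metrics on which two reports disagree by more than tolerance."""
    disagreements = {}
    for metric in report_a.keys() & report_b.keys():
        diff = relative_difference(report_a[metric], report_b[metric])
        if diff > tolerance:
            disagreements[metric] = round(diff, 3)
    return disagreements

# Hypothetical weekly figures; hits and visits are invented for illustration.
log_based = {"hits": 1_200_000, "page_views": 200_000, "visits": 30_000}
script_based = {"hits": 1_200_000, "page_views": 45_000, "visits": 28_000}
print(compare_reports(log_based, script_based))  # {'page_views': 0.775}
```

A metric that two independent tools agree on is easier to defend in front of advertisers; one that they disagree on by a wide margin is a signal to investigate before publishing.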



Please, forward your comments to samizybiz@yahoo.co.uk

Friday, October 06, 2006

Indicators reported by most web log analyzers

• Number of visits and number of unique visitors
• Visits duration and last visits
• Authenticated users, and last authenticated visits
• Days of week and rush hours
• Domains/countries of host's visitors
• Hosts list
• Most viewed, entry and exit pages
• File types
• OS used
• Browsers used
• Robots
• Search engines, keyphrases and keywords used to find the analyzed web site
• HTTP errors


Let us evaluate some of the indicators reported by most web log analyzers


Hits

Hit - A hit is simply any request to the web server for any type of file. This can be an HTML page, an image (jpeg, gif, png, etc.), a sound clip, a cgi script, and many other file types. An HTML page can account for several hits: the page itself, each image on the page, and any embedded sound or video clips. Therefore, the number of hits a website receives is not a valid popularity gauge, but rather is an indication of server use and loading.
http://www.google.com/support/analytics/bin/answer.py?answer=27303

Hit - A request for any object or file that is on a web site. This could be an html page, a file or a graphic on a page. A request for a page can generate a lot of hits depending on how many sub-elements of files the page consists of. This is an indicator of web site traffic but not an indicator of how pages were looked at. Also see Page and User.
http://www.surfstats.com/glossary.asp

Hits: This term refers to the number of files that are downloaded from a Web server. Keeping track of hits is a way of measuring traffic to a Web site. The number of hits a site receives is usually much greater than the number of actual visitors. That's because a Web page can contain more than one file.
http://www.networksolutions.com/glossary/glossary-h.jsp

Hits - A hit represents a request to your web site for a file such as an image, a web page, or a CGI script. One web page may contain several related resources, and as a result, a visitor viewing one web page may trigger several hits. Hits generated as a result of an error (either a 400 or 500 level error) are not counted as actual hits to your site, and are kept separate from successful hits.
http://www.smartertools.com/Help/SmarterStats/v3/Topics/Misc/Glossary.aspx

Hit - A request for a file from the web server. Available only in log analysis. The number of hits received by a website is frequently cited to assert its popularity, but this number is extremely misleading and dramatically over-estimates popularity. A single web-page typically consists of multiple (often dozens) of discrete files, each of which is counted as a hit as the page is downloaded, so the number of hits is really an arbitrary number more reflective of the complexity of individual pages on the website than the website's actual popularity. The total number of visitors or page views provides a more realistic and accurate assessment of popularity.
http://en.wikipedia.org/wiki/Web_analytics
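To see why a single page produces several hits, consider this small Python sketch. The request list is hypothetical, and separating failed hits from successful ones follows the SmarterStats definition quoted above; other analyzers may count differently.

```python
# Hypothetical requests produced by one visitor opening a single page:
# (requested path, HTTP status code).
requests = [
    ("/index.html", 200),
    ("/css/site.css", 200),
    ("/js/menu.js", 200),
    ("/img/logo.gif", 200),
    ("/img/banner.jpg", 200),
    ("/img/missing.png", 404),
]

def count_hits(requests):
    """Count successful hits and error hits (400 or 500 level) separately."""
    successful = sum(1 for _, status in requests if status < 400)
    failed = len(requests) - successful
    return successful, failed

successful_hits, failed_hits = count_hits(requests)
print(successful_hits, failed_hits)  # 5 1
```

One person looked at one page, yet the server recorded five successful hits and one failed hit.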

Page Views

Page View - A request for a file whose type is defined as a page in log analysis. An occurrence of the script being run in page tagging. In log analysis, a single page view may generate multiple hits as all the resources required to view the page (images, .js and .css files) are also requested from the web server.
http://en.wikipedia.org/wiki/Web_analytics

Pageview - A page is defined as any file or content delivered by a web server that would generally be considered a web document. This includes HTML pages (.html, .htm, .shtml), script-generated pages (.cgi, .asp, .cfm, etc.), and plain-text pages. It also includes sound files (.wav, .aiff, etc.), video files (.mov, etc.), and other non-document files. Only image files (.jpeg, .gif, .png), javascript (.js) and style sheets (.css) are excluded from this definition. Each time a file defined as a page is served, a pageview is registered by Google Analytics.
http://www.google.com/support/analytics/bin/answer.py?answer=27303

Page (HTML) View or Request - The request for a file defined as a page file. The page is basically what you see after the transfer and can consist of many other files. Page requests do not include hits to images, component pages of a frame or other non-html files. The number of page Hits = the number of page views.
http://www.surfstats.com/glossary.asp

Page Views: A page view is also called a Page Impression. It refers to a hit to HTML pages only (access to non-HTML documents, such as images, are not counted).
https://secure.oregonstate.edu/webstatistics/glossary.php

Page Views - A page view is a successful request for a file on your web site that is considered to be a page. These usually mean files with extensions such as .txt, .asp, .aspx, .php, etc. Views generated as a result of an error (either a 400 or 500 level error) are not counted as actual views for your site, and are kept separate from successful views.
http://www.smartertools.com/Help/SmarterStats/v3/Topics/Misc/Glossary.aspx
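The definitions above differ mainly in which file types count as a page. As a rough sketch in Python, here is a filter based on the exclusion rule in the Google Analytics definition. The extension set is an assumption on my part (I added .jpg alongside .jpeg), and each analyzer ships its own list.

```python
import os

# Extensions excluded from page views under the Google Analytics definition
# quoted above. Other analyzers use different lists, so this set is an assumption.
NON_PAGE_EXTENSIONS = {".jpeg", ".jpg", ".gif", ".png", ".js", ".css"}

def is_page_view(path):
    """Return True if a requested path counts as a page under the rule above."""
    extension = os.path.splitext(path.split("?")[0])[1].lower()
    return extension not in NON_PAGE_EXTENSIONS

paths = ["/index.html", "/css/site.css", "/js/menu.js",
         "/img/logo.gif", "/news/story.asp?id=7", "/"]
page_views = [p for p in paths if is_page_view(p)]
print(len(paths), len(page_views))  # 6 3
```

Six hits, three page views. Changing the extension list changes the page view figure, which is one reason two analyzers reading the same log can disagree.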

Visits

Visit / Session - A series of requests from the same uniquely identified client with a set timeout. A visit is expected to contain multiple hits (in log analysis) and page views.
http://en.wikipedia.org/wiki/Web_analytics

Session - A Session is a defined quantity of visitor interaction with a website.
By default in Analytics, a session is defined as the period of time during which visitors are interacting with your site and there has been inactivity for less than 30 minutes. After 30 minutes of inactivity, any further page views will be treated as a new session. Users that leave your site and return within 30 minutes will be counted as part of the original session.
The 30 minute default timeout can be changed with an addition to the tracking code.
http://www.google.com/support/analytics/bin/answer.py?answer=27303

User (client) Session - A series of consecutive requests from a user to an Internet site. A user session is terminated when a user does not make another request for more than 30 minutes. If for example a visitor with IP Address 1.2.3.4, visits the site, logs out and another visitor logs in an hour later with the same IP Address, there would be two user sessions but one unique visitor.
http://www.surfstats.com/glossary.asp

Visitor Sessions: A Visitor Session is a session of activity (all hits) for one user of a Web site. A unique user is determined by the IP address or cookie. By default, a visitor session is terminated when a user is inactive for more than 30 minutes. Synonym: Visit.
https://secure.oregonstate.edu/webstatistics/glossary.php

Visit Length - The number of seconds that a visit lasts. On reports dealing with visit length, the average visit length is calculated and shown for all visits. Visits length assumes that the visitor stays several seconds after their last hit.
http://www.smartertools.com/Help/SmarterStats/v3/Topics/Misc/Glossary.aspx
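The 30-minute timeout that most of these definitions share can be sketched in a few lines of Python. This is a simplified illustration, not how any particular analyzer is implemented; the visitor is identified by IP address here, and the timestamps are invented.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # default quoted in the definitions above

def count_visits(requests):
    """Count visits from (visitor_id, timestamp) pairs.

    A new visit starts whenever a visitor has been inactive for more
    than SESSION_TIMEOUT.
    """
    last_seen = {}
    visits = 0
    for visitor, when in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(visitor)
        if previous is None or when - previous > SESSION_TIMEOUT:
            visits += 1
        last_seen[visitor] = when
    return visits

start = datetime(2006, 10, 15, 9, 0)
requests = [
    ("1.2.3.4", start),
    ("1.2.3.4", start + timedelta(minutes=10)),
    ("1.2.3.4", start + timedelta(minutes=75)),  # more than 30 minutes later
    ("5.6.7.8", start + timedelta(minutes=5)),
]
print(count_visits(requests))  # 3
```

The first visitor accounts for two visits because of the long gap, and the second visitor for one.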


Unique Visitors

Visitor / Unique Visitor - The uniquely identified client generating requests on the web server (log analysis) or viewing pages (page tagging). A visitor can make multiple visits.
http://en.wikipedia.org/wiki/Web_analytics

Unique Visitors - Unique Visitors represents the number of unduplicated (counted only once) visitors to your website over the course of a specified time period. A Unique Visitor is determined using cookies.
http://www.google.com/support/analytics/bin/answer.py?answer=27303

Unique visitors - Unique visitors are counted using the visitor's IP address or cookie information to identify the visitor. For instance if one visitor visits a site on the 3rd and 4th of August it is seen as two visits from one unique visitor.
http://www.surfstats.com/glossary.asp

Users: This identifies the IP address and/or domain name and relative activity level on the site.
https://secure.oregonstate.edu/webstatistics/glossary.php

Unique Visitors - A unique visitor represents any number of visits from the same computer. If a person returns to the site again, a visit is counted, but a unique visit is not.
http://www.smartertools.com/Help/SmarterStats/v3/Topics/Misc/Glossary.aspx
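Whether unique visitors are identified by IP address or by cookie makes a real difference to the figure. A small Python sketch with made-up data shows the effect:

```python
# Hypothetical visits: (IP address, cookie ID). Two people share one office
# IP address, and one of them visits twice.
visits = [
    ("1.2.3.4", "cookie-a"),
    ("1.2.3.4", "cookie-b"),
    ("1.2.3.4", "cookie-a"),
    ("5.6.7.8", "cookie-c"),
]

unique_by_ip = len({ip for ip, _ in visits})
unique_by_cookie = len({cookie for _, cookie in visits})
print(len(visits), unique_by_ip, unique_by_cookie)  # 4 2 3
```

Four visits become two unique visitors when counted by IP address but three when counted by cookie.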

Authenticated Users

Username - A Username is a name used to gain access to a computer system. Usernames, and usually passwords, are required in multi-user systems. In most such systems, users can choose their own usernames and passwords.
http://www.google.com/support/analytics/bin/answer.py?answer=27303

Authentication - The verification of a user by matching a username and password in a multi-user or network environment. A user's name and password are compared against an authorized list, and, if the system detects a match, access is granted to the extent specified in the permission list for that user.
http://www.surfstats.com/glossary.asp

Authenticated Users: This identifies the true name and relative activity level of the users logging onto a server that requires user name and password. You may find more authenticated users than users (in the following table) as several persons may be using the same IP address. Since many ISPs (such as AOL) dynamically assign IP addresses, and since multiple users may come from a single IP address, authentication is the only way to truly identify top visitors.
https://secure.oregonstate.edu/webstatistics/glossary.php

Authenticated Visitor - An authenticated visitor is a web site user who successfully logs into a website using authentication. Scripted authentication (like ASP.NET Forms Authentication or database mechanisms) do not count as authentication. Typically, authentication must be administrated on the web server.
http://www.smartertools.com/Help/SmarterStats/v3/Topics/Misc/Glossary.aspx
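The point that there can be more authenticated users than IP addresses is easy to illustrate in Python. The data below is hypothetical, and using "-" for an unauthenticated request follows the convention of common log formats.

```python
from collections import Counter

# Hypothetical (IP address, authenticated username) pairs; "-" marks a
# request made without authentication.
requests = [
    ("1.2.3.4", "ada"),
    ("1.2.3.4", "ben"),
    ("1.2.3.4", "ada"),
    ("5.6.7.8", "-"),
]

activity_by_user = Counter(user for _, user in requests if user != "-")
distinct_ips = {ip for ip, user in requests if user != "-"}
print(activity_by_user.most_common(), len(distinct_ips))
# [('ada', 2), ('ben', 1)] 1
```

Two authenticated users share a single IP address, so an IP-based count would see only one visitor.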


Domains/Countries

Domain - A domain is a specific virtual area within the Internet, defined by the "top level" of the address or URL (Uniform Resource Locator). The top level is the end of the address; example: "whitehouse.gov". In this example, the top-level part of the domain is ".gov", indicating a US government entity. The "whitehouse" part is the second-level domain, indicating where within the ".gov" domain the information in question is to be found. Other common top-level domains include ".com", ".net", ".uk", etc.
http://www.google.com/support/analytics/bin/answer.py?answer=27303

Domain Name Suffix - The last digits of a domain name can be used to identify the country or type of organization. Possible suffixes for the organization type includes: .com = Commercial .edu = Educational .int = International .gov = Government .mil = Military .net = Network .org = Organization .xx = where the xx is a two digit country code, e.g. .uk for United Kingdom
http://www.surfstats.com/glossary.asp

Country-Specific Domain Names (ccTLDs): Country code domain extensions represent a specific country. ccTLDs allow you to create an in-language Web site and display different site content to visitors from various cultures around the world. You can also register ccTLDs to prevent unauthorized use of trademarks, brands and licensed names around the world.
Domain Name Extensions: Network Solutions offers a variety of domain name extensions. Protecting brand identity has become very important, so often customers will register multiple extensions and variations of their domain names. Here are the most frequently registered extensions and their common usage, although it must be noted that any extension can be used for any purpose:

Extension   Common Usage
.com        Commercial, but is commonly used for everything
.net        Internet administrative site, but is commonly used
.org        Organization
.info       Information
.biz        Business
.us         United States
.name       Personal Web sites
.ws         Western Samoa, but is often used for Web Sites
.bz         Belize
.vg         British Virgin Islands
.cc         Cocos (Keeling) Islands
.ms         Montserrat
.gs         South Georgia & the South Sandwich Islands
.tc         Turks and Caicos Islands
.tv         Tuvalu, but often used for television
.uk         United Kingdom
.de         Germany
.eu         European Union
.be         Belgium
.cn         China
.tw         Taiwan
.at         Austria
.nz         New Zealand
.mx         Mexico

http://www.networksolutions.com/glossary/glossary-d.jsp

Most Active Countries: This identifies the top locations of the visitors to the site by country. The country of the user is determined by the suffix of its domain name. Use this information carefully because this information is based on where the domain name of the visitor is registered, and may not always be an accurate identifier of the actual geographic location of this visitor. For example, while a vast majority of .com domain names are from the United States, there is a small minority of domain names that exist outside of the United States.
https://secure.oregonstate.edu/webstatistics/glossary.php
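The suffix-based country lookup described above can be sketched in Python as follows. The mapping is a small sample taken from the lists above, and the hostnames are invented; as the last definition warns, the result says where the domain is registered, not necessarily where the visitor is.

```python
# A small sample of suffixes taken from the lists above; a real analyzer
# would ship a complete table.
SUFFIX_TO_COUNTRY = {
    "uk": "United Kingdom",
    "de": "Germany",
    "be": "Belgium",
    "nz": "New Zealand",
    "mx": "Mexico",
}

def country_for_host(hostname):
    """Guess a visitor's country from the last label of a resolved hostname."""
    suffix = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    return SUFFIX_TO_COUNTRY.get(suffix, "Unknown")

print(country_for_host("dialup-12.example.co.uk"))  # United Kingdom
print(country_for_host("proxy.example.com"))        # Unknown
```

Generic suffixes such as .com carry no country information at all, so they fall through to "Unknown" in this sketch.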


Moreover, the distinction between unique visitors and new visitors is worthy of serious consideration:

Visitor / Unique Visitor - The uniquely identified client generating requests on the web server (log analysis) or viewing pages (page tagging). A visitor can make multiple visits.
Repeat Visitor - A visitor that has made at least one previous visit.
New Visitor - A visitor that has not made any previous visits.
http://en.wikipedia.org/wiki/Web_analytics

New Visitors - A new visitor represents a visit by a computer that has not yet been to the web site in the time period of the report.
Return Visitors - A return visit is counted when a computer that has already been to the site before returns for another visit.
Unique Visitors - A unique visitor represents any number of visits from the same computer. If a person returns to the site again, a visit is counted, but a unique visit is not.
http://www.smartertools.com/Help/SmarterStats/v3/Topics/Misc/Glossary.aspx

First Time Sessions - The number of times unique visitors came to your website during a specified time period, not having visited before that period. These visitors are identified by cookies.
Returning Sessions - Returning Sessions represents the number of times unique visitors returned to your website during a specified time period.
Visitor Session - A Visitor Session is a defined period of interaction between a Visitor (both unique and untrackable visitor types) and a website. The definition of a Session varies depending on the type of visitor tracking employed.
http://www.google.com/support/analytics/bin/answer.py?answer=27303#session



I therefore conclude that, though different people define these terms slightly differently, tracking unique visitors will generally give you an idea of the real number of people who view your site, and visits/sessions will give you an idea of the number of times your website is viewed over a specific period of time.
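As an illustration of how visits, unique visitors, new visitors and returning visitors relate to one another, here is a short Python sketch. The cookie IDs are hypothetical and the function is my own simplification, not any vendor's method.

```python
def classify_visitors(previously_seen, period_visits):
    """Split the visitors in a reporting period into new and returning.

    previously_seen: set of visitor IDs recorded before the period.
    period_visits:   list of visitor IDs, one entry per visit in the period.
    """
    unique = set(period_visits)
    returning = unique & previously_seen
    new = unique - previously_seen
    return new, returning

previously_seen = {"cookie-a", "cookie-b"}
period_visits = ["cookie-a", "cookie-c", "cookie-a", "cookie-d"]
new, returning = classify_visitors(previously_seen, period_visits)
print(len(period_visits), len(new), len(returning))  # 4 2 1
```

Four visits in the period come from three unique visitors, two of them new and one returning.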



Two interesting web log analyzers
Webalizer
From Wikipedia, the free encyclopedia
The Webalizer is a GPL application that generates web pages of analysis, from access and usage logs, i.e., a web log analysis software. It is one of the most commonly used web server administration tools today (2005). It was initiated by Bradford L. Barrett in 1997. Statistics commonly reported by Webalizer include: hits; visits; referers; the visitors' countries; and the amount of data downloaded. These statistics can be viewed graphically and presented by different time frames, such as per day, hour, or month.

Urchin (software)
From Wikipedia, the free encyclopedia
Urchin is a web statistics analysis program developed by Urchin Software Corporation. Urchin is used to analyze web server log file content and display the traffic information on that website based upon the log data. Urchin has become one of the more popular solutions for website traffic analysis. While Urchin Software has recently been purchased by Google, the Urchin 5 and prior analysis programs are still widely used and available today.

Your comments are very much welcome.

Background Terms

These are terms you need to learn about before you can fully understand the statistics produced by a website.

These terms are:
Website, Web Server, Web Host, Domain name, Web Server log file, Web Page, Web Browser

WEB BROWSER
This is the software used to request and view web pages on the internet (e.g. Internet Explorer)

WEBSITE
A website is a collection of web pages, typically common to a particular domain name on the Internet, and that is stored on a web server.

WEB PAGE
A web page is a document, typically written in HTML or XHTML (and often generated by PHP, ASP.NET or CFML, to mention a few), which is displayed in a user's web browser. It is made up of text and/or images, and it can also contain special programs such as SWF files, JavaScript, and VBScript (which enable a page to be more dynamic), and more.

DOMAIN NAME
A name that is entered into a web browser and then looked up on the internet, to find out which web page to display in the web browser (e.g. webtraffix.blobspot.com)

WEB SERVER
A computer that is responsible for accepting requests from web browsers, and serving them Web pages.

WEB SERVER LOG FILE
This is a file (or several files) automatically created and maintained by a web server, recording the pages requested by web browsers. Information about each request, including client IP address, request date/time, page requested, HTTP status code, bytes served, user agent, and referer, is typically added.
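To show what such a log entry looks like and how a program might read it, here is a minimal Python sketch. It assumes the Combined Log Format commonly written by Apache (IIS writes a different, W3C extended format), and the sample line is invented.

```python
import re

# Pattern for the Combined Log Format. This is an illustration only;
# other servers and configurations write different layouts.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

sample = ('1.2.3.4 - - [15/Oct/2006:10:00:00 +0000] '
          '"GET /news/index.html HTTP/1.1" 200 5120 '
          '"http://www.example.com/" "Mozilla/4.0"')
entry = parse_line(sample)
print(entry["ip"], entry["path"], entry["status"])  # 1.2.3.4 /news/index.html 200
```

Every figure a log analyzer reports (hits, page views, visits) is ultimately derived from fields like these.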



WEB LOG ANALYSIS SOFTWARE
Because the web server log file is not easy to understand, several software programs have been developed to read and interpret the information it contains and present it in a way that most people without technical skills can view and understand (e.g. SmarterStats, Webalizer, Urchin).


Please note that, unless otherwise indicated, the definitions here are primarily adapted from www.wikipedia.com. Also note that this list and these definitions are far from exhaustive.

If you have any personal comments, please forward them to samizybiz@yahoo.co.uk