American Road Cycling


WEB LOGS STUDY SUMMARY
01/31/06

- SlingShot

The object being observed is always changed
by the instrument of observation.
- Unknown (paraphrased by SlingShot)

The quote at the beginning of this page could not be more relevant to this study of the American Road Cycling usage logs. I read something similar more than 30 years ago, but I really gained a greater appreciation for it during this review of web site traffic. I believe my paraphrase does the thought justice, but that may just be my eyes making it look better to me.

The quote's relevance to the study of the logs comes from the fact that when I first started the ATTENDANCE RECORDS, I had only the barest evidence that anybody was reading the American Road Cycling web site at all. Then after I started keeping records, it appeared that the act of keeping and publishing them had itself become an attraction, so I cannot be sure how much of my observation reflected typical browsing patterns, and how much the traffic was skewed by the publishing of those patterns themselves.

Previously, a few people had mentioned to me that American Road Cycling had become all the rage, but I assumed this was just the perception of a few people who all knew each other, and who were merely reinforcing this belief by talking amongst themselves. I figured there were only about a half dozen somewhat regular readers, or a baker's dozen tops.

Most of the feedback was about how everybody really loved ROAD RASH COMICS. It was a sentiment I totally agreed with and considered Road Rash to be the only truly excellent part of American Road Cycling, making it all worth the effort, a serendipitous boon which made publishing the site easy.

I originally began the site as a practical study to: 1) show Paul Latrine how quickly a fully functional web site can be put together using "light structures" (my own term), and 2) add support to my general thesis that all things Internet are mostly a waste of time when somebody is actually involved in doing something significant in the real world.

Almost as an aside, there was also that little matter of one of my newsletter articles being CENSORED BY THE TALIBAN, then afterward, during my protest, my personally being harassed by parts of the OCBC leadership. This censorship and harassment became the focal point of early American Road Cycling activity. Happily, all harassment has stopped, and the web site has supplanted the Taliban's censorship with freedom of nonsense.

But back to the matter, that little bit of feedback I started receiving almost immediately on publishing American Road Cycling, even such a small amount, was vastly more than I had received in the previous decade of my constructing and publishing three dozen other web sites. Those sites began in 1993, the year the World Wide Web began.

Those other sites were done gratis just like American Road Cycling. The exceptions were Endico (which foots the bill here) and Equipoise, which wasn't meant to be gratis; it just turned out that way.

The earliest of these sites were established on my belief about "what could be" (or rather, what I knew surely would be), while later ones were put together as caveats to help other people understand what the Internet is totally incapable of providing, despite widespread hype to the contrary.

In light of my previous online experiences, the amount of feedback that was coming in about American Road Cycling was astonishing.

The final trigger that made me take a closer look came after I mentioned Terry Bowden on the home page, and he reported back very soon after that someone had told him he was now famous on the Internet. I had not known Terry before mentioning him, and did not know (even by name) the person he reported as saying something to him about it. It was clearly time to take a closer look at how many people were actually viewing the site. I still figured it was about 6 to 13 people tops.

Hit Counter: Before reading on, go back to the home page, pull the page down till the Hit Counter is in the middle of the screen, then hit your browser's Refresh button several times. You will notice the Hit Counter is going backwards, starting from some astronomical number that I put in myself. The hit counter has been functioning this way for most of the life of American Road Cycling, but Grant Salter is the only person who has ever mentioned noticing it.

The enormous number of "hits" aside, after speaking to Terry, and after Lynn Meyer reported that this guy Dan (Palletman) McNeilly was also reading American Road Cycling a lot, I merely hoped to begin a study to confirm that about a half dozen people were showing up somewhat regularly, so I started taking a close look at the log files.

It is not an easy task to decide what the data contained in a raw log file means. Plus, from the little I knew about it, I could tell that all the supposed "Web Traffic Reports" that were commercially available were reporting nonsense to the people who relied on them. These pitiful reports are a good thing for people selling web space, because they allow them to grossly overstate the amount of traffic a site is receiving. People are so accustomed to doing advertising that cannot be effectively tracked, they don't question these web usage reports very much.

So I developed my own system and, in that process, found that it is even harder than I expected to get the truth out. If not for the number of actual human beings I knew were visiting the site, I would still be in the dark about it.

Here's how I figured it out.

First, a question. Do you want to know what kind of stuff your government is trying to get Google to cough up? TAKE A LOOK AT A RAW DATA FILE.

The file linked above contains the American Road Cycling log entries for the single date 01/27/06. That day was chosen because it had the lowest traffic recorded for the month, but enough complexity remains to make the point. Security sensitive numbers have been redacted, so be aware that the uncensored version looks even more complex. Scroll down the log entries while thinking about the character Cypher (Joe Pantoliano) as he watches the raw data from The Matrix and says to Neo, "Why oh why didn't I take the BLUE pill?"

Why indeed.

Fortunately I knew some of the actual people who were attached to some of the IP#'s in the logs. IP stands for Internet Protocol, and the numbers are in this format: xxx.xxx.xxx.xxx

Every person on the web has at least one IP#; it is the only way a connection can be made. Somewhere there is a data table that links each and every IP# to the person who is using it. That's how Bad Boys become Caught Boys.
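That xxx.xxx.xxx.xxx format is regular enough that a machine can pick the numbers out of a log line. A minimal sketch in Python, assuming a generic log layout (the sample lines here are hypothetical, not the actual American Road Cycling log format):

```python
import re

# Hypothetical sample lines in a generic server-log style; the real
# logs differ in their exact layout.
log_lines = [
    "209.210.33.75 2006-01-27 23:58:22 GET /index.htm 200",
    "66.249.65.1 2006-01-27 23:59:10 GET /images/weblogo.jpg 200",
]

# Matches an IP# in the xxx.xxx.xxx.xxx format described above.
IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

ips = [m.group(0) for line in log_lines for m in [IP_RE.search(line)] if m]
print(ips)  # ['209.210.33.75', '66.249.65.1']
```

Once the numbers are extracted this way, they can be matched against the list of "known" people.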

There are cloaking technologies, but everything made can be broken, though not necessarily by me, so I was extremely excited to get the numbers of "known" browsers who were returning on a regular basis. Really, there is no way to explain to somebody who doesn't already know what a wonderful opportunity this was. I've been waiting for more than a decade.

Still, just knowing the numbers isn't enough. There is a great deal of complexity that can be removed by importing the raw data into an Access database, and establishing filters to get rid of noise. Such as the following:

Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)

This means the "person" connecting is actually a Google web search bot, that is to say a web robot that is automatically scanning the Internet gathering information and content from pages. Most of the traffic on a web server comes from such bots, not from humans. So once a bot is identified as "well behaved" (meaning it doesn't do bad stuff like download whole sites over and over, fill the logs with spam, steal images and other things you don't want stolen, etc.), one can filter its entries from the log file, and that makes the log much more readable.
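The bot filter amounts to a simple test against the user-agent string. Here is a minimal sketch, assuming a hand-kept list of well-behaved bots (the KNOWN_BOTS list and the sample entries are illustrative; a real list grows through iterative tweaking):

```python
# Assumed list of well-behaved bots, identified by user-agent substring.
KNOWN_BOTS = ("Googlebot", "Slurp", "msnbot")

def is_bot(user_agent):
    """True if the user-agent string matches a known, well-behaved bot."""
    return any(bot in user_agent for bot in KNOWN_BOTS)

# Hypothetical (ip, user-agent) pairs pulled from a log file.
entries = [
    ("66.249.65.1",
     "Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)"),
    ("209.210.33.75",
     "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"),
]

# Keep only the entries that look human.
humans = [(ip, ua) for ip, ua in entries if not is_bot(ua)]
print(humans)
```

Run against the two sample entries, only the second (non-Googlebot) one survives the filter.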

Establishing good filters is an iterative process, and it takes constant tweaking. Again, it is almost impossible to do without having good references at the outset regarding, "who's a bot, and who's not." It is really a great help to be able to look at a number and say, "Oh, that's just Lynn, it's ok. She's allowed to hit Mary Ellen's Birthday Countdown as many times as she wants. No problem."

Once you know a few peoples' IP#'s, you can start filtering using other criteria such as, "drop the log entries for all the images which get loaded every time somebody hits a page." You can get rid of things like:

/images/weblogo.jpg
/images/new.gif
/images/t_00145b.jpg
/_vti_bin/fpcount.exe/ Page=index.htm|Image=4|Digits=10
/images/patriot.jpg
/images/t_support.jpg
/images/t_ID.jpg
/images/poster.gif

...which just happen to be the images that are loaded (plus the hit counter decrement) every time somebody accesses the American Road Cycling home page. You take this stuff out of the report, and it gets a lot easier to read. But finding which things are important to see, and which things should be left out of the report, takes a lot of time, trial, and error.

Here is my current full list of filters:

ACCESS QUERY LOG FILTERS

Once again, security sensitive numbers have been redacted. You'll also note that several shortcuts have been used to filter whole types of entries. I really shouldn't publish this list, because it can give someone who has mal-intent a shortcut to finding how best to hide from SlingShot's eagle eye.

I also use numerous filters with automated sorting, deleting and renaming macros to get the raw data into an ever more readable format even before applying the filters above. But that's beyond the scope of this discussion.

Here's a picture of the final screen I use for logs review:

LOGS TRACKER

The top list is the filtered log events. The bottom list holds the copy of selected data for tracking; it is where I keep track of observed IP#'s and the browsing behaviors they exhibit. If it is an Unknown Viewer, 3 return visits of human-like behavior get a UV number assigned to them. UV11 is circled in red on the screen shot.
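The Unknown Viewer numbering can be sketched as a small bookkeeping routine: count visits per unrecognized IP#, and hand out the next UV label on the third return. This is an illustration of the rule described above, not the actual tracking screen; the KNOWN table and the sample IP# are made up.

```python
from collections import Counter

# Assumed table of IPs already matched to actual people.
KNOWN = {"209.210.33.75": "Lynn"}

visit_counts = Counter()
uv_labels = {}
next_uv = 11  # UV11 is the example circled in the screen shot

def record_visit(ip):
    """Return a label for this IP once it earns one, else None."""
    global next_uv
    if ip in KNOWN:
        return KNOWN[ip]
    if ip in uv_labels:
        return uv_labels[ip]
    visit_counts[ip] += 1
    if visit_counts[ip] >= 3:          # third human-like return visit
        uv_labels[ip] = f"UV{next_uv}" # assign the next Unknown Viewer number
        next_uv += 1
        return uv_labels[ip]
    return None

for _ in range(3):
    label = record_visit("68.12.4.200")  # hypothetical unknown IP
print(label)  # UV11
```

The first two calls return nothing; the third promotes the stranger to UV11.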

Tucked between the two lists is the note clipboard I use to keep track of what point in the logs I have reviewed, such as ARC, 209.210.33.75, 01/30/06, 23:58:22. That way I know where to start the next review.

Besides knowing which things can be filtered out, it is really important to figure out which things are good to keep track of, such as: Referrers. This is a record of whatever page it was that linked the viewer to your site. Below is the Referrer entry for somebody who arrived at American Road Cycling after having performed a Yahoo search for "Bela Caroli":

http://search.yahoo.com/search?p=bela+caroli&fr=FP-tab-web-t&toggle=1&cop=&ei=UTF-8

You can plug the above into your browser's URL field to see just what led that person to the site. Search engine results are to some degree variable, so things might have changed by the time you take a look at it. The information derived this way is used to make the changes necessary so the next person doesn't show up using bandwidth needlessly while getting a bad feeling about your name, as it is being associated with something that wasted their time.
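The search phrase can also be pulled straight back out of a Referrer entry like the one above, since it travels in the query string. A minimal sketch using Python's standard URL parsing:

```python
from urllib.parse import urlparse, parse_qs

# The Yahoo referrer entry quoted above.
referrer = ("http://search.yahoo.com/search?p=bela+caroli"
            "&fr=FP-tab-web-t&toggle=1&cop=&ei=UTF-8")

# parse_qs splits the query string into fields; Yahoo carries the
# search phrase in the "p" parameter.
query = parse_qs(urlparse(referrer).query)
print(query["p"][0])  # bela caroli
```

So without visiting the search engine at all, the log already tells you the visitor was hunting for "bela caroli".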

Other things you can learn by keeping certain items in the report include the fact that Dan (Palletman) McNeilly uses a Firefox browser. That sort of stuff is a comfort, because if something I do is incompatible with that browser, I'm sure Palletman is going to yell, "foul," and I'll be put on notice.

On the other hand, some people use this sort of information to merely attract as much traffic as is absolutely possible to a site, rational or not, just so they can report, "Wow, thousands and thousands and thousands of hits. That's worth a lot. Pay me more!"

A couple times a day, I log onto my server, grab a copy of the log file, run it through the filtering process, check to see who showed up, take a look at how effectively American Road Cycling seems to be working for them, and check them onto the ATTENDANCE RECORDS.

Reviewing everybody's browsing this month has given me a solid ability to look at logs and see right away who's a human and who's a bot. The biggest improvement is in my ability to identify a person who has arrived through an AOL connection, which assigns several IP#'s at the same time, so their number keeps changing as they click from page to page. Being able to do all this really helps me when I review the Endico logs, and those logs are important, because Mary's painting sales buy me computers, Ottrotts, and trips to Florida.
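One rough way to stitch such a visit back together is to group hits by the leading octets of the IP#, on the assumption that the rotating addresses all come from the same block. This is a crude heuristic sketch, not a reliable rule, and the addresses below are hypothetical:

```python
from collections import defaultdict

# Hypothetical hits from one AOL-style visitor whose IP# keeps changing.
hits = [
    ("205.188.116.10", "/index.htm"),
    ("205.188.117.72", "/chatter.htm"),
    ("205.188.116.5",  "/comics.htm"),
]

sessions = defaultdict(list)
for ip, page in hits:
    # Group by the first two octets, e.g. '205.188' -- a coarse
    # assumption that the whole block belongs to one proxy pool.
    prefix = ".".join(ip.split(".")[:2])
    sessions[prefix].append(page)

print(dict(sessions))  # one session spanning three different IP#'s
```

Combined with timing and page-sequence clues, this is usually enough to recognize that three "different" addresses were really one person clicking through the site.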

In the final analysis, it turns out my first impressions were almost totally correct. There are about a half dozen regular readers of American Road Cycling, about a dozen counting those who show up once in a while, and at the very most two dozen counting rare or one-time arrivals. Once in a while a newbie happens onto the site, but just like we already knew, the status of "American road cycling" is pretty sad. But at least I'm sure about those numbers now, not just guessing.

An unscrupulous Internet provider could easily reinterpret the raw data, and truthfully state, there were 23,488 unique viewings of the American Road Cycling web site this month. The fact of the matter would be that this "truth" represents a readership of less than a dozen regulars, a half dozen more of irregulars, and maybe a dozen tourists happening through from time to time. The rest is bots and nonsense. I can report this, because I'm not making any money on your belief that a lot is happening with American Road Cycling.

But this is an astounding number of people, considering what I've seen with 36 other sites that I've done over the last 13 years. One of them, Equipoise, is even one of the earliest equine sites on the web, and remains one of the best run and most unique horse sites, while holding the title as the world's first "catch engine"—which is the opposite of a search engine, and maybe more useful.

A somewhat lesser degree of lying occurs when web hosting services provide generic reports that pretend to distinguish people from bots, while overstating the degree of reliability for "unique viewings."

During the course of this month's study, I wound up looking at numerous web log reports throughout the Internet and was astonished at the general level of misinformation they could be providing to uninformed site owners. Just knowing who the bots are is only the first step in understanding logs output.

Disregard for the moment that bots often change IP#'s, along with the other information they carry onto a site, which may be used to filter the output of generic automated reports. Standardized web reports are still unlikely ever to be able to deal with the vast array of digital processes that routinely access a web server without human intervention. The hardest task may be in effectively deciding which elements on a site are best left off the final report, and then using the information gathered during the reporting process to cycle back into design and development in order to continually improve the site as a whole.

The only way to make a site more understandable in terms of what the actual human traffic flow is, and what it really means, is to have a human constantly review and fine tune the reporting process, then match it back into changes made on the site, and iterate that process again, and again.

If anybody reads to this point and wants to know more about it, let me know, and I'll continue.

WELL, SOMEBODY DID COMMENT...


this page last updated:
10/27/2016 08:38:31 PM

A Def Unc T Publication