Does your web hosting company block access to access.log, the text file containing your site’s raw HTTP access logs? If so, you’re like me, and it sucks. There’s a plethora of gorgeous and extremely insightful website traffic analyzers, but all of them require access to raw HTTP access logs. Today I propose a semi-efficient way to generate such logs, using PHP to collect page load data (time, user IP, requested page, referring page, user agent, etc.) and a SQL database to store it for easy retrieval later. Note that this method is a HUGE improvement over my previous project, which used PHP scripts to store HTTP access logs as flat files. Although it worked in theory, in practice the process of opening, writing to, and closing a text file (which grew a few MB a week) was too cumbersome for my server to comfortably handle. The method described on this page uses a SQL database (MySQL in my case), which is well-suited to exactly these demands. When we’re done, you’ll be able to use a web interface to view your access log (pictured, automatically converting long, complicated search referrers into readable web search and image search strings), or export it directly to an access.log text file in a standard Apache-style format.
First, make sure your database is structured appropriately. This page is written for those with a working knowledge of PHP and SQL, but if you’re new to the field I encourage you to learn! W3Schools.com is an awesome resource to rapidly learn new languages. Also, when starting out with SQL (like me), phpMyAdmin is awesome. The code, as it’s currently written (below), is designed to store data in the “logs” table of the “nibjb” database. Briefly, it uses PHP to determine user data (time, IP, requested page, etc.) and inserts this information into the SQL database. In fact, it’s doing it to you right now! Don’t believe me? View the source of this web page and scroll to the bottom. BAM! There you are.
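To make the idea concrete, here is a rough sketch of the same logging step in Python rather than PHP. It is only an illustration under a few assumptions: the column names are the ones viewLast.php (later on this page) reads back from the “logs” table, the column types are guesses, sqlite3 stands in for MySQL so the example is self-contained, and `log_request` is a made-up helper name.

```python
import sqlite3
import time

# Column names mirror the ones viewLast.php reads; the types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS logs (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    time     TEXT,
    visitor  TEXT,
    method   TEXT,
    request  TEXT,
    protocol TEXT,
    referrer TEXT,
    agent    TEXT
)
"""

def log_request(conn, environ):
    """Insert one page load into the logs table (hypothetical helper)."""
    row = (
        time.strftime("%Y-%m-%d %H:%M:%S"),
        environ.get("REMOTE_ADDR", "-"),
        environ.get("REQUEST_METHOD", "GET"),
        environ.get("REQUEST_URI", ""),
        environ.get("SERVER_PROTOCOL", "HTTP/1.1"),
        environ.get("HTTP_REFERER", "-"),
        environ.get("HTTP_USER_AGENT", ""),
    )
    # Placeholders keep visitor-supplied strings out of the SQL text itself.
    conn.execute(
        "INSERT INTO logs (time, visitor, method, request, protocol, referrer, agent) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        row,
    )
    conn.commit()

# In-memory database and a fake request, purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
log_request(conn, {"REMOTE_ADDR": "203.0.113.7", "REQUEST_URI": "/blog/"})
```

The important design choice is the parameterized INSERT: referrers and user agents are whatever the visitor sends, so they should never be pasted directly into a query string.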
All right, that was easy. Every time we load logme.php, it adds the data to the SQL database. To add data every time you go to a particular web page, you could use a PHP include() statement in each webpage, or you could take advantage of PHP’s auto_append_file feature! Simply insert the following line into your php.ini file if you have access to yours:
auto_append_file = "/path/to/html/logme.php"
How do we access this data once it’s been loaded into the database? There are many different ways, but I’ve chosen to get a little creative with a sleek, yet minimalistic web-based front end. It basically just shows the last [x] entries in the access log. You can adjust the number of entries displayed by slapping an argument onto the URL, transforming viewLast.php into viewLast.php?limit=123 or something (see the screenshot above). I won’t discuss the details of this script. It’s self-explanatory.
// viewLast.php
<html>
<head>
<style type="text/css">
td {
font-family: verdana, arial;
font-size:10px;
}
</style>
</head>
<body>
<?php
$limit = isset($_GET['limit']) ? (int)$_GET['limit'] : 0;
if ($limit<=0) {$limit=25;} // default when limit is missing, zero, or negative
$username="YOUR_USERNAME";
$password="YOUR_PASSWORD";
$database="nibjb";
mysql_connect('mysql157.secureserver.net',$username,$password);
mysql_select_db($database) or die( "Unable to select database");
$query="
SELECT * FROM logs WHERE
request NOT LIKE \"%testlog.php%\"
AND request NOT LIKE \"%/logs/%\"
AND request NOT LIKE \"%/wp-admin/%\"
ORDER BY ID DESC LIMIT 0,$limit
";
//$query="SELECT * FROM logs WHERE referrer LIKE \"%&q=%\" or referrer LIKE \"%&prev=%\" ";
$result=mysql_query($query);
$num=mysql_num_rows($result);
mysql_close();
?>
<b><?php echo($query); ?></b>
<table border="1">
<tr>
<td>id</td>
<td>time</td>
<td>visitor</td>
<td>request</td>
<td>referrer</td>
</tr>
<?php
$i=0; // mysql_result() rows are zero-indexed; starting at 1 skips the newest entry
while ($i<$num) {
$id=mysql_result($result,$i,"id");
$time=mysql_result($result,$i,"time");
$visitor=mysql_result($result,$i,"visitor");
$method=mysql_result($result,$i,"method");
$request=mysql_result($result,$i,"request");
$protocol=mysql_result($result,$i,"protocol");
$referrer=mysql_result($result,$i,"referrer");
$referrer2=str_replace("&", "& ", $referrer);
$agent=mysql_result($result,$i,"agent");
$searchWords="";
$searchEngine="";
if (strpos($referrer, "q=")>0 and strpos($referrer, "google")>0) {$searchEngine="Google Web Search: ";}
if (strpos($referrer, "prev=/images")>0 and strpos($referrer, "google")>0) {$searchEngine="Google Image Search: ";}
// SEARCH EXTRACTION //
$j=0;
$rTemp=str_replace("prev=/images%3Fq%3D", "q=", $referrer);
$rTemp=str_replace("?q=","&q=", $rTemp);
$rTemp=str_replace("%2B"," ", $rTemp);
$rTemp=str_replace("%26"," ", $rTemp);
$rTemp=str_replace("%3D"," ", $rTemp);
$rTemp=str_replace("+"," ", $rTemp);
$wvars=explode("&",$rTemp);
while ($j<count($wvars)){
if (substr($wvars[$j],0,2) === "q=") {
$searchWords = $searchWords . $wvars[$j] . " ";
}
$j++;
}
$searchWords=substr($searchWords,strpos($searchWords, "q=")+2);
if (strlen($searchWords)<3) {$searchWords=$referrer;}
////////////////////////
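// Escape everything before output: these fields are visitor-controlled.
$visitorH=htmlspecialchars($visitor, ENT_QUOTES);
$requestH=htmlspecialchars($request, ENT_QUOTES);
$referrerH=htmlspecialchars($referrer, ENT_QUOTES);
$searchWordsH=htmlspecialchars($searchWords, ENT_QUOTES);
echo "
<tr>
<td>$id</td>
<td>$time</td>
<td>$visitorH</td>
<td><a href='$requestH'>$requestH</a></td>
<td>$searchEngine <a href='$referrerH'>$searchWordsH</a></td>
</tr>
";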
$i++;
}
?>
</table>
</body>
</html>
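The search extraction step is the only non-obvious part of that script, so here is a hedged sketch of the same idea in Python using the standard library's URL parser instead of chained string replacements. The function name `search_terms` is made up for illustration, and the referrer layout (a `q` parameter for web searches, an encoded `prev` parameter for image searches) is taken from the PHP above, not from any Google documentation.

```python
from urllib.parse import urlparse, parse_qs

def search_terms(referrer):
    """Return (engine_label, search_words); falls back to ("", referrer)."""
    parts = urlparse(referrer)
    if "google" not in parts.netloc:
        return "", referrer
    params = parse_qs(parts.query)  # decodes %XX escapes and turns + into spaces
    if "q" in params:
        return "Google Web Search: ", params["q"][0]
    # Image searches carry the query inside an encoded "prev" parameter,
    # e.g. prev=/images%3Fq%3Dsome%2Bwords
    if "prev" in params:
        inner = parse_qs(urlparse(params["prev"][0]).query)
        if "q" in inner:
            return "Google Image Search: ", inner["q"][0]
    return "", referrer

web = search_terms("http://www.google.com/search?hl=en&q=swharden+eva-05&btnG=Search")
img = search_terms("http://images.google.com/imgres?imgurl=x&prev=/images%3Fq%3Ddwarf%2Bgouramis%26hl%3Den")
```

Letting the parser do the decoding avoids having to list every escape sequence (%2B, %26, %3D, ...) by hand.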
And you’re done! This example is a simplified, bare bones example. You can take this a long way if you’d like. My goal is lite & flexible. A quick query from Python and Matplotlib (for example) yields gorgeous visual representations of otherwise-convoluted data!
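For the export side, a minimal sketch of turning one database row into an Apache combined-format line might look like the following. It assumes each row arrives as a dict keyed by the column names used in viewLast.php and that the time column is stored as "YYYY-MM-DD HH:MM:SS"; both are assumptions, and `to_combined` is a hypothetical helper. As in my flat-file version, the status is hard-coded to 200 and the response size is left as "-".

```python
from datetime import datetime

# Spelled out by hand so the output does not depend on the system locale.
MONTHS = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()

def to_combined(row, tz="+0800"):
    """Format one logs row (a dict) as an Apache combined-format line."""
    when = datetime.strptime(row["time"], "%Y-%m-%d %H:%M:%S")
    stamp = "%02d/%s/%04d:%02d:%02d:%02d" % (
        when.day, MONTHS[when.month - 1], when.year,
        when.hour, when.minute, when.second)
    return '%s - - [%s %s] "%s %s %s" 200 - "%s" "%s"' % (
        row["visitor"], stamp, tz,
        row["method"], row["request"], row["protocol"],
        row["referrer"], row["agent"])

# Example row with made-up values.
line = to_combined({
    "time": "2009-01-22 11:58:49", "visitor": "132.170.10.227",
    "method": "GET", "request": "/blog/", "protocol": "HTTP/1.1",
    "referrer": "-", "agent": "Mozilla/5.0",
})
```

Writing one such line per row to a file gives an access.log that standard analyzers should be able to read.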
If you have any questions, or end-up developing something awesome with this code, shoot me an email! It’s not luxurious, but this code works for me, and I share it with the best of intentions.
I’m briefly suspending entries on this website. I currently have no projects I’m working on, and I’m going to try to keep it that way for a few weeks. I really need to re-gear my brain and get ready for dental school next month. I’m struggling with a plethora of random emotions, and I think the best thing for me is to take it easy for a little bit and try to let go of the things I feel are important to me (projects, electrical, mechanical, computational, painting, or otherwise). I’m going to try my best to organize data from my past life (about a decade worth) in an attempt to preserve it. I’ve been thrown back into my early teen years by uncovering ~10 GB of music I used to listen to. Nostalgia? Yeah, I’m feeling it. I had totally forgotten about random, obscure Japanese bands such as Rip Slyme. For example, Hot Chocolate [a must hear / must see youtube video]. In fact, [youtubes some more], check out this randomness [embeds below]. I love non-mainstream awkwardness. What’s that I hear? 8-bit tones?
I realized that the C code from yesterday wasn’t showing up properly because of textile, a rapid, inline, tag-based formatting system. The app converted blog code from ["text":http://www.SWHarden.com/ *like* _this_] to [<a href="http://www.SWHarden.com/">text</a> <b>like</b> <i>this</i>]. While it’s fun and convenient to use, it’s not always practical. The problem I was having was that in C code, variable names (such as _delay_) were becoming irrevocably italicized, and nothing I did could make textile skip over code while styling text. The kicker is that I couldn’t disable it easily, because I’ve been writing in this style for over four years! I decided that the time was now to put my mad Python skills to the test and write code to handle the conversion from textile format to raw HTML. I accomplished this feat in a number of steps. Yeah, I could have done hours of research to find a “faster way”, but it simply wouldn’t have been as creative. In a nutshell, I backed up the SQL database using phpMyAdmin to a single “x.sql” file. I then wrote a Python script to parse this [massive] file and output “o.sql”, the same data but with all of the textile tags I commonly used replaced by their HTML equivalents. It’s not 100% perfect, but it’s 99.999% perfect. I’ll accept that. The output? You’re viewing it! Here’s the code I used to do it:
## This Python (1.0) script removes *SOME* textile formatting from WordPress
## backups in plain text SQL format (dumped from phpMyAdmin). Specifically,
## it corrects bold and italic fonts and corrects links. It should be easy
## to expand if you need to do something else with it.
## Enjoy! --Scott Harden (www.SWHarden.com)
infile = 'x.sql' # <<< THIS IS THE INPUT FILE NAME!
# These are the easy replacements (applied in order)
replacements = (["\r"," "],["\n"," \n "],["*:","* :"],["_:","_ :"],
                ["\n","<br>\n"],[">*","> *"],["*<","* <"],
                [">_","> _"],["_<","_ <"],
                [" *"," <b>"],["* ","</b> "],[" _"," <i>"],["_ ","</i> "])

def fixLinks(line):
    ## replace ["links":URL] with [<a href="URL">links</a>]. ##
    words = line.split(" ")
    for i in range(len(words)):
        word = words[i]
        if '":' in word:
            # the link text may span several words; walk backwards
            # until the opening quote has been collected
            upto=1
            while word.count('"')<2 and upto<=i:
                word = words[i-upto]+" "+word
                upto+=1
            word_orig = word
            extra=""
            word = word.split('":')
            word[0]=word[0][1:] # drop the opening quote
            # keep one trailing punctuation mark outside of the link
            for char in ".),'":
                if word[1].endswith(char): extra=char
            if len(extra)>0: word[1]=word[1][:-1]
            word_new='<a href="%s">%s</a>'%(word[1],word[0])+extra
            line=line.replace(word_orig,word_new)
    return line

def stripTextile(orig):
    ## Handle the replacements and link fixing for each line. ##
    if not orig.count("', '") == 13: return orig #non-normal post
    line = orig.split("', '",5)[2] # the post body
    if len(line)<10: return orig #non-normal post
    origline = line
    line = " "+line
    for replacement in replacements:
        line = line.replace(replacement[0],replacement[1])
    line=fixLinks(line)
    line = orig.replace(origline,line)
    return line

f=open(infile)
raw=f.readlines()
f.close()
posts=0
for raw_i in range(len(raw)):
    if raw[raw_i][:11]=="INSERT INTO":
        if "wp_posts" in raw[raw_i]: #if it's a post, handle it!
            posts+=1
            print("on post %d"%posts)
            raw[raw_i]=stripTextile(raw[raw_i])
print("WRITING...")
out = ""
for line in raw:
    out+=line
f=open('o.sql','w')
f.write(out)
f.close()
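For comparison, the same three conversions (links, bold, italic) can also be sketched with regular expressions. This is not the script I ran; it is an alternative approach shown under the assumption that only the `"text":URL`, `*bold*`, and `_italic_` forms matter, and `textile_to_html` is a made-up name. Underscores and asterisks inside words (like my_delay_ms) are left alone, although a standalone `_delay_` would still be italicized.

```python
import re

# "label":URL, with one optional trailing punctuation mark kept outside the link
LINK = re.compile(r'''"([^"]+)":(\S+?)(?=[.,)']?(?:\s|$))''')
# *bold* and _italic_: markers must hug the text and must not touch a word character
BOLD = re.compile(r'(?<![\w*])\*(\S(?:.*?\S)?)\*(?![\w*])')
ITALIC = re.compile(r'(?<![\w_])_(\S(?:.*?\S)?)_(?![\w_])')

def textile_to_html(text):
    """Convert a small subset of textile markup to HTML."""
    text = LINK.sub(r'<a href="\2">\1</a>', text)
    text = BOLD.sub(r'<b>\1</b>', text)
    text = ITALIC.sub(r'<i>\1</i>', text)
    return text

converted = textile_to_html('"text":http://www.SWHarden.com/ *like* _this_')
```

The lookbehind and lookahead checks are what keep identifiers such as snake_case_names and arithmetic like 2 * 3 * 4 from being mangled.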
I certainly held my breath while the thing ran. As I previously mentioned, this thing modified SQL tables. Therefore, when I uploaded the “corrected” versions, I kept breaking the site until I got all the bugs worked out. Here’s an image from earlier today when my site was totally dead (0 blog posts)
A few months ago I wrote about a way I use PHP to generate apache-style access.log files since my web host blocks access to them. Since then I’ve forgotten it was even running! I now have some pretty cool-looking graphs generated by Python and Matplotlib. For details (and the messy script) check the original posting.
This image represents the number of requests (php pages) made per hour since I implemented the script. It might be a good idea to perform some linear data smoothing techniques (which I love writing about), but for now I’ll leave it as it is so it most accurately reflects the actual data.
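Counting requests per hour from Apache-style lines takes very little code. The sketch below assumes the timestamp sits between the first pair of square brackets, as in the combined log format; `hour_of` and `requests_per_hour` are illustrative names, and the sample lines are made up.

```python
from collections import Counter
from datetime import datetime

# Month abbreviations spelled out so parsing does not depend on the locale.
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

def hour_of(line):
    """Return the hour bucket (a datetime) of one Apache-style log line."""
    stamp = line.split("[", 1)[1].split("]", 1)[0]  # 22/Jan/2009:11:58:49 +0800
    date, hour = stamp.split(":")[:2]
    day, month, year = date.split("/")
    return datetime(int(year), MONTHS[month], int(day), int(hour))

def requests_per_hour(lines):
    """Sorted list of (hour, request count) pairs; empty hours are omitted."""
    return sorted(Counter(hour_of(line) for line in lines).items())

sample = [
    '1.2.3.4 - - [22/Jan/2009:11:58:49 +0800] "GET / HTTP/1.1" 200 -',
    '1.2.3.4 - - [22/Jan/2009:11:59:02 +0800] "GET /blog/ HTTP/1.1" 200 -',
    '5.6.7.8 - - [22/Jan/2009:12:00:10 +0800] "GET / HTTP/1.1" 200 -',
]
per_hour = requests_per_hour(sample)
```

The resulting pairs can be unzipped into x and y lists and handed straight to a plotting library.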
After several years of persistent writing on this website I was forced (by my undergraduate university’s difficult course loads) to stop adding to this blog – something I consider to be one of the most significant projects I’ve ever worked on, with brain-to-text recordings of my thoughts spanning almost a decade of time. After a few years of suspended writing, Google went from loving me (sending me thousands of pageviews daily) to forgetting about me (nothing. silence. nada.). Now that my thesis requirements have been completed, I’m trying to re-energize my writing in an attempt to document the projects I work on which, without this website, would likely be forever forgotten even by me. It appears that the burst of new writing has regained Google’s attention. Google for terms such as “data smoothing in python” and it favors my site. Google is slowly, but surely, re-indexing my pages and assigning them values of relevance which are approaching (but still a tiny fraction of) what they were before my hiatus. Here’s a chart from google’s analytics demonstrating an estimation of IP visits per day (visitors) and their locations. Do I have fans in South Africa? I didn’t know they had computers in South Africa! (I’m sorry if you are that person in South Africa, and were offended by that statement)
While skimming an earlier post of mine I decided to try representing the size of my blog (measured as the number of words) as a curve integrated with respect to the posting date. I had to perform my moving-triangle smoothing method with a 20 day window to get it to come out nicely (to correct for skipped days, double posts, etc) and I’m pleased with the result. Why did I do all of this? Because I can. Now, back to work. [sigh]
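For readers who have not seen it, the smoothing step can be sketched as a weighted moving average whose weights rise linearly to the center point and fall off again. This is my reading of "moving-triangle smoothing" written out as an assumption, not a copy of the code used for the chart; `triangle_smooth` is an illustrative name, and points near the edges simply use whatever neighbors exist.

```python
def triangle_smooth(values, window):
    """Smooth a list with triangular weights 1, 2, ..., peak, ..., 2, 1."""
    half = window // 2
    offsets = range(-half, half + 1)
    weights = [half + 1 - abs(k) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        total = 0.0
        norm = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(values):  # ignore neighbors that fall off the ends
                total += w * values[j]
                norm += w
        smoothed.append(total / norm)
    return smoothed

# A single spike gets spread into a small triangle.
spread = triangle_smooth([0, 0, 4, 0, 0], 2)
```

With daily data, a 20 day window corresponds to `window=20`, which reaches 10 days to each side of every point.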
This begs the question: What was the late 2003 spike for? Well, I can’t say for sure, but I’d speculate this is the height of my geekdom. Just look at my blogs from December, 2003 – what do I talk about? Random life stuff (which, for me, mostly boiled down to network trouble and hardware projects). Curiously, the trace of my integrated blog size is a good (yet indirect) measure of my geekness. The last few years I’ve been relatively normal, but it appears I’m becoming more geeky again.
*I spent the day* in the lab with some random time on my hands between adding reagents to an ongoing immunohistochemical reaction I was performing. At one point I decided to further investigate the field of bioinformatics (is it worth seeking a PhD in this field if I don’t get into dental school again?). UCF offers a PhD in bioinformatics but it’s a new and small department (I think there are only 4 faculty). The degree itself is a degree in computer science (the logic side of computers, more programming than designing hardware). A degree in bioinformatics combines molecular biology (DNA, proteins, etc), computer science (programming), and statistics (developing code to analyze biological data). I feel a need to express what it is, because it’s not something that is commonly understood. Do you know what people who study bioinformatics do?
*I came across a paper* today “Structural Alignment of Pseudoknotted RNA”:http://cseweb.ucsd.edu/users/shzhang/app/RECOMB2005_pseudoknot.pdf (by Han B, Dost B, Bafna V, and Zhang S.) which is a good example of the practice of bioinformatics. Think about what goes on in a cell… the sequence of a gene (a short region of DNA) is copied (letter-by-letter) onto an RNA molecule. The RNA molecule is later read by an enzyme (called a ribosome) and converted into a protein based on its sequence. (This process is the central dogma of molecular biology.) Traditionally, it was believed that RNA molecules’ only function was to carry gene sequences from DNA to ribosomes, but recently (within the last several years) it was discovered that some small RNA molecules are never read and turned into proteins, but rather serve their own unique functions! For example, some RNA molecules (siRNAs) can actually turn genes on and off, and have been associated with cancer development and other immune diseases. Given the human genome (the ~3 billion letter long sequence of all of our DNA), how can we determine which regions form these functional RNA molecules that don’t get converted into proteins? The paper I mentioned earlier addresses this. An algorithm was developed and used to test regions of DNA and predict their probability of forming small RNA molecules. Spikes in this trace (figure 7 of the paper) represent areas of the DNA which are likely to form these RNA molecules. (Is this useful? What if you were to compare these results between a normal person and someone with cancer?)
*After reading the article* I thought to myself “Hmmm… logically manipulating large amounts of linear data… why does this seem familiar?” Then I realized how similar my current programming projects are with this one. (see “my latest DIY ECG data”:http://www.swharden.com/blog/images/ecg_goodie.png posted a couple days ago) Consider the trace (pictured, figure 7 in “Structural Alignment of Pseudoknotted RNA”:http://cseweb.ucsd.edu/users/shzhang/app/RECOMB2005_pseudoknot.pdf) of score (the likelihood that a region of DNA forms an RNA molecule), where peaks represent likely locations of RNA formation. Just generate the trace, determine the positions of the peaks, and you’re golden. How similar is this to the work I’ve been doing with my homemade ECG machine, where I perform signal analysis to eliminate electrical noise and then analyze the resulting trace to isolate and identify peaks corresponding to heartbeats?
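The "determine the positions of the peaks" step is the same whether the trace is an alignment score or a heartbeat. Here is a deliberately simple sketch of it, assuming a plain list of numbers and a hand-picked threshold; it is neither the paper's algorithm nor my ECG code, and `find_peaks` is just an illustrative name.

```python
def find_peaks(trace, threshold):
    """Return indices of local maxima that rise above the threshold."""
    peaks = []
    for i in range(1, len(trace) - 1):
        rising = trace[i] > trace[i - 1]
        not_falling_yet = trace[i] >= trace[i + 1]
        if trace[i] > threshold and rising and not_falling_yet:
            peaks.append(i)  # for a flat-topped peak this is its first sample
    return peaks

# Made-up trace with one sharp peak and one flat-topped peak.
peak_positions = find_peaks([0, 1, 5, 1, 0, 2, 9, 9, 2, 0], 3)
```

Real data would normally be smoothed first so that noise does not register as extra peaks.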
*After reading* I shivered from mental-overload. There are so many exciting Python projects in the field of bioinformatics that are just waiting for me to begin work on! I know I’m like a child sometimes, but hey it’s my personality. I get excited. It’s just that I get excited about tacky things these days. Anyway, I got the itch to write a string-analysis program. What does it do? It reads the content of my website (exported in the form of a SQL backup query generated by PHPmyAdmin, pictured), splits it up by date, and allows for its analysis. Ultimately I want to track the usage of certain words (i.e.: the inverse relationship between the words “girls” and “python”), but for now I wrote a script which plots the number of words I wrote. Observe the output.
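Tracking a word over time could be sketched like this, assuming each post has already been parsed into a dict with a "date" (a datetime) and a "text" key. The helper name `word_usage_by_year` and the sample posts are made up for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

def word_usage_by_year(posts, word):
    """Count whole-word, case-insensitive occurrences of `word` per year."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    counts = defaultdict(int)
    for post in posts:
        counts[post["date"].year] += len(pattern.findall(post["text"]))
    return dict(counts)

# Made-up posts, just to show the shape of the input.
posts = [
    {"date": datetime(2003, 12, 1), "text": "Python is neat. I love python!"},
    {"date": datetime(2003, 12, 9), "text": "Network trouble again."},
    {"date": datetime(2009, 1, 22), "text": "More Python, fewer pythons."},
]
usage = word_usage_by_year(posts, "python")
```

Running it for two different words and plotting both series on the same axes is one way to look for the kind of inverse relationship mentioned above.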
*Pretty cool huh?* Check out all those spikes between 2004 and 2005! (previous figure) Not only are they numerous (meaning many posts), but they’re also high (meaning many words per post). As you can see by the top trace, the most significant contribution to my site occurred during this time. So, let’s zoom in on it! (next figure)
*And of course, the code to produce this…* (obviously you have to have a wordpress backup SQL file in the same folder – if you want mine let me know and I’ll email it to ya’)
import datetime, pylab, numpy
# Let's convert SQL-backups of my WordPress blog into charts! yay!
class blogChrono():
    baseUrl="http://www.SWHarden.com/blog"
    posts=[]
    dates=[]
    def __init__(self,fname):
        self.fname=fname
        self.load()
    def load(self):
        print("loading [%s]..."%self.fname)
        f=open(self.fname)
        raw=f.readlines()
        f.close()
        for line in raw:
            # only handle complete INSERT statements for rows of type 'post'
            if ("INSERT INTO" in line
                    and ';' in line[-2:-1]
                    and " 'post'," in line[-20:-1]):
                post={}
                line=line.split("VALUES(",1)[1][:-3]
                line=line.replace(', NULL',', None')
                line=line.replace(", '',",", None,")
                line=line.replace("''","")
                # isolate the post body by splitting the reversed line
                c= line.split(',',4)[4][::-1]
                c= c.split(" ,",21)
                text=c[-1]
                text=text[::-1]
                text=text[2:-1]
                text=text.replace('"""','###')
                line=line.replace(text,'blogtext')
                line=line.replace(', ,',', None,')
                line=eval("["+line+"]")
                if len(line[4])>len('blogtext'):
                    x=str(line[4].split(', '))[2:-2]
                    fixed=str(line)
                    fixed=fixed.replace(line[4],x)
                    line=eval(fixed)
                post["id"]=int(line[0])
                post["date"]=datetime.datetime.strptime(line[2],
                                                        "%Y-%m-%d %H:%M:%S")
                post["text"]=eval('"""'+text+' """')
                post["title"]=line[5]
                post["url"]=line[21]
                post["comm"]=int(line[25])
                post["words"]=post["text"].count(" ")
                self.dates.append(post["date"])
                self.posts.append(post)
        # put posts (and their dates) in chronological order
        self.dates.sort()
        self.posts.sort(key=lambda post: post["date"])
        print("read %d posts!\n"%len(self.posts))

#d=blogChrono('sml.sql')
d=blogChrono('test.sql')
fig=pylab.figure(figsize=(7,5))
dates,lengths,words,ltot,wtot=[],[],[],[0],[0]
for post in d.posts:
    dates.append(post["date"])
    lengths.append(len(post["text"]))
    ltot.append(ltot[-1]+lengths[-1]) # running total of letters
    words.append(post["words"])
    wtot.append(wtot[-1]+words[-1]) # running total of words
ltot,wtot=ltot[1:],wtot[1:]
pylab.subplot(211)
#pylab.plot(dates,numpy.array(ltot)/(10.0**6),label="letters")
pylab.plot(dates,numpy.array(wtot)/(10.0**3),label="words")
pylab.ylabel("Thousand")
pylab.title("Total Blogged Words")
pylab.grid(alpha=.2)
#pylab.legend()
fig.autofmt_xdate()
pylab.subplot(212,sharex=pylab.subplot(211))
pylab.bar(dates,numpy.array(words)/(10.0**3))
pylab.title("Words Per Entry")
pylab.ylabel("Thousand")
pylab.xlabel("Date")
pylab.grid(alpha=.2)
#pylab.axis([min(d.dates),max(d.dates),None,20])
fig.autofmt_xdate()
pylab.subplots_adjust(left=.1,bottom=.13,right=.98,top=.92,hspace=.25)
width=675 # output image width in pixels (the figure is 7 inches wide)
pylab.savefig('out.png',dpi=width/7.0)
pylab.show()
print("DONE")
*I wrote a Python script to analyze the word frequency* of the posts on my website (extracted from a WordPress SQL backup file). “This is what I came up with”:http://swharden.com/little/worddump.html I then took my giant list over to “Wordle”:http://www.wordle.net/create and had it create a super-cool little word jumble. Neat, huh? Here’s “a picture”:http://www.SWHarden.com/blog/images/wordie2.png that’s cool but not worth posting.
*This is the script to make the worddump:*
f=open('dump.txt')
body=f.read()
f.close()
body=body.lower()
body=body.split(" ")
tot=float(len(body))
words={}
for word in body:
    # discard any "word" containing something other than a letter
    for i in word:
        if 65<=ord(i)<=90 or 97<=ord(i)<=122: pass
        else: word=None
    if word:
        if not word in words: words[word]=0
        words[word]=words[word]+1
data=[]
for word in words: data.append([words[word],word])
data.sort()
data.reverse() # most frequent first
out= "Out of %d words...\n"%tot
for i in range(min(1000,len(data))):
    d=data[i]
    out += '"%s" ranks #%d used %d times (%.05f%%)\n'%(
        d[1],i+1,d[0],100*d[0]/tot)
f=open("dump.html",'w')
f.write(out)
f.close()
print("DONE")
So I was reviewing my website statistics generated by a Python script I wrote when I noticed a peculiarity so bizarre that it made me question the very purpose of my life. Okay, maybe it wasn’t that bizarre, but it was interesting. The Python script (which is automatically run every hour) downloads my latest access.log and saves it to its own folder. It then analyzes the data, creates some charts and graphs, and dumps out a bare-bones results file displaying some of the information I found useful. Of note is the number of times each page is hit.
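The hit counting itself is the simplest part of that script. A stripped-down sketch of the idea (not the actual stats script; the regular expression and the name `hits_per_page` are assumptions for illustration) could look like this:

```python
import re
from collections import Counter

# Pull the requested path out of the quoted request field of a log line.
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

def hits_per_page(lines):
    """Return (page, hits) pairs, most-requested page first."""
    hits = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits.most_common()

# Made-up log lines for demonstration.
sample = [
    '1.2.3.4 - - [22/Jan/2009:11:58:49 +0800] "GET /indexOld.php HTTP/1.1" 200 - "-" "x"',
    '1.2.3.4 - - [22/Jan/2009:11:59:02 +0800] "GET / HTTP/1.1" 200 - "-" "x"',
    '5.6.7.8 - - [22/Jan/2009:12:00:10 +0800] "GET /indexOld.php HTTP/1.1" 200 - "-" "x"',
]
ranking = hits_per_page(sample)
```

Swapping the captured group for the referrer field gives the "referring sites" list in the same way.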
This is where things get funny. Outperforming my home page by nearly double was indexOld.php (now indexOld22.php) – a simple webpage I tossed up for about a year before I put my big blog back online! Why were people still going to this page? Further investigation (from the referring sites section of my stats page) revealed a lot of hits from Google image searches. I started looking at the actual requests and realized that many of these hits were people searching for the term Dwarf Gouramis (a type of freshwater aquarium fish) which was mentioned on that old webpage. The ironic part is that when you Google image search for dwarf gouramis, there is a picture of an extremely rare zebra pleco which is actually a link to my website! However, the link APPEARS to be to wallpaperfishtalk.com because on my page I just linked to their image.
My conclusion: People are Google image-searching for ‘dwarf gouramis’, and an amazing picture of a zebra pleco is coming up which links to my site (due to the fact that months ago I talked about dwarf gouramis but posted a photo of a zebra pleco) and people (in their awe at this amazing fish) are clicking on it. So what did I do? I pulled a bait-and-switch! You bet I did. Now when you go to indexOld2.php it just forwards you to my current website – mua ha ha ha ha
PS: I’m appending to this entry at 2:17pm to note that I made a wonderful breakthrough in the lab today. Due to intellectual property protection blah blah and the fact that I don’t want anyone else to beat me to my research goal I will not describe what this is, I’ll just say that it took months of preparation and today – presto! It worked beautifully =oD
My web server blocks access to my Apache-generated visitor logs (commonly stored in “access.log”). Therefore, many great site usage stats generators (such as awstats – see this example) cannot be used to analyze web traffic to my site. (How many people go to what pages? Where do they come from? What search phrases do they type into Google to find my website?) My web host does allow PHP, and access to php.ini, so I figured that I could generate my own access.log using PHP code. I succeeded, but had a hard time doing this because it’s not clearly documented elsewhere – so I’ll make it clear.
Sample line from access.log generated by my PHP script:
132.170.10.227 - - [22/Jan/2009:11:58:49 +0800] "GET /blog/2005-06-29-eva-05-attack-scotts-sanity/ HTTP/1.1" 200 - "http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&hs=8Lk&q=swharden+eva-05&btnG=Search" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5"
All I had to do was add an auto_append_file directive pointing at logme.php to the end of my php.ini file.
And I placed logme.php in my root folder with the following code:
$logwriter_logformat = "combined"; // log format: combined or common
$logwriter_logdir = "/home/content/n/i/b/nibjb/html/logs/"; // physical path where your log file is located
$logwriter_logfilename = "access.log"; // your log file's filename
$logwriter_timezone = "+0800"; // your server's time zone. +0800 means GMT+8

// Append one line to the log file, holding an exclusive lock while writing.
function logwriter_writelog($logstring){
    global $logwriter_logdir,$logwriter_logfilename;
    $fullpathfilename = $logwriter_logdir.$logwriter_logfilename;
    if (!is_file($fullpathfilename)) {
        print "Log file doesn't exist or file is corrupt.";
        return;
    }
    if (!is_writeable($fullpathfilename)) {
        print "Log file is not writable, please change its permission.";
        return;
    }
    if($fp = @fopen($fullpathfilename, "a")) {
        flock($fp, LOCK_EX);
        fputs($fp, $logstring);
        fclose($fp);
    }
}

// Read a server environment variable, falling back to a default when it is empty.
function logwriter_handlevar($varname,$defaultvalue) {
    $tempvar = getenv($varname);
    if(!empty($tempvar)) {
        return $tempvar;
    } else {
        return $defaultvalue;
    }
}

// Prefer the resolved host name when the server provides one; otherwise use the IP.
$logwriter_remote_vistor = logwriter_handlevar("REMOTE_HOST", logwriter_handlevar("REMOTE_ADDR","-"));
$logwriter_remote_ident = logwriter_handlevar("REMOTE_IDENT","-");
$logwriter_remote_user = logwriter_handlevar("REMOTE_USER","-");
$logwriter_date = date("d/M/Y:H:i:s");
$logwriter_server_port = logwriter_handlevar("SERVER_PORT","80");
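if($logwriter_server_port!="80") {
    // The right-hand side was lost from the listing; ":port" is an assumption.
    // The value is not used anywhere below.
    $logwriter_server_port = ":" . $logwriter_server_port;
}else{
    $logwriter_server_port = "";
}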
$logwriter_request_method = logwriter_handlevar("REQUEST_METHOD","GET");
$logwriter_request_uri = logwriter_handlevar("REQUEST_URI","");
$logwriter_server_protocol = logwriter_handlevar("SERVER_PROTOCOL","HTTP/1.1");
// The status (200) and response size (-) are placeholders; PHP can't see the real values here.
if ($logwriter_logformat=="common") {
    $logwriter_logstring = "$logwriter_remote_vistor $logwriter_remote_ident $logwriter_remote_user [$logwriter_date $logwriter_timezone] \"$logwriter_request_method $logwriter_request_uri $logwriter_server_protocol\" 200 -\n";
}else{
    $logwriter_http_referer = logwriter_handlevar("HTTP_REFERER","-");
    $logwriter_http_user_agent = logwriter_handlevar("HTTP_USER_AGENT","");
    $logwriter_logstring = "$logwriter_remote_vistor $logwriter_remote_ident $logwriter_remote_user [$logwriter_date $logwriter_timezone] \"$logwriter_request_method $logwriter_request_uri $logwriter_server_protocol\" 200 - \"$logwriter_http_referer\" \"$logwriter_http_user_agent\"\n";
}
logwriter_writelog($logwriter_logstring);
Note that the PHP code must be surrounded with <?php and ?> tags.
The result? As you can tell, my logme.php dumps data to www.swharden.com/logs/access.log – if you browse a few pages on my website, or even use Google to search for me (ie: google for ’swharden’ and ‘minidisc’) you can see yourself in the logfile – pretty cool huh? Once I have a good volume of log data I’ll demonstrate how to turn it into useful information.
I realize and accept the fact that I’ve been talking about the same thing the last several posts. I’ll mention it one more time, then let it go. I don’t know why I’m enamored by my past life – I guess I’m now realizing that I was “cool” in a way I never realized before. The irony of course is that this realization comes years after any coolness left. Now instead of fun times, computer jokes, and savvy programming projects, I’m stuck behind a laboratory bench performing monotonous research, studying for classes and exams I have absolutely zero interest in, and trying to stretch my imagination as far as I can to somehow pull my school and work to a level where I “need” to write software to accomplish it. It’s this constant tugging both toward, and away from, who I was several years ago. As I mentioned in the previous entry, I’m rebuilding both of my main PCs at my house (software-wise, at least). I rely on the No-IP DNS client to give a domain name to my dynamic home IP address. (swharden.sytes.net always points to my home IP address, which right now happens to be 97.104.81.110.) Although I don’t use my home system for serious web server purposes, I do connect to my home network from all over using SSH to access the Linux terminal. I also run a small web server and torrent server to help share things now and then. Anyway, this is why I posted…
Whenever I download the No-IP DNS client utility, I stop and check out my little contribution to the project. There, on the support page, under client configuration, you can find The Newbie’s Guide to the No-IP™ Linux Client, which is a guide I volunteered to contribute to the company several years ago. It won some award for the best entry and I received 2 years of a free domain name of my choice. Since I had nothing to lose (heck, it was free) I registered ScottIsHot.com (the former home of this very blog). When was that? [searches posts] Well, my Wonderful Days blog entry from OCT 2003 mentions the blog as 4 months old, so I’d guess ScottIsHot.com started around July of 03. I think it took half a year from my tutorial submission to my prize, so let’s assume I wrote it in late 2002. That’s 6 years ago? I was about 17 I guess. It was probably a short time after I wrote the entry I described in the previous post. Hey, going back to something I mentioned earlier…
Why isn’t the SSH server installed and activated by default in new Ubuntu installations? Maybe it’s some kind of security thing, who knows. The point I’m trying to get at is that, if I hadn’t been told about ssh years ago when I first began my venture into open source operating systems while running a FreeBSD webserver, would I have known about it now? Is SSH common knowledge? I use it multiple times daily – it’s critical to my needs! Since essentially everything in linux can be accomplished by the console, the ability to connect to my home linux PC’s console remotely from any other PC is incredibly valuable! I guess this is a message to anyone just starting out with linux. Learn to use SSH. Oh, and screen. Very nice =o)
As a closing note I thought I’d post a screenshot found on the Gentoo Linux website demonstrating what the desktop of a Gentoo developer looks like. I noticed the wallpaper and, although it wasn’t too surprising, I still got a chuckle from it. I still love the feel of a desktop with totally transparent borderless terminal windows, and the speed and responsiveness you get from a cut-the-crap window manager like FluxBox. Hey, a random thought popped in my head just now. I wonder if it reveals my psyche more than I can describe? My thought (unedited for logic or embarrassment and as pure as I can reconstruct it) was this (and I’ll use the fancy quotes):
“I wonder what it would feel like knowing that one day I would be lucky enough to be an active member of the development team for such an important project as FluxBox – wait, I have teeth cleanings to look forward to instead…”
Yeah, I know there are some logical arguments. Dentistry isn’t necessarily all or nothing. Just because I would be working as a dentist (not necessarily only cleaning teeth though) doesn’t mean I’d have to forgo my desire to be a part of something important. I guess it just means that any significant contributions are unlikely. After investing time in my family and my career, I doubt I’d have enough free time to be actively involved in any kind of meaningful open source project. Ever. [keels over; dies promptly]
I’m quite proud of the general diction used through my blog. Although my brain can only recall exact phrases and ideas from recent entries, I think about my writing style (casual, yet indistinctly formal – a recipe for “intelligent” text?) and smile. I know I will always be imperfect, and surely there are numerous grammatical, spelling, and logic errors in my posts. Nonetheless, I’m proud of my work, and I’m very thankful that I have a quasi-organized, chronological, semi-continuous (and continuously backed-up) account of my thoughts going back to 2001, over 7 years ago! Although many of my current philosophies and views about life, love, and open source software remain the same, I’m evermore surprised at the stark contrast between my current ideology and the concepts expressed in some of my older writings (use Google to find my entry on “the corporation”, obviously written when I was irritated about someone – it’s about as anti-capitalistic as one could imagine!). I must digress; the only reason I wrote this entry tonight was to quote myself from October, 2002 (over 6 years ago – I just turned 17, and was a Jr. in high school). I was describing [in my mind what was a] catastrophic event: the accidental deletion of my file storage computer / web server’s entire hard drive. I’ll let the words speak for themselves.
“I am now in the process of trying to rebuild what I have lost. The many hours I have put into my websites has now deminished to nothing but some heat to be cast into the atmosphere by my heat sink.”
–Scott Harden (me) at age 17
How creative was I? Okay, so I was no Homer. Maybe I’m just being sentimental, but I still think that’s a beautiful over-literal description of how my life was deleted – “cast into the atmosphere by my heat sink” – it’s so cool! For those of you less computer savvy, a heat sink is what cools the main microchip in your computer.
If you’re in the mood for some 18th-century textile patterns, you’ve stumbled upon the right place! Surprisingly, it’s incredibly difficult to find functional (seamless, tiling, free) damask-style patterns on the internet. If you don’t believe me, just Google / image search for it! It took me over an hour to find a functional pattern that tiled properly. Actually, to correct myself there, the image I downloaded didn’t even tile correctly!!! I had to manually modify it to make it seamless. So, free for all website makers, webmasters, wallpaper collectors, and Louis XVI enthusiasts: I give you a plethora of different colors of damask-style tiling backgrounds for whatever you want to do with them!
Okay, right off the bat there are a few shocking questions you may have. First, you’re probably wondering how I could possibly be writing again so soon, a mere matter of hours after posting my previous entry – a surprising action noting that it’s been almost a month since my presequent (the opposite of subsequent?) entry. To understand why, read the subsequent paragraph. Second, you may have noticed the awkward title. This was actually a quote from my organic chemistry professor. Without question, it was the most significant thing the man ever said in class. I believe it is also the only thing he said that I actually retained from that class – supported by the fact that I cannot even vaguely remember the context in which it was uttered (perhaps when he used a molecule attraction / people dating analogy to describe what happens when two atoms in a bond interact with another molecule which is more attractive to one of the atoms?). Organic chemistry emulsified my brain. Ha! That’s a good quote right there. For the record, an emulsion is what happens when an organic (oil-like) solution is mixed with an aqueous (water-like) solution (they don’t really mix) and then violently shaken to form millions of little tiny oil beads in water. Emulsions are very hard to break up, if not impossible. Milk is an emulsion, and I think you have to heat it up to get it to separate. Reminds me of my dorm room refrigerator actually…
The reason I decided to write again is because I’ve actually been feeling guilty about my last entry. Yeah, I know. I’m not one to flippantly use words that I later regret on this weblog. However, while sitting here in the confocal microscope room scanning slides and blogging in the time gaps caused by the 10 minute exposure time, I began pondering my attitude during the last entry. I described myself as “that annoying coworker”, and even voiced my frustration about dental school, ultimately noting that I understand no one reads this weblog anymore. Then I stopped to think about that last part… Why do I care who reads this? Yeah, I used to have this super-active website several years ago, where I’d brag about having five thousand new IP visits a month, but in retrospect it seems so fickle. I thought that this website made me so much more mature than the high school football players and cheerleaders who did little more than gloat in their own popularity, but there I was obsessing about how many people read my blog, how many comments I got on each post, and how many people were willing to donate their lives to beta test new versions of AIM hacking software. Wow, it’s been a while since I mentioned that. Venomcrack? AimPoo? AIM_H4X0R? Takes me back…
So, I began to feel guilty. The purpose of these writings must be the same now as it was in the beginning. The concept must come around full circle for it to work. I write for myself alone. If my writings benefit others in the way of advice, new contacts, or even simple amusement that’s great – but it can’t be why I write, and it can’t be my motivation. My desire to write is merely to ventilate trite emotions that are either too subtle or too complex to be worth explaining to somebody out loud. Additionally, my writings serve as a lens of perspective. Remember in the last paragraph how I talked about how I wrote about something that, years later, I read and ask myself “what was I thinking?” I think we could all use a reminder to take a step back from our opinions and ideas for a little while, and really place ourselves in the big picture. Practically every entry over a couple years old has the same effect – I read it and squirm. Was I really that way? Did I really think that? How could I say those things? It’s a constant reminder that who I am now is probably similar, and other people look at me now the same way that I will look at myself in the future, which is probably similar to the way I think now about how I wrote in the past. Anyway, to add to the guilt I felt about writing how no one reads this anymore, when I logged in just now to write this I saw that someone actually left a comment! (thanks Kyle ^_^) It blew me away. Google Analytics claims that I get about 4 visitors a day. I’m sure they’re disappointed googlers who are looking for something completely different than a personal weblog too. Anyway, cool stuff!
Hey, Kyle… since you’re probably the only one reading this (lol) I’ll write you a public, personal email. You should make a website of your own! I kind of disowned everybody I knew from my past, and the friends I did have with blogs stopped blogging. It’d be cool – we could build off of each other’s posts, ya’ know? Just a thought. I think WordPress allows you to make your own blog from their website so you don’t need to buy anything. And you get to be like Scott! I use WordPress =o)
Late nights in the laboratory make me sad. Yes, I have to work on my thesis so I can graduate, and yes I’m getting paid for my work, but at the same time I know I have a responsibility to my family (which pretty much just consists of my wife right now). I feel guilty when I’m gone late at night. I try to make maximum use of my time by working the ENTIRE time she goes to work (if she’s ever at work, I use it as a guilt-free opportunity to get my work done in the lab). On top of that, I put in random hours on weekdays, nights, and weekends. I try to do the best I can by coming home in the afternoon when she gets off work so we can spend a few hours together and eat lunch or dinner together, but whenever I leave again I feel guilty. Similar to the feeling I described earlier about fighting the notion that I’m “that annoying coworker”, I’m currently fighting the notion that I’m “one of those obsessive workaholics”. Just like I know I tend to be annoying sometimes, I realize that I am obsessive sometimes. However, I’ve been working in the same laboratory on the same project for over a year. Am I obsessing about graduating? What level of hard work is excluding your family? How hard am I expected to work toward my degree, and how is this supposed to balance with my family? My wife claims she doesn’t really care, but I do. She goes out at night some days and doesn’t get back until early in the morning and she doesn’t seem to care, so why should I? I guess it’s guilt. Perhaps I don’t feel guilty initially, but then later I feel guilty for not feeling guilty. It’s the replacement of care for logic – it always produces uneasiness.
I’m in the laboratory right now with a few minutes to kill. To be specific, I have 5×4.5 minutes to kill. I’m in the middle of performing an immunohistochemical reaction on formalin-fixed 4-day-old explanted mouse hearts. I’ve let them incubate with primary antibody for ~48 hours, and now it’s time for me to apply the biotinylated secondary antibody (which attaches to the primary antibody and provides a binding site for an avidin mixture I’ll be adding next (to create an “avidin biotin complex”, or ABC)). The problem is that if I apply the biotinylated secondary antibody immediately, it might attach to unattached primary antibody (antibody that’s simply floating in solution). Therefore, I have to thoroughly wash the tissue with a mixture of phosphate buffered saline (at physiological pH) and Triton X-100 (a super-strong detergent). Between my six five-minute washes, I have time to write.
When browsing through my old blog posts, I realized that my current collection is not complete. I used to write blogs with raw HTML (half a decade ago), then I progressed to custom content management systems, then around age 18 I began using WordPress. No, it wasn’t WordPress – it was something similar though. Oh yeah, it was MovableType (how could I have forgotten? It used a pain-in-the-butt flat file database!). A friend of mine (”majestik”?) was sold on WordPress a few years after that, and convinced me to switch (it used a SQL database). Since then I’ve been using WordPress faithfully. SQL databases are easy to manage, modify, and back up. However, I recently began to realize that a lot of my old entries are not here. I don’t know where they are… but they’re not in this database. Hopefully (if I ever find time) I can go through my old backup CDs (and Iomega ZIP disks – for the super-old writings) and pull out my old words and post them here. In the meantime, I’ll have to settle for the incompleteness of the current system.
That got me to thinking… How could I visualize the homogeneity of my entries (by date)? I’m sure there’s some kind of WordPress plugin, PHP script, or other tech-savvy method for doing this, but I relied on trusty ol’ Python to get me through. I copied/pasted the archive list (click “archives” on the side of the page to see it) which contained the title and date of every entry on record. I then converted the text into an array, isolated the dates, and had Python export a text file of only the dates of the entries. I put that into Excel, and used the frequency function to generate a histogram of the number of entries per half-year period (this is a fast and useful technique I use almost daily for data in the lab).
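For the curious, the binning step could also be done in Python instead of Excel. This is only a rough sketch of the idea, not the script I actually ran: the function names, the `YYYY-MM-DD` date format, and the sample dates are all made up for illustration, and it bins by calendar half-year rather than by my age.

```python
from collections import Counter
from datetime import datetime


def half_year_bin(d):
    """Return the start of the half-year containing d as a decimal year (e.g. 2003.5)."""
    return d.year + (0.5 if d.month > 6 else 0.0)


def entries_per_half_year(date_strings, fmt="%Y-%m-%d"):
    """Count entries per half-year period from a list of date strings.

    The date format is an assumption; change fmt to match the archive list.
    """
    dates = [datetime.strptime(s.strip(), fmt).date() for s in date_strings if s.strip()]
    counts = Counter(half_year_bin(d) for d in dates)
    return dict(sorted(counts.items()))


# Made-up sample dates, for illustration only
sample = ["2002-10-14", "2002-11-02", "2003-01-20", "2003-08-05", "2003-09-09", "2008-12-01"]
histogram = entries_per_half_year(sample)

# Crude text histogram: one '#' per entry in each half-year period
for period, n in histogram.items():
    print(f"{period:6.1f} {'#' * n}")
```

Unlike Excel’s frequency function with an explicit list of bins, this only reports periods that contain at least one entry, so the empty years simply don’t show up as rows.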
Not too shabby, huh? Note the transition between ScottIsHot.com and SWHarden.com (I lost my original domain name (which I never intended to use anyway) due to a billing error and was forced to get a different one as a result). It appears that my reliable blog record begins at 17 and ends at 20 – that’s only about 3 years. I remember writing my first few regular blog-style posts. I don’t think I knew what a blog was back then, but I felt like writing just because. I was sitting at my desk in my room with a newly-built FreeBSD webserver. I had a simple blue-background website with some kind of reddish logo at the top that I drew in Gimp (I was new to open source software and excited to use it). I remember thinking about what to write, and decided to write about how I felt about starting college soon. I also vividly remember the day I decided to stop writing. I was sitting at my desk in a dorm room in Tennessee (I’d already completed 2 and a half years of college) and thought to myself “I’m completely overwhelmed, I feel like I’m going to flunk all of my classes, and I have absolutely no time to do this anymore”. See that tiny little excuse for a bar at age 22.5? That’s this week’s post. I wonder if that bar will ever grow to match the rest… [ponders] Okay, back to labwork!
~5 hours pass ~
I just finished giving a presentation on the conformational plasticity of the human prion protein throughout oligomer and fibril development. It’s weird; graduate school is supposed to be harder than undergraduate, but why do all of these classes seem to be the same (or less) difficulty? CORE class is different: It’s easy, but the material is abstract and requires total memorization (which is incredibly difficult since there are only two tests per semester). The class I’m referring to is a seminar course. All I do is get a paper…