2:53:21 am on 3/12/10
Archive for the 'My Website' Category



Generate Apache-Style HTTP Access Logs via SQL and PHP
Posted by
Scott August 4th, 2009 | 5,253 words | No Comments »

Does your web hosting company block access to access.log, the text file containing your website’s raw access logs? If so, you’re like me, and it sucks. There’s a plethora of gorgeous and extremely insightful website traffic analyzers, but all of them require access to raw HTTP access logs. Today I propose a semi-efficient way to generate such logs, using PHP to determine page load data (time, user IP, requested page, referring page, user client, etc.) and a SQL database to save that data for easy retrieval later. Note that this method is a HUGE improvement over my previous project, which used PHP scripts to store HTTP access logs as flat files. Although it worked in theory, in practice opening, writing to, and closing a text file (which grew a few MB a week) on every page load was too cumbersome for my server to handle comfortably. The method described on this page stores the data in a SQL database, which is well suited to exactly these demands. When we’re done, you’ll be able to use a web interface to view your access log (pictured; it automatically converts long, complicated search referrers into readable web search and image search strings), or export it directly to an access.log text file in a standard Apache-style format.
sql_php_http_log_viewer

First, make sure your database is structured appropriately: the “logs” table needs eight columns, in this order: id, time, visitor, method, request, protocol, referrer, and agent. This page is written for those with a working knowledge of PHP and SQL, but if you’re new to the field I encourage you to learn! W3Schools.com is an awesome resource for rapidly learning new languages. Also, when starting out with SQL (like me), phpMyAdmin is awesome. The code, as it’s currently written (below), is designed to store data in the “nibjb” database under the “logs” table. Briefly, it uses PHP to determine user data (time, IP, requested page, etc.) and inserts this information into the SQL database. In fact, it’s doing it to you right now! Don’t believe me? View the source of this web page and scroll to the bottom. BAM! There you are.
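The table layout is never listed on this page, so here is a rough sketch of what it might look like. The column names are the ones viewLast.php reads back; the column types, the use of SQLite in place of MySQL, and the helper names make_log_db and log_hit are all assumptions made for illustration.

```python
import sqlite3

# Column names follow the ones read back by viewLast.php; the types are assumptions.
SCHEMA = """
CREATE TABLE logs (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    time     TEXT,
    visitor  TEXT,
    method   TEXT,
    request  TEXT,
    protocol TEXT,
    referrer TEXT,
    agent    TEXT
)
"""

def make_log_db():
    """Create an in-memory database holding an empty logs table (hypothetical helper)."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    return db

def log_hit(db, time, visitor, method, request, protocol, referrer, agent):
    """Insert one page load; placeholders keep client-supplied text out of the SQL."""
    db.execute(
        "INSERT INTO logs (time, visitor, method, request, protocol, referrer, agent) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (time, visitor, method, request, protocol, referrer, agent),
    )
    db.commit()

db = make_log_db()
log_hit(db, "04/Aug/2009:12:00:00", "127.0.0.1", "GET", "/blog/",
        "HTTP/1.1", "-", "ExampleAgent/1.0")
print(db.execute("SELECT id, visitor, request FROM logs").fetchall())
# prints [(1, '127.0.0.1', '/blog/')]
```

The id column fills itself in, which is why the PHP insert can pass an empty value for it and the viewer can sort by it to get the newest rows first.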

// logme.php
<?php

if ( !isset($wp_did_header) ) {
	$wp_did_header = true;
	require_once( '/home/content/n/i/b/nibjb/html/blog/wp-load.php' );
	//wp();
	//require_once( '/home/content/n/i/b/nibjb/html/blog/wp-includes/template-loader.php' );
}

function logwriter_handlevar($varname,$defaultvalue){
    $tempvar = getenv($varname);
    if(!empty($tempvar)) {
        return $tempvar;
    } else {
        return $defaultvalue;
    }
} 

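// Prefer the resolved host name and fall back to the IP address.
// getenv() is used because $REMOTE_HOST only exists when register_globals is on.
$logwriter_remote_vistor = logwriter_handlevar("REMOTE_HOST", logwriter_handlevar("REMOTE_ADDR","-"));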

$logwriter_remote_ident = logwriter_handlevar("REMOTE_IDENT","-");
$logwriter_remote_user = logwriter_handlevar("REMOTE_USER","-");
$logwriter_date = date("d/M/Y:H:i:s");
$logwriter_timezone = date("O"); // UTC offset such as +0800; used in the log line below
$logwriter_request_method = logwriter_handlevar("REQUEST_METHOD","GET");
$logwriter_request_uri = logwriter_handlevar("REQUEST_URI","");
$logwriter_server_protocol = logwriter_handlevar("SERVER_PROTOCOL","HTTP/1.1");
$logwriter_http_referer = logwriter_handlevar("HTTP_REFERER","-");
$logwriter_http_user_agent = logwriter_handlevar("HTTP_USER_AGENT","");
$logwriter_logstring = "$logwriter_remote_vistor $logwriter_remote_ident $logwriter_remote_user [$logwriter_date $logwriter_timezone] \"$logwriter_request_method $logwriter_request_uri $logwriter_server_protocol\" 200 - \"$logwriter_http_referer\" \"$logwriter_http_user_agent\"\n";
?>

<?php
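$username="YOUR_USERNAME";
$password="YOUR_PASSWORD";
$database="nibjb";
mysql_connect('mysql157.secureserver.net',$username,$password);
//mysql_connect(localhost,$username,$password);
mysql_select_db($database) or die("Unable to select database");

// The request URI, referrer and user agent are supplied by the client,
// so escape every value before placing it in the query.
$v = array_map('mysql_real_escape_string', array($logwriter_date,$logwriter_remote_vistor,$logwriter_request_method,$logwriter_request_uri,$logwriter_server_protocol,$logwriter_http_referer,$logwriter_http_user_agent));
$query = "INSERT INTO logs VALUES ('','$v[0]','$v[1]','$v[2]','$v[3]','$v[4]','$v[5]','$v[6]')";
mysql_query($query);
mysql_close();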
?>

<!--
LOG DETAILS:
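time: <?php echo(htmlspecialchars($logwriter_date)); ?>
vistor: <?php echo(htmlspecialchars($logwriter_remote_vistor)); ?>
method: <?php echo(htmlspecialchars($logwriter_request_method)); ?>
request: <?php echo(htmlspecialchars($logwriter_request_uri)); ?>
protocol: <?php echo(htmlspecialchars($logwriter_server_protocol)); ?>
referrer: <?php echo(htmlspecialchars($logwriter_http_referer)); ?>
agent: <?php echo(htmlspecialchars($logwriter_http_user_agent)); ?>
HTML LOG LINE:
<?php echo(htmlspecialchars($logwriter_logstring)); // escaped so a crafted value cannot close this comment ?>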
 -->

All right, that was easy. Every time logme.php loads, it adds a row to the SQL database. To log every page view, you could use a PHP include() statement in each web page, or you could take advantage of PHP’s auto_append_file feature! Simply insert the following line into your php.ini file if you have access to yours:

auto_append_file = "/path/to/html/logme.php"

How do we access this data once it’s been loaded into the database? There are many different ways, but I’ve chosen to get a little creative with a sleek, yet minimalistic web-based front end. It basically just shows the last [x] entries in the access log. You can adjust the number of entries displayed by slapping an argument onto the URL, transforming viewLast.php into viewLast.php?limit=123 or something (see the screenshot above). I won’t discuss the details of this script. It’s self-explanatory.

// viewLast.php
<html>
<head>
<style type="text/css">
td {
font-family: verdana, arial;
font-size:10px;
}
</style>
</head>
<body>
<?php

$limit = isset($_GET['limit']) ? (int)$_GET['limit'] : 0;
if ($limit<=0) {$limit=25;} // missing, non-numeric or negative values fall back to 25

$username="YOUR_USERNAME";
$password="YOUR_PASSWORD";
$database="nibjb";
mysql_connect('mysql157.secureserver.net',$username,$password);
mysql_select_db($database) or die( "Unable to select database");
$query="
SELECT * FROM logs WHERE
request NOT LIKE \"%testlog.php%\"
AND request NOT LIKE  \"%/logs/%\"
AND request NOT LIKE \"%/wp-admin/%\"
ORDER BY ID DESC LIMIT 0,$limit
";
//$query="SELECT * FROM logs WHERE referrer LIKE \"%&q=%\" or referrer LIKE \"%&prev=%\" ";
$result=mysql_query($query);
$num=mysql_numrows($result);
mysql_close();
?>

<b><?php echo($query); ?></b>
<table border="1">
<tr>
<td>id</td>
<td>time</td>
<td>visitor</td>
<td>request</td>
<td>referrer</td>
</tr>

<?php
$i=0; // rows are numbered from 0; starting at 1 would skip the newest entry
while ($i<$num) {
$id=mysql_result($result,$i,"id");
$time=mysql_result($result,$i,"time");
$visitor=mysql_result($result,$i,"visitor");
$method=mysql_result($result,$i,"method");
$request=mysql_result($result,$i,"request");
$protocol=mysql_result($result,$i,"protocol");
$referrer=mysql_result($result,$i,"referrer");
$referrer2=str_replace("&", "& ", $referrer);
$agent=mysql_result($result,$i,"agent");
$searchWords="";
$searchEngine="";
if (strpos($referrer, "q=")>0 and strpos($referrer, "google")>0) {$searchEngine="Google Web Search: ";}
if (strpos($referrer, "prev=/images")>0 and strpos($referrer, "google")>0) {$searchEngine="Google Image Search: ";}

// SEARCH EXTRACTION //
$j=0;
$rTemp=str_replace("prev=/images%3Fq%3D", "q=", $referrer);
$rTemp=str_replace("?q=","&q=", $rTemp);
$rTemp=str_replace("%2B"," ", $rTemp);
$rTemp=str_replace("%26"," ", $rTemp);
$rTemp=str_replace("%3D"," ", $rTemp);
$rTemp=str_replace("+"," ", $rTemp);
$wvars=explode("&",$rTemp); // explode() splits on a plain string; split() is deprecated
while ($j<count($wvars)){
	if (substr($wvars[$j],0,2) === "q=") {
		$searchWords = $searchWords . $wvars[$j] . " ";
		}
	$j++;
}

$searchWords=substr($searchWords,strpos($searchWords, "q=")+2);
if (strlen($searchWords)<3) {$searchWords=$referrer;}
////////////////////////

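// escape client-supplied values so they cannot inject markup into the page
$request=htmlspecialchars($request, ENT_QUOTES);
$referrer=htmlspecialchars($referrer, ENT_QUOTES);
$searchWords=htmlspecialchars($searchWords, ENT_QUOTES);
echo "
<tr>
<td>$id</td>
<td>$time</td>
<td>$visitor</td>
<td><a href='$request'>$request</a></td>
<td>$searchEngine <a href='$referrer'>$searchWords</a></td>
</tr>
";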
$i++;
}
?>
</table>
</body>
</html>
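The search extraction in viewLast.php works by string replacement. For anyone who pulls the referrers into Python for analysis instead, here is a minimal sketch of the same idea using the standard library’s URL parser. The function name search_terms and the image-search URL in the example are made up for illustration; only the web-search referrer comes from my logs.

```python
from urllib.parse import urlparse, parse_qs

def search_terms(referrer):
    """Return the Google search phrase hidden in a referrer URL, or "" if none."""
    parsed = urlparse(referrer)
    if "google" not in parsed.netloc:
        return ""
    params = parse_qs(parsed.query)
    if "q" in params:                       # web search: ...?q=some+words
        return params["q"][0]
    if "prev" in params:                    # image search: prev=/images?q=some+words
        inner = parse_qs(urlparse(params["prev"][0]).query)
        return inner.get("q", [""])[0]
    return ""

web = ("http://www.google.com/search?hl=en&client=firefox-a"
       "&q=swharden+eva-05&btnG=Search")
# Hypothetical image-search referrer, shaped like the ones the PHP code handles.
image = ("http://images.google.com/imgres?imgurl=x"
         "&prev=/images%3Fq%3Ddwarf%2Bgouramis%26hl%3Den")
print(search_terms(web))    # prints swharden eva-05
print(search_terms(image))  # prints dwarf gouramis
```

Letting the parser do the percent-decoding avoids the chain of str_replace calls, at the cost of only recognizing the two referrer shapes handled above.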

And you’re done! This is a simplified, bare-bones example. You can take this a long way if you’d like. My goal is lite & flexible. A quick query from Python and Matplotlib (for example) yields gorgeous visual representations of otherwise-convoluted data!
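The export back to an Apache-style access.log is mostly string formatting. A minimal sketch, assuming each database row has already been fetched into a dict keyed by column name; the function name apache_line and the fixed status code of 200 are assumptions, and the "- -" and trailing "-" stand in for the ident, user, and response size that this logger never records.

```python
def apache_line(row, timezone="+0000", status=200):
    """Format one logs row (a dict keyed by column name) as a combined-format line."""
    return '%s - - [%s %s] "%s %s %s" %d - "%s" "%s"' % (
        row["visitor"], row["time"], timezone,
        row["method"], row["request"], row["protocol"], status,
        row["referrer"], row["agent"],
    )

example = {
    "visitor": "127.0.0.1", "time": "04/Aug/2009:12:00:00",
    "method": "GET", "request": "/blog/", "protocol": "HTTP/1.1",
    "referrer": "-", "agent": "ExampleAgent/1.0",
}
print(apache_line(example))
# prints 127.0.0.1 - - [04/Aug/2009:12:00:00 +0000] "GET /blog/ HTTP/1.1" 200 - "-" "ExampleAgent/1.0"
```

Writing one such line per row to a text file should give most log analyzers something they can read, though I have not tested every analyzer against it.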

If you have any questions, or end-up developing something awesome with this code, shoot me an email! It’s not luxurious, but this code works for me, and I share it with the best of intentions.



Transition
Posted by
Scott July 23rd, 2009 | 5,253 words | No Comments »

I’m briefly suspending entries on this website. I currently have no projects I’m working on, and I’m going to try to keep it that way for a few weeks. I really need to re-gear my brain and get ready for dental school next month. I’m struggling with a plethora of random emotions, and I think the best thing for me is to take it easy for a little bit and try to let go of the things I feel are important to me (projects, electrical, mechanical, computational, painting, or otherwise). I’m going to try my best to organize data from my past life (about a decade worth) in an attempt to preserve it. I’ve been thrown back into my early teen years by uncovering ~10 GB of music I used to listen to. Nostalgia? Yeah, I’m feeling it. I had totally forgotten about random, obscure Japanese bands such as Rip Slyme. For example, Hot Chocolate [a must hear / must see youtube video]. In fact, [youtubes some more], check out this randomness [embeds below]. I love non-mainstream awkwardness. What’s that I hear? 8-bit tones?



Removing Textile From Wordpress
Posted by
Scott May 15th, 2009 | 5,253 words | 1 Comment »

I realized that the C code from yesterday wasn’t showing up properly because of textile, a rapid, inline, tag-based formatting system. The app converted blog code from ["text":http://www.SWHarden.com/ *like* _this_] to [text like this. ] While it’s fun and convenient to use, it’s not always practical. The problem I was having was that in C code, variable names (such as _delay_) were becoming irrevocably italicized, and nothing I did could make textile ignore code while styling text. The kicker is that I couldn’t disable it easily, because I’ve been writing in this style for over four years! I decided that the time was now to put my mad Python skills to the test and write code to handle the conversion from textile format to raw HTML.
I accomplished this feat in a number of steps. Yeah, I could have done hours of research to find a “faster way”, but it simply wouldn’t have been as creative. In a nutshell, I backed up the SQL database using phpMyAdmin to a single “x.sql” file. I then wrote a Python script to parse this [massive] file and output “o.sql”, the same data but with all of the textile tags I commonly used replaced by their HTML equivalents. It’s not 100% perfect, but it’s 99.999% perfect. I’ll accept that. The output? You’re viewing it! Here’s the code I used to do it:

## This Python (1.0) script removes *SOME* textile formatting from Wordpress
## backups in plain text SQL format (dumped from PHP MyAdmin). Specifically,
## it corrects bold and italic fonts and corrects links. It should be easy
## to expand if you need to do something else with it.
## Enjoy! --Scott Harden (www.SWHarden.com)

infile = 'x.sql' # <<< THIS IS THE INPUT FILE NAME!

# These are the easy replacements (applied in order)
replacements = (["\r"," "],["\n"," \n "],["*:","* :"],["_:","_ :"],
                ["\n","<br>\n"],[">*","> *"],["*< ","* <"],
                [">_","> _"],["_< ","_ <"],
                [" *"," <b>"],["* ","</b> "],[" _"," <i>"],["_ ","</i> "])
def fixLinks(line):
    ## replace ["links":URL] with [<a href="URL">links</a>]. ##
    words = line.split(" ")
    for i in range(len(words)):
        word = words[i]
        if '":' in word:
            upto=1
            while (word.count('"')<2):
                word = words[i-upto]+" "+word
                upto+=1
            word_orig = word
            extra=""
            word = word.split('":')
            word[0]=word[0][1:]
            for char in ".),'":
                if word[1][-1]==char: extra=char
            if len(extra)>0: word[1]=word[1][:-1]
            word_new='<a href="%s">%s</a>'%(word[1],word[0])+extra
            line=line.replace(word_orig,word_new)
    return line

def stripTextile(orig):
    ## Handle the replacements and link fixing for each line. ##
    if not orig.count("', '") == 13: return orig #non-normal post
    line=orig
    line = line.split("', '",5)[2]
    if len(line)<10:return orig #non-normal post
    origline = line
    line = " "+line
    for replacement in replacements:
        line = line.replace(replacement[0],replacement[1])
    line=fixLinks(line)
    line = orig.replace(origline,line)
    return line

f=open(infile)
raw=f.readlines()
f.close()
posts=0
for raw_i in range(len(raw)):
    if raw[raw_i][:11]=="INSERT INTO":
        if "wp_posts" in raw[raw_i]: #if it's a post, handle it!
            posts+=1
            print("on post %d"%posts)
            raw[raw_i]=stripTextile(raw[raw_i])

print("WRITING...")
f=open('o.sql','w')
f.write("".join(raw)) # join once instead of growing a string line by line
f.close()
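For comparison, the same three conversions (bold, italic, links) can be sketched with regular expressions. This is not the script I ran; the patterns are illustrative guesses at the textile subset used on this blog, and textile_to_html is a made-up name. It still cannot tell code from prose, so a C identifier wrapped in underscores and surrounded by spaces would be italicized just the same.

```python
import re

# Patterns are illustrative guesses at the textile subset used on this blog.
BOLD = re.compile(r'(?<![\w*])\*(\S(?:[^*\n]*\S)?)\*(?![\w*])')
ITALIC = re.compile(r'(?<!\w)_(\S(?:[^_\n]*\S)?)_(?!\w)')
# "label":URL, leaving trailing punctuation such as a full stop outside the link.
LINK = re.compile(r'"([^"]+)":(\S+?)(?=[.,)\']*(?:\s|$))')

def textile_to_html(text):
    """Convert *bold*, _italic_ and "label":URL spans to HTML tags."""
    text = BOLD.sub(r'<b>\1</b>', text)
    text = ITALIC.sub(r'<i>\1</i>', text)
    text = LINK.sub(r'<a href="\2">\1</a>', text)
    return text

print(textile_to_html('"text":http://www.SWHarden.com/ *like* _this_'))
# prints <a href="http://www.SWHarden.com/">text</a> <b>like</b> <i>this</i>
```

The lookarounds keep underscores inside a word (snake_case_name) from being treated as italics, which is the one case the plain string replacements above handle by requiring a space.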

I certainly held my breath while the thing ran. As I previously mentioned, this thing modified SQL tables. Therefore, when I uploaded the “corrected” versions, I kept breaking the site until I got all the bugs worked out. Here’s an image from earlier today when my site was totally dead (0 blog posts)

hostingwork



PHP-Generated Access.log is a Success
Posted by
Scott May 14th, 2009 | 5,253 words | No Comments »


THIS CODE HAS BEEN UPDATED!

>>> CHECK OUT THE NEW CODE <<<
[Generate Apache-Style HTTP Access Logs via SQL and PHP]

OBSOLETE CODE IS BELOW…

A few months ago I wrote about a way I use PHP to generate apache-style access.log files since my web host blocks access to them. Since then I’ve forgotten it was even running! I now have some pretty cool-looking graphs generated by Python and Matplotlib. For details (and the messy script) check the original posting.

This image represents the number of requests (php pages) made per hour since I implemented the script. It might be a good idea to perform some linear data smoothing techniques (which I love writing about), but for now I’ll leave it as it is so it most accurately reflects the actual data.
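Counting requests per hour takes only a few lines once the log exists. A minimal sketch, assuming Apache-style lines with the timestamp in square brackets and English month abbreviations; requests_per_hour is a name invented here, and the plotting step is left out.

```python
from collections import Counter
from datetime import datetime

def requests_per_hour(lines):
    """Count log lines per clock hour, keyed by 'YYYY-MM-DD HH:00'."""
    counts = Counter()
    for line in lines:
        start, end = line.find("["), line.find("]")
        if start == -1 or end == -1:
            continue  # not an access.log line
        stamp = line[start + 1:end].split(" ")[0]   # drop the timezone offset
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S")
        counts[when.strftime("%Y-%m-%d %H:00")] += 1
    return counts

sample = [
    '1.2.3.4 - - [22/Jan/2009:11:58:49 +0800] "GET / HTTP/1.1" 200 - "-" "x"',
    '1.2.3.4 - - [22/Jan/2009:11:59:01 +0800] "GET /blog/ HTTP/1.1" 200 - "-" "x"',
    '5.6.7.8 - - [22/Jan/2009:12:00:00 +0800] "GET / HTTP/1.1" 200 - "-" "x"',
]
for hour, n in sorted(requests_per_hour(sample).items()):
    print(hour, n)
```

The sorted hours and their counts can be handed straight to a plotting call.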



Is Google Re-Indexing My Life?
Posted by
Scott April 2nd, 2009 | 5,253 words | No Comments »

After several years of persistent writing on this website I was forced (by my undergraduate university’s difficult course loads) to stop adding to this blog – something I consider to be one of the most significant projects I’ve ever worked on, with brain-to-text recordings of my thoughts spanning almost a decade of time. After a few years of suspended writing, Google went from loving me (sending me thousands of pageviews daily) to forgetting about me (nothing. silence. nada.). Now that my thesis requirements have been completed, I’m trying to re-energize my writing in an attempt to document the projects I work on which, without this website, would likely be forever forgotten even by me. It appears that the burst of new writing has regained Google’s attention. Google for terms such as “data smoothing in python” and it favors my site. Google is slowly, but surely, re-indexing my pages and assigning them values of relevance which are approaching (but still a tiny fraction of) what they were before my hiatus. Here’s a chart from google’s analytics demonstrating an estimation of IP visits per day (visitors) and their locations. Do I have fans in South Africa? I didn’t know they had computers in South Africa! (I’m sorry if you are that person in South Africa, and were offended by that statement)



Integrated Writing
Posted by
Scott February 17th, 2009 | 5,253 words | No Comments »

While skimming an earlier post of mine I decided to try representing the size of my blog (measured as the number of words) as a curve integrated with respect to the posting date. I had to perform my moving-triangle smoothing method with a 20 day window to get it to come out nicely (to correct for skipped days, double posts, etc) and I’m pleased with the result. Why did I do all of this? Because I can. Now, back to work. [sigh]
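The two steps (a running total and a smoothing pass) can be sketched in plain Python. The exact smoothing code is not shown in this post, so treat triangle_smooth below as an assumption: a weighted moving average whose weights rise linearly to the centre of the window and fall off again. Both function names are invented for the example.

```python
def running_total(values):
    """Integrate a series: each output is the sum of all inputs so far."""
    total, out = 0, []
    for v in values:
        total += v
        out.append(total)
    return out

def triangle_smooth(values, window=20):
    """Weighted moving average with triangular weights (assumed method)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for offset in range(-half, half + 1):
            j = i + offset
            if 0 <= j < len(values):          # shrink the window at the edges
                w = half + 1 - abs(offset)    # largest weight at the centre
                num += w * values[j]
                den += w
        out.append(num / den)
    return out

words_per_day = [0, 0, 4, 0, 0]
print(triangle_smooth(words_per_day, window=2))  # prints [0.0, 1.0, 2.0, 1.0, 0.0]
print(running_total(words_per_day))              # prints [0, 0, 4, 4, 4]
```

A single busy day gets spread over its neighbours, which is what evens out skipped days and double posts before the curve is drawn.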

This begs the question: What was the late 2003 spike for? Well, I can’t say for sure, but I’d speculate this is the height of my geekdom. Just look at my blogs from December, 2003 – what do I talk about? Random life stuff (which, for me, mostly boiled down to network trouble and hardware projects). Curiously, the trace of my integrated blog size is a good (yet indirect) measure of my geekness. The last few years I’ve been relatively normal, but it appears I’m becoming more geeky again.



Analyzing my Writings with Python
Posted by
Scott January 29th, 2009 | 5,253 words | 3 Comments »

I spent the day in the lab with some random time on my hands between adding reagents to an ongoing immunohistochemical reaction I was performing. At one point I decided to further investigate the field of bioinformatics (is it worth seeking a PhD in this field if I don’t get into dental school again?). UCF offers a PhD in bioinformatics but it’s a new and small department (I think there are only 4 faculty). The degree itself is a degree in computer science (the logic side of computers, more programming than designing hardware). A degree in bioinformatics combines molecular biology (DNA, proteins, etc.), computer science (programming), and statistics (developing code to analyze biological data). I feel a need to express what it is, because it’s not something that is commonly understood. Do you know what people who study bioinformatics do?

I came across a paper today, “Structural Alignment of Pseudoknotted RNA” (http://cseweb.ucsd.edu/users/shzhang/app/RECOMB2005_pseudoknot.pdf) by Han B, Dost B, Bafna V, and Zhang S, which is a good example of the practice of bioinformatics. Think about what goes on in a cell… the sequence of a gene (a short region of DNA) is copied (letter-by-letter) onto an RNA molecule. The RNA molecule is later read by an enzyme (called a ribosome) and converted into a protein based on its sequence. (This process is the central dogma of molecular biology.) Traditionally, it was believed that RNA molecules’ only function was to copy gene sequences from DNA to ribosomes, but recently (the last several years) it was discovered that some small RNA molecules are never read and turned into proteins, but rather serve their own unique functions! For example, some RNA molecules (siRNAs) can actually turn genes on and off, and have been associated with cancer development and other immune diseases. Given the human genome (the ~3 billion letter long sequence of all of our DNA), how can we determine what regions form these functional RNA molecules which don’t get converted into proteins? The paper I mentioned earlier addresses this. An algorithm was developed and used to test regions of DNA and predict their probability of forming small RNA molecules. Spikes in this trace (figure 7 of the paper) represent areas of the DNA which are likely to form these RNA molecules. (Is this useful? What if you were to compare these results between a normal person and someone with cancer?)

After reading the article I thought to myself “Hmmm… logically manipulating large amounts of linear data… why does this seem familiar?” Then I realized how similar my current programming projects are to this one. (See my latest DIY ECG data (http://www.swharden.com/blog/images/ecg_goodie.png), posted a couple days ago.) Consider the trace (pictured, figure 7 in “Structural Alignment of Pseudoknotted RNA”) of score (the likelihood that a region of DNA forms an RNA molecule), where peaks represent likely locations of RNA formation. Just generate the trace, determine the positions of the peaks, and you’re golden. How similar is this to the work I’ve been doing with my homemade ECG machine, where I perform signal analysis to eliminate electrical noise and then analyze the resulting trace to isolate and identify peaks corresponding to heartbeats?

After reading I shivered from mental overload. There are so many exciting Python projects in the field of bioinformatics that are just waiting for me to begin work on! I know I’m like a child sometimes, but hey it’s my personality. I get excited. It’s just that I get excited about tacky things these days. Anyway, I got the itch to write a string-analysis program. What does it do? It reads the content of my website (exported in the form of a SQL backup generated by phpMyAdmin, pictured), splits it up by date, and allows for its analysis. Ultimately I want to track the usage of certain words (e.g., the inverse relationship between the words “girls” and “python”), but for now I wrote a script which plots the number of words I wrote. Observe the output.

Pretty cool huh? Check out all those spikes between 2004 and 2005! (previous figure) Not only are they numerous (meaning many posts), but they’re also high (meaning many words per post). As you can see by the top trace, the most significant contribution to my site occurred during this time. So, let’s zoom in on it! (next figure)

And of course, the code to produce this… (obviously you have to have a WordPress backup SQL file in the same folder; if you want mine let me know and I’ll email it to ya’)

import datetime, pylab, numpy
# Let's convert SQL-backups of my WordPress blog into charts! yay!
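class blogChrono():
    baseUrl="http://www.SWHarden.com/blog"
    def __init__(self,fname):
        self.fname=fname
        self.posts=[] # per-instance lists; class-level lists would be shared
        self.dates=[] # between every blogChrono object
        self.load()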
    def load(self):
        print("loading [%s]..."%self.fname)
        f=open(self.fname)
        raw=f.readlines()
        f.close()
        for line in raw:
            if ("INSERT INTO" in line
                and ';' in line[-2:-1]
                and " 'post'," in line[-20:-1]):
                post={}
                line=line.split("VALUES(",1)[1][:-3]
                line=line.replace(', NULL',', None')
                line=line.replace(", '',",", None,")
                line=line.replace("''","")
                c= line.split(',',4)[4][::-1]
                c= c.split(" ,",21)
                text=c[-1]
                text=text[::-1]
                text=text[2:-1]
                text=text.replace('"""','###')
                line=line.replace(text,'blogtext')
                line=line.replace(', ,',', None,')
                line=eval("["+line+"]")
                if len(line[4])>len('blogtext'):
                    x=str(line[4].split(', '))[2:-2]
                    raw=str(line)
                    raw=raw.replace(line[4],x)
                    line=eval(raw)
                post["id"]=int(line[0])
                post["date"]=datetime.datetime.strptime(line[2],
                                                        "%Y-%m-%d %H:%M:%S")
                post["text"]=eval('"""'+text+' """')
                post["title"]=line[5]
                post["url"]=line[21]
                post["comm"]=int(line[25])
                post["words"]=post["text"].count(" ")
                self.dates.append(post["date"])
                self.posts.append(post)
        # sort the posts chronologically and keep a matching sorted list of dates
        self.posts.sort(key=lambda post: post["date"])
        self.dates.sort()
        print("read %d posts!\n"%len(self.posts))

#d=blogChrono('sml.sql')
d=blogChrono('test.sql')

fig=pylab.figure(figsize=(7,5))
dates,lengths,words,ltot,wtot=[],[],[],[0],[0]
for post in d.posts:
    dates.append(post["date"])
    lengths.append(len(post["text"]))
    ltot.append(ltot[-1]+lengths[-1])
    words.append(post["words"])
    wtot.append(wtot[-1]+words[-1])
ltot,wtot=ltot[1:],wtot[1:]

pylab.subplot(211)
#pylab.plot(dates,numpy.array(ltot)/(10.0**6),label="letters")
pylab.plot(dates,numpy.array(wtot)/(10.0**3),label="words")
pylab.ylabel("Thousand")
pylab.title("Total Blogged Words")
pylab.grid(alpha=.2)
#pylab.legend()
fig.autofmt_xdate()
pylab.subplot(212,sharex=pylab.subplot(211))
pylab.bar(dates,numpy.array(words)/(10.0**3))
pylab.title("Words Per Entry")
pylab.ylabel("Thousand")
pylab.xlabel("Date")
pylab.grid(alpha=.2)
#pylab.axis([min(d.dates),max(d.dates),None,20])
fig.autofmt_xdate()
pylab.subplots_adjust(left=.1,bottom=.13,right=.98,top=.92,hspace=.25)
width=675 # output width in pixels; the figure is 7 inches wide
pylab.savefig('out.png',dpi=width/7.0)
pylab.show()

print("DONE")

I wrote a Python script to analyze the word frequency of the blogs in my website (extracted from a WordPress SQL backup file). This is what I came up with (http://swharden.com/little/worddump.html). I then took my giant list over to Wordle (http://www.wordle.net/create) and had them create a super-cool little word jumble. Neat, huh? Here’s a picture (http://www.SWHarden.com/blog/images/wordie2.png) that’s cool but not worth posting.

This is the script to make the worddump:

import datetime, pylab, numpy
f=open('dump.txt')
body=f.read()
f.close()
body=body.lower()
body=body.split(" ")
tot=float(len(body))
words={}
for word in body:
    for i in word:
        if 65<=ord(i)<=90 or 97<=ord(i)<=122: pass # keep purely alphabetic words
        else: word=None
    if word:
        if not word in words:words[word]=0
        words[word]=words[word]+1
data=[]
for word in words: data.append([words[word],word])
data.sort()
data.reverse()
out= "Out of %d words...\n"%tot
for i in range(min(1000,len(data))): # the list may hold fewer than 1000 words
    d=data[i]
    out += '"%s" ranks #%d used %d times (%.05f%%)\n'%(
                d[1],i+1,d[0],100*d[0]/tot) # times 100 so the value is a percentage
f=open("dump.html",'w')
f.write(out)
f.close()
print("DONE")
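The same ranking can be sketched more compactly with collections.Counter. This is a rewrite for illustration rather than the script that produced the worddump; word_ranks is an invented name, and it splits on any whitespace rather than on single spaces.

```python
import re
from collections import Counter

def word_ranks(text, top=1000):
    """Return (word, count, percent) for the most common purely alphabetic words."""
    tokens = text.lower().split()
    words = [t for t in tokens if re.fullmatch(r"[a-z]+", t)]
    counts = Counter(words)
    # percent is measured against every token, alphabetic or not
    return [(w, n, 100.0 * n / len(tokens)) for w, n in counts.most_common(top)]

for rank, (word, n, pct) in enumerate(word_ranks("the cat the dog, the"), start=1):
    print('"%s" ranks #%d used %d times (%.2f%%)' % (word, rank, n, pct))
```

Tokens with punctuation attached (like "dog," above) are dropped rather than cleaned, which matches what the original loop does.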


Celebrity Dwarf Gouramis
Posted by
Scott January 29th, 2009 | 5,253 words | No Comments »

So I was reviewing my website statistics generated by a Python script I wrote when I noticed a peculiarity so bizarre that it made me question the very purpose of my life. Okay maybe it wasn’t that bizarre, but it was interesting. The Python script (which is automatically run every hour) downloads my latest access.log and saves it to its own folder. It then analyzes the data, creates some charts and graphs, and dumps out a bare-bones results file displaying some of the information I found useful. Of note is the number of times each page is hit.

This is where things get funny. Outperforming my home page by nearly double was indexOld.php (now indexOld22.php), a simple webpage I tossed up for about a year before I put my big blog back online! Why were people still going to this page? Further investigation (from the referring sites section of my stats page) revealed a lot of hits from Google image searches. I started looking at the actual requests and realized that many of these hits were people searching for the term Dwarf Gouramis (a type of freshwater aquarium fish), which was mentioned on that old webpage. The ironic part is that when you Google image-search for dwarf gouramis, one of the results is a picture of an extremely rare zebra pleco which is actually a link to my website! However, the link APPEARS to be to wallpaperfishtalk.com because on my page I just linked to their image.

My conclusion: People are Google image-searching for ‘dwarf gouramis’, and an amazing picture of a zebra pleco is coming up which links to my site (due to the fact that months ago I talked about dwarf gouramis but posted a photo of a zebra pleco) and people (in their awe at this amazing fish) are clicking on it. So what did I do? I pulled a bait-and-switch! You bet I did. Now when you go to indexOld2.php it just forwards you to my current website – mua ha ha ha ha

PS: I’m appending to this entry at 2:17pm to note that I made a wonderful breakthrough in the lab today. Due to intellectual property protection blah blah and the fact that I don’t want anyone else to beat me to my research goal I will not describe what this is, I’ll just say that it took months of preparation and today – presto! It worked beautifully =oD



Using PHP to Create Apache-Style Access.log
Posted by
Scott January 22nd, 2009 | 5,253 words | 2 Comments »


THIS CODE HAS BEEN UPDATED!

>>> CHECK OUT THE NEW CODE <<<
[Generate Apache-Style HTTP Access Logs via SQL and PHP]

OBSOLETE CODE IS BELOW…

My web server blocks access to my Apache-generated visitor logs (commonly stored in “access.log”). Therefore, many great site usage stats generators (such as awstats – see this example) cannot be used to analyze web traffic to my site. (How many people go to what pages? Where do they come from? What search phrases do they type into Google to find my website?) My web host does allow PHP, and access to php.ini, so I figured that I could generate my own access.log using PHP code. I succeeded, but had a hard time doing this because it’s not clearly documented elsewhere, so I’ll make it clear.

Sample line from access.log generated by my PHP script:
132.170.10.227 - - [22/Jan/2009:11:58:49 +0800] "GET /blog/2005-06-29-eva-05-attack-scotts-sanity/ HTTP/1.1" 200 - "http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&hs=8Lk&q=swharden+eva-05&btnG=Search" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5"
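To show how the pieces of such a line fit together, here is a minimal sketch of a parser for it. The regular expression and the name parse_line are my own illustration, not part of any log analyzer, and they assume straight double quotes and plain hyphens in the file; the sample line is a shortened version of the one above.

```python
import re

# One named group per field of a combined-format line (illustrative pattern).
COMBINED = re.compile(
    r'(?P<visitor>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<request>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

sample = ('132.170.10.227 - - [22/Jan/2009:11:58:49 +0800] '
          '"GET /blog/2005-06-29-eva-05-attack-scotts-sanity/ HTTP/1.1" 200 - '
          '"http://www.google.com/search?q=swharden+eva-05" "Mozilla/5.0"')
fields = parse_line(sample)
print(fields["visitor"], fields["time"], fields["request"])
```

The field order (visitor, ident, user, time, request line, status, size, referrer, agent) is what the log string in logme.php builds.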

All I had to do was insert the following line at the end of my php.ini file:

 auto_append_file = "/home/content/n/i/b/nibjb/html/logme.php"  
 

And I placed logme.php in my root folder with the following code:

 $logwriter_logformat = "combined"; // log format,combined or common  
 $logwriter_logdir = "/home/content/n/i/b/nibjb/html/logs/"; // physical path where your log file located  
 $logwriter_logfilename = "access.log"; // your log file's filename  
 $logwriter_timezone = "+0800"; // your server's time zone. +0800 means GMT+8  

 function logwriter_writelog($logstring){  

 global $logwriter_logdir,$logwriter_logfilename;  
 $fullpathfilename = $logwriter_logdir.$logwriter_logfilename;  

 if (!is_file($fullpathfilename)) {  
 print "Log file doesn't exist or file is corrupt.";  
 return;  
 }  

 if (!is_writeable($fullpathfilename)) {  
 print "Log file is not writable,please change its permission.";  
 return;  
 }  

 if($fp = @fopen($fullpathfilename, "a")) {  
 flock($fp, 2);  
 fputs($fp, $logstring);  
 fclose($fp);  
 }  
 }  

 function logwriter_handlevar($varname,$defaultvalue) {  
 $tempvar = getenv($varname);  
 if(!empty($tempvar)) {  
 return $tempvar;  
 } else {  
 return $defaultvalue;  
 }  
 }  

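 // $REMOTE_HOST is only defined when register_globals is on, so read it from the environment instead  
 $logwriter_remote_vistor = logwriter_handlevar("REMOTE_HOST","");  
 if (empty($logwriter_remote_vistor)) {  
 $logwriter_remote_vistor = logwriter_handlevar("REMOTE_ADDR","-");  
 }  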

 $logwriter_remote_ident = logwriter_handlevar("REMOTE_IDENT","-");  
 $logwriter_remote_user = logwriter_handlevar("REMOTE_USER","-");  
 $logwriter_date = date("d/M/Y:H:i:s");  

 $logwriter_server_port = logwriter_handlevar("SERVER_PORT","80");  
 if($logwriter_server_port!="80") {  
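 $logwriter_server_port = ":".$logwriter_server_port; // assumed completion: the right-hand side was missing from this listing  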
 }else{  
 $logwriter_server_port = "";  
 }  

 $logwriter_request_method = logwriter_handlevar("REQUEST_METHOD","GET");  
 $logwriter_request_uri = logwriter_handlevar("REQUEST_URI","");  
 $logwriter_server_protocol = logwriter_handlevar("SERVER_PROTOCOL","HTTP/1.1");  

 if ($logwriter_logformat=="common") {  
 $logwriter_logstring = "$logwriter_remote_vistor $logwriter_remote_ident $logwriter_remote_user [$logwriter_date $logwriter_timezone] \"$logwriter_request_method $logwriter_request_uri $logwriter_server_protocol\" 200 -\n";  
 }else{  

 $logwriter_http_referer = logwriter_handlevar("HTTP_REFERER","-");  
 $logwriter_http_user_agent = logwriter_handlevar("HTTP_USER_AGENT","");  

 $logwriter_logstring = "$logwriter_remote_vistor $logwriter_remote_ident $logwriter_remote_user [$logwriter_date $logwriter_timezone] \"$logwriter_request_method $logwriter_request_uri $logwriter_server_protocol\" 200 - \"$logwriter_http_referer\" \"$logwriter_http_user_agent\"\n";  

 }  

 logwriter_writelog($logwriter_logstring);  
 

Note that the PHP code must be wrapped in <?php ... ?> tags, as demonstrated here.

The result? As you can tell, my logme.php dumps data to www.swharden.com/logs/access.log. If you browse a few pages on my website, or even use Google to search for me (e.g., Google for ‘swharden’ and ‘minidisc’), you can see yourself in the logfile. Pretty cool, huh? Once I have a good volume of log data I’ll demonstrate how to turn it into useful information.
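
As a small taste of what can be pulled out of these lines, here is a rough sketch that splits one combined-format line into fields and extracts the search phrase from a Google referrer. The function names `parse_combined_log_line` and `search_phrase_from_referer` are made up for this illustration, and the regular expression assumes the exact layout logme.php writes (no embedded double quotes in the referrer or user agent); it is not a general-purpose log parser.

```php
<?php
// Hypothetical helper: split one combined-format log line into named fields.
// Returns null when the line does not match the expected layout.
function parse_combined_log_line($line)
{
    $pattern = '/^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\S+) (\S+) "([^"]*)" "([^"]*)"$/';
    if (!preg_match($pattern, trim($line), $m)) {
        return null;
    }
    return array(
        'host'     => $m[1],
        'ident'    => $m[2],
        'user'     => $m[3],
        'time'     => $m[4],
        'method'   => $m[5],
        'uri'      => $m[6],
        'protocol' => $m[7],
        'status'   => $m[8],
        'bytes'    => $m[9],
        'referer'  => $m[10],
        'agent'    => $m[11],
    );
}

// Hypothetical helper: return the "q" parameter of a referrer URL,
// or null when the referrer has no query string or no search phrase.
function search_phrase_from_referer($referer)
{
    $query = parse_url($referer, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }
    parse_str($query, $params); // also URL-decodes, so "+" becomes a space
    if (isset($params['q']) && is_string($params['q']) && $params['q'] !== '') {
        return $params['q'];
    }
    return null;
}

// The sample line shown earlier in this post.
$sample = '132.170.10.227 - - [22/Jan/2009:11:58:49 +0800] "GET /blog/2005-06-29-eva-05-attack-scotts-sanity/ HTTP/1.1" 200 - "http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&hs=8Lk&q=swharden+eva-05&btnG=Search" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5"';

$entry = parse_combined_log_line($sample);
echo $entry['uri'], "\n";                                   // requested page
echo search_phrase_from_referer($entry['referer']), "\n";   // search phrase
```

Looping this over every line of access.log and counting the phrases would give a crude "top searches" list, which is the kind of thing awstats does far more thoroughly.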



Enamored by a Past Life
Posted by
Scott November 29th, 2008 | 5,253 words | 4 Comments »

I realize and accept the fact that I’ve been talking about the same thing for the last several posts. I’ll mention it one more time, then let it go. I don’t know why I’m enamored by my past life – I guess I’m now realizing that I was “cool” in a way I never appreciated before. The irony of course is that this realization comes years after any coolness left. Now instead of fun times, computer jokes, and savvy programming projects, I’m stuck behind a laboratory bench performing monotonous research, studying for classes and exams I have absolutely zero interest in, and trying to stretch my imagination as far as I can to somehow pull my school and work to a level where I “need” to write software to accomplish it. It’s this constant tugging both toward, and away from, who I was several years ago. As I mentioned in the previous entry, I’m rebuilding both of my main PCs at my house (software-wise, at least). I rely on the No-IP DNS client to give a domain name to my dynamic home IP address. (swharden.sytes.net always points to my home IP address, which right now happens to be 97.104.81.110.) Although I don’t use my home system for serious web server purposes, I do connect to my home network from all over using SSH to access the Linux terminal. I also run a small web server and torrent server to help share things now and then. Anyway, this is why I posted…

Whenever I download the No-IP DNS client utility, I stop and check out my little contribution to the project. There, on the support page, under client configuration, you can find The Newbie’s Guide to the No-IP™ Linux Client which is a guide I volunteered to contribute to the company several years ago. It won some award for the best entry and I received 2 years of a free domain name of my choice. Since I had nothing to lose (heck, it was free) I registered ScottIsHot.com (the former home of this very blog). When was that? [searches posts] Well, my Wonderful Days blog entry from OCT 2003 mentions the blog as 4 months old, so I’d guess ScottIsHot.com started around July of 03. I think it took half a year from my tutorial submission to my prize, so let’s assume I wrote it in late 2002. That’s 6 years ago? I was about 17 I guess. It was probably a short time after I wrote the entry I described in the previous post. Hey, going back to something I mentioned earlier…

Why isn’t the SSH server installed and activated by default in new Ubuntu installations? Maybe it’s some kind of security thing, who knows. The point I’m trying to get at is that, if I hadn’t been told about SSH years ago when I first began my venture into open source operating systems while running a FreeBSD webserver, would I know about it now? Is SSH common knowledge? I use it multiple times daily – it’s critical to my needs! Since essentially everything in Linux can be accomplished from the console, the ability to connect to my home Linux PC’s console remotely from any other PC is incredibly valuable! I guess this is a message to anyone just starting out with Linux. Learn to use SSH. Oh, and screen. Very nice =o)

As a closing note I thought I’d post a screenshot found on the Gentoo Linux website demonstrating what the desktop of a Gentoo developer looks like. I noticed the wallpaper and, although it wasn’t too surprising, I still got a chuckle from it. I still love the feel of a desktop with totally transparent borderless terminal windows, and the speed and responsiveness you get from a cut-the-crap window manager like FluxBox. Hey, a random thought popped in my head just now. I wonder if it reveals my psyche more than I can describe? My thought (unedited for logic or embarrassment and as pure as I can reconstruct it) was this (and I’ll use the fancy quotes):

“I wonder what it would feel like knowing that one day I would be lucky enough to be an active member of the development team for such an important project as FluxBox – wait, I have teeth cleanings to look forward to instead…”

Yeah, I know there are some logical arguments. Dentistry isn’t necessarily all or nothing. Just because I would be working as a dentist (not necessarily only cleaning teeth though) doesn’t mean I’d have to forgo my desire to be a part of something important. I guess it just means that any significant contributions are unlikely. After investing time in my family and my career, I doubt I’d have enough free time to be actively involved in any kind of meaningful open source project. Ever. [keels over; dies promptly]

copyright © 2006 swharden@gmail.com