Archive for the 'My Website' Category
Integrated Writing
Posted by Scott February 17th, 2009 | 5,253 words | No Comments »
Scott was 23.40 years old when he wrote this!
While skimming an earlier post of mine I decided to try representing the size of my blog (measured as the number of words) as a curve integrated with respect to the posting date. I had to perform my moving-triangle smoothing method with a 20-day window to get it to come out nicely (to correct for skipped days, double posts, etc.) and I’m pleased with the result. Why did I do all of this? Because I can. Now, back to work. [sigh]
This raises the question: What was the late 2003 spike for? Well, I can’t say for sure, but I’d speculate this is the height of my geekdom. Just look at my blogs from December, 2003 – what do I talk about? Random life stuff (which, for me, mostly boiled down to network trouble and hardware projects). Curiously, the trace of my integrated blog size is a good (yet indirect) measure of my geekness. The last few years I’ve been relatively normal, but it appears I’m becoming more geeky again.
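For anyone curious what the smoothing step might look like, here is a minimal sketch. It assumes "moving-triangle smoothing" means a moving average with triangular weights, and that posts have already been reduced to a day offset and a word count. The function names and the input layout are made up for illustration and are not taken from the script that produced the chart.

```python
import numpy as np

def triangle_smooth(values, window=20):
    """Moving average with triangular weights (assumed meaning of
    'moving-triangle smoothing'); `values` should be longer than `window`."""
    kernel = np.bartlett(window + 2)[1:-1]   # drop the two zero endpoints
    kernel = kernel / kernel.sum()           # weights sum to 1, so totals are preserved
    return np.convolve(values, kernel, mode="same")

def daily_word_curve(day_offsets, word_counts, window=20):
    """Bin posts by day, smooth the daily counts, then integrate them."""
    per_day = np.zeros(max(day_offsets) + 1)
    for day, words in zip(day_offsets, word_counts):
        per_day[day] += words                # double posts add up, skipped days stay 0
    smoothed = triangle_smooth(per_day, window)
    return smoothed, np.cumsum(smoothed)     # running sum = integrated blog size

# Hypothetical data: two posts on day 30, nothing else for two months.
smoothed, total = daily_word_curve([0, 30, 30, 60], [0, 40, 60, 0])
print(len(smoothed), round(total[-1], 6))
```

Because the weights are normalized, the spike gets spread over neighboring days without changing the total number of words, which is why the integrated curve still ends at the right height.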
Analyzing my Writings with Python
Posted by Scott January 29th, 2009 | 5,253 words | 3 Comments »
Scott was 23.35 years old when he wrote this!
*I spent the day* in the lab with some random time on my hands between adding reagents to an ongoing immunohistochemical reaction I was performing. At one point I decided to further investigate the field of bioinformatics (is it worth seeking a PhD in this field if I don’t get into dental school again?). UCF offers a PhD in bioinformatics but it’s a new and small department (I think there are only 4 faculty). The degree itself is a degree in computer science (the logic side of computers, more programming than designing hardware). A degree in bioinformatics combines molecular biology (DNA, proteins, etc), computer science (programming), and statistics (developing code to analyze biological data). I feel a need to express what it is, because it’s not something that is commonly understood. Do you know what people who study bioinformatics do?
*I came across a paper* today “Structural Alignment of Pseudoknotted RNA”:http://cseweb.ucsd.edu/users/shzhang/app/RECOMB2005_pseudoknot.pdf (by Han B, Dost B, Bafna V, and Zhang S.) which is a good example of the practice of bioinformatics. Think about what goes on in a cell… the sequence of a gene (a short region of DNA) is copied (letter-by-letter) onto an RNA molecule. The RNA molecule is later read by an enzyme (called a ribosome) and converted into a protein based on its sequence. (This process is the central dogma of molecular biology.) Traditionally, it was believed that RNA molecules’ only function was to copy gene sequences from DNA to ribosomes, but recently (the last several years) it was discovered that some small RNA molecules are never read and turned into proteins, but rather serve their own unique functions! For example, some RNA molecules (siRNAs) can actually turn genes on and off, and have been associated with cancer development and other immune diseases. Given the human genome (the ~3 billion letter long sequence of all of our DNA), how can we determine what regions form these functional RNA molecules which don’t get converted into proteins? The paper I mentioned earlier addresses this. An algorithm was developed and used to test regions of DNA and predict their probability of forming small RNA molecules. Spikes in this trace (figure 7 of the paper) represent areas of the DNA which are likely to form these RNA molecules. (Is this useful? What if you were to compare these results between a normal person and someone with cancer?)
*After reading the article* I thought to myself “Hmmm… logically manipulating large amounts of linear data… why does this seem familiar?” Then I realized how similar my current programming projects are to this one. (see “my latest DIY ECG data”:http://www.swharden.com/blog/images/ecg_goodie.png posted a couple days ago)
Consider the trace (pictured, figure 7 in “Structural Alignment of Pseudoknotted RNA”:http://cseweb.ucsd.edu/users/shzhang/app/RECOMB2005_pseudoknot.pdf) of score (the likelihood that a region of DNA forms an RNA molecule), where peaks represent likely locations of RNA formation. Just generate the trace, determine the positions of the peaks, and you’re golden. How similar is this to the work I’ve been doing with my homemade ECG machine, where I perform signal analysis to eliminate electrical noise and then analyze the resulting trace to isolate and identify peaks corresponding to heartbeats?
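To make the "determine the positions of the peaks" step concrete, here is a rough sketch of threshold-based peak picking on a one-dimensional trace. It is only an illustration of the general idea. The function name, the threshold, and the sample numbers are all made up, and it is not the algorithm from the paper nor my ECG code.

```python
import numpy as np

def find_peaks_simple(trace, threshold):
    """Return indices of local maxima that rise above `threshold`."""
    trace = np.asarray(trace, dtype=float)
    middle = trace[1:-1]
    is_peak = (
        (middle > trace[:-2])      # higher than the left neighbor
        & (middle >= trace[2:])    # at least as high as the right neighbor
        & (middle > threshold)     # and tall enough to count
    )
    return np.nonzero(is_peak)[0] + 1   # +1 because `middle` starts at index 1

# Hypothetical score trace with two obvious spikes.
scores = [0, 1, 5, 1, 0, 2, 9, 2, 0, 1]
print(find_peaks_simple(scores, threshold=3).tolist())   # [2, 6]
```

The same function works whether the trace is a per-position RNA score or a filtered ECG signal, as long as the noise has been smoothed out first so that small wiggles do not register as peaks.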
*After reading* I shivered from mental-overload.
There are so many exciting Python projects in the field of bioinformatics that are just waiting for me to begin work on! I know I’m like a child sometimes, but hey it’s my personality. I get excited. It’s just that I get excited about tacky things these days. Anyway, I got the itch to write a string-analysis program. What does it do? It reads the content of my website (exported in the form of a SQL backup query generated by phpMyAdmin, pictured), splits it up by date, and allows for its analysis. Ultimately I want to track the usage of certain words (e.g., the inverse relationship between the words “girls” and “python”), but for now I wrote a script which plots the number of words I wrote. Observe the output.

*Pretty cool huh?* Check out all those spikes between 2004 and 2005! (previous figure) Not only are they numerous (meaning many posts), but they’re also high (meaning many words per post). As you can see by the top trace, the most significant contribution to my site occurred during this time. So, let’s zoom in on it! (next figure)

*And of course, the code to produce this…* (obviously you have to have a wordpress backup SQL file in the same folder – if you want mine let me know and I’ll email it to ya’)
import datetime, pylab, numpy

# Let's convert SQL-backups of my WordPress blog into charts! yay!
class blogChrono():
    baseUrl="http://www.SWHarden.com/blog"

    def __init__(self,fname):
        self.fname=fname
        self.posts=[]   # one dict per post
        self.dates=[]   # posting dates, kept in step with self.posts
        self.load()

    def load(self):
        print("loading [%s]..."%self.fname)
        f=open(self.fname)
        raw=f.readlines()
        f.close()
        for line in raw:
            # only keep complete INSERT statements for rows of type 'post'
            if "INSERT INTO" in line \
               and ';' in line[-2:-1] \
               and " 'post'," in line[-20:-1]:
                post={}
                # strip the SQL wrapper and make the values look like Python
                line=line.split("VALUES(",1)[1][:-3]
                line=line.replace(', NULL',', None')
                line=line.replace(", '',",", None,")
                line=line.replace("''","")
                # isolate the post body by counting fields from the right
                c= line.split(',',4)[4][::-1]
                c= c.split(" ,",21)
                text=c[-1]
                text=text[::-1]
                text=text[2:-1]
                text=text.replace('"""','###')
                line=line.replace(text,'blogtext')
                line=line.replace(', ,',', None,')
                line=eval("["+line+"]")
                if len(line[4])>len('blogtext'):
                    x=str(line[4].split(', '))[2:-2]
                    raw=str(line)
                    raw=raw.replace(line[4],x)
                    line=eval(raw)
                post["id"]=int(line[0])
                post["date"]=datetime.datetime.strptime(line[2],
                                                        "%Y-%m-%d %H:%M:%S")
                post["text"]=eval('"""'+text+' """')
                post["title"]=line[5]
                post["url"]=line[21]
                post["comm"]=int(line[25])
                post["words"]=post["text"].count(" ")
                self.dates.append(post["date"])
                self.posts.append(post)
        # put posts and dates in chronological order
        self.posts.sort(key=lambda post: post["date"])
        self.dates.sort()
        print("read %d posts!\n"%len(self.posts))
#d=blogChrono('sml.sql')
d=blogChrono('test.sql')

fig=pylab.figure(figsize=(7,5))
dates,lengths,words,ltot,wtot=[],[],[],[0],[0]
for post in d.posts:
    dates.append(post["date"])
    lengths.append(len(post["text"]))
    ltot.append(ltot[-1]+lengths[-1])
    words.append(post["words"])
    wtot.append(wtot[-1]+words[-1])
ltot,wtot=ltot[1:],wtot[1:]

pylab.subplot(211)
#pylab.plot(dates,numpy.array(ltot)/(10.0**6),label="letters")
pylab.plot(dates,numpy.array(wtot)/(10.0**3),label="words")
pylab.ylabel("Thousand")
pylab.title("Total Blogged Words")
pylab.grid(alpha=.2)
#pylab.legend()
fig.autofmt_xdate()

pylab.subplot(212,sharex=pylab.subplot(211))
pylab.bar(dates,numpy.array(words)/(10.0**3))
pylab.title("Words Per Entry")
pylab.ylabel("Thousand")
pylab.xlabel("Date")
pylab.grid(alpha=.2)
#pylab.axis([min(d.dates),max(d.dates),None,20])
fig.autofmt_xdate()

pylab.subplots_adjust(left=.1,bottom=.13,right=.98,top=.92,hspace=.25)
width=675
pylab.savefig('out.png',dpi=width/7.0)
pylab.show()
print("DONE")
*I wrote a Python script to analyze the word frequency* of the blogs on my website (extracted from a WordPress SQL backup file). “This is what I came up with”:http://swharden.com/little/worddump.html I then took my giant list over to “Wordle”:http://www.wordle.net/create and had it create a super-cool little word jumble. Neat, huh? Here’s “a picture”:http://www.SWHarden.com/blog/images/wordie2.png that’s cool but not worth posting.

*This is the script to make the worddump:*
f=open('dump.txt')
body=f.read()
f.close()
body=body.lower()
body=body.split(" ")
tot=float(len(body))

# count every token made up purely of the letters a-z
words={}
for word in body:
    for i in word:
        if 65<=ord(i)<=90 or 97<=ord(i)<=122: pass
        else: word=None
    if word:
        if not word in words:words[word]=0
        words[word]=words[word]+1

# sort by count, most frequent first
data=[]
for word in words: data.append([words[word],word])
data.sort()
data.reverse()

out= "Out of %d words...\n"%tot
for i in range(min(1000,len(data))):
    d=data[i]
    out += '"%s" ranks #%d used %d times (%.05f%%)\n'% \
           (d[1],i+1,d[0],100*d[0]/tot)
f=open("dump.html",'w')
f.write(out)
f.close()
print("DONE")
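For comparison, the same counting idea can be written much more compactly with the standard library. This is just a sketch of an alternative, not the script that generated the worddump. The function name is made up, and unlike the script above it computes percentages against the letters-only words rather than against every whitespace-separated token.

```python
from collections import Counter

def word_frequencies(text, top=10):
    """Return (word, count, percent) tuples for the most common words."""
    # keep only tokens made of plain ASCII letters, like the script above
    words = [w for w in text.lower().split() if w.isalpha() and w.isascii()]
    counts = Counter(words)
    total = len(words)
    return [(word, n, 100.0 * n / total) for word, n in counts.most_common(top)]

for word, n, pct in word_frequencies("The cat and the hat and the bat", top=2):
    print('"%s" used %d times (%.2f%%)' % (word, n, pct))
```

Counter.most_common does the sort-and-reverse step in one call, and asking for more words than exist simply returns all of them.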
Celebrity Dwarf Gouramis
Posted by Scott January 29th, 2009 | 5,253 words | No Comments »
Scott was 26.65 years old when he wrote this!
So I was reviewing my website statistics generated by a Python script I wrote when I noticed a peculiarity so bizarre that it made me question the very purpose of my life. Okay maybe it wasn’t that bizarre, but it was interesting. The python script (which is automatically run every hour) downloads my latest access.log and saves it to its own folder. It then analyzes the data, creates some charts and graphs, and dumps out a bare-bones results file displaying some of the information I found useful. Of note is the number of times each page is hit.
This is where things get funny. Outperforming my home page by nearly double was indexOld.php (now indexOld22.php) – a simple webpage I tossed up for about a year before I put my big blog back online! Why were people still going to this page? Further investigation (from the referring sites section of my stats page) revealed a lot of hits from Google image searches. I started looking at the actual requests and realized that many of these hits were people searching for the term Dwarf Gouramis (a type of freshwater aquarium fish), which was mentioned on that old webpage. The ironic part is that when you Google image-search for dwarf gouramis, one of the results is a picture of an extremely rare zebra pleco which is actually a link to my website! However, the link APPEARS to be to wallpaperfishtalk.com because on my page I just linked to their image.
My conclusion: People are Google image-searching for ‘dwarf gouramis’, and an amazing picture of a zebra pleco is coming up which links to my site (due to the fact that months ago I talked about dwarf gouramis but posted a photo of a zebra pleco) and people (in their awe at this amazing fish) are clicking on it. So what did I do? I pulled a bait-and-switch! You bet I did. Now when you go to indexOld2.php it just forwards you to my current website – mua ha ha ha ha
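The referrer digging described above can be sketched in a few lines. This is a simplified illustration and not my actual stats script. It assumes the referrer URLs have already been pulled out of access.log and that the search phrase sits in a "q" query parameter. The function name and the example URLs are made up.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

def search_terms(referrers):
    """Tally the search phrases found in Google referrer URLs."""
    counts = Counter()
    for ref in referrers:
        parsed = urlparse(ref)
        if "google." not in parsed.netloc:
            continue                      # not a Google referrer (or just "-")
        for query in parse_qs(parsed.query).get("q", []):
            counts[query.lower()] += 1    # parse_qs already decodes + and %20
    return counts

# Hypothetical referrers, for illustration only.
refs = [
    "http://images.google.com/images?q=dwarf+gouramis&hl=en",
    "http://www.google.com/search?hl=en&q=Dwarf%20Gouramis",
    "http://example.com/?q=ignored",
    "-",
]
print(search_terms(refs).most_common(1))
```

Sorting the tally by count is what makes an oddity like the dwarf gourami traffic jump out.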
PS: I’m appending to this entry at 2:17pm to note that I made a wonderful breakthrough in the lab today. Due to intellectual property protection blah blah and the fact that I don’t want anyone else to beat me to my research goal I will not describe what this is, I’ll just say that it took months of preparation and today – presto! It worked beautifully =oD

Using PHP to Create Apache-Style Access.log
Posted by Scott January 22nd, 2009 | 5,253 words | 2 Comments »
Scott was 23.33 years old when he wrote this!
My web server blocks access to my apache-generated visitor logs (commonly stored in “access.log”). Therefore, many great site usage stats generators (such as awstats – see this example) cannot be used to analyze web traffic to my site. (How many people go to what pages? Where do they come from? What search phrases do they type into Google to find my website?) My web host does allow PHP, and access to php.ini, so I figured that I could generate my own access.log using PHP code. I succeeded, but had a hard time doing this because it’s not clearly documented elsewhere – so I’ll make it clear.
Sample line from access.log generated by my PHP script:
132.170.10.227 - - [22/Jan/2009:11:58:49 +0800] "GET /blog/2005-06-29-eva-05-attack-scotts-sanity/ HTTP/1.1" 200 - "http://www.google.com/search?hl=en&client=firefox-a&rls=org.mozilla%3Aen-US%3Aofficial&hs=8Lk&q=swharden+eva-05&btnG=Search" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5"
All I had to do was insert the following line at the end of my php.ini file:
auto_append_file = "/home/content/n/i/b/nibjb/html/logme.php"
And I placed logme.php in my root folder with the following code:
$logwriter_logformat = "combined"; // log format: "combined" or "common"
$logwriter_logdir = "/home/content/n/i/b/nibjb/html/logs/"; // physical path where your log file is located
$logwriter_logfilename = "access.log"; // your log file's filename
$logwriter_timezone = "+0800"; // your server's time zone. +0800 means GMT+8

// Append one line to the log file, holding an exclusive lock while writing.
function logwriter_writelog($logstring){
    global $logwriter_logdir,$logwriter_logfilename;
    $fullpathfilename = $logwriter_logdir.$logwriter_logfilename;
    if (!is_file($fullpathfilename)) {
        print "Log file doesn't exist or file is corrupt.";
        return;
    }
    if (!is_writeable($fullpathfilename)) {
        print "Log file is not writable, please change its permission.";
        return;
    }
    if($fp = @fopen($fullpathfilename, "a")) {
        flock($fp, 2); // 2 = LOCK_EX (exclusive lock)
        fputs($fp, $logstring);
        fclose($fp);
    }
}

// Read an environment variable, falling back to a default when it is empty.
function logwriter_handlevar($varname,$defaultvalue) {
    $tempvar = getenv($varname);
    if(!empty($tempvar)) {
        return $tempvar;
    } else {
        return $defaultvalue;
    }
}

if (!empty($REMOTE_HOST)) {
    $logwriter_remote_vistor = $REMOTE_HOST;
}else{
    $logwriter_remote_vistor = logwriter_handlevar("REMOTE_ADDR","-");
}
$logwriter_remote_ident = logwriter_handlevar("REMOTE_IDENT","-");
$logwriter_remote_user = logwriter_handlevar("REMOTE_USER","-");
$logwriter_date = date("d/M/Y:H:i:s");
$logwriter_server_port = logwriter_handlevar("SERVER_PORT","80");
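if($logwriter_server_port!="80") {
    // The right-hand side of this assignment is missing from the listing;
    // a ":port" suffix is assumed here so that the code parses.
    $logwriter_server_port = ":" . $logwriter_server_port;
}else{
    $logwriter_server_port = "";
}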
$logwriter_request_method = logwriter_handlevar("REQUEST_METHOD","GET");
$logwriter_request_uri = logwriter_handlevar("REQUEST_URI","");
$logwriter_server_protocol = logwriter_handlevar("SERVER_PROTOCOL","HTTP/1.1");
if ($logwriter_logformat=="common") {
    $logwriter_logstring = "$logwriter_remote_vistor $logwriter_remote_ident $logwriter_remote_user [$logwriter_date $logwriter_timezone] \"$logwriter_request_method $logwriter_request_uri $logwriter_server_protocol\" 200 -\n";
}else{
    $logwriter_http_referer = logwriter_handlevar("HTTP_REFERER","-");
    $logwriter_http_user_agent = logwriter_handlevar("HTTP_USER_AGENT","");
    $logwriter_logstring = "$logwriter_remote_vistor $logwriter_remote_ident $logwriter_remote_user [$logwriter_date $logwriter_timezone] \"$logwriter_request_method $logwriter_request_uri $logwriter_server_protocol\" 200 - \"$logwriter_http_referer\" \"$logwriter_http_user_agent\"\n";
}
logwriter_writelog($logwriter_logstring);
Note that the PHP code must be surrounded with <?php ?> as demonstrated here
The result? As you can tell, my logme.php dumps data to www.swharden.com/logs/access.log – if you browse a few pages on my website, or even use Google to search for me (e.g., google for ‘swharden’ and ‘minidisc’) you can see yourself in the logfile – pretty cool huh? Once I have a good volume of log data I’ll demonstrate how to turn it into useful information.
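As a small taste of turning the log into information, here is a hedged sketch of parsing one line of the log in Python. It assumes the file follows the common or combined layout shown in the sample line above. The regular expression, the field names, and the function name are my own choices for illustration and not part of any existing tool.

```python
import re
from collections import Counter

# Pattern for the "common" layout, with the two extra "combined" fields optional.
LOG_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None

def hits_per_page(lines):
    """Count how many times each requested path appears."""
    parsed = (parse_line(line) for line in lines)
    return Counter(entry["path"] for entry in parsed if entry)

sample = ('132.170.10.227 - - [22/Jan/2009:11:58:49 +0800] '
          '"GET /blog/2005-06-29-eva-05-attack-scotts-sanity/ HTTP/1.1" 200 - '
          '"http://www.google.com/search?q=swharden+eva-05" "Mozilla/5.0"')
print(parse_line(sample)["path"])
```

Lines that fail to match come back as None and are skipped by hits_per_page, so a few malformed entries will not stop the count.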
Enamored by a Past Life
Posted by Scott November 29th, 2008 | 5,253 words | 4 Comments »
Scott was 23.18 years old when he wrote this!
I realize and accept the fact that I’ve been talking about the same thing the last several posts. I’ll mention it one more time, then let it go. I don’t know why I’m enamored by my past life – I guess I’m now realizing that I was “cool” in a way I never realized before.
The irony of course is that this realization comes years after any coolness left. Now instead of fun times, computer jokes, and savvy programming projects, I’m stuck behind a laboratory bench performing monotonous research, studying for classes and exams I have absolutely zero interest in, and trying to stretch my imagination as far as I can to somehow pull my school and work to a level where I “need” to write software to accomplish it. It’s this constant tugging both toward, and away from, who I was several years ago. As I mentioned in the previous entry, I’m rebuilding both of my main PCs at my house (software-wise, at least). I rely on the no-ip DNS assistance client to give a domain name to my dynamic home IP address. (swharden.sytes.net always points to my home IP address, which right now happens to be 97.104.81.110) Although I don’t use my home system for serious web server purposes, I do connect to my home network from all over using SSH to access the linux terminal. I also run a small web server and torrent server to help share things now and then. Anyway, this is why I posted…
Whenever I download the No-IP DNS client utility, I stop and check out my little contribution to the project. There, on the support page, under client configuration, you can find The Newbie’s Guide to the No-IP™ Linux Client which is a guide I volunteered to contribute to the company several years ago. It won some award for the best entry and I received 2 years of a free domain name of my choice. Since I had nothing to lose (heck, it was free) I registered ScottIsHot.com (the former home of this very blog). When was that? [searches posts] Well, my Wonderful Days blog entry from OCT 2003 mentions the blog as 4 months old, so I’d guess ScottIsHot.com started around July of 03. I think it took half a year from my tutorial submission to my prize, so let’s assume I wrote it in late 2002. That’s 6 years ago? I was about 17 I guess. It was probably a short time after I wrote the entry I described in the previous post. Hey, going back to something I mentioned earlier…
Why isn’t the SSH server installed and activated by default in new Ubuntu installations? Maybe it’s some kind of security thing, who knows. The point I’m trying to get at is that, if I hadn’t been told about ssh years ago when I first began my venture into open source operating systems while running a FreeBSD webserver, would I have known about it now? Is SSH common knowledge?
I use it multiple times daily – it’s critical to my needs! Since essentially everything in linux can be accomplished by the console, the ability to connect to my home linux PC’s console remotely from any other PC is incredibly valuable! I guess this is a message to anyone just starting out with linux. Learn to use SSH. Oh, and screen. Very nice =o)
As a closing note I thought I’d post a screenshot found on the Gentoo Linux website demonstrating what the desktop of a gentoo developer looks like. I noticed the wallpaper and, although it wasn’t too surprising, I still got a chuckle from it. I still love the feel of a desktop with totally transparent borderless terminal windows, and the speed and responsiveness you get from a cut-the-crap window manager like FluxBox. Hey, a random thought popped in my head just now. I wonder if it reveals my psyche more than I can describe? My thought (unedited for logic or embarrassment and as pure as I can reconstruct it) was this (and I’ll use the fancy quotes):
“I wonder what it would feel like knowing that one day I would be lucky enough to be an active member of the development team for such an important project as FluxBox – wait, I have teeth cleanings to look forward to instead…”
Yeah, I know there are some logical arguments. Dentistry isn’t necessarily all or nothing. Just because I would be working as a dentist (not necessarily only cleaning teeth though) doesn’t mean I’d have to forgo my desire to be a part of something important. I guess it just means that any significant contributions are unlikely. After investing time in my family and my career, I doubt I’d have enough free time to be actively involved in any kind of meaningful open source project. Ever. [keels over; dies promptly]