

Linear Data Smoothing in Python
900 words | Posted by Scott on November 17th, 2008
Scott was 23.15 years old when he wrote this!
Filed under: General, Python

Here’s a scrumptious morsel of juicy Python code for even the most stoic of scientists to get excited about. Granted, it’s a very simple concept and has surely been done countless times before, but I couldn’t find any good resources for this code on the internet. Since I had to write my own code to perform several kinds of linear smoothing on 1-dimensional arrays in Python, I decided it would be nice to share it. At the bottom of this post you can see a PNG image, which is the file output by the code listed even further below. If you copy/paste the code into an empty text file and run it in Python, it will generate the exact same PNG file (assuming you have the pylab and numpy libraries installed).

 ### This is the Gaussian data smoothing function I wrote ###
 import numpy

 def smoothListGaussian(list,degree=5):
     window=degree*2-1  # total number of points in the window
     weight=numpy.array([1.0]*window)
     weightGauss=[]
     for i in range(window):
         i=i-degree+1  # offset from the center of the window
         frac=i/float(window)
         gauss=1/(numpy.exp((4*(frac))**2))  # Gaussian weight, 1.0 at the center
         weightGauss.append(gauss)
     weight=numpy.array(weightGauss)*weight
     smoothed=[0.0]*(len(list)-window)  # output is shorter than the input by window points
     for i in range(len(smoothed)):
         smoothed[i]=sum(numpy.array(list[i:i+window])*weight)/sum(weight)  # weighted average
     return smoothed
 

Basically, you feed it a list (any length will do, as long as it has more points than the window, which is degree*2-1) and it will return a smoother version of the data. The returned list is shorter than the input by the size of the window. The Gaussian smoothing function I wrote is leagues better than a moving window average method, for reasons that are obvious when viewing the chart below. Surprisingly, the moving triangle method appears to be very similar to the Gaussian function at low degrees of spread. However, for huge numbers of data points (and larger degrees of spread), I expect the Gaussian function to perform better.
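For anyone who prefers to let numpy do the looping, here is a minimal sketch of the same Gaussian weighting done with numpy.convolve. The names gaussian_weights and smooth_gaussian_convolve are made up for this example and are not part of the code in this post. One difference to be aware of: mode="valid" returns one more point than smoothListGaussian does (len(data)-window+1 instead of len(data)-window), and the points the two have in common should agree.

```python
import numpy

def gaussian_weights(degree=5):
    """Same weights as smoothListGaussian: exp(-(4*offset/window)**2)."""
    window = degree * 2 - 1
    offsets = numpy.arange(window) - degree + 1  # -(degree-1) .. +(degree-1)
    return numpy.exp(-(4.0 * offsets / window) ** 2)

def smooth_gaussian_convolve(values, degree=5):
    """Hypothetical vectorized variant of the Gaussian smoother."""
    weights = gaussian_weights(degree)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    # mode="valid" keeps only positions where the window fully overlaps the data
    return numpy.convolve(values, weights, mode="valid")

# Same dummy data as the plotting script: 30 zeros with a single 1 in the middle
data = [0.0] * 30
data[15] = 1.0
smoothed = smooth_gaussian_convolve(data)
print(len(smoothed))  # 22 for the 30-point dummy data
```

Because the weights are symmetric, it makes no difference that convolution flips the kernel before sliding it over the data.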

 ### This is the code to produce the smooth.png image ###
 import pylab,numpy

 def smoothList(list,strippedXs=False,degree=10):
     # NOTE: Xs is never defined in this script, so leave strippedXs=False
     if strippedXs==True: return Xs[0:-(len(list)-(len(list)-degree+1))]
     smoothed=[0]*(len(list)-degree+1)
     for i in range(len(smoothed)):
         smoothed[i]=sum(list[i:i+degree])/float(degree)  # plain average of degree points
     return smoothed

 def smoothListTriangle(list,strippedXs=False,degree=5):  
     weight=[]  
     window=degree*2-1  
     smoothed=[0.0]*(len(list)-window)  
     for x in range(1,2*degree):weight.append(degree-abs(degree-x))  
     w=numpy.array(weight)  
     for i in range(len(smoothed)):  
         smoothed[i]=sum(numpy.array(list[i:i+window])*w)/float(sum(w))  
     return smoothed  

 def smoothListGaussian(list,strippedXs=False,degree=5):  
     window=degree*2-1  
     weight=numpy.array([1.0]*window)  
     weightGauss=[]  
     for i in range(window):  
         i=i-degree+1  
         frac=i/float(window)  
         gauss=1/(numpy.exp((4*(frac))**2))  
         weightGauss.append(gauss)  
     weight=numpy.array(weightGauss)*weight  
     smoothed=[0.0]*(len(list)-window)  
     for i in range(len(smoothed)):  
         smoothed[i]=sum(numpy.array(list[i:i+window])*weight)/sum(weight)  
     return smoothed  

 ### DUMMY DATA ###  
 data = [0]*30 #30 "0"s in a row  
 data[15]=1    #the middle one is "1"  

 ### PLOT DIFFERENT SMOOTHING FUNCTIONS ###  

 pylab.figure(figsize=(550/80,700/80))  
 pylab.suptitle('1D Data Smoothing', fontsize=16)  

 pylab.subplot(4,1,1)  
 p1=pylab.plot(data,".k")  
 p1=pylab.plot(data,"-k")  
 a=pylab.axis()  
 pylab.axis([a[0],a[1],-.1,1.1])  
 pylab.text(2,.8,"raw data",fontsize=14)  

 pylab.subplot(4,1,2)  
 p1=pylab.plot(smoothList(data),".k")  
 p1=pylab.plot(smoothList(data),"-k")  
 a=pylab.axis()  
 pylab.axis([a[0],a[1],-.1,.4])  
 pylab.text(2,.3,"moving window average",fontsize=14)  

 pylab.subplot(4,1,3)  
 p1=pylab.plot(smoothListTriangle(data),".k")  
 p1=pylab.plot(smoothListTriangle(data),"-k")  
 pylab.axis([a[0],a[1],-.1,.4])  
 pylab.text(2,.3,"moving triangle",fontsize=14)  

 pylab.subplot(4,1,4)  
 p1=pylab.plot(smoothListGaussian(data),".k")  
 p1=pylab.plot(smoothListGaussian(data),"-k")  
 pylab.axis([a[0],a[1],-.1,.4])  
 pylab.text(2,.3,"moving gaussian",fontsize=14)  

 #pylab.show()  
 pylab.savefig("smooth.png",dpi=80)  
 

Hey, I had a great idea: why don’t I test it on some of my own data? Because I don’t want the details of my thesis work getting out onto the internet too early, I can’t reveal exactly what this data is from. Suffice it to say that it’s the fractional density of neurite coverage in thick muscle tissue. Anyhow, this data is wild and in desperate need of some smoothing. Below is a visual representation of the differences between the smoothing methods. Yayness! I like the Gaussian function the best.

I should note that the degrees of window coverage for the moving window average, moving triangle, and Gaussian functions are 10, 5, and 5 respectively. Also note that (due to the handling of the “degree” variable between the different functions) the actual numbers of data points assessed in these three functions are 10, 9, and 9 respectively. The degree for the last two functions represents “spread” from each point (so the window is degree*2-1 points wide), whereas for the first one it represents the total number of points to be averaged for the moving average. Enjoy.
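To make those window sizes concrete, here is a small sketch that rebuilds the three sets of weights with the default degrees and prints how many points each one covers. The variable names are only for illustration; the formulas are the ones used in the functions above.

```python
import numpy

degree_avg, degree_tri, degree_gauss = 10, 5, 5

# Moving window average: degree is the total number of points, all weighted equally
avg_weights = numpy.ones(degree_avg)

# Moving triangle: degree is the spread, so the window is degree*2-1 points wide
tri_weights = numpy.array(
    [degree_tri - abs(degree_tri - x) for x in range(1, 2 * degree_tri)]
)

# Moving Gaussian: same window width as the triangle
window = degree_gauss * 2 - 1
gauss_weights = numpy.array(
    [numpy.exp(-(4.0 * (i - degree_gauss + 1) / window) ** 2) for i in range(window)]
)

print(len(avg_weights), len(tri_weights), len(gauss_weights))  # 10 9 9
print(tri_weights.tolist())  # [1, 2, 3, 4, 5, 4, 3, 2, 1]
```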





This entry was posted on Monday, November 17th, 2008 at 2:50 pm and is filed under General, Python.



8 Responses to “Linear Data Smoothing in Python”

FreeToGo wrote the following at 02:54:59 AM on May 25th, 2009

Judging from your 1-D smoothing result, I guess you need to offset your data by the same amount as your “degree”. Otherwise, the peak value after smoothing ends up at 10 rather than 15.

Scott wrote the following at 03:28:36 PM on May 28th, 2009

To correct for offset, buffer your data by degree/2. If the degree is 10, put 5 blank (or copied) data points in the beginning of the data set.

Saketh wrote the following at 04:32:10 PM on June 11th, 2009

Thanks for this! It’s very helpful.

Vittorio wrote the following at 08:40:59 AM on April 29th, 2010

Thanks a lot for the snippet !!!
Works fine, but the resulting list is shorter by about (degree*2) elements.
I think that padding would be useful…. but thanks again ;-)

Helena wrote the following at 02:00:09 AM on May 4th, 2010

Hi,

Thanks for the code. I’m stupid when it comes to programming. I don’t know how I can apply your code (for instance the smoothListGaussian) on a file containing two columns. I guess I don’t want to read in every column separately and then smooth every column…

For instance, how did you do it for your “own” data above?

Best regards,

Helena
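A possible way to handle a two-column file, offered only as a rough sketch: load the whole table with numpy.loadtxt and smooth each column in a loop. The helper name smooth_columns is invented for this example, the io.StringIO object merely stands in for a real data file, and the weights are the Gaussian ones from the post applied through numpy.convolve. It is an assumption that both columns should be smoothed; if the first column holds x values, you may prefer to trim it rather than smooth it.

```python
import io
import numpy

def smooth_columns(table, degree=5):
    """Hypothetical helper: Gaussian-weighted smoothing of every column of a 2-D array."""
    window = degree * 2 - 1
    offsets = numpy.arange(window) - degree + 1
    weights = numpy.exp(-(4.0 * offsets / window) ** 2)
    weights /= weights.sum()  # normalize so each output is a weighted average
    columns = [
        numpy.convolve(table[:, c], weights, mode="valid")
        for c in range(table.shape[1])
    ]
    return numpy.column_stack(columns)

# Stand-in for a real file with two whitespace-separated columns
# (replace fake_file with the path to your own data file)
fake_file = io.StringIO(
    "\n".join("%d %f" % (i, (i % 7) * 0.5) for i in range(40))
)
table = numpy.loadtxt(fake_file)
smoothed = smooth_columns(table)
print(table.shape, smoothed.shape)  # (40, 2) (32, 2)
```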

Vvn wrote the following at 02:30:33 AM on May 12th, 2010

Thanks for the article. It saved me time, and I wasn’t even aware of triangle and Gaussian smoothing…

Viven Rajendra wrote the following at 01:16:29 PM on May 15th, 2010

This should fix the padding/buffer issue….

 def smooth_list(list,degree=10):
     list = [list[0]]*((degree-1)//2) + list + [list[-1]]*(degree//2)  # // keeps the counts as integers
     …………….
     ……………
 def smooth_list_triangle(list,degree=10):
     list = [list[0]]*(degree-1) + list + [list[-1]]*degree
     …………….
     …………….
 def smooth_list_gaussian(list,degree=10):
     list = [list[0]]*(degree-1) + list + [list[-1]]*degree
     ……………..
     …………….
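The same padding idea can be written out in full with numpy. This is a sketch under assumptions, not the code from the post: smooth_triangle_padded is a made-up name, the edges are padded by repeating the first and last values (numpy.pad with mode="edge"), and the triangle weights from the post are applied with numpy.convolve. Since mode="valid" yields one more point than the loops in the post, padding degree-1 points on each side is enough to get back a result the same length as the input.

```python
import numpy

def smooth_triangle_padded(values, degree=5):
    """Hypothetical triangle smoother that returns as many points as it was given."""
    weights = numpy.array(
        [degree - abs(degree - x) for x in range(1, 2 * degree)], dtype=float
    )
    weights /= weights.sum()
    # Repeat the first and last values degree-1 times on each side
    padded = numpy.pad(numpy.asarray(values, dtype=float), degree - 1, mode="edge")
    return numpy.convolve(padded, weights, mode="valid")

# Same dummy data as the post: the peak should stay at index 15
data = [0.0] * 30
data[15] = 1.0
smoothed = smooth_triangle_padded(data)
print(len(smoothed), int(numpy.argmax(smoothed)))  # 30 15
```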

Anonymous wrote the following at 02:00:27 AM on August 23rd, 2010

You may find this interesting:

http://www.scipy.org/Cookbook/SignalSmooth





copyright © 2006 swharden@gmail.com