Linear Data Smoothing in Python

Here’s a scrumptious morsel of juicy Python code for even the most stoic of scientists to get excited about. Granted, it’s a very simple concept and has surely been done countless times before, but I couldn’t find any good resources for this code on the internet. Since I had to write my own code to perform several kinds of linear smoothing on 1-dimensional arrays in Python, I decided it would be nice to share it. This post includes a PNG image, which is the file output by the full code listing further below. If you copy/paste the code into an empty text file and run it in Python, it will generate the exact same PNG file (assuming you have the pylab and numpy libraries installed).

  

 ### This is the Gaussian data smoothing function I wrote ###
 import numpy

 def smoothListGaussian(list,degree=5):
     window=degree*2-1                      # number of points in each window
     weight=numpy.array([1.0]*window)
     weightGauss=[]
     for i in range(window):
         i=i-degree+1                       # offset from the centre of the window
         frac=i/float(window)
         gauss=1/(numpy.exp((4*(frac))**2)) # same as exp(-(4*frac)**2)
         weightGauss.append(gauss)
     weight=numpy.array(weightGauss)*weight
     smoothed=[0.0]*(len(list)-window)      # the output is shorter than the input
     for i in range(len(smoothed)):
         # weighted average of the window that starts at point i
         smoothed[i]=sum(numpy.array(list[i:i+window])*weight)/sum(weight)
     return smoothed

 

Basically, you feed it a list (of any length, as long as it’s longer than the smoothing window) and it will return a smoother, slightly shorter version of the data. The Gaussian smoothing function I wrote is leagues better than a moving window average method, for reasons that are obvious when viewing the chart below: the moving average flattens a single spike into a box-shaped plateau, while the Gaussian keeps a rounded peak. Surprisingly, the moving triangle method appears to be very similar to the Gaussian function at low degrees of spread. However, for huge numbers of data points, the Gaussian function should perform better.
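If the Python loop gets slow on long lists, the same weighted window can be handed to `numpy.convolve`. This is only a sketch, not a drop-in replacement: the names `gaussian_weights` and `smooth_gaussian_convolve` are made up for this example, and it assumes the same weight formula as `smoothListGaussian`. Because the weights are symmetric, convolving with them gives the same weighted averages as sliding the window along the data.

```python
import numpy

def gaussian_weights(degree=5):
    """Weights with the same formula as smoothListGaussian: exp(-(4*offset/window)**2)."""
    window = degree * 2 - 1
    offsets = numpy.arange(window) - degree + 1   # -(degree-1) ... +(degree-1)
    return numpy.exp(-(4.0 * offsets / window) ** 2)

def smooth_gaussian_convolve(values, degree=5):
    """Hypothetical vectorized variant; keeps only the full windows (mode='valid')."""
    weights = gaussian_weights(degree)
    kernel = weights / weights.sum()              # normalize so the weights sum to 1
    return numpy.convolve(numpy.asarray(values, dtype=float), kernel, mode="valid")

data = [0] * 30
data[15] = 1
smoothed = smooth_gaussian_convolve(data)
print(len(smoothed))      # 22
print(smoothed.argmax())  # 11
```

Note that `mode="valid"` returns `len(values)-window+1` points, which is one more than the loop version returns (22 instead of 21 for the 30-point dummy data).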

  

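 ### This is the code to produce the image displayed above ###
 import pylab,numpy

 def smoothList(list,strippedXs=False,degree=10):
     # with strippedXs=True, treat the input as x values and trim it to the smoothed length
     # (the original line read from an undefined variable, Xs)
     if strippedXs==True: return list[0:len(list)-degree+1]
     smoothed=[0]*(len(list)-degree+1)
     for i in range(len(smoothed)):
         smoothed[i]=sum(list[i:i+degree])/float(degree)  # plain mean of `degree` points
     return smoothed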

   

 def smoothListTriangle(list,strippedXs=False,degree=5):
     # strippedXs is accepted but not used in this function
     weight=[]
     window=degree*2-1
     smoothed=[0.0]*(len(list)-window)
     # weights rise to `degree` at the centre and fall again: 1,2,...,degree,...,2,1
     for x in range(1,2*degree):weight.append(degree-abs(degree-x))
     w=numpy.array(weight)
     for i in range(len(smoothed)):
         smoothed[i]=sum(numpy.array(list[i:i+window])*w)/float(sum(w))
     return smoothed

   

 def smoothListGaussian(list,strippedXs=False,degree=5):
     # strippedXs is accepted but not used in this function
     window=degree*2-1
     weight=numpy.array([1.0]*window)
     weightGauss=[]
     for i in range(window):
         i=i-degree+1                       # offset from the centre of the window
         frac=i/float(window)
         gauss=1/(numpy.exp((4*(frac))**2)) # same as exp(-(4*frac)**2)
         weightGauss.append(gauss)
     weight=numpy.array(weightGauss)*weight
     smoothed=[0.0]*(len(list)-window)
     for i in range(len(smoothed)):
         smoothed[i]=sum(numpy.array(list[i:i+window])*weight)/sum(weight)
     return smoothed

   

 ### DUMMY DATA ###
 data = [0]*30 #30 "0"s in a row
 data[15]=1    #a single "1" near the middle

 ### PLOT DIFFERENT SMOOTHING FUNCTIONS ###
 pylab.figure(figsize=(550/80.0,700/80.0))  # 550x700 pixels at 80 dpi
 pylab.suptitle('1D Data Smoothing', fontsize=16)

 pylab.subplot(4,1,1)
 p1=pylab.plot(data,".k")
 p1=pylab.plot(data,"-k")
 a=pylab.axis()
 pylab.axis([a[0],a[1],-.1,1.1])
 pylab.text(2,.8,"raw data",fontsize=14)

 pylab.subplot(4,1,2)
 p1=pylab.plot(smoothList(data),".k")
 p1=pylab.plot(smoothList(data),"-k")
 a=pylab.axis()
 pylab.axis([a[0],a[1],-.1,.4])
 pylab.text(2,.3,"moving window average",fontsize=14)

 pylab.subplot(4,1,3)
 p1=pylab.plot(smoothListTriangle(data),".k")
 p1=pylab.plot(smoothListTriangle(data),"-k")
 pylab.axis([a[0],a[1],-.1,.4])  # reuse the x-range from the moving-average panel
 pylab.text(2,.3,"moving triangle",fontsize=14)

 pylab.subplot(4,1,4)
 p1=pylab.plot(smoothListGaussian(data),".k")
 p1=pylab.plot(smoothListGaussian(data),"-k")
 pylab.axis([a[0],a[1],-.1,.4])  # reuse the x-range from the moving-average panel
 pylab.text(2,.3,"moving gaussian",fontsize=14)

 #pylab.show()
 pylab.savefig("smooth.png",dpi=80)
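One thing the listing glosses over: every smoother returns fewer points than it was given, and point `i` of the output is the average of a window that starts at input point `i`. Plotted against its own index, the smoothed curve therefore sits a little to the left of the raw data. The `strippedXs` argument hints at trimming the x values to match. Here is one way that could look, as a rough sketch: `centered_xs` is a made-up helper (not part of the code above), it assumes an odd window width, and `smooth_triangle` just repeats the triangle weighting so the example runs on its own.

```python
import numpy

def smooth_triangle(values, degree=5):
    """Same weighting and output length as smoothListTriangle in the listing."""
    window = degree * 2 - 1
    w = numpy.array([degree - abs(degree - x) for x in range(1, 2 * degree)], dtype=float)
    return [float(numpy.dot(values[i:i + window], w) / w.sum())
            for i in range(len(values) - window)]

def centered_xs(xs, window, n_points):
    """Hypothetical helper: the x value under the centre of each window (odd window assumed)."""
    half = (window - 1) // 2
    return list(xs[half:half + n_points])

data = [0] * 30
data[15] = 1
xs = list(range(len(data)))

smoothed = smooth_triangle(data, degree=5)
cx = centered_xs(xs, window=9, n_points=len(smoothed))

peak = smoothed.index(max(smoothed))
print(peak)      # 11, the index within the smoothed list
print(cx[peak])  # 15, the x position of the original spike
```

Plotting `smoothed` against `cx` instead of against its own index should put the smoothed peak back under the raw spike.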

 

Hey, I had a great idea: why don’t I test it on some of my own data? Because I don’t want the details of my thesis work getting out onto the internet too early, I can’t reveal exactly what this data is from. It will suffice to say that it’s the fractional density of neurite coverage in thick muscle tissue. Anyhow, this data is wild and in desperate need of some smoothing. Below is a visual representation of the differences between the smoothing methods. Yayness! I like the Gaussian function the best.

I should note that the degrees of window coverage for the moving window average, moving triangle, and Gaussian functions are 10, 5, and 5 respectively. Also note that (because the “degree” variable is handled differently in the different functions) the actual numbers of data points assessed by these three functions are 10, 9, and 9 respectively. For the last two functions, the degree represents the “spread” from each point, giving a window of 2*degree-1 points, whereas for the first one it is the total number of points averaged. Enjoy.
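To make the 10, 9, and 9 concrete, here is a small sketch that builds the weights each method applies to a single window. The function `window_weights` and its method labels are made up for this example; the formulas are the ones used in the three smoothing functions above.

```python
import numpy

def window_weights(method, degree):
    """Hypothetical helper: the per-point weights each smoother applies to one window."""
    if method == "average":
        return numpy.ones(degree)                     # `degree` equal weights
    window = 2 * degree - 1                           # spread of degree-1 points on each side
    offsets = numpy.arange(window) - (degree - 1)     # distance from the window centre
    if method == "triangle":
        return degree - numpy.abs(offsets)            # 1, 2, ..., degree, ..., 2, 1
    if method == "gaussian":
        return numpy.exp(-(4.0 * offsets / window) ** 2)
    raise ValueError("unknown method: %s" % method)

for method, degree in [("average", 10), ("triangle", 5), ("gaussian", 5)]:
    print(method, degree, len(window_weights(method, degree)))
# average 10 10
# triangle 5 9
# gaussian 5 9
```

So a degree of 5 for the triangle or Gaussian smoother covers about as many points as a degree of 10 for the plain moving average, which is why those settings are compared against each other here.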


     

Free Damask Seamless Tiling Backgrounds

If you’re in the mood for some 18th-century textile patterns, you’ve stumbled upon the right place! Surprisingly, it’s incredibly difficult to find functional (seamless, tiling, free) damask-style patterns on the internet. If you don’t believe me, just do a Google image search for them! It took me over an hour to find a functional pattern that tiled properly. Actually, to correct myself there, the image I downloaded didn’t even tile correctly!!! I had to manually modify it to make it seamless. So, free for all website makers, webmasters, wallpaper collectors, and Louis XVI enthusiasts: I give you a plethora of different colors of damask-style tiling backgrounds for whatever you want to do with them!