Linear Data Smoothing in Python
900 words | Posted by Scott on November 17th, 2008
Filed under: General, Python
Here’s a scrumptious morsel of juicy Python code for even the most stoic of scientists to get excited about. Granted, it’s a very simple concept and has surely been done countless times before, but I couldn’t find any good resources for this code on the internet. Since I had to write my own code to perform several kinds of linear smoothing on 1-dimensional arrays in Python, I decided it would be nice to share it. At the bottom of this post you can see a PNG image, which is the file output by the code listed further below. If you copy/paste the code into an empty text file and run it in Python, it will generate the exact same PNG file (assuming you have the pylab and numpy libraries configured).
### This is the Gaussian data smoothing function I wrote ###
import numpy

def smoothListGaussian(list,degree=5):
    window=degree*2-1                       # points per window: center plus degree-1 on each side
    weight=numpy.array([1.0]*window)
    weightGauss=[]
    for i in range(window):
        i=i-degree+1                        # offset from the window center
        frac=i/float(window)
        gauss=1/(numpy.exp((4*(frac))**2))  # Gaussian weight, 1.0 at the center
        weightGauss.append(gauss)
    weight=numpy.array(weightGauss)*weight
    smoothed=[0.0]*(len(list)-window)       # output is shorter than the input by window points
    for i in range(len(smoothed)):
        smoothed[i]=sum(numpy.array(list[i:i+window])*weight)/sum(weight)
    return smoothed
Basically, you feed it a list and it returns a smoother version of the data. The list can be any length as long as it is longer than the window (degree*2-1 points, so 9 for the default degree of 5), and the returned list is shorter than the input by that many points. The Gaussian smoothing function I wrote is leagues better than a moving window average method, for reasons that are obvious when viewing the chart below. Surprisingly, the moving triangle method appears to be very similar to the Gaussian function at low degrees of spread. However, for huge numbers of data points, the Gaussian function should perform better.
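The per-point loop in the function above can also be written as a convolution. The following is a minimal sketch, not the original implementation: it assumes NumPy's `numpy.convolve` with `mode="valid"` and an input at least as long as the window, and the names `gaussian_weights` and `smooth_gaussian_convolve` are hypothetical. It reuses the same weight formula, but because `mode="valid"` keeps every full window it returns one more point than the loop version.

```python
import numpy as np


def gaussian_weights(degree=5):
    """Same weights as the loop version: exp(-(4*i/window)**2) per offset i."""
    window = degree * 2 - 1
    offsets = np.arange(window) - degree + 1  # -(degree-1) .. +(degree-1)
    frac = offsets / float(window)
    return 1.0 / np.exp((4 * frac) ** 2)


def smooth_gaussian_convolve(values, degree=5):
    """Weighted moving average over every full window (hypothetical helper)."""
    weights = gaussian_weights(degree)
    kernel = weights / weights.sum()  # normalize so the weights sum to 1
    # The kernel is symmetric, so convolution equals a sliding weighted sum.
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")


# Same dummy data as in the post: 30 zeros with a single 1 at index 15
data = [0] * 30
data[15] = 1
smoothed = smooth_gaussian_convolve(data)
print(len(smoothed))  # 22: len(data) - window + 1
```

For long inputs this avoids building a new array for every output point, which is where the loop version spends most of its time.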
### This is the code to produce the image displayed above ###
import pylab,numpy

def smoothList(list,strippedXs=False,degree=10):
    # Moving window average; degree is the total number of points averaged.
    # Assumed intent: with strippedXs=True, return the input (e.g. x-values)
    # trimmed to the length of the smoothed output.
    if strippedXs==True: return list[0:len(list)-degree+1]
    smoothed=[0]*(len(list)-degree+1)
    for i in range(len(smoothed)):
        smoothed[i]=sum(list[i:i+degree])/float(degree)
    return smoothed

def smoothListTriangle(list,strippedXs=False,degree=5):
    # Triangle-weighted average over a window of degree*2-1 points.
    weight=[]
    window=degree*2-1
    smoothed=[0.0]*(len(list)-window)
    for x in range(1,2*degree):
        weight.append(degree-abs(degree-x))  # 1,2,...,degree,...,2,1
    w=numpy.array(weight)
    for i in range(len(smoothed)):
        smoothed[i]=sum(numpy.array(list[i:i+window])*w)/float(sum(w))
    return smoothed

def smoothListGaussian(list,strippedXs=False,degree=5):
    # Gaussian-weighted average over a window of degree*2-1 points.
    window=degree*2-1
    weight=numpy.array([1.0]*window)
    weightGauss=[]
    for i in range(window):
        i=i-degree+1                        # offset from the window center
        frac=i/float(window)
        gauss=1/(numpy.exp((4*(frac))**2))  # Gaussian weight, 1.0 at the center
        weightGauss.append(gauss)
    weight=numpy.array(weightGauss)*weight
    smoothed=[0.0]*(len(list)-window)
    for i in range(len(smoothed)):
        smoothed[i]=sum(numpy.array(list[i:i+window])*weight)/sum(weight)
    return smoothed
### DUMMY DATA ###
data = [0]*30 #30 "0"s in a row
data[15]=1 #the middle one is "1"
### PLOT DIFFERENT SMOOTHING FUNCTIONS ###
pylab.figure(figsize=(550/80,700/80))
pylab.suptitle('1D Data Smoothing', fontsize=16)
pylab.subplot(4,1,1)
p1=pylab.plot(data,".k")
p1=pylab.plot(data,"-k")
a=pylab.axis()
pylab.axis([a[0],a[1],-.1,1.1])
pylab.text(2,.8,"raw data",fontsize=14)
pylab.subplot(4,1,2)
p1=pylab.plot(smoothList(data),".k")
p1=pylab.plot(smoothList(data),"-k")
a=pylab.axis()
pylab.axis([a[0],a[1],-.1,.4])
pylab.text(2,.3,"moving window average",fontsize=14)
pylab.subplot(4,1,3)
p1=pylab.plot(smoothListTriangle(data),".k")
p1=pylab.plot(smoothListTriangle(data),"-k")
pylab.axis([a[0],a[1],-.1,.4])
pylab.text(2,.3,"moving triangle",fontsize=14)
pylab.subplot(4,1,4)
p1=pylab.plot(smoothListGaussian(data),".k")
p1=pylab.plot(smoothListGaussian(data),"-k")
pylab.axis([a[0],a[1],-.1,.4])
pylab.text(2,.3,"moving gaussian",fontsize=14)
#pylab.show()
pylab.savefig("smooth.png",dpi=80)
Hey, I had a great idea: why don’t I test it on some of my own data? Because I don’t want the details of my thesis work getting out onto the internet too early, I can’t reveal exactly what this data is from. Suffice it to say that it’s the fractional density of neurite coverage in thick muscle tissue. Anyhow, this data is wild and in desperate need of some smoothing. Below is a visual representation of the differences between the smoothing methods. Yayness! I like the Gaussian function the best.
I should note that the degrees of window coverage for the moving window average, moving triangle, and Gaussian functions are 10, 5, and 5 respectively. Also note that (because the “degree” variable is handled differently between the functions) the actual numbers of data points assessed by these three functions are 10, 9, and 9 respectively. For the last two functions the degree represents the “spread” from each point (a window of degree*2-1 points), whereas for the first it is the total number of points averaged. Enjoy.
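To make the difference concrete, here is a small sketch that computes how many points each method averages for a given degree, and rebuilds the triangle weights with the same expression used in `smoothListTriangle`. The helper name `window_points` is hypothetical and not part of the code above.

```python
def window_points(method, degree):
    """Number of data points averaged per output value (hypothetical helper)."""
    if method == "average":
        return degree          # degree is the total window width
    return degree * 2 - 1      # center point plus degree-1 points on each side


for method, degree in [("average", 10), ("triangle", 5), ("gaussian", 5)]:
    print(method, window_points(method, degree))
# average 10
# triangle 9
# gaussian 9

# Triangle weights for degree=5, same expression as in smoothListTriangle
degree = 5
triangle_weights = [degree - abs(degree - x) for x in range(1, 2 * degree)]
print(triangle_weights)  # [1, 2, 3, 4, 5, 4, 3, 2, 1]
```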
This entry was posted on Monday, November 17th, 2008 at 2:50 pm.
8 Responses to “Linear Data Smoothing in Python”
FreeToGo wrote the following at 02:54:59 AM on May 25th, 2009:
Judging from your 1-D smoothing result, I guess you need to offset your data by the same amount as your “degree”. Otherwise, the peak position after smoothing becomes 10 rather than 15.
Scott wrote the following at 03:28:36 PM on May 28th, 2009:
To correct for offset, buffer your data by degree/2. If the degree is 10, put 5 blank (or copied) data points in the beginning of the data set.
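The buffering described in this reply can be sketched with NumPy. This is one possible approach under stated assumptions, not code from the post: instead of padding only the beginning, it pads both ends with copies of the edge values via `numpy.pad`, so the output has the same length as the input and the peak stays in place. The function name `smooth_same_length` is hypothetical.

```python
import numpy as np


def smooth_same_length(values, weights):
    """Weighted moving average with edge padding (hypothetical helper).

    Pads the data with copied edge values so the result has the same
    length as the input and is not shifted relative to it.
    """
    weights = np.asarray(weights, dtype=float)
    window = len(weights)
    left = window // 2           # points added before the data
    right = window - 1 - left    # points added after the data
    padded = np.pad(np.asarray(values, dtype=float), (left, right), mode="edge")
    # Reverse the weights so convolve computes a plain sliding weighted sum.
    return np.convolve(padded, weights[::-1] / weights.sum(), mode="valid")


data = [0] * 30
data[15] = 1
triangle = [1, 2, 3, 4, 5, 4, 3, 2, 1]  # triangle weights for degree=5
smoothed = smooth_same_length(data, triangle)
print(len(smoothed), int(np.argmax(smoothed)))  # 30 15
```

Passing `np.ones(10)` as the weights gives the padded equivalent of the moving window average with a degree of 10.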
Saketh wrote the following at 04:32:10 PM on June 11th, 2009:
Thanks for this! It’s very helpful.
Vittorio wrote the following at 08:40:59 AM on April 29th, 2010:
Thanks a lot for the snippet !!!
Helena wrote the following at 02:00:09 AM on May 4th, 2010:
Hi, Thanks for the code. I’m stupid when it comes to programming. I don’t know how I can apply your code (for instance the smoothListGaussian) on a file containing two columns. I guess I don’t want to read in every column separately and then smooth every column… For instance, how did you do it for your “own” data above? Best regards, Helena
Vvn wrote the following at 02:30:33 AM on May 12th, 2010:
Thanks for the article. It saved me time and I wasn’t even aware of triangle and Gaussian smoothing…
Viven Rajendra wrote the following at 01:16:29 PM on May 15th, 2010:
This should fix the padding/buffer issue…. def smooth_list(list,degree=10):
Anonymous wrote the following at 02:00:27 AM on August 23rd, 2010:
You may find this interesting: