### Linear Data Smoothing in Python

Here’s a scrumptious morsel of juicy Python code for even the most stoic of scientists to get excited about. Granted, it’s a very simple concept and has surely been done countless times before, but I couldn’t find any good resources for this code on the internet. Since I had to write my own code to perform several different kinds of linear, 1-dimensional data smoothing in Python, I decided it would be nice to share it. At the bottom of this post you can see a PNG image, which is the file output by the code listed further below. If you copy/paste the code into an empty text file and run it in Python, it will generate the exact same PNG file (assuming you have the pylab and numpy libraries installed).

```

### This is the Gaussian data smoothing function I wrote ###

import numpy

def smoothListGaussian(list,degree=5):
    window=degree*2-1                      # total number of points in the window
    weight=numpy.array([1.0]*window)
    weightGauss=[]
    for i in range(window):
        i=i-degree+1                       # offset from the centre of the window
        frac=i/float(window)
        gauss=1/(numpy.exp((4*(frac))**2))
        weightGauss.append(gauss)
    weight=numpy.array(weightGauss)*weight
    smoothed=[0.0]*(len(list)-window)      # output is shorter than the input
    for i in range(len(smoothed)):
        smoothed[i]=sum(numpy.array(list[i:i+window])*weight)/sum(weight)
    return smoothed

```
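To get a feel for what the function is doing, it can help to look at the weights on their own. The snippet below is just a sketch that rebuilds the same weights for `degree=5` without the loop; the names `offsets` and `weights` are mine and do not appear in the function above.

```python
import numpy

degree = 5
window = degree * 2 - 1                      # 9 points for degree=5
offsets = numpy.arange(window) - degree + 1  # -4 ... 4, zero at the centre
# Same formula as in smoothListGaussian: 1/exp((4*frac)**2), frac = offset/window
weights = numpy.exp(-(4.0 * offsets / window) ** 2)

for offset, w in zip(offsets, weights):
    print("%+d  %.4f" % (offset, w))
```

The centre point gets a weight of 1 and the two outermost points get roughly 0.04, so neighbours that are far away barely contribute to each smoothed value.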

Basically, you feed it a list (any length will do, as long as it’s longer than the smoothing window) and it returns a smoother version of the data. Note that the returned list is shorter than the input by the width of the window. The Gaussian smoothing function I wrote does a much better job than a moving window average, as the chart below makes clear: the moving average turns a single spike into a flat plateau, while the Gaussian keeps a peak where the spike was. Surprisingly, the moving triangle method appears to be very similar to the Gaussian function at low degrees of spread. However, for huge numbers of data points, the Gaussian function should perform better.

```

### This is the code that produces the image (smooth.png) ###

import pylab,numpy

def smoothList(list,strippedXs=False,degree=10):
    # NOTE: this branch expects a global list Xs of x-values, which this
    # script never defines; none of the calls below use it.
    if strippedXs==True: return Xs[0:-(len(list)-(len(list)-degree+1))]
    smoothed=[0.0]*(len(list)-degree+1)
    for i in range(len(smoothed)):
        smoothed[i]=sum(list[i:i+degree])/float(degree)
    return smoothed

def smoothListTriangle(list,strippedXs=False,degree=5):
    weight=[]
    window=degree*2-1
    smoothed=[0.0]*(len(list)-window)
    # weights ramp up 1,2,...,degree and back down to 1
    for x in range(1,2*degree):weight.append(degree-abs(degree-x))
    w=numpy.array(weight)
    for i in range(len(smoothed)):
        smoothed[i]=sum(numpy.array(list[i:i+window])*w)/float(sum(w))
    return smoothed

def smoothListGaussian(list,strippedXs=False,degree=5):
    window=degree*2-1
    weight=numpy.array([1.0]*window)
    weightGauss=[]
    for i in range(window):
        i=i-degree+1
        frac=i/float(window)
        gauss=1/(numpy.exp((4*(frac))**2))
        weightGauss.append(gauss)
    weight=numpy.array(weightGauss)*weight
    smoothed=[0.0]*(len(list)-window)
    for i in range(len(smoothed)):
        smoothed[i]=sum(numpy.array(list[i:i+window])*weight)/sum(weight)
    return smoothed

### DUMMY DATA ###

data = [0]*30             #30 "0"s in a row
data[len(data)//2]=1      #the middle one is "1"

### PLOT DIFFERENT SMOOTHING FUNCTIONS ###

pylab.figure(figsize=(550/80,700/80))
pylab.suptitle('1D Data Smoothing', fontsize=16)

pylab.subplot(4,1,1)
p1=pylab.plot(data,".k")
p1=pylab.plot(data,"-k")
a=pylab.axis()                        # [xmin, xmax, ymin, ymax]
pylab.axis([a[0],a[1],-.1,1.1])       # keep the x range, set the y range
pylab.text(2,.8,"raw data",fontsize=14)

pylab.subplot(4,1,2)
p1=pylab.plot(smoothList(data),".k")
p1=pylab.plot(smoothList(data),"-k")
a=pylab.axis()
pylab.axis([a[0],a[1],-.1,.4])
pylab.text(2,.3,"moving window average",fontsize=14)

pylab.subplot(4,1,3)
p1=pylab.plot(smoothListTriangle(data),".k")
p1=pylab.plot(smoothListTriangle(data),"-k")
pylab.axis([a[0],a[1],-.1,.4])
pylab.text(2,.3,"moving triangle",fontsize=14)

pylab.subplot(4,1,4)
p1=pylab.plot(smoothListGaussian(data),".k")
p1=pylab.plot(smoothListGaussian(data),"-k")
pylab.axis([a[0],a[1],-.1,.4])
pylab.text(2,.3,"moving gaussian",fontsize=14)

#pylab.show()
pylab.savefig("smooth.png",dpi=80)
```
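All three functions above slide a fixed set of weights along the data, which is the same thing `numpy.convolve` does. As a rough sketch (not a drop-in replacement), here is how the moving average and the Gaussian could be written that way. The helper names `gaussian_weights` and `smooth_convolve` are made up for this example.

```python
import numpy

def gaussian_weights(degree=5):
    """Same weights as smoothListGaussian, built without a loop."""
    window = degree * 2 - 1
    offsets = numpy.arange(window) - degree + 1
    return numpy.exp(-(4.0 * offsets / window) ** 2)

def smooth_convolve(data, weights):
    """Weighted moving average via numpy.convolve (hypothetical helper)."""
    weights = numpy.asarray(weights, dtype=float)
    # Reverse the kernel so asymmetric weights would match the sliding-window
    # sum; for the symmetric kernels used here it makes no difference.
    return numpy.convolve(data, weights[::-1], mode="valid") / weights.sum()

data = [0.0] * 30
data[len(data) // 2] = 1.0

box = smooth_convolve(data, numpy.ones(10))          # moving window average
gauss = smooth_convolve(data, gaussian_weights(5))   # moving Gaussian
print(len(box), len(gauss))
```

One difference to be aware of: `mode="valid"` returns `len(data)-window+1` points, so for the Gaussian it gives 22 values where the loop version above gives 21 (the loop stops one window short of the end). The values the two versions have in common should agree.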

Hey, I had a great idea: why don’t I test it on some of my own data? Because I don’t want the details of my thesis work getting out onto the internet too early, I can’t reveal exactly what this data is from. Suffice it to say that it’s the fractional density of neurite coverage in thick muscle tissue. Anyhow, this data is wild and in desperate need of some smoothing. Below is a visual comparison of the different smoothing methods. Yayness! I like the Gaussian function the best.

I should note that the degree settings for the moving window average, moving triangle, and Gaussian functions are 10, 5, and 5 respectively. Because the functions handle the “degree” variable differently, the actual number of data points each one averages is 10, 9, and 9 respectively. For the last two functions the degree represents the “spread” from each point (a window of degree*2-1 points), whereas for the moving average it is the total number of points to be averaged. Enjoy.
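The arithmetic behind those numbers is small enough to spell out. This is only an illustration: `points_used` and `centre_offset` are hypothetical helpers I'm introducing here, not part of the code above.

```python
def points_used(method, degree):
    """Number of samples averaged per output point (hypothetical helper)."""
    if method == "average":
        return degree            # degree is the full window width
    return degree * 2 - 1        # degree is the spread around the centre

def centre_offset(method, degree):
    """How far the centre of each window sits from its first sample."""
    return (points_used(method, degree) - 1) / 2.0

for method, degree in [("average", 10), ("triangle", 5), ("gaussian", 5)]:
    print(method, degree, points_used(method, degree), centre_offset(method, degree))
```

Since each smoothed value is computed from the window that starts at the same index, the smoothed curves are shifted left by that offset relative to the raw data if you plot them against their list index.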