Show HN: Automated smooth Nth order derivatives of noisy data
143 points
1 day ago
| 12 comments
| github.com
This little project came about because I kept running into the same problem: cleanly differentiating sensor data before doing analysis. There are a ton of ways to solve this problem; I've always personally been a fan of using Kalman filters for the job, as it's easy to get the double whammy of resampling/upsampling to a fixed, consistent rate and also smoothing/outlier rejection. I wrote a little numpy-only Bayesian filtering/smoothing library recently (https://github.com/hugohadfield/bayesfilter/) so this felt like a fun and very useful first thing to try it out on! If people find kalmangrad useful I would be more than happy to add a few more features etc., and I would be very grateful if people sent in any bugs they spot. Thanks!
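For anyone who wants to kick the tyres, here's a minimal usage sketch (the exact `grad` signature and return format below are from memory, so treat them as assumptions and check the README):

```python
import numpy as np
from kalmangrad import grad  # pip install kalmangrad

# Noisy, non-uniformly sampled signal
np.random.seed(0)
t = np.sort(np.random.uniform(0.0, 10.0, 200))
y = np.sin(t) + 0.05 * np.random.randn(len(t))

# Estimate the signal plus its first two derivatives at each sample time.
# The keyword `n` (highest derivative order) is an assumption; see the README.
states, times = grad(y, t, n=2)
```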
magicalhippo
18 hours ago
[-]
Looks really cool.

I stumbled over this[1] page recently, which has a method that's apparently better than the "traditional" Savitzky-Golay filters.

The idea seems to be to start with the desired frequency response, with lower frequencies close to the ideal differentiator, and higher frequencies tending smoothly to zero. This is then used to derive the filter coefficients through a set of equations.

The author generalizes it to irregularly sampled data near the end, so would be interesting to compare the approaches.

Just thought I'd throw it out there.

[1]: http://www.holoborodko.com/pavel/numerical-methods/numerical...
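For regularly sampled data, the shortest of those smooth noise-robust differentiators boils down to a tiny convolution kernel. A sketch from memory of the 5-point filter on that page (double-check the coefficients against the article):

```python
import numpy as np

def smooth_diff_5pt(y, h):
    """Smooth noise-robust first derivative for uniformly sampled y (spacing h).

    5-point stencil: f' ~ (2*(f[i+1] - f[i-1]) + (f[i+2] - f[i-2])) / (8*h).
    """
    # Kernel is reversed because np.convolve flips it back into a correlation.
    kernel = np.array([1.0, 2.0, 0.0, -2.0, -1.0]) / (8.0 * h)
    return np.convolve(y, kernel, mode="valid")  # loses 2 samples at each edge

# Example: noisy sine sampled at 100 Hz
h = 0.01
t = np.arange(0.0, 10.0, h)
y = np.sin(t) + 0.01 * np.random.randn(len(t))
dy = smooth_diff_5pt(y, h)  # approximates cos(t) away from the edges
```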

reply
jcgrillo
1 day ago
[-]
This is great! I've taken sort of a passive interest in this topic over the years, some papers which come to mind are [1] and [2] but I don't think I've seen a real life example of using the Kalman filter before.

[1] https://www.sciencedirect.com/science/article/abs/pii/002192...

[2] https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=924...

reply
discretion22
15 hours ago
[-]
I worked on air quality monitoring about 30 years ago, and the equipment used a Kalman filter to remove measurement noise and give stability; it had been added to the software a number of years earlier by a mathematics student working a summer job at the company. No-one in the team had heard of it before, but it solved lots of the stability issues and was core to the system ever after.
reply
fisian
1 day ago
[-]
Great work!

I would've needed this recently for some data analysis, to estimate the mass of an object based on position measurements. I tried calculating the 2nd derivative with a Savitzky-Golay filter, but still had some problems and ended up using a different approach (also using a Kalman filter, but with a physics-based model of my setup).

My main problem was that I had repeated values in my measurements (the sensor had a lower, non-integer-divisible sampling rate than the acquisition pipeline). This made it especially clear that np.gradient wasn't suitable, because it resulted in erratic switches between zero and the calculated derivative. Applying np.gradient twice made the data look like random noise.
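For anyone curious, the effect is easy to reproduce with synthetic data (a sketch, not my actual setup):

```python
import numpy as np

# Simulate a sensor that holds its last value between slow updates:
# acquisition runs at 100 Hz, the sensor only refreshes every 3rd sample.
t = np.arange(0.0, 1.0, 0.01)
true = np.sin(2 * np.pi * t)
held = true[(np.arange(len(t)) // 3) * 3]  # each value repeated 3 times

d1 = np.gradient(held, t)  # flips between 0 and ~3x the true slope
d2 = np.gradient(d1, t)    # looks like random noise, not a 2nd derivative
```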

I will try using this library when I next get the chance.

reply
hugohadfield
22 hours ago
[-]
Thanks so much! Yeah, this was also a key reason I like this approach. Quite often we end up with repeated values due to quantisation of the signal or timing differences or whatever, and we get exactly the problem you describe: either massive gradients or zero gradient and nothing in between. With the KF approach you can just strip out the repeated values and run the filter with them missing, and it's fine. In the quantisation case you can approximate the quantisation observation noise as resolution / sqrt(12) and it all just works nicely. If you have any sample data of some fun problems and don't mind sharing, then let me know and we could add some demos to the library!
reply
radarsat1
22 hours ago
[-]
Did you try prefiltering to remove the repeated values?
reply
pm
1 day ago
[-]
Congratulations! Pardon my ignorance, as my understanding of mathematics at this level is beyond rusty, but what are the applications of this kind of functionality?
reply
thatcherc
1 day ago
[-]
I actually have one for this! Last week I had something really specific - a GeoTIFF image where each pixel represents the speed in the "x" direction of the ice sheet surface in Antarctica, and I wanted to get the derivative of that velocity field so I could look at the strain rate of the ice.

A common way to do that is to use a Savitzky-Golay filter [0], which does a similar thing - it can smooth out data and also provide smooth derivatives of the input data. It looks like this post's technique can also do that, so maybe it'd be useful for my ice strain-rate field project.

[0] - https://en.wikipedia.org/wiki/Savitzky%E2%80%93Golay_filter
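For reference, SciPy's implementation makes the smooth-and-differentiate step a one-liner on gridded data. A rough sketch (the grid spacing and window here are made up, not my actual GeoTIFF numbers):

```python
import numpy as np
from scipy.signal import savgol_filter

dx = 100.0                 # grid spacing in metres (illustrative only)
vx = np.random.randn(512)  # stand-in for one row of the x-velocity field

# Smooth and take d(vx)/dx in one pass: 11-sample window, cubic local fit.
dvx_dx = savgol_filter(vx, window_length=11, polyorder=3, deriv=1, delta=dx)
```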

reply
defrost
1 day ago
[-]
I've been a heavy user of Savitzky-Golay filters (linear time series, rectangular grid images, cubic space domains | first, second and third derivatives | balanced and unbalanced (returning central-region smoothed values and values at edges)) since the 1980s.

The usual implementation is as a convolution filter based on the premise that the underlying data is regularly sampled.

The pain-in-the-arse occasional reality is missing data and|or present-but-glitched|spiked data ... both of which require a "sensible infill" to continue with a convolution.

This is a nice implementation and a potentially useful bit of kit - the elephant in the room (from my PoV) is "how come the application domain is irregularly sampled data?"

Generally (engineering, geophysics, etc) great lengths are taken to clock data samples like a metronome (in time and|or space (as required most)).

I'm assuming that your gridded GeoTIFF data field is regularly sampled along both the X and Y axes?

reply
thatcherc
19 hours ago
[-]
Yup, my data is nicely gridded so I can use the convolution approach pretty easily. Agreed though - missing data at the edges or in the interior is annoying. For a while I was thinking I should recompute the SG coefficients every time I hit a missing data point so that they just "jump over" the missing values, giving me a derivative at the missing point based on the values that come before and after it, but for now I'm just throwing away any convolutions that hit a missing value.
reply
defrost
8 hours ago
[-]
> For a while I was thinking I should recompute the SG coefficients every time

We had, in our geophysics application, a "pre-computed" coefficient cache - the primary filters (central symmetric smoothing at various lengths) were common choices and almost always there to grab. Missing values were either cheaply "faked" for Quick'NDirty displays or infilled by prediction filters: S-G filters computed to use the existing points within the range to replace the missing value. That was either a lookup from the indexed filter cache or a fresh filter generation to use and stash in the cache.

It's a complication (in the mechanical-watch sense) to add, but with code to generate coefficients already existing, it's really just a matter of weighing the generation time against the hassle of indexing and storing filters as they're created, and the frequency of reuse of "uncommon" patterns.
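For what it's worth, SciPy now exposes the coefficient generation directly, including the off-centre ("unbalanced") variants, so a small cache like that is cheap to build. A sketch:

```python
from scipy.signal import savgol_coeffs

# Central smoothing filter: 9-point window, quadratic local fit
center = savgol_coeffs(9, 2)

# Off-centre ("unbalanced") filter, evaluated near one end of the window
# (pos is the evaluation index within the window) - useful at series edges
edge = savgol_coeffs(9, 2, pos=1)

# First-derivative filter for unit sample spacing
deriv = savgol_coeffs(9, 2, deriv=1, delta=1.0)
```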

reply
hugohadfield
21 hours ago
[-]
Yeah, regularly sampled is almost always the goal, and great when it's available! The main times I deal with non-uniformly sampled data are with jitter, missing data, etc.
reply
pm
1 day ago
[-]
Thanks for that, it looks like my research today is cut out for me.
reply
hugohadfield
1 day ago
[-]
No problem! Let's dream up a little use case:

Imagine you have a speed sensor, e.g. on your car, and you would like to calculate the jerk (2nd derivative of speed) of your motion (useful in a range of driving comfort metrics etc.). The speed sensor on your car is probably not all that accurate: it will give slightly (and somewhat randomly) wrong output, and it may not give that output at exactly 10 times per second, so you will have some jitter in the rate you receive data. If you naively attempt to calculate jerk by doing central differences on the signal twice (using np.gradient twice), you will amplify the noise in the signal and end up with something that looks totally wrong, which you will then have to post-process and maybe resample to get it at the rate that you want. If instead of np.gradient you use kalmangrad.grad, you will get a nice smooth jerk signal (and a fixed-up speed signal too). There are many ways to do this kind of thing, but I personally like this one as it's fast, can be run online, and if you want you can get uncertainties in your derivatives too :)
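A rough sketch of that comparison (the kalmangrad call is from memory, so treat the exact arguments and return values as assumptions):

```python
import numpy as np
from kalmangrad import grad

# Speed samples at roughly 10 Hz, with timing jitter and sensor noise
np.random.seed(0)
t = np.cumsum(0.1 + 0.01 * np.random.randn(300))
speed = 5.0 + 2.0 * np.sin(0.5 * t) + 0.1 * np.random.randn(len(t))

# Naive approach: differentiate twice - the noise gets amplified badly
accel_naive = np.gradient(speed, t)
jerk_naive = np.gradient(accel_naive, t)

# Kalman approach: estimate speed and its first two derivatives jointly
# (keyword `n` is my assumption for the highest derivative order)
states, times = grad(speed, t, n=2)
```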

reply
pm
1 day ago
[-]
I'd been researching Kalman filters to smooth out some sampling values (working on mobile: anything from accelerometer values to voice activation detection), but hadn't got around to revising the mathematics, so I appreciate the explanation. Out of curiosity, what other ways might this be achieved? I haven't seen much else beyond Kalman filters.
reply
hugohadfield
22 hours ago
[-]
You could almost certainly construct a convolutional kernel that computes smoothed derivatives of your function by convolving with the derivative of a Gaussian smoothing kernel (that kind of technique is mostly used for images, if I remember correctly); in fact I reckon this might work nicely: https://docs.scipy.org/doc/scipy/reference/generated/scipy.n... although you would need to enforce equally spaced inputs with no missing data. Alternatively you might set up an optimisation problem in which you optimise the values of your N'th derivative on some set of points, then integrate and minimise their distance to your input data; that would probably also work well, but it would be annoying to do regularisation on your lowest derivative and the whole thing might be quite slow. You could also do B-splines or other local low-order polynomial methods... the list goes on and on!
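A sketch of that Gaussian-derivative-kernel idea using scipy.ndimage.gaussian_filter1d (which may or may not be the function behind the truncated link above), assuming uniform sampling:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

dt = 0.01
t = np.arange(0.0, 10.0, dt)
y = np.sin(t) + 0.02 * np.random.randn(len(t))

# Convolving with the first derivative of a Gaussian smooths and
# differentiates in one step; sigma (in samples) sets the smoothing.
dy = gaussian_filter1d(y, sigma=5, order=1) / dt
```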
reply
nihzm
1 day ago
[-]
Kalman filters are usually the way to go because, for some cases (linear dynamics with Gaussian noise), it is mathematically proven that they are optimal, in the sense that they minimise the mean squared estimation error. About alternatives: not sure if people actually do this, but I think Savitzky-Golay filters could be used for the same purpose.
reply
auxym
13 hours ago
[-]
My first thought as a mechanical engineer is whether this could be useful for PID controllers. Getting a usable derivative value for the "D" term is often a challenge, because relatively small noise can create large variations in the derivative, and many filtering methods (e.g. a simple first-order lowpass) introduce a delay/phase shift, which reduces controller performance.
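To make that concrete, here's a toy sketch of the usual workaround - differentiate the measurement, then run it through a first-order low-pass - which is exactly the part that trades noise for phase lag (all tuning values are made up):

```python
# Toy discrete-time PID step with a low-pass filtered derivative term.
class PID:
    def __init__(self, kp, ki, kd, tau, dt):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.tau, self.dt = tau, dt   # tau: derivative filter time constant
        self.integral = 0.0
        self.prev_meas = 0.0
        self.d_filtered = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt

        # Raw derivative of the (noisy) measurement, then first-order low-pass.
        # The filter tames the noise but adds the phase lag mentioned above.
        d_raw = (measurement - self.prev_meas) / self.dt
        alpha = self.dt / (self.tau + self.dt)
        self.d_filtered += alpha * (d_raw - self.d_filtered)
        self.prev_meas = measurement

        # Derivative on the measurement (not the error) to avoid setpoint kick
        return (self.kp * error
                + self.ki * self.integral
                - self.kd * self.d_filtered)
```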
reply
caseyy
1 day ago
[-]
This is very important in controllers using feedback loops. The output of the controlled system is measured, a function is applied to it, and the result is fed back into the controller, so the output becomes self-balancing.

The applications in this case involve self-driving cars, rocketry, homeostatic medical devices like insulin pumps, cruise control, HVAC controllers, life support systems, satellites, and other scenarios.

This is mainly due to a type of controller called the PID controller, which involves a feedback loop and is self-balancing. The purpose of a PID controller is to drive a measurement in a system to a target value by adjusting the system's inputs, at least some of which are outputs of the said controller. In particular, the derivative term of a PID controller involves a first-order derivative. The smoother its values are over time, the better such a controller performs. When the derivative term is not smooth and jumps abruptly, the resulting spike in controller output is called a "derivative kick".

The people building these controllers have long sought algorithms that produce at least a good approximation of a measurement from a noisy sensor. A good approximation of its derivatives is the next level up, a bit harder, and good general-purpose approximations of the derivative are a relatively recent development.

There is a lot of business here. For example, Abbott Laboratories and Dexcom are building continuous blood glucose monitors that use a small transdermal sensor to sense someone's blood glucose. This is tremendously important for the management of people's diabetes. And yet algorithms like what GP presents address some of the biggest hurdles. The sensors are small and ridiculously susceptible to noise, yet it is safety-critical that the data they produce is reliable and up to date (you can't rely on historical smoothing), because devices like insulin pumps can consume it in real time. I won't go into this in further detail, but incorrect data can and has killed patients. So a good algorithm for cleaning up this noisy sensor data is both a serious matter and challenging.

The same can be said about self-driving cars - real-time data from noisy sensors must be fed into various systems, some using PID controllers. These systems are often safety-critical and can kill people in a garbage in-garbage out scenario.

There are about a million applications for this algorithm. It is likely an improvement on at least some previous implementations in the aforementioned fields. Of course, these algorithms also often don't handle certain edge cases well; it's an ongoing area of research.

In short — take any important and technically advanced sensor-controller system. There’s a good chance it benefits from advancements like what GP posted.

P.S. it’s more solved with uniformly sampled data (i.e. every N seconds) than non-uniformly sampled data (i.e. as available). So once again, what GP posted is really useful.

I think they could get a job at pretty big medical and automotive industry companies with this, it is “the sauce”. If they weren’t already working for a research group of a self-driving car company, that is ;)

reply
082349872349872
20 hours ago
[-]
Thinking about people as PID controllers: left to our own devices we're normally very good at the D term, but lousy at the I term, with the P term somewhere in the middle.

Give people clay/parchment/paper, however, and it becomes much easier to reliably evaluate an I term.

Example: https://xkcd.com/1205/ ; maybe each single time you do the task it seems like sanding out the glitches would be more trouble than it's worth, but a little record keeping allows one to see when a slight itch becomes frequent enough to be worth addressing. (conversely, it may be tempting to automate everything, but a little record keeping allows one to see if it'd obviously be rabbit holing)

reply
caseyy
16 hours ago
[-]
You might like the second episode of All Watched Over by Machines of Loving Grace. It talks about how techno-utopians tried to model society and nature as feedback loop controllers.

One might say a part of the reason they have failed is because nature and people don't much care for the I term. These systems have feedback loops for sudden events, but increase the temperature slowly enough and the proverbial frog boils.

There are very many undercurrents in our world we do not see. So much that even when we think we understand and can predict their effects, we almost never take into account the entire system.

reply
082349872349872
9 hours ago
[-]
Thanks! Parts of that ep reminded me greatly of The Tyranny of Structurelessness (1971-1973): https://www.jofreeman.com/joreen/tyranny.htm

(the notion that we tend to overly ascribe stability and reproducibility to a system reminds me of Vilfredo Pareto having convinced himself that 80-20 was the invariable power law)

reply
caseyy
7 hours ago
[-]
That was a very interesting read, thanks. Structurelessness seems to be very fertile ground for structure and organization.
reply
082349872349872
36 minutes ago
[-]
Rotating people through the root ("central" if you insist on camouflaging the hierarchy*) positions seems an excellent idea. The biggest problem with rotations in general is the existence of domains requiring large amounts of specific knowledge, but I don't think anyone is arguing that administrative positions are among them.

— We're taking turns to act as a sort of executive officer for the week

— Yes...

— But all the decisions of that officer have to be ratified at a special bi-weekly meeting

— Yes I see!

— By a simple majority, in the case of purely internal affairs

Be quiet!

— But by a two-thirds majority, in the case of more major

Be quiet! I order you to be quiet!

(I wonder how the Pythons themselves handled group decisions?)

* but see expander networks for a possible alternative network structure

reply
uoaei
1 day ago
[-]
Basically, approximating calculus operations on noisy, discrete-in-time data streams.
reply
pm
1 day ago
[-]
This is what I was thinking, but stated much clearer than I'd have managed.
reply
brody_slade_ai
1 day ago
[-]
Working through the Kalman filter helped me understand the mathematical concepts that make it such a powerful tool for estimating values from noisy data.

I made a simulation that forecasted a greenhouse's temperature and humidity to help me understand the idea. I began by going over the basics of Gaussians and the normal distribution once more. After that, I used NumPy and SciPy to develop the Kalman filter in Python. To represent the system, I defined noise matrices (Q and R), a state transition matrix (F), and a control input matrix (B).
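For anyone following along, the core predict/update cycle is only a few lines of NumPy. A generic sketch with the matrices mentioned above (dimensions and values are placeholders, not my greenhouse model):

```python
import numpy as np

def kalman_step(x, P, u, z, F, B, H, Q, R):
    """One predict/update cycle of a linear Kalman filter."""
    # Predict: propagate the state estimate and covariance through the model
    x_pred = F @ x + B @ u
    P_pred = F @ P @ F.T + Q

    # Update: blend the prediction with the measurement z
    innovation = z - H @ x_pred
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```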

reply
youoy
21 hours ago
[-]
Nice work! Just one quick question (maybe it's clear, but I have not looked at it in depth). It says it computes the derivative for non-uniformly sampled time series data, and the example image shows this. Is it also well behaved if the sampled measurements have noise (which doesn't seem to be the case in the example)? Or should one use a different approach for that? Thanks!
reply
hugohadfield
21 hours ago
[-]
Thanks! So the example image is actually with both non-uniformly sampled measurements and noise :) works great for both/either
reply
hugohadfield
21 hours ago
[-]
Noise is added here:

```
# Generate noisy sinusoidal data with random time points
np.random.seed(0)
t = sorted(np.random.uniform(0.0, 10.0, 100))
noise_std = 0.01
y = np.sin(t) + noise_std * np.random.randn(len(t))
true_first_derivative = np.cos(t)
true_second_derivative = -np.sin(t)
```

Changing noise_std will change the magnitude of the noise added, hope that helps!
reply
youoy
21 hours ago
[-]
Ah true! Now I see it on the image too, I was looking at it on the phone and it did not look like it. I will definitely give it a try! Thank you very much
reply
hugohadfield
21 hours ago
[-]
no problem!
reply
tarlinian
17 hours ago
[-]
How did you choose the process noise covariance in your `grad` function? It doesn't seem like a single process noise covariance structure should be globally applicable across all possible functions.
reply
theaussiestew
1 day ago
[-]
I'm looking to calculate jerk from accelerometer data, I'm assuming this would be the perfect use case?
reply
hugohadfield
1 day ago
[-]
this is a perfect use case, let me know how it goes!
reply
Animats
1 day ago
[-]
That's useful. Can it generate a simple filter for later real-time use, based on the statistics of the noise? That would be useful for self-tuning controllers.
reply
hugohadfield
22 hours ago
[-]
Glad you like it! This library will not generate a set of convolutional filter coefficients for you, if that is what you are after. I'm sure it would be possible to do some fairly nasty maths to get out some kind of equivalent convolutional kernel for a given tuning, or you could wrap an optimiser round it and try to walk your coefficients to something equivalent. I would say, though, that the juice would almost certainly not be worth the squeeze. The Kalman filter is easily lightweight enough to run in real time itself (it was developed for exactly this task); I've deployed several in real-time embedded scenarios on a range of platforms (inc. microcontrollers), and it also has the added advantage of handling jitter in input timing etc.
reply
marmaduke
1 day ago
[-]
This is a really nice approach. We are doing some nonlinear system id and are faced with this kind of problem (not irregular spacing, but a low sample rate and noise). Definitely will check it out.

What’s your opinion on ensemble KF? We’d like to use that for parameter estimations. I saw unscented in your bayesfilter, but not ensemble, so I’m curious. Thanks!

reply
hugohadfield
22 hours ago
[-]
Sounds like a fun project! I've not spent much time on ensemble KF but my mate Sam (https://github.com/samDuffield/) did a lot of work in his PhD on them for high dimensional datasets. Is your dataset specifically high dimensional and so not something you'd use an unscented filter for?
reply
seanhunter
21 hours ago
[-]
This looks cool. When you say "there are tons of ways to solve this problem", presumably the canonical way is some sort of Fourier Analysis?
reply
hugohadfield
21 hours ago
[-]
I guess I'm not totally sure what the canonical way would be; probably convolution with the N'th derivative of a Gaussian smoothing kernel, where the smoothing response is chosen by frequency analysis, or something along those lines. You could also just smooth the signal and then differentiate it numerically (probably equivalent but less efficient). I would personally go for this Bayesian filtering approach, or some kind of local polynomial approximation like splines or the Savitzky-Golay filter people are talking about in this comment section, because it would probably be easier to deal with missing data etc.
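The spline route is also only a couple of lines with SciPy, and it copes with non-uniform sample times. A sketch (the smoothing factor s would need tuning for real data):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Noisy, non-uniformly sampled data
np.random.seed(0)
t = np.sort(np.random.uniform(0.0, 10.0, 200))
y = np.sin(t) + 0.05 * np.random.randn(len(t))

# Smoothing cubic spline; s trades off smoothness against fidelity
spline = UnivariateSpline(t, y, k=3, s=len(t) * 0.05**2)
dy = spline.derivative(1)(t)   # smoothed first derivative
d2y = spline.derivative(2)(t)  # smoothed second derivative
```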
reply
3abiton
1 day ago
[-]
This got me thinking, is this used in supply chain problems?
reply
hugohadfield
21 hours ago
[-]
hmm, I don't think I'm familiar with the kind of problems you might be thinking about. Care to share an example?
reply
3abiton
15 hours ago
[-]
Specifically for demand forecasting for example. I have seen lots of issues with missing data, and this seems to elegantly tackle it.
reply