Sunday, February 1, 2015

Change Up - Moving Averages in 'C' 

Recently my inexpensive, archaic and inaccurate weather station started tossing out spurious values at a rate that grew from mildly annoying to "really pizzing me off!"


After putting a little bit of research in on the types of errors, it appeared that some sort of "flyer" control algorithm would help. The easiest of these is the Moving Average (also called Running Average, Rolling Average, Sliding Average, Sliding Window Average).


And I needed it to be coded in 'C'.  After 15 minutes of Googling, I decided to roll my own.


And here's the result.



Moving Average - Header File - Function Prototypes.


extern  MovingAverage_t     *MovingAverage_CreateMovingAverage( int maxElements );
extern  int   MovingAverage_AddValue( MovingAverage_t *maPtr, double value );
extern  double MovingAverage_GetAverage( MovingAverage_t *maPtr );
extern  int   MovingAverage_GetElementCount( MovingAverage_t *maPtr );
extern  int   MovingAverage_Reset( MovingAverage_t *maPtr );
extern  int   MovingAverage_DestroyMovingAverage( MovingAverage_t *maPtr );
extern  int   MovingAverage_Resize( MovingAverage_t *maPtr, int newMaxElements );
extern  int   MovingAverage_GetValues( MovingAverage_t *maPtr, double returnValues[] );
extern  void  MovingAverage_DebugDump( FILE *fp, MovingAverage_t *maPtr );


The first call to CreateMovingAverage takes a number of elements to hang onto. Pass in '5' and the moving average keeps 5 values around.  It returns a pointer to a structure of type "MovingAverage_t".  You hang onto that structure and use it in the subsequent calls.  

Here's the structure definition:

typedef struct MovingAverageStruct {
 BufferElement_t     *head;
 int    maxElements;     // total size of the ring buffer (eg. 10))


 int    elementCount;    // how many elements (values) are in the buffer. Will range from [0 to maxElements ]


 int    index;   // next spot to use for a new value in the buffer [ 0.. maxE ]]


 double     runningSum;    // we keep a running total
 double     currentAverage;  
} MovingAverage_t;



The values to be averaged are stored as a Linked List. Here's the definition for BufferElement_t:

typedef struct BufferElement {
    double                value;  // the value
    struct BufferElement  *next;  // needed for singly- or doubly-linked lists 

   
    unsigned    long     sequence;   // debugging only...
} BufferElement_t;


There are a number of ways to store the values, typically a Ring Buffer is used. And I could have implemented the Ring Buffer using an array, but I wanted the ability to easily grow the buffer size at run time, if necessary.  So some sort of dynamic structure is called for, and a Linked List seemed like a reasonable first start.


Troy Hanson's UTLIST.H

A small diversion to recognize a fellow by the name of Troy Hanson (see his wordpress blog).  

I've used Troy's code before and it works wonderfully.  His code is here.  

The most amazing thing about Troy's work is that he's done it all using C Preprocessor Macros! 


That means there's no runtime dependency on a library, shared or static. Just include his header file, follow his coding conventions and you've got a fully functional Linked List as easy as 1, 2, 3!

Troy is, by far and away, the best preprocessor guru I've run across.  Thank you Troy, for your great work.

Back to the code, the only thing Troy's macros need for a singly linked list is a variable named "next" that points to the next list item.  Don't change the variable name -- it must be called "next".

Here's the complete header file:

/*
 * File:   movingaverage.h
 * Author: pconroy
 *
 * Created on January 19, 2015, 8:46 AM
 */

#ifndef MOVINGAVERAGE_H
#define    MOVINGAVERAGE_H

#ifdef    __cplusplus
extern "C" {
#endif

//
// Going to use Troy Hanson's Excellent LinkedList Macros
#include <stdio.h>
#include <UtHash/utlist.h>

#ifndef     FALSE
# define    FALSE       0
# define    TRUE        (!FALSE)
#endif

#define     MASUCCESS   0
#define     MAFAILURE   1


/*
 * You can use any structure with these macros, as long as the structure contains a next pointer.
 * If you want to make a doubly-linked list, the element also needs to have a prev pointer.
 */

typedef struct BufferElement {
    double                  value;
    struct BufferElement    *next;       /* needed for singly- or doubly-linked lists */
   
    unsigned    long        sequence;   // debugging
} BufferElement_t;



typedef struct MovingAverageStruct {
    BufferElement_t     *head;
    int                 maxElements;     // total size of the ring buffer (eg. 10))
    int                 elementCount;    // how many elements (values) are in the buffer. Will range from [0 to maxElements ]
    int                 index;           // next spot to use for a new value in the buffer [ 0.. maxE ]]
    double              runningSum;      // we keep a running total
    double              currentAverage;
} MovingAverage_t;


extern  MovingAverage_t     *MovingAverage_CreateMovingAverage( int maxElements );
extern  int             MovingAverage_AddValue( MovingAverage_t *maPtr, double value );
extern  double          MovingAverage_GetAverage( MovingAverage_t *maPtr );
extern  int             MovingAverage_GetElementCount( MovingAverage_t *maPtr );
extern  int             MovingAverage_Reset( MovingAverage_t *maPtr );
extern  int             MovingAverage_DestroyMovingAverage( MovingAverage_t *maPtr );
extern  int             MovingAverage_Resize( MovingAverage_t *maPtr, int newMaxElements );
extern  int             MovingAverage_GetValues( MovingAverage_t *maPtr, double returnValues[] );
extern  void            MovingAverage_DebugDump( FILE *fp, MovingAverage_t *maPtr );


#ifdef    __cplusplus
}
#endif

#endif    /* MOVINGAVERAGE_H */


Let's quickly revisit the functions exposed by the code. We already covered off on CreateMovingAverage().  

AddValue() is called when you have a new value (of type double) to add to the running average. If the buffer is not full, (element count < max elements) then the value is appended to the linked list. If the buffer is full, then the oldest element is removed and the value appended.  The buffer always stays, at most, max elements full.

GetAverage() returns the current moving average in the buffer. There's an assert() in the code to make sure the element count isn't zero.  Don't call it until there's at least one value in the list.

GetElementCount() returns the number of values in the list. It'll range from 0 to maxElements passed into the create call.

Reset() deletes the linked list of values and resets all counters to zero. It does NOT resize the list - maxElements stays the same as when the Create() call was made.

DestroyMovingAverage() deletes the list, deletes any memory allocated and sets the maxElements value to zero.  Pair it with the Create() call and call it when you're done using the moving average.

Resize() allows you to grow the ring buffer. You can make the buffer larger, store more values. Currently you can't shrink it. (Until I convince myself of the need and a way to keep 'n' values when shrinking, it isn't implemented.)

GetValues() lets you pass in a pointer to an array of doubles. The values are pulled out of the list and passed back.  For debugging.

DebugDump() dumps the contents of the structure to a file for logging and debugging.


That's the header file. Let's move onto the source code:


/*
 * File:   movingaverage.c
 * Author: pconroy
 *
 * Created on January 15, 2015, 2:50 PM
 */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#include "movingaverage.h"


static  unsigned    long    globalSequenceCounter = 0UL;

//------------------------------------------------------------------------------
//
//  Call this function first to obtain a pointer to a Moving Average structure
//  The buffer will be ready to take new values after this call
MovingAverage_t    *MovingAverage_CreateMovingAverage (int maxElements)
{
    MovingAverage_t    *maPtr = malloc( sizeof( MovingAverage_t ));
    if (maPtr != (MovingAverage_t *) 0) {
        maPtr->head = (BufferElement_t *) 0;
        maPtr->elementCount = 0;
        maPtr->index = 0;
        maPtr->runningSum = 0.0;
        maPtr->currentAverage = 0.0;
       
        maPtr->maxElements = maxElements;               // size of the ring buffer
    }
   
    return maPtr;
}

//------------------------------------------------------------------------------
//
//  Add a new value (of type double) to the list. This will force a recalculation
//  of the average
int     MovingAverage_AddValue (MovingAverage_t *maPtr, double value)
{
    if (maPtr != (MovingAverage_t *) 0) {
        BufferElement_t     *newElement = malloc( sizeof( BufferElement_t ));
        if (newElement != (BufferElement_t *) 0 ) {
            newElement->value = value;
            newElement->sequence = globalSequenceCounter;
            globalSequenceCounter += 1UL;

            //
            // If there's still room in the buffer, if we haven't wrapped around yet
            if (maPtr->elementCount < maPtr->maxElements) {
                LL_APPEND( maPtr->head, newElement );
                maPtr->elementCount += 1;
                maPtr->index += 1;
 

            } else {
                //
                // Buffer is full, time to wrap around and over write the value at "index"
                //   Where will this new element go? Here:
                int newElementSpot = (maPtr->index % maPtr->maxElements);
               
                // First - advance to the element and get value that's already there
                BufferElement_t     *oldElement = (maPtr->head);
                for (int i = 0 ; i < newElementSpot; i ++)
                    oldElement = oldElement->next;
               
                //
                // Remove the value from the running total
                maPtr->runningSum -= oldElement->value;
               
                //
                // Swap out the old element with the new
                LL_REPLACE_ELEM( maPtr->head, oldElement, newElement );
               
                //
                //  Update the index, but it needs to wrap around
                maPtr->index = ((maPtr->index + 1) % maPtr->maxElements);
               
                //
                // Free up the memory used by the old, discarded element
                free( oldElement );
            }
           
            maPtr->runningSum += value;
            assert( maPtr->elementCount != 0 );
            maPtr->currentAverage = ( maPtr->runningSum / maPtr->elementCount );
                       
        } else {
            return MAFAILURE;
        }
    }
   
    return MASUCCESS;
}

//------------------------------------------------------------------------------
//
//  Return the average of all of the values in the buffer
double  MovingAverage_GetAverage (MovingAverage_t *maPtr)
{
    assert( maPtr->elementCount != 0 );
    return maPtr->currentAverage;
}

//------------------------------------------------------------------------------
//
//  Return how many values are currently in the buffer
int  MovingAverage_GetElementCount (MovingAverage_t *maPtr)
{
    return maPtr->elementCount;
}

//------------------------------------------------------------------------------
//
//  Clear the contents of the list of values, free the memory, reset all sums and counts
//  to zero.  Do NOT free the head of the list, do NOT resize the list
int  MovingAverage_Reset (MovingAverage_t *maPtr)
{
    BufferElement_t *ptr1;
    BufferElement_t *ptr2;
   
    LL_FOREACH_SAFE( maPtr->head, ptr1, ptr2 ) {
        free( ptr1 );
    }
   
    maPtr->elementCount = 0;
    maPtr->currentAverage = 0.0;
    maPtr->index = 0;
    maPtr->runningSum = 0.0;
   
    return MASUCCESS;
}

//------------------------------------------------------------------------------
//
//  Call Reset() to clear and free memory, then free the head ptr and set the list size to
//  zero.  The list should not be reused after this
int     MovingAverage_DestroyMovingAverage (MovingAverage_t *maPtr)
{
    MovingAverage_Reset( maPtr );

    free( maPtr->head );
    maPtr->maxElements = 0;
   
    return MASUCCESS;
}

// -----------------------------------------------------------------------------
int     MovingAverage_Resize (MovingAverage_t *maPtr, int newMaxElements)
{
    //
    // For now - we're only going to allow larger, not smaller
    assert( newMaxElements > maPtr->maxElements );
   
   
    //
    //  If we're increasing the size of the list - that's easy to do!
    if (newMaxElements > maPtr->maxElements) {
        maPtr->maxElements = newMaxElements;
        //
        // We're Not going to adjust the current index
       
    } else {
        //
        // Shrinking is harder! -- Save the "newMaxElement" values, and drop the rest
        //
    }
   
    return MASUCCESS;
}

//------------------------------------------------------------------------------
int     MovingAverage_GetValues (MovingAverage_t *maPtr, double returnValues[])
{
    BufferElement_t *ptr;
    int             index = 0;
   
    LL_FOREACH( maPtr->head, ptr ) {
        returnValues[ index ] = ptr->value;
        index += 1;
    }
   
    return maPtr->elementCount;
}

//------------------------------------------------------------------------------
void    MovingAverage_DebugDump (FILE *fp, MovingAverage_t *maPtr)
{
    fprintf( fp, "Current Structure Values\n" );
    fprintf( fp, "  Element Count: %d\n", maPtr->elementCount );
    fprintf( fp, "  Max Elements : %d\n", maPtr->maxElements );
    fprintf( fp, "  Index        : %d\n", maPtr->index );
    fprintf( fp, "  Average      : %f\n", maPtr->currentAverage );
    fprintf( fp, "  Running Sum  : %f\n", maPtr->runningSum );

    BufferElement_t *ptr;
   
    LL_FOREACH( maPtr->head, ptr ) {
        fprintf( fp, "      Seq: %lu  Value: %f\n", ptr->sequence, ptr->value );   
    }
    fprintf( fp, "Global Sequence Counter: %lu\n", globalSequenceCounter );
}



#if 0
int main(int argc, char** argv)
{
    MovingAverage_t    *aMovingAverage = MovingAverage_CreateMovingAverage( 3 );
    if (aMovingAverage != (MovingAverage_t *) 0) {
        for (int i = 0; i < 10; i += 1) {
            (void) MovingAverage_AddValue( aMovingAverage, (double) i );
        }
    }
    MovingAverage_DebugDump( aMovingAverage );
   
   
    //
    // Make it bigger
    printf( "Make the buffer bigger.\n" );
    MovingAverage_Resize( aMovingAverage, 20 );
    for (int i = 100; i < 110; i += 1) {
        (void) MovingAverage_AddValue( aMovingAverage, (double) i );
    }
   
    printf( "After making the buffer larger.\n" );
    MovingAverage_DebugDump( aMovingAverage );
 
   
    MovingAverage_DestroyMovingAverage( aMovingAverage );
    return (EXIT_SUCCESS);
}
#endif

That's it.  In my weather station code, I created a function called "checkAndAddValue()" for each value emitted by the weather station:

static
int     checkAndAddValue (weatherStats_t  *wsPtr, int statType, double value, double percentage)


It's job is to check the value against the moving average.  I pass in a percentage variance threshold.  Values that are outside the threshold are discarded.

For example, the interior temperature check/add function looks like:

 if (!checkAndAddValue_ITemp( wsPtr, datum->indoor.temp, 15.0 )) {
        Logger_LogWarning( "Reading's indoor temperature appears to be in error. Will be discarded. Value [%f]\n", datum->indoor.temp );
        return FALSE;
    }



 The interior temperature reading coming in, has to be within 15% of the moving average or it's discarded.

The percentage is increased for outside temperature values since those can move a bit faster:

    if (!checkAndAddValue_OTemp( wsPtr, datum->outdoor.temp, 30.0 )) {
        Logger_LogWarning( "Reading's outdoor temperature appears to be in error. Will be discarded. Value [%f]\n", datum->outdoor.temp );
        return FALSE;
    }



Maybe the next post will be about the simple logging framework I use.