Numerous visitors to this forum ask “why is it so hard for Sense to see” an on or off pattern for a device in the Power Meter that is clearly visible to them. Or why does Sense have trouble figuring out when a smartplug device is in an “Off” state vs. an “Idle” state vs. an “On/Active” state? The gist of the issue is that detecting patterns in the midst of noise and other patterns is a hard problem, and the human brain and vision system are a very sophisticated piece of biological hardware built for flexible recognition. Your eyes and brain use a lot of learning history and incredible processing to come up with a detection.
Ever since the smartplug integration was released, I’ve been trying to dial in on just the simple problem of what ‘Off’, ‘Idle’, and ‘On/Active’ really mean, by looking at my smartplug data. How hard could it be to classify the power usage into a few simple states? Let’s look at a few examples, starting with the simplest. Along the way I’ll highlight a few key points that make even this simple classification problem harder than it seems.
Here’s my hot water recirculation pump via an HS110, which has a sampling resolution of about one second (vs. Sense’s 1 microsecond sample time). Most of the time it’s running at around 45W. I turn it off via a timer from midnight to 6AM, but the timer continues to use about 1W. Occasionally there are also dropouts, where the chart goes to zero when the pump is really still running. I don’t know exactly how Sense treats those dropouts internally, as zeros or as not available (NA). I do know that if a dropout lasts an entire clock hour, Sense export will leave that hour out of the exported .csv, at least in the current web app. From a one second resolution chart, there seems to be a very clear on/off or idle/on behavior (is 1W idle or on?).
Here’s a short view of some of the Sense hourly energy export data. I use the term energy because Sense outputs the energy consumed during that hour. The NA’s mean that Sense export did not provide data for that hour. It also looks like Sense zeroed out some of the data on either side of the NA’s where a dropout also occurred, since we see hourly values between 1Wh and 45Wh. So a few energy datapoints will lie between off/idle and on, even though the operation of the pump is digital: either Idle (1Wh) or On (~45Wh).
Just from this you can see two tricky things about analyzing the data:
1) Selecting the best sampling time resolution is critical to “seeing” the best results. Data values are crisp and clean here between hours because my timer on/off cycle is a multiple of the sampling resolution, 1 hour. But if my timer were set on the half hour, we would see more frequent in-between values. And as we’ll see later, time samples should be somewhat smaller than the runtimes of the different power modes to get the best results, but microseconds is probably too small.
2) Whatever analysis we do, it has to be robust enough to deal with missing (NA) data. Some types of time series analysis might also require “complete” data, meaning data for every hour. In that case, we would need to pad the missing data, hopefully with good representative values.
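To make the padding idea in point 2 concrete, here is a minimal Python sketch. It is not how Sense or my own analysis handles the data. The readings are made up, the function names `complete_hourly` and `forward_fill` are hypothetical, and forward-filling (reusing the last known value) is just one possible padding choice.

```python
from datetime import datetime, timedelta

def complete_hourly(readings):
    """Return (hour, Wh or None) for every clock hour from the first to the last reading."""
    first, last = min(readings), max(readings)
    series = []
    hour = first
    while hour <= last:
        series.append((hour, readings.get(hour)))  # None marks an hour missing from the export
        hour += timedelta(hours=1)
    return series

def forward_fill(series):
    """Replace each None with the most recent known value (one simple padding choice)."""
    filled, last_value = [], None
    for hour, wh in series:
        if wh is not None:
            last_value = wh
        filled.append((hour, last_value))
    return filled

# Hypothetical export: the 06:00 hour is absent because of a dropout.
readings = {
    datetime(2018, 12, 1, 4): 1.0,
    datetime(2018, 12, 1, 5): 1.0,
    datetime(2018, 12, 1, 7): 45.0,
    datetime(2018, 12, 1, 8): 45.0,
}
series = complete_hourly(readings)
missing_hours = [hour for hour, wh in series if wh is None]
padded = forward_fill(series)
```

In this made-up case the forward fill pads 06:00 with 1Wh, even though a pump on a midnight-to-6AM timer would really be running that hour. That is exactly why the padding values need to be chosen with some care.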
To get a different, more useful view on the power usage behavior, I created an energy histogram, with default bin sizes for the different power levels to see if there are discernible clusters of power usage that I could call Off, On or Idle. I could see two immediate issues with the histogram. 1) The gaps in the histogram around the biggest spike indicate a bin size problem. And 2) The histogram needs some distribution-oriented smoothing to make the clustering easier to work with.
Fortunately, histograms have a companion density analysis function/plot that smooths out the distributions based on different selectable algorithms. Here I have chosen a density plot using the default gaussian model (blue line). I then annotated the plot with the 2 largest local maximums (green triangles, the peaks) plus their 2 adjacent local minimums (red triangles, the dips), to hopefully come up with a numeric way to find the energy/power thresholds for ‘Off’ vs. ‘Idle’ vs. ‘On/Active’ for each device.
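For anyone who wants to try the peak-and-dip idea, here is a rough Python sketch of it. It is not the tool I used for the plots. The sample values are invented to resemble the recirc pump, the fixed `bandwidth=3.0` is an arbitrary assumption (density functions normally pick a bandwidth for you), and the helper names are hypothetical.

```python
import math

def gaussian_kde(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `samples` at each grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]

def local_extrema(density):
    """Return the grid indices of strict local maxima (peaks) and minima (dips)."""
    peaks, dips = [], []
    for i in range(1, len(density) - 1):
        if density[i] > density[i - 1] and density[i] > density[i + 1]:
            peaks.append(i)
        elif density[i] < density[i - 1] and density[i] < density[i + 1]:
            dips.append(i)
    return peaks, dips

# Hypothetical hourly energy values in Wh: 6 idle hours and 18 running hours.
samples = [1.0] * 6 + [44.0, 45.0, 45.0, 46.0, 45.0, 44.0, 45.0, 46.0, 45.0] * 2
grid = list(range(0, 61))  # evaluate the density at every whole Wh from 0 to 60
density = gaussian_kde(samples, grid, bandwidth=3.0)
peaks, dips = local_extrema(density)

# Keep the two tallest peaks, then use the lowest dip between them as the threshold.
top_two = sorted(sorted(peaks, key=lambda i: density[i], reverse=True)[:2])
between = [i for i in dips if top_two[0] < i < top_two[1]]
threshold_wh = grid[min(between, key=lambda i: density[i])]
```

Here `threshold_wh` is the candidate Idle/On breakpoint: anything below it would be classified with the lower peak, anything above it with the upper peak.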
We still have an issue with the histogram gaps. It doesn’t look like much of a problem for this data set, since the clusters appear obvious and the density curve looks like it has done an OK job. But I encounter a bigger problem with the data sets for some of my other devices, like the outlet strip in my master bedroom closet that includes an access point, a backup NAS, AppleTV, Tivo and Zigbee bridge. Here, the bin size and associated histogram gaps wreaked havoc with the density smoothing, which overfit the poorly-binned data and produced false local minimums and maximums.
The fix is to adjust the size of the bins to a multiple of the minimum energy resolution of the Sense export data, 0.001kWh, or 1Wh. Now Sense may actually save additional accuracy back in the cloud, but the point is that looking at the data with the right power/energy resolution “lens” is critical to seeing clear results.
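Here is a small Python sketch of why the bin width matters for data that is already quantized to 1Wh. It is an illustration under my own assumptions, not the plotting code behind the charts: the readings are made up, `bin_counts` is a hypothetical helper, and 0.7Wh is just an example of a width that is not a multiple of the export resolution.

```python
import math

def bin_counts(values_wh, bin_width_wh, low, high):
    """Count values into equal-width bins covering [low, high); values must lie in that range."""
    n_bins = math.ceil((high - low) / bin_width_wh)
    counts = [0] * n_bins
    for v in values_wh:
        counts[int((v - low) / bin_width_wh)] += 1
    return counts

# Hypothetical hourly readings, already quantized to whole Wh by the export.
values_wh = [40, 41, 41, 42, 42, 42, 43, 43, 43, 43, 44, 44, 44, 45, 45, 46, 47]

matched = bin_counts(values_wh, 1, 40, 48)       # bin width equals the 1Wh resolution
mismatched = bin_counts(values_wh, 0.7, 40, 48)  # bin width not a multiple of 1Wh

gaps_matched = matched.count(0)        # empty bins with the matched width
gaps_mismatched = mismatched.count(0)  # empty bins with the mismatched width
```

With the 0.7Wh width there are more bins than distinct whole-Wh values in the range, so some bins can never receive a reading. Those empty bins are the "gap teeth", and a density curve fitted to them can pick up bumps that are not in the data.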
Here’s the recirc pump with resized bins. Note the lack of gaps and the crisper density curve.
Verdict - For the recirc pump, there is a clear 20-30W breakpoint that separates “Idle” from “On”, with a very digital behavior. I choose to say “idle” because there is measurable power going to the timer, and in my mind, “Off” really should correlate with 0W, and/or the smart plug actually being turned off.
Revisiting the master bedroom closet power strip data, even better news. The histogram is no longer gap-toothed and the density curve is close to continuous, with reasonable local minimums and maximums. Looks like the binning adjustment worked.
3) Just like with time resolution, data analysis also requires tuning based on the resolution of the other data for best results. And a corollary - mixing data of different resolutions can be more treacherous for data analysis than one might think.
According to my numerical analysis approach (above), for the master bedroom closet power strip, there’s one energy mode at around 37Wh, with a much smaller one possible at 12Wh. But is that 12Wh a real mode, or is it an artifact? To determine that, let’s take a peek at the Power Meter view of the same data at 1 sec resolution. Perusing the waveforms, I don’t see any 12W datapoints at all.
When I go back to my hourly data to find the values around 12Wh, I see this:
Looking at the Power Meter for 2018-11-21 06:00:00, I find a partial-hour data dropout. Nosing around the in-between values a little more, I find that all of them are a result of hours that include partial data dropout.
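A quick back-of-the-envelope check in Python shows how a partial dropout can produce an in-between hourly value. This assumes the strip draws a steady ~37W and that dropped-out seconds contribute zero energy to the hourly total, which I have not confirmed. The function names are hypothetical.

```python
def hourly_energy_wh(power_w, seconds_recorded):
    """Energy in Wh for a steady load when only part of the hour is recorded."""
    return power_w * seconds_recorded / 3600.0

def recorded_fraction(reported_wh, steady_power_w):
    """Fraction of the hour that must have been recorded to yield the reported energy."""
    return reported_wh / steady_power_w  # a full hour at steady_power_w gives that many Wh

full_hour_wh = hourly_energy_wh(37.0, 3600)    # a complete hour at 37W
fraction = recorded_fraction(12.0, 37.0)       # the hour that reported 12Wh
minutes_recorded = fraction * 60
```

Under those assumptions, a 12Wh hour corresponds to roughly a third of the hour being recorded (about 19 minutes), with the rest lost to the dropout.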
So, no, the second local maximum is not a true power mode.
Verdict - this power strip only has one power mode, “On”.
It looks like I will need to strip out local maximums that don’t rise to some threshold level of density. I’ll figure out that threshold, and deal with other challenges in my next installment.
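As a placeholder for that idea, here is one way the filtering could look in Python. This is only a sketch: the density values are invented, `significant_peaks` is a hypothetical helper, and the `min_fraction=0.2` cutoff is an arbitrary stand-in for the threshold I still need to work out.

```python
def significant_peaks(peaks, min_fraction=0.2):
    """Keep peaks whose density is at least min_fraction of the tallest peak.

    peaks is a list of (energy_wh, density) pairs.
    """
    if not peaks:
        return []
    tallest = max(density for _, density in peaks)
    return [(wh, density) for wh, density in peaks if density >= min_fraction * tallest]

# Hypothetical peak heights for the closet power strip: a small bump and the main mode.
candidate_peaks = [(12, 0.004), (37, 0.110)]
kept = significant_peaks(candidate_peaks)
```

With these made-up numbers only the 37Wh peak survives, which matches the verdict above that the strip has a single power mode.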