Hi folks,
It’s now 2025 and time to look back at 2024 (and earlier) in my Sense data. I export data every year and try a few different fun kinds of analysis on it. This year I decided to try a few new visualizations that focus on “missing” data. I have now accumulated 6 years of data on my second Sense (the first one died in 2019) and have exported all 6 years (about 52,000 hours) as .csv files at 1-hour resolution using the Export function in the Sense web app.
Here’s what the hourly data looks like for a selected set of Devices that have been with me all 6 years. I have carefully selected:
- The DateTime for each hour of measurement.
- 3 fundamental measurements (Total Usage, Solar Production, Always On) that should all be close to 100% complete through the entire time period
- 1 smart plug device (Yamaha) that should also be 100% complete, but is subject to more potential data losses
- 1 Sense AI detection (Floor Heat/Dryer), which, due to the way Sense builds the Export, only shows up for hours when the detection is on. When the device is off for an entire hour, that hour is simply dropped for the device.
One other note - this view shows the datatypes, where NA means missing, and POSIX is a combined date and time value.
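For anyone who wants to build a similar table, here is a minimal sketch of the reshaping step. It assumes the export is in a long format with one row per hour and device, and the column names `DateTime`, `Name` and `kWh` are assumptions for illustration, as are the made-up sample rows. A device with no row for a given hour ends up as `None`, which plays the role of NA.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows in an assumed long format: one row per (hour, device).
# The column names DateTime, Name and kWh are assumptions for illustration.
SAMPLE = """DateTime,Name,kWh
2024-01-01 00:00:00,Total Usage,1.20
2024-01-01 00:00:00,Always On,0.30
2024-01-01 00:00:00,Dryer,0.90
2024-01-01 01:00:00,Total Usage,0.40
2024-01-01 01:00:00,Always On,0.30
"""

DEVICES = ["Total Usage", "Always On", "Dryer"]


def to_wide(csv_text, devices):
    """Pivot long rows into {hour: {device: kWh or None}}; None marks a missing value."""
    wide = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        hour = datetime.strptime(row["DateTime"], "%Y-%m-%d %H:%M:%S")
        # First time we see an hour, start every selected device as missing.
        cells = wide.setdefault(hour, {name: None for name in devices})
        if row["Name"] in cells:
            cells[row["Name"]] = float(row["kWh"])
    return wide


wide = to_wide(SAMPLE, DEVICES)
for hour in sorted(wide):
    print(hour, wide[hour])
```

In the sample, the Dryer has no row for the second hour, so it comes out as missing there while the other two columns stay complete.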
This second view shows more specifics on how much of the data is “missing”. I put “missing” in quotes for two reasons:
- The reference for 100% complete is essentially the DateTime column - that’s the ID entry / column that is common to all the Device measurements / observations. But if there were power outages or internet outages that last for more than a given hour, Sense will drop that DateTime entry. This will become apparent later when we compare against an even more reliable timestamp from my utility, PG&E.
- As I mentioned earlier, Sense AI detections show as missing for hours where the device isn’t detected as on. Sense expects us to impute a 0 kWh usage for those missing entries.
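The two kinds of “missing” can be told apart in code. The sketch below is one way to do it, not necessarily how anyone else handles it: it builds a complete hourly index, imputes 0 kWh for a detected device only in hours that Sense actually reported, and leaves hours that Sense dropped entirely as unknown. Using the presence of a Total Usage reading as the test for “Sense reported this hour” is my assumption, and the readings are made up.

```python
from datetime import datetime, timedelta


def hourly_range(start, end):
    """Yield every hour from start to end, inclusive."""
    hour = start
    while hour <= end:
        yield hour
        hour += timedelta(hours=1)


def align_device(total_usage, device, start, end):
    """Align sparse device readings ({hour: kWh}) to a complete hourly index.

    Hours missing from total_usage are treated as dropped by Sense (None).
    Hours present in total_usage but missing for the device are imputed as 0.0.
    """
    aligned = {}
    for hour in hourly_range(start, end):
        if hour not in total_usage:
            aligned[hour] = None  # whole hour dropped: usage is unknown
        else:
            aligned[hour] = device.get(hour, 0.0)  # hour exists, device was off
    return aligned


t0 = datetime(2024, 1, 1, 0)
hours = [t0 + timedelta(hours=i) for i in range(4)]
# Made-up readings: hour 2 is absent from Sense entirely (an outage).
total_usage = {hours[0]: 1.2, hours[1]: 0.4, hours[3]: 0.7}
dryer = {hours[0]: 0.9}  # only reported for the hour it ran

aligned = align_device(total_usage, dryer, hours[0], hours[3])
for hour, kwh in aligned.items():
    print(hour, kwh)
```

Keeping the outage hour as `None` instead of 0 avoids counting a real gap in the data as an hour when the dryer was simply off.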
You’ll notice that Solar Production, Always On, and the Yamaha receiver on a smart plug all display varying degrees of “missingness” for various reasons. Solar shows as 0% missing, but that is only because the percentage rounds down to zero. Always On tends to go out when Sense data is missing for more than 30 minutes in a 48-hour period, so it is a good detector for partial Sense outages. And the Yamaha smart plug typically reveals issues with my network.
If we tighten the comparison, Solar turns out to be missing roughly 0.2% of all observations, or about 300 hours.
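The rounding effect is easy to reproduce. Here is a small sketch with a made-up column of 1,000 hourly values, 2 of them missing (these are not my real numbers): printed as a whole percent the column looks complete, while one extra decimal place shows the gap.

```python
def missing_summary(column):
    """Return (number of missing values, percent missing) for a list with None as missing."""
    n_missing = sum(1 for value in column if value is None)
    pct_missing = 100.0 * n_missing / len(column)
    return n_missing, pct_missing


# Made-up column: 998 readings present, 2 missing.
column = [1.0] * 998 + [None] * 2
n_missing, pct_missing = missing_summary(column)

print(f"whole percent: {pct_missing:.0f}%")  # looks complete
print(f"one decimal:   {pct_missing:.1f}%")  # the gap shows up
print(f"missing hours: {n_missing}")
```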
Now I’m going to join my Sense data with the hourly net usage data from my utility. The utility has much more complete data, without any missing hours, so when I join, the new reference becomes a more complete DateTime. BTW - SenseNet is a new computed value, created by adding Total Usage and Solar Production (which is negative), that I use to compare against my hourly meter number. One can see immediately that my Sense measurements/observations don’t start at the beginning of 2019, but on April 6, 2019, when the new Sense monitor was set up. You’ll also see a couple of bands of missing data between 30,000 and 40,000 observations in.
The missing data chart helps us quantify how much of the fundamental measurements are missing.
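As a rough sketch of the join, assuming each series is held as an `{hour: kWh}` dictionary (the function name `join_on_utility` and the sample values are made up for illustration): every utility hour is kept, and SenseNet is computed only where both Sense values exist, so hours without Sense data show up as missing instead of disappearing.

```python
from datetime import datetime, timedelta


def join_on_utility(utility, total_usage, solar):
    """Left-join Sense onto the utility hours.

    Returns {hour: (utility_net_kwh, sense_net_kwh or None)}.
    SenseNet = Total Usage + Solar Production, where solar is negative.
    """
    joined = {}
    for hour, meter_kwh in utility.items():
        if hour in total_usage and hour in solar:
            sense_net = total_usage[hour] + solar[hour]
        else:
            sense_net = None  # no matching Sense reading for this hour
        joined[hour] = (meter_kwh, sense_net)
    return joined


t0 = datetime(2024, 6, 1, 12)
h0, h1, h2 = (t0 + timedelta(hours=i) for i in range(3))
# Made-up values; Sense has nothing for the third hour.
utility = {h0: 1.0, h1: -0.5, h2: 0.8}
total_usage = {h0: 1.5, h1: 0.5}
solar = {h0: -0.5, h1: -1.0}

joined = join_on_utility(utility, total_usage, solar)
for hour, (meter_kwh, sense_net) in joined.items():
    print(hour, meter_kwh, sense_net)
```

Driving the loop from the utility series is what makes the utility timestamps the new reference for completeness.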
Here’s the final check of my Sense data over the years: I compare the Sense net usage against the hourly net meter readings from my utility, PG&E. A perfect result would be a straight line at 45 degrees. The plot shows a huge number of hours that are pretty much dead on, but with 5 key deviations:
- The points at the bottom are PG&E readings that don’t have matching Sense readings. Many are from 2019, at the beginning of the year, before my new Sense was in operation. But some are also from other years, most likely from power or network outages.
- A small line in brown (2020) at negative 45 degrees. That resulted from a Sense spontaneous borking, where Sense readings got reversed. I was able to fix relatively quickly, but these hours left their mark.
- A vertical blue (2023) line at the origin (0,0). That’s a place where the Sense data changed while my meter showed 0. That deserves investigation.
- A bunch of orange points (2019) above the 45 degree line. Those are hours where the Sense data was greater than the utility data. Those deserve investigation.
- A bunch of points across all the years below the 45 degree line. Those are likely places where Sense experienced data collection dropouts for part of an hour, rendering the Sense usage value less than my utility.
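To go from eyeballing the plot to counting hours in each group, something like the sketch below could label every point. The 0.05 kWh tolerance, the label names and the order of the checks are all my assumptions for illustration, and would need tuning against real data.

```python
import math


def classify(utility_kwh, sense_kwh, tol=0.05):
    """Label one hourly point relative to the 45 degree line (tol is an assumed tolerance in kWh)."""
    if sense_kwh is None:
        return "no Sense reading"
    if math.isclose(sense_kwh, utility_kwh, abs_tol=tol):
        return "on the line"
    if math.isclose(utility_kwh, 0.0, abs_tol=tol):
        return "meter at zero"  # Sense moved while the meter read about 0
    if math.isclose(sense_kwh, -utility_kwh, abs_tol=tol):
        return "sign reversed"  # the negative 45 degree line
    if sense_kwh > utility_kwh:
        return "Sense above utility"
    return "Sense below utility"


# Made-up (utility, Sense) pairs, one for each case.
points = [(1.0, None), (1.0, 1.02), (0.0, 0.8), (1.0, -1.0), (1.0, 1.5), (1.0, 0.6)]
labels = [classify(utility, sense) for utility, sense in points]
for point, label in zip(points, labels):
    print(point, label)
```

Counting the labels per year would then show which of the deviations are worth chasing first.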