Everything You Wanted To Know About Data Export (But Were Afraid to Ask)*

Given that we’re into the New Year, I decided to do one last analysis of my 2022 energy usage and compare against my utility bill for accuracy. Along the way I was reminded of all the weird and hidden idiosyncrasies in the Sense Data Export feature.

Starting Out: How to Pull Sense and Utility Data

  • Examine both files for missing hours and other irregularities, then fix or log them - more on the details when I discuss the irregularities, below.
  • Select just the “Total Usage”, “Solar Production” and “Always On” data from the Sense download, then pivot the data to a wide format where those 3 entries are columns.
  • Subtract “Solar Production” from " Total Usage" to get “Net Usage”, so I can compare against my utility’s net meter reading.
  • Sum the the 15 min utility data into hourly readings so that both data sets are in hours.
  • Join the data from the two sources by date and time so everything is matched up, then compare.

What I Discovered in the Exported Data

  • The Sense exported data isn’t necessarily in date/time order in the Exported CSV file that comes out - don’t rely on it being in order. And don’t be confused if the first entries you see in the file are not the first or the last day or hour in your interval. Sense chunks the data when exporting, resulting in potential confusion. Just remember to sort by date / time before thinking the export is wrong.

To get a quick summary of most of this info from out of your 2022 data Export, you don’t need to be a data whiz. All you need to do is to read the CSV into Excel and do a Pivot Table.

2 Likes

14 Hours in 2022 Lost to Sense
Here are 14 hours of 15 minute intervals reported by my utility which Sense missed.

  • The first chunk on Jan 6, labeled “* This data was estimated” truly looks like a power outage. Note how small the usage is - 20Wh to 130Wh.

  • The second chunk, on July 29th, is more likely an internet outage or network problem on my end that lasted beyond Sense’s 6 hour storage.

   DATE       START         END       USAGE NOTES                     Hour    
 1 2022-01-06 02:00        02:14       0.02 * This data was estimated 02:00:00
 2 2022-01-06 02:15        02:29       0.04 * This data was estimated 02:00:00
 3 2022-01-06 02:30        02:44       0.02 * This data was estimated 02:00:00
 4 2022-01-06 02:45        02:59       0.04 * This data was estimated 02:00:00
 5 2022-01-06 03:00        03:14       0.02 * This data was estimated 03:00:00
 6 2022-01-06 03:15        03:29       0.04 * This data was estimated 03:00:00
 7 2022-01-06 03:30        03:44       0.18 * This data was estimated 03:00:00
 8 2022-01-06 03:45        03:59       0.14 * This data was estimated 03:00:00
 9 2022-01-06 04:00        04:14       0.12 * This data was estimated 04:00:00
10 2022-01-06 04:15        04:29       0.13 * This data was estimated 04:00:00
11 2022-01-06 04:30        04:44       0.12 * This data was estimated 04:00:00
12 2022-01-06 04:45        04:59       0.13 * This data was estimated 04:00:00
13 2022-01-06 15:00        15:14       0.01 * This data was estimated 15:00:00
14 2022-01-06 15:15        15:29       0.02 * This data was estimated 15:00:00
15 2022-01-06 15:30        15:44       0.02 * This data was estimated 15:00:00
16 2022-01-06 15:45        15:59       0.01 * This data was estimated 15:00:00
17 2022-01-06 16:00        16:14       0.02 * This data was estimated 16:00:00
18 2022-01-06 16:15        16:29       0.02 * This data was estimated 16:00:00
19 2022-01-06 16:30        16:44       0.02 * This data was estimated 16:00:00
20 2022-01-06 16:45        16:59       0.04 * This data was estimated 16:00:00
21 2022-07-29 06:00        06:14       0.35 NA                        06:00:00
22 2022-07-29 06:15        06:29       0.24 NA                        06:00:00
23 2022-07-29 06:30        06:44       0.22 NA                        06:00:00
24 2022-07-29 06:45        06:59       0.22 NA                        06:00:00
25 2022-07-29 07:00        07:14       0.16 NA                        07:00:00
26 2022-07-29 07:15        07:29       0.81 NA                        07:00:00
27 2022-07-29 07:30        07:44       0.12 NA                        07:00:00
28 2022-07-29 07:45        07:59       0.15 NA                        07:00:00
29 2022-07-29 08:00        08:14       0.26 NA                        08:00:00
30 2022-07-29 08:15        08:29       0.39 NA                        08:00:00
31 2022-07-29 08:30        08:44       0.33 NA                        08:00:00
32 2022-07-29 08:45        08:59       0.11 NA                        08:00:00
33 2022-07-29 09:00        09:14       0.15 NA                        09:00:00
34 2022-07-29 09:15        09:29       0.18 NA                        09:00:00
35 2022-07-29 09:30        09:44       0.23 NA                        09:00:00
36 2022-07-29 09:45        09:59       0.54 NA                        09:00:00
37 2022-07-29 10:00        10:14      -0.07 NA                        10:00:00
38 2022-07-29 10:15        10:29      -0.24 NA                        10:00:00
39 2022-07-29 10:30        10:44      -0.21 NA                        10:00:00
40 2022-07-29 10:45        10:59      -0.06 NA                        10:00:00
41 2022-07-29 11:00        11:14       1.07 NA                        11:00:00
42 2022-07-29 11:15        11:29       1.06 NA                        11:00:00
43 2022-07-29 11:30        11:44       0.21 NA                        11:00:00
44 2022-07-29 11:45        11:59       0    NA                        11:00:00
45 2022-07-29 12:00        12:14      -0.4  NA                        12:00:00
46 2022-07-29 12:15        12:29       0.03 NA                        12:00:00
47 2022-07-29 12:30        12:44       1.42 NA                        12:00:00
48 2022-07-29 12:45        12:59       0.01 NA                        12:00:00
49 2022-07-29 13:00        13:14       0.18 NA                        13:00:00
50 2022-07-29 13:15        13:29       0.29 NA                        13:00:00
51 2022-07-29 13:30        13:44       1.06 NA                        13:00:00
52 2022-07-29 13:45        13:59      -0.39 NA                        13:00:00
53 2022-07-29 14:00        14:14      -0.31 NA                        14:00:00
54 2022-07-29 14:15        14:29       0.33 NA                        14:00:00
55 2022-07-29 14:30        14:44       0.28 NA                        14:00:00
56 2022-07-29 14:45        14:59       0.57 NA                        14:00:00

2 Hours in 2022(?) Lost to My Utility

Here are the two hours I find in my Sense Exported data, but not in what comes from my utility.

  • The first is an obvious, but small issue on Sense’s part - The hour associated with Jan 1, 2023 starting at 12:00AM touches 2022, but doesn’t live in 2022. So this one can safely be excluded.
  • The second hour, 1AM on Nov 6th, is really a reflection of me fixing the “fall back” problem in the Sense data. The 1 AM is really the 1 AM standard time that follows 1 AM daylight savings time, the extra yearly hour in the day, when we “fall back”. I fixed in the Sense data because Sense outputs duplicate 1AM data. So this one is useful to have ins the Sense data, but can’t be compared against my utility data.
 DateTime                          Solar     Always On       Total    Net
1 2023-01-01 00:00:00              0.001       0.513         2.41     2.41 
2 2022-11-06 01:00:00              0.001       0.433         0.834    0.835

Dealing With Duplicates

When I first looked at the Sense Export, I was baffled to see a huge number of duplicate entries - Two Device entries for the same hour often with different values. Then I realized that device Names can refer to multiple Device IDs, which are the real detections, that have been merged. I then did a count of duplicates by Name and Device ID and got the following list:

   Name                        DevID        count
                                  
 1 AC 3                        4466b6a8       834
 2 AC 3                        90172486       834
 3 Always On                   always_on        2
 4 AppleTV Family Room         da94c832         2
 5 AppleTV Master              caa89d7f         2
 6 Cable Amps                  0b550d1e         2
 7 Canon Printer               E4KDKPsK         1
 8 Canon Printer               ks5w26g4         1
 9 Coffee Maker                35d584f6         8
10 Coffee Maker                b05db37b         8
11 Family Room Switch          a12d3fb8         2
12 Fridge                      f022103b         2
13 Fridge 2                    ebd937ef         2
14 Furnace Down                dtSNTXPF         2
15 Furnace Up                  OsBEZsNy         2
16 HP LaserJet                 3b211685         2
17 Hue Down                    09190c6b         2
18 Keene Vent Bridge           46d8391d         2
19 KP115 Roamer                ssi-78bdfde3     2
20 Kyle’s PC                   da6fd827         2
21 Microwave                   d0956b5a       129
22 Microwave                   e0a3bf5d       129
23 Modem                       6f6e7d08         2
24 Office Always On            1022400f         2
25 Playroom Cluster            ba04ecc2         2
26 Recirculation Pump          9f9090d6         2
27 Router                      4bb72296         2
28 Solar Production            solar            2
29 SolarCity Bridge            80cd62b4         2
30 Switch                      e4372a48         2
31 TiVo                        db039872         2
32 Tivo Mini                   04894581         2
33 Total Usage                 mains            2
...

Surprisingly there appeared to be 2 kinds of duplicates.

  • Same Name but different DevIDs - that’s the merged device case, like AC 3, Coffee Maker and Microwave.
  • Same Name, single DevID - How did those happen ? How, became an apparent when I looked at the date and time of those duplicates, all at 1AM on Nov 6th. All of those are related to the extra “fall back” daylight savings transition hours.

Once I moved all those Nov 6, 1AM duplicates to the extra hour, I had a much shorter list, that made sense due to merges.

  Name          DevID    count
1 AC 3          4466b6a8   834
2 AC 3          90172486   834
3 Canon Printer E4KDKPsK     1
4 Canon Printer ks5w26g4     1
5 Coffee Maker  35d584f6     8
6 Coffee Maker  b05db37b     8
7 Microwave     d0956b5a   129
8 Microwave     e0a3bf5d   129

There’s still one minor mystery - the Canon Printer is not merged and is a smart plug. I suspect this duplicate happened when I was setting up the smart plug and I might have reset the device and connected a second time.

I should also point out that I have a number of smart plugs that are merged using “What’s connected to this ?”, but none of those show as duplicates in the Export. That’s because Sense Export only outputs the data from the smart plug, since it is the smart plug reflects the true data for the whole device. Not the same as merge of native detections that are additive.

1 Like

You Might Find Some Hidden Surprises in Your Data Export !
I found this hidden device with Device ID ‘battery’ scattered throughout my Sense hours from July 9th until Dec 28th. No real data, but perhaps related to this ??

 
   DateTime                      DevID  Name    kWh
  1 2022-07-09 02:00:00 battery     NA        0
  2 2022-07-09 03:00:00 battery     NA        0
  3 2022-07-09 20:00:00 battery     NA        0
  4 2022-07-11 10:00:00 battery     NA        0
  5 2022-07-11 12:00:00 battery     NA        0
  6 2022-07-11 13:00:00 battery     NA        0
  7 2022-07-11 14:00:00 battery     NA        0
  8 2022-07-11 16:00:00 battery     NA        0
  9 2022-07-11 17:00:00 battery     NA        0
 10 2022-07-11 18:00:00 battery     NA        0
 11 2022-07-11 19:00:00 battery     NA        0
 12 2022-07-12 10:00:00 battery     NA        0
 13 2022-07-12 11:00:00 battery     NA        0
 14 2022-07-12 12:00:00 battery     NA        0
 15 2022-07-12 13:00:00 battery     NA        0
....
656 2022-11-22 15:00:00 battery     NA        0
657 2022-11-22 16:00:00 battery     NA        0
658 2022-11-23 19:00:00 battery     NA        0
659 2022-11-30 12:00:00 battery     NA        0
660 2022-12-05 17:00:00 battery     NA        0
661 2022-12-06 11:00:00 battery     NA        0
662 2022-12-11 20:00:00 battery     NA        0
663 2022-12-13 00:00:00 battery     NA        0
664 2022-12-14 18:00:00 battery     NA        0
665 2022-12-16 13:00:00 battery     NA        0
666 2022-12-16 14:00:00 battery     NA        0
667 2022-12-20 10:00:00 battery     NA        0
668 2022-12-23 03:00:00 battery     NA        0
669 2022-12-27 12:00:00 battery     NA        0
670 2022-12-28 19:00:00 battery     NA        0

You might also find deleted devices lurking in your Export file as well, but non this year for me.

1 Like

Doing This Analysis in Excel

Here’s the fast way to glean most of what I highlighted above for your data using basic Excel. It only takes a few simple steps:

  • Read the 1-hour data from Jan 01, 2022 to Jan 01, 2023.csv into Excel. Select the region that included the header and columns with the data, all the way to the bottom of your file.

  • Do into the top level Data menu and hit “Summarize with Pivot Table”. Fill in the Pivot Table Fields like this. The “# of Data Hours” user to be “Count of Device IDs”, but I changed the name in the next step.

  • You’ll get a new Pivot Table sheet that looks like this. I have done a couple modifications to my Pivot Table using the customization icon to the right of the Row Labels. I have removed a few devices for clarity, and sorted by the decreasing “Count of Device IDs”, then renamed to “# of Data Hours”

I have also added the type of detection used for each device, to add a little context. It also illuminates a few things.

  • Total Usage and Solar production are the most reliable (always produced data) - Sense produced data for these for 8747 out of 8760 hours in a non-Leap-Year. So 13 missing hours.
  • My most reliable smart plugs produced data for 8693 out of 8760 hours, missing 67 hours in the year. What’s more, is that it is the same 67 missing hours for all the most reliable smart plugs - the 13 hours where it looks like my Sense monitor wasn’t collecting data, plus the same 54 hours where there was likely some kind of network SNAFU in my house that borked even the most reliable smart plugs.
  • Always On was only produced for 7589 out of 8760 hours - 13 missing due to the Sense monitor and 1158 hours missing due to other reasons. Typically, Sense does not produce a value for Always On in export when the value of the calculation is either zero or negative, usually due to something being wrong with the top-line Solar Production and Total Usage data.
  • Devices with multiple Device Ids for a single Name are also clearly visible - Merged devices like AC 3, plus also measurements taken by my Wiser smart plug/switches.
  • You can also see the devices that don’t run much, but use lots of energy in the Native Detections at the bottom - devices like our EVs and oven.

Closer to the End Goal

One of my end goals was to get a better view of my big ticket device over the year. With a little more knowledge of my Exported Data, I’m ready to do that by charting my weekly usage using a program I wrote a couple years ago and have since improved.

Unfortunately, it looks like my Exported data is somewhat less amenable to being plotted than it was a couple years ago. Two things stand out and distort/obscure the results.

  • Kitchen Overheads - There’s a huge Oct power spike for that device that overshadows all the other data. This spike is explainable and shows up in my Sense app data as well - It’s from some buggy firmware in Oct that failed to filter false half million watt surges “measured” by my Wiser smart switch.

  • Other is negative every week - At least according to the Export data. This is where I have to do some digging. Three important things to realize:

    • Sense doesn’t output values for Other via Export. A user has to calculate them as Total Usage - all detected usage (Always On + native detections + smart plugs).
    • Some of the detections in the Export data might be extraneous - I found a second redundant AC 3 device that nearly matched my main one in the Export file. Extra false loads like this can create negative Other.
    • I’m not sure how the Sense app treats negative Other. Places where Other might have gone negative are visible in the Hourly, Weekly and Monthly trend bars - the bar drops to zero. That might indicate that the Sense app zeros out Other any time it goes negative with the app or is used for their data aggregation

I’m going to do three fixes to the data based on these discoveries and see how the results change:

  • Filtering out the extra ‘AC 3’ entry
  • Clamping the ‘Kitchen Overheads’ to 105Wh when they are on.
  • And the same trick Sense seems to do - clamping Other on the bottom to 0.

And voila ! A chart that looks reasonable.

Next step - let’s see how well the data matches the Sense app.

Comparing Export Numbers vs In-App Totals

Time to get some understanding about how my device totals compare against the in-App totals This is going to be a manual process because I have to hand-enter trend data from the app (the Home Assistant integration only pulls Trend data for top level usage and solar production), so I’m just going to spot check the top 20 devices against my 2022 totals from the Exported data.

A few things of note:

  • The only significant difference is Other - in-app yearly total is 2823kWh vs
    • 4273.0 with Export data and calculation, when I zero out hours where Other is negative
    • -899.6 with Export data and calculation, when I leave the negative hourly Other data in
  • But we have to remember that Other is computed value (Total Usage - all detections for that hour - Always On). If any one of those values was different or not included, then Other is going to be off.
  • Several Exported devices had duplicates and in-App did as well:
    • Kitchen Overheads - the device with the huge power spike problem in Oct. I included both in my Export calculation and in-App Sense shows 2 as well. When I add the two in-App numbers together I get the same 5179.1 the Exported data shows.
    • AC 3 - The air conditioner device that has a nearly duplicated Export device, but for a shorter time period. I only included the one over the longer time period in my Exported data. But that matches up with the larger of the two in-App Ac 3’s.
  • There’s also a Floor Heater Guest device that shows up in-App but NOT in the Export, probably because I lumped a bunch of smaller devices into “Other Identified”. If expand to all devices I see something really weird. In-App, I have a “Floor Heater Guest”, spelled exactly that way at 899.2. In Export, I have a “Floor Heat Guest Room” that only comes in at 32.0. Not sure why I have this mismatch between in-App and Export device names.
  • I’ll also point out that if I include the second AC 3 (as Sense does), plus the Floor Heater Guest into the Other subtraction, my Export number comes much closer to Senses, down from 4273 to 3184. So I’m guessing I should do a full direct device-by-device comparison.

The Trouble with Time Series Data !
My Export vs. In-App number comparisons pointed out that I need to do an even more basic device-by-device comparison between the Exported Devices and the In-App Devices. To do this, I carefully scraped the full Device List from the current Sense Device List, from the Sense Web App. Overall statistics:

  • Sense In-App - 78 devices (by Device ID) with 73 unique Names (5 name duplicates)
  • Sense Export - 77 devices (by Device ID) with 71 unique (6 name duplicates)

So not so bad, right - maybe 1 difference ? Turns out it’s a lot more complicated.

  • My first table, below, is a name comparison with the names/w device IDs that are In-App but not in the Export on the vertical and the names/w device IDs in the Export but not In-App on the horizontal.

  • 3 of the names are somewhat different but have matching Device IDs (green, orange, blue). In two the cases the names are only minimally off, one with an extra space and the other messed up around a special character ('). The last is a place where the device looks like it was renamed in-App, but somehow Export is still seeing the default detection name, Water Heater 3.

  • There are also 4 names, 1 in-App and 3 in the Export, where the mismatches are all understandable (the ones in green), given the sources I used. Solar, mains and battery are not in the device list. Other is not in Exported data, but is in the device list.

  • That leaves 5 names In-App and 2 names (duplicates) in Export that don’t have counterparts.

The bottom is a scaled down comparison I did using Device IDs. I removed the ones I resolved. Ultimately a few more unmatched Device IDs showed up, though 6 of those are probably due to duplicated devices (3 in each list), which gets us back to the same 2 and 5 mismatches (in yellow) that existed at the name level.

I should also point out that many of these mismatched devices, though not all, come from Sense integrations and tend to live on forever, even if the origin devices disappear.

Ugly huh ? But I don’t see this as sloppiness on Sense’s part. This looks very similar to problems that I have seen when working on projects for my wife’s company, having to do with analytics for keywords in Google Ads. All the time-series analytics work nicely when everything is stable without changes to Campaigns, Ad Groups and Keywords aren’t changing. But things get much more challenging from an analytics perspective when the overriding structure changes. In Google-land, it’s Campaign, Ad Group and Keyword changes as well as changes in the associated hierarchy. In Sense, it’s new devices, renamings, device deletion and rediscoveries. Both of these result in objects being around that might not have existed at the beginning or end of the analysis period, and in multiple instances with the same object name that may or may not correspond to similar named objects.