Need a little help understanding the data export from the web

I’m no stranger to pivottables. I manage an IT helpdesk so part of my job is to take metric reports and analyze the data and pivottables are really great for quickly filtering and visualizing data.

I’m not even wanting to make anything very complex. I just want to track my savings since getting solar. So a spreadsheet that says I would have paid X this month, but instead I paid Y, so the savings is Z. I’ll probably just make each month a new sheet and create a summary dashboard sheet. Since I don’t need very granular data for this, I just exported Sep 1 - Oct 1 with the day interval.

But the way some of the data is organized in the export has me a little confused. I created a simple pivot table that shows name in the rows, and sum of kWh and average wattage is the columns and immediately noticed a few things.

  1. Total usage for each day is recorded as a device. This made my total usage for the month a lot higher than it actually was. But, once noticed this is no big deal to filter out. But this is where I notice some oddities.
  • The device “total usage” says 1155 kWh. When looking at the web trends section for that month it says 1123 kWh. I went back to the table of data, and the export included data from 10/1, so I just removed that data. When I’m looking at a particular month in trends and then export that data, I expect it to only have that month. When doing the export, it does say 9/1 - 10/1 as the data range, but you can’t change this so I had assumed it would stop at 12:00am 10/1.
  • Data from 10/1 has been removed from the table. Now in the pivottable, total usage shows 1123 matching the web. Now in my pivottable, I filter out that device. But the pivottable is showing the grand total as 468 kWh. Why does the total off all my devices not add up to 1123? EDIT: I figured this one out. “Other” is not included as a device
  1. The solar Production device data is just broken. The Avg Wattage and kWh columns shows as negative and has an apostrophe in front, so Excel thinks it’s a String and not a number.

    This breaks the pivottable. It shows 0 as the sum of kWh and a device by 0 error for the average wattage.

  2. This is unrelated to the data I actually care about, but why is Avg Wattage calculated as kWh / 24 and not kWh / device runtime? My water heater shows as having an avg wattage of 143W. This makes the column pretty useless imo.

I might run into more issues as I play with the data, but this is what I’ve noticed so far.

1 Like

Nice work… A few thoughts - When I use pivot tables with Sense exported data, I tend to do a few things differently:

  • Export for the the whole year, sometimes in days and sometimes in hours depending on my needs. Don’t rely on the export to break up over time - do that in the pivot table.
  • Spread the devices over the columns so I can see device totals for each column
  • Or filter for a single device / Total Consumption in the Pivot Table filter.
  • Use the adjustable period capability in the Pivot Table to either sum power usage for days months or years, or over a customized billing period.

You can also use computed row values with hourly data and a lookup table table to add things like time of use (ToU) costing and calculations.

You are right, that Average Power based on the entire period vs. runtime isn’t that useful. It’s essentially the same as the energy data, though sometimes it will have more significant digits.
And yes, the ’ before negative values is a bug and needs to be fix. Let folks know at support@sense.com.

Here is what I’ve found useful:

  1. I export all my sense data by the hour interval into a giant CSV file.
  2. Have Excel read in the CSV file, then save as .xlsx (so that auto saves capture the following edits within Excel).
  3. Put a data filter on the table contents.
  4. Add several new columns after the “DateTime” column, with the following headers and formulas (pasted down the entire huge table):
    1. Column header: “Year”
      1. Formula: =TEXT(A2, “yyyy”)
      2. Comment: cell will contain {…, 2018, 2019}
    2. Colum header: “Month”
      1. Formula: =TEXT(A2, “mmmm”)<\li>
      2. Comment: cell will contain {January, February, ... , December}
    3. Column header: “Day of Month”
      1. Formula: =TEXT(A2, “dd”)<\li>
      2. Comment: cell will contain {01, 02, … , 31}<\li>
    4. Column header: “Hour”
      1. Formula: =TEXT(A2, “hh”)
      2. Comment: cell will contain {01, 02, … , 23}
    5. Column header: “Day of Week”
      1. Formula: =TEXT(A2, “dddd”)
      2. Comment: cell will contain {Sunday, Monday, … , Saturday}
    6. Column header: “Billing Cycle” (similar to “Month”), or “True Up Interval” (similar to “Year”)
      1. Formula: (figure this out yourself if you care; I haven’t)
  5. Since Sense exports the “Solar Production” data with an apostrophe before the negative (production) number for both “Avg Wattage” and “kWh”, fix them by adding 2 new columns with the following headers and formulas (pasted down the entire huge table):
    1. Column header: Avg Wattage (fixed)
      1. Formula: =VALUE(IF(LEFT(G2,1)="'",RIGHT(G2,LEN(G2)-1),G2))
      2. Comment: strips away a leading apostrophe and makes the resulting cell contain a value rather than text.
    2. Column header: kWh (fixed)
      1. Formula: =VALUE(IF(LEFT(H2,1)="'",RIGHT(H2,LEN(H2)-1),H2))
      2. Comment: strips away a leading apostrophe and makes the resulting cell contain a value rather than text.
  6. Select the massive table, then insert a pivot table on a new worksheet.
  7. For the pivot table “Filters”, add “Name”. Certainly {Total Usage, Solar Production} are 2 good filter values are good ones to use. (Sense device identification problems make the others not useful in my opinion.) You can also select both “Total Usage” and “Solar Production” at the same time in the filter, if you want the pivot table to display energy “exported to grid” (negative numbers) or “imported from grid” (positive numbers).
  8. For the pivot table “Values”, add “kWh”. Depending on what kind of report you want, you’ll need to change the Value Field Settings to {Average, Sum}…maybe even “Max”.
  9. Now you can slice and dice the data in many different ways. I keep the “Grand Total” rows and columns, as they provide additional insight without having to even more pivot table variants:
    1. {Average,sum} energy {consumed,produced,grid_imported,grid_exported} by row=”Day of Week” for each column=”Month”
      1. Interesting trends by day of week can learned, as seasons change (monthly) throughout the year.
    2. {Average,sum} energy {consumed,produced,grid_imported,grid_exported} by row=”Day of Month” for each column=”Month”
      1. The output of this pivot table I can export to another Excel spreadsheet where I’ve modeled different tariff options to see which is best, and where I can play around with the historical consumption data like “what if I were to run my pool pump during “off-peak” hours rather than ‘peak’ hours?” or “what if I get another electric car and charge at night?”
    3. {Average,sum} energy {consumed,produced,grid_imported,grid_exported} by row=”Hour” for each column=”Month”
      1. Interesting trends by hour can be learned, as seasons change (monthly) throughout the year.
    4. {Average,sum} energy {consumed,produced,grid_imported,grid_exported} by row=”Hour” for each column=”Day of Week”
      1. Interesting trends by hour can be learned, for different days of the week.
    5. (lots more ways the pivot table can be sliced and diced>)
  10. I wish Sense developers would actually add this kind of report/metric generation natively into the app/web “trends” data visualizer. Heck, just the “imported from grid” and “exported to grid” would be highly desired…like the Enphase Enlighten visualization.
4 Likes

Totally cool… You know that you can even post a sample .xlsx file of what you have done if you want to. The forum upload allows .xlsx. I tend to post my most useful R scripts over in the Data Analysis are of the forum…
https://community.sense.com/c/data-analysis

1 Like

Just an FYI, I’ve logged a bug about the solar data issue.

1 Like

(Allow me to upload files, and I’ll attach my .xlsx template. I get a “Sorry, new users can not upload attachments” error when I try.)

You should be able to now. Let me know if it’s still not working.

Here is the Excel template I use that implements steps 1-9 I list above.
[1-hour data from Jan 01, 2019 to Jan 01, 2020.xlsx|attachment]1-hour data from Jan 01, 2019 to Jan 01, 2020.xlsx (2.3 MB)

2 Likes

Nice spreadsheet, thanks for sharing.

How would one use this with their own data? I know enough Excel to be dangerous and that’s about it.

How would one use this with their own data?
@ken2, my initial contribution to this thread 4 days ago included instructions on how to do this Excel pivot table analysis with any exported data set. The later included .xlsx was more of an example and template, in case the instructions were not clear. That said, the .xlsx could be used directly, with the following instructions.

  1. Export Sense data by the hour interval into a CSV file.
  2. Have Excel read in the CSV file.
  3. In another Excel session, open the .xlsx template that I provided earlier, and go to the "SenseExportByHour" tab
  4. Copy your column of "DateTime" and paste into cell A3.
  5. Copy your columns "Device ID" through "kWh" and past into cell G3.
  6. Make sure the equations in columns B-F and O-P extend for all the rows of your data.
  7. Click on the "PivotTable" tab, the click on the table itself, then click on "PivotTable Analyze" in the menu bar, then click on "Change Data Source->Change Data Source...", then make sure the "Table/Range" field includes you entire table length...you'll need to update the final row information.
  8. Now it's time for you to slice and dice the pivot table how you see fit, for whatever analysis makes sense to you.
If I had more motivation, I'd update the .xlsx to automate steps 2-7 here, but I'm not an Excel super user either, and I know this would take me several hours to figure out.

Thanks James for the reply. I wasn’t seeing the formulas in the .xlsx template. I’ll rebuild it using your prior write-up.

Thanks again

@ken2, my apologies, the XLSX I had uploaded earlier indeed didn’t have formulas. (When I had originally attempted to upload it, it got rejected because it was over 4MB, so I quickly removed most of the per-device data, and in that process must have wiped away the formulas.) I’ve since re-added the formulas, and edited my earlier post to have the corrected XLSX template.

@james_reilley Thanks, I’ll give the latest version a look.

Update - Works great, thanks.

We’re trying to figure out this '- bug. Interestingly, I’m seeing the ' in Excel but not Google Sheets. The negative is shown in either case.

@kevin1, since I know you’re big on Python and R, what are you seeing when you manipulate directly there? Still seeing the '?

Didn’t try, but I’m guessing Google parses out the apostrophes.

One simple method, before Excel import, would be to remove the apostrophes in a text editor.

Here’s a sample from a web export:

I’m traveling today. Might be able to test out tomorrow. My experience with .csv’s is that different spreadsheets and R parse things differently. For instance the R package finds a real problem with Ecobee .csv output files (missing comma) that Excel just parses OK. The R package tends to throw warnings and errors more strictly.

@RyanAtSense,

As @ixu noted in example, export still has issues for Excel. Negative solar readings (positive energy generation) are still preceded by unmatched signs quote. I think export either needs to leave off the sign quote or put a second one after the negative entry.

2019-11-16 08:00:00,solar,Solar Production,Solar,,,,'-226.802,'-0.227 

Excel treats the unmatched single quote as a character, converting the Excel entries into text cells instead of numeric.

R readr parser does the same except it converts the entire column into characters, or it complains if
you try to cast the column into a numeric type.

One somewhat PITA method is to import the exported .csv into Excel vs simply just opening it. That way you can set the cell delimiters … you end up with some extra columns but those are easy to deal with.

03%20PM

1 Like