Converting Web App Data Back to Real Data

kevin1 · September 6, 2019, 2:18am

I’m on a mission now to “scrape” and recover data displayed in the Sense web app, back into relatively higher resolution data than we can get out of export. The good news is that I have been able to successfully recover data using just the web vector graphics data. But it’s still challenging:

Data is displayed by Sense as an HTML vector path construct, which is pretty easy to extract.
The path data for a day on screen actually consists of about 3 days’ worth of data, so it has to be clipped to a single-day window.
The data is all in on-screen coordinates, so it has to be rescaled horizontally to a 24-hour period and vertically to the scale maximum for that day.
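The clipping and rescaling steps above might look roughly like this in Python. This is only a minimal sketch, not the actual scraper: it assumes the path string uses only absolute `M`/`L` commands, that the pixel positions of the day's start and end (`x_start`, `x_end`), the plot height and the day's maximum power are already known, and that y grows downward from the top of the plot. All function and parameter names are made up for illustration.

```python
import re

def parse_path(d):
    """Extract (x, y) pixel pairs from a path string (absolute M/L commands only)."""
    pairs = re.findall(r"[ML]\s*(-?\d+(?:\.\d+)?)[ ,]+(-?\d+(?:\.\d+)?)", d)
    return [(float(x), float(y)) for x, y in pairs]

def clip_and_rescale(points, x_start, x_end, plot_height, max_watts):
    """Keep points inside the one-day window and convert them to (seconds, watts)."""
    out = []
    for x, y in points:
        if x_start <= x <= x_end:
            # Horizontal pixels map linearly onto the 86400 seconds of the day.
            seconds = (x - x_start) / (x_end - x_start) * 86400
            # Assumed: y = 0 is the top of the plot (max power), y = plot_height is 0 W.
            watts = (plot_height - y) / plot_height * max_watts
            out.append((seconds, watts))
    return out

# Toy path: only the three middle points fall inside the day window.
d = "M0,100 L100,50 L150,0 L200,100 L300,50"
pts = clip_and_rescale(parse_path(d), x_start=100, x_end=200,
                       plot_height=100, max_watts=5000)
print(pts)
```

The real path data may well use other commands or relative coordinates, in which case the parser would need to be extended.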

At this point in time, I have 4288 data points per day all properly scaled, with a few more challenges to unpack.

All the data is in a path format, so I still need to reduce it to a single point per time value, which is a simple reduction.
But before that, I have to make sense of the data, which has 2 (or really 4, until I do the path-to-point reduction) power values for each time point. Sense draws the data with a rightward sweep, then comes back with a leftward sweep in the same path, and the two sweeps put down different power values for the same point in time.
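One possible way to pull the two sweeps apart, sketched in Python under some assumptions: the points are in path order, the rightward sweep comes first, and the path turns around at its largest time value. Averaging the duplicate values at each time is an arbitrary choice made here for illustration, and the function names are hypothetical.

```python
def split_sweeps(points):
    """Split path-ordered (time, watts) points at the turnaround (largest time)."""
    turn = max(range(len(points)), key=lambda i: points[i][0])
    right = points[: turn + 1]
    left = points[turn + 1:][::-1]  # reversed so that time increases again
    return right, left

def one_value_per_time(points):
    """Collapse repeated time values into a single mean power value."""
    groups = {}
    for t, w in points:
        groups.setdefault(t, []).append(w)
    return [(t, sum(ws) / len(ws)) for t, ws in groups.items()]

# Toy path: a stepped rightward sweep followed by a stepped leftward sweep.
path = [(0, 100), (0, 120), (10, 120), (10, 90), (20, 90),
        (20, 80), (10, 70), (10, 60), (0, 60), (0, 50)]
right, left = split_sweeps(path)
right_trace = one_value_per_time(right)
left_trace = one_value_per_time(left)
print(right_trace)
print(left_trace)
```

With both traces on the same time axis, they can be compared point by point or aggregated separately.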

For well behaved days, the numbers aren’t all so different and the left looks like a shadow of the right trace.

But other days, where there has been a data dropout or 3, I get less correspondence between the right and the left:

Any creative thoughts on the best way to resolve the two lines ??

Matching Sense web app view below.

ixu · September 6, 2019, 8:22pm

This is great and something that I was starting to explore but knew you were in deeper …

Thought: Abandon the left sweep?

Just looking at the display at full resolution (i.e. not at the underlying path data), you always see spurious verticals that don’t correlate to the displayed wattage when dragging left/right.

Have you tried smartplug data at your 6s sample? [Edit: I mean looking at it in app vs web … realizing you can’t get device data on web]

kevin1 · September 6, 2019, 9:09pm

Good suggestion. I’m actually going to try separating the right scan from the left, then aggregating both into hourly energy/power. Then I’ll compare against the Sense hourly exports and see which one comes closest. First I have a little code cleanup to do, and then I need to move back to my highest-res monitor because the amount of data delivered in the HTML paths increases with browser window resolution and width, as far as I can tell. More later!
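The hourly aggregation could be sketched like this in Python. It assumes the samples are already in (seconds since midnight, watts) form and roughly evenly spaced, so that the mean power in an hour approximates that hour's energy in Wh. The function name is hypothetical, and this is an approximation rather than a proper integration.

```python
def hourly_energy_kwh(samples):
    """samples: list of (seconds_since_midnight, watts).

    Returns 24 values in kWh; hours with no samples are reported as None.
    """
    buckets = [[] for _ in range(24)]
    for seconds, watts in samples:
        hour = min(int(seconds // 3600), 23)  # fold t = 86400 into the last hour
        buckets[hour].append(watts)
    # Mean watts over one hour equals Wh; divide by 1000 for kWh.
    return [sum(b) / len(b) / 1000 if b else None for b in buckets]

energy = hourly_energy_kwh([(0, 1000), (1800, 2000), (3600, 500)])
print(energy[:3])
```

Running this once on the right sweep and once on the left gives two hourly series to compare against the hourly export.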

BTW - with my 5120 x 2880 monitor, I can extract a data point every 35 sec or so. Still not every 1/2 sec or every 2 secs, but much better than once per hour. Let’s hope the Power Meter graphs are accurate.

kevin1 · September 7, 2019, 3:49pm

Here’s the test result - a moderate success. Tested on 5 days worth of data and the results look linear, though there is a bit of variation depending on whether I use the right sweep or the left sweep, and whether I use the vertical or horizontal moves of the path as my data points. It’s also clear that I need to add a fudge factor into my power/energy calibration to get matching results.

This is a comparison between Sense supplied energy data vs. the computed aggregated energy data from my graphics grabs of each day.

Next step is to do this for a much larger number of days, then:

Figure out the best fit combination
Compute the “fudge factor”

kevin1 · September 11, 2019, 6:24am

Finally managed to work on this a little more and discovered another issue. Sometimes the data scraped from the page doesn’t really match the display window size and doesn’t scale appropriately. Not sure why just yet and it is tough to track down since things are fairly dynamic when I’m actually scraping the data.

This is what my first “big data” test looked like… Crazy data scattered throughout. Many power values way too large. All the data that shows reasonable correlation is down on the bottom in the line that looks like it is zeroed out. Typical data science challenge - finding the signal in the noise.

The good news is that it is easy to detect and remove days where the data exceeds the window size. Once I remove those days I still have 76 days of data to work with. Plus I get a much more linear plot:
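A filter along these lines might look like the Python sketch below. It assumes that "exceeds the window size" means some scraped y coordinate falls outside the visible plot height, which is only a guess at the actual check; the day labels and pixel values are made up.

```python
def exceeds_window(y_pixels, plot_height):
    """True if any scraped y coordinate lies outside the visible plot area."""
    return any(y < 0 or y > plot_height for y in y_pixels)

# Hypothetical scraped y values per day, for a plot assumed to be 400 px tall.
days = {
    "day_a": [10, 250, 399],
    "day_b": [10, 250, 1200],  # 1200 px is well outside the plot
}
good_days = {day: ys for day, ys in days.items()
             if not exceeds_window(ys, plot_height=400)}
print(sorted(good_days))
```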

No matter how I slice the path data, all four combinations have R² fits of around 0.9955 vs. the Sense data. Using all the data, my computed energy (from data scraping) = 0.8345 * Sense Energy + 18.9893W. So my fudge factor = 1/0.8345 or basically 1.2.
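For anyone wanting to repeat the fit, here is a minimal ordinary least-squares sketch in plain Python. The numbers are invented toy values (a perfect line with slope 0.8), not the 76 days of data; the only figure taken from this thread is the 0.8345 slope, whose reciprocal is about 1.198. The function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2

def calibrate(scraped, slope, intercept):
    """Invert the fit to estimate the Sense value from a scraped value."""
    return (scraped - intercept) / slope

sense = [10.0, 20.0, 30.0, 40.0]    # toy "Sense export" energies
scraped = [10.0, 18.0, 26.0, 34.0]  # toy scraped energies = 0.8 * sense + 2
slope, intercept, r2 = linear_fit(sense, scraped)
print(slope, intercept, r2)
print(1 / 0.8345)  # the fudge factor quoted above
```

Multiplying by 1/slope alone ignores the intercept; `calibrate` removes the offset first, which may matter for low-energy hours.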

So hurray - I now have a way of extracting 35 sec resolution data if I really, really want it.

kevin1 · January 4, 2021, 5:59pm

Hi all,

@h.265 did a great job porting my R work here, to Python. It seems to work, and is available here:

github.com/TCNJ-ECE-IMPL/tools/blob/master/collect_data_by_days_and_dump_public.py

# Adapted from the original R version, which was written by Kevin Kranen

# Semi automatic scraping - user:
#   selects dates
#   sets Sense user name
#   sets Sense password
#   sets debug_flag (indicates whether to save debug information)

import re
import datetime as dt
import time
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

############################################# User settings ###################################
start_date      = dt.date(2020, 12, 14)
end_date        = dt.date(2020, 12, 29)
sense_username  = "insert_your_email_address@domain.com"

(Preview truncated here; the full script is in the linked file.)

h.265 · January 4, 2021, 6:23pm

Kevin, thanks!

Larry

h.265 · January 4, 2021, 8:25pm

I guess it’s worth asking – why can’t Sense give us a user-level ability to save a CSV file with 1-sample-per-minute detail (or finer)? We can scrape it, so why can’t they provide it?

It’s a bit fraught to scrape it, since they return maybe 3 days of data for a 1 day request, and the y-axis seems to be distorted. Takes some playing around to get the constant offset and scale correct.

Thanks,

Larry

kevin1 · January 4, 2021, 8:39pm

Add your “like” to the top of this Wishlist Request.

Right now, I believe Sense exports data at hourly and daily level since they are already aggregating data at those intervals for the Trends. Adding smaller intervals would certainly add computation and storage costs for Sense on AWS.

h.265 · January 4, 2021, 9:16pm

Thanks - but, then, the minute-by-minute data must be stored somewhere?

kevin1 · January 4, 2021, 9:47pm

I think there is a complex storage hierarchy for data display and for learning and training. The Sense mothership keeps 1/2 second data for display in the Power Meter and 2 sec data for smartplugs, but then downsamples older display data to 1 minute data at some threshold point in time. There is probably an entirely different derived set of data kept around for training. But probably the only homogeneous time resolution data Sense has around is the aggregated hourly and daily stuff.
