I’m on a mission now to “scrape” the data displayed in the Sense web app and recover it at a higher resolution than we can get out of the export. The good news is that I have been able to recover data using just the web vector graphics data. But it’s still challenging:
Data is displayed by Sense as an HTML vector path construct - pretty easy to extract
The current path data for a day on the screen actually consists of about 3 days' worth of data - has to be clipped to a single day window
The data is all in terms of the on-screen coordinates - data has to be rescaled to a 24hr period and to the vertical scale maximum for that day
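A minimal sketch of the extract, clip and rescale steps listed above, not the actual scraper. It assumes the path string uses only absolute M/L commands with plain decimal numbers, and that the pixel positions of midnight-to-midnight (`x_day_start`, `x_day_end`), the zero-watt baseline (`y_zero`), the top of the plot (`y_top`) and the day's vertical scale maximum (`max_watts`) are already known. All of those names are hypothetical.

```python
import re

def parse_path(d):
    """Return (x, y) pixel pairs from a path string.

    Assumes absolute M/L commands only, so the numbers simply alternate x, y.
    """
    vals = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", d)]
    return list(zip(vals[0::2], vals[1::2]))

def rescale(points, x_day_start, x_day_end, y_zero, y_top, max_watts):
    """Clip to a single day and map pixels to (seconds since midnight, watts)."""
    out = []
    for x, y in points:
        if not (x_day_start <= x <= x_day_end):
            continue  # drop the extra days on either side of the window
        t = (x - x_day_start) / (x_day_end - x_day_start) * 86400.0
        # Screen y grows downward, so the baseline has the larger pixel value.
        w = (y_zero - y) / (y_zero - y_top) * max_watts
        out.append((t, w))
    return out

pts = parse_path("M0,100L50,50L100,0L200,100")
day = rescale(pts, x_day_start=50, x_day_end=150, y_zero=100, y_top=0, max_watts=2000)
```

If the real paths use relative commands or curves, the parser would need to be replaced with a proper SVG path library.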
At this point in time, I have 4288 data points per day all properly scaled, with a few more challenges to unpack.
All the data is in a path format so I still need to reduce to a single point per time format - simple reduction
But before that, I have to make sense of the data, which has 2 (or really 4 until I do the path-to-point reduction) power values for each time point. The way Sense displays the data, the path does a rightward sweep and then comes back with a leftward sweep, and the two sweeps put down different power values for the same point in time.
For well behaved days, the numbers aren’t all so different and the left looks like a shadow of the right trace.
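One way to pull the two sweeps apart, sketched under the assumption that the path really is a single rightward pass followed by a single leftward pass, so the turnaround is simply the right-most point. `split_sweeps` is a hypothetical helper name.

```python
def split_sweeps(points):
    """Split (x, y) path points into the rightward and leftward sweeps.

    Assumes x increases up to one turnaround point and then decreases.
    The left sweep is reversed so both come back in time order.
    """
    turn = max(range(len(points)), key=lambda i: points[i][0])
    right = points[:turn + 1]
    left = points[turn + 1:][::-1]
    return right, left

path = [(0, 1.0), (1, 2.0), (2, 3.0), (2, 2.5), (1, 1.5), (0, 0.5)]
right, left = split_sweeps(path)
```

With both sweeps in time order they can be compared point by point, or fed separately into whatever aggregation comes next.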
This is great and something that I was starting to explore but knew you were in deeper …
Thought: Abandon the left sweep?
Looking at the display (i.e. not at the underlying path data) at full resolution, you always see spurious verticals that don’t correlate to the displayed wattage when dragging left/right.
Have you tried smartplug data at your 6s sample? [Edit: I mean looking at it in app vs web … realizing you can’t get device data on web]
Good suggestion. I’m actually going to try separating the right scan from the left, then aggregating both into hourly energy/power. Then I’ll compare against the Sense hourly exports and see which one comes closest. First I have a little code cleanup to do and then I need to move back to my highest res monitor because the amount of data delivered in the HTML paths increases with browser window resolution and width, as far as I can tell. More later !
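A rough sketch of the hourly aggregation, assuming each sweep has already been reduced to `(seconds_since_midnight, watts)` samples that are spaced roughly evenly. Under that assumption the mean power in an hour, in W, is numerically the energy for that hour in Wh. `hourly_energy_wh` is a hypothetical name.

```python
def hourly_energy_wh(samples):
    """Return 24 hourly energy values in Wh (None for hours with no samples).

    Uses the plain mean of the samples in each hour, which is only a fair
    estimate if the samples are roughly evenly spaced in time.
    """
    sums = [0.0] * 24
    counts = [0] * 24
    for t, w in samples:
        h = min(int(t // 3600), 23)  # t == 86400 falls into the last hour
        sums[h] += w
        counts[h] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

hours = hourly_energy_wh([(0, 1000.0), (1800, 2000.0), (3600, 500.0)])
```

For unevenly spaced samples a trapezoidal integration over time would be the safer choice.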
BTW - with my 5120 x 2880 monitor, I can extract a data point every 35 sec or so. Still not every 1/2 sec or every 2 secs, but much better than once per hour. Let’s hope the Power Meter graphs are accurate.
Here’s the test result - a moderate success. Tested on 5 days worth of data and the results look linear, though there is a bit of variation depending on whether I use the right sweep or the left sweep, and whether I use the vertical or horizontal moves of the path as my data points. It’s also clear that I need to add a fudge factor into my power/energy calibration to get matching results.
This is a comparison between Sense supplied energy data vs. the computed aggregated energy data from my graphics grabs of each day.
Finally managed to work on this a little more and discovered another issue. Sometimes the data scraped from the page doesn’t really match the display window size and doesn’t scale appropriately. Not sure why just yet and it is tough to track down since things are fairly dynamic when I’m actually scraping the data.
This is what my first “big data” test looked like… Crazy data scattered throughout. Many power values way too large. All the data that shows reasonable correlation is down on the bottom in the line that looks like it is zeroed out. Typical data science challenge - finding the signal in the noise.
The good news is that it is easy to detect and remove days where the data exceeds the window size. Once I remove those days I still have 76 days of data to work with. Plus I get a much more linear plot:
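A sketch of that kind of filter, assuming the check is done on the rescaled watts: a day is rejected if any sample lands below zero or above that day's vertical scale maximum, with a small tolerance. The function names, the dict layout and the 1% tolerance are all assumptions for illustration.

```python
def day_fits_window(samples, max_watts, tol=0.01):
    """True if every (t, watts) sample lies inside the plotted vertical range."""
    lo = -tol * max_watts
    hi = (1 + tol) * max_watts
    return all(lo <= w <= hi for _, w in samples)

def keep_good_days(days, max_watts_by_day):
    """Drop whole days whose scraped data does not fit the display window."""
    return {
        day: samples
        for day, samples in days.items()
        if day_fits_window(samples, max_watts_by_day[day])
    }

days = {
    "day-a": [(0, 100.0), (35, 1500.0)],
    "day-b": [(0, 100.0), (35, 9000.0)],  # mis-scaled: above the scale maximum
}
good = keep_good_days(days, {"day-a": 2000.0, "day-b": 2000.0})
```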
No matter how I slice the path data, all four combinations have R² fits of around 0.9955 vs. the Sense data. Using all the data, my computed energy (from data scraping) = 0.8345 * Sense Energy + 18.9893W. So my fudge factor = 1/0.8345 or basically 1.2.
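For anyone repeating the comparison, here is a plain least-squares fit sketched in pure Python. The sample numbers in the usage lines are made up to exercise the function and are not the scraped data. Only the slope 0.8345 comes from the fit quoted above.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit ys ~ slope * xs + intercept, with R squared."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

# Made-up example values: x = Sense export, y = scraped estimate.
slope, intercept, r2 = linear_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])

# Fudge factor from the slope quoted in the post.
fudge = 1 / 0.8345  # about 1.198, i.e. "basically 1.2"
```

Multiplying by the fudge factor only undoes the slope. Inverting the whole fit would also subtract the intercept first, i.e. (computed - 18.9893) / 0.8345.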
So hurray - I now have a way of extracting 35 sec resolution data if I really, really want it.
I guess it’s worth asking – why can’t Sense give us a user-level ability to save a CSV file, with 1 sample per minute level of detail (or finer)? We can scrape it, so why can’t they provide it?
It’s a bit fraught to scrape it, since they return maybe 3 days of data for a 1 day request, and the y-axis seems to be distorted. Takes some playing around to get the constant offset and scale correct.
Add your “like” to the top of this Wishlist Request.
Right now, I believe Sense exports data at hourly and daily level since they are already aggregating data at those intervals for the Trends. Adding smaller intervals would certainly add computation and storage costs for Sense on AWS.
I think there is a complex storage hierarchy for data display and for learning and training. The Sense mothership keeps 1/2 second data for display for the Power Meter, 2 sec data for smartplugs, but then downsamples older display data at some threshold point in time to 1 minute data. There is probably an entirely different derived set of data kept around for training. But probably the only homogeneous time resolution data Sense has around is the aggregated hourly and daily stuff.