Scraping the web app Power Meter for data analysis


#1

OK, I finally broke down and wrote some simple code to scrape the Power Meter waveforms from the web app. Not incredibly helpful if you want to pull second-by-second power data because all one can really pull out is graphical path data. But I really wanted to write some code that counts the number of zero data events (dropouts) occurring over a period of time. That seems possible via web scraping.

I have included two R programs here. The first, CollectData, uses the RSelenium package to log in, traverse, and pull data from the web app on a week-by-week basis. The second, DropDetect, analyzes the weekly “waveforms” to detect events where the Total Usage power drops to zero, and also events where the Total Usage goes below zero. BTW - I don’t know if it is the new RSelenium 1.75, or better coding and luck on my part, but my mean time between web crashes has vastly improved over my experience about a year ago.
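For anyone curious about the general shape of the RSelenium piece, here’s a minimal login sketch. The URL and element names are illustrative placeholders, not the actual Sense selectors - see CollectData.R for the real thing:

```r
# Minimal RSelenium login sketch. The URL and the "email"/"password"
# element names are placeholders, NOT the real Sense web app selectors.
library(RSelenium)

rd <- rsDriver(browser = "chrome", verbose = FALSE)
remDr <- rd$client

remDr$navigate("https://home.sense.com/login")
Sys.sleep(2)  # give the page time to render before querying elements

# Customize with your own login and password
remDr$findElement("name", "email")$sendKeysToElement(list("you@example.com"))
remDr$findElement("name", "password")$sendKeysToElement(list("secret", key = "enter"))
Sys.sleep(5)  # wait for the app to finish logging in before scraping
```

From there it’s a matter of navigating to each weekly Power Meter view and pulling out the waveform path elements.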

Not too pretty here. The Sense web app offers up a huge set of path coordinates for displaying both the Total Usage and Solar waveforms. With a little bit of parsing and reformatting, they can be converted to a .csv format and saved away along with additional metadata (range and zero-location information). The metadata is needed since all the path geometry data generated by Sense is web-window-size specific. To get the greatest range of data and the best resolution, I ran the web scraping with a full-monitor Chrome window. Even with that, I still might be missing some extremely short zero/dropout events that weren’t visible in the 1-week window, but I’m not sure I care about those. I do care about the dropouts visible at the 1-week level though.
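The parsing step is roughly: grab the SVG path’s `d` attribute, strip the command letters, and split the rest into (x, y) pairs. A hedged sketch, assuming a logged-in RSelenium session `remDr` and using a placeholder CSS selector (Sense’s actual class names will differ):

```r
# Sketch of converting an SVG path "d" string into a data frame of points.
# "path.total-usage" is a placeholder selector, not Sense's real class name.
path_d <- remDr$findElement("css", "path.total-usage")$getElementAttribute("d")[[1]]

# A typical path string looks like "M0,512L1,509L2,510...". Drop the
# command letters, then split on the remaining space/comma delimiters.
nums <- as.numeric(strsplit(gsub("[A-Za-z]", " ", path_d), "[ ,]+")[[1]])
nums <- nums[!is.na(nums)]

# Alternating values are x and y coordinates in display space
wave <- data.frame(x = nums[c(TRUE, FALSE)], y = nums[c(FALSE, TRUE)])
write.csv(wave, "week_waveform.csv", row.names = FALSE)
```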

I can also plot the waveforms, though Sense, for some reason, stores the Total Usage and Solar y-data inverted in the web app, then flips it with a transform to display. I decided to do all my work on the “raw” display data, which is simple to reformat for use by ggplot.
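Since the raw y-values are inverted (larger y means lower power in screen coordinates), replaying the data in ggplot just needs a reversed y axis. A minimal sketch, assuming `wave` is the data frame of scraped path coordinates:

```r
library(ggplot2)

# wave holds the raw display coordinates scraped from the web app.
# scale_y_reverse() undoes the screen-coordinate inversion so that
# higher power plots higher on the page.
ggplot(wave, aes(x = x, y = y)) +
  geom_line() +
  scale_y_reverse() +
  labs(x = "relative x (display units)", y = "relative y (display units)")
```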

Here are 3 weekly plots replayed. Red dots are zero-events, blue dots are negative-events, though they are both inverted in my plots. Just ignore the axis values because those are all just relative values used for displaying the data in the web app.

My “bad” end of Dec. - 140 dropouts in the week.

A better new year - only 18 dropouts the next week. I stopped leaving my iPad on realtime display. But there was also a big self-inflicted network outage on Dec 9th that wiped out many hours of data.

Here’s what negative Total Usage looked like back in Sep ’18. Glad that’s over with!

OK - You have read this far… Here’s the code. LMK if you have any questions… It’s still pretty raw.

CollectData.R (3.2 KB)

DropDetect.R (1.4 KB)


#2

Nice.

I’d like to scrape the phone app Power Meter to get 1s device data, but it seems like a lot of work in unexplored territory. Maybe someone else will do that first.


#3

More data!

I reworked my program this afternoon to liberate the daily (vs. weekly) Power Meter waveforms, and managed to get it to scrape all 535 days of data in a single go. Trickier than it looks, since the Sense web app doesn’t like snapping to exact day boundaries. I had to pre-compute the URLs for the windows I wanted to look at, then ask for them individually, rather than using the day-to-day traversal in the web app. It runs more slowly than my weekly scraper, since Sense does caching when using their menu traversal arrows (plus there are many more days). I have to build in delays to allow Sense to update the display before scraping…
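The pre-computed-URL trick amounts to generating one Power Meter URL per day window and navigating to each directly, pausing for the display to settle. A sketch of the idea - the URL template below is a stand-in for the general shape, not Sense’s actual query scheme (inspect your browser’s address bar for the real format):

```r
# Sketch: iterate over explicit day windows instead of using the app's
# day-to-day arrows. The URL template is a placeholder -- check your
# browser's address bar for the real query parameters.
days <- seq(as.Date("2017-09-01"), as.Date("2019-02-17"), by = "day")

for (d in format(days, "%Y-%m-%d")) {   # format() keeps the Date as text in the loop
  url <- sprintf("https://home.sense.com/meter?start=%s&scale=day", d)
  remDr$navigate(url)
  Sys.sleep(8)  # let Sense finish redrawing the waveform before scraping
  # ... scrape and save the path data for this day here ...
}
```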

Thanks to this daily approach, I now have a fairly accurate census of my data dropouts over time. I say “fairly accurate” because I did notice some data overspray in the time dimension between the data available via scraping vs. what is visible on the web app. Scraping gives a little extra data outside the viewing window, so my results may include a small number of double-counted dropouts on the borders between days.

Here’s a column chart of dropouts over time, with color indicating the other monitor issue I experienced, events where total usage actually went negative due to a reversed CT.

The spike around Sept 13, 2018 was mainly a self-imposed “negative total usage” event. My electrician reversed a CT when we added extension cables, and I couldn’t get him back out for several days…

Here’s a view from my “dropout detector” for one of those days - the blue dots are negative crossovers, the red dots are places where total usage “touches” zero. Many of the red dots are occluded by the blue dots because they are the same event.
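In display coordinates, the detection itself boils down to threshold tests against the saved zero-line location. A minimal sketch of the idea behind DropDetectDays (variable names are illustrative, not the script’s actual ones):

```r
# wave:   data frame of raw (inverted) display coordinates for Total Usage
# zero_y: the y display coordinate of the 0 W line, saved in the metadata
# Remember y is inverted in display space: "below zero" power means the
# path point sits BELOW the zero line on screen, i.e. y > zero_y.
zero_y <- 512   # example value; comes from the scraped metadata
tol    <- 0.5   # half-pixel tolerance for "touching" the zero line

zero_events     <- wave[abs(wave$y - zero_y) < tol, ]  # dropouts (red dots)
negative_events <- wave[wave$y > zero_y + tol, ]       # reversed CT (blue dots)

cat(nrow(zero_events), "zero-touch points,",
    nrow(negative_events), "negative points\n")
```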

Here’s the same thing in the Sense web app:

What about the bad dropout days, without any negative-going Total Usage? Consider Dec. 28, 2018, below:

And the corresponding Sense web app picture:

Daily scraping works, at least for my limited purposes!

Oops - here are the new day-by-day collection and detection scripts. Remember to customize using your login and password before you try.

CollectDataDays.R (3.7 KB)

DropDetectDays.R (1.9 KB)

Enjoy!


#4

One final chapter on this enterprise. I invested a bunch of time in “finding” data dropouts because I kept seeing their impact in Sense exported data. “Always On” data is exceptionally sensitive to dropouts, though I don’t know enough about how it is calculated to understand exactly why. But anomalies in “Always On” gave me a way to pretty much automatically locate the 334 hours over the past 535 days that included one or more data dropout events. In the chart below, I overlay the number of hours with detected dropout events (black line) on top of my column chart of dropout events per day. They correlate hugely, though that should be no surprise.

What surprised me was that “Always On” anomalies found about 95% of those hours. Yes, I had to do a little filtering because there were some false positives. And yes, I did have to search in one more region (Always On < 0.135kWh) to find the remaining 5% of the hours with dropouts.
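Given Sense’s hourly export loaded as a data frame, the screen reduces to flagging hours where the “Always On” column misbehaves. A hedged sketch - the column name is an assumption about the export’s shape, missing/zero values stand in for “anomalies,” and the 0.135 kWh cutoff is the one mentioned above:

```r
# hourly: data frame from the Sense hourly export, with a numeric
# always_on_kwh column (the column name here is an assumption).
# Flag hours where Always On is missing/zero (an anomaly) or falls
# below the empirically chosen 0.135 kWh cutoff.
flagged <- hourly[is.na(hourly$always_on_kwh) |
                  hourly$always_on_kwh < 0.135, ]

nrow(flagged)  # candidate hours likely to contain dropout events
```

In practice a pass like this still needs a little manual filtering for false positives, as noted above.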

As you can see from the chart, dropout events are distinctly different from dropout hours. You can have more than one dropout event per hour, but a single big dropout event can also span several hours. It also looks like I missed a few hours despite my intense search, but I can easily go back and locate them now, if I want to.

Best of all, I can put this experiment to rest and hope that Sense finds the data leaks that cause dropouts in the history timeline and export.

More on finding data dropouts using “Always On” data here:

