A long-term solar simulation with hourly data from Sense - accurately size your system - improvements and input needed

Hi, [update: code and additional comments in reply.]
I’m interested in creating a solar power simulator in R so that I can adjust my daily energy usage (behavior) based on hourly data. I plan to use simulated hourly solar production data from the NREL website and couple that with actual hourly usage data from my Sense.

Could someone please provide some guidance on scraping hourly data from the website, or any other method that would let me automatically enter the previous hour’s electricity use into a table? I found a sample R script in the forum that scrapes daily usage over multiple days, but I’m looking to scrape hourly usage for a single day. I would execute the script once an hour. Thanks for any guidance!


Three suggestions:

  1. Use data export via the web app to pull hourly data, for each day, month or year you want. No scraping needed. Just go to the time period you want in the Trends/Usage window and hit the export icon on the upper right.

  2. There’s already a significant R-based solar simulation package called solaR. I have done a little exploring with it. I have gotten it to work nicely up to the point where I have to have it compute Global (G0), diffuse (D0), and direct (B0) irradiance components.
    solaR: Solar Radiation and Photovoltaic Systems with R

  3. Some interesting experiments with solar simulation and prediction on this thread:
    Varying goals for solar

Hi Kevin, thanks for your response. I ended up using a macro to get the functionality I wanted, but a purely R-based solution would be preferable. Here is the full explanation of what I’m trying to accomplish, along with code.

As someone who lives in rural Georgia without net metering, I’m looking for the solar solution that meets some of my needs without sending massive amounts of energy back to my utility. I have an electric vehicle and heat pump based heat/AC, but charge the 60 kWh capacity car mainly at night. I work at home and have a lot of flexibility in how I use electricity. Most solar installers want to set me up with a massive system that will fully cover my car. I want a simulation that will help me set realistic expectations for how much solar power I can actually produce and use without selling it to my utility for a few cents per kWh.

Create a simulation that:

  1. accurately estimates annual production with hourly data points
  2. tracks Powerwall capacity, taking into account that a single Powerwall cannot supply more than 5 kW
  3. tracks excess power required beyond solar production
  4. tracks unused solar power sold back to the utility
  5. uses actual hourly power use from sense.com
  6. provides hourly updates on system status to inform efficient energy use decisions
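Items 2 through 4 of the list above boil down to an hour-by-hour energy balance. Here is a minimal sketch of that bookkeeping in base R, not the attached script. The function name, the 13.5 kWh capacity, the symmetric 5 kW limit on charging as well as discharging, and the toy numbers are all assumptions for illustration, and round-trip losses are ignored.

```r
# Hypothetical hourly energy balance for solar + one battery.
# With 1-hour steps, a 5 kW power limit equals 5 kWh per step.
simulate_battery <- function(production_kwh, usage_kwh,
                             capacity_kwh = 13.5,  # assumed usable capacity
                             max_power_kw = 5,     # assumed limit, both directions
                             start_kwh = 0) {
  n <- length(production_kwh)
  stopifnot(length(usage_kwh) == n)
  soc <- numeric(n)     # battery level at the end of each hour
  bought <- numeric(n)  # energy still needed from the grid
  sold <- numeric(n)    # surplus solar pushed back to the grid
  level <- start_kwh
  for (i in seq_len(n)) {
    net <- production_kwh[i] - usage_kwh[i]  # > 0 surplus, < 0 deficit
    if (net >= 0) {
      charge <- min(net, max_power_kw, capacity_kwh - level)
      level <- level + charge
      sold[i] <- net - charge
    } else {
      discharge <- min(-net, max_power_kw, level)
      level <- level - discharge
      bought[i] <- -net - discharge
    }
    soc[i] <- level
  }
  data.frame(hour = seq_len(n), soc_kwh = soc,
             bought_kwh = bought, sold_kwh = sold)
}

# Toy day: six hourly values in kWh (made up).
production <- c(0, 2, 6, 8, 3, 0)
usage      <- c(1, 1, 1, 1, 2, 7)
result <- simulate_battery(production, usage)
print(result)
```

In the toy run, the 7 kWh surplus in hour 4 can only charge the battery by 5 kWh, so 2 kWh is sold, and the 7 kWh deficit in hour 6 is only covered up to 5 kWh, so 2 kWh is bought.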


  1. Hourly production data was downloaded from https://pvwatts.nrel.gov/ based on location, array design, average weather with normal variability, and other system parameters. This tool only takes a couple of minutes to use.
  2. I created a Windows macro to download hourly usage data from sense.com every hour using Pulover’s Macro Creator, https://www.macrocreator.com/. The macro must log out of the website each time, otherwise the newest data did not update on my computer.
  3. I imported all CSV data into R and created variables to track hourly Powerwall capacity, production, overproduction, and underproduction
  4. I used the code and the associated App, Pushover, https://pushover.net/ to send hourly notifications that include the daily outlook for production (stand-in for weather forecast), and major variables.


  1. It would be great to have a live estimate of solar production based on all the values that NREL considers along with live weather. Once I have solar I plan to use weather forecasts to make decisions about when to use high-load appliances, considering my typical car charging schedule.
  2. A fully R-based scraping method or direct download URL for hourly data from sense.com would make the simulation more reliable.

The R script is attached. I made quite a few changes as I was adding comments, but I believe everything still works. Please let me know if you find any issues, have questions, or have any comments to aid me in my goal. Thanks!!!

sense_process_edits.R (3.8 KB)

Sounds like a very interesting and useful project. A few thoughts:

  1. If you really want to scrape/download each hour in native R, you can use RSelenium instead of a separate macro. Take a look at my CollectData.R script here for an example.
    Scraping the web app Power Meter for data analysis - #3 by kevin1

  2. It’s easy to figure out best-case solar production from the NREL app, or from solaR, but real-world production is another matter.

  3. What is the objective function you are trying to minimize, as a function of power sold to the grid and power bought from the grid?

Thanks again for the response Kevin.

  1. I spent a lot of time looking at the code from you and the only other example I could find. I struggled to figure out how to go from a resolution of Day to Hours. Any guidance there?
  2. I actually had not considered that these simulators were best case scenarios. I’ve received quotes from 2 companies that provide a similar annual production value.
  3. Assuming I want to have a Powerwall and solar to cover the majority of my household electricity (some of my car electricity), what is the most cost effective solution?

Following about 5 months of Sense usage and a simulated 7.7 kW system, here are the metrics:
3,752 kWh produced
2,932 kWh solar used
4,286 kWh bought
822 kWh sold
Solar usage = 78.1%

I believe I can get solar usage close to 100% and perhaps warrant a larger system. Without changes in when I use electricity though, I think a bigger system would just cost more money. Obviously if I had true net metering it would be a much easier calculation.
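For anyone checking the arithmetic, the "solar usage" figure above is solar energy used divided by solar energy produced. A small sketch with the numbers from this post (variable names are my own, and the second ratio, the share of household consumption covered by solar, is an extra metric I am assuming is useful rather than one quoted above):

```r
# Totals from the post, treated as energy in kWh.
produced_kwh   <- 3752
solar_used_kwh <- 2932
bought_kwh     <- 4286

# Share of solar production consumed on site ("solar usage").
self_consumption_pct <- round(100 * solar_used_kwh / produced_kwh, 1)

# Assumed extra metric: share of total household consumption met by solar.
solar_fraction_pct <- round(100 * solar_used_kwh / (solar_used_kwh + bought_kwh), 1)

print(self_consumption_pct)  # 78.1
print(solar_fraction_pct)
```

Pushing the first number toward 100% means shifting loads into production hours; the second number is the one that grows with a bigger array.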


  1. Not quite sure where you are encountering the difficulty. I was trying to do a quick and dirty RSelenium script that would do an export, but I’m having an issue getting remDr$findElement to find an element for the export icon that is clickable in the DOM. I may just have to raid the main panel for the hourly data. Again, my main mode of operation is to export a year’s worth of hourly data and process the entire history in R, instead of doing daily incremental downloads.

  2. The simulators can calculate solar radiation/flux arriving at the top of the atmosphere fairly exactly, but once the light hits the atmosphere the calculations in the middle are statistical, based on historic weather patterns, to get the Global (G0), diffuse (D0), and direct (B0) irradiance components. After that, the calculations are direct again, based on the exact geometry of your panels. But the statistical part in the middle means that predictions are really only useful in aggregate (daily, monthly), not so useful for hourly. I have also seen that simulators are generally overoptimistic. If I plug my system’s numbers into the NREL calculator, I get this.

But my Sense (and SolarCity) data is well below these estimates (Aug 827 kWh vs 681.7 kWh)

  3. I like your approach…


You got me rolling. Originally I was going to try to rewrite your code differently so that instead of using hourly or daily updates, the simulation could just run across all existing data for a year, since both Sense and NREL PVWatts can output for a year. Instead, I ended up doing a detailed comparison between my real solar results from Sense vs. the NREL model of my system, because some of the results looked problematic.

Take a look at the R code included. One thing you will notice is that I’m using the export file that comes straight out of Sense, “1-hour data from Jan 01, 2019 to Jan 01, 2020.csv”. I’m also using the data that comes straight out of the NREL app, “pvwatts_hourly.csv”. The readr package in R is a pretty good way to read CSV files in directly, so no editing is needed. The other thing to note in my code is that once I create a DateTime for the NREL data for a year, I can merge directly with the Sense data, giving me a data frame, ‘sense’, that has all the data consolidated by DateTime.
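The DateTime-then-merge step described above can be sketched in base R like this. It is not the attached script: the two small data frames stand in for the real exports, the column names (`Month`, `Day`, `Hour`, `ac_kwh`, `solar_kwh`) are assumptions, and I pin everything to UTC only to keep the toy free of time-zone surprises.

```r
# Toy stand-in for the PVWatts hourly export (assumed column names).
nrel <- data.frame(Month = c(6, 6, 6), Day = c(1, 1, 1), Hour = c(10, 11, 12),
                   ac_kwh = c(3.1, 3.8, 4.0))

# Toy stand-in for the Sense hourly export (assumed column names).
sense <- data.frame(
  DateTime = as.POSIXct(c("2019-06-01 10:00", "2019-06-01 11:00",
                          "2019-06-01 12:00"), tz = "UTC"),
  solar_kwh = c(2.6, 3.3, 3.5))

# Build a DateTime for the simulated year from month/day/hour columns.
nrel$DateTime <- ISOdatetime(2019, nrel$Month, nrel$Day, nrel$Hour, 0, 0,
                             tz = "UTC")

# Join on DateTime so actual and simulated values share one row per hour.
merged <- merge(sense, nrel[, c("DateTime", "ac_kwh")], by = "DateTime")
print(merged)
```

With the real files, the same join works once both sides carry a DateTime in the same time zone and with the same hour convention, which turns out to matter a lot further down the thread.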

But once I merge the hourly data and graph it, it’s clear that something is wrong. Instead of a straight line or a scatter around a straight line, I get a ‘butterfly wing’. The wing indicates that there is some kind of temporal misalignment between the predicted and actual data, and adding color for the hour really highlights the mismatch. I’m going to have to look at both time bases very carefully to resolve it.

Moving to aggregated daily data should at least help remove some of the temporal issues. And it does give me something closer to what I would expect, a random distribution around a roughly straight line. I’m guessing that a monthly aggregation would converge even more toward a straight line. But realize that the NREL results are only statistically accurate.
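To show why daily totals hide a timing problem, here is a small base R sketch with made-up data: the "model" series is the same production curve as the "actual" one, just shifted one hour earlier. All names and values are assumptions for illustration.

```r
# Two synthetic days of hourly data; model_kwh leads actual_kwh by one hour.
hourly <- data.frame(
  DateTime   = as.POSIXct("2019-06-01 00:00", tz = "UTC") + 3600 * (0:47),
  actual_kwh = rep(c(rep(0, 8), rep(1, 8), rep(0, 8)), 2),
  model_kwh  = rep(c(rep(0, 7), rep(1, 8), rep(0, 9)), 2))

# Hour by hour the two series disagree at the edges of each day.
hourly_cor <- cor(hourly$actual_kwh, hourly$model_kwh)

# Summed by calendar day the shift disappears.
hourly$Date <- as.Date(hourly$DateTime, tz = "UTC")
daily <- aggregate(cbind(actual_kwh, model_kwh) ~ Date, data = hourly, FUN = sum)

print(hourly_cor)
print(daily)
```

Grouping on `format(hourly$Date, "%Y-%m")` instead of `Date` would give the monthly version of the same aggregation.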

Here’s the completely reworked script. I don’t think it will run for you since you don’t have Solar Production data in your Sense output, but a few small changes would enable you to merge your NREL and Sense usage data into a single data frame. You could just filter on “Total Consumption” and get rid of the cast section of the script.
sense_process_edits2.R (2.3 KB)

Hey Kevin,

This is amazing! There were quite a few improvements I need to make and this will definitely help me with a lot of them. Hopefully I’ll have some time to employ them today. Thanks

A few things that you may need to consider.

  1. Micro-climates. Is your location prone to fog? I remember living in San Diego and Chula Vista, California as a kid. In summer, Chula Vista would get a lot of marine fog. Sometimes it would last the whole day. In eastern San Diego, only 10 miles away, it would be sunny and cloudless all day.
    I currently get more fog in NYC because I am close to the Hudson river. Only a mile inland gets a lot less fog.
    Even snow on a roof could significantly affect solar production during the winter months. Dust from construction and soot from forest fires can also affect production. This is less likely in areas that get enough rain to clean them regularly.

  2. If net metering is not available, how does your load/usage compare to solar production? Can you move loads around to suit your solar production? Dishwashers and clothes washers may be simpler to move compared to other devices.

  3. If roof mounted, the angle and direction of the roof can have a significant impact on production that is not easy to change.



Good luck, and feel free to PM me via the forum if you have any questions about what I did. One of your questions might help me figure out why my numbers depart so weirdly from the NREL hourly numbers. BTW, I get the same “butterfly wing” pattern when I compare hourly results against the solaR simulator, so I think whatever is causing the difference is systemic. Similar butterfly wing pattern in this thread.

And as I suggested earlier, if I aggregate by month I get something that is very linear between my monthly Sense data and the NREL results. You have convinced me to break out my other 6 years of solar history for comparison against the NREL PVWatts yearly template.

Slightly improved and commented script below.

sense_process_edits3.R (2.7 KB)


I did one more experiment before trying to compare my multi-year data against the NREL PVWatts simulation output. I played around to see if PVWatts generates a different statistical distribution each time it is run. The answer is no. I ran two different invocations with the same parameters, plus I also changed the parameters slightly, and I still get the same “year” from a weather/atmospherics perspective. See the screenshot below for the solar production data generated for the first day of all the NREL years. All three times I ran PVWatts, it had the same cloudy conditions that reduced power production during the 11AM interval. The “weather year” NREL uses is statistically representative of a typical year, and will aggregate to something near the expected values, but the hour-by-hour data shouldn’t be relied on.


One more interesting view into ‘statistical accuracy’. Finally compared my hourly solar data for the past 6 years or so from my inverter vs. the NREL hourly simulation data. Same results as I saw against my Sense solar data (no surprise), but I was able to look across 6 years. Guessing that this might be of interest to @ixu as well.

Hourly Comparison - NREL vs Real
Still a butterfly wing. The few points up near the top are places where SolarCity missed an hourly reporting and doubled up on the next one (another challenge with making sense of data). I think the “butterfly wing” occurs because my panel geometries (azimuth, incline) don’t square with the NREL or solaR conventions, but it doesn’t matter when integrating over the course of the day. The shortfall in the morning vs. the NREL predictions is made up for by a surfeit in the afternoon/evening.

Daily Comparison - NREL vs Real
Approaching linear

Monthly Comparison - NREL vs Real
Linear - Now you know why SolarCity was able to sell me a power purchase agreement with a money back guarantee on a yearly basis.

OK, this little exercise helped me figure out a huge mystery. I understand my “butterfly wing” much better now. The shape had me thinking that there might be a mismatch between the parameters I fed the NREL simulator and my actuals. I had been inputting a pitch of 23 and azimuth of 123 based on some written documentation I had, but when I went back and consulted my plans, I really have 4 different arrays, of which only one had those parameters (6 panels out of 21). The other 3 have the same pitch of 23 but an azimuth of 213 (15 panels out of 21). Guess I’ll have to do them as two separate arrays in the NREL simulator, then sum them and see what happens.

Improvement #1
The “butterfly wing” becomes a narrower “dragonfly wing” if I use an azimuth of 213 for my entire solar system.

Improvement #2
Here’s what it looks like if I create a dual array result in NREL PVWatts using the two different capacities and azimuths. That seems to worsen things. The “eye” in the middle seems to get larger.

Improvement #3
If I offset my real solar production 1 hour earlier, still using the dual array NREL comparison data, the “eye” closes and I nearly get a line.

Experiment #4
If I offset my real solar production by 2 hours instead, the “dragonfly wing” inverts. Afternoon production looks too small vs. NREL, and morning is too big.

So now I’m finally going to circle back to my original Sense solar data vs. NREL. Here’s the chart once I move to the more correct NREL model that uses two arrays with different (correct) azimuths. Once again I get a similar “dragonfly wing” to what I saw against my historic inverter data.

And if I offset the solar production 1 hour earlier, I get essentially a linear relationship, with lots of well-distributed randomness, as I would expect from the weather modeling.

The bottom line, @brettabailey, is that this is the best you can expect for hour-by-hour modeling from PVWatts.

sense_process_edits4.R (3.3 KB)

One more discovery as I try to sort out my solar production vs. NREL PVWatts. Realize that the production year that PVWatts outputs is all based on standard time. That’s not an issue with respect to the lost and added hours of production time, since both DST adjustments (spring forward, fall back) are during the nighttime (production=0). But it does mean that when you do a production prediction or a comparison against previous production, you have to offset data for the spring-to-fall months by 1 hour. Not surprisingly, that’s what I had to do above to coerce my production vs. NREL PVWatts into something nearer the expected straight line.

Looking at my hourly 2019 (Sense) solar production vs. PVWatts data with data colored by month, you notice two things:

  • My charted production data contains very little standard time data (Nov-March).
  • Where there is production data from standard time (lavender and pink), it is scattered all over the chart. Nov-Mar in my area is the cloudy rainy season. That’s the place where PVWatts is really taking wild statistical guesses at which hours and days are going to be cloudy while local weather does its own thing, unlike the far clearer Apr-Oct.

So there is really a good reason for the data to match up when I offset my production vs. NREL PVWatts by one hour. That’s something you also need to take into account when you use PVWatts as a solar production predictor.

If I look at a linear model for my offset production data vs. NREL PVWatts for just the period during DST (Apr-Nov 2, 2019), I get:

Sense Production = 0.002550468 + 0.855212926 x NREL PVWatts

with fits of - Multiple R-squared: 0.8623, Adjusted R-squared: 0.8623.
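For reference, a fit of this form comes from an ordinary `lm()` call. The sketch below uses synthetic data, not my production numbers: the slope of 0.85, the noise level, and the variable names are all assumptions chosen just to make the example run.

```r
# Synthetic hourly values in kWh (made up for illustration only).
set.seed(1)
pvwatts_kwh <- runif(200, min = 0, max = 5)
sense_kwh   <- 0.85 * pvwatts_kwh + rnorm(200, sd = 0.3)

# Linear model: actual production as a function of simulated production.
fit <- lm(sense_kwh ~ pvwatts_kwh)

print(coef(fit))               # intercept and slope
print(summary(fit)$r.squared)  # Multiple R-squared
```

A slope below 1 is one way to read "the simulator is optimistic by that factor", as long as the time bases are lined up first.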

sense_process_edits5.R (3.7 KB)

I finally went back and made a minor mod to the script of yours that I had been tweaking. I added the adjustment that moves the incoming NREL PVWatts predicted solar generation data during DST from standard time to DST. Just a couple of lines, but it is custom for 2019 DST. There’s probably a way to use some set of R as.POSIXct and/or strftime() to do things more cleanly, but I didn’t want to have to dig through to find it.
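One possible cleaner route, sketched under assumptions rather than taken from the attached script: parse the PVWatts timestamps in a fixed-offset zone that never observes DST, then print them in the local zone and let R's time-zone database apply the shift for any year. The zones here assume a US Eastern location such as Georgia; swap in your own pair.

```r
# PVWatts-style timestamps are in local standard time all year.
# "Etc/GMT+5" is a fixed UTC-5 zone (the POSIX sign is inverted) with no DST.
std_tz   <- "Etc/GMT+5"         # assumed standard-time offset
local_tz <- "America/New_York"  # assumed local zone that observes DST

stamps <- as.POSIXct(c("2019-01-15 12:00", "2019-07-15 12:00"), tz = std_tz)

# Same instants, shown on the local wall clock.
local <- format(stamps, "%Y-%m-%d %H:%M", tz = local_tz)
print(local)  # "2019-01-15 12:00" "2019-07-15 13:00"
```

The January stamp is unchanged and the July stamp moves one hour later, which is the same standard-time-to-DST shift as the hand-coded 2019 adjustment, without hard-coding the changeover dates.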

With that DST adjustment, my 6-year data vs. the NREL simulation comes out much closer to linear, as should happen.

sense_process_edits6.R (4.1 KB)
