Solar Production Analysis over 6 years


I’m celebrating the start of the 7th year of production on my SolarCity solar system this year, and decided to see if I could work out a predictive model that is a function of yearly periodicity, days in service (dirt and degradation), plus any other time factors (like day in year). My solar install preceded Sense, so all of this daily production data comes from web access to my inverter’s data over the years.

First I looked, via linear regression, at all the daily production versus:

  • Cyclic = cos(2 * pi * (Yday - 172) / 365), where Yday is the day in year
    • Day 172 is Jun 21st, the longest day of the year.
  • Days in Operation (DIO) - to account for degradation over time
  • Yday - any other weirdness about the time in the year.
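For anyone who wants to build the same predictors outside of R, here is a minimal Python sketch of the three features listed above. The function names and the `offset` parameter are my own placeholders for illustration, not anything from the original analysis.

```python
import math

SOLSTICE_YDAY = 172   # Jun 21st, per the definition above
DAYS_PER_YEAR = 365


def cyclic(yday: int, offset: int = SOLSTICE_YDAY) -> float:
    """Seasonal feature: +1 on the offset day, close to -1 half a year away."""
    return math.cos(2 * math.pi * (yday - offset) / DAYS_PER_YEAR)


def features(yday: int, dio: int) -> tuple[float, int, int]:
    """Return the (Cyclic, Yday, DIO) predictors for one day of data."""
    return (cyclic(yday), yday, dio)
```

Keeping the offset as a parameter makes it easy to try a different peak day without touching the rest of the code.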

Here’s what the fit looked like:

Broad distribution of points, all living inside an envelope of the max possible solar production, gated by the max normal solar radiation per day. Reasonable fit given all the negative variability that comes with weather - Adjusted R-squared: 0.7107. You can also see the degradation effect in the lowering fit lines - I lose about 1.6 Wh of daily production for every day of operation (the DIO coefficient of about -0.0016 kWh per day). A good reason to have a power purchase agreement with a guarantee. Here’s the linear model information:

lm(formula = kWh ~ Cyclic + Yday + DIO, data = SolarHist)

    Min      1Q  Median      3Q     Max 
-20.451  -1.147   1.007   2.432   6.459 

              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 19.6539052  0.2157409  91.100   <2e-16 ***
Cyclic       8.2409384  0.1176425  70.051   <2e-16 ***
Yday         0.0022278  0.0007835   2.843   0.0045 ** 
DIO         -0.0015961  0.0001297 -12.311   <2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.826 on 2189 degrees of freedom
Multiple R-squared:  0.7111,	Adjusted R-squared:  0.7107 
F-statistic:  1796 on 3 and 2189 DF,  p-value: < 2.2e-16
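To make the coefficients above concrete, here is a rough Python sketch that plugs them back into the model equation. The coefficients are copied from the summary; the scenario (a system whose first day of operation falls on day 172) is purely hypothetical and only there to show the DIO term at work.

```python
import math

# Coefficients copied from the lm() summary above.
INTERCEPT = 19.6539052
B_CYCLIC = 8.2409384
B_YDAY = 0.0022278
B_DIO = -0.0015961


def predict_kwh(yday: int, dio: int) -> float:
    """Fitted daily production (kWh) for a day in year and days in operation."""
    cyclic = math.cos(2 * math.pi * (yday - 172) / 365)
    return INTERCEPT + B_CYCLIC * cyclic + B_YDAY * yday + B_DIO * dio


# Hypothetical comparison: the same summer day, five years (1825 days) apart.
first = predict_kwh(172, 0)
later = predict_kwh(172, 5 * 365)
print(f"fitted: {first:.2f} kWh, five years on: {later:.2f} kWh")
```

The gap between the two numbers is just the DIO coefficient times 1825 days, roughly 2.9 kWh per day.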

After this, I decided to look at just the maximum value recorded for each day of the year (across all the years) to help compensate for weather variability. I also pulled the days in operation (DIO) variable out of the equation, since I’m looking at the maxes over a period of 6 years.
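One way to get those per-day maxes, sketched in Python. The sample readings are made up for illustration, and the assumption that the data arrives as (Yday, kWh) pairs pooled across years is mine.

```python
from collections import defaultdict


def max_by_yday(records):
    """records: iterable of (yday, kwh) pairs pooled across all years."""
    best = defaultdict(float)          # production is never negative
    for yday, kwh in records:
        best[yday] = max(best[yday], kwh)
    return dict(best)


# Made-up readings: three years of day 172, two years of day 355.
sample = [(172, 27.1), (172, 29.4), (172, 12.0), (355, 9.8), (355, 11.2)]
maxes = max_by_yday(sample)
```

A cloudy reading like the 12.0 above simply drops out, which is the whole point of fitting the max instead of the raw daily values.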

Much better fit with less weather variability - Adjusted R-squared: 0.9455. The Yday coefficient also flips sign (from 0.0022 to -0.0036), grows in magnitude, and becomes more significant. I’ll need to look more closely at why that happens. I was actually getting a slightly better fit when I used a 162 day offset, rather than a 172 day offset, inside my cyclic cosine component. Would love to see what other solar users are seeing, though this might not be the best forum for that.

Here’s more info on the model that fits the max:

lm(formula = Max ~ Cyclic + Yday, data = SolarHist)

    Min      1Q  Median      3Q     Max 
-7.7307 -0.8023 -0.0357  0.7073  3.9201 

              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 22.4786629  0.0579658  387.79   <2e-16 ***
Cyclic       7.8434592  0.0411364  190.67   <2e-16 ***
Yday        -0.0036100  0.0002757  -13.09   <2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.347 on 2190 degrees of freedom
Multiple R-squared:  0.9455,	Adjusted R-squared:  0.9455 
F-statistic: 1.9e+04 on 2 and 2190 DF,  p-value: < 2.2e-16


Very nice. Interesting to see the degradation across the years, way more consistent than I would have guessed.


Could you express this as a percentage? I believe this number, 1.6, will vary with the size (total wattage) of the system. Also, I would expect the percentage to be larger in the early years and smaller in later years, except for a sudden failure of a panel or panel component.


Given my max production is in the neighborhood of 30 kWh per day, losing about 1.6 Wh of daily production per day of operation works out to just shy of 2% degradation per year. I’ll try yearly fittings to see how much the factor varies.
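The arithmetic behind that percentage, as a quick Python check. The slope is the magnitude of the DIO coefficient from the first model, and 30 kWh is the rough peak figure quoted above, so treat the result as a ballpark.

```python
DIO_SLOPE_KWH_PER_DAY = 0.0015961   # magnitude of the DIO coefficient
PEAK_KWH_PER_DAY = 30.0             # approximate max daily production

# Daily production lost after one full year of operation.
loss_per_year_kwh = DIO_SLOPE_KWH_PER_DAY * 365

# The same loss as a share of peak daily production.
pct_per_year = 100 * loss_per_year_kwh / PEAK_KWH_PER_DAY

print(f"{loss_per_year_kwh:.3f} kWh/day lost per year = {pct_per_year:.2f}% of peak")
```

That comes to about 0.58 kWh of daily production per year, or roughly 1.94% of peak.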

Edit: Oops - easier said than done. When I run the regression on a single year at a time, DIO (days in operation) is perfectly correlated with Yday (day in year), since within one year the two differ only by a constant. That means they can’t be separated in the regression (the DIO term actually gives singularities). What I really should do is figure out how to do a tighter fit without the Yday feature (which helped accuracy, I think, because we have fewer cloudy days in the second half of the year).
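A small Python sketch of why the single-year fit breaks down. The starting DIO value of 400 is hypothetical; any value gives the same picture.

```python
def single_year_rows(dio_on_jan1: int, days: int = 365):
    """(Yday, DIO) pairs for one calendar year; dio_on_jan1 is hypothetical."""
    return [(yday, dio_on_jan1 + yday - 1) for yday in range(1, days + 1)]


rows = single_year_rows(dio_on_jan1=400)

# Within a single year, DIO minus Yday is the same constant on every row.
offsets = {dio - yday for yday, dio in rows}
print(offsets)
```

Because DIO equals Yday plus a constant, it is an exact linear combination of the intercept and Yday, so the regression has no way to assign separate coefficients to the two. Across several years the constant changes from year to year, which is why the full six-year fit can estimate both.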