Varying goals for solar

Sadly, not a story where I could say Sense solved a problem for me, but with an enhancement to the goals functionality, it could have.

I have two solar inverters; in September of last year, one of them failed. Unfortunately, at least as far as solar production is concerned, I had a daughter four days later, and I didn’t really pay much attention to solar production for a while. This was also the first year that the solar array was up, so I couldn’t immediately compare my current generation to last year’s generation. However, once I hit January, looking back, I could definitely see the decreased generation, but again, I was otherwise occupied. Sense does have goals, and I could put in some sort of solar generation goal, but it’s a single total that applies to every month, and based on where I am, my solar generation can vary from 0.5 MWh to 2.5 MWh per month throughout the year. So it would be nice if I could have goals on a per-month basis, or possibly even better, on a relative basis, where I could say “I want to generate at least 80% of the solar power that I generated one year ago”, or something similar.
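To make the relative goal concrete, here is a minimal sketch of an "at least 80% of the same month last year" check. The function name, the dict-of-monthly-totals layout and the numbers are all made up for illustration; this is not how Sense goals work today.

```python
def months_below_relative_goal(this_year_kwh, last_year_kwh, min_fraction=0.8):
    """Return the months whose generation fell below min_fraction of the
    same month one year earlier. Inputs are dicts keyed by month number."""
    shortfalls = []
    for month in sorted(this_year_kwh):
        baseline = last_year_kwh.get(month)
        if baseline is None or baseline <= 0:
            continue  # no usable reference for this month
        if this_year_kwh[month] < min_fraction * baseline:
            shortfalls.append(month)
    return shortfalls


# Made-up monthly totals in kWh (not real data).
last_year = {7: 2500.0, 8: 2300.0, 9: 1900.0, 10: 1400.0}
this_year = {7: 2450.0, 8: 2280.0, 9: 1100.0, 10: 700.0}
print(months_below_relative_goal(this_year, last_year))  # months 9 and 10 fall short
```

A check like this needs a full year of history before it can say anything, which is exactly the first-year problem described above.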


SENSE! This seems like a prime target for an ML/intelligent auto-alert that would be relatively easy to implement.

Solar has predictable and expected generation based on the panel baseline efficiency, age and weather conditions … and one assumes that an unmodified array could be “locked in” early in the learning cycle IF (and perhaps only if) the learning incorporates local weather conditions? Have higher brains thought this through?

This seems too complex for reliable user-determined alerts and needs the backend brain to work things out, i.e. it’s NOT an example of the following:

I think it would be great to see the long list of things that Sense can Save and does (in theory, at least) actually Save!

Daily solar goals are probably better defined by a simple periodic algorithm than by delving into ML unless the ML can predict cloudy weather. I did a fitting of daily solar production and it is fairly predictable, except for those darn clouds.
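For anyone who wants to try the "simple periodic algorithm" idea, here is a rough sketch that fits a single yearly sine/cosine cycle to one full year of daily totals. It is not the fitting used for the charts in this thread; the function names and the synthetic data are assumptions for illustration, and clouds are ignored entirely.

```python
import math

def fit_yearly_cycle(daily_kwh):
    """Fit y = mean + a*cos(w*d) + b*sin(w*d) to one full cycle of daily values.
    With evenly spaced samples over exactly one period, the least-squares
    coefficients reduce to simple projections onto cosine and sine."""
    n = len(daily_kwh)
    w = 2 * math.pi / n
    mean = sum(daily_kwh) / n
    a = 2 / n * sum(y * math.cos(w * d) for d, y in enumerate(daily_kwh))
    b = 2 / n * sum(y * math.sin(w * d) for d, y in enumerate(daily_kwh))
    return mean, a, b, n

def expected_kwh(day, fit):
    """Evaluate the fitted cycle for a given day index."""
    mean, a, b, n = fit
    w = 2 * math.pi / n
    return mean + a * math.cos(w * day) + b * math.sin(w * day)

# Synthetic year: 50 kWh average, 30 kWh seasonal swing, peak near day 172.
year = [50 + 30 * math.cos(2 * math.pi * (d - 172) / 365) for d in range(365)]
fit = fit_yearly_cycle(year)
daily_goal = 0.8 * expected_kwh(172, fit)  # e.g. alert below 80% of the fitted value
```

The fitted curve gives a different goal for every day of the year, which is the piece a single fixed monthly goal cannot provide.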


Well, precisely.

I love your charts and agree up to a point but if you don’t consider cloud cover then an inverter failure or other panel anomaly would at least take longer to determine. No?

Time is energy wasted!

(Sure, you can take that both ways: Time taken to consider the clouds or energy wasted while not considering the clouds)

I don’t think ML can accurately predict cloud cover unless it is fed some features that have some level of causal relationship, or at least correlation, with clouds. So you would need to supply something like a hyper-local weather/cloud prediction or the output of a solar cell into the ML dataset to predictably adjust the goals to account for some degree of cloud variability. You could also try to use historic data, but the variability is so large that the prediction window would need to be quite wide to achieve 95% confidence.
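A quick sketch of why the historic-data window ends up so wide. It treats past daily totals for the same time of year as roughly normal, which is an assumption (cloud-driven output is usually skewed toward low values), and the numbers are invented.

```python
import statistics

def prediction_interval(history_kwh, z=1.96):
    """Approximate 95% interval for a single day's output, assuming the
    historical values are roughly normally distributed."""
    mean = statistics.mean(history_kwh)
    spread = statistics.stdev(history_kwh)
    return mean - z * spread, mean + z * spread

# Made-up daily totals (kWh) for the same week in previous years.
history = [42.0, 12.0, 38.0, 25.0, 44.0, 9.0, 40.0, 31.0]
low, high = prediction_interval(history)
print(f"{low:.1f} to {high:.1f} kWh")
```

With spread like this, even the cloudiest day in the history sits inside the interval, so a single low day cannot be distinguished from a fault on historic data alone.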


Ironic indeed.


A separate known-working solar panel (area S1) on a Smart Plug is the cloud-cover reference.

A working solar array (area S100) should then produce 100 x S1 under the same cloud cover.

A failing solar array (area S100) would generate < 100 x S1.
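In code, the reference-panel comparison could look something like this. The area ratio of 100 comes from the example above; the 85% tolerance and the function name are hypothetical.

```python
def array_is_underperforming(array_watts, reference_watts,
                             area_ratio=100.0, tolerance=0.85):
    """Compare the array against a small known-good reference panel.
    The reference reading scaled by area_ratio is what a healthy array
    should produce under the same sky."""
    expected = area_ratio * reference_watts
    if expected <= 0:
        return False  # dark, or no reference reading: nothing to compare
    return array_watts < tolerance * expected


print(array_is_underperforming(5000.0, 52.0))  # healthy array
print(array_is_underperforming(2600.0, 52.0))  # roughly half the expected output
```

Because both readings see the same clouds, the comparison works on a cloudy day as well as a clear one.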

What one can argue about is the overall system redundancy: How reliable & precise is the calibrator? That said, any calibration (& expectation) is better than none.

A working solar array on a cloudless day essentially calibrates to its location!

Your charts make a self-calibration inference seem tantalizingly possible, especially if you are using ML across multiple locations and panels (regardless of cloud and location awareness). For example, if 2019 doesn’t fit nicely in your chart you’ll guess something’s awry. The problem of course is the large amount of data that you need to make those determinations “with 95% confidence”.

I have an Awair that calibrates its CO2 readings in a way that would seem applicable here.


You have me thinking about this some more - Two thoughts:

  • A simple solar cell with a shunt resistor, even one that doesn’t have the same orientation as one’s solar panel system, would be a super-reliable feature to give local cloud input, to set daily goals.
  • ML could be made to work if one looked at power output over the course of multiple days, where the length of time was sufficient to meet the required confidence interval.

Going to do a little experimenting with prediction using different windows when I get a chance. The interesting question to me is whether looking at a string of hourly data or a comparable string of daily data will give a quicker or more accurate predictor of a hard fail (near zero output from inverter).
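A toy version of that window experiment, using daily data. The function, the 50% threshold and the numbers are all assumptions; the point is only to show how the window length trades false alarms against detection delay.

```python
def first_alert_day(actual, expected, window, threshold=0.5):
    """Return the index of the first day on which the trailing `window`-day
    total falls below `threshold` times the expected total, or None."""
    for end in range(window, len(actual) + 1):
        got = sum(actual[end - window:end])
        want = sum(expected[end - window:end])
        if want > 0 and got < threshold * want:
            return end - 1
    return None


# Made-up daily kWh: one cloudy day (index 2), then a hard failure from index 8.
expected = [40.0] * 14
actual = [38.0, 41.0, 10.0, 39.0, 40.0, 37.0, 42.0, 39.0,
          0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(first_alert_day(actual, expected, window=1))  # trips on the cloudy day
print(first_alert_day(actual, expected, window=3))  # waits until after the failure
```

Here a one-day window raises a false alarm on the cloudy day, while a three-day window ignores it and alerts one day after the inverter stops.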

BTW - Here’s a different view of my solar production over the past 6 years from a specific kind of time-series machine learning analysis, an STL (“Seasonal and Trend decomposition using Loess”) plot. It breaks down the observed time series into 3 components: a seasonal component based on a yearly cycle, a trend line component that shows the long-term change (panel dirt and degradation), and a random component, mainly due to clouds. It gives some great insights into possible triggers for alerting us to a broken inverter or panel (the random component is negative and its magnitude is greater than some value). Or the trend line drops off more steeply?
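To show the shape of that alert trigger without any extra libraries, here is a sketch that swaps STL for a much simpler classical decomposition (moving-average trend plus per-phase seasonal averages). It is not STL and will not match an STL plot; the function names, the toy 7-step "season" and the alert limit are illustrative assumptions.

```python
def decompose(series, period):
    """Classical additive decomposition: series = trend + seasonal + residual.
    Trend is a centered moving average over one period (period must be odd);
    entries near the ends, where the window does not fit, are None."""
    n, half = len(series), period // 2
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(series[i - half:i + half + 1]) / period
    # Average the detrended values at each phase of the cycle.
    buckets = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            buckets[i % period].append(series[i] - t)
    phase_mean = [sum(b) / len(b) if b else 0.0 for b in buckets]
    centre = sum(phase_mean) / period
    seasonal = [phase_mean[i % period] - centre for i in range(n)]
    resid = [None if t is None else series[i] - t - seasonal[i]
             for i, t in enumerate(trend)]
    return trend, seasonal, resid

def negative_residual_alerts(resid, limit):
    """Indices where the random component is negative and larger than limit."""
    return [i for i, r in enumerate(resid) if r is not None and r < -limit]


pattern = [10, 12, 14, 16, 14, 12, 10]   # toy 7-step "season"
series = pattern * 6
series[20] -= 8                           # one unusually low reading
trend, seasonal, resid = decompose(series, period=7)
print(negative_residual_alerts(resid, limit=3.0))
```

On real data a single negative residual is usually just a cloudy day, so the trigger would more plausibly require several in a row.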


I’ve been playing around with this idea a bit more, looking at various analyses of my past history. But the real challenge is that I don’t have a dataset that includes inverter failures and other possible error modes. And without errors, real or simulated, I can’t do any ML training or testing. Any thoughts on what errors look like? @kdm saw consistently reduced output, presumably with the same kind of random percentage ups and downs as a fully functional inverter.
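One way to get simulated errors is to inject them into healthy history. This sketch models only the failure described here (output scaled down from some day onward, with the same day-to-day percentage wiggle); the function name, the 50% loss and the random stand-in data are assumptions.

```python
import random

def inject_string_failure(daily_kwh, fail_day, lost_fraction=0.5):
    """Return (series, labels): output scaled by (1 - lost_fraction) from
    fail_day onward, with label 1 on failed days and 0 before."""
    series, labels = [], []
    for day, kwh in enumerate(daily_kwh):
        failed = day >= fail_day
        series.append(kwh * (1 - lost_fraction) if failed else kwh)
        labels.append(1 if failed else 0)
    return series, labels


random.seed(0)  # reproducible toy data
healthy = [40.0 * random.uniform(0.3, 1.0) for _ in range(30)]  # stand-in for real history
faulty, labels = inject_string_failure(healthy, fail_day=20)
```

Varying `fail_day` and `lost_fraction` over real history would give labeled examples for training and testing, though only for this one failure mode.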


A couple of thoughts here:

  1. Point drop-offs in the trend line indicate a possible failure mode. This is particularly true if the trend continues after the drop-off.
  2. Could weather be separated out by comparing the dataset with other solar production datasets from geographically close neighbors?
  3. Is there publicly available solar radiation data by zip code? I mean, you could gather solar radiation data from your own personal weather station… I benchmark my output against output from neighbors who publish their data on
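The first point, a drop-off that persists, can be sketched as a before/after comparison of means. The 7-day span and the function name are arbitrary choices for illustration.

```python
def largest_sustained_drop(values, span=7):
    """Compare the mean of the `span` values before each point with the mean
    of the `span` values from that point on. Return (index, fractional_drop)
    for the largest drop, or (None, 0.0) if the series never drops."""
    best_index, best_drop = None, 0.0
    for i in range(span, len(values) - span + 1):
        before = sum(values[i - span:i]) / span
        after = sum(values[i:i + span]) / span
        if before > 0:
            drop = (before - after) / before
            if drop > best_drop:
                best_index, best_drop = i, drop
    return best_index, best_drop


# Made-up trend values: one of two equal strings stops contributing at index 20.
trend = [30.0] * 20 + [15.0] * 20
print(largest_sustained_drop(trend))
```

Run on a smoothed trend rather than raw daily output, a step like this stands out from ordinary cloud noise.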

I think @kds had a complete inverter failure. Since he has two strings, he probably saw output drop by 50% from that day forward. (Or whatever percentage represents the output fraction from the inverter that failed). This problem becomes harder to detect when you have micro-inverters since you have 15-40 micro-inverters, each subject to failure. The nice thing is that you can have panel-level monitoring so you can detect these kinds of failures through the monitoring system. Note that the monitoring I’ve seen provided by the vendors doesn’t have alerting, so though the information is there it’s not used to its fullest.


It seems the most likely fail points in an inverter are the storage capacitors. Leading up to catastrophic failure there is likely to be inverter degradation … particularly with non-micro inverters that are under more capacitor stress. There are some papers out there, e.g. Flicker-PVSC-Cap-Paper-Final.pdf, that point to an output voltage ripple from the PWM that grows as the bus capacitance falls, i.e. a failing capacitor makes itself clear.

But of course nothing compares to some big data. Like panel degradation, which you seem to have pinpointed in your own dataset (@kevin1), you could assume that the inverter degradation would become apparent in a larger dataset and the anomalies (failures) would “pop out”.

Going over what we have (excuse the repetition and somewhat irrelevant sidetrack):

Solar panel installs normally come with a warranty that guarantees solar output over time.
When your solar output falls below that guarantee you should know (i.e. this is not only about “failure”).
The question is, how do you know when your output falls if you don’t have a baseline?
So the first question is: How do you establish a baseline?
Much as solar panel ratings are based on a generalized model, you can set the baseline to an arbitrary (but relevant) point which, one would think, should be based on the specific install location and panel rating, initial runtime and expectations from similarly installed systems & all their baselines.

The simplest idealized baselines would be, let’s say, in order of benefit (best-to-worst; fastest-to-slowest fault resolution)

You have 100 identical panels in an array with 100 micro-inverters.
[This system will always give you more potential confidence in failure analysis than a single-inverter system]

TRIVIAL CASE: If panels are individually monitored, all 100 panels should have similar output all of the time and meet a “peak” expectation else ALERT on those that are consistently lower (easy % confidence in this case).
Establishing catastrophic inverter failure (fast) vs “slow” panel degradation would be clear.
Inverter degradation would be harder to detect (vs. panel degradation) but either way the failing panel would eventually meet some ALERT threshold and be checked.
Physical panel degradation (dirt & dust & bird poop & snow & hail) is more likely to be spread evenly across multiple panels; micro-inverter degradation is either going to be slow across ALL panels or fast on a particular panel … these would seem to be easy metrics to work with.
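The trivial case really is only a few lines. This sketch compares each panel with the median panel over the same period; the 90% cutoff, the panel ids and the readings are made up.

```python
import statistics

def lagging_panels(panel_kwh, min_fraction=0.9):
    """panel_kwh maps a panel id to its energy over the same period.
    Return ids producing less than min_fraction of the median panel."""
    typical = statistics.median(panel_kwh.values())
    return sorted(pid for pid, kwh in panel_kwh.items()
                  if kwh < min_fraction * typical)


# Made-up daily totals (kWh) for five panels.
readings = {"p1": 1.50, "p2": 1.48, "p3": 0.02, "p4": 1.52, "p5": 1.30}
print(lagging_panels(readings))
```

Using the median rather than the mean keeps one dead panel from dragging down the reference that the others are judged against.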

HARDER CASE: An arbitrarily chosen, individually monitored panel’s output should be 1/100th the output of the entire array, else ALERT = Either the chosen panel’s output is failing (unlikely) or somewhere in the array there is a failure happening ==> cycle which individual panel is monitored until the fault/s are found.
(The normally monitored individual panel should obviously be exposed to identical array conditions but would, itself, be “idealized” and “optimized” … it would get cleaned frequently, for example, and could be used to determine “is it worth cleaning the entire array?”; a micro-inverter upgrade could be applied to it to make determinations for the entire array.)

SENSE SCENARIO: The array and inverter output (micro or non-micro) is monitored (Sense) but not any individual panel … After some period, max array output is established (peak cloudless sun). Subsequent max outputs fall below the expected panel degradation threshold = ALERT. There could well be inverter failure caught by the instantaneous speed of analysis but more likely (and with more confidence) would be the ongoing lower output that seems like it could only be established with confidence by looking at Sense-wide data. Meaning: in the Sense world, treat the ARRAY as a DEVICE – or, essentially, “40 similar Devices if you have 40 panels”. Do ML on the waveform.
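As a sketch of the "subsequent max outputs fall below the expected panel degradation threshold" rule: the 0.5% per year degradation rate and the 5% margin below are placeholder assumptions, not warranty figures or anything Sense uses, and the peaks should be taken from comparable clear days in the same season.

```python
def peak_below_expectation(baseline_peak_w, recent_peak_w, years_elapsed,
                           annual_degradation=0.005, margin=0.05):
    """Return True if the recent clear-day peak is lower than the baseline
    peak after allowing for assumed degradation and a safety margin."""
    expected = baseline_peak_w * (1 - annual_degradation) ** years_elapsed
    return recent_peak_w < expected * (1 - margin)


print(peak_below_expectation(6000.0, 5900.0, years_elapsed=2))  # normal ageing
print(peak_below_expectation(6000.0, 3000.0, years_elapsed=2))  # half the array missing
```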

In ML terms, the individual house array is a subset of Sense-wide solar systems being monitored. Data from the same Zip location on other arrays is highly relevant (as @dianecarolmark reiterates). It’s easy to argue that precise coordinates for solar panel geolocation would be ML gold. Going beyond Zip in precision for Solar geolocation would be my strong suggestion to Sense and would yield a significantly more robust data set in terms of targeting failure alerts. In the solar world, ultra-precise geolocation would seem to be a necessity rather than an afterthought.
As much as solar panels (Sense-wide) could be used as CLOUD DETECTORS … with increased scale comes increased confidence in (array = panel/inverter) failure along with the speed of detection.

And going back to the trivial cases … in the meantime any “trivial case” data that can be injected into the system will be like SMART PLUG LEARNING for Sense Solar. @kevin1, your request for datasets showing inverter failure is a prime example of why establishing a baseline is so important … there is probably plenty of data already stored by Sense that would show those failures but is there any way to make those determinations without looking system-wide?


@ixu, @dianecarolmark, @kdm,
Thanks for all the great suggestions. Your thoughts on potential modes of failure were helpful. Plus your ideas on analysis methods spurred me to look more closely for the patterns in my data, plus to look for other data sources for weather and solar production.

  • One of the biggest challenges in searching for degradation patterns is dealing with the somewhat random weather “noise” exposed in the “random” component of my decomposition above. For my geographic area, random fluctuations due to clouds, etc. are larger and more prevalent during the first 6 months of the year. I have been looking around for real-time local weather data on irradiance or insolation. The best I have found so far comes from NREL, but it is historic and not quite at the resolution (geographically or time wise) that I had hoped.

  • The best ML starting point is to fit any known analytical (vs. empirical) model first, in order to derive baseline parameters, and then pick the optimal features for machine learning analysis. Sense does this when they develop basic analytic models for motors, heaters, microwaves, resistive elements, etc. @ixu, I think that thinking of Solar Production as a device or set of devices is entirely appropriate.

  • The analytic models for solar irradiance and irradiation are actually quite complicated, with primary, secondary and tertiary effects all contributing to a huge equation. There’s also a question of time scale and patterns - daily and yearly cycles can both be used for baselining, and of course the daily cycle has some dependencies on where one is in the year. I’m focusing on daily measurements in the yearly cycle first. Here’s the best reference I have found to daily and monthly irradiance, vs the yearly cycle.

Another thought I had about getting localized near-real-time solar radiation data is to access the wunderground weather station network. Some stations have the ability to report solar radiation, but there seem to be issues retrieving this data as you can see from this thread:

Another potentially interesting approximation is the UV index data that does seem to be readily available. I found this gentleman’s post about the relationship between UV index and solar irradiance as measured in watts per square meter:

I’m not sure how accurate reports of UV index are and whether they are reported in weather datastreams “as forecasted” vs “as measured.” I think this might be a red herring as I think UV index values are “as forecasted” and do not take into account cloud cover.

You may want to look at the WeatherFlow weather station. It is reasonably priced, includes a UV detector, and has a full API for getting the data. I believe you could monitor UV intensity at your home with it and create a model to account for local cloud cover.

Hope this helps


Energy & Light from all angles! This is good stuff.

Not to be a Luddite, but my inner Luddite still points to the solar array or a solar panel (or section thereof) that is identical to the array as being the best UV/cloud/snow/bird poop indicator … it will, after all, have an easily scalable response to the highly localized solar irradiance.

The real trick here is to not have to introduce another type of data stream (weather) since the data is already there within the Solar waveform.

Just as there have been recent reports of bugs on radar (oh the beauty!) I’m rooting for Weather Detection (and bugs for that matter) via Solar Arrays.

I thought the goal was to detect faults in the solar array like inverter failure, dirt/snow/bird droppings covering the panels, or measuring annual panel degradation? As such you want to have a separate estimate of local solar irradiance separate from the power produced by the panel, right?

Yes. Yes. Yes. No.

In the slightly less than ideal case you would “bless” a separately Sense-monitored identical system (same panel/s and location) and use that as a reference. Could be a neighbor’s system or one of the panels in your own array.

In the ideal case, limited to the CTs on a standard solar inverter output, you would infer that “blessed” part. The baseline is in the data … if the dataset is big enough (in space and time). Your neighbors’ [non-identical] systems and the global (Sense Solar) dataset then get to work.

What I suggested previously is that the Solar data can be treated in the same way as the Mains. The goal on the Mains is Device disaggregation. The goal on the Solar is “information” disaggregation.

Information = bird poop; snow; locusts; inverter degradation and so on.

[Sorry if this starts to sound like it needs disaggregating itself!]


Note that solar irradiance is the power per unit area from the sun. It is independent of the system used to measure it. What I meant was that the output from your solar system from day to day is an estimate of relative solar irradiance at your location. One can compare this figure with my neighbor’s relative solar irradiance from day to day. The two should be substantially identical and if they are not, this is a red flag.
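Here is roughly what I mean, as a sketch. Each system is scaled by its own output over an initial stretch that is assumed to be healthy, so systems of different sizes become comparable; the function names, the 25% tolerance and the data are invented.

```python
def relative_output(daily_kwh, baseline_days):
    """Scale a series by its mean over the first baseline_days days,
    which are assumed to be fault-free."""
    scale = sum(daily_kwh[:baseline_days]) / baseline_days
    return [kwh / scale for kwh in daily_kwh]

def divergent_days(mine, neighbor, baseline_days, tolerance=0.25):
    """Days on which my relative output is more than `tolerance` below
    my neighbor's relative output."""
    pairs = zip(relative_output(mine, baseline_days),
                relative_output(neighbor, baseline_days))
    return [day for day, (a, b) in enumerate(pairs)
            if b > 0 and a < (1 - tolerance) * b]


# Made-up daily kWh: days 1 and 5 are cloudy for both; my output halves from day 4.
mine = [20.0, 8.0, 22.0, 21.0, 10.0, 5.0, 11.0]
neighbor = [30.0, 12.0, 33.0, 31.0, 29.0, 14.0, 32.0]
print(divergent_days(mine, neighbor, baseline_days=4))
```

The shared cloudy day before the fault is not flagged because both systems dip together; only the days where mine falls away from the neighbor's are.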

I guess this didn’t come across in my original post.

BTW, from looking at even a single day’s solar production over the course of the day, a human can easily tell a sunny day from a cloudy day. I could imagine being able to specify a “sunny day” goal and have the system adjust it for seasonality. On a daily basis, you would use AI classification to classify the solar production waveform as either “sunny” or “not sunny”. If the production waveform was identified as “sunny”, it would compare to the seasonally adjusted and perhaps age-adjusted goal for production on a sunny day and alert if the goal wasn’t reached.
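A sketch of the goal side of this, assuming the day has already been labeled "sunny" or "not sunny" by some classifier. The cosine seasonality, the 0.5% per year ageing rate, the peak day and the 90% cutoff are all placeholder assumptions.

```python
import math

def sunny_day_goal(day_of_year, summer_peak_kwh, winter_peak_kwh,
                   years_in_service=0.0, annual_degradation=0.005, peak_day=172):
    """Clear-day production goal from a cosine between the winter and summer
    clear-day values, reduced for assumed panel ageing."""
    mid = (summer_peak_kwh + winter_peak_kwh) / 2
    swing = (summer_peak_kwh - winter_peak_kwh) / 2
    seasonal = mid + swing * math.cos(2 * math.pi * (day_of_year - peak_day) / 365)
    return seasonal * (1 - annual_degradation) ** years_in_service

def should_alert(label, produced_kwh, goal_kwh, min_fraction=0.9):
    """Only sunny days are judged; days labeled otherwise are skipped."""
    return label == "sunny" and produced_kwh < min_fraction * goal_kwh


goal = sunny_day_goal(172, summer_peak_kwh=60.0, winter_peak_kwh=20.0)
print(should_alert("sunny", 30.0, goal))      # sunny but far short of the goal
print(should_alert("not sunny", 30.0, goal))  # cloudy day, no judgment made
```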

Yes all this is clear and we’re on the same page.

The challenge is with the resolution and scale of the data and so the resolution of the “issues”, as @kevin1’s chart demonstrates

Even without any irradiance or cloud data (weather) you can infer irradiance and (in theory) panel degradation. What’s harder to see (jumping to where I quote you above) in this chart is how you might infer, say, inverter degradation or that perhaps what seemed like a cloudy day was actually snow on the panels. The usefulness of course comes from rapid alerting to issues (“you’ve probably got snow on your panels”) so the challenge is compressing the timeframe over which you can make the inferences. To that extent even bigger datasets like weather prediction start to seem interesting, but one step at a time.

I’m suggesting that the AI can classify each day’s daily output based on analysis of the half-second or even one-minute granularity waveform of production for the day. Here are three example waveforms.
A below

B below

C below

B is a mostly sunny day
A is a partly cloudy day
C is a mostly overcast day
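To give a feel for how separable those three shapes are, here is a hand-written heuristic (not a trained AI classifier) based on peak height and how much the curve jumps up and down. The thresholds, the clear-sky peak and the sample curves are invented, and the samples are far coarser than Sense's half-second data.

```python
def classify_day(samples_w, clear_sky_peak_w,
                 overcast_fraction=0.4, choppiness_limit=1.5):
    """Label one day's production curve from evenly spaced power samples."""
    peak = max(samples_w)
    if peak < overcast_fraction * clear_sky_peak_w:
        return "mostly overcast"
    # A smooth single-hump curve climbs to its peak and falls back, so its
    # total up-and-down travel is about twice the peak. Passing clouds add
    # extra travel.
    travel = sum(abs(b - a) for a, b in zip(samples_w, samples_w[1:]))
    if travel / (2 * peak) > choppiness_limit:
        return "partly cloudy"
    return "mostly sunny"


# Made-up curves in watts, sunrise to sunset.
sunny = [0, 1000, 2500, 4000, 4800, 5000, 4800, 4000, 2500, 1000, 0]
partly = [0, 1000, 2500, 900, 4800, 1200, 4700, 1500, 2500, 400, 0]
overcast = [0, 300, 600, 900, 1100, 1200, 1000, 800, 500, 200, 0]
for curve in (sunny, partly, overcast):
    print(classify_day(curve, clear_sky_peak_w=5000))
```

Note that a failed string on a sunny day would roughly halve the peak while keeping the curve smooth, so the overcast cutoff would need care to avoid hiding exactly the fault we are looking for.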

If you have a mostly sunny day but output is lower than predicted for location and seasonality, then you’ve got either inverter or panel failure or snow on the panels. BTW, I think most people don’t do anything about snow on the panels, IIRC, but my sister is in Hawaii so I haven’t researched the issue at all. :)