Simple Machine Learning EV Charge Detectors

Since I had already collected and tagged my combined Sense/utility data with EV charging events, I decided to try a simple experiment to see if I could build a “detector” to tell if one of my EVs (Model S and Model 3) was charging during a given hour. A few key points of context:

  1. Sense has been detecting the Model S fairly reliably throughout 2020, so far, but I don’t really have a fix on how reliably.
  2. Sense was detecting the Model 3 for a while in 2019, but kind of gave up in 2020. I haven’t really pushed for a fix, partially because it seemed every new Model 3 software release changed the charging profile.
  3. Both cars charge at 240V, one at up to 80A and one at up to 48A, so they have fairly distinctive patterns, except when we plug in and the battery is almost full. Occasionally the charging can get obscured by up to 15kW of heating/cooling/baseline usage.
  4. It’s pretty easy to detect hours where one or both cars are charging for the full hour because the Sense Total Usage number is just so big, though that doesn’t work for hours when the car is only charging for a partial hour. Plus there are a few hours that slip in because they have lots of HVAC and other activity.
  5. We usually charge each car at a fixed start time offset from the other one (1AM and 3:30AM), early in the morning when rates are cheapest. But not always, so we can’t rely on time to help us with “detection”.

What does EV charging look like ?
The chart below highlights points 4 and 5. There seems to be a frontier about where the yellow line is that mostly separates charging from non-charging hours, but it’s nowhere near 100% accurate for categorizing.

A close look at Sense Model S EV Detection
How did Sense do identifying hours when my wife’s EV, the Model S, was charging ? The graph below shows EV Detection energy, the amount of energy Sense sees going to the Model S, vs. the Total Usage Sense is seeing. I have set all the hours where Sense did not see the Model S charging to -0.5kWh, just so we can separate out non-detected hours from near-zero, but real detections.

The red oval highlights hours where Sense saw the Model S as charging but it wasn’t (false positive). The green circle highlights that Sense is not mistakenly identifying ANY Model 3 charging hours as Model S charging (true negative). And the blue circle highlights hours where Sense did not detect the Model S charging even though it was (false negative). Note that two of the Model S charging hours were detected, but showed up as almost zero power, even though the Sense Usage was substantial.

If I create a quick table of occurrences, I can see the exact number of hours that fit into each category, where ‘Positive’ means Sense identified Model S charging for that hour and ‘Negative’ means it did not. The hours in blue are true-positives - Sense predicted the Model S was charging and it really was. The hours in red are true-negatives - Sense predicted the Model S was not charging and it wasn’t.

Screen Shot 2020-06-10 at 4.20.36 PM

Sense got identifications right for 113 hours (true-positives), but also detected 18 extra hours (false-positives) where the Model S wasn’t charging. It recorded 3730 hours where the Model S wasn’t charging and got that correct (true-negative), while showing 10 hours as negative, even though the Model S was charging (false-negative). If Sense’s EV detection was a medical diagnostic, it would have a sensitivity of 113 / (113 + 10) or 92%. And it would have a specificity of 3730 / (3730 + 18) or 99.5%.
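For anyone who wants to redo that arithmetic, here is a minimal Python sketch. The function name is just an illustration; the four counts are the ones quoted above.

```python
def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity) from confusion-table counts."""
    sensitivity = tp / (tp + fn)  # share of real charging hours that were detected
    specificity = tn / (tn + fp)  # share of non-charging hours correctly left alone
    return sensitivity, specificity

# Counts quoted above for Sense's Model S detection
sens, spec = sensitivity_specificity(tp=113, fp=18, tn=3730, fn=10)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

Sensitivity answers "of the hours the car really charged, how many did the detector catch", while specificity answers "of the hours it did not charge, how many did the detector leave alone".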


Improving “Simple Detection” Beyond Sense Total Usage
As I saw earlier, Total Usage per hour is somewhat helpful in discriminating hours when our EVs are charging, but not always. There are hours, in red circles, where EV charging Total Usage is less than that of non-charging hours, and there are hours in blue circles where non-charging hours exceed charging hours.

If I look at a few of these points, what I find is that they are cases where an EV was only charging for a small part of the hour, so the charging had very little impact on the Total Usage, even though the peak charging level was 2-10x the surrounding baseline for the hour. If I had a way of pulling out the peak power for each hour, I would have a great “feature” for EV detection, but Sense doesn’t easily let me get at that. But I do have access, via my utility, to 15 min samples of usage, so I do have data for the 15 minute max for each hour. If I do a plot of utility 15 min max usage vs. Sense Total Usage, I see the charging and non-charging hours begin to separate out, though the plot is a little bizarre because everything is above the 45 degree unity line. That’s not unexpected since the 15 min max usage extrapolated to an hour should always be above the hourly usage.

I can get a better human-discernible plot by plotting against the 15 min max for an hour divided by the hourly usage. The y-axis now represents how many times bigger the max 15 min is vs. the full hour.
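A rough Python sketch of how those two features could be computed per hour. Everything here is an assumption for illustration: the function name, the input layout, and the made-up readings. It also uses the sum of the four utility samples as the hourly total, whereas the plots above use Sense Total Usage for that axis.

```python
from collections import defaultdict
from datetime import datetime

def hourly_features(readings):
    """readings: iterable of (timestamp, kWh used in that 15 min period).
    Returns {hour_start: (total_kwh, max_ratio)}."""
    by_hour = defaultdict(list)
    for ts, kwh in readings:
        by_hour[ts.replace(minute=0, second=0, microsecond=0)].append(kwh)
    features = {}
    for hour, samples in by_hour.items():
        total = sum(samples)
        if total <= 0:
            continue  # skip empty hours to avoid dividing by zero
        peak_hourly_equiv = max(samples) * 4  # 15 min max extrapolated to a full hour
        features[hour] = (total, peak_hourly_equiv / total)
    return features

# Made-up hour: quiet for 45 minutes, then a big draw in the last 15 minutes
readings = [
    (datetime(2020, 5, 1, 1, 0), 0.5),
    (datetime(2020, 5, 1, 1, 15), 0.5),
    (datetime(2020, 5, 1, 1, 30), 0.5),
    (datetime(2020, 5, 1, 1, 45), 2.5),
]
features = hourly_features(readings)
print(features)
```

In this toy hour the total is only 4 kWh, but the ratio of 2.5 flags that one 15 min period ran far hotter than the hour's average, which is the kind of partial-hour charge the total alone would hide.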

Pretty cool ! It looks like I can separate the charging hours from the non-charging hours in this chart with a smooth curved line, but just barely. Given that, it’s likely I can build a reasonably high accuracy hourly detector using just those two features, though I should emphasize that this detector is highly dependent on my own home’s usage and very high draw EVs. It doesn’t look like I have enough features / resolution to separate the Model S vs the Model 3, though they tend to separate for hours where charging is the main power draw for the house (toward the bottom of the plot). Next step, build and test different machine learning prediction techniques !

BTW - this step of selecting data parameters that can be used to discriminate between the event we want to detect and the rest of the time is called “feature selection”

Training and Testing - Training
Now that we have selected some features from the Sense and utility data that seem to be able to separate EV charging from other usage, it’s time to try to train some different types of machine learning models to see which ones retain the ability to discriminate between charging and non-charging hours.

The first step is training with a little bit of testing interspersed throughout, to tell us which model is best for the next step, testing. The input to training looks like below - a 3,838 row long list of the two features, followed by the label for that pair. I don’t even need to include the date and time. This is probably not the best snippet to show because the two charging hours stand out like a sore thumb in terms of total usage. But as I have shown earlier, there are a bunch of charging hours that also “blend in” in terms of total usage.

I’m going to try to train 3 different model types using just those two inputs, hourly Sense Total Usage and the ratio of the 15min Max vs. Total Usage. The three models I tried are:

  • a simple generalized linear model
  • a more complex linear model with Bayesian support
  • Random Forest, which is the jack-of-all-trades model - it will almost always work reasonably well, but won’t give you any clues as to why it works.

Here are the results of that training with a little bit of testing folded in, where a value of 1.0 is the model getting everything right 100% of the time.

There are some error bars because the training/testing sliced up the data into 10 different “folds” and some of the folds didn’t predict correctly fully 100% of the time.

Yet, all of the models worked well, though the Bayes GLM looks like it worked best at predicting the data in the training set. But predicting the data in the training set is kind of like drinking your own bathwater - you only predicted based on data you were training with. A good model has to predict the results for incoming data it has never seen before. Without testing on a separate set of data, there’s a chance that a model might overfit the training data - in other words, be so customized for predicting the training data outputs that it doesn’t predict well on other sets of data. I want to find out whether the Bayes GLM model generalizes well.
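To make the “folds” idea concrete, here is a small Python sketch of 10-fold cross-validation. It does not reproduce any of the three models above: a crude single-threshold rule on total usage stands in for the model, and the data is synthetic. All names are illustrative assumptions.

```python
import random

def fit_threshold(rows):
    """Pick the total-usage threshold that best separates the training rows.
    rows: list of (total_kwh, is_charging). A deliberately crude stand-in model."""
    candidates = sorted({x for x, _ in rows})
    return max(candidates, key=lambda t: sum((x >= t) == y for x, y in rows))

def k_fold_accuracy(rows, k=10, seed=0):
    """Return the held-out accuracy for each of k folds."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        # Train on the other k-1 folds, score only on the fold left out
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        t = fit_threshold(train)
        scores.append(sum((x >= t) == y for x, y in held_out) / len(held_out))
    return scores

# Synthetic, cleanly separable data purely for illustration
rng = random.Random(1)
rows = [(rng.uniform(0.5, 5.0), False) for _ in range(180)]
rows += [(rng.uniform(8.0, 25.0), True) for _ in range(20)]
scores = k_fold_accuracy(rows)
print(min(scores), sum(scores) / len(scores))
```

Each fold gets scored by a model that never saw it during fitting, and the spread of the ten scores is what shows up as error bars. It is still not a substitute for testing on a separate data set.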

Next step - Test against 2019 data !

Prepping the 2019 Data to Test My Simple Detector
Ideally, I would do two things to prep my 2019 Sense/utility data before using it for testing. I would:

  • Clean it - remove any hours I know to be bad because of things like Sense data gaps, etc. I’ll do some of that.
  • Label it - add information on when my EVs were actually charging. But I don’t have that data, in a ready form, and would have to do much of the labeling manually. Instead, I’m going to try to do what I can using only partially labeled data and automation to guide me to where I should do the manual labeling work.

The first thing I’m going to do is to look at my raw 2019 data in the same graph I used earlier, except I don’t have the labels that tell which EV was charging during that hour. Instead, at least temporarily, I’m going to use my 2019 Sense EV detections as pseudo-labels. Sense detections won’t be entirely correct as we saw earlier, but even at 80% sensitivity and specificity for just the Model S, that would be a nice starting point. One other thing - in 2019 I had three different detections / models for Tesla charging, EV (Electric Vehicle - the current 2020 detection model), Tesla (which I think was a Model 3 Sense detector) and Model S. I deleted Tesla and Model S along the way, ostensibly to improve detections, by allowing a new better model to take their place. Here’s the same chart as earlier, but for 2019 and pseudo-labeled by the Sense detections, with my raw (uncleaned) 2019 data.

Pretty ugly, huh ? Looks like I need to do some more data analysis and cleaning before I dip into testing. Data on the y axis seems to go too high and low. It might also go too high on the x axis. But I do have one ace up my sleeve for cleaning - I can easily remove data where the Sense data diverges too much from my utility data. If I play with the divergence threshold for removal, I reach a nice point where most of the ugly points disappear if I exclude data where Sense is more than +/- 350 Wh different from my utility. That only removes 65 data points out of 6620 but gets rid of all but 1 problematic datapoint in the graph. And I know from some of my previous accuracy studies that I have more data gaps in 2019 than 65 hours worth. Good tradeoff between getting rid of bad data and keeping useful data around.
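The auto-cleaning rule is simple enough to sketch in a few lines of Python. The field names and the three sample hours are made up for illustration; only the 350 Wh cutoff comes from the post above.

```python
def drop_divergent_hours(hours, max_diff_wh=350):
    """hours: list of dicts with 'sense_wh' and 'utility_wh' for the same hour.
    Returns (kept_hours, number_removed)."""
    kept = [h for h in hours if abs(h["sense_wh"] - h["utility_wh"]) <= max_diff_wh]
    return kept, len(hours) - len(kept)

hours = [
    {"hour": "2019-03-01 01:00", "sense_wh": 18250, "utility_wh": 18300},
    {"hour": "2019-03-01 02:00", "sense_wh": 900, "utility_wh": 2100},  # e.g. a Sense data gap
    {"hour": "2019-03-01 03:00", "sense_wh": 1500, "utility_wh": 1480},
]
kept, removed = drop_divergent_hours(hours)
print(removed, [h["hour"] for h in kept])
```

Sweeping `max_diff_wh` and counting how many hours drop out at each setting is one way to find the kind of tradeoff point described above.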

This plot, after a little auto-cleaning looks very similar to the earlier 2020 plot, and many of the Sense detections look like they are where they should be. But some are also not where I would have expected, buried in the middle of the ‘Nones’.

Let’s first look at that remaining point where the y axis is less than 1 (max is less than average ?). Turns out that is actually a special case - 2019-11-03 01:00:00. In 2019, that’s the duplicated extra daylight savings time hour that comes from “falling back”. In Sense-land, it looks entirely normal, but it gets merged with the other 1am hour of PGE data.

Investigating the biggest 30kWh Total Usage Sense hours, they are all legit, so I don’t need to clean those datapoints.

Prediction and Testing Time
After removing the daylight savings time hour, it’s now time to use the BayesGLM model to predict which 2019 hours included EV charging. I won’t be able to exhaustively check since my 2019 data doesn’t have either ‘Charging’ or ‘Car’ labels. My plan is to use the Sense charging datapoints plus “boundary datapoints” to do spot checking of my predictions. Here’s the same graph as earlier, but charting the BayesGLM predictions. You can see the clear “line” where the model predicts charging will begin.

The next chart is really an eye chart, but really highlights my simple detector’s predictions vs. Sense’s. The color scheme is the same as the earlier one in terms of Sense EV detections, and the shape indicates the prediction of my simple detector. Now I need to investigate points around the edges - in the red, green, and blue circles.

Excellent stuff … that deserves an extensive reply after some proper pondering.


FYI - I’m noodling over a time-series thresholding approach to detecting charging ramp using my 15min utility data, to label the 2019 data set, rather than going through the long manual labeling process (with me as the Mechanical Turk). Even if that labeling is not 100% correct, I could play the two prediction approaches against each other to isolate on the boundary data that matters, in a slightly different form of Adversarial Machine Learning (two predictors compete to get the right answer, instead of one generating and the other predicting).

But it’s not as easy as it looks. This week in May 2020 is a bit emblematic of the challenge. A bunch of Model S charges (18kW), some very short, one long, along with one medium duration Model 3 charge (11kW).

If I had Sense’s granularity of data, I could probably pick a single usage threshold that would catch all the Model S charges. But a fixed threshold wouldn’t discriminate between Model 3 charging and noise in the house at other times.

Level Detection - Here’s the same thing with a 15min sample view, instead of the Sense 1/2 second resolution. The color of the dots actually highlights the charging state for that corresponding hour. What you will notice is that some of the short (timewise) Model S charging periods actually register less energy usage than some of the 15min non-charging periods the same day. So with a 15 minute sampling, one can’t rely on energy usage exclusively, even for the beefy Model S charging.

Edge Detection - And if I chart deltas (the difference between current usage and the previous 15min period), most of the charging ramps stand out, except for the Model 3 ramp, that comes just after the Model S finishes on May 2nd. It really looks like one long charging session.

So whatever I do with the time series approach, it is going to need to blend edge detection with level detection to give a complete picture.
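The delta-based edge view can be sketched like this in Python. The series is invented to mimic a Model S charge handing straight over to a Model 3 charge, and the 6 kWh hourly-equivalent threshold is the one used later in this thread; the function name is an illustrative assumption.

```python
def find_edges(usage_15min, threshold=6.0):
    """usage_15min: hourly-equivalent kWh for consecutive 15 min periods.
    Returns (index, delta) for every period whose change from the previous
    period is at least `threshold` in magnitude."""
    edges = []
    for i in range(1, len(usage_15min)):
        delta = usage_15min[i] - usage_15min[i - 1]
        if abs(delta) >= threshold:
            edges.append((i, delta))
    return edges

# Baseline, ~18 kW charge, straight into an ~12 kW charge, back to baseline
usage = [1.5, 1.4, 18.0, 18.2, 12.0, 11.8, 1.6]
edges = find_edges(usage)
print(edges)
```

Note what happens at index 4: the second car's start shows up as a downward step, not an upward one, so edges alone read it as one long session with a dip. That is why level information has to be blended in.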

Trialing a Time Series Edge Detection Technique
So after a couple of wasted evenings on different edge detection approaches using the delta between the current 15min utility reading and the previous one, I determined three things:

  1. It’s impossible, at least in my house, to set separate detection thresholds that find all the positive and negative car charging edges, based on just the 15 min delta values alone. There are just too many other heating and cooling events that can masquerade as car charges based on the limited time resolution data I have.

  2. There are no simple time series approaches to filtering out heating and cooling events - they are just not regular enough. I can almost filter out the cooling events using data from my Ecobees, but I haven’t cracked the access to the data hidden in my NuHeat flooring heaters.

  3. So my best approach has been to set the delta thresholds low enough to catch all the car charging event edges, as well as many extras, then use other parameters to filter out all the unlikely pairs of positive and negative ramps. The most promising technique seemed to be to look at every positive spike that passed the threshold, match it up with the nearest and next nearest negative spike, then look at the time distance (NextNeg) and the mean energy consumed between the positive and negative going time intervals (PeakMean).

Doing #3 for my 2019 Sense data merged with my 15min utility data gave me an interesting chart. This chart shows the two parameters I mentioned earlier for every positive spike / delta that was the 1 hour equivalent of 6kWh (remember I’m looking at 15 min increments, but all the power/energy numbers I have extrapolated to 1 hour equivalents to simplify my thinking). NextNeg tells how many 15min periods it is to the first 6kWh downward spike / delta. And PeakMean tells me what the average energy used in the whole house was during that period.

Next I’ll filter in two ways. My cars don’t charge longer than 7 hours or so, so NextNeg for any charging must be less than 28 15 min periods. And I can apply a simple formula for the minimum mean energy that must be expended to charge the Model 3 to exclude other points, leaving only the “good” charging candidates in blue-green below.

It turns out after a little sniffing at a few of the boundary points, the “charge” in magenta is a false (not EV charging) positive spike that piggybacks on a real charging cycle an hour or so later. It also looks like the curves break out into two separate pattern groups with different average energy uses that seem to align with the two different EVs. Gotta investigate that later.

If I roll these 190 “Good” charge cycles / points back into hours (remember that many charging cycles cross multiple hours), I get 386 hours of charging that I can use to annotate my earlier charging detection curve of training features. I get a result that looks somewhat reassuring, but not completely.
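Rolling 15 min charge cycles back into hours might look like the Python sketch below. The interval format (start time plus number of 15 min periods) and the example are assumptions for illustration only.

```python
from datetime import datetime, timedelta

def charging_hours(intervals):
    """intervals: list of (start, n_periods), start being a datetime on a
    15 min boundary. Returns the sorted hour starts touched by any interval."""
    hours = set()
    for start, n_periods in intervals:
        for k in range(n_periods):
            ts = start + timedelta(minutes=15 * k)
            hours.add(ts.replace(minute=0))  # snap each period to its hour
    return sorted(hours)

# One made-up charge cycle: six periods starting at 00:45, ending at 02:15
intervals = [(datetime(2019, 7, 1, 0, 45), 6)]
hours = charging_hours(intervals)
print(hours)
```

A 90 minute cycle touches three different clock hours here, which is how a couple hundred cycles can turn into roughly twice as many charging hours.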

Most of the hours that my new algorithm identified as charging hours fit my old parametric curve nicely. But a small number are buried inside of the “non charging” zone and a bunch of “non-charging” hours per my new algorithm live in the charging region of my old parametric curve. A quick confusion table shows me how the two approaches have done with respect to each other:

Screen Shot 2020-06-18 at 11.45.08 PM

They both agree on 339 hours as hours when charging is taking place and agree on 5707 hours when both are certain charging is NOT taking place. But my GLM model thinks charging is taking place during 67 hours that the new time series approach does not. And the time series approach sees 41 hours of charging that the GLM model does not agree with. Such is the nature of machine learning and prediction. But which one is right and are they always right when both concur ? Time to look at some of the points of disagreement.
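A confusion table between two detectors is just a count of each pair of answers per hour. Here is a small Python sketch; the dictionaries are toy data and the function name is illustrative, while the four counts at the end are the ones quoted above.

```python
from collections import Counter

def cross_tab(glm, time_series):
    """Both arguments map hour -> bool (charging predicted).
    Counts each (glm, time_series) combination over the hours they share."""
    common = glm.keys() & time_series.keys()
    return Counter((glm[h], time_series[h]) for h in common)

glm = {"01:00": True, "02:00": True, "03:00": False, "04:00": False}
ts = {"01:00": True, "02:00": False, "03:00": False, "04:00": True}
table = cross_tab(glm, ts)
print(table)

# Agreement rate from the counts quoted above
quoted = {(True, True): 339, (False, False): 5707, (True, False): 67, (False, True): 41}
agreement = (quoted[(True, True)] + quoted[(False, False)]) / sum(quoted.values())
print(f"{agreement:.1%}")
```

The high agreement rate is mostly driven by the thousands of easy non-charging hours, so the 108 hours of disagreement are where the interesting cases live.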


A Peek Into the Time Series Approach
Before moving on, I should probably give a little bit more insight into my time-series approach that seems to work. The graph below gives some insight into how it operates. The red line is the 15min total energy usage from my utility extrapolated to a 1 hour value (I multiply the total usage for 15min x 4 so I can work with the 1 hour equivalent). I flag all deltas above a 6kWh equivalent change in 15min (vertical edges in blue). I then match up each positive edge with the nearest successive negative edge and look at the time distance between the two as well as mean energy used during the whole interval. If the distance is too far (more than 7 hours) or the mean is too low to be a car charging I reject the interval, leaving only “Good” charging intervals with the blue dots on top showing the mean.
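In rough Python, the pairing logic reads something like the sketch below. It is a simplification under stated assumptions: the names are illustrative, and a single fixed `min_mean` cutoff stands in for the duration-dependent minimum energy test.

```python
def charging_intervals(usage, threshold=6.0, max_periods=28, min_mean=8.0):
    """usage: hourly-equivalent kWh per 15 min period.
    Pair each upward edge with the next downward edge and keep plausible ones.
    Returns a list of (start_index, next_neg, peak_mean)."""
    deltas = [0.0] + [b - a for a, b in zip(usage, usage[1:])]
    pos = [i for i, d in enumerate(deltas) if d >= threshold]
    neg = [i for i, d in enumerate(deltas) if d <= -threshold]
    good = []
    for start in pos:
        end = next((n for n in neg if n > start), None)
        if end is None:
            continue  # no downward partner found
        next_neg = end - start                         # periods until the downward edge
        peak_mean = sum(usage[start:end]) / next_neg   # mean usage over the interval
        if next_neg <= max_periods and peak_mean >= min_mean:
            good.append((start, next_neg, peak_mean))
    return good

# Made-up series: one real-looking charge, then a short spike with a low mean
usage = [1.5, 1.5, 18.0, 18.1, 17.9, 18.0, 1.6, 1.5, 7.5, 1.5, 1.5]
good = charging_intervals(usage)
print(good)
```

The spike at index 8 passes the edge threshold in both directions but gets rejected on mean energy, which is the same kind of filtering the blue dots in the graph represent.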

The graph above shows two important things. First, the positive spike / delta on May 8th gets rejected because its nearest partner negative spike is 121 intervals away and the mean is too low. Second, this algorithm seems to work for back-to-back chargings of both EVs.

And here’s a slightly altered version of the chart I used earlier to help cull the “non-Good” charging intervals. All I did was reject any charge intervals that were greater than 7 hours and ones that didn’t have a mean energy usage that met the minimum charging rate for my Model 3, presupposing no other energy usage in the house. The red line shows that minimum possible hourly charging energy for the Model 3 at 48A for a given number of charging periods assuming that the first and last 15 min periods have at least 2 1/2 minutes of charging. The blue line represents the same for the Model S at 80A.
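Here is one possible reading of those minimum-energy lines as a Python sketch. The formula is an interpretation, not the exact one used for the chart: it assumes 240V, full-rate charging in every interior period, 2 1/2 minutes in the first and last period, and (as a further assumption) 2 1/2 minutes total when the whole charge sits in a single period.

```python
def min_mean_kwh(n_periods, amps, volts=240, edge_minutes=2.5):
    """Lowest hourly-equivalent mean usage for a charge spanning n_periods
    15 min periods, with only `edge_minutes` of charging in the first and
    last period and no other load in the house."""
    power_kw = volts * amps / 1000
    if n_periods == 1:
        charging_minutes = edge_minutes  # assumption for the single-period case
    else:
        charging_minutes = 2 * edge_minutes + (n_periods - 2) * 15
    energy_kwh = power_kw * charging_minutes / 60
    return energy_kwh / (n_periods * 0.25)  # averaged over the whole window, per hour

for n in (2, 4, 8, 28):
    print(n, round(min_mean_kwh(n, amps=48), 2), round(min_mean_kwh(n, amps=80), 2))
```

Under these assumptions the floor climbs quickly with duration and flattens out near the full charging rate (11.52 kW at 48A, 19.2 kW at 80A), so short intervals get a much more forgiving bar than long ones.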

This chart will likely help me separate Model 3 and Model S chargings (as well as Boths) in the future. I also noted earlier that the point in magenta is not good, even though it passed my tests, so perhaps a couple more border points slipped through. I’m going to examine the orange point as well.

I’m also going to need to go through the months of 2019 from a timeline perspective and see if I see anything weird. The goal is going to be to compare details in the Sense month against the details from my utility, as well as looking into places where my automated selections in blue look inconsistent.

Bottom line - for all you folks out there who claim that Sense could use some logic or a few heuristics to define or refine detections, you’ve got another thing coming… Not as easy as it looks.

Here’s a Test For All the Folks who Say "It’s Easy to Spot My Device"
I plotted my 35 weeks of 2019 with my time-series algorithm overlaid. 11 of those 35 weeks showed possible algorithm anomalies that I need to investigate. Posting all 11 of those here, together with 1 clean week, just to see how good people really are at accurately spotting big devices in action.

Which week looks error free, and what’s wrong with all the rest ?

November 23-30 looks tricky.
Thinking … thinking …


Fine Tuning My Time-Series Charging Detector
I see three issues with my algorithm’s detections during these weeks:

  1. The week of 2019-07-27 shows two missed back-to-back Model S / Model 3 charging sessions. That happened because the deltas for the positive edges of those two are 5.26 and 5.56 kWh, lower than the 6kWh threshold I arbitrarily chose.

  2. A bunch of the weeks above show short (less than 15 min) spikes that could be charging or could be other things, because the mean energy still fits the minimum energy that might be needed by the Model 3 charger for a few minutes. But I can’t set the energy bar too high, or I’ll miss some short charges. The Model 3 “rump charging session” below had a mean energy of about 4kWh, though the actual usage was closer to 7.2kWh, because my algorithm saw it spread between two 15min intervals.

  3. I also see a couple of back-to-back positive edges, and subsequent charging cycles, that my detector catches, but the second charging cycle is questionable. 2019-12-07 shows one of these.

I’m going to try playing with my threshold and energy parameters to make sure I catch all legitimate back to back charges in my data set, while improving the energy filtering (if possible). And still need to look closely at #3.

Moving the thresholds to 5.2kWh catches the two back-to-backs during the week of 2019-07-27, alleviating the most egregious error I saw in my algorithm.

But the lower thresholds let through more “low energy” charging candidates, so I have also tightened up my energy criteria. Not going to go into exact details, but it focuses on tightening the time window for potential charging sessions.

Even with the tightened criteria, it seems like lots of questionable charging cycles slip through on the short (15min) side. For instance, the two small peaks with dots on the top on the right hand side, below, are clearly not charging events, even though they fit the threshold, window and mean energy criteria.

The problem is that those two are impossible to discern from the real short charging event for the Model 3 on Dec 16th I detailed in the screenshot earlier. The same short charge is circled in the 15min waveform below.

So I may just need to scrub the low energy events manually. If they are a charging session, they are essentially plugging the car in, ramping up, then having the charger ramp back down again.

If I do the same comparison between my GLM machine learning model and my time series model, things look a little different.

But the new confusion table gives more useful information about what happened.

3 more orange (non charging) dots/hours in what the GLM model thinks is the ‘charging zone’, in exchange for 23 fewer green (charging) dots/hours in the GLM model’s ‘not charging zone’.

Once again, who is right ? Gonna take more looking at those misplaced dots :wink: Machine learning can be tedious…

What’s the Real Difference Between the Two Models

One good way to see which model is coming up right and which one is coming up wrong, or why the differences exist, is by looking at both, superimposed on the 15min waveform. I have added a green bar at the bottom of my earlier plots to show time regions that my GLM model thinks are charging hours. Below is the “Rogues Gallery” of differences. But one huge potential cause for differences is the difference in time bases. My time-series analysis has the benefit of a 15min time resolution for detection, while the GLM model only looks at hours, though one feature includes data from the 15min sampling. This means that detections are guaranteed not to be exactly aligned in time, even after a 15 min detection is extended into the entire hour.

The magenta below highlights two short spikes that the GLM model sees as ‘charging’, but the time-series approach does not. The green circle highlights the opposite - a short charging cycle predicted by time-series, but not by GLM. We’ll see who is right on those later. There’s also a clue to time misalignment errors in the orange circle. It looks like the time-series model is accurately aligned with a real charge cycle, but the GLM approach only catches half the cycle.

If I consult the high-res Sense waveforms of the same sequence, I can see that both algorithms are wrong with their short-cycle predictions. Neither the magenta circles nor the green circle are real charge cycles. I suspect that’s where most of the differences and errors come from - short cycle detections.

Here are a couple of cases in magenta when spikey and relatively high energy waveforms trigger the GLM, but not the time-series algorithm.

Here’s another case of an incorrectly triggered GLM charge cycle. But we’re also seeing a case where the time-series fails to catch the second back-to-back charge cycle while the GLM nails it. The time series approach misses, because it relies on a negative spike/delta as a delimiter, but in this case the time period is just too short to register a negative spike.

And finally a few more cases where each algorithm probably predicts a false charge cycle, that the other doesn’t. Time-series incorrectly predicts the up down spikes in the green and GLM gets the two in magenta wrong (I think). And I’m going to need to take a closer look at the stuff in orange. It has the makings of a back-to-back charge that gets rolled into a single by both algorithms.

Given these results, I’m tempted to move in a new direction on detection and automated labeling of my data:

  1. Add a fix to my time-series algorithm for back to back charging without a resolved negative spike. Either insert a “helper negative spike” or take care of the special case of a positive spike followed by another positive and a negative followed by another negative, though there are many of those that are not charge cycles.

  2. Start ignoring single 15min period charge events for all intents and purposes. They don’t use much energy (relative to real charge periods), and are virtually indistinguishable from other types of spikes given my 15 min time resolution. Not ideal, but a rational trade off to increase accuracy for all prediction models. Why try to identify them if my “microscope” doesn’t have sufficient resolution ?


A Little More Tuning
Based on the last round of results, I tweaked the positive and negative spike detector to deal better with all the variants of overlapping charging, plus set a higher bar on the minimum energy criterion for charging. The combination results in improvements, especially in two of the 4 weeks I had identified earlier. Back-to-back charging is identified more accurately and some of the false short charging sessions have disappeared, perhaps at the expense of a couple of legitimate ones.

I’m now viewing my time-series detector as relatively accurate, though it is highly tuned for just my house and my combination of EVs / charging levels. We’ll see how robust it really is in a bit. But first I’m going to roll the time-series results back into my GLM model environment one more time.

It looks like more of the time-series charge hours have moved out of the GLM’s “no charging zone”, but there also appears to be a less clear boundary.

And the confusion matrix confirms what the graph shows. If we assume the time-series detection (True/False) is more accurate in prediction than the GLM model, then we are seeing the increased accuracy of the time-series model highlight 5 more hours where the GLM model incorrectly detects charging, when it isn’t happening. But at the same time we have gotten down to only 7 hours where the time-series sees the car as charging, but the GLM sees it as not charging. If we look at the time-series as completely correct, then the GLM model has a sensitivity of 331 / (331 + 7) or 98%. And it has a specificity of 5741 / (5741 + 75) or almost 99%.

Now it’s time to roll 2019 and 2020 together ! First step - expand the time-series algorithm to 2020.

Comparing All Detections !
Now comes the fun part. I have merged 2019 and 2020 and pulled together detections from all my different sources:

  • Sense - 3 different detections from different models at different points in time (orange = EV, purple = Model S, magenta = short-lived Tesla detections). For these, the colored dots show the energy level detected for those hours.

  • My simple GLM-model machine learning detector - dots in green. Even though it is simple, it is highly custom for my house. It hasn’t been generalized for any other homes.

  • My time-series based, hand-tweaked detector. For these two years, it’s almost 100% accurate on detecting charging (I still see one visible error), but doesn’t posit charging level (yet). Plus it’s only that accurate if we ignore extremely short, low energy charging events, some that I can’t even visually resolve. Note that I can’t do confusion matrices for each of the Sense detections since I wasn’t very careful about tracking when the specific device models were added and deleted (EV/Electric Vehicle is my only current surviving device model). Once again, this model is highly custom and not generalizable to any other homes or EVs.

Here are some of the most interesting 2 week periods.

EV (orange) doing its job today. It really only triggers with the Model S, and predicts about half the energy actually used. The GLM model does a good job with these two weeks, finding all charging events and avoiding other spikes.

A few mistakes by both the EV detection and the GLM detector.

Here’s a case of both Model S (purple) and EV (orange) predicting against one another. Energy predictions are typically way low.

Another similar 2 week period.

Model S (purple) does a good job discriminating between Model S and Model 3 charging cycles. But it also comes up with a fair number of low energy false positives…

One of the few shots of the Tesla detector (magenta) in action. It aligns with Model 3 charging, and seems fairly specific to just Model 3 events, plus it nails the energy fairly closely.

Sharing this around with folks here, @kevin1. Very cool. Mohammad will be poking around this thread :slight_smile:

For one more fun experiment, I’m going to try to program my time-series model to categorize which car is charging based on the energy footprint over the “charging window”. I’ve tweaked my energy curves just a bit more, mainly to split the points in the curve below in what looks to be natural dividing lines.

Now I can compare against the “golden charging” data for 2020 where I tediously visually tagged every hour that looked like it had a car charging event. Note that I only identified a few hours as ‘None’, ones that had a big spike but proved not to be a charging event. I can merge my “golden” hourly classification data with my 15 min time-series classification data, but that ends up in a very unsatisfying confusion matrix. Why ? Because a “golden” hour labeled ‘Model 3’ merges with 4 different 15 min periods, of which only 1 might have been an actual ‘Model 3’ charging event for my time-series detector, and 3 'None’s.

Here’s the actual confusion matrix for all the 15 minute intervals common to the two datasets - vertically are my “golden” hourly classifications spread across all 4 15 min periods, horizontally my time-series detector which operates on 15 min periods.
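The time-base mismatch is easy to see in a toy Python sketch. The labels and timestamps below are made up, and the function name is an illustrative assumption: one “golden” hour gets copied onto its four 15 min periods and then compared with a detector that only flagged the last period.

```python
from collections import Counter
from datetime import datetime, timedelta

def spread_hourly_labels(hourly):
    """hourly: {hour_start: label}. Copy each label onto its four 15 min periods."""
    return {hour + timedelta(minutes=15 * k): label
            for hour, label in hourly.items() for k in range(4)}

golden = {datetime(2020, 1, 20, 1): "Model 3"}
detected = {
    datetime(2020, 1, 20, 1, 0): "None",
    datetime(2020, 1, 20, 1, 15): "None",
    datetime(2020, 1, 20, 1, 30): "None",
    datetime(2020, 1, 20, 1, 45): "Model 3",
}
golden_15min = spread_hourly_labels(golden)
# Count (golden, detected) pairs over the periods both data sets share
table = Counter((golden_15min[t], detected[t]) for t in golden_15min.keys() & detected.keys())
print(table)
```

Both labelings are right about this hour, yet three of the four periods land off the diagonal in the ‘None’ column purely because of the hourly-to-15-min spread.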

Screen Shot 2020-06-30 at 10.52.25 PM

If everything was perfect, we would just see values along the diagonal, where the 118, 282 and 71 are. But with the merge of two different time bases, it’s really OK to have a row where my time-series detector picked up 2 15 min periods of the ‘Model S’ but also 2 15 min periods of ‘None’, so the 2, 74 and 195 in the ‘None’ column don’t really concern me. The only two real issues might be the 31 differences of opinion on ‘Model 3’ vs. ‘Model S’, plus the one place where I saw ‘None’ but my time-series detector saw a ‘Model 3’.

If I look at a few of the waveforms, it becomes apparent that my time-series detector is probably better at classifying which car was charging, than I was when I did my “golden” labeling. Here are a couple of fun waveforms. My time-series detector results in the dots on the top of the waveform, my “golden” labeling in the line across the bottom. Notice that the detections line up fairly well though occasionally they are different colors (1 or more of those 31 differences)

Here’s a two week period where things line up really well.

Here’s one where I need to look at my time-series detector since it looks like it made a couple of mistakes with that first charging event.

A little more tuning of my edge detectors cleans up the time-series mistake on Jan 20, so now the detection and classification lines up perfectly with the “golden” data.

The confusion table also reflects the small improvement, moving the 31 ‘Model S’/‘Model 3’ conflations down to 21 15 min periods.

Screen Shot 2020-07-01 at 9.47.37 AM
