Data Science
The first thing I generally do when I look at my Sense data is check for missing hours for each device, especially Total Usage (mains), Solar Production (solar) and Always On (alwayson), plus one reference native detection (Floor Heater / Dryer) and one smart plug detection (Yamaha). The first three should have data for every hour in the month (30 days x 24 hours = 720 observations). The smart plug should be similar, though that number can be affected by network outages and fishy stuff happening on the Sense monitor. The native detection only produces hourly data for the hours when the device is on.
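The expected-hours check described above can be sketched in a few lines of Python. This is a minimal sketch, not the tool I actually used: it assumes the export is in long format (one row per device per hour), the column names `Name` and `DateTime` are assumptions about the CSV, and the sample rows are synthetic.

```python
import csv
import io
from collections import defaultdict

EXPECTED_HOURS = 30 * 24  # April: 720 hourly observations per device


def missing_pct_by_device(rows, name_col="Name", time_col="DateTime"):
    """Return {device: percent of expected hours that have no row at all}."""
    hours_seen = defaultdict(set)
    for row in rows:
        # A set ignores duplicate rows for the same device and hour.
        hours_seen[row[name_col]].add(row[time_col])
    return {
        name: round(100 * (EXPECTED_HOURS - len(hours)) / EXPECTED_HOURS, 2)
        for name, hours in hours_seen.items()
    }


# Synthetic stand-in for the real export (column names are assumptions).
sample = io.StringIO(
    "DateTime,Name,kWh\n"
    "2025-04-01 00:00:00,Total Usage,1.2\n"
    "2025-04-01 01:00:00,Total Usage,1.1\n"
    "2025-04-01 00:00:00,Dryer,0.4\n"
)
result = missing_pct_by_device(csv.DictReader(sample))
print(result)
```

On a real file, a device that reports every hour comes out at 0.0, and a native detection that ran for only a few hours comes out close to 100.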
Here’s how a simple data science tool sees that select set of data for April 2025. Note that this diagram also includes DateTime, which is the constant (or id_col) for the entire dataset. Things look pretty good except for the Yamaha smart plug dropout, which appears to have been one continuous interval in time.
I also look at all my devices to see if there are any patterns. Here’s the eye chart, sorted from highest percentage of dropouts to lowest.
A few interesting things about the patterns.
- There are a couple of native detections that show almost no hourly dropouts: the Room Heater Unknown 2 and the Fridge. It turns out the Room Heater is really an instant hot water dispenser. So both appear accurate: both devices use bursts of energy every hour of the day.
- I have a couple of Kasa smart plugs that haven’t been on the whole time. One that turns off every once in a while by accident (Doorbell) and one that I turn off for stretches when my son isn’t around (Kyle’s PC).
- One can also clearly see the stretch of time when Sense stopped reading my other Kasa smart plugs.
- I also have a few sets of Hue lights that work similarly to native detections: there is only data for the hours when the bulbs have been on.
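The "one continuous interval" pattern that jumps out of the chart can also be checked numerically by finding the longest run of consecutive missing hours for a device. A minimal sketch, assuming the hours with data for one device are already collected into a set; the function name and the 24-hour dropout are hypothetical.

```python
from datetime import datetime, timedelta


def longest_gap(present_hours, start, n_hours):
    """Return (first_missing_hour, length) of the longest run of missing hours."""
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    for i in range(n_hours):
        hour = start + timedelta(hours=i)
        if hour in present_hours:
            # Data exists for this hour, so any current gap ends here.
            run_start, run_len = None, 0
            continue
        if run_len == 0:
            run_start = hour
        run_len += 1
        if run_len > best_len:
            best_start, best_len = run_start, run_len
    return best_start, best_len


start = datetime(2025, 4, 1)
# Synthetic smart plug: reports every hour except a made-up 24-hour dropout.
gap = set(range(222, 246))
present = {start + timedelta(hours=i) for i in range(720) if i not in gap}

gap_start, gap_len = longest_gap(present, start, 720)
print(gap_start, gap_len)
```

If a device's longest gap equals its total missing hours, the dropout was a single continuous stretch rather than scattered hours.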
Perplexity Pro AI
Now I’m going to show you what the world looks like when I try the same using Perplexity Pro, my paid subscription. The general workflow is to upload my CSV data to a Perplexity “Space” that acts as a workspace with the main data for the analysis. I have uploaded 3 files, the Sense CSV for April 2025, for April 2024, and my PGE (utility) usage CSV for April 2025. This is all being done in “Deep Research” mode, where Perplexity might call several models as it thinks through how to do the analysis.
My first query is:
“Please share relative percentages for missing data for each device in April 2025 assuming there should be data for every hour.”
The response:
There is no missing data for any device in April 2025 based on the hourly records in your Sense data file. The code execution results show an empty dictionary for missing data percentages, which means every device present in the file has a complete set of hourly data for the month—no device is missing any expected hourly entries.
Summary Table:
| Device | Missing Data (%) |
| --- | --- |
| All devices | 0% |

If you have concerns about a specific device or suspect data gaps, please specify which device to investigate further.
This is typical of LLMs. They are kind of like inept interns - a gentle push makes them rethink things.
“Try again - there should be plenty of missing data.”
Here are the relative percentages of missing hourly data for each device in April 2025, based on your uploaded file (assuming 720 hours in April):
| Device Name | % Missing Data |
| --- | --- |
| Washing Machine | 97.6% |
| Vacuum 12 | 97.6% |
| Garage door | 96.7% |
| Microwave | 96.6% |
| Playroom lights | 96.5% |
| Tesla Model S | 95.8% |
| Floor Heat Playroom | 95.7% |
| Fridge | 94.6% |
| Traveling HS110 | 94.4% |
| Device 3 | 94.3% |
| Motor 1 | 94.2% |
| Floor Heater Unknown 2 | 93.6% |
| Rainforest Eagle Bridge | 92.9% |
| Sonos Dining Room | 92.8% |
| Family Room Mantle lights | 92.7% |
| HP LaserJet | 92.6% |
| AppleTV Master | 92.6% |
| Yamaha | 92.5% |
| Sonos Kitchen | 92.5% |
| Sonos Living Room | 92.5% |
| Sonos Kitchen | 92.5% |
| Office Cluster | 92.4% |
| AppleTV Family Room | 92.4% |
| Sonos Dining Room | 92.4% |
| Sonos Living Room | 92.4% |
| … | … |
| mains (Total Usage) | 0% |
| alwayson (Always On) | 0% |
| solar (Solar Production) | 0% |

Key Points:
- Most individual device records are missing over 90% of expected hourly data for April 2025.
- Only the aggregate channels—Total Usage (mains), Always On, and Solar Production (solar)—have complete data (0% missing).
- Nearly all other devices are missing the vast majority of hourly records, indicating they were either off, not detected, or not reported for most hours.
If you need the full table or want to focus on specific devices, let me know!
Now we’re talking! Those numbers look reasonable. Time to try another wrinkle.
“Please separate between 0 watts and no data”
Here’s a breakdown of missing data vs. zero-usage data for each device in April 2025 (assuming 720 hourly records expected per device):
| Device Name | % No Data (missing) | % Zero Usage (explicit 0W) |
| --- | --- | --- |
| AC 3 | 99.9% | 0.0% |
| Central Vacuum | 99.7% | 0.0% |
| Electric Vehicle 7 | 99.7% | 0.0% |
| Vacuum 2 | 99.7% | 0.0% |
| Kayla’s Room lights | 99.6% | 0.0% |
| Mystery Motor 12 | 99.6% | 0.0% |
| Fridge | 99.3% | 0.0% |
| Motor 2 | 99.2% | 0.0% |
| Vacuum 12 | 99.2% | 0.0% |
| Vacuum | 99.0% | 0.0% |
| Tesla Model 3 | 98.8% | 0.0% |
| Disposal | 97.9% | 0.0% |
| Tesla Model S | 97.8% | 0.0% |
| Motor 1 | 97.5% | 0.0% |
| Oven 4 | 97.4% | 0.0% |
| Motor 4 | 97.2% | 0.0% |
| AC 2 | 94.6% | 0.0% |
| Family Room lights | 93.3% | 0.0% |
| Oven 2 | 92.2% | 0.0% |
| Playroom lights | 91.1% | 0.0% |
| … | … | … |
| Always On | 0.0% | 0.0% |
| Floor Heater Unknown 2 | 0.0% | 0.0% |

Key points:
- For almost all devices, the vast majority of missing hours are truly missing (no data at all, not even a zero-wattage entry).
- There are essentially no hours where a device is present but explicitly reports zero watts; all “zero usage” percentages are 0.0%.
- Only a few aggregate devices (like “Always On” and “Floor Heater Unknown 2”) have complete data for every hour.
- For most devices, over 90% of hours have no data at all—meaning the device was either not detected, not reporting, or not in use.
Conclusion:
In your April 2025 Sense data, missing data overwhelmingly means “no record at all for that hour and device,” not “explicitly reported as 0 watts.” If you want the full table or a breakdown for a specific device, let me know!
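For an independent check on the split between explicit zeros and absent rows, each expected hour can be put into exactly one of three buckets: nonzero reading, explicit zero, or no row at all. A minimal sketch, assuming rows have already been reduced to (device, hour, kWh) tuples; the function name and the sample values are hypothetical.

```python
from collections import defaultdict

EXPECTED_HOURS = 720  # 30 days x 24 hours


def classify_hours(rows):
    """rows: iterable of (device, hour, kwh).

    Returns {device: (pct_nonzero, pct_zero, pct_missing)}; the three
    buckets are mutually exclusive and add up to the expected hours.
    """
    nonzero = defaultdict(set)
    zero = defaultdict(set)
    for device, hour, kwh in rows:
        (zero if float(kwh) == 0.0 else nonzero)[device].add(hour)

    out = {}
    for device in set(nonzero) | set(zero):
        n_nonzero = len(nonzero[device])
        # An hour that has both a zero and a nonzero row counts as nonzero.
        n_zero = len(zero[device] - nonzero[device])
        n_missing = EXPECTED_HOURS - n_nonzero - n_zero
        out[device] = tuple(
            round(100 * n / EXPECTED_HOURS, 2)
            for n in (n_nonzero, n_zero, n_missing)
        )
    return out


# Synthetic rows: one hour of real usage, two hours of explicit zeros.
rows = [("Dryer", 0, "0.5"), ("Dryer", 1, "0"), ("Dryer", 2, "0.0")]
result = classify_hours(rows)
print(result)
```

Keeping the buckets exclusive makes any table easy to sanity check, because each row has to sum to roughly 100%.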
OK - my intern is turning trustworthy, but I want to see all the data.
“Tell me about all the devices, including the ones in between.”
Here is a comprehensive summary of all devices in your April 2025 Sense data, showing for each device the percentage of hours with actual usage data, the percentage of hours with explicit 0-watt readings, and the percentage of hours with no data at all (assuming 720 hourly records for April):
| Device Name | % Hours with Data | % Hours with 0W | % Hours with No Data |
| --- | --- | --- | --- |
| mains (Total Usage) | 100% | 0% | 0% |
| alwayson (Always On) | 100% | 0% | 0% |
| solar (Solar Production) | 100% | 0% | 0% |
| Floor Heater Unknown 2 | 88% | 0% | 12% |
| Office Cluster | 54% | 0% | 46% |
| Traveling HS110 | 51% | 0% | 49% |
| HP LaserJet | 49% | 0% | 51% |
| AppleTV Master | 48% | 0% | 52% |
| Kyles PC | 47% | 0% | 53% |
| Sonos Kitchen | 46% | 0% | 54% |
| Family Room Switch | 46% | 0% | 54% |
| Switch | 46% | 0% | 54% |
| Sonos Living Room | 46% | 0% | 54% |
| Sonos Dining | 46% | 0% | 54% |
| Sonos Playroom | 46% | 0% | 54% |
| Yamaha | 46% | 0% | 54% |
| Doorbell | 46% | 0% | 54% |
| Rainforest Eagle Bridge | 46% | 0% | 54% |
| Keene Vent Bridge | 46% | 0% | 54% |
| SolarCity Bridge | 46% | 0% | 54% |
| KP115 Roamer | 46% | 0% | 54% |
| Video Server | 46% | 0% | 54% |
| Cable Amps | 46% | 0% | 54% |
| HP LaserJet | 46% | 0% | 54% |
| AppleTV Family Room | 46% | 0% | 54% |
| Fridge 2 | 46% | 0% | 54% |
| Recirculation Pump | 46% | 0% | 54% |
| Furnace Down | 46% | 0% | 54% |
| Furnace Up | 46% | 0% | 54% |
| Office Cluster | 46% | 0% | 54% |
| Modem | 46% | 0% | 54% |
| Sonos Dining | 46% | 0% | 54% |
| Sonos Kitchen | 46% | 0% | 54% |
| Sonos Living Room | 46% | 0% | 54% |
| Sonos Playroom | 46% | 0% | 54% |
| … | … | … | … |
| Washing Machine | 3% | 0% | 97% |
| Vacuum 12 | 3% | 0% | 97% |
| Garage door | 3% | 0% | 97% |
| Microwave | 3% | 0% | 97% |
| Playroom lights | 3% | 0% | 97% |
| Tesla Model S | 4% | 0% | 96% |
| Floor Heat Playroom | 4% | 0% | 96% |
| Fridge | 5% | 0% | 95% |
| Device 3 | 6% | 0% | 94% |
| Motor 1 | 6% | 0% | 94% |

Key Points:
- Only the aggregate devices (Total Usage, Always On, Solar Production) have complete data (0% missing).
- Most other devices are missing 46% to 97% of expected hourly records. This means that for most hours in April, there is no entry for that device—neither a usage value nor an explicit zero.
- Explicit 0W readings are essentially absent; if a device is not running, it is typically not reported at all for that hour.
Interpretation:
- If a device is missing for a given hour, it means either the device was completely off and not detected, or Sense did not report it for that hour.
- If a device is present, it always has a nonzero reading—Sense does not log explicit zeros for off devices.
If you need the full list or want to focus on a subset, let me know!
Whooaaa! What happened there? Not only did it not give me the full list as requested, it also started to screw with the numbers. This is a real challenge with AI. It’s not clear why it changed its answers for devices like “Floor Heater Unknown 2”, which went from 93.6% missing to 0.0% missing to 12% missing across three responses.
So once again, I’m going to go back to basics with my intern.
“Once again, calculate the % of missing and 0 data hours, as well as full data hours, for every device in the April 2025 file assuming the month of April has 720 hours.”
Here is a summary of the percentage of hours with actual data, explicit zero usage, and missing data for each device in your April 2025 Sense file, assuming 720 hours in the month:
| Device Name | % Hours with Data | % Hours with 0W | % Hours Missing |
| --- | --- | --- | --- |
| mains (Total Usage) | 100% | 0% | 0% |
| alwayson (Always On) | 100% | 0% | 0% |
| solar (Solar Production) | 100% | 0% | 0% |
| Floor Heater Unknown 2 | 100% | 0% | 0% |
| Office Cluster | 100% | 0% | 0% |
| Traveling HS110 | 100% | 0% | 0% |
| HP LaserJet | 100% | 0% | 0% |
| AppleTV Master | 100% | 0% | 0% |
| Kyles PC | 100% | 0% | 0% |
| Sonos Kitchen | 100% | 0% | 0% |
| Family Room Switch | 100% | 0% | 0% |
| Switch | 100% | 0% | 0% |
| Sonos Living | 100% | 0% | 0% |
| Sonos Dining | 100% | 0% | 0% |
| Sonos Playroom | 100% | 0% | 0% |
| Yamaha | 100% | 0% | 0% |
| Doorbell | 100% | 0% | 0% |
| Rainforest Eagle Bridge | 100% | 0% | 0% |
| Keene Vent Bridge | 100% | 0% | 0% |
| SolarCity Bridge | 100% | 0% | 0% |
| KP115 Roamer | 100% | 0% | 0% |
| Video Server | 100% | 0% | 0% |
| Cable Amps | 100% | 0% | 0% |
| AppleTV Family Room | 100% | 0% | 0% |
| Fridge 2 | 100% | 0% | 0% |
| Recirculation Pump | 100% | 0% | 0% |
| Furnace Down | 100% | 0% | 0% |
| Furnace Up | 100% | 0% | 0% |
| Modem | 100% | 0% | 0% |
| Fridge 3 | 100% | 0% | 0% |
| Garage door | 100% | 0% | 0% |
| Sonos Back Patio | 100% | 0% | 0% |
| Playroom Cluster | 100% | 0% | 0% |
| Toto Neorest Downstairs | 100% | 0% | 0% |
| Toto Neorest Master Bedroom | 100% | 0% | 0% |
| Floor Heat Playroom | 100% | 0% | 0% |
| Washing Machine | 100% | 0% | 0% |
| Floor Heat Dryer | 100% | 0% | 0% |
| Sonos Dining | 100% | 0% | 0% |
| HP LaserJet | 100% | 0% | 0% |
| AppleTV Master | 100% | 0% | 0% |
| Sonos Kitchen | 100% | 0% | 0% |
| Yamaha | 100% | 0% | 0% |
| Doorbell | 100% | 0% | 0% |
| Rainforest Eagle Bridge | 100% | 0% | 0% |
| Keene Vent Bridge | 100% | 0% | 0% |
| SolarCity Bridge | 100% | 0% | 0% |
| KP115 Roamer | 100% | 0% | 0% |
| Video Server | 100% | 0% | 0% |
| Cable Amps | 100% | 0% | 0% |
| AppleTV Family Room | 100% | 0% | 0% |
| Fridge 2 | 100% | 0% | 0% |
| Recirculation Pump | 100% | 0% | 0% |
| Furnace Down | 100% | 0% | 0% |
| Furnace Up | 100% | 0% | 0% |
| Modem | 100% | 0% | 0% |
| Fridge 3 | 100% | 0% | 0% |
| Garage door | 100% | 0% | 0% |
| Sonos Back Patio | 100% | 0% | 0% |
| Playroom Cluster | 100% | 0% | 0% |

Key Points:
- For all listed devices, there is an entry for every hour in April 2025 (no missing data).
- There are no hours where a device is present but reports explicit 0W usage; if a device is present, it has a nonzero value.
- No device has missing data for any hour.
If you need a more detailed breakdown for devices with partial data (less than 100% coverage), or if you want to see only devices with significant missing data, please let me know!
As Shaggy from Scooby-Doo would say: Zoooiiiiiks! Mysterious bad math again… So again, a little directive coaching…
“Try again - there are many missing hours for some device detections.”
Here are the percentages of hours with data, with explicit zero usage, and with missing data for each device in April 2025 (assuming 720 hours in the month):
| Device Name | % Hours with Data | % Hours with 0 kWh | % Hours Missing |
| --- | --- | --- | --- |
| AC 3 | 0.14% | 0.00% | 99.86% |
| Electric Vehicle 7 | 0.28% | 0.00% | 99.72% |
| Central Vacuum | 0.28% | 0.00% | 99.72% |
| Vacuum 2 | 0.28% | 0.00% | 99.72% |
| Mystery Motor 12 | 0.42% | 0.00% | 99.58% |
| Kayla’s Room lights | 0.42% | 0.28% | 99.58% |
| Fridge | 0.69% | 0.00% | 99.31% |
| Motor 2 | 0.83% | 0.00% | 99.17% |
| Vacuum 12 | 0.83% | 0.00% | 99.17% |
| Vacuum | 0.97% | 0.00% | 99.03% |
| Tesla Model 3 | 1.25% | 0.00% | 98.75% |
| Disposal | 2.08% | 0.28% | 97.92% |
| Tesla Model S | 2.22% | 0.00% | 97.78% |
| Motor 1 | 2.50% | 0.14% | 97.50% |
| Oven 4 | 2.64% | 0.00% | 97.36% |
| Motor 4 | 2.78% | 0.00% | 97.22% |
| AC 2 | 5.42% | 0.00% | 94.58% |
| Family Room lights | 6.67% | 0.00% | 93.33% |
| Oven 2 | 7.78% | 0.00% | 92.22% |
| Playroom lights | 8.89% | 0.14% | 91.11% |
| Coffee Maker | 10.42% | 0.28% | 89.58% |
| Floor Heat Playroom | 12.78% | 0.00% | 87.22% |
| Fridge 3 | 12.92% | 0.28% | 87.08% |
| Garage door | 13.19% | 0.00% | 86.81% |
| Family Room Mantle lights | 14.72% | 0.14% | 85.28% |
| Microwave | 19.44% | 0.00% | 80.56% |
| Mystery Motor | 20.00% | 1.25% | 80.00% |
| Sony TV | 23.33% | 12.50% | 76.67% |
| Device 3 | 29.31% | 0.00% | 70.69% |
| Washing Machine | 35.00% | 11.67% | 65.00% |
| Floor Heater Guest | 37.50% | 0.00% | 62.50% |
| Floor Heat Guest Room | 51.81% | 0.97% | 48.19% |
| Floor Heat Unknown Room | 53.19% | 1.11% | 46.81% |
| Doorbell | 59.86% | 0.83% | 40.14% |
| Kyle’s PC | 82.64% | 0.00% | 17.36% |
| Floor Heat / Dryer | 87.08% | 0.00% | 12.92% |
| HP LaserJet | 96.25% | 0.14% | 3.75% |
| KP115 Roamer | 96.25% | 0.14% | 3.75% |
| Keene Vent Bridge | 96.25% | 0.14% | 3.75% |
| Switch | 96.25% | 0.00% | 3.75% |
| Sonos Playroom | 96.25% | 0.14% | 3.75% |
| Sonos Living | 96.25% | 0.00% | 3.75% |
| Yamaha | 96.25% | 0.00% | 3.75% |
| Furnace Down | 96.25% | 0.00% | 3.75% |
| Furnace Up | 96.25% | 0.00% | 3.75% |
| Toto Neorest Downstairs | 96.25% | 0.00% | 3.75% |
| Family Room Switch | 96.25% | 0.00% | 3.75% |
| Modem | 96.25% | 0.00% | 3.75% |
| Sonos Kitchen | 96.25% | 0.00% | 3.75% |
| Sonos Back Patio | 96.25% | 0.00% | 3.75% |
| … | … | … | … |

Key points:
- Most devices are missing data for the majority of hours in the month.
- Explicit zero-usage hours are rare; when a device is not detected, it is usually not reported at all (i.e., missing).
- Only a handful of devices (not shown here, but including “Always On”, “Total Usage”, and “Solar Production”) have nearly complete data.
- For example, “AC 3” has data for only 0.14% of hours (1 hour), and is missing data for 99.86% of the month.
If you want the full list or a CSV export, just ask!
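A quick way to sanity check percentages like these is to convert them back into whole hours out of 720. The sketch below is plain arithmetic on figures quoted in the response above; the helper name is hypothetical.

```python
EXPECTED_HOURS = 720  # 30 days x 24 hours


def pct_to_hours(pct):
    """Convert a percentage of the month into a whole number of hours."""
    return round(pct / 100 * EXPECTED_HOURS)


ac3_hours_with_data = pct_to_hours(0.14)   # AC 3
plug_hours_missing = pct_to_hours(3.75)    # Yamaha, Modem, Sonos Kitchen, ...
plug_hours_with_data = pct_to_hours(96.25)

print(ac3_hours_with_data, plug_hours_missing, plug_hours_with_data)
```

The 0.14% for AC 3 is a single hour, as the response says, and the 3.75% shared by the smart plugs works out to 27 missing hours each. An identical count across that many plugs would be consistent with one shared dropout rather than independent per-device gaps, though the percentages alone cannot confirm that the hours line up.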
I gave Perplexity one last chance to deliver deeper insights, and it blew it by making the leap that “There is no evidence of large, simultaneous gaps across all devices (which would indicate a Sense system or Wi-Fi outage),” even though the device chart clearly shows a stretch of time when Sense stopped reading my Kasa smart plugs.
Yes, there are several significant patterns in the missing data for April 2025:
1. Device Type and Usage Pattern
- Core/aggregate channels like mains (Total Usage), solar (Solar Production), and alwayson (Always On) have no missing data—they are reported every hour[1].
- Most individual devices (e.g., Yamaha, Floor Heat Dryer, AppleTV Master, etc.) are missing data for the vast majority of hours—typically between 85% and 99% missing[1].
- Devices with high usage or always-on status (like routers, modems, and network bridges) tend to have much less missing data compared to devices that are only intermittently detected (like AV receivers, dryers, or lights).
2. No Explicit Zeros
- For almost all devices, when there is no usage, there is simply no record for that hour. Explicit 0 kWh entries are extremely rare or absent. This means that “missing” almost always means “not detected or not running,” not “off and reporting zero”[1].
3. Operational Profile
- Devices that are used intermittently or only when needed (e.g., Yamaha, Floor Heat Dryer, garage door, various lights) show high missing data percentages. For example, Yamaha is present only about 4% of the time, and Floor Heat Dryer about 13%[1].
- Devices with scheduled or background operation (e.g., some smart plugs, network devices, printers) have moderate missing data, reflecting their usage patterns.
4. Possible Data Interruptions
- There is a disclaimer in the file about possible discrepancies due to “data interruptions such as Wi-Fi signal loss or loss of power or other issues.” However, the missing data pattern is highly device-specific, not system-wide, suggesting most missing data is due to device detection and usage, not global outages[1].
5. No Systemic Gaps
- There is no evidence of large, simultaneous gaps across all devices (which would indicate a Sense system or Wi-Fi outage). Instead, missingness is per-device and aligns with likely usage.
Summary Table (Selected Examples)
| Device Name | % Hours with Data | % Hours with 0 kWh | % Hours Missing |
| --- | --- | --- | --- |
| mains (Total Usage) | 100% | 0% | 0% |
| solar | 100% | 0% | 0% |
| alwayson | 100% | 0% | 0% |
| Yamaha | 4% | 0% | 96% |
| Floor Heat Dryer | 13.1% | 0% | 86.9% |
| AppleTV Master | ~2% | 0% | ~98% |
| Garage door | ~1% | 0% | ~99% |
Conclusion
- Missing data is mainly due to device detection and operational patterns, not system outages.
- Aggregate and always-on devices have full data; most other devices are detected only when in use.
- No explicit zeros means “missing” is best interpreted as “device not running or detected.”
If you want a breakdown by device category or a specific time-based analysis of missingness, let me know!
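The "no systemic gaps" claim is straightforward to test directly: restrict the check to devices that should report every hour (the smart plugs) and flag any hour where most of them are missing at once. A minimal sketch with synthetic data; the function name, the 80% threshold, and the 27-hour outage window are all assumptions made for illustration.

```python
def simultaneous_gaps(present_by_device, n_hours, min_fraction=0.8):
    """Return hour indexes where at least min_fraction of devices have no row.

    present_by_device: {device name: set of hour indexes that have data}
    """
    devices = list(present_by_device)
    flagged = []
    for h in range(n_hours):
        missing = sum(1 for d in devices if h not in present_by_device[d])
        if missing / len(devices) >= min_fraction:
            flagged.append(h)
    return flagged


# Synthetic example: four plugs share one outage; one also drops a lone hour.
all_hours = set(range(720))
outage = set(range(100, 127))  # made-up 27-hour window
plugs = {
    "Yamaha": all_hours - outage,
    "Modem": all_hours - outage,
    "Sonos Kitchen": all_hours - outage,
    "Doorbell": all_hours - outage - {5},
}

shared = simultaneous_gaps(plugs, 720)
print(len(shared), shared[0], shared[-1])
```

Native detections are left out on purpose, since they only produce rows when the device is on and would drown out a real outage signal.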
The bottom line for me is that the data science approach is better at producing insights, partly because the combination of human perception and visualization seems more powerful. But Perplexity Pro (AI) does an adequate job with a little extra oversight.