Could open source ML algorithms be in Sense’s future?

jamesroxford · March 14, 2019, 10:09pm

As the Sense community grows, and the fields of machine learning and data analytics grow, could there one day be an opportunity to allow community contributors to create/update the detection algorithms?

One of the most common requests I’ve seen is for manual labeling of devices. This is hard in an ML environment (based on forum responses), but what if there were a way for power user volunteers, who possess the needed skills, to help augment this effort? Similar models exist already for large software development products (e.g., opensource.com).

kevin1 · March 14, 2019, 10:56pm

It’s a good question. Ultimately in machine learning, the eventual algorithms are determined by the real world dataset with embedded ground truth (feedback on the right answer). I’m fairly certain that most of us haven’t signed up to have our data open-sourced. And even if we could try a model that we created (more on that later) on only our own data, to forestall data privacy issues, the ultimate model(s) would likely be fairly useless for homes outside our own.

Another option might be to let user-built models, developed in whatever machine learning framework Sense is using, be inserted into the Sense environment and trained/tested against their aggregated data set. That would require the users to have a deep understanding of the Sense development, dataset and validation environment, as well as someone to foot the bill for fairly expensive (computationally and monetarily) training runs.

Quite honestly, there are a number of open source machine learning frameworks and power disaggregation datasets available today for someone who wants to play around with the basic concepts. Look for information on REDD and BLUED.

But I can’t see Sense open sourcing their customer dataset, nor a way to easily allow outsiders to build models in their environment for training and testing. Plus, who would pay the outside developers’ computational bills?

jamesroxford · March 15, 2019, 12:00am

Is there a place to sign up for that? I’m all in. Seriously, though, I understand that others don’t share my same opinions re security and privacy.

Last year I attended the USENIX Enigma Conference and went to a presentation titled Cryptographically Secure Data Analysis for Social Good (slides and video available for free on the website). The premise was a project in Boston to research pay disparity among women using real salary data from several local companies that backed the research to address this problem (ref 100talent.org). To protect the sensitive data, they used something called Secure Multi-Party Computation (MPC). Maybe this or a similar cryptographic technique could be used to obfuscate/protect user data in a way that avoids privacy concerns and is also still useful for the ML algorithms?
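
To make the idea concrete, here is a toy sketch of additive secret sharing, one common building block of secure multi-party computation. This is only an illustration under my own assumptions, not the protocol the Boston project used and not anything Sense does: the function names, the modulus, and the example readings are all made up. Each home splits its private reading into random-looking shares, and only the combined total can be reconstructed.

```python
import secrets

# Hypothetical modulus; any prime comfortably larger than the true total works.
PRIME = 2**61 - 1


def split_into_shares(value, n_parties):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values):
    """Compute the total of all private values without exposing any one of them."""
    n = len(private_values)
    # Each party splits its own value and hands share j to party j.
    all_shares = [split_into_shares(v, n) for v in private_values]
    # Party j adds up the shares it received; alone, this number looks random.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Only the partial sums are published, and they combine into the true total.
    return sum(partial_sums) % PRIME


# Made-up daily usage in watt-hours for three homes.
total = secure_sum([1200, 850, 430])
print(total)  # 2480
```

A real deployment would also have to handle dropped participants, malicious parties, and far richer computations than a sum, which is where most of the complexity lives.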

kevin1 · March 15, 2019, 1:05am

There are plenty of technical ways to handle privacy concerns and offer data to the outside, but I’m not sure there’s a business case to go the distance on all the things that would need to be done to offer “open source”-like development to folks outside of Sense. Why do I say that?

The only win for Sense is if someone develops improved/additional models within their environment/dataset and constraints (the monitor has performance and resource constraints).
It’s not a win for Sense to have someone use Sense data and come up with a set of models that don’t live neatly within the Sense universe.
The Sense model development environment is high in complexity, with cloud infrastructure for the acquisition, validation and storage and access of huge volumes of data. It also includes large scale capabilities for developing, training, cross-validating and comparing different models based on huge amounts of data.

There are models for “open-sourcing” data science challenges like ImageNet. In fact, if you look on Kaggle you can find an old Belkin competition for energy disaggregation (a mere 20GB of data, whereas Sense likely accumulates petabytes).

Plus MIT, CMU and others have provided their own datasets that many universities use for research-level “solutions”. But I don’t foresee an ImageNet university driver for energy disaggregation.

BTW - I have worked to open source several large commercial industry-wide projects to expand the developer base and rate of adoption and innovation. This is a more complex problem than just open-sourcing something.

samwooly1 · March 15, 2019, 1:17am

When it comes to Sense and data sharing, I don’t completely understand what they consider our data.
I’ll provide a link to their page about “opting in” to data sharing with other parties.
What they mention is basically our Total usage, solar production, personal account information and device metadata.
It does not say that device level usage won’t be shared.
What does Sense consider to be our data vs. their data? Data gathered or measured using their proprietary software, who does that belong to? Or at what point in the process does the data make the transition to ours from theirs, or to theirs from ours?
I’m sure they have developers and engineers outside of their company that data has to be shared with.
Not that I have a problem with them sharing my data. As long as anything personally identifiable is left out, I really don’t care.

jamesroxford · March 15, 2019, 2:04am

Thanks for the insight! I definitely get that it’s complex, and I appreciate that you are very knowledgeable and experienced on the subject.

What if we try to imagine a future where this is reality (despite what we know is currently possible and sensical)? In this future, not only are we doing it, but it is paying off as a win-win for the Sense company, customers, community, etc. Sense is on the cutting edge of this market, much like Netflix and Microsoft have become. Once you’ve imagined it, can you work backwards to conjecture what advancements had to take place to get us there? Is such a reality beyond imagination?

jamesroxford · March 15, 2019, 2:12am

I’m not a business person or an open source guru. As a consumer though, I am attracted to companies that embrace the os philosophy — maybe I’m an oddball.

Has there been a business case for software companies to embrace os? I’m aware of os code projects, but I’m not sure whether there are companies that have gone that route. Do you know of any?

kevin1 · March 15, 2019, 2:45am

The winning business cases for open source I have seen are:

A service and packaged goods business on top of open source - Red Hat with Linux.
A standard that is used to undercut the proprietary position of a leading supplier - Khronos Group in graphics - meant to erode a few proprietary standards, so many companies can play in the graphics hardware market.
Infrastructure standards that enable sales of packaged software, especially where no one company wins based on the value of the infrastructure, and where the infrastructure is very expensive to develop and maintain. I participated in a couple of these - SystemC and OpenAccess. Expensive electronic design automation tools are based on both of these. They were offered as open source to get big companies with their own similar in-house infrastructure to move to more commercial technologies.
I’m sure there are many more winning open source business strategies, especially with the cloud, but all of them involve making money on something that surrounds the open source.

kevin1 · March 15, 2019, 5:25am

My 2c:
Sense is doing a good job setting up for the equivalent of open-source, which is really crowd-sourcing device data. Things like smartplugs and device information when devices are detected are the best way to improve machine learning as quickly as possible. You’ll hear the term “ground truth” - that’s the real data from your devices propagated back to Sense as feedback. Ground truth is really the most valuable commodity for improving algorithms.

kevin1 · March 15, 2019, 5:55pm

@samwooly1,

The way this is written, it sounds like Sense either had, or will have third party agreements for Sense sales and installation via solar installers. Another company that sold a device similar to Sense, Neurio, pretty much retreated into the solar add-on market where their product was only sold through solar installers. It’s a good way to add business through another channel (solar), especially for solar systems that have cryptic or problematic metering hardware (my solar system came with reasonable web access and metering, but some do not). In that situation, the solar company would be the seller and quite possibly the originator of the Sense account. Hence the sharing privacy statement.

samwooly1 · March 15, 2019, 6:06pm

I figured that page was for something along those lines.
The reason I used that particular page is there is apparently something already in the works for data sharing.
My question about data is what data is considered ours and theirs?
Sense has 2 other pages talking about privacy and I think they are all a little vague. The page I attached gets into the most detail.
I thought about the Neurio but believe Sense is a superior product.

RyanAtSense · March 15, 2019, 8:00pm

Bingo. This is utilized by third parties like solar installers.

@samwooly1 If you write up your specific questions, I will ask the relevant people here. This is a pretty serious topic, so I want to make sure the answers I provide are 100% accurate.

samwooly1 · March 15, 2019, 8:16pm

@RyanAtSense
The only question I have is
What specific information or Data is ours?

In my mind, the only thing that is mine is personally identifiable things like name, address etc…
I’m using Sense’s product as far as the firmware, software and storage go, so that belongs to them. I may have purchased the monitor, but it’s like a video game to me. With a game, you haven’t purchased the game itself, only the right to use it.

I really don’t care and have the trust in Sense as a company that they will use any information or data responsibly.

RyanAtSense · March 19, 2019, 3:44pm

Pardon the delay on this.

We take PII very seriously (data like email, name, address, internal identifiers). PII absolutely does not get shared with any third parties without your explicit and clear permission (which is what the data sharing page above details). We do share non-PII data with other third parties, but this is for support and logging purposes (as outlined in the privacy policy) and is always anonymized.

Does that make sense? You can imagine also that many of us here at Sense use the product in our own homes and certainly wouldn’t want our home data to be conflated with PII and shared widely.

samwooly1 · March 19, 2019, 6:35pm

Thank you for the response @RyanAtSense

Sounds like privacy falls right in line with what I was thinking, and all I’ve ever been concerned with is PII. With all the data that has to do with energy usage being anonymized, there wouldn’t be a connection to anything personal.
The policy on the website does have that covered as you pointed out.

bhstark · March 21, 2019, 4:57pm

I suspect one of the reasons that you wouldn’t open source the models is that, often, ML can be less rigorous than people want it to be. “Detect a Fridge” could be as simple as “look for something with a 15min duty cycle that is preceded by a power spike”. It could also involve trade secrets like “convert everything to frequency domain”. Often ML is a term chosen by marketers after their eyes glaze over from hearing SWEs explain how things actually work.
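
As a rough illustration of how simple such a rule could be, here is a sketch of the “15min duty cycle preceded by a power spike” idea. Everything in it is an assumption on my part: the one-reading-per-minute input, the thresholds, and the function name are hypothetical, and I am reading “duty cycle” as “stays on for about 15 minutes”. It says nothing about how Sense actually detects a fridge.

```python
def find_fridge_like_cycles(watts, on_threshold=50.0, spike_ratio=2.0,
                            target_minutes=15, tolerance_minutes=3):
    """Return (start_index, length) for runs that look like a compressor cycle.

    watts: list with one power reading per minute (hypothetical input format).
    """
    cycles = []
    start = None
    # A trailing 0 closes any run that is still "on" at the end of the data.
    for i, w in enumerate(list(watts) + [0.0]):
        if w > on_threshold and start is None:
            start = i  # device just switched on
        elif w <= on_threshold and start is not None:
            run = watts[start:i]
            length = len(run)
            if length > 1:
                rest = sorted(run[1:])
                steady = rest[len(rest) // 2]  # median of the post-startup readings
            else:
                steady = run[0]
            # Startup spike: the first reading is well above the steady level.
            has_spike = run[0] >= spike_ratio * steady
            right_length = abs(length - target_minutes) <= tolerance_minutes
            if has_spike and right_length:
                cycles.append((start, length))
            start = None
    return cycles


# Made-up trace: one cycle with a startup spike, one without.
trace = [5] * 10 + [400] + [120] * 14 + [5] * 20 + [120] * 15 + [5] * 5
print(find_fridge_like_cycles(trace))  # [(10, 15)]
```

A rule like this is easy to write and easy to fool: any other device with a startup surge and a similar run time would match, which is part of why hand-written heuristics don't generalize well across homes.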

kevin1 · March 21, 2019, 6:32pm

I think the harder problem in “open sourcing” machine learning is that the “algorithm” is a function of a proprietary user dataset and the models, where each model is built to live within one or more machine learning frameworks, training methodologies/meta parameters, plus constraints (time, computational complexity, memory usage).

BTW - Kaggle.com is one of the places that people have “open sourced” data science and machine learning problems. But the key is that one has to be willing to put the dataset and software framework out on the web.

happyday.mjohnson · March 24, 2019, 12:18pm

Correct me if I am wrong, but Sense’s competitive advantage/business model is in energy device detection. This means the more data Sense gathers from all of us, the better their detection. I assume the VCs that invested (as well as the employees) see this as their main differentiator. Given that, I can’t see Sense ever opening their data to “anyone.” The interesting question is what Sense is discussing with energy efficiency system integrators. I’d be interested in these types of questions being explicitly answered versus just reading the license, since the license is written not with the intent of informing us of future plans, but of ensuring those plans aren’t ruled out. Knowing we would have to give consent (some form of opt-in?) is good to hear. Is this roadmap in some form of Sense documentation, such as a 3-5 year plan?

jamesroxford · March 24, 2019, 12:36pm

@happyday.mjohnson welcome to the community! Thanks for your comment.

I agree that this is one advantage, but there can also be an advantage to open source and the transparency/interactivity it creates. I think they’re just different sides of the same coin.

ixu · March 27, 2019, 1:46pm

Imagine the Sense AWS data being accessed by open source ML coders working (ironically?) for Amazon. One can imagine all sorts of tie-ins: you just bought a product on Amazon … did you plug it in yet? While this could have benefits in terms of device identification, energy use is one (half) step away from GPS in terms of privacy issues, so one can understand why Sense needs to keep things proprietary.
