Create ML-as-a-service.md by MikeMS-sys · Pull Request #399 · w3f/Grants-Program

MikeMS-sys · 2021-05-03T14:46:01Z

Grant Application Checklist

The application template has been copied, renamed (project_name.md) and updated.
A BTC or Ethereum (DAI) address for the payment of the milestones is provided inside the application.
I have read and acknowledged the Terms and Conditions.
The software delivered for this grant will be released under an open-source license specified in the application.
The total funding amount of the project is below USD $30k for initial grant applications and $100k for follow-up grants.
The initial PR contains only one commit (squash if needed before submitting your PR).
The grant will only be announced once the first milestone has been accepted.

semuelle · 2021-05-03T21:25:59Z

Hi @MikeMS-sys, thank you for your application. We will look into it as soon as possible.

If you would be so kind as to review the specification of your deliverables so that every deliverable is measurable and verifiable, that would help us enormously with evaluation later on. We will update [...], for example, is not a deliverable.

Here are some more questions & suggestions based on a quick glance at your document:

Will the ML algorithms be implemented as offchain workers?
Either way, what's the point of bringing blockchain into this? If I distrust your model, I can train one locally with your data. If I distrust your data, I am not interested in your model.
Some models require hundreds of gigabytes of data, what's your use case/motivation for asking someone to upload their data so that some blockchain node can run an algorithm on it that's already implemented in Rust?

MikeMS-sys · 2021-05-07T12:24:30Z

Hi @semuelle, thank you for your suggestions.

We plan to implement both offchain and onchain calculations, so that developers adding machine learning to Substrate-based projects can choose between them.

Offchain workers are useful for implementing a separate ML service and for offchain work with data and predictions (or other results).
Onchain workers are useful to provide a fully blockchain-powered service when data or any data pointers (IPFS hashes, etc.) are available only on the blockchain.

The pallet is suitable for projects where users need the power of a communal neural network while knowing their data is protected. Having worked a lot with blockchain technologies, our team found that both blockchain and machine learning are data-driven, and thus there is rapidly growing interest in integrating them for more secure and efficient data sharing and analysis.

We want to realise this idea as the core part of our healthcare project, Trusted Health Council. There, users can securely share data with the blockchain and get predictions. The neural network training process improves with any new user data. Nobody knows who the data owners are, because all the data is anonymised.

Roadmap update

Milestone 1 - Proof of concept

Estimated Duration: 1.5 months
FTE: 3
Costs: $14 000

Substrate ML pallet

Generate predictions based on the Random Forest algorithm
All data is stored onchain

Web application

Interacting with blockchain
Form with fields to upload user data into the ML pallet
Handle the event containing the prediction

All code will have proper unit-test coverage to ensure functionality and robustness.
Comprehensive quality assurance for all platform features.
Docker image with testing Substrate chain with integrated ML pallet, demonstrating its functionality.
Documentation of the code and a basic tutorial describing how the software can be used and tested.

Milestone 2 - Production ready

Estimated Duration: 1.5 months
FTE: 3
Costs: $14 000

Substrate ML pallet

Implement all ML algorithms from the smartcore lib
Integrate OrbitDb and allow data to be stored in IPFS
Data encryption module
Manage access to users' prediction results and provided data

Web application

Functionality to select the ML algorithm to use
Flag to encrypt user data
Access to IPFS data by hash

All code will have proper unit-test coverage to ensure functionality and robustness.
Comprehensive quality assurance for all platform features.
Docker image with a new version of testing Substrate chain, demonstrating its functionality.
Documentation of the code and a basic tutorial describing how the software can be used and tested.

alxs · 2021-05-10T14:24:26Z

Hi @MikeMS-sys. I'm only quickly jumping in to post a link to your last application to the General Grants Program, for reference: w3f/General-Grants-Program#413.

Besides, could you please update the application itself? It would also be helpful if you could structure the deliverables tabularly as in the template, and include deliverables 0a-c in each milestone.

Lastly, I would also add that your deliverables and the application in general should still include far more details. You have barely updated them, whereas they need a complete overhaul. You may treat it as a contract; the level of detail must be enough to later verify that the software meets the specification. You can find some examples of what we're interested in for different grant categories here and have a look at this somewhat related application and its deliverables or any of the ones mentioned in the README for reference.

And could you specify what you mean by

Onchain workers [are] useful to provide full blockchain powered service when data or any data pointers (ipfs hashes, etc.) become available only in blockchain.

Both the data and the algorithms required for ML would be far too resource-intensive to be run on-chain, so what's your thinking behind this? Also, data referenced via e.g. an IPFS hash would be accessed via an off-chain worker and clearly cannot be retrieved on chain. Could you clarify what you mean?

semuelle · 2021-05-11T12:05:22Z

The pallet is suitable for projects where users need the power of a communal neural network while knowing their data is protected.

Can you expand on that? A communal neural network is a model that anyone has access to, or is there more? If I'm worried about my data being protected, wouldn't I just build my own model or download it and run it locally?

The neural network training process improves with any new user data.

But models are usually trained with data that is verified and often selected from a small population slice.

Data encryption module

What is encrypted, and where? In the browser before upload? If I used someone else's model, wouldn't I want to have access to the data it was trained with? How do you re-train a model with two separately encrypted datasets?

MikeMS-sys · 2021-05-12T10:46:11Z

Dear @alxs and @semuelle,
Thank you once again for the comprehensive recommendations. We are working on the application and will update it for the next steps.

MikeMS-sys · 2021-05-22T18:25:25Z

Returning to our conversation, @alxs and @semuelle: we have reflected on the earlier questions and have updated the application.

@semuelle The data encryption module was unfortunately included in the application by mistake.

semuelle · 2021-05-25T15:00:40Z

Hi @MikeMS-sys, thanks for the update. The repo containing the images (example) seems private though. I cannot access them.

semuelle · 2021-05-26T12:55:58Z

Thanks for the update. Do I understand correctly that I have to pass my training data to the node via transaction, which then stores it off-chain? Why? Why don't I store it on IPFS myself and then reference it via hash? That sounds like a massive bottleneck.
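The alternative raised here (the user stores the data themselves and submits only a reference) can be sketched as follows. This is a minimal illustration, not the applicants' design: `content_reference`, `OnChainRegistry` and `verify` are hypothetical names, and a plain SHA-256 hex digest stands in for a real IPFS content identifier, which uses a different encoding.

```python
import hashlib


def content_reference(data: bytes) -> str:
    """Return a SHA-256 hex digest as a stand-in for an IPFS hash (assumption)."""
    return hashlib.sha256(data).hexdigest()


class OnChainRegistry:
    """Hypothetical stand-in for pallet storage: keeps references, never raw data."""

    def __init__(self) -> None:
        self.refs: dict[str, list[str]] = {}

    def submit(self, account: str, ref: str) -> None:
        # Accept only a fixed-size digest, so transaction size stays constant
        # no matter how large the referenced dataset is.
        if len(ref) != 64:
            raise ValueError("reference must be a 64-character hex digest")
        int(ref, 16)  # raises ValueError if the string is not hexadecimal
        self.refs.setdefault(account, []).append(ref)


def verify(data: bytes, ref: str) -> bool:
    """Check that data fetched off-chain matches the reference stored on-chain."""
    return content_reference(data) == ref
```

With this shape, only 64 characters per dataset pass through a transaction, and whoever later fetches the data off-chain can check it against the stored reference.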

The neural network training process improves with any new user data.

Is there anything preventing people from polluting my model with wrong or fake data?

burdges · 2021-05-28T08:01:50Z

Please provide github URLs for all team members. LinkedIn URLs have no value in demonstrating team member abilities.

Afaik, there is never much if any value in doing machine learning on a blockchain. There is no need for a public source of truth since by definition machine learning models extract features from statistical samples.

Instead, if one really needs secrecy, either services provide a proprietary obfuscated model directly to users, or users provide their own masked data to services. All this falls into the adversarial machine learning field, which evolves quite quickly these days.

It's pretty trivial to obtain a less biased sample population than blockchain users, but if one day blockchains become really widely used then it's plausible one wants cryptography like group/ring VRFs when sampling, but even then if one used blockchain accounts for sampling one never touches the blockchain itself, only proves account existence in zero-knowledge.

MikeMS-sys · 2021-05-29T10:06:25Z

@semuelle The data receiving process for the machine learning algorithms is our way to prevent spam attacks or fake data from intruders. Our transactions are based on a specific format, and users can only transfer data in this format.

Future plans: implement a validation module.

I certainly agree with you, @burdges, but I am nevertheless sure that this idea has its place, especially for supported private projects or unique solutions.

Team web site https://uddug.com

Andrew Skurlatov (technical lead)
Github https://github.com/andskur

Nikita Velko (senior frontend developer)
Github https://github.com/nikichv

Ivan Podsebnev (devops engineer)
Github https://github.com/naykip

Constantine Czerniak (data scientist)
Github https://github.com/Snaaby

semuelle · 2021-05-31T08:54:43Z

The data receiving process for the machine learning algorithms is our way to prevent spam attacks or fake data from intruders.

How this helps with spam I understand, but fake data? And who or what are intruders?
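The distinction behind this question can be illustrated with a small sketch: a fixed transaction format rejects malformed submissions, but it cannot tell whether well-formed values are true. The schema and field names below are hypothetical and are not taken from the application.

```python
# Hypothetical fixed record format: field name -> required type.
REQUIRED_FIELDS = {"age": int, "heart_rate": int}


def is_well_formed(record: dict) -> bool:
    """Return True if the record has exactly the required fields and types."""
    return record.keys() == REQUIRED_FIELDS.keys() and all(
        type(record[name]) is expected
        for name, expected in REQUIRED_FIELDS.items()
    )


# Malformed input is rejected by the format check.
malformed = {"age": "thirty"}

# Fabricated input passes, because the check looks at form, not truth.
fabricated = {"age": 999, "heart_rate": -5}
```

Here `is_well_formed(malformed)` is false while `is_well_formed(fabricated)` is true, so filtering out implausible or invented values would need a separate validation step.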

Noc2

Thanks for the application. I have a few questions: Are you aware of offchain::ipfs? How are you going to implement OrbitDb or the Data encryption module of your second milestone? Could you provide more details here?
Since your milestone 1 is mostly about Random Forest, could you also provide more information about this? For example: How do you ensure randomness? Is everything calculated on-chain for this (seems to be very computation heavy and it might become really difficult to benchmark this correctly)? Or do you put only a single specific random forest on-chain? From the application, it seems users have the option to update the algorithm. Isn't this like allowing people to upload their own smart contract? How do you want to integrate this?

MikeMS-sys · 2021-06-10T13:39:09Z

@semuelle
That was a mistake in our explanation. The data receiving process for the machine learning algorithms is our way to prevent spam attacks. Transactions have a certain value, and the fee should mostly deter the submission of fake data. We discussed implementing an anti-fraud algorithm and a validation module, but that would cost more resources, so we decided to leave it in the project's future plans for now.

@Noc2
In the 1st milestone we will implement basic random forest regression using the Smartcore lib (https://smartcorelib.org/user_guide/supervised.html) with 100 independent trees, and add other algorithms in the second milestone.
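For readers unfamiliar with the technique, the core idea of a forest of 100 independently trained trees can be sketched without any library. This is a deliberately simplified illustration and not the Smartcore implementation: each "tree" is a one-split stump on a single feature, feature subsampling is omitted, and all function names are hypothetical. The fixed seed also shows one way to make the "random" part reproducible, which matters if every node has to arrive at the same result.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree (a stump) on a single feature."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue
        left_mean, right_mean = mean(left), mean(right)
        # Sum of squared errors if we predict the mean on each side.
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # Every x in this sample is identical: predict the overall mean.
        overall = mean(ys)
        return (float("inf"), overall, overall)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, n_trees=100, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample of the data."""
    rng = random.Random(seed)  # fixed seed: the same forest on every run
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        forest.append(fit_stump([xs[i] for i in sample], [ys[i] for i in sample]))
    return forest


def predict_forest(forest, x):
    """Regression forests average the predictions of the individual trees."""
    return mean(predict_stump(stump, x) for stump in forest)
```

For example, `fit_forest(list(range(10)), [0.0] * 5 + [10.0] * 5)` returns 100 stumps, and `predict_forest` on that forest gives a lower value at `x = 0` than at `x = 9`.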

In the 2nd milestone we plan to integrate orbit-db via the offchain::ipfs pallet to implement a more complex data storage solution in IPFS. The data encryption module was unfortunately included in the application by mistake; we discussed it with Semuelle but forgot to delete it.

Yes, users have the option to update the algorithm and upload their own smart contract to the chain.

On-chain calculations are interesting but certainly involve very heavy computation. We plan to research whether this is feasible in principle, and to analyse the concept of production-ready on-chain calculation, perhaps on side-chains in the future.

Noc2

Thanks for the response. I have a few follow-up questions:

So to be honest, I still don’t fully understand the benefit of putting the algorithm on-chain. If you only care about spam attacks, then there are a lot of other ways to deal with it. Putting the computation and data on-chain means instead of one computer computing and storing everything suddenly a lot of computers need to do it. Which seems highly inefficient.
Your example here is a little bit scary to be honest. Putting personal health data on-chain isn’t something that anyone wants (except maybe insurance companies ;-)) and there are a lot of legal problems to overcome. If you generally want to focus on health data, I recommend focusing first on the encryption/privacy part and later on everything else.
The orbit-db via offchain::ipfs pallet implementation sounds interesting to me. Could you integrate more details about this into the application? This on its own might be interesting for a lot of projects and something we might want to fund.

MikeMS-sys · 2021-06-13T15:47:28Z

It is much more inefficient than off-chain, of course. But it might be helpful for some private chains, which could use it for testing purposes and deploy a ready-made ML blockchain without any extra third-party dependencies. Also, we see potential for the future. I mean Skynet =)

When we initially started with the General Grants Program, we found inconsistencies with the provisions of the European GDPR, in particular with "the right to be forgotten". Among the current solutions, we mainly encountered hypotheses based on smart contracts (on-chain), which imply heavy computation.

It's just the simplest possible prototype to test some basic hypotheses. In the Trusted Health Council (THC) project, data encryption/anonymity is one of the main areas of focus. Nobody except the data owner knows who the owner is.
Yes, we can. Would it be better to create a new application? In this ML pallet we plan to use the database in the simplest way: only CRUD operations with simple requirements. But in the THC project one milestone (and the associated pallet) is about an IPFS-based distributed database, and we can move all related work to a separate pallet.

Noc2 · 2021-06-14T11:50:36Z

Thanks for the quick reply. How about we close this application and you initially apply for an orbit-db pallet or something similar? This might be easier to approve and generally something the grants committee is interested in. It might also help us to get a better understanding of your current work.

MikeMS-sys · 2021-06-16T07:52:12Z

@Noc2 We agree, please close this application.

Noc2 · 2021-06-16T07:56:40Z

Thanks for the update.

Update milestone-deliverables-guidelines.md

Create ML-as-a-service.md

619e984

MikeMS-sys closed this May 3, 2021

MikeMS-sys reopened this May 3, 2021

semuelle self-assigned this May 3, 2021

semuelle added the "changes requested" label May 17, 2021

Update ML-as-a-service.md

1382e15

Update ML-as-a-service.md

bcca610

Update ML-as-a-service.md

8aa64cc

semuelle added the "ready for review" label and removed the "changes requested" label May 31, 2021

Noc2 suggested changes May 31, 2021

Update ML-as-a-service.md

99cc249

Noc2 suggested changes Jun 11, 2021

Noc2 closed this Jun 16, 2021

alxs pushed a commit that referenced this pull request Jul 20, 2021

Merge pull request #399 from w3f/semuelle-patch-1

2ce0fdc

Update milestone-deliverables-guidelines.md

ninabreznik mentioned this pull request Mar 19, 2022

Add datdot_milestone_2 w3f/Grant-Milestone-Delivery#399

Merged

5 tasks
