Create ML-as-a-service.md #399
MikeMS-sys wants to merge 5 commits into w3f:master from uddugteam:master
|
Hi @MikeMS-sys, thank you for your application. We will look into it as soon as possible. If you would be so kind as to review the specification of your deliverables so that every deliverable is measurable and verifiable, that would help us enormously with evaluation later on. Here are some more questions & suggestions based on a quick glance at your document:
|
|
Hi @semuelle, thank you for your suggestions. We’re looking to implement both off-chain and on-chain calculations, so that developers can choose between them when implementing machine learning in Substrate-based projects.
The pallet is suitable for projects where users need the power of a communal neural network while knowing their data is protected. Having worked a lot with blockchain technologies, our team found that both technologies are data-driven, and thus there is rapidly growing interest in integrating them for more secure and efficient data sharing and analysis. We want to realise this idea as the core part of our project in the healthcare sphere, Trusted Health Council. Users there can securely share data via the blockchain and get predictions. The neural network's training improves with each new user's data. Nobody knows who the data owners are, because all the data is anonymised.
Roadmap update
Milestone 1 - Proof of concept
Milestone 2 - Production ready
|
|
Hi @MikeMS-sys. I'm only quickly jumping in to post a link to your last application to the General Grants Program, for reference: w3f/General-Grants-Program#413. Besides, could you please update the application itself? It would also be helpful if you could structure the deliverables tabularly as in the template, and include deliverables. Lastly, I would also add that your deliverables and the application in general should still include far more details. You have barely updated them, whereas they need a complete overhaul. You may treat this as a contract; the level of detail must be enough to later verify that the software meets the specification. You can find some examples of what we're interested in for different grant categories here and have a look at this somewhat related application and its deliverables or any of the ones mentioned in the README for reference. And could you specify what you mean by
Both the data and the algorithms required for ML would be far too resource-intensive to be run on-chain, so what's your thinking behind this? Also, data referenced via e.g. an IPFS hash would be accessed via an off-chain worker and clearly cannot be retrieved on-chain. Could you clarify what you mean?
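To illustrate the on-chain versus off-chain split raised here, a minimal sketch follows. It is not the applicants' design: the dictionary standing in for IPFS, the list standing in for chain state, and all function names are hypothetical, and a plain SHA-256 hex digest is used where real IPFS would use a CID. The point is only that the chain can hold a short fixed-size reference while the bulky data lives elsewhere and is verified on retrieval.

```python
import hashlib

# Hypothetical stand-in for a content-addressed off-chain store such as IPFS.
off_chain_store = {}

# Hypothetical stand-in for on-chain state: it only ever holds short references.
on_chain_refs = []


def put_off_chain(data: bytes) -> str:
    """Store data off-chain, keyed by the SHA-256 digest of its content."""
    digest = hashlib.sha256(data).hexdigest()
    off_chain_store[digest] = data
    return digest


def submit_reference(digest: str) -> None:
    """Record only the digest on-chain, never the data itself."""
    on_chain_refs.append(digest)


def off_chain_worker_fetch(digest: str) -> bytes:
    """Fetch the data off-chain and check that it matches the on-chain digest."""
    data = off_chain_store[digest]
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("content does not match the referenced hash")
    return data


# Illustrative dataset; the column names are made up for this sketch.
dataset = b"age,bp,label\n" + b"54,130,1\n" * 10_000
ref = put_off_chain(dataset)
submit_reference(ref)
fetched = off_chain_worker_fetch(on_chain_refs[0])
print(len(fetched), "bytes off-chain;", len(ref), "hex characters on-chain")
```

Under these assumptions the reference has the same size no matter how large the dataset grows, which is why the data itself has to be read by an off-chain component.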
Can you expand on that? A communal neural network is a model that anyone has access to, or is there more? If I'm worried about my data being protected, wouldn't I just build my own model or download it and run it locally?
But models are usually trained with data that is verified and often selected from a small population slice.
What is encrypted, and where? In the browser before upload? If I used someone else's model, wouldn't I want to have access to the data it was trained with? How do you re-train a model with two separately encrypted datasets? |
|
Hi @MikeMS-sys, thanks for the update. The repo containing the images (example) seems private though. I cannot access them. |
|
Thanks for the update. Do I understand correctly that I have to pass my training data to the node via transaction, which then stores it off-chain? Why? Why don't I store it on IPFS myself and then reference it via hash? That sounds like a massive bottleneck.
Is there anything preventing people from polluting my model with wrong or fake data? |
|
Please provide github URLs for all team members. LinkedIn URLs have no value in demonstrating team member abilities. Afaik, there is never much if any value in doing machine learning on a blockchain. There is no need for a public source of truth since by definition machine learning models extract features from statistical samples. Instead, if one really needs secrecy, either services provide a proprietary obfuscated model directly to users, or users provide their own masked data to services. All this falls into the adversarial machine learning field, which evolves quite quickly these days. It's pretty trivial to obtain a less biased sample population than blockchain users, but if one day blockchains become really widely used then it's plausible one wants cryptography like group/ring VRFs when sampling, but even then if one used blockchain accounts for sampling one never touches the blockchain itself, only proves account existence in zero-knowledge. |
|
@semuelle The data receiving process for the machine learning algorithms is our way to prevent spam attacks or fake data from intruders. Our transactions are based on a specific format, and users can only use this format for data transfer. Future plans: implement a validation module. I certainly agree with you, @burdges, but nevertheless I am sure that this idea has its place, especially for supported private projects or unique solutions.
Team website: https://uddug.com
Andrew Skurlatov (technical lead)
Nikita Velko (senior frontend developer)
Ivan Podsebnev (devops engineer)
Constantine Czerniak (data scientist)
How this helps with spam I understand, but fake data? And who or what are intruders? |
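The distinction behind this question can be sketched in a few lines. The record format below (field names and types) is purely hypothetical and is not taken from the application; the sketch only shows that a format check rejects malformed submissions while a well-formed but fabricated record passes unchanged.

```python
# Hypothetical record format; field names and types are illustrative assumptions.
REQUIRED_FIELDS = {"age": int, "systolic_bp": int, "label": int}


def is_well_formed(record: dict) -> bool:
    """Return True if the record has exactly the required fields with the right types."""
    if set(record) != set(REQUIRED_FIELDS):
        return False
    return all(
        type(record[name]) is expected
        for name, expected in REQUIRED_FIELDS.items()
    )


malformed = {"age": "fifty", "label": 1}                   # wrong type, missing field
honest = {"age": 54, "systolic_bp": 130, "label": 1}
fabricated = {"age": 54, "systolic_bp": 130, "label": 0}   # same shape, label flipped

for name, record in [("malformed", malformed), ("honest", honest), ("fabricated", fabricated)]:
    print(name, is_well_formed(record))
```

A format check alone cannot tell `honest` from `fabricated`; separating them would need some source of ground truth outside the transaction format.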
Noc2 left a comment
Thanks for the application. I have a few questions: Are you aware of offchain::ipfs? How are you going to implement OrbitDb or the Data encryption module of your second milestone? Could you provide more details here?
Since your milestone 1 is mostly about Random Forest, could you also provide more information about this? For example: How do you ensure randomness? Is everything calculated on-chain for this (seems to be very computation heavy and it might become really difficult to benchmark this correctly)? Or do you put only a single specific random forest on-chain? From the application, it seems users have the option to update the algorithm. Isn't this like allowing people to upload their own smart contract? How do you want to integrate this?
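One way to read the randomness question: if a random forest is built on-chain, every node has to draw exactly the same bootstrap samples, so the randomness has to come from a shared seed. The sketch below assumes such a seed is available (how a chain would supply it is not covered in the application) and uses Python's `random` module purely for illustration; the function name and parameters are hypothetical.

```python
import random


def bootstrap_indices(seed: int, n_samples: int, n_trees: int) -> list[list[int]]:
    """Draw one bootstrap sample (indices with replacement) per tree from a shared seed."""
    rng = random.Random(seed)  # local generator, so no global state is involved
    return [
        [rng.randrange(n_samples) for _ in range(n_samples)]
        for _ in range(n_trees)
    ]


# Two hypothetical nodes using the same agreed seed draw identical samples.
node_a = bootstrap_indices(seed=42, n_samples=8, n_trees=3)
node_b = bootstrap_indices(seed=42, n_samples=8, n_trees=3)
print(node_a == node_b)
```

With the same seed, both nodes would train their trees on identical subsets, which is the property a deterministic runtime needs. A different seed would in general produce different samples and therefore a different forest.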
|
@semuelle @Noc2 In the 2nd milestone, we are planning to integrate orbit-db via the offchain::ipfs pallet to implement a complex data storage solution in IPFS. The data encryption module was unfortunately included in the application by mistake; we discussed it with Semuelle but forgot to delete it. Yes, users have the option to update the algorithm and upload their own smart contract to the chain. On-chain calculations are interesting but do promise very heavy computation. We plan to research whether this is expedient in principle, and to analyse the concept of production-ready on-chain calculation, maybe on some side-chains in the future.
Noc2 left a comment
Thanks for the response. I have a few follow-up questions:
- So to be honest, I still don’t fully understand the benefit of putting the algorithm on-chain. If you only care about spam attacks, then there are a lot of other ways to deal with it. Putting the computation and data on-chain means instead of one computer computing and storing everything suddenly a lot of computers need to do it. Which seems highly inefficient.
- Your example here is a little bit scary, to be honest. Putting personal health data on-chain isn’t something that anyone wants (except maybe insurance companies ;-)) and there are a lot of legal problems to overcome. If you generally want to focus on health data, I recommend focusing first on the encryption/privacy part and later on everything else.
- The orbit-db via offchain::ipfs pallet implementation sounds interesting to me. Could you integrate more details about this into the application? This on its own might be interesting for a lot of projects and something we might want to fund.
When we initially started with the General Grants Program, we found inconsistencies with the provisions of the European GDPR, in particular with "the right to be forgotten". Among the current solutions, we mainly encountered hypotheses based on smart contracts (on-chain), which promise heavy computation.
|
|
Thanks for the quick reply. How about we close this application and you initially apply for an orbit-db pallet or something similar? This might be easier to approve and generally something the grants committee is interested in. It might also help us to get a better understanding of your current work. |
|
@Noc2 We agree, please close this application. |
|
Thanks for the update. |