License for Copilot generated code #60304
As an org admin, I've received requests to buy into Copilot. We mostly produce open source code that needs to be released under a specific license, which is not an issue as long as we write new code or reuse projects with a clear license. I'm a bit puzzled about what license is acceptable for the code that Copilot produces since, as I understand it, I can't see where it originally comes from. GitHub says the model was trained on available code, but not whether that code was under a permissive or a copyleft license. I assume I could easily end up in a grey zone. In practice, I couldn't take code licensed under GPL, AGPL, LGPL, EPL, or MPL and release it under Apache, MIT, or BSD (or even a proprietary license) if the project required that. That would be a license violation, and it may happen here if such code was used to train the model. Is this indeed a risk, or can I rely on GitHub and choose my preferred license for a project that includes Copilot-generated code?
Replies: 3 comments 1 reply
GitHub Copilot is powered by OpenAI Codex. It has been trained on natural language text and source code from publicly available sources, including code in public repositories on GitHub. It uses various elements of the context of your project to provide suggestions, including file content both in the file you are editing and in neighboring or related files within the project. It may also include the URLs of repositories or file paths to identify relevant context. The comments and code, along with this context, are then used to synthesize and suggest individual lines and whole functions.

Training
GitHub Copilot's suggestions are all generated through AI. GitHub Copilot generates new code in a probabilistic way, and the probability that it produces the same code as a snippet that occurred in training is low. The models do not contain a database of code, and they do not "look up" snippets. Our latest internal research shows that about 1% of the time, a suggestion may contain code snippets longer than ~150 characters that match the training set. Previous research showed that many of these cases happen when GitHub Copilot is unable to glean sufficient context from the code you are writing, or when there is a common, perhaps even universal, solution to the problem.

Filter Public Code
To help you filter suggestions matching public code, we built a filter that detects and suppresses GitHub Copilot suggestions containing code that matches public code on GitHub. GitHub Copilot for Individual users can choose to enable that filter during setup on their individual accounts. For GitHub Copilot for Business users, the Enterprise administrator controls how the filter is applied. They can control suggestions for all organizations or defer control to individual organization administrators. These organization administrators can turn the filter on or off during setup (assuming their Enterprise administrator has deferred control) for the users in their organization.

With the filter enabled, GitHub Copilot checks code suggestions, together with their surrounding code, for matches or near matches (ignoring whitespace) against public code on GitHub of about 150 characters. If there is a match, the suggestion will not be shown to the user. In addition, we have announced that we are building a feature that will provide a reference for suggestions that resemble public code on GitHub, so that you can make a more informed decision about whether or not to use that code, as well as explore and learn how that code is used in other projects.

Measures to keep in mind
Just as with any code you write that uses material you did not independently originate, you should take precautions to understand how it works and to ensure its suitability. These include rigorous testing, IP scanning, and checking for security vulnerabilities. You should make sure your IDE or editor does not automatically compile or run generated code before you review it.

Hope that helps!
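As a rough illustration of the whitespace-insensitive matching described in the reply above, here is a minimal Python sketch. It is not GitHub's implementation, which is not described in this thread. The function names (`normalize`, `build_index`, `matches_public_code`), the choice to strip all whitespace, the fixed 150-character window measured after stripping, and the restriction to exact matches (no "near matches") are all assumptions made for illustration.

```python
import re

# Assumption: the "about 150 characters" from the reply, counted after
# whitespace has been removed.
WINDOW = 150


def normalize(code: str) -> str:
    """Remove all whitespace so formatting differences do not affect matching."""
    return re.sub(r"\s+", "", code)


def build_index(public_snippets, window=WINDOW):
    """Collect every window-length slice of each whitespace-stripped snippet."""
    index = set()
    for snippet in public_snippets:
        text = normalize(snippet)
        for i in range(len(text) - window + 1):
            index.add(text[i:i + window])
    return index


def matches_public_code(suggestion, index, window=WINDOW):
    """Return True if any window-length slice of the suggestion is in the index."""
    text = normalize(suggestion)
    return any(
        text[i:i + window] in index
        for i in range(len(text) - window + 1)
    )


if __name__ == "__main__":
    # Hypothetical "public" code and a suggestion that differs only in whitespace.
    public = "def add(a, b):\n    return a + b\n" * 10
    index = build_index([public])
    suggestion = "def add(a,b):\n\treturn a+b\n" * 10
    print(matches_public_code(suggestion, index))
```

In this sketch a suggestion shorter than the window can never match, which mirrors the idea that only longer overlaps are suppressed. It says nothing about the license of the matched code, so it does not answer the licensing question by itself.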
Long answer, but I very much appreciate it, thank you! So in summary I have two options. I can turn on the (already existing) filter, which avoids copying public code; I assume the price to pay is less output from Copilot. But would I then be free to put the result (looking only at the Copilot-generated code) under any license I wish, or are there limits? Or I can use the (yet to be built) feature that should let me see the origin of the Copilot-generated code, so that ideally I can see its license just as if I had copied it from another project. That's probably the way to go, and it would be a super cool feature if you could add a license tracker that automatically shows the remaining license options during use and keeps users from infringing.