Duo Context Exclusion
## Background

As we expand the context and content available to Duo, we need to provide customer controls for excluding sensitive files/content from Duo features and supporting models. Customers may have sensitive files that should not be processed by or input to LLMs and embeddings models.

**References**

* [Additional background and architecture proposal](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ai_context_management/)
* [Additional MR discussion](https://gitlab.com/gitlab-org/editor-extensions/gitlab-lsp/-/merge_requests/837)
* [Related feature request](https://gitlab.com/gitlab-org/editor-extensions/gitlab-jetbrains-plugin/-/issues/839)

## Main goal

Allow customers to enforce their security/privacy policy by controlling the content that is used within Duo. This supports messaging that assures customers that excluded files/context are not processed by any Duo LLM or supporting model. This provides a very strong data privacy and data retention position:

* Each customer can preclude content from ever being processed by Duo
* For content that is processed by Duo, we maintain zero-day data retention

## MVC Proposal

### **Functional summary**

* Files are available for AI context by default, unless otherwise specified.
* At the project level, an administrator can:
  * Configure paths to exclude from AI context
    * This could include a specific file, a directory, a file extension, etc.
  * Configure paths to include for AI context
    * e.g. Exclude a folder, but include 2 specific files in that folder.
    * This could include a specific file, a directory, a file extension, etc.
* All files are excluded when a project has [Duo turned off](https://docs.gitlab.com/ee/user/gitlab_duo/turn_on_off.html).
* The exclusion policy should also be enforced for GitLab Duo with Amazon Q, with the same behavior as GitLab Duo.
* The exclusion policy should be enforced at the customer level (i.e. instance or top-level namespace) rather than the user level.
  * As a potential example, content should be uniformly excluded even if a customer has a mix of Duo Pro/Enterprise users and Duo Core users.
* \[Nice to have but not a strict MVC requirement\] We also exclude any paths specified in `.gitignore`
* \[Nice to have but not a strict MVC requirement\] Updating files when the exclusion configuration is updated:
  * If files are embedded/stored, and Duo is turned off for the project, then we should remove the files from the Duo data store.
  * If files are embedded/stored, and those files are added to the exclude configuration, then we should remove the files from the Duo data store.
  * The removal doesn't need to be instantaneous, but we should aim for no more than 30 minutes to apply the change.

### **Excluded files behavior**

* Content from excluded files is not sent to an LLM or embeddings model.
* Duo Chat is not supported for excluded files.
* Code Suggestions are not supported within excluded files.
* Content in excluded files won't be used to inform code completion suggestions in other files.
  * This includes both open tabs context and imports context.
* Content from excluded files is not embedded or stored.
* Generally, no Duo feature should use content from excluded files.
* Edge case: If Duo is enabled but all or most files are excluded, Duo will be ineffective. No specific requirement here, but we could consider in-product messaging if this is common.
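The functional summary and excluded-files behavior above can be sketched as a small policy check. This is only an illustration under stated assumptions: the proposal does not define a pattern syntax or a data model, so the `ContextPolicy` class, its field names, the trailing-`/` directory convention, and the use of shell-style globs are all hypothetical. It shows the three rules the proposal does state: files are available by default, an include path can carve an exception out of an excluded folder, and everything is excluded when Duo is turned off for the project.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ContextPolicy:
    """Hypothetical project-level policy; names are illustrative, not a GitLab API."""
    duo_enabled: bool = True
    exclude: tuple[str, ...] = ()
    include: tuple[str, ...] = ()


def _matches(path: str, pattern: str) -> bool:
    # Assumed syntax: a trailing "/" means "everything under this directory";
    # anything else is an exact path or a shell-style glob such as "*.pem".
    if pattern.endswith("/"):
        return path.startswith(pattern)
    return fnmatchcase(path, pattern)


def is_excluded(path: str, policy: ContextPolicy) -> bool:
    # Duo turned off for the project: every file is excluded.
    if not policy.duo_enabled:
        return True
    # Include paths act as exceptions to the exclude paths.
    if any(_matches(path, p) for p in policy.include):
        return False
    # Default: available for AI context unless an exclude path matches.
    return any(_matches(path, p) for p in policy.exclude)


def filter_context(paths: list[str], policy: ContextPolicy) -> list[str]:
    """Drop excluded files from a list of context candidates (e.g. open tabs)."""
    return [p for p in paths if not is_excluded(p, policy)]


policy = ContextPolicy(
    exclude=("secrets/", "*.pem"),
    include=("secrets/README.md",),
)
candidates = ["src/app.py", "secrets/db.yml", "secrets/README.md", "deploy/tls.pem"]
print(filter_context(candidates, policy))
# ['src/app.py', 'secrets/README.md']
```

Giving include paths precedence over exclude paths is one way to support "exclude a folder, but include 2 specific files in that folder"; the actual precedence rules would need to be settled during implementation.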
**Full list of features that should enforce content exclusion policy**

<table>
<tr>
<th>Feature category</th>
<th>Feature</th>
<th>Unit primitive</th>
</tr>
<tr>
<td>Code Suggestions</td>
<td>

[Code generation](https://docs.gitlab.com/user/project/repository/code_suggestions/#code-completion-and-generation)

</td>
<td>generate_code</td>
</tr>
<tr>
<td>Code Suggestions</td>
<td>

[Code completion](https://docs.gitlab.com/user/project/repository/code_suggestions/#code-completion-and-generation)

</td>
<td>complete_code</td>
</tr>
<tr>
<td>Chat</td>
<td>

[/include file](https://docs.gitlab.com/user/gitlab_duo_chat/examples/#ask-about-specific-files-in-the-ide)

</td>
<td>include_file_context</td>
</tr>
<tr>
<td>Chat</td>
<td>/include merge request</td>
<td>include_merge_request_context</td>
</tr>
<tr>
<td>Chat</td>
<td>/include directory</td>
<td>include_directory_context</td>
</tr>
<tr>
<td>Chat</td>
<td>/include repository</td>
<td>include_repository_context</td>
</tr>
<tr>
<td>Chat</td>
<td>

[/fix](https://docs.gitlab.com/user/gitlab_duo_chat/examples/#fix-code-in-the-ide)

</td>
<td>fix_code</td>
</tr>
<tr>
<td>Chat</td>
<td>

[/refactor](https://docs.gitlab.com/user/gitlab_duo_chat/examples/#refactor-code-in-the-ide)

</td>
<td>refactor_code</td>
</tr>
<tr>
<td>Chat</td>
<td>

[/test](https://docs.gitlab.com/user/gitlab_duo_chat/examples/#write-tests-in-the-ide)

</td>
<td>write_tests</td>
</tr>
<tr>
<td>Chat</td>
<td>

[/explain](https://docs.gitlab.com/user/gitlab_duo_chat/examples/#explain-selected-code)

</td>
<td>

explain_code

include_terminal_context

</td>
</tr>
<tr>
<td>Chat</td>
<td>

[Ask about file](https://docs.gitlab.com/user/gitlab_duo_chat/#in-the-gitlab-ui)

</td>
<td>n/a</td>
</tr>
<tr>
<td>Chat</td>
<td>

[Ask about merge request](https://docs.gitlab.com/user/gitlab_duo_chat/examples/#ask-about-a-specific-merge-request)

</td>
<td>ask_merge_request</td>
</tr>
<tr>
<td>Duo Workflow</td>
<td>

[Duo Workflow](https://docs.gitlab.com/user/duo_workflow/)

</td>
<td>duo_workflow_execute_workflow</td>
</tr>
<tr>
<td>Duo Code Review</td>
<td>

[Duo Code Review](https://docs.gitlab.com/user/project/merge_requests/duo_in_merge_requests/#have-gitlab-duo-review-your-code)

</td>
<td>review_merge_request</td>
</tr>
<tr>
<td>Sec. Vulnerability</td>
<td>

[Vulnerability resolution](https://docs.gitlab.com/user/application_security/vulnerabilities/#vulnerability-resolution)

</td>
<td>resolve_vulnerability</td>
</tr>
<tr>
<td>Summarization</td>
<td>

[Generate merge commit message](https://docs.gitlab.com/user/project/merge_requests/duo_in_merge_requests/#generate-a-merge-commit-message)

</td>
<td>generate_commit_message</td>
</tr>
<tr>
<td>Summarization</td>
<td>

[Generate merge request description](https://docs.gitlab.com/user/project/merge_requests/duo_in_merge_requests/#generate-a-description-by-summarizing-code-changes)

</td>
<td>summarize_new_merge_request</td>
</tr>
<tr>
<td>Tools</td>
<td>Embeddings for codebase semantic search</td>
<td>generate_embeddings_codebase</td>
</tr>
<tr>
<td>Tools</td>
<td>Codebase semantic search</td>
<td>codebase_search</td>
</tr>
</table>

**Proposed UX treatments**

* The IDE should display the disabled Tanuki icon when the open and active file is excluded.
* If the user submits a Chat prompt for an excluded file, Chat should respond: `Duo does not have access to this file due to an active content exclusion policy.`
  * This could include `/fix`, `/refactor`, `/explain`, and `/test`.
  * This could include a prompt such as "summarize this file".
* Excluded files are displayed but disabled within the `/include` selection menu, with an info icon to communicate the file status.
  * Info icon hover text: `Duo does not have access to this file due to an active content exclusion policy.`
* These features return an exclusion message within their response when one or more relevant files were excluded. The message should be `Duo could not access these files due to an active content exclusion policy: filename1.ext filename2.ext ...`
  * Duo Code Review
  * Duo Workflow
  * Vulnerability resolution
  * Ask about a merge request
  * Generate a merge commit message
  * Generate merge request description

**Edge case**

* We can't reasonably stop a user from copying and pasting the entire contents of a restricted file into Chat.
  * e.g. Open file, copy all code, paste into Chat along with question/task

### Tier availability and deployment options

**Supported Duo add-ons**

* Duo Core :x:
* Duo Pro :white_check_mark:
* Duo Enterprise :white_check_mark:

**Supported deployment options**

* .com :white_check_mark:
* Dedicated :white_check_mark:
* Self-managed :white_check_mark:
* Self-hosted models :white_check_mark:

### **Telemetry**

* We can measure the number of customers using a non-default AI context policy
* We can measure the number of projects using a non-default AI context policy

### **Potential future iterations**

* Automated validation of correct policy configuration
* Manage policy at group level

## For discussion

Proposing that we use a UI-based settings affordance rather than an ignore file stored in each repository. This is more consistent with our current direction for [custom rules management](https://gitlab.com/groups/gitlab-org/-/epics/17685), and I prefer that we are consistent in the interaction patterns when possible.

We can discuss this if there are advantages to storing a context policy file in each repository, rather than a UI interaction.

A helpful comparison of file-based vs UI-based pros and cons with respect to rules: https://gitlab.com/groups/gitlab-org/-/epics/17685#note_2493440891

## Metrics

The metrics focus on adoption and on measuring a shift in projects moving from Duo-disabled to Duo-enabled with some files excluded. We believe that there will be fewer projects with Duo turned completely off and more projects where specific files or file extensions are excluded.
As a prerequisite to rollout, we can baseline the number of projects where Duo is supported but disabled.

**Adoption**

* % of customers using AI context inclusion/exclusion
* % of projects using AI context inclusion/exclusion

**Behavior change**

* Reduced % of Duo-disabled projects

## Appendix

There is a prior [AI Context management proposal](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/ai_context_management/#suggested-iterative-implementation-plan) that may be a useful reference to inform the implementation.