Private issues #2

Open
opened 2024-07-21 16:16:27 +02:00 by fnetX · 30 comments
Owner

Warning

Please read this section of the README before interacting in the issue tracker.
We might remove comments that don't fit here. This is unfortunately necessary, because respect for design work is not yet common in Free/Libre Software projects.

Content

I'm starting work for a feature that allows private reporting of issues in Forgejo (mainly to demonstrate the workflow I have in mind), starting with forgejo/forgejo#142

> **Warning** > Please read [this section of the README](https://codeberg.org/forgejo/design#what-you-can-do-here) before interacting in the issue tracker. > We might remove comments that don't fit here. This is unfortunately necessary, because respect for design work is not yet common in Free/Libre Software projects. ### Content I'm starting work for a feature that allows private reporting of issues in Forgejo (mainly to demonstrate the workflow I have in mind), starting with https://codeberg.org/forgejo/forgejo/issues/142
Author
Owner
work has started here: https://codeberg.org/forgejo/design/src/branch/private-issues
Author
Owner

@algernon and / or @gusted, can I have your input on potential solutions for a sane separation of possible issues?

I was wondering if labels and issue templates could be (ab-)used for the job.

Issue templates already allow categorizing issue types and assigning labels.
If we can take this one step further, what about building workflows like this:

  • the label "security/new" is only visible to repo owners and "security" team
  • the label "user-support/new" is only visible to "contributors"
  • there are issue templates "security issue" and "ask a question" which assign the labels
  • you can configure issue creation for certain issue templates via email, e.g. repo+security@example.comm, repo+support@example.com
  • people with access to the issues can "publish" them by modifying the labels, e.g. to "security/published" or "user-support/help".

It sounds like a rather flexible solution where many building blogs are already present.

@algernon and / or @gusted, can I have your input on potential solutions for a sane separation of possible issues? I was wondering if labels and issue templates could be (ab-)used for the job. Issue templates already allow categorizing issue types and assigning labels. If we can take this one step further, what about building workflows like this: - the label "security/new" is only visible to repo owners and "security" team - the label "user-support/new" is only visible to "contributors" - there are issue templates "security issue" and "ask a question" which assign the labels - you can configure issue creation for certain issue templates via email, e.g. repo+security@example.comm, repo+support@example.com - people with access to the issues can "publish" them by modifying the labels, e.g. to "security/published" or "user-support/help". It sounds like a rather flexible solution where many building blogs are already present.
Owner

can I have your input on potential solutions for a sane separation of possible issues?

New database table, I've had numerous occasions since https://github.com/go-gitea/gitea/pull/17711 (please take no inspiration from the actual code, it's horrible) to pounder on some mechanism that would make me feel confident to have such feature (as well as seeing code I would've missed changing in such PR and would've added ten new CVEs to Forgejo), but if anything re-using the issue table would be a huge mistake, because it was never made with in mind that some issues need maximal protection for leakage, also it would require mental notes during PR reviews to ensure if someone do SQL with the issue table to ensure they exclude private issues.

Doing so means that for every time you want to have private issues integrated in some way or form (I assume you would want to see this integrated into the issue tab?) you have to take special care to integrate private issues, thus you do it explicitly which makes reviewing that the code is safe and sound way easier.

It's a boring solution, which is the best solution.

> can I have your input on potential solutions for a sane separation of possible issues? New database table, I've had numerous occasions since https://github.com/go-gitea/gitea/pull/17711 (please take no inspiration from the actual code, it's horrible) to pounder on some mechanism that would make me feel confident to have such feature (as well as seeing code I would've missed changing in such PR and would've added ten new CVEs to Forgejo), but if anything re-using the `issue` table would be a huge mistake, because it was never made with in mind that some issues need maximal protection for leakage, also it would require mental notes during PR reviews to ensure if someone do SQL with the issue table to ensure they exclude private issues. Doing so means that for every time you want to have private issues integrated in some way or form (I assume you would want to see this integrated into the issue tab?) you have to take special care to integrate private issues, thus you do it explicitly which makes reviewing that the code is safe and sound way easier. It's a boring solution, which is the best solution.
Author
Owner

I see. Thank you for the input. This makes a lot of sense.

I see. Thank you for the input. This makes a lot of sense.
fnetX added reference private-issues 2024-07-23 02:49:07 +02:00

Hi, is there any process on confidential issues? I am currently looking for a workaround for that.

Currently, if a user posted an issue on a private repo before removed access by admin, he will not be able to see the issue posted by him.

If we can make issue poster to view their issue regardless of the permission settings of the repo, we can achieve this functionality. Just create a private shadow repo of a public repo, and use api to post an issue in the shadow repo on behalf of the user, they will only be able to see the issue and not anything else. Also they can find their created issues in the dashboard, which seems to be a solution without the need to modify database/indexer schema.

Hi, is there any process on confidential issues? I am currently looking for a workaround for that. Currently, if a user posted an issue on a private repo before removed access by admin, he will not be able to see the issue posted by him. If we can make issue poster to view their issue regardless of the permission settings of the repo, we can achieve this functionality. Just create a private *shadow* repo of a public repo, and use api to post an issue in the shadow repo on behalf of the user, they will only be able to see the issue and not anything else. Also they can find their created issues in the dashboard, which seems to be a solution without the need to modify database/indexer schema.
Author
Owner

That sounds like a rather complex workaround to me with questionable gain. For example, if someone is part of a team and creates an issue, but is then removed from the team, it would likely be unexpected for others that they still have access to some issues and ongoing (private) conversation because they created the issue.

However, I have an idea that might allow to provide a usable alternative without the complexity of a full-fledged private issue implementation:

  • allow moving issues between repositories, which is an often-requested feature anyway and seems to have a rather high usability
  • allow submission-only issues (allow a special permission to only create issues, but never view their content)

This combination would allow to cover at least some use cases like secure reporting of issues:

  • there is a private security repo
  • users can report issues there, but don't view the content
  • once the issue is resolved, it could be moved to the public repo easily for a transparent archive

It would not address the use case of having individual permissions per issue, though.

That sounds like a rather complex workaround to me with questionable gain. For example, if someone is part of a team and creates an issue, but is then removed from the team, it would likely be unexpected for others that they still have access to some issues and ongoing (private) conversation because they created the issue. However, I have an idea that might allow to provide a usable alternative without the complexity of a full-fledged private issue implementation: - allow moving issues between repositories, which is an often-requested feature anyway and seems to have a rather high usability - allow submission-only issues (allow a special permission to only create issues, but never view their content) This combination would allow to cover at least some use cases like secure reporting of issues: - there is a private security repo - users can report issues there, but don't view the content - once the issue is resolved, it could be moved to the public repo easily for a transparent archive It would not address the use case of having individual permissions per issue, though.

Just to chime in: excited that this is already being worked on! 😍 It's the current only thing stopping us from moving the remaining repos of our org over from GitLab…

For example, if someone is part of a team and creates an issue, but is then removed from the team

They'll still have access to them, yes (as to all issues they've opened). That's how GitLab deals with that as well.

If there are concerns about that (and I understand there might be cases concern even should be there), maybe a "disown" (aka "ghosting") might be a viable approach (of course only available to repo owners and those having be granted the privilege to do so)? This approach would also cause less complication if a "mixed" repo (containing both, confidential and public issues) is migrated from e.g. GitLab.

Moving issues between repos is of course also something welcome, even to other use-cases (e.g. issues filed in the wrong repo of an organization).

Just to chime in: excited that this is already being worked on! 😍 It's the current only thing stopping us from moving the remaining repos of our org over from GitLab… > For example, if someone is part of a team and creates an issue, but is then removed from the team They'll still have access to them, yes (as to all issues they've opened). That's how GitLab deals with that as well. If there are concerns about that (and I understand there might be cases concern even *should* be there), maybe a "disown" (aka "ghosting") might be a viable approach (of course only available to repo owners and those having be granted the privilege to do so)? This approach would also cause less complication if a "mixed" repo (containing both, confidential **and** public issues) is migrated from e.g. GitLab. Moving issues between repos is of course also something welcome, even to other use-cases (e.g. issues filed in the wrong repo of an organization).

@Gusted wrote in #2 (comment):

New database table, I've had numerous occasions since https://github.com/go-gitea/gitea/pull/17711 (please take no inspiration from the actual code, it's horrible) to pounder on some mechanism that would make me feel confident to have such feature (as well as seeing code I would've missed changing in such PR and would've added ten new CVEs to Forgejo), but if anything re-using the issue table would be a huge mistake, because it was never made with in mind that some issues need maximal protection for leakage, also it would require mental notes during PR reviews to ensure if someone do SQL with the issue table to ensure they exclude private issues.

Doing so means that for every time you want to have private issues integrated in some way or form (I assume you would want to see this integrated into the issue tab?) you have to take special care to integrate private issues, thus you do it explicitly which makes reviewing that the code is safe and sound way easier.

It's a boring solution, which is the best solution.

A little query about this one, (with the seperate DB table) do you envisage private issues being a separate class altogether, i.e. issues, PRs, and Private issues? with a separate UI?

Or they act like issues in every way in the UI, but in a separate DB table on the back-end? And if this case, the issue jumps between the tables if it is marked or unmarked as private?

@Gusted wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-2105623: > New database table, I've had numerous occasions since https://github.com/go-gitea/gitea/pull/17711 (please take no inspiration from the actual code, it's horrible) to pounder on some mechanism that would make me feel confident to have such feature (as well as seeing code I would've missed changing in such PR and would've added ten new CVEs to Forgejo), but if anything re-using the `issue` table would be a huge mistake, because it was never made with in mind that some issues need maximal protection for leakage, also it would require mental notes during PR reviews to ensure if someone do SQL with the issue table to ensure they exclude private issues. > > Doing so means that for every time you want to have private issues integrated in some way or form (I assume you would want to see this integrated into the issue tab?) you have to take special care to integrate private issues, thus you do it explicitly which makes reviewing that the code is safe and sound way easier. > > It's a boring solution, which is the best solution. A little query about this one, (with the seperate DB table) do you envisage private issues being a separate class altogether, i.e. issues, PRs, and Private issues? with a separate UI? Or they act like issues in every way in the UI, but in a separate DB table on the back-end? And if this case, the issue jumps between the tables if it is marked or unmarked as private?
Owner

@ryanlerch wrote in #2 (comment):

A little query about this one, (with the seperate DB table) do you envisage private issues being a separate class altogether, i.e. issues, PRs, and Private issues? with a separate UI?

I have no opinion on that, either could work. Separate DB tables is purely from a security standpoint to ensure whatever is designed is secure-by-design and easy to review that it doesn't leak private issues.

@ryanlerch wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-3008733: > A little query about this one, (with the seperate DB table) do you envisage private issues being a separate class altogether, i.e. issues, PRs, and Private issues? with a separate UI? I have no opinion on that, either could work. Separate DB tables is purely from a security standpoint to ensure whatever is designed is secure-by-design and easy to review that it doesn't leak private issues.
Author
Owner

I think they should try to behave like normal issues as much as possible. However, if it gets easier to implement if certain things are changed, this might also be fine.

Private pull requests are probably also out of scope, so all pull-request specific code would not have to work with private issues at first.

I think they should try to behave like normal issues as much as possible. However, if it gets easier to implement if certain things are changed, this might also be fine. Private pull requests are probably also out of scope, so all pull-request specific code would not have to work with private issues at first.

Hello! Quick introduction, I’m working on private issues on the Fedora end, concentrating at first on the use case where registered users report a private issue via the tracker. I’m new to the code of Forgejo obviously and still have to get my swimming feet in a golang/nodejs environment – I’m coming from Python and while I’m not new to compiled, closer-to-the-machine languages, it’s also been a while and C is a different beast. Because of this background, don’t hesitate to point out if I make what looks like rookie assumptions or mistakes.

I hope this is the right place to discuss implementation, but I’m happy to move it somewhere else.

Anyway, I’ve tried to get my bearings on the codebase and looked at the various discussion threads to get an idea how to approach this. My attempts so far started with a copy of the Issue type and an interface to allow functions to deal with either, see my private issues branch for how far I got 😉. Re-reading what @Gusted wrote in #2 (comment), I’m a bit puzzled because my impression was that xorm maps one class/type to one table in the DB, which is why I derived PrivateIssue from Issue as a separate type. I think I’d prefer this assumption of mine to be wrong, having one type cater for both would let me get rid of much of the wrapper and boiler plate code in my first shot (I guess). If someone could clear this up to me, that’d be great!

Hello! Quick introduction, I’m working on private issues on the Fedora end, concentrating at first on the use case where registered users report a private issue via the tracker. I’m new to the code of Forgejo obviously and still have to get my swimming feet in a golang/nodejs environment – I’m coming from Python and while I’m not new to compiled, closer-to-the-machine languages, it’s also been a while and C is a different beast. Because of this background, don’t hesitate to point out if I make what looks like rookie assumptions or mistakes. I hope this is the right place to discuss implementation, but I’m happy to move it somewhere else. Anyway, I’ve tried to get my bearings on the codebase and looked at the various discussion threads to get an idea how to approach this. My attempts so far started with a copy of the `Issue` type and an interface to allow functions to deal with either, see [my private issues branch](https://codeberg.org/nilsph/forgejo/src/branch/forgejo--private-issues) for how far I got 😉. Re-reading what @Gusted wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-3011908, I’m a bit puzzled because my impression was that xorm maps one class/type to one table in the DB, which is why I derived `PrivateIssue` from `Issue` as a separate type. I think I’d prefer this assumption of mine to be wrong, having one type cater for both would let me get rid of much of the wrapper and boiler plate code in my first shot (I guess). If someone could clear this up to me, that’d be great!
Author
Owner

@nilsph Yes, you have come to the right place. Given the code around this area is rather sensitive and involved, it might make sense for you to also follow our general suggestion for new contributors to start fixing bugs. Maybe you can find some things that allow you to get familiar to the current design of the issue tracker.

Have you considered the workarounds I suggested in #2 (comment) for the needs of Fedora? I know that they are not perfect, but I think they are somewhat simpler to implement. And moving issues is a feature with high gain anyway.

@nilsph Yes, you have come to the right place. Given the code around this area is rather sensitive and involved, it might make sense for you to also follow our [general suggestion for new contributors to start fixing bugs](https://codeberg.org/forgejo/discussions/issues/337#). Maybe you can find some things that allow you to get familiar to the current design of the issue tracker. Have you considered the workarounds I suggested in https://codeberg.org/forgejo/design/issues/2#issuecomment-2515772 for the needs of Fedora? I know that they are not perfect, but I think they are somewhat simpler to implement. And moving issues is a feature with high gain anyway.

@fnetX wrote in #2 (comment):

@nilsph Yes, you have come to the right place. Given the code around this area is rather sensitive and involved, it might make sense for you to also follow our general suggestion for new contributors to start fixing bugs. Maybe you can find some things that allow you to get familiar to the current design of the issue tracker.

This good advice. I’ll take a look at the “good first issues” and see if I can find something suitable in the area of issue tracking to get my feet wet.

Have you considered the workarounds I suggested in #2 (comment) for the needs of Fedora? I know that they are not perfect, but I think they are somewhat simpler to implement. And moving issues is a feature with high gain anyway.

I’ve looked at them and we’ve also discussed this and other workarounds in the Fedora team dealing with Forgejo deployment. To me, the big downside of “separate repo + submission-only issues” is that it precludes interaction with reporters – and we do that regularly in such tickets (I just checked). Also, I feel I have a better idea of what needs to be done about private issues rather than moving issues between repositories.

@fnetX wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-7874642: > @nilsph Yes, you have come to the right place. Given the code around this area is rather sensitive and involved, it might make sense for you to also follow our [general suggestion for new contributors to start fixing bugs](https://codeberg.org/forgejo/discussions/issues/337). Maybe you can find some things that allow you to get familiar to the current design of the issue tracker. This good advice. I’ll take a look at the “good first issues” and see if I can find something suitable in the area of issue tracking to get my feet wet. > Have you considered the workarounds I suggested in #2 (comment) for the needs of Fedora? I know that they are not perfect, but I think they are somewhat simpler to implement. And moving issues is a feature with high gain anyway. I’ve looked at them and we’ve also discussed this and other workarounds in the Fedora team dealing with Forgejo deployment. To me, the big downside of “separate repo + submission-only issues” is that it precludes interaction with reporters – and we do that regularly in such tickets (I just checked). Also, I feel I have a better idea of what needs to be done about private issues rather than moving issues between repositories.

@nilsph wrote in #2 (comment):

Have you considered the workarounds I suggested in #2 (comment) for the needs of Fedora? I know that they are not perfect, but I think they are somewhat simpler to implement. And moving issues is a feature with high gain anyway.

I’ve looked at them and we’ve also discussed this and other workarounds in the Fedora team dealing with Forgejo deployment. To me, the big downside of “separate repo + submission-only issues” is that it precludes interaction with reporters – and we do that regularly in such tickets (I just checked). Also, I feel I have a better idea of what needs to be done about private issues rather than moving issues between repositories.

In the Fedora Project, we will need this feature to be properly implemented. While we can implement a workaround for our current deployment. We will need properly implemented private issues for the purposes of tracking distribution artifacts.

Some feedback or design discussion is nicer than dealing with inconsistencies in code review...

@nilsph wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-7879814: > > Have you considered the workarounds I suggested in #2 (comment) for the needs of Fedora? I know that they are not perfect, but I think they are somewhat simpler to implement. And moving issues is a feature with high gain anyway. > > I’ve looked at them and we’ve also discussed this and other workarounds in the Fedora team dealing with Forgejo deployment. To me, the big downside of “separate repo + submission-only issues” is that it precludes interaction with reporters – and we do that regularly in such tickets (I just checked). Also, I feel I have a better idea of what needs to be done about private issues rather than moving issues between repositories. In the Fedora Project, we will need this feature to be properly implemented. While we can implement a workaround for our current deployment. We will need properly implemented private issues for the purposes of tracking distribution artifacts. Some feedback or design discussion is nicer than dealing with inconsistencies in code review...
Owner

@nilsph wrote in #2 (comment):

I’m a bit puzzled because my impression was that xorm maps one class/type to one table in the DB, which is why I derived PrivateIssue from Issue as a separate type

My comment was purely on the UI if they should be a separate 'unit', for the database you need two separate types. This is the correct approach. You might be able to make it a bit easier for yourself if you implement a equivalent of

// AsUser returns the org as user object
func (org *Organization) AsUser() *user_model.User {
return (*user_model.User)(org)
}

@nilsph wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-7858238: > I’m a bit puzzled because my impression was that xorm maps one class/type to one table in the DB, which is why I derived `PrivateIssue` from `Issue` as a separate type My comment was purely on the UI if they should be a separate 'unit', for the database you need two separate types. This is the correct approach. You might be able to make it a bit easier for yourself if you implement a equivalent of https://codeberg.org/forgejo/forgejo/src/commit/6b4d5966bf64012ae279621f26f4c2f5ae9ae97e/models/organization/org.go#L237-L240

@Gusted wrote in #2 (comment):

@nilsph wrote in #2 (comment):

I’m a bit puzzled because my impression was that xorm maps one class/type to one table in the DB, which is why I derived PrivateIssue from Issue as a separate type

My comment was purely on the UI if they should be a separate 'unit', for the database you need two separate types. This is the correct approach. You might be able to make it a bit easier for yourself if you implement a equivalent of …

Thanks for the pointer! Here are my understanding and thoughts:

  • Organization is a copy of the User struct, with func OrgFromUser() and Organization.AsUser() to cast back and forth. I assume, the latter is used when you need to perform user-like operations on an organization.
  • Both User and Organization “live” in the same DB table, storage/retrieval always happens using the User type, with the Type attribute as a discriminator.
  • We don’t want Issue and PrivateIssue to be stored in the same DB table – they should end up in issue and private_issue, respectively.
  • I’ve experimented with xorm dispatching into different tables via a Table() method, discriminating on an IsPrivate attribute. It seems to me that this makes accidentally accessing private issues difficult, e.g. when naively querying the issue table or using db.GetEngine(ctx).Find(…) with the Issue type.
  • The POC didn’t store IsPrivate into the DB, but I guess we should do it with Issue/PrivateIssue nonetheless, so loading a private issue doesn’t let someone end up with the attribute unset.
  • It seems as if the use of foreign keys in the DB isn’t widespread, I don’t have a clear idea yet if this would become harder by introducing a separate table for private issues. As it is, a comment just carries the issue id which would probably work as long as no foreign key is defined on that field – I guess we’d have to add a PrivateIssueID/PrivateIssue pair then?

Barring objections, I’ll try to come up with something fairly minimal, say a way to create a private issue through the API, first, so there is something more concrete to discuss.

@Gusted wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8005010: > @nilsph wrote in #2 (comment): > > > I’m a bit puzzled because my impression was that xorm maps one class/type to one table in the DB, which is why I derived `PrivateIssue` from `Issue` as a separate type > > My comment was purely on the UI if they should be a separate 'unit', for the database you need two separate types. This is the correct approach. You might be able to make it a bit easier for yourself if you implement a equivalent of … Thanks for the pointer! Here are my understanding and thoughts: - `Organization` is a copy of the `User` struct, with `func OrgFromUser()` and `Organization.AsUser()` to cast back and forth. I assume, the latter is used when you need to perform user-like operations on an organization. - Both `User` and `Organization` “live” in the same DB table, storage/retrieval always happens using the `User` type, with the `Type` attribute as a discriminator. - We don’t want `Issue` and `PrivateIssue` to be stored in the same DB table – they should end up in `issue` and `private_issue`, respectively. - [I’ve experimented](https://codeberg.org/nilsph/poc-xorm_map_type_two_tables) with xorm dispatching into different tables via a `Table()` method, discriminating on an `IsPrivate` attribute. It seems to me that this makes accidentally accessing private issues difficult, e.g. when naively querying the `issue` table or using `db.GetEngine(ctx).Find(…)` with the `Issue` type. - The POC didn’t store `IsPrivate` into the DB, but I guess we should do it with `Issue`/`PrivateIssue` nonetheless, so loading a private issue doesn’t let someone end up with the attribute unset. - It seems as if the use of foreign keys in the DB isn’t widespread, I don’t have a clear idea yet if this would become harder by introducing a separate table for private issues. As it is, a comment just carries the issue id which would probably work as long as no foreign key is defined on that field – I guess we’d have to add a `PrivateIssueID`/`PrivateIssue` pair then? Barring objections, I’ll try to come up with something fairly minimal, say a way to create a private issue through the API, first, so there is something more concrete to discuss.
Owner

@nilsph wrote in #2 (comment):

Organization is a copy of the User struct, with func OrgFromUser() and Organization.AsUser() to cast back and forth. I assume, the latter is used when you need to perform user-like operations on an organization.

Both User and Organization “live” in the same DB table, storage/retrieval always happens using the User type, with the Type attribute as a discriminator.

We don’t want Issue and PrivateIssue to be stored in the same DB table – they should end up in issue and private_issue, respectively.

Yes.

@nilsph wrote in #2 (comment):

I’ve experimented with xorm dispatching into different tables via a Table() method, discriminating on an IsPrivate attribute. It seems to me that this makes accidentally accessing private issues difficult, e.g. when naively querying the issue table or using db.GetEngine(ctx).Find(…) with the Issue type.

Ah this is quite interesting, I didn't consider such a approach with XORM and it looks quite nice. Depending on how the code looks in Forgejo this might be a acceptable compromise.

@nilsph wrote in #2 (comment):

  • It seems as if the use of foreign keys in the DB isn’t widespread, I don’t have a clear idea yet if this would become harder by introducing a separate table for private issues. As it is, a comment just carries the issue id which would probably work as long as no foreign key is defined on that field – I guess we’d have to add a PrivateIssueID/PrivateIssue pair then?

Yes-ish, maybe @mfenniak knows something about this?

@nilsph wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8064134: > `Organization` is a copy of the `User` struct, with `func OrgFromUser()` and `Organization.AsUser()` to cast back and forth. I assume, the latter is used when you need to perform user-like operations on an organization. > Both `User` and `Organization` “live” in the same DB table, storage/retrieval always happens using the `User` type, with the `Type` attribute as a discriminator. > We don’t want `Issue` and `PrivateIssue` to be stored in the same DB table – they should end up in `issue` and `private_issue`, respectively. Yes. @nilsph wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8064134: > [I’ve experimented](https://codeberg.org/nilsph/poc-xorm_map_type_two_tables) with xorm dispatching into different tables via a `Table()` method, discriminating on an `IsPrivate` attribute. It seems to me that this makes accidentally accessing private issues difficult, e.g. when naively querying the `issue` table or using `db.GetEngine(ctx).Find(…)` with the `Issue` type. Ah this is quite interesting, I didn't consider such a approach with XORM and it looks quite nice. Depending on how the code looks in Forgejo this might be a acceptable compromise. @nilsph wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8064134: > * It seems as if the use of foreign keys in the DB isn’t widespread, I don’t have a clear idea yet if this would become harder by introducing a separate table for private issues. As it is, a comment just carries the issue id which would probably work as long as no foreign key is defined on that field – I guess we’d have to add a `PrivateIssueID`/`PrivateIssue` pair then? Yes-ish, maybe @mfenniak knows something about this?
Owner

@Gusted wrote in #2 (comment):

Depending on how the code looks in Forgejo this might be a acceptable compromise.

I might say that the field should be IsPublic so that if this column, for whatever reason, is not loaded then it defaults to being private issues which avoids the risk of code bugs making private issues public.

@Gusted wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8184479: > Depending on how the code looks in Forgejo this might be a acceptable compromise. I might say that the field should be `IsPublic` so that if this column, for whatever reason, is not loaded then it defaults to being private issues which avoids the risk of code bugs making private issues public.
Member

@nilsph wrote in #2 (comment):

I’ve experimented with xorm dispatching into different tables via a Table() method, discriminating on an IsPrivate attribute. It seems to me that this makes accidentally accessing private issues difficult, e.g. when naively querying the issue table or using db.GetEngine(ctx).Find(…) with the Issue type.

I think this would be very messy. It's so common in the code today to use the default implementation of the struct as a way to refer to "the table" of a bean, and having different tables would have really unpredictable behaviour. If you had an []Issue passed it to Find, it would... probably?... operate on the issue table and not the private_issue table. If you used any of the Engine methods that take ...any, like for example Delete(), it would almost certainly only work on one table and the rest would be silent no-ops. The table would need to be registered twice with RegisterModel in order for both tables to appear in the tables list in models/db/engine.go... which might work OK but would be full of "unknowns".

I think it's clever, but I don't think it will function well.

@nilsph wrote in #2 (comment):

It seems as if the use of foreign keys in the DB isn’t widespread, I don’t have a clear idea yet if this would become harder by introducing a separate table for private issues. As it is, a comment just carries the issue id which would probably work as long as no foreign key is defined on that field – I guess we’d have to add a PrivateIssueID/PrivateIssue pair then?

Yes, I think having two fields IssueID and PrivateIssueID (plus the loaded models) would be the appropriate design for a foreign-key reference with the two table design.

There are 17 tables with foreign keys to issue (although most of those are "conceptually foreign keys", not actually implemented as an in-database FOREIGN KEY constraint yet). It kinda makes all of this pretty ugly.

I wonder if there might be value in a slightly hybrid design... like a pull request is an issue, has an record in the issue table, and also has a foreign key to a pull_request that contains additional data about the issue... a private issue could also be an issue but with a private_issue table that has additional data on it. The information in issue can be some kind of standard placeholder with no confidential information. This has a bit of a weakness because it opens up opportunities for metadata to leak about private issues... but I think the risks are worth weighing against the implementation benefits.

@nilsph wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8064134: > [I’ve experimented](https://codeberg.org/nilsph/poc-xorm_map_type_two_tables) with xorm dispatching into different tables via a `Table()` method, discriminating on an `IsPrivate` attribute. It seems to me that this makes accidentally accessing private issues difficult, e.g. when naively querying the `issue` table or using `db.GetEngine(ctx).Find(…)` with the `Issue` type. I think this would be very messy. It's so common in the code today to use the default implementation of the struct as a way to refer to "the table" of a bean, and having different tables would have really unpredictable behaviour. If you had an `[]Issue` passed it to `Find`, it would... probably?... operate on the `issue` table and not the `private_issue` table. If you used any of the `Engine` methods that take `...any`, like for example `Delete()`, it would almost certainly only work on one table and the rest would be silent no-ops. The table would need to be registered twice with `RegisterModel` in order for both tables to appear in the `tables` list in `models/db/engine.go`... which might work OK but would be full of "unknowns". I think it's clever, but I don't think it will function well. @nilsph wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8064134: > It seems as if the use of foreign keys in the DB isn’t widespread, I don’t have a clear idea yet if this would become harder by introducing a separate table for private issues. As it is, a comment just carries the issue id which would probably work as long as no foreign key is defined on that field – I guess we’d have to add a `PrivateIssueID`/`PrivateIssue` pair then? Yes, I think having two fields `IssueID` and `PrivateIssueID` (plus the loaded models) would be the appropriate design for a foreign-key reference with the two table design. There are 17 tables with foreign keys to `issue` (although most of those are "conceptually foreign keys", not actually implemented as an in-database `FOREIGN KEY` constraint yet). It kinda makes all of this pretty ugly. I wonder if there might be value in a slightly hybrid design... like a pull request is an `issue`, has an record in the `issue` table, and also has a foreign key to a `pull_request` that contains additional data about the issue... a private issue could also be an `issue` but with a `private_issue` table that has additional data on it. The information in `issue` can be some kind of standard placeholder with no confidential information. This has a bit of a weakness because it opens up opportunities for metadata to leak about private issues... but I think the risks are worth weighing against the implementation benefits.

@mfenniak wrote in #2 (comment):

@nilsph wrote in #2 (comment):

I’ve experimented with xorm dispatching into different tables via a Table() method, discriminating on an IsPrivate attribute. It seems to me that this makes accidentally accessing private issues difficult, e.g. when naively querying the issue table or using db.GetEngine(ctx).Find(…) with the Issue type.

I think this would be very messy. It's so common in the code today to use the default implementation of the struct as a way to refer to "the table" of a bean, and having different tables would have really unpredictable behaviour. If you had an []Issue passed it to Find, it would... probably?... operate on the issue table and not the private_issue table. If you used any of the Engine methods that take ...any, like for example Delete(), it would almost certainly only work on one table and the rest would be silent no-ops. The table would need to be registered twice with RegisterModel in order for both tables to appear in the tables list in models/db/engine.go... which might work OK but would be full of "unknowns".

I think it's clever, but I don't think it will function well.

I agree, interfacing with the database is way less messy with one type ⇔ one table. I wonder how to best combine that with the User/Organization pattern Gusted pointed me to above:

  • DB storage retrieval through the respective type (Issue and PrivateIssue for now, both should be and remain identical in internal structure)
  • Issue methods that don’t touch the DB wrapped through simple wrappers
  • Functions that get passed issues and don’t touch the DB… either could be taught to accept both(?) or might need a wrapper. Not sure.
  • Issue methods or functions involving issues that touch the DB might need to be reimplemented.

Makes sense?

@nilsph wrote in #2 (comment):

It seems as if the use of foreign keys in the DB isn’t widespread, I don’t have a clear idea yet if this would become harder by introducing a separate table for private issues. As it is, a comment just carries the issue id which would probably work as long as no foreign key is defined on that field – I guess we’d have to add a PrivateIssueID/PrivateIssue pair then?

Yes, I think having two fields IssueID and PrivateIssueID (plus the loaded models) would be the appropriate design for a foreign-key reference with the two table design.

There are 17 tables with foreign keys to issue (although most of those are "conceptually foreign keys", not actually implemented as an in-database FOREIGN KEY constraint yet). It kinda makes all of this pretty ugly.

Yes, it does. 🥳

I wonder if there might be value in a slightly hybrid design... like a pull request is an issue, has an record in the issue table, and also has a foreign key to a pull_request that contains additional data about the issue... a private issue could also be an issue but with a private_issue table that has additional data on it. The information in issue can be some kind of standard placeholder with no confidential information. This has a bit of a weakness because it opens up opportunities for metadata to leak about private issues... but I think the risks are worth weighing against the implementation benefits.

I wonder what data of a private issue would be considered non-sensitive, other than the ID maybe? Even that has a sensitive quality because if there are IDs I can’t access through public issues and PRs, I know there must be a private issue. Also, right now I can’t think of any additional data that a private issue would have over a public one (or vice versa).

I guess we could introduce a BaseIssue which would FK-reference issue (both issues and PRs for now) or PrivateIssue – this could be enforced via a DB-level constraint only I don’t know if xorm is any help there. Anything that references an issue (public or private) or PR could point to that instead of having separate columns itself. And it would allow untangling PRs from issues structurally later (if desired). I fear that the migration script would have to be somewhat messy though, especially because most FK relations aren’t explicit in the DB yet so we would have to be careful with potentially inconsistent entries.

@mfenniak wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8187389: > @nilsph wrote in #2 (comment): > > > [I’ve experimented](https://codeberg.org/nilsph/poc-xorm_map_type_two_tables) with xorm dispatching into different tables via a `Table()` method, discriminating on an `IsPrivate` attribute. It seems to me that this makes accidentally accessing private issues difficult, e.g. when naively querying the `issue` table or using `db.GetEngine(ctx).Find(…)` with the `Issue` type. > > I think this would be very messy. It's so common in the code today to use the default implementation of the struct as a way to refer to "the table" of a bean, and having different tables would have really unpredictable behaviour. If you had an `[]Issue` passed it to `Find`, it would... probably?... operate on the `issue` table and not the `private_issue` table. If you used any of the `Engine` methods that take `...any`, like for example `Delete()`, it would almost certainly only work on one table and the rest would be silent no-ops. The table would need to be registered twice with `RegisterModel` in order for both tables to appear in the `tables` list in `models/db/engine.go`... which might work OK but would be full of "unknowns". > > I think it's clever, but I don't think it will function well. I agree, interfacing with the database is way less messy with one type ⇔ one table. I wonder how to best combine that with the `User`/`Organization` pattern Gusted pointed me to above: - DB storage retrieval through the respective type (`Issue` and `PrivateIssue` for now, both should be and remain identical in internal structure) - `Issue` methods that don’t touch the DB wrapped through simple wrappers - Functions that get passed issues and don’t touch the DB… either could be taught to accept both(?) or might need a wrapper. Not sure. - `Issue` methods or functions involving issues that touch the DB might need to be reimplemented. Makes sense? > @nilsph wrote in #2 (comment): > > > It seems as if the use of foreign keys in the DB isn’t widespread, I don’t have a clear idea yet if this would become harder by introducing a separate table for private issues. As it is, a comment just carries the issue id which would probably work as long as no foreign key is defined on that field – I guess we’d have to add a `PrivateIssueID`/`PrivateIssue` pair then? > > Yes, I think having two fields `IssueID` and `PrivateIssueID` (plus the loaded models) would be the appropriate design for a foreign-key reference with the two table design. > > There are 17 tables with foreign keys to `issue` (although most of those are "conceptually foreign keys", not actually implemented as an in-database `FOREIGN KEY` constraint yet). It kinda makes all of this pretty ugly. Yes, it does. 🥳 > I wonder if there might be value in a slightly hybrid design... like a pull request is an `issue`, has an record in the `issue` table, and also has a foreign key to a `pull_request` that contains additional data about the issue... a private issue could also be an `issue` but with a `private_issue` table that has additional data on it. The information in `issue` can be some kind of standard placeholder with no confidential information. This has a bit of a weakness because it opens up opportunities for metadata to leak about private issues... but I think the risks are worth weighing against the implementation benefits. I wonder what data of a private issue would be considered non-sensitive, other than the ID maybe? Even that has a sensitive quality because if there are IDs I can’t access through public issues and PRs, I know there must be a private issue. Also, right now I can’t think of any additional data that a private issue would have over a public one (or vice versa). I guess we could introduce a `BaseIssue` which would FK-reference `issue` (both issues and PRs for now) or `PrivateIssue` – this could be enforced via a DB-level constraint only I don’t know if xorm is any help there. Anything that references an issue (public or private) or PR could point to that instead of having separate columns itself. And it would allow untangling PRs from issues structurally later (if desired). I fear that the migration script would have to be somewhat messy though, especially because most FK relations aren’t explicit in the DB yet so we would have to be careful with potentially inconsistent entries.
Member

@nilsph wrote in #2 (comment):

I wonder what data of a private issue would be considered non-sensitive, other than the ID maybe? Even that has a sensitive quality because if there are IDs I can’t access through public issues and PRs, I know there must be a private issue. Also, right now I can’t think of any additional data that a private issue would have over a public one (or vice versa).

I suppose we could use a documented threat model here about what we are trying to protect, and what we are OK with accepting risk on. If we decided that knowing about the existence of a private issue is not an acceptable risk, it would greatly impact the user experience that private issues will have -- (a) they couldn't use the same index that issues/PRs use, (b) that would imply that cross-referencing issues has to have a different format, and (c) it would imply that if there's a capability to convert a private issue to a public issue, then cross-references to the old private issue need to redirect to the new public issue later, adding a new database table.

The existence of a private issue, absent any other details, seems like an acceptable risk to me.

@nilsph wrote in #2 (comment):

There are 17 tables with foreign keys to issue (although most of those are "conceptually foreign keys", not actually implemented as an in-database FOREIGN KEY constraint yet). It kinda makes all of this pretty ugly.

Yes, it does. 🥳

There is an advantage to this, as well -- private issues are naturally going to have different capabilities. Each of these foreign keys is an opportunity to ask the question "do we want this feature for a private issue?" Here's the list that I'm aware of:

  • tracked_time
    • Aggregated time is available at different levels, like the total time spent on work in a repo. This may not make sense to expose/include for private issues.
  • stopwatch
    • If tracked_time doesn't make sense, neither does stopwatch.
  • pull_request
    • A very fundamental question! Are we building private issues, or private issues and pull requests? If pull requests, how is the code being protected?
  • attachment
    • Pretty fundamental to support for collaboration.
  • comment
    • Pretty fundamental to support for collaboration.
  • issue_assignees
    • Good to have, could be argued it isn't critical.
  • issue_content_history
    • This is the ability to view old versions of an issue description (and comment). Good to have, could be argued it isn't critical.
  • issue_dependency
    • Adds complexity in the concept of referencing issues->private-issues, and vice-versa, being potential sources of data leakage. I'd argue against it.
  • issue_label
    • Labels expose aggregate stats about the number of issues assigned to them. Labels would be useful for organization, but need to side-step this problem.
  • issue_user
    • Not actually sure all of the places this is used -- seems to be primarily for search indexing. So that's something to avoid supporting.
  • issue_watch
    • Used for email notifications on the issue. Personally wouldn't be able to use private issues without email notifications. :-)
  • notification
    • Used for the UI notifications screen. Personal opinion, that screen isn't very useful.
  • project_issue
    • Links between projects and issues, probably something to not support for private issues.
  • reaction
    • Emojis on an issue, could take or leave.
  • review
    • Code review data, essential if pull requests are supported.

I supported the idea of using a separate table for private issues, but as I'm documenting some of these things ^^^ I'm feeling like it's probably a bad idea.

I would propose an alternate technical direction:

  • Use in-database encryption (using keying) on fields that reasonably can be foreseen to contain sensitive information. Title, description, contents and filenames of attachments, contents of comments, contents of review diffs... etc.
  • In order to ensure that access control is enforced, design access to the data in the models through a mechanism that requires positive action to decrypt. This is a little vague because it will require work to define, but imagine that the Title field of an issue isn't a string, but it's a DecryptableString. A DecryptableString can't be used without being evaluated in the context of who is viewing the data with an access control check.
    • For example, it can't be copied into an email notification without a developer resolving the question of "who is trying to view this".
  • The UI design for a private issue should make it clear what fields are secure to enter data in, such as with a 🔒, so that the threat model designed here is being clearly communicated to end users.

(This is rather vague as a direction, but it's a concept.)

@nilsph wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8208302: > I wonder what data of a private issue would be considered non-sensitive, other than the ID maybe? Even that has a sensitive quality because if there are IDs I can’t access through public issues and PRs, I know there must be a private issue. Also, right now I can’t think of any additional data that a private issue would have over a public one (or vice versa). I suppose we could use a documented threat model here about what we are trying to protect, and what we are OK with accepting risk on. If we decided that knowing about the existence of a private issue is not an acceptable risk, it would greatly impact the user experience that private issues will have -- (a) they couldn't use the same index that issues/PRs use, (b) that would imply that cross-referencing issues has to have a different format, and (c) it would imply that if there's a capability to convert a private issue to a public issue, then cross-references to the old private issue need to redirect to the new public issue later, adding a new database table. The existence of a private issue, absent any other details, seems like an acceptable risk to me. @nilsph wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8208302: > > There are 17 tables with foreign keys to `issue` (although most of those are "conceptually foreign keys", not actually implemented as an in-database `FOREIGN KEY` constraint yet). It kinda makes all of this pretty ugly. > > Yes, it does. :partying_face: There is an advantage to this, as well -- private issues are naturally going to have different capabilities. Each of these foreign keys is an opportunity to ask the question "do we want this feature for a private issue?" Here's the list that I'm aware of: - tracked_time - Aggregated time is available at different levels, like the total time spent on work in a repo. This may not make sense to expose/include for private issues. - stopwatch - If tracked_time doesn't make sense, neither does stopwatch. - pull_request - A very fundamental question! Are we building private issues, or private issues and pull requests? If pull requests, how is the *code* being protected? - attachment - Pretty fundamental to support for collaboration. - comment - Pretty fundamental to support for collaboration. - issue_assignees - Good to have, could be argued it isn't critical. - issue_content_history - This is the ability to view old versions of an issue description (and comment). Good to have, could be argued it isn't critical. - issue_dependency - Adds complexity in the concept of referencing issues->private-issues, and vice-versa, being potential sources of data leakage. I'd argue against it. - issue_label - Labels expose aggregate stats about the number of issues assigned to them. Labels would be useful for organization, but need to side-step this problem. - issue_user - Not actually sure all of the places this is used -- seems to be primarily for search indexing. So that's something to avoid supporting. - issue_watch - Used for email notifications on the issue. Personally wouldn't be able to use private issues without email notifications. :-) - notification - Used for the UI notifications screen. Personal opinion, that screen isn't very useful. - project_issue - Links between projects and issues, probably something to not support for private issues. - reaction - Emojis on an issue, could take or leave. - review - Code review data, essential if pull requests are supported. --- I supported the idea of using a separate table for private issues, but as I'm documenting some of these things ^^^ I'm feeling like it's probably a bad idea. I would propose an alternate technical direction: - Use in-database encryption (using `keying`) on fields that reasonably can be foreseen to contain sensitive information. Title, description, contents and filenames of attachments, contents of comments, contents of review diffs... etc. - In order to ensure that access control is enforced, design access to the data in the models through a mechanism that requires positive action to decrypt. This is a little vague because it will require work to define, but imagine that the `Title` field of an issue isn't a `string`, but it's a `DecryptableString`. A `DecryptableString` can't be used without being evaluated in the context of who is viewing the data with an access control check. - For example, it can't be copied into an email notification without a developer resolving the question of "who is trying to view this". - The UI design for a private issue should make it clear what fields are secure to enter data in, such as with a 🔒, so that the threat model designed here is being clearly communicated to end users. (This is rather vague as a direction, but it's a concept.)
Member

A key question coming out of that last comment, just to highlight since it's buried in the details. 🤣

Is the intent to support private pull requests as well? And if so, which would be a great feature, it's a good note that considerable work will need to be done to figure out how to hide the code.

A key question coming out of that last comment, just to highlight since it's buried in the details. 🤣 Is the intent to support private pull requests as well? And if so, which would be a great feature, it's a good note that considerable work will need to be done to figure out how to hide the code.
Author
Owner

I think that private pull requests are not on the immediate roadmap. If the code doesn't get too complex, I would appreciate if the approach could be flexible enough to be expanded to private PRs in some far future. However, I feel like if this is not trivial, we shouldn't prematurely build a complex architecture for an eventual feature.

I think that private pull requests are not on the immediate roadmap. If the code doesn't get too complex, I would appreciate if the approach could be flexible enough to be expanded to private PRs in some far future. However, I feel like if this is not trivial, we shouldn't prematurely build a complex architecture for an eventual feature.
Owner

@mfenniak wrote in #2 (comment):

Is the intent to support private pull requests as well? And if so, which would be a great feature, it's a good note that considerable work will need to be done to figure out how to hide the code.

I would be skeptical of this, git themselves says that once you put some commits into a public repository there will be several ways to get them despite whatever access control you try to put up. You would need a separate repository (or git objects directory) at minimum.

@mfenniak wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8209304: > Is the intent to support private pull requests as well? And if so, which would be a great feature, it's a good note that considerable work will need to be done to figure out how to hide the code. I would be skeptical of this, [git themselves says](https://git-scm.com/docs/gitnamespaces#_security) that once you put some commits into a public repository there will be several ways to get them despite whatever access control you try to put up. You would need a separate repository (or git objects directory) at minimum.

@Gusted wrote in #2 (comment):

@Gusted wrote in #2 (comment):

Depending on how the code looks in Forgejo this might be a acceptable compromise.

I might say that the field should be IsPublic so that if this column, for whatever reason, is not loaded then it defaults to being private issues which avoids the risk of code bugs making private issues public.

Taking @mfenniak’s comments into account, i.e. having a more defined separation between public and private issues (ignoring whether or not we tidy up the DB structures while we’re at it) than I suggested above, I wonder if we should even persist the attribute.

BTW, this is one place where I really miss the ability to set a default value for a struct field that isn’t the null value of the type of the field – it would allow us to prevent anybody from creating an Issue with isPublic == false or a PrivateIssue with isPublic == true, which might give us a way to prevent one or the other being stored in the wrong place even if variables are cast back and forth (which as I understand it has to happen in some places unless we want to duplicate the code dealing with issues, which I really don’t want to).

@Gusted wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8184527: > @Gusted wrote in #2 (comment): > > > Depending on how the code looks in Forgejo this might be a acceptable compromise. > > I might say that the field should be `IsPublic` so that if this column, for whatever reason, is not loaded then it defaults to being private issues which avoids the risk of code bugs making private issues public. Taking @mfenniak’s comments into account, i.e. having a more defined separation between public and private issues (ignoring whether or not we tidy up the DB structures while we’re at it) than I suggested above, I wonder if we should even persist the attribute. BTW, this is one place where I really miss the ability to set a default value for a struct field that isn’t the null value of the type of the field – it would allow us to prevent anybody from creating an `Issue` with `isPublic == false` or a `PrivateIssue` with `isPublic == true`, which might give us a way to prevent one or the other being stored in the wrong place even if variables are cast back and forth (which as I understand it has to happen in some places unless we want to duplicate the code dealing with issues, which I really don’t want to).

@mfenniak wrote in #2 (comment):

The existence of a private issue, absent any other details, seems like an acceptable risk to me.

Seconded, I just wanted to mention it because this has come up in discussions within the team working on Forgejo in Fedora infrastructure.

@nilsph wrote in #2 (comment):

There are 17 tables with foreign keys to issue (although most of those are "conceptually foreign keys", not actually implemented as an in-database FOREIGN KEY constraint yet). It kinda makes all of this pretty ugly.

Yes, it does. 🥳

There is an advantage to this, as well -- private issues are naturally going to have different capabilities. Each of these foreign keys is an opportunity to ask the question "do we want this feature for a private issue?" Here's the list that I'm aware of:

[… snipped list of issue features …]

I think for many of these, the question "how hard is it to do it right?" and "does it make sense to have for private issues from a user’s perspective?" have different answers. I would argue the latter can be answered in the positive for most (all?) of these items:

  • If a user cares about say time tracking or labels, they don’t care if it’s about a public or private issue. Off the cuff, I would say we can ignore aggregating data for such fields at first (and document that), but the field per se makes sense from a UX point of view.
  • Private pull requests are hard to do right for the reasons already outlined, but if we had them, they would probably be used to implement whatever it is that’s asked in a private issue. For now, I’d just not touch IsPull/PullRequest in a PrivateIssue.

I supported the idea of using a separate table for private issues, but as I'm documenting some of these things ^^^ I'm feeling like it's probably a bad idea.

I mean, it’s probably not super clean, with fields we decide to ignore for private issues for the moment, but it should be effective for the intended purpose: preventing naive DB queries from leaking private issues.

I would propose an alternate technical direction:

  • Use in-database encryption (using keying) on fields that reasonably can be foreseen to contain sensitive information. Title, description, contents and filenames of attachments, contents of comments, contents of review diffs... etc.

  • In order to ensure that access control is enforced, design access to the data in the models through a mechanism that requires positive action to decrypt. This is a little vague because it will require work to define, but imagine that the Title field of an issue isn't a string, but it's a DecryptableString. A DecryptableString can't be used without being evaluated in the context of who is viewing the data with an access control check.

    • For example, it can't be copied into an email notification without a developer resolving the question of "who is trying to view this".
  • The UI design for a private issue should make it clear what fields are secure to enter data in, such as with a 🔒, so that the threat model designed here is being clearly communicated to end users.

(This is rather vague as a direction, but it's a concept.)

It sounds rather like a more substantial piece of work than two tables, that’s for sure. 😉

I’m not familiar with using encryption in database fields and I don’t understand what you mean by “keying” here, do you have any pointers? I imagine this would make searches difficult and wouldn’t play well with indexes.

@mfenniak wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8209265: > The existence of a private issue, absent any other details, seems like an acceptable risk to me. Seconded, I just wanted to mention it because this has come up in discussions within the team working on Forgejo in Fedora infrastructure. > @nilsph wrote in #2 (comment): > > > > There are 17 tables with foreign keys to `issue` (although most of those are "conceptually foreign keys", not actually implemented as an in-database `FOREIGN KEY` constraint yet). It kinda makes all of this pretty ugly. > > > > > > Yes, it does. :partying_face: > > There is an advantage to this, as well -- private issues are naturally going to have different capabilities. Each of these foreign keys is an opportunity to ask the question "do we want this feature for a private issue?" Here's the list that I'm aware of: > > [… snipped list of issue features …] I think for many of these, the question "how hard is it to do it right?" and "does it make sense to have for private issues from a user’s perspective?" have different answers. I would argue the latter can be answered in the positive for most (all?) of these items: - If a user cares about say time tracking or labels, they don’t care if it’s about a public or private issue. Off the cuff, I would say we can ignore aggregating data for such fields at first (and document that), but the field per se makes sense from a UX point of view. - Private pull requests are hard to do right for the reasons already outlined, but if we had them, they would probably be used to implement whatever it is that’s asked in a private issue. For now, I’d just not touch `IsPull`/`PullRequest` in a `PrivateIssue`. > I supported the idea of using a separate table for private issues, but as I'm documenting some of these things ^^^ I'm feeling like it's probably a bad idea. I mean, it’s probably not super clean, with fields we decide to ignore for private issues for the moment, but it should be effective for the intended purpose: preventing naive DB queries from leaking private issues. > I would propose an alternate technical direction: > > * Use in-database encryption (using `keying`) on fields that reasonably can be foreseen to contain sensitive information. Title, description, contents and filenames of attachments, contents of comments, contents of review diffs... etc. > > * In order to ensure that access control is enforced, design access to the data in the models through a mechanism that requires positive action to decrypt. This is a little vague because it will require work to define, but imagine that the `Title` field of an issue isn't a `string`, but it's a `DecryptableString`. A `DecryptableString` can't be used without being evaluated in the context of who is viewing the data with an access control check. > > * For example, it can't be copied into an email notification without a developer resolving the question of "who is trying to view this". > > * The UI design for a private issue should make it clear what fields are secure to enter data in, such as with a :lock:, so that the threat model designed here is being clearly communicated to end users. > > > (This is rather vague as a direction, but it's a concept.) It sounds rather like a more substantial piece of work than two tables, that’s for sure. 😉 I’m not familiar with using encryption in database fields and I don’t understand what you mean by “keying” here, do you have any pointers? I imagine this would make searches difficult and wouldn’t play well with indexes.
Member

@nilsph wrote in #2 (comment):

I’m not familiar with using encryption in database fields and I don’t understand what you mean by “keying” here, do you have any pointers?

It's a module in Forgejo that encrypts data using Forgejo's configured secret key + context-specific derived keys. Here's an example of it's usage:

// SetSecret sets the 2FA secret.
func (t *TwoFactor) SetSecret(secretString string) {
key := keying.DeriveKey(keying.ContextTOTP)
t.Secret = key.Encrypt([]byte(secretString), keying.ColumnAndID("secret", t.ID))
}
// ValidateTOTP validates the provided passcode.
func (t *TwoFactor) ValidateTOTP(passcode string) (bool, error) {
key := keying.DeriveKey(keying.ContextTOTP)
secret, err := key.Decrypt(t.Secret, keying.ColumnAndID("secret", t.ID))
if err != nil {
return false, err
}
return totp.Validate(passcode, string(secret)), nil
}

I imagine this would make searches difficult and wouldn’t play well with indexes.

Yes, this is a good point. You would naturally be limited to not search, sort, or join based upon encrypted fields. It would be important to define exactly what data is expected to be secured in order to see if this pathway is feasible.

@nilsph wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8209778: > I’m not familiar with using encryption in database fields and I don’t understand what you mean by “keying” here, do you have any pointers? It's a module in Forgejo that encrypts data using Forgejo's configured secret key + context-specific derived keys. Here's an example of it's usage: https://codeberg.org/forgejo/forgejo/src/commit/5b77ddb050e5b6c29d627eb9b270625975802e85/models/auth/twofactor.go#L88-L102 > I imagine this would make searches difficult and wouldn’t play well with indexes. Yes, this is a good point. You would naturally be limited to not search, sort, or join based upon encrypted fields. It would be important to define exactly what data is expected to be secured in order to see if this pathway is feasible.

Meanwhile I got to a point where (in my fork/branch, caveat spectator) I can create a private issue through the REST API, have it end it up in the right place, return a sensible result (with an asterisk*) from the API and not log any problems. Here are things I discovered in the process and/or didn’t think about before:

  • I’ve found myself having to set IsPublic in quite some places that already deal with normal/public issues so that they keep on doing the right thing. I have a feeling it might – counterintuitively and going against what @Gusted advised above ­– be safer to have an IsPrivate field instead (that wouldn’t even have to be persisted in the database). I think we can ensure it to have the right value for private issues at only a few boundaries: 1) when converting the PrivateIssue to an Issue and 2) when loading it from the DB. I’ve learned about XORM events in the meantime which look like a good solution for the latter.

  • There is a lot of code dealing with issues which uses the id primary key of an issue instead of an Issue, which makes private issues in a different table “interesting”, as suddenly there can be two issues with the same id. I’m a bit puzzled why these functions don’t use the "index", i.e. the user-visible ID of an issue which is unique between issues and pull requests already. I can see some ways of addressing this:

    • Always pass on info if the issue is public or private.
    • Do so, but encapsulate in say an IssueID struct which carries this information.
    • Introduce <SomeOperation>Opts structs to avoid proliferation of function parameters – merely adding IsPublic (or IsPrivate) makes some function calls pretty unwieldy.
    • Always use (as the case may be, incompletely filled) Issue variables instead (which carry this info).
  • (*) Here’s the asterisk from above:

    • I get why id fields are used for referencing other DB tables, but why are they exposed through the API? I wondered if the API should return a private_id for private issues, or having is_public or is_private is enough to distinguish. Personally – and realizing I’m probably not familiar enough with the code to be a good judge here – I wouldn’t expose DB primary keys at all, only the user-visible Issue/PR index(/number) …
    • … which is confusing on its own: why is it number (in the API) vs. Index (in the code) in the first place?

What are your thoughts?

Meanwhile I got to a point where (in my [fork/branch](https://codeberg.org/nilsph/forgejo/src/branch/forgejo--private-issues), caveat spectator) I can create a private issue through the REST API, have it end it up in the right place, return a sensible result (with an asterisk\*) from the API and not log any problems. Here are things I discovered in the process and/or didn’t think about before: * I’ve found myself having to set `IsPublic` in quite some places that already deal with normal/public issues so that they keep on doing the right thing. I have a feeling it might – counterintuitively and going against [what @Gusted advised above](https://codeberg.org/forgejo/design/issues/2#issuecomment-8184527) ­– be safer to have an `IsPrivate` field instead (that wouldn’t even have to be persisted in the database). I think we can ensure it to have the right value for private issues at only a few boundaries: 1) when converting the `PrivateIssue` to an `Issue` and 2) when loading it from the DB. I’ve learned about XORM events in the meantime which look like a good solution for the latter. * There is a lot of code dealing with issues which uses the `id` primary key of an issue instead of an `Issue`, which makes private issues in a different table “interesting”, as suddenly there can be two issues with the same `id`. I’m a bit puzzled why these functions don’t use the "index", i.e. the user-visible ID of an issue which is unique between issues and pull requests already. I can see some ways of addressing this: * Always pass on info if the issue is public or private. * Do so, but encapsulate in say an `IssueID` struct which carries this information. * Introduce `<SomeOperation>Opts` structs to avoid proliferation of function parameters – merely adding `IsPublic` (or `IsPrivate`) makes some function calls pretty unwieldy. * Always use (as the case may be, incompletely filled) `Issue` variables instead (which carry this info). * (\*) Here’s the asterisk from above: * I get why `id` fields are used for referencing other DB tables, but why are they exposed through the API? I wondered if the API should return a `private_id` for private issues, or having `is_public` or `is_private` is enough to distinguish. Personally – and realizing I’m probably not familiar enough with the code to be a good judge here – I wouldn’t expose DB primary keys at all, only the user-visible Issue/PR index(/number) … * … which is confusing on its own: why is it `number` (in the API) vs. `Index` (in the code) in the first place? What are your thoughts?
Owner

@nilsph wrote in #2 (comment):

  • I’m a bit puzzled why these functions don’t use the "index", i.e. the user-visible ID of an issue which is unique between issues and pull requests already

I can take a guess, without looking at the code, not every operation operates in the context of a repository. The index is useful if you pass/know the repository being worked on this is not always the case to what I remember.

@nilsph wrote in #2 (comment):

  • Always use (as the case may be, incompletely filled) Issue variables instead (which carry this info).

I'm not opposed to this, this would be quite good as it makes functions harder to misuse I'm generally in favor for. The only concern I could raise is performance as pointers are being passed around but this is already commonly done (so most stuff should already be on the heap) and we are not adding more arguments (iirc for x86_64 the limit for passing arguments via registers is 5 in Go) so it should be fine.

@nilsph wrote in #2 (comment):

  • … which is confusing on its own: why is it number (in the API) vs. Index (in the code) in the first place?

Github API compatibility (we inherited this).

@nilsph wrote in #2 (comment):

  • I think we can ensure it to have the right value for private issues at only a few boundaries: 1) when converting the PrivateIssue to an Issue and 2) when loading it from the DB. I’ve learned about XORM events in the meantime which look like a good solution for the latter.

Feel free to go with this approach for now, I trust the first boundary (as it's bounded by Go's type safety). Relying on XORM, I would likely want to read the exact code for this myself as XORM has some funny undocumented behaviors that aren't yet in Forgejo's folklore knowledge and I hate if that broke the safety of private issues. So my only recommendation for this part is: unit tests.

@nilsph wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8376363: > * I’m a bit puzzled why these functions don’t use the "index", i.e. the user-visible ID of an issue which is unique between issues and pull requests already I can take a guess, without looking at the code, not every operation operates in the context of a repository. The index is useful if you pass/know the repository being worked on this is not always the case to what I remember. @nilsph wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8376363: > * Always use (as the case may be, incompletely filled) `Issue` variables instead (which carry this info). I'm not opposed to this, this would be quite good as it makes functions harder to misuse I'm generally in favor for. The only concern I could raise is performance as pointers are being passed around but this is already commonly done (so most stuff should already be on the heap) and we are not adding more arguments (iirc for x86_64 the limit for passing arguments via registers is 5 in Go) so it should be fine. @nilsph wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8376363: > * … which is confusing on its own: why is it `number` (in the API) vs. `Index` (in the code) in the first place? Github API compatibility (we inherited this). @nilsph wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8376363: > * I think we can ensure it to have the right value for private issues at only a few boundaries: 1) when converting the `PrivateIssue` to an `Issue` and 2) when loading it from the DB. I’ve learned about XORM events in the meantime which look like a good solution for the latter. Feel free to go with this approach for now, I trust the first boundary (as it's bounded by Go's type safety). Relying on XORM, I would likely want to read the exact code for this myself as XORM has some funny undocumented behaviors that aren't yet in Forgejo's folklore knowledge and I hate if that broke the safety of private issues. So my only recommendation for this part is: unit tests.

Thanks for your comments!

@Gusted wrote in #2 (comment):

@nilsph wrote in #2 (comment):

  • I think we can ensure it to have the right value for private issues at only a few boundaries: 1) when converting the PrivateIssue to an Issue and 2) when loading it from the DB. I’ve learned about XORM events in the meantime which look like a good solution for the latter.

Feel free to go with this approach for now, I trust the first boundary (as it's bounded by Go's type safety). Relying on XORM, I would likely want to read the exact code for this myself as XORM has some funny undocumented behaviors that aren't yet in Forgejo's folklore knowledge and I hate if that broke the safety of private issues. So my only recommendation for this part is: unit tests.

I want to be able to sleep at night, too – rest assured that I’ll go all out on unit tests before submitting changes in earnest, i.e. before any of this makes it out of my private fork/branch or, at the outmost, a WIP pull request.

Thanks for your comments! @Gusted wrote in https://codeberg.org/forgejo/design/issues/2#issuecomment-8405379: > @nilsph wrote in #2 (comment): > > > * I think we can ensure it to have the right value for private issues at only a few boundaries: 1) when converting the `PrivateIssue` to an `Issue` and 2) when loading it from the DB. I’ve learned about XORM events in the meantime which look like a good solution for the latter. > > Feel free to go with this approach for now, I trust the first boundary (as it's bounded by Go's type safety). Relying on XORM, I would likely want to read the exact code for this myself as XORM has some funny undocumented behaviors that aren't yet in Forgejo's folklore knowledge and I hate if that broke the safety of private issues. So my only recommendation for this part is: unit tests. I want to be able to sleep at night, too – rest assured that I’ll go all out on unit tests before submitting changes in earnest, i.e. before any of this makes it out of my private fork/branch or, at the outmost, a WIP pull request.
Sign in to join this conversation.
No milestone
No project
No assignees
8 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
forgejo/design#2
No description provided.