Private issues #2
Labels
No labels
User research - Accessibility
User research - Blocked
User research - Community
User research - Config (instance)
User research - Errors
User research - Filters
User research - Future backlog
User research - Git workflow
User research - Labels
User research - Moderation
User research - Needs input
User research - Notifications/Dashboard
User research - Rendering
User research - Repo creation
User research - Repo units
User research - Security
User research - Settings (in-app)
No milestone
No project
No assignees
8 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
forgejo/design#2
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Content
I'm starting work for a feature that allows private reporting of issues in Forgejo (mainly to demonstrate the workflow I have in mind), starting with forgejo/forgejo#142
work has started here: https://codeberg.org/forgejo/design/src/branch/private-issues
@algernon and / or @gusted, can I have your input on potential solutions for a sane separation of possible issues?
I was wondering if labels and issue templates could be (ab-)used for the job.
Issue templates already allow categorizing issue types and assigning labels.
If we can take this one step further, what about building workflows like this:
It sounds like a rather flexible solution where many building blogs are already present.
New database table, I've had numerous occasions since https://github.com/go-gitea/gitea/pull/17711 (please take no inspiration from the actual code, it's horrible) to pounder on some mechanism that would make me feel confident to have such feature (as well as seeing code I would've missed changing in such PR and would've added ten new CVEs to Forgejo), but if anything re-using the
issuetable would be a huge mistake, because it was never made with in mind that some issues need maximal protection for leakage, also it would require mental notes during PR reviews to ensure if someone do SQL with the issue table to ensure they exclude private issues.Doing so means that for every time you want to have private issues integrated in some way or form (I assume you would want to see this integrated into the issue tab?) you have to take special care to integrate private issues, thus you do it explicitly which makes reviewing that the code is safe and sound way easier.
It's a boring solution, which is the best solution.
I see. Thank you for the input. This makes a lot of sense.
Hi, is there any process on confidential issues? I am currently looking for a workaround for that.
Currently, if a user posted an issue on a private repo before removed access by admin, he will not be able to see the issue posted by him.
If we can make issue poster to view their issue regardless of the permission settings of the repo, we can achieve this functionality. Just create a private shadow repo of a public repo, and use api to post an issue in the shadow repo on behalf of the user, they will only be able to see the issue and not anything else. Also they can find their created issues in the dashboard, which seems to be a solution without the need to modify database/indexer schema.
That sounds like a rather complex workaround to me with questionable gain. For example, if someone is part of a team and creates an issue, but is then removed from the team, it would likely be unexpected for others that they still have access to some issues and ongoing (private) conversation because they created the issue.
However, I have an idea that might allow to provide a usable alternative without the complexity of a full-fledged private issue implementation:
This combination would allow to cover at least some use cases like secure reporting of issues:
It would not address the use case of having individual permissions per issue, though.
Just to chime in: excited that this is already being worked on! 😍 It's the current only thing stopping us from moving the remaining repos of our org over from GitLab…
They'll still have access to them, yes (as to all issues they've opened). That's how GitLab deals with that as well.
If there are concerns about that (and I understand there might be cases concern even should be there), maybe a "disown" (aka "ghosting") might be a viable approach (of course only available to repo owners and those having be granted the privilege to do so)? This approach would also cause less complication if a "mixed" repo (containing both, confidential and public issues) is migrated from e.g. GitLab.
Moving issues between repos is of course also something welcome, even to other use-cases (e.g. issues filed in the wrong repo of an organization).
@Gusted wrote in #2 (comment):
A little query about this one, (with the seperate DB table) do you envisage private issues being a separate class altogether, i.e. issues, PRs, and Private issues? with a separate UI?
Or they act like issues in every way in the UI, but in a separate DB table on the back-end? And if this case, the issue jumps between the tables if it is marked or unmarked as private?
@ryanlerch wrote in #2 (comment):
I have no opinion on that, either could work. Separate DB tables is purely from a security standpoint to ensure whatever is designed is secure-by-design and easy to review that it doesn't leak private issues.
I think they should try to behave like normal issues as much as possible. However, if it gets easier to implement if certain things are changed, this might also be fine.
Private pull requests are probably also out of scope, so all pull-request specific code would not have to work with private issues at first.
Hello! Quick introduction, I’m working on private issues on the Fedora end, concentrating at first on the use case where registered users report a private issue via the tracker. I’m new to the code of Forgejo obviously and still have to get my swimming feet in a golang/nodejs environment – I’m coming from Python and while I’m not new to compiled, closer-to-the-machine languages, it’s also been a while and C is a different beast. Because of this background, don’t hesitate to point out if I make what looks like rookie assumptions or mistakes.
I hope this is the right place to discuss implementation, but I’m happy to move it somewhere else.
Anyway, I’ve tried to get my bearings on the codebase and looked at the various discussion threads to get an idea how to approach this. My attempts so far started with a copy of the
Issuetype and an interface to allow functions to deal with either, see my private issues branch for how far I got 😉. Re-reading what @Gusted wrote in #2 (comment), I’m a bit puzzled because my impression was that xorm maps one class/type to one table in the DB, which is why I derivedPrivateIssuefromIssueas a separate type. I think I’d prefer this assumption of mine to be wrong, having one type cater for both would let me get rid of much of the wrapper and boiler plate code in my first shot (I guess). If someone could clear this up to me, that’d be great!@nilsph Yes, you have come to the right place. Given the code around this area is rather sensitive and involved, it might make sense for you to also follow our general suggestion for new contributors to start fixing bugs. Maybe you can find some things that allow you to get familiar to the current design of the issue tracker.
Have you considered the workarounds I suggested in #2 (comment) for the needs of Fedora? I know that they are not perfect, but I think they are somewhat simpler to implement. And moving issues is a feature with high gain anyway.
@fnetX wrote in #2 (comment):
This good advice. I’ll take a look at the “good first issues” and see if I can find something suitable in the area of issue tracking to get my feet wet.
I’ve looked at them and we’ve also discussed this and other workarounds in the Fedora team dealing with Forgejo deployment. To me, the big downside of “separate repo + submission-only issues” is that it precludes interaction with reporters – and we do that regularly in such tickets (I just checked). Also, I feel I have a better idea of what needs to be done about private issues rather than moving issues between repositories.
@nilsph wrote in #2 (comment):
In the Fedora Project, we will need this feature to be properly implemented. While we can implement a workaround for our current deployment. We will need properly implemented private issues for the purposes of tracking distribution artifacts.
Some feedback or design discussion is nicer than dealing with inconsistencies in code review...
@nilsph wrote in #2 (comment):
My comment was purely on the UI if they should be a separate 'unit', for the database you need two separate types. This is the correct approach. You might be able to make it a bit easier for yourself if you implement a equivalent of
// AsUser returns the org as user objectfunc (org *Organization) AsUser() *user_model.User {return (*user_model.User)(org)}@Gusted wrote in #2 (comment):
Thanks for the pointer! Here are my understanding and thoughts:
Organizationis a copy of theUserstruct, withfunc OrgFromUser()andOrganization.AsUser()to cast back and forth. I assume, the latter is used when you need to perform user-like operations on an organization.UserandOrganization“live” in the same DB table, storage/retrieval always happens using theUsertype, with theTypeattribute as a discriminator.IssueandPrivateIssueto be stored in the same DB table – they should end up inissueandprivate_issue, respectively.Table()method, discriminating on anIsPrivateattribute. It seems to me that this makes accidentally accessing private issues difficult, e.g. when naively querying theissuetable or usingdb.GetEngine(ctx).Find(…)with theIssuetype.IsPrivateinto the DB, but I guess we should do it withIssue/PrivateIssuenonetheless, so loading a private issue doesn’t let someone end up with the attribute unset.PrivateIssueID/PrivateIssuepair then?Barring objections, I’ll try to come up with something fairly minimal, say a way to create a private issue through the API, first, so there is something more concrete to discuss.
@nilsph wrote in #2 (comment):
Yes.
@nilsph wrote in #2 (comment):
Ah this is quite interesting, I didn't consider such a approach with XORM and it looks quite nice. Depending on how the code looks in Forgejo this might be a acceptable compromise.
@nilsph wrote in #2 (comment):
Yes-ish, maybe @mfenniak knows something about this?
@Gusted wrote in #2 (comment):
I might say that the field should be
IsPublicso that if this column, for whatever reason, is not loaded then it defaults to being private issues which avoids the risk of code bugs making private issues public.@nilsph wrote in #2 (comment):
I think this would be very messy. It's so common in the code today to use the default implementation of the struct as a way to refer to "the table" of a bean, and having different tables would have really unpredictable behaviour. If you had an
[]Issuepassed it toFind, it would... probably?... operate on theissuetable and not theprivate_issuetable. If you used any of theEnginemethods that take...any, like for exampleDelete(), it would almost certainly only work on one table and the rest would be silent no-ops. The table would need to be registered twice withRegisterModelin order for both tables to appear in thetableslist inmodels/db/engine.go... which might work OK but would be full of "unknowns".I think it's clever, but I don't think it will function well.
@nilsph wrote in #2 (comment):
Yes, I think having two fields
IssueIDandPrivateIssueID(plus the loaded models) would be the appropriate design for a foreign-key reference with the two table design.There are 17 tables with foreign keys to
issue(although most of those are "conceptually foreign keys", not actually implemented as an in-databaseFOREIGN KEYconstraint yet). It kinda makes all of this pretty ugly.I wonder if there might be value in a slightly hybrid design... like a pull request is an
issue, has an record in theissuetable, and also has a foreign key to apull_requestthat contains additional data about the issue... a private issue could also be anissuebut with aprivate_issuetable that has additional data on it. The information inissuecan be some kind of standard placeholder with no confidential information. This has a bit of a weakness because it opens up opportunities for metadata to leak about private issues... but I think the risks are worth weighing against the implementation benefits.@mfenniak wrote in #2 (comment):
I agree, interfacing with the database is way less messy with one type ⇔ one table. I wonder how to best combine that with the
User/Organizationpattern Gusted pointed me to above:IssueandPrivateIssuefor now, both should be and remain identical in internal structure)Issuemethods that don’t touch the DB wrapped through simple wrappersIssuemethods or functions involving issues that touch the DB might need to be reimplemented.Makes sense?
Yes, it does. 🥳
I wonder what data of a private issue would be considered non-sensitive, other than the ID maybe? Even that has a sensitive quality because if there are IDs I can’t access through public issues and PRs, I know there must be a private issue. Also, right now I can’t think of any additional data that a private issue would have over a public one (or vice versa).
I guess we could introduce a
BaseIssuewhich would FK-referenceissue(both issues and PRs for now) orPrivateIssue– this could be enforced via a DB-level constraint only I don’t know if xorm is any help there. Anything that references an issue (public or private) or PR could point to that instead of having separate columns itself. And it would allow untangling PRs from issues structurally later (if desired). I fear that the migration script would have to be somewhat messy though, especially because most FK relations aren’t explicit in the DB yet so we would have to be careful with potentially inconsistent entries.@nilsph wrote in #2 (comment):
I suppose we could use a documented threat model here about what we are trying to protect, and what we are OK with accepting risk on. If we decided that knowing about the existence of a private issue is not an acceptable risk, it would greatly impact the user experience that private issues will have -- (a) they couldn't use the same index that issues/PRs use, (b) that would imply that cross-referencing issues has to have a different format, and (c) it would imply that if there's a capability to convert a private issue to a public issue, then cross-references to the old private issue need to redirect to the new public issue later, adding a new database table.
The existence of a private issue, absent any other details, seems like an acceptable risk to me.
@nilsph wrote in #2 (comment):
There is an advantage to this, as well -- private issues are naturally going to have different capabilities. Each of these foreign keys is an opportunity to ask the question "do we want this feature for a private issue?" Here's the list that I'm aware of:
I supported the idea of using a separate table for private issues, but as I'm documenting some of these things ^^^ I'm feeling like it's probably a bad idea.
I would propose an alternate technical direction:
keying) on fields that reasonably can be foreseen to contain sensitive information. Title, description, contents and filenames of attachments, contents of comments, contents of review diffs... etc.Titlefield of an issue isn't astring, but it's aDecryptableString. ADecryptableStringcan't be used without being evaluated in the context of who is viewing the data with an access control check.(This is rather vague as a direction, but it's a concept.)
A key question coming out of that last comment, just to highlight since it's buried in the details. 🤣
Is the intent to support private pull requests as well? And if so, which would be a great feature, it's a good note that considerable work will need to be done to figure out how to hide the code.
I think that private pull requests are not on the immediate roadmap. If the code doesn't get too complex, I would appreciate if the approach could be flexible enough to be expanded to private PRs in some far future. However, I feel like if this is not trivial, we shouldn't prematurely build a complex architecture for an eventual feature.
@mfenniak wrote in #2 (comment):
I would be skeptical of this, git themselves says that once you put some commits into a public repository there will be several ways to get them despite whatever access control you try to put up. You would need a separate repository (or git objects directory) at minimum.
@Gusted wrote in #2 (comment):
Taking @mfenniak’s comments into account, i.e. having a more defined separation between public and private issues (ignoring whether or not we tidy up the DB structures while we’re at it) than I suggested above, I wonder if we should even persist the attribute.
BTW, this is one place where I really miss the ability to set a default value for a struct field that isn’t the null value of the type of the field – it would allow us to prevent anybody from creating an
IssuewithisPublic == falseor aPrivateIssuewithisPublic == true, which might give us a way to prevent one or the other being stored in the wrong place even if variables are cast back and forth (which as I understand it has to happen in some places unless we want to duplicate the code dealing with issues, which I really don’t want to).@mfenniak wrote in #2 (comment):
Seconded, I just wanted to mention it because this has come up in discussions within the team working on Forgejo in Fedora infrastructure.
I think for many of these, the question "how hard is it to do it right?" and "does it make sense to have for private issues from a user’s perspective?" have different answers. I would argue the latter can be answered in the positive for most (all?) of these items:
IsPull/PullRequestin aPrivateIssue.I mean, it’s probably not super clean, with fields we decide to ignore for private issues for the moment, but it should be effective for the intended purpose: preventing naive DB queries from leaking private issues.
It sounds rather like a more substantial piece of work than two tables, that’s for sure. 😉
I’m not familiar with using encryption in database fields and I don’t understand what you mean by “keying” here, do you have any pointers? I imagine this would make searches difficult and wouldn’t play well with indexes.
@nilsph wrote in #2 (comment):
It's a module in Forgejo that encrypts data using Forgejo's configured secret key + context-specific derived keys. Here's an example of it's usage:
// SetSecret sets the 2FA secret.func (t *TwoFactor) SetSecret(secretString string) {key := keying.DeriveKey(keying.ContextTOTP)t.Secret = key.Encrypt([]byte(secretString), keying.ColumnAndID("secret", t.ID))}// ValidateTOTP validates the provided passcode.func (t *TwoFactor) ValidateTOTP(passcode string) (bool, error) {key := keying.DeriveKey(keying.ContextTOTP)secret, err := key.Decrypt(t.Secret, keying.ColumnAndID("secret", t.ID))if err != nil {return false, err}return totp.Validate(passcode, string(secret)), nil}Yes, this is a good point. You would naturally be limited to not search, sort, or join based upon encrypted fields. It would be important to define exactly what data is expected to be secured in order to see if this pathway is feasible.
Meanwhile I got to a point where (in my fork/branch, caveat spectator) I can create a private issue through the REST API, have it end it up in the right place, return a sensible result (with an asterisk*) from the API and not log any problems. Here are things I discovered in the process and/or didn’t think about before:
I’ve found myself having to set
IsPublicin quite some places that already deal with normal/public issues so that they keep on doing the right thing. I have a feeling it might – counterintuitively and going against what @Gusted advised above – be safer to have anIsPrivatefield instead (that wouldn’t even have to be persisted in the database). I think we can ensure it to have the right value for private issues at only a few boundaries: 1) when converting thePrivateIssueto anIssueand 2) when loading it from the DB. I’ve learned about XORM events in the meantime which look like a good solution for the latter.There is a lot of code dealing with issues which uses the
idprimary key of an issue instead of anIssue, which makes private issues in a different table “interesting”, as suddenly there can be two issues with the sameid. I’m a bit puzzled why these functions don’t use the "index", i.e. the user-visible ID of an issue which is unique between issues and pull requests already. I can see some ways of addressing this:IssueIDstruct which carries this information.<SomeOperation>Optsstructs to avoid proliferation of function parameters – merely addingIsPublic(orIsPrivate) makes some function calls pretty unwieldy.Issuevariables instead (which carry this info).(*) Here’s the asterisk from above:
idfields are used for referencing other DB tables, but why are they exposed through the API? I wondered if the API should return aprivate_idfor private issues, or havingis_publicoris_privateis enough to distinguish. Personally – and realizing I’m probably not familiar enough with the code to be a good judge here – I wouldn’t expose DB primary keys at all, only the user-visible Issue/PR index(/number) …number(in the API) vs.Index(in the code) in the first place?What are your thoughts?
@nilsph wrote in #2 (comment):
I can take a guess, without looking at the code, not every operation operates in the context of a repository. The index is useful if you pass/know the repository being worked on this is not always the case to what I remember.
@nilsph wrote in #2 (comment):
I'm not opposed to this, this would be quite good as it makes functions harder to misuse I'm generally in favor for. The only concern I could raise is performance as pointers are being passed around but this is already commonly done (so most stuff should already be on the heap) and we are not adding more arguments (iirc for x86_64 the limit for passing arguments via registers is 5 in Go) so it should be fine.
@nilsph wrote in #2 (comment):
Github API compatibility (we inherited this).
@nilsph wrote in #2 (comment):
Feel free to go with this approach for now, I trust the first boundary (as it's bounded by Go's type safety). Relying on XORM, I would likely want to read the exact code for this myself as XORM has some funny undocumented behaviors that aren't yet in Forgejo's folklore knowledge and I hate if that broke the safety of private issues. So my only recommendation for this part is: unit tests.
Thanks for your comments!
@Gusted wrote in #2 (comment):
I want to be able to sleep at night, too – rest assured that I’ll go all out on unit tests before submitting changes in earnest, i.e. before any of this makes it out of my private fork/branch or, at the outmost, a WIP pull request.