Adding Groups/Albums #27

@crspybits

Motivation: Currently, a user can share files with a fixed set of users. There is no means for a user to share files with multiple sets of users. For example, suppose in the SharedImages app that I want to share one group of images with LouisianaBuddies (one set of users) and another group of images with Family (with my wife). Currently, I cannot do this with a single user account.

Proposal: A user will be able to belong to multiple named groups. A file will also be able to belong to multiple named groups. The rationale for having a file belong to multiple groups comes from the SharedImages app-- a user will want to be able to share an image with buddies and family. Any given file will be owned by a single user. People and files are the entities. Groups are relationships linking people and files.

Implementation Details: This is going to require the addition of three more MySQL tables:
Groups: Describes groups on the server, e.g., the GroupId, the name of the group, and policy information (see below).
GroupUsers: Describes the users that are members of particular groups, and the owning/sharing role they play (see below).
GroupFiles: Describes the files that are members of particular groups.
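
To make the three tables concrete, here is a minimal sketch of what they might look like. The column names (`groupId`, `ownershipPolicy`, `role`, `fileUUID`) are assumptions for illustration, not a decided schema, and SQLite stands in for MySQL only so that the sketch runs without a server.

```python
import sqlite3

# Hypothetical schema: column names are illustrative assumptions.
# "Groups" is quoted because GROUPS is a keyword in some SQL dialects.
SCHEMA = '''
CREATE TABLE "Groups" (
    groupId INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    ownershipPolicy TEXT NOT NULL      -- policy information for the group
);
CREATE TABLE GroupUsers (
    groupId INTEGER NOT NULL REFERENCES "Groups"(groupId),
    userId INTEGER NOT NULL,
    role TEXT NOT NULL CHECK (role IN ('owning', 'sharing')),
    PRIMARY KEY (groupId, userId)
);
CREATE TABLE GroupFiles (
    groupId INTEGER NOT NULL REFERENCES "Groups"(groupId),
    fileUUID TEXT NOT NULL,
    PRIMARY KEY (groupId, fileUUID)
);
'''


def make_db():
    """Create an in-memory database containing the three sketched tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def groups_for_file(db, file_uuid):
    """Return the names of all groups a file belongs to, sorted by name."""
    rows = db.execute(
        'SELECT g.name FROM "Groups" g '
        'JOIN GroupFiles gf ON g.groupId = gf.groupId '
        'WHERE gf.fileUUID = ? ORDER BY g.name',
        (file_uuid,),
    )
    return [row[0] for row in rows]
```

The composite primary keys reflect the idea that groups are relationships: a user or a file appears at most once per group, but may appear in many groups.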

Changes to some tables will be required:
FileIndex: Must indicate whether a file is read-only or read-write (see below). (Note that this change will only have meaning once we implement multiple file versions).
Upload: Same change.

Some changes to existing endpoints would be required:

  1. UploadFile-- now you must also indicate the initial groups to which the file will belong. The user doing the uploading must belong to all the groups. You also have to indicate if the file is read-only or read-write, if it belongs to multiple groups.
  2. Sharing Invitation Creation -- now you must also indicate the group that is being shared.
  3. FileIndex: You must indicate a group, and you will receive the list of files that are in that group. Since files can be in multiple groups, each file can indicate that it is in multiple groups. This will also return the master versions for each of these groups (see below).
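
The membership rule for the changed UploadFile endpoint could be checked roughly as follows. This is only a sketch: `UploadRequest`, `validate_upload`, and the `memberships` mapping are hypothetical names, and rejecting an empty group list is an assumption rather than something decided above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadRequest:
    """Hypothetical shape of the extra UploadFile parameters."""
    user_id: str
    file_uuid: str
    group_ids: frozenset   # initial groups for the file
    read_only: bool        # read-only vs. read-write flag


def validate_upload(request, memberships):
    """Raise if the uploading user is not in every requested group.

    memberships maps a user id to the set of group ids the user belongs to.
    """
    if not request.group_ids:
        # Assumption: an upload must name at least one initial group.
        raise ValueError("an upload must name at least one initial group")
    user_groups = memberships.get(request.user_id, set())
    missing = request.group_ids - user_groups
    if missing:
        raise PermissionError(f"user is not a member of: {sorted(missing)}")
```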

New server endpoints will be required:

  1. CreateGroup: Enables a single user to create a new group. This will return an identifier for the new group. Question: Must the user creating the group have cloud storage? i.e., be capable of being an owning user?
  2. DeleteGroup: Enables a user to remove a group. Question: Must this user be the same as the one that created the group?
  3. UserGroupIndex: List the groups to which the user belongs.
  4. Get attributes of sharing invitation: Before accepting a sharing invitation it seems reasonable to present the user with some attributes of the group. E.g., the group sharing policy (see below). We could also add into this the permissions that are being extended with the invitation.
  5. AddFileToGroups: Add a file to additional group(s). Again, the user calling this must belong to the group(s).
  6. RemoveFileFromGroups: Remove the file from one or more groups. Note that we could instead use our current UploadDeletion to handle this. And an UploadDeletion from all groups could cause the file to be finally deleted.
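
The idea that removing a file from its last group finally deletes it can be sketched like this. The function name and the in-memory `file_groups` mapping are assumptions for illustration; the real server would do this against the GroupFiles table.

```python
def remove_file_from_groups(file_groups, file_uuid, groups_to_leave):
    """Remove a file from some groups; delete it once no group remains.

    file_groups maps a file UUID to the set of group ids it belongs to.
    Returns True if the file ended up fully deleted, False otherwise.
    """
    remaining = file_groups[file_uuid] - set(groups_to_leave)
    if remaining:
        file_groups[file_uuid] = remaining
        return False
    # The file no longer belongs to any group, so it is finally deleted.
    del file_groups[file_uuid]
    return True
```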

Considerations of master version: While it seems important to allow a file to belong to multiple groups (e.g., the SharedImages example of sharing an image with two groups of people), this raises problems with synchronization. First, let's consider what the master version will look like after this change. The master version is used by the SyncServer to enable optimistic synchronization. A particular state of a set of files is indicated by a specific master version. A client indicates its assumption about the state of those files by providing a specific master version value to the server. Part of the utility of the master version comes from its relatively infrequent likelihood of change while a given client is making its changes. That is, we've been assuming a relatively low frequency of changes to the files encompassed by the master version.

Right now master versions are maintained per owning user. But that's also confounded with a group. That is, right now, an owning user is the group. An apparently natural extension for the change to using groups is to now have master versions indicate the state of a set of files within a group. When a file is added to or removed from a group, or a file within the group changes (after we get multiple file versions added to SyncServer), the master version for that group would change.

Suppose that we allow files to belong to multiple groups, without any restrictions. This could have a significant negative impact on the utility of the master version. Consider an example in the SharedImages app, in the context of a further extension to the system to allow multiple file versions (i.e., the ability to change files) and a change to the SharedImages app to allow discussion threads. Suppose there is a popular image. That popular image might come to be shared across all groups. With such a popular image there could be a relatively large number of people participating in the discussion thread associated with that image. That discussion thread, as I've been planning, would be stored in a single file. Thus, there would be a relatively large number of changes over time to that file. Now, given that master versions will represent the state of files in a group, and given that the popular file is a member of all groups, every time that file changes, every master version would change. With a sufficiently high frequency of changes to that file, this would eliminate the utility of the master version. There would be a high likelihood that, while any particular client had a specific value for the master version, when the client reported that master version back to the server, the master version would have already changed. Thus the client would be informed that its assumptions about the state of data on the server were incorrect. And the client would have to retry its operation.

It seems clear that some restrictions are needed on the ability of files to belong to multiple groups. We plan on the following:

  1. If a file is read-only, the file can belong to any number of groups. This would require an additional meta-data flag per file on the server to mark some files as read-only (easy right now, given that all files are single versioned-- and thus read-only). This also requires the client to indicate, at least when sharing a file, whether or not the file is read-only.
  2. If a file is read-write, then it seems reasonable to place a restriction on the number of groups it can be shared with. E.g., sharing across 2 groups. This could be a configuration-time limitation for the server-- i.e., any given person deploying the server could decide on this limit.
  3. Of course, files could be "shared" across groups simply by copying. In this manner, there would be no difficulty for the master versions across groups. From a client's perspective, this would require uploading the same file multiple times.
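
The first two restrictions amount to a small check at the point where a file is placed in groups. A minimal sketch, assuming a hypothetical server configuration constant `MAX_GROUPS_FOR_READ_WRITE` (the value 2 is just the example figure used above):

```python
# Hypothetical deployment-time setting; each server deployer would choose it.
MAX_GROUPS_FOR_READ_WRITE = 2


def check_group_limit(read_only, group_ids,
                      max_rw_groups=MAX_GROUPS_FOR_READ_WRITE):
    """Return True if the file may belong to all of the given groups."""
    if read_only:
        # Read-only files never change, so they cannot churn master versions.
        return True
    return len(set(group_ids)) <= max_rw_groups
```

The same check would apply both to the initial groups given on upload and to later additions of a file to further groups.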

It seems clear that the DoneUploads endpoint call will need to operate across groups. One clear cross-group operation is when a file is uploaded and that file belongs to multiple groups. Thus, the server will need to update multiple master versions. Another case of this is when you upload multiple files, each belonging to separate groups, and then call DoneUploads. We could potentially restrict this latter capability of DoneUploads, but it seems capricious given that we will need to deal with the case of a single file belonging to multiple groups.

How complicated will it be to update these new master versions? Previously, when a DoneUploads was carried out, a single master version was incremented. Now, if read-write files are in the upload, master versions will need to be incremented for all of the groups to which those files belong. E.g., suppose read-write File F1 was uploaded, and it's in groups G1, G2, ..., GN. Then master versions for each of these N groups would need to be incremented. Of course, we are putting a limit on N (e.g., 2), so this won't grow without bound. This incrementing of N master versions would need to be done in an atomic manner on the server.
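
One way to get the atomic increment is to wrap all N updates in a single database transaction. The sketch below assumes a hypothetical `MasterVersion` table keyed by group, and uses SQLite in place of MySQL so it can run standalone; it is not the server's actual code.

```python
import sqlite3


def make_db(group_ids):
    """Create a hypothetical per-group MasterVersion table, all at version 0."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE MasterVersion "
        "(groupId TEXT PRIMARY KEY, version INTEGER NOT NULL)"
    )
    db.executemany(
        "INSERT INTO MasterVersion VALUES (?, 0)", [(g,) for g in group_ids]
    )
    db.commit()
    return db


def increment_master_versions(db, group_ids):
    """Increment every listed group's master version, or none of them."""
    with db:  # commits on success, rolls back if an exception is raised
        for group_id in sorted(set(group_ids)):
            cursor = db.execute(
                "UPDATE MasterVersion SET version = version + 1 "
                "WHERE groupId = ?",
                (group_id,),
            )
            if cursor.rowcount != 1:
                raise KeyError(f"unknown group: {group_id}")


def versions(db):
    """Return the current master versions as a {groupId: version} dict."""
    return dict(db.execute("SELECT groupId, version FROM MasterVersion"))
```

Updating the groups in a fixed (sorted) order is a common precaution against deadlocks when two requests touch overlapping sets of rows.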

How would a client be notified about a master version change? Prior to the group idea, a client is informed of a master version change on an UploadFile, UploadDeletion, or DoneUploads. On an UploadFile, this will remain the same. The UploadFile operation would be rejected if any of the master versions of the groups to which the file belongs have changed (this is true for both read-only and read-write file uploads-- it applies to read-only because the only time such a file is uploaded is for version 0, i.e., the first upload). A client would know this, and so would be best off limiting the total number of groups across the files in a given upload. The same can be done for an UploadDeletion. The client would have to have possession of a set of master versions, i.e., the master version for each of the groups that a file belonged to. For a DoneUploads, the client would have to provide master versions for all of the file/groups in the upload operations done previously.
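
The check on DoneUploads could look roughly like this. It is a sketch under assumptions: master versions are held in a plain dict rather than the database, `done_uploads` and `stale_groups` are hypothetical names, and every group touched by the upload has its master version bumped on success.

```python
def stale_groups(client_versions, server_versions, groups_touched):
    """Return the groups whose master version the client has wrong or omitted."""
    return sorted(
        g for g in set(groups_touched)
        if client_versions.get(g) != server_versions[g]
    )


def done_uploads(client_versions, server_versions, uploads):
    """Accept or reject a batch of uploads based on per-group master versions.

    uploads maps each uploaded file UUID to the groups it belongs to.
    Returns (accepted, stale): on rejection nothing is changed and stale
    lists the groups the client must refresh before retrying.
    """
    touched = set().union(*uploads.values())
    stale = stale_groups(client_versions, server_versions, touched)
    if stale:
        return False, stale
    for group_id in touched:
        server_versions[group_id] += 1
    return True, []
```

On rejection the client would refetch the index for the stale groups and retry, which is why an upload spanning few groups has a better chance of succeeding on the first attempt.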

How would a client receive the master version values? Currently, a client receives the current master version value (for the user) by doing a FileIndex endpoint call. The new FileIndex call will be relative to a specific group. However, each of the files in the group can also belong to other groups. So, we'll need to return, with each file, the list of groups to which it belongs.
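
A sketch of what the per-group FileIndex response might contain. The response keys (`files`, `groups`, `masterVersions`) and the in-memory inputs are assumptions for illustration only.

```python
def file_index(group_id, group_files, master_versions):
    """Build a hypothetical FileIndex response for one group.

    group_files maps a group id to the set of file UUIDs in that group;
    master_versions maps a group id to its current master version.
    """
    files = []
    for file_uuid in sorted(group_files[group_id]):
        # Every group this file belongs to, not just the one requested.
        groups = sorted(
            g for g, members in group_files.items() if file_uuid in members
        )
        files.append({"fileUUID": file_uuid, "groups": groups})
    # Master versions for every group mentioned in the response.
    relevant = sorted({g for f in files for g in f["groups"]})
    return {
        "files": files,
        "masterVersions": {g: master_versions[g] for g in relevant},
    }
```

Returning the master versions of the other groups alongside the file list gives the client the full set it needs to present on a later UploadFile, UploadDeletion, or DoneUploads for those files.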

There is a policy option that must be decided when creating a group: Who will the owners of the data be?

  1. One policy option is as we have now: A single user owns all of the data.
  2. Another policy option is for any user who has cloud storage to own the data they share.

This latter option raises the question of what to do with purely sharing users or users with cloud storage who decide not to use their cloud storage. How can they share files? It seems like such purely sharing users would need to establish an agreement with at least one owning user who would serve as the host for their data. How would such an agreement be created in terms of UI and endpoints? It seems that the application UI would need to present requests for hosting by sharing users. Alternatively, when you join a group as an owning user, you could indicate whether or not you are willing to host sharing users' files.

With the advent of these groups, a single user may not just be an owning user or a sharing user. For example, a given Google user could be a member of two groups, owning files in one group, and only sharing files in another group. So, it seems that a user's owning/non-owning status is decided by the group policy-- and will be an attribute of the Groups and GroupUsers tables.
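
In other words, the role moves from the user to the (user, group) pair. A minimal sketch, with hypothetical names throughout:

```python
from enum import Enum


class Role(Enum):
    OWNING = "owning"
    SHARING = "sharing"


def role_in(memberships, user_id, group_id):
    """Return the user's role in a group, or None if not a member.

    memberships maps (user_id, group_id) pairs to a Role, mirroring a
    role column on the GroupUsers table.
    """
    return memberships.get((user_id, group_id))


def owns_files_in(memberships, user_id, group_id):
    """True if the user's files in this group live in their own cloud storage."""
    return role_in(memberships, user_id, group_id) is Role.OWNING
```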
