Allow vic-machine configure to set appropriate roles for ops user#7777
Allow vic-machine configure to set appropriate roles for ops user#7777mdubya66 merged 2 commits intovmware:masterfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## master #7777 +/- ##
==========================================
+ Coverage 25.85% 25.86% +0.01%
==========================================
Files 35 35
Lines 5124 5122 -2
==========================================
Hits 1325 1325
+ Misses 3692 3690 -2
Partials 107 107
Continue to review full report at Codecov.
|
| validator, err := validate.NewValidator(op, c.Data) | ||
| if err != nil { | ||
| op.Errorf("Configuring cannot continue - failed to create validator: %s", err) | ||
| op.Errorf("Configure cannot continue - failed to create validator: %s", err) |
lib/install/management/configure.go
Outdated
| err = d.update(conf, settings, isConfigureOp) | ||
|
|
||
| // If successful try to grant permissions to the ops-user | ||
| // try to grant permissions to the ops-user |
There was a problem hiding this comment.
Minor: Grant permissions to the ops-user before initializing the appliance
lib/migration/feature/feature.go
Outdated
| // create time is stored in nanoseconds (previously seconds) in the portlayer. | ||
| ContainerCreateTimestampVersion | ||
|
|
||
| VMFolderSupportVersion |
There was a problem hiding this comment.
Minor: s/VMFolderSupportVersion/VCHFolderSupportVersion perhaps?
A short comment to describe this field for posterity would be nice to have.
There was a problem hiding this comment.
Yeah I wondered whether it was a catch-all (VM) or exclusive to the VCH. I'll clarify.
1964072 to
3e2b28e
Compare
lib/install/management/configure.go
Outdated
| } | ||
| } | ||
|
|
||
| err = d.update(conf, settings, isConfigureOp) |
There was a problem hiding this comment.
I think the reason that this happens before is that there's no way to rollback opsuser.GrantOpsUserPerms if this fails.
There was a problem hiding this comment.
To summarize this issue:
Today, configure (including upgrade) is organized in the following way:
- A new configuration for the VCH is constructed based on the old configuration
and requested changes. - The new configuration is validated.
- The updated version of any ISO files being changed are uploaded.
- Resource settings are updated, and a defer is registered to undo the change
if subsequent steps fail. - A snapshot is taken.
- The VCH is updated:
a. If running, it is powered off.
b. If volume stores are being added, those are created.
c. The VCH itself is reconfigured.
d. The VCH is powered on.
e. We wait for the VCH to start and the service to be ready. - If the update is successful, we update the operations user.
- If either the update or the operations user update is unsuccessful, we undo:
a. We rollback to the snapshot we took.
b. We delete uploaded images.
c. We delete the snapshot we took.
d. The defer undoes resource settings updates.
This process has a key characteristic: if a failure occurs, we cleanly return to
the initial state.
However, this process also has a key limitation: the process will fail at (6)(d)
or (6)(e) if the operations user requires additional privileges for the basic
operation of the VCH following the configuration change or upgrade.
A simple solution to this might be to reverse the order of steps 6 and 7: update
the operations user before updating the VCH. This will address the limitation,
but will weaken the guarantee around cleanly returning to the initial state when
a failure occurs; it's not clear how to undo the changes to the operations user.
The effects of this are twofold:
- Following the rollback of a change where additional privileges were granted
to the operations user, those additional privileges are left as "cruft". - Following the rollback of a change where privileges were revoked from the
operations user, the VCH will not start.
Options to improve this might include:
- Looking into ways to undo changes to the operations user, perhaps following a
similar pattern to the resource settings. This requires reading the old state
from the system before making changes, including both the role-privilege map
and resource-role associations. - Moving the operations logic in between (6)(c) and (6)(d) instead of between
(5) and (6). This reduces the cases where we will leave cruft behind.
There was a problem hiding this comment.
I am not extremely familiar with this code. But I believe that we must determine what permissions the ops-user has before we begin to assemble the perms that are going to be needed in order to function. If we store that we could defere a configure back to those original permissions. @zjs why not make the ops-user changes before creating the volume-stores as well? They are created by vic-machine, but if we fail to assign the ops-user that will leave 1 less set of things to clean up.
In the event of a failure what is the downside to returning the supplied ops-user to it's original permissions set?
There was a problem hiding this comment.
In the event of a failure what is the downside to returning the supplied ops-user to it's original permissions set?
Returning the operations user to the original permissions set is the desired behavior.
But I believe that we must determine what permissions the ops-user has before we begin to assemble the perms that are going to be needed in order to function. If we store that we could defere a configure back to those original permissions.
This is a possible approach, but actually extracting the "before" state is non-trivial. We need to track the changes being made to the privileges for each role as well as the roles being applied to each resource. Because of the way the vSphere APIs work, it's somewhat complicated to read all of that information. Because we use share the same set of roles for multiple operations users, safely rolling back in the presence of concurrent upgrade operations may not be straight-forward.
why not make the ops-user changes before creating the volume-stores as well? They are created by vic-machine, but if we fail to assign the ops-user that will leave 1 less set of things to clean up.
Currently, the operations user changes is the hardest thing to roll back (now tracked by #7814), so we do those as late in the process as possible.
There was a problem hiding this comment.
Added #7814 (comment) to describe an alternate pattern, which is likely to be less work than tracking the old privileges/roles.
zjs
left a comment
There was a problem hiding this comment.
I think #7777 (review) needs to be resolved before merging this.
|
I've moved the logic that grants the ops user permissions into the |
b39ca1e to
3420762
Compare
| if c.OpsCredentials.IsSet { | ||
| o.Username = n.Username | ||
| o.Token = n.Token | ||
| o.GrantPermsLevel = n.GrantPermsLevel |
There was a problem hiding this comment.
I'm not sure whether this is the behavior we want. (I'm also not sure that it isn't.)
By including o.GrantPermsLevel = n.GrantPermsLevel in this check, if an administrator uses vic-machine configure to change an operations user's password which previously had a GrantPermsLevel of AddPerms without explicitly including --ops-grant-perms in that command we'll actually downgrade GrantPermsLevel to "". I think that would be surprising behavior: change the password (e.g., due to expiration) and no longer have permissions be automatically managed.
It may be better to have a check here along the lines of clic.IsSet("ops-grant-perms") so that we only adjust GrantPermsLevel if --ops-grant-perms or --ops-grant-perms=false is included on the command line. (To do this, you'd just need to pass clic *cli.Context into the copyChangedConf method.) In that case, we only change the properties a user has asked us to change.
A downside of both the current implementation and this suggestion (and therefore an upside of the proposed change) is that if you change to a completely new operations user we won't clear the GrantPermsLevel. This could also be surprising: this is a case where the user might expect us to change something even though they haven't asked us to (because they may see --ops-grant-perms as being associated with the user, not the VCH). Unfortunately, the way configure is designed really doesn't give us a lot of options here (and leads to similar issues for other settings).
There was a problem hiding this comment.
Good catch. There was no clean way to get the name of the option out of the cli flags, nor did there seem to be much reason to declare a constant for it all on its own, but if you find it irksome, I can pull it out as a constant anyway.
zjs
left a comment
There was a problem hiding this comment.
LGTM. Just minor comments on test code.
| Log Govc output: ${output} | ||
| Should Be Equal As Integers ${rc} 1 | ||
| Should Contain ${output} Permission to perform this operation was denied | ||
| Attempt To Create Resource Pool |
There was a problem hiding this comment.
I think this can be Attempt To Disable DRS, can't it?
There was a problem hiding this comment.
Could be, but it was basically a merge of your additions and @anchal-agrawal's additions, so I left both Attempt To Create Resource Pool and Attempt To Disable DRS in there for variety. Still trying to get granted ops-user perms work after upgrade to pass, so I'll clean it up as I go forward.
There was a problem hiding this comment.
Ah, disabling DRS requires less privilege than creating a resource pool. Usually disabling DRS makes a good sanity check that we're not granting the operations user privileges they don't need.
In the specific case that we're using --affinity-vm-group, we currently have to grant the operations user the privilege that lets them disable DRS (as that's the same privilege that lets them manage affinity constructs), so we check creating a resource pool instead.
|
|
||
| Run Privileged Commands | ||
|
|
||
| Cleanup VIC Appliance On Test Server No newline at end of file |
mdubya66
left a comment
There was a problem hiding this comment.
Approved for cherry pick into 1.4
This change fixes a bug in
vic-machine configurethat was preventing a VCH installed without an ops user enabled to be reconfigured to do so. With this change, you can now runvic-machine configureagainst an existing VCH to configure the ops user credentials and permissions.Fixes #7725, #7796