I upgraded from 1.5.1 to 1.5.6, and found I was no longer able to use ansible to provision the VM packer built (on GCP) because ansible didn't have sufficient permissions to create its temporary directory.
I narrowed it down to the change in #8942, and although I understand what I need to do to fix the issue (and I do believe it's my fix to make), I think there should have been something in the documentation that brought me to the fix much sooner.
I'm happy to share my entire discovery process, but here's the slightly shorter version. With the packer-provisioned VM using ssh-keys instead of sshKeys, google_account_daemon now creates all the users defined in the project metadata, rather than just the packer user defined in the instance metadata, as documented here:
If instance-level metadata is set to block project-wide SSH keys or has a deprecated instance-only sshKeys value, the instance ignores all project-wide SSH keys.
I'd configured packer to connect as a packer user, and I'd let ansible do what it does by default, which is to assume it's connecting as me, duvall. Because packer sets up ssh connection sharing with the user I specified, ansible was always logging in as packer, even though it was all geared up to use duvall, as mentioned at the top of the ansible provisioner docs.
Before 1.5.5, when sshKeys was used and the duvall user therefore wasn't created, ansible logged in as packer, tried to expand ~duvall to find out where the user's home directory was, and got back ~duvall unchanged, so it created its temporary files in /home/packer/~duvall/.ansible. Silly, but just fine.
After that change, the duvall user was created. Ansible still logged in as packer, but now expanding ~duvall returned /home/duvall, so it tried to mkdir -p /home/duvall/.ansible/tmp, and failed because it didn't have the permissions.
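To make the before/after difference concrete, here's a rough Python model of the path resolution described above. It is only a sketch and not ansible's actual code: the function names (expand_user, remote_tmp) and the home_dirs mapping are made up for illustration, and it assumes that an unexpandable ~name is left as-is and then treated as a path relative to the login user's home directory.

```python
import posixpath


def expand_user(path, home_dirs):
    """Mimic shell-style tilde expansion against a known set of users.

    home_dirs maps user name -> home directory (hypothetical stand-in for
    the accounts that exist on the VM). "~name/..." is expanded only if
    "name" exists; otherwise the path is returned unchanged. A bare "~/"
    is not handled in this sketch.
    """
    if not path.startswith("~"):
        return path
    name, sep, rest = path[1:].partition("/")
    home = home_dirs.get(name)
    if home is None:
        return path  # unknown user: the tilde stays literal
    return home + sep + rest


def remote_tmp(login_user, ansible_user, home_dirs):
    """Where the temp directory ends up when logged in as login_user."""
    expanded = expand_user(f"~{ansible_user}/.ansible/tmp", home_dirs)
    # A still-relative path lands under the login user's home; an absolute
    # one replaces it entirely (posixpath.join drops the first component).
    return posixpath.join(home_dirs[login_user], expanded)


# Before the change: only the packer user exists on the VM.
before = remote_tmp("packer", "duvall", {"packer": "/home/packer"})

# After the change: duvall is also created from project metadata.
after = remote_tmp(
    "packer", "duvall", {"packer": "/home/packer", "duvall": "/home/duvall"}
)

# When the ansible user matches the login user, the path is writable again.
matched = remote_tmp(
    "packer", "packer", {"packer": "/home/packer", "duvall": "/home/duvall"}
)

print(before)   # /home/packer/~duvall/.ansible/tmp
print(after)    # /home/duvall/.ansible/tmp
print(matched)  # /home/packer/.ansible/tmp
```

In this model the first path is odd but writable by packer, the second belongs to duvall and so isn't, and the third is back under packer's own home.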
There are a couple of fixes. I can change the ansible configuration to set remote_tmp=/tmp, so that packer is guaranteed to be able to write there. Or I can set user to packer in the provisioner configuration. (I could also tell packer to use the duvall user, but I don't want that.)
But I feel like it shouldn't have taken me a day to figure this all out. This is probably a documentation issue, but I'm not sure which docs should be enhanced. The first thing I checked was the changelog, and although #8942 stood out, it wasn't at all clear to me why it would be related: there was no warning there or in the PR that backwards compatibility might be an issue. So I dismissed it until a git bisect landed me on that commit, at which point I had to dig into the sources of packer and the Google Linux guest packages to figure out how all the pieces fit together. I suspect the note at the top of the ansible provisioner page should be enhanced to say that the user ansible uses should match the user packer creates, but ultimately there should also be some docs around the change itself. Maybe simply having this issue linked to the PR is sufficient.