"vagrant box remove" should remove image from libvirt pool
When asking vagrant to remove a libvirt-specific base box, the corresponding image file should be removed from the libvirt pool.
That would avoid cluttering the disk with old image files, and would avoid user confusion when discovering that the box is still present somewhere even though vagrant said it was "removed".
Thanks. i will check it.
+1
Note: If you sudo rm path to the image inside /var/lib/libvirt/images/ you also need to restart libvirtd otherwise when you use vagrant and it tries to put an image there, you'll get an error because it still thinks one is present when it's not.
@purpleidea, you shouldn't have to restart libvirt; a virsh pool-refresh <pool> should be sufficient.
This has been giving me the right-royal shits for the past two days, so I spent some time today faffing around trying to hack it in.
Basically, I don't think it can be done cleanly, within the constraints of the Vagrant system:
- You can't tell libvirt to nuke a box image without being able to talk to libvirt;
- You can't talk to libvirt if you don't know how to talk to libvirt;
- You can't find out how to talk to libvirt without knowing which machine you want to manipulate (because the libvirt connection settings are per-machine);
- When you remove a box, you're not manipulating a machine.
Given the fundamental design of vagrant, which allows pretty much every setting to be overridden on a per-machine basis, I don't think this is ever going to be doable. I've got dodgy code that works in mpalmer@83b7e8b, but it makes the assumption that all libvirt config is being done globally. That works for me, but I don't know if that's a reasonable assumption or not.
@mpalmer good idea about the refresh. Didn't know that exists, but I'll try it next time, thanks!
If there's no clean solution to remove the image on box removal, maybe a bug can be opened for vagrant proper, and explain the problem.
Yes, getting feedback from vagrant developers may be a good idea.
Any progress on this? I just wasted a day debugging an issue with a box, only to finally realise that I had not actually been testing the new versions of the box I built: the vagrant box remove didn't remove the old version, rendering meaning that vagrant box add of the new version was effectively a no-op through not updating the existing image in /var/lib/libvirt/images ...
@mitchellh - are you aware of this problem?
I haven't checked into how we would do it, but perhaps we could hook into vagrant box remove and when it's removing a libvirt box print a warning about this. The warning could contain the virsh commands needed to remove the box's volume if you're using the defaults (connecting to libvirt on localhost and using the storage pool named default) and instructions to modify the commands if you're not.
Hmm, hopefully we can do better than that.
The fact that vagrant-libvirt is already able to upload an image to the storage pool without any of these issues may yield a clue to a possible clean solution. IIUC this upload is possible because it happens during vagrant up, not during vagrant box add: the latter does not have any Vagrantfile providing libvirt connection config, whereas the former does. So equally, couldn't we extend the provider so that vagrant destroy removes the image, in the case where there are no more Vagrant VMs referencing it?
I like the idea, but I'm leery of doing it by default. I think a lot of people using vagrant for testing frequently run vagrant destroy followed by vagrant up. It would be annoying to have to wait for the box to be uploaded each time you ran vagrant up.
Adding a flag (vagrant destroy --remove-box ?) to optionally do it seems feasible.
To make this discoverable, we could still try to add a warning message to vagrant box remove. The message could tell people to navigate to vagrant projects where they used the box and run vagrant destroy with the new flag.
To throw out a half-baked idea:
Since what seems to bite people is unexpectedly using an outdated box, I wonder if we could find a way to check if the base box loaded into libvirt differs from what is in vagrant's box store. If we don't have a good way of doing that now, we could change the box metadata and/or how we name the box in libvirt to make it feasible. Vagrant itself has a concept of box versions that we could potentially find a way to utilize, or we could look at something file-based like modification date, size, checksum. E.G. we add the sha256sum of the disk image to metadata.json, and we make that part of the volume name when uploading it to libvirt.
I think vagrant box remove should cause removal of the box from /var/lib/libvirt/images/ In addition, copying it in could make use of a symlink (not recommended) or a reflink (recommended, where supported) because as it turns outs:
$ pwd; sha1sum box.img /home/james/.vagrant.d/boxes/centos-6/0/libvirt 1116b0b4fb3ac95728f403d26fabe6e6fd6180e4 box.img # pwd; sha1sum centos-6_vagrant_box_image.img /var/lib/libvirt/images 1116b0b4fb3ac95728f403d26fabe6e6fd6180e4 centos-6_vagrant_box_image.img
The boxes are identical, so we're also unnecessarily duplicating data.
@purpleidea I'm hesitant to go around libvirt and work directly on the filesystem, because then we don't have any safeguards preventing us from removing the base box if there are still VMs running based on it.
Working directly with /var/lib/libvirt/images also breaks if you're storing boxes in another storage pool, connecting to libvirt on another machine, etc. We're in an uncomfortable place where we want to have the same behavior as vagrant with virtualbox, but like mpalmer said earlier we've made vagrant-libvirt too configurable to do this.
@sciurus I agree... I think we should use the libvirt mechanism to remove the file from /var/lib/libvirt/images/...
I agree that removing the image from libvirt upon vagrant destroy is absolutely not what users expect so we shouldn't be doing that. the place where it would belong best would be upon vagrant box remove. iiuc vagrant doesn't provide enough hooks for providers to clean things up their own way during that action?
@lelutin the problem is knowing where to clean up.
When you run vagrant box add the box is placed in ~/.vagrant.d/boxes.
Later you run vagrant up in the context of a Vagrantfile, which tells you how to connect to libvirt (either by specifying the various provider options or leaving them out to use the defaults). If the box hasn't been uploaded to libvirt yet, this is done, otherwise the existing verion is uses.
I could run vagrant box add, then run vagrant up with two Vagrantfiles that upload the same box to qemu:///system and qemu+ssh://example.com/system.
Now when I run vagrant box remove vagrant knows it needs to delete it from ~/.vagrant.d/boxes, but how does it know to connect to qemu:///system and qemu+ssh://example.com/system to remove it from there?
Only after writing the above does it occur to me, why does a Vagrantfile have to be the only place where information about how to connect to libvirt is stored? What if there was a file somewhere in ~/.vagrant.d for each box that recorded the libvirt URIs to which that box had been uploaded? We could update this when needed during vagrant up and reference this during vagrant box remove.
What do you think of the idea @mpalmer?
@sciurus That idea has a lot of merit to it. I'm pretty sure Vagrant won't let you remove a box that's in use anywhere within its purview, and if you happen to have used the image for a non-Vagrant-managed VM, libvirt will have something to say about it, I believe (so the code would need to catch that error condition and warn the user that the image wasn't removed from
Only after writing the above does it occur to me, why does a Vagrantfile have to be the only place where information about how to connect to libvirt is stored?
Excellent point!
What if there was a file somewhere in
~/.vagrant.dfor each box that recorded the libvirt URIs to which that box had been uploaded? We could update this when needed duringvagrant upand reference this duringvagrant box remove.
I agree with @mpalmer - this sounds like a great idea! There could be a new directory ~/.vagrant.d/boxes/$BOX_NAME/$VERSION/libvirt/hypervisors.d, with one entry per libvirt hypervisor containing a copy of the image.
And it would be used not only by vagrant box remove, but also by vagrant box add --force when replacing an existing box.
Another possibility would be to do it the other way around, i.e. rather than record the URI after the upload, configure the URI prior to upload, e.g.:
- Allow
libvirtURIs to be registered under~/.vagrant.d/libvirt/hosts.d, where the filename is the URI descriptor and the file contains the URI itself (and any other connection configuration which may be needed) - Populate
~/.vagrant.d/libvirt/hosts.d/localhostautomatically - Extend
vagrant box add/removewith a--hostoption which can reference any file in that directory but defaults tolocalhost - Allow
Vagrantfiles to reference the same URI descriptor
So by default, the user would need no extra work in order to use libvirt on localhost.
I guess the main differences with your approach are:
- Pro: Allows updating/removal of a box in/from individual hypervisors
- Pro: Extra layer of indirection allows
Vagrantfiles to use a different hypervisor URI without modification (a bit likegit'sinsteadOffeature allows you to redirect to a different repo without changing thegit-remoteconfiguration) - Con: Maybe slightly more demanding on the user when remote hypervisors are in use, since they need to add the file to
hosts.dfirst.
@aspiers your second idea is interesting.
Would we pull all of the provider options (driver, host, etc) from the Vagrantfile and store them in hosts.d instead?
When running vagrant up, would the user specify the --host to choose where to create the VM?
Would we pull all of the provider options (driver, host, etc) from the Vagrantfile and store them in hosts.d instead?
That's not what I had in mind; when I wrote "Allow libvirt URIs to be registered", I meant manually (except the localhost case, which would be prepopulated out of the box). BTW does Vagrant's plugin framework allow plugins to add new subcommands? E.g. it would be nice to support stuff like
vagrant libvirt host add kvm1 --uri qemu+ssh://[email protected]/system
rather than requiring the user to edit config files under hosts.d directly.
When running vagrant up, would the user specify the --host to choose where to create the VM?
Yeah, that's one way it could work, e.g. --host kvm1 would obtain the hypervisor's URI (and any other info needed) from ~/.vagrant.d/libvirt/hosts.d/kvm1. But not just vagrant up! It would also work for vagrant box add/remove, which is why it's a solution to this issue.
It could also be made possible to specify which host to use via a directive in the Vagrantfile. Support for the existing provider options should be kept for backwards-compatibility; in fact, they could override any settings configured via the hosts.d file (which would default to localhost if --host wasn't specified).
So for example, if ~/.vagrant.d/libvirt/hosts.d/kvm1 contained:
host=kvm1.mydomain.com
connect_via_ssh=true
username=root
and Vagrantfile contained:
config.vm.provider :libvirt do |libvirt|
libvirt.id_ssh_key_file = "my-ssh-key.dsa"
username=libvirt
end
and you typed vagrant up --host kvm1 or vagrant box remove foo --host kvm1, it would use the following configuration:
host=kvm1.mydomain.com
connect_via_ssh=true
libvirt.id_ssh_key_file = "my-ssh-key.dsa"
username=libvirt
@aspiers sounds like we have the same understanding of how your proposal would work. Yes, we can add new subcommands.
Both my idea and @aspiers refinement are a bigger project than I'm planning to bite off, but I'm happy to provide testing and feedback to anyone who works on this.
Extend vagrant box add/remove with a --host option which can reference any file in that directory but defaults to localhost
Related to this, I am intrigued by the possibility of no longer storing boxes in ~/.vagrant.d/boxes/ at all but instead having vagrant box always operate directly on a libvirt storage pool.
I could run
vagrant box add, then runvagrant upwith two Vagrantfiles that upload the same box toqemu:///systemandqemu+ssh://example.com/system.
This usecase seems a bit out of scope for what vagrant was intended to be. @sciurus I would be interested in a more detailed example to understand when you need this.
Related to this, I am intrigued by the possibility of no longer storing boxes in
~/.vagrant.d/boxes/at all but instead having vagrant box always operate directly on a libvirt storage pool.
I like this approach. As purpleidea stated, saving the image twice seems like an inefficient way to go. Most users will not use multiple and/or remote hypervisors.
Related to this, I am intrigued by the possibility of no longer storing boxes in ~/.vagrant.d/boxes/ at all but instead having vagrant box always operate directly on a libvirt storage pool. I like this approach. As purpleidea stated, saving the image twice seems like an inefficient way to go.
True, it's a nice thought, but ...
Most users will not use multiple and/or remote hypervisors.
In other words, some users will use multiple hypervisors, in which case at least one is remote. And the problem is that Vagrant doesn't support a model where boxes can be stored in different locations. You'd need to extend commands like vagrant box list so that they would support a --hypervisor option or similar, but only for the libvirt provider. And I'm not sure if that's even possible or something Mitchell would want to entertain, because it's a fundamentally different model. Currently box storage is flat, and local.
@dennisklein this isn't about something I need. In that comment I was just trying to explain what the current design allows and how it makes pool cleanup difficult.
If we were going by what I need we would rip out support for remote libvirt altogether. :smile:
I think what we need here is a two pronged approach:
- store information about which libvirt URIs have been uploaded to and query the user (y/n) if they'd like to remove them from libvirt on
vagrant box remove. - upload the box to the libvirt storage pool based on hash+name
- since versioning isn't supported for boxes that don't get pulled from atlas #294 doesn't help us in all scenarios (see this mail list thread)
- compute a hash (one time upon vagrant box add) and name the volume in the libvirt storage pool accordingly
- so if a box was originally going to be named 'cdkv2_vagrant_box_image_0.img' in the pool it could now be named cdkv2_vagrant_box_image_0_fb28e33f098abaf356940fdb3aaae417.img
Does this seem reasonable?
Store information about which libvirt URIs have been uploaded to and query the user (y/n) if they'd like to remove them from libvirt on vagrant box remove.
I would be happy if we can just fix this as our first attempt.
I can see some interesting discussion in the issue, but we do not have a solution yet. I think we have increased the expected fix's scope more than what can be done by a contributor in a reasonable time frame.
I would be happy if we can just fix this as our first attempt.
@LalatenduMohanty I think we need both. It's possible that the user will either answer "no" to removing from the libvirt provider or that a libvirt host is down during the time of the vagrant box remove. Identifying by content hash would mitigate this corner case as well.
@dustymabe @LalatenduMohanty Check out sections 4 and 5 here: https://github.com/hollodotme/Helpers/blob/master/Tutorials/vagrant/self-hosted-vagrant-boxes-with-versioning.md It should be possible host w/ versioning. I did I quick test locally, it does seem to work...