vagrant-libvirt "vagrant box remove" should remove image from libvirt pool

When asking vagrant to remove a libvirt-specific base box, the corresponding image file should be removed from the libvirt pool.

That would avoid cluttering the disk with old image files, and would avoid user confusion when discovering that the box is still present somewhere even though vagrant said it was "removed".

Nov 11 '13 04:11 lelutin

Thanks. i will check it.

Nov 11 '13 07:11 pronix

+1

Note: If you sudo rm path to the image inside /var/lib/libvirt/images/ you also need to restart libvirtd otherwise when you use vagrant and it tries to put an image there, you'll get an error because it still thinks one is present when it's not.

Jan 04 '14 20:01 purpleidea

@purpleidea, you shouldn't have to restart libvirt; a virsh pool-refresh <pool> should be sufficient.

Jan 20 '14 02:01 mpalmer

This has been giving me the right-royal shits for the past two days, so I spent some time today faffing around trying to hack it in.

Basically, I don't think it can be done cleanly, within the constraints of the Vagrant system:

You can't tell libvirt to nuke a box image without being able to talk to libvirt;
You can't talk to libvirt if you don't know how to talk to libvirt;
You can't find out how to talk to libvirt without knowing which machine you want to manipulate (because the libvirt connection settings are per-machine);
When you remove a box, you're not manipulating a machine.

Given the fundamental design of vagrant, which allows pretty much every setting to be overridden on a per-machine basis, I don't think this is ever going to be doable. I've got dodgy code that works in mpalmer@83b7e8b, but it makes the assumption that all libvirt config is being done globally. That works for me, but I don't know if that's a reasonable assumption or not.

Jan 20 '14 04:01 mpalmer

@mpalmer good idea about the refresh. Didn't know that exists, but I'll try it next time, thanks!

If there's no clean solution to remove the image on box removal, maybe a bug can be opened for vagrant proper, and explain the problem.

Jan 20 '14 07:01 purpleidea

Yes, getting feedback from vagrant developers may be a good idea.

Jan 20 '14 16:01 sciurus

Any progress on this? I just wasted a day debugging an issue with a box, only to finally realise that I had not actually been testing the new versions of the box I built: the vagrant box remove didn't remove the old version, rendering meaning that vagrant box add of the new version was effectively a no-op through not updating the existing image in /var/lib/libvirt/images ...

@mitchellh - are you aware of this problem?

Sep 12 '14 15:09 aspiers

I haven't checked into how we would do it, but perhaps we could hook into vagrant box remove and when it's removing a libvirt box print a warning about this. The warning could contain the virsh commands needed to remove the box's volume if you're using the defaults (connecting to libvirt on localhost and using the storage pool named default) and instructions to modify the commands if you're not.

Sep 12 '14 18:09 sciurus

Hmm, hopefully we can do better than that.

The fact that vagrant-libvirt is already able to upload an image to the storage pool without any of these issues may yield a clue to a possible clean solution. IIUC this upload is possible because it happens during vagrant up, not during vagrant box add: the latter does not have any Vagrantfile providing libvirt connection config, whereas the former does. So equally, couldn't we extend the provider so that vagrant destroy removes the image, in the case where there are no more Vagrant VMs referencing it?

Sep 21 '14 22:09 aspiers

I like the idea, but I'm leery of doing it by default. I think a lot of people using vagrant for testing frequently run vagrant destroy followed by vagrant up. It would be annoying to have to wait for the box to be uploaded each time you ran vagrant up.

Adding a flag (vagrant destroy --remove-box ?) to optionally do it seems feasible.

To make this discoverable, we could still try to add a warning message to vagrant box remove. The message could tell people to navigate to vagrant projects where they used the box and run vagrant destroy with the new flag.

Sep 22 '14 02:09 sciurus

To throw out a half-baked idea:

Since what seems to bite people is unexpectedly using an outdated box, I wonder if we could find a way to check if the base box loaded into libvirt differs from what is in vagrant's box store. If we don't have a good way of doing that now, we could change the box metadata and/or how we name the box in libvirt to make it feasible. Vagrant itself has a concept of box versions that we could potentially find a way to utilize, or we could look at something file-based like modification date, size, checksum. E.G. we add the sha256sum of the disk image to metadata.json, and we make that part of the volume name when uploading it to libvirt.

Sep 22 '14 03:09 sciurus

I think vagrant box remove should cause removal of the box from /var/lib/libvirt/images/ In addition, copying it in could make use of a symlink (not recommended) or a reflink (recommended, where supported) because as it turns outs:

$ pwd; sha1sum box.img 
/home/james/.vagrant.d/boxes/centos-6/0/libvirt
1116b0b4fb3ac95728f403d26fabe6e6fd6180e4  box.img
# pwd; sha1sum centos-6_vagrant_box_image.img 
/var/lib/libvirt/images
1116b0b4fb3ac95728f403d26fabe6e6fd6180e4  centos-6_vagrant_box_image.img

The boxes are identical, so we're also unnecessarily duplicating data.

Sep 22 '14 04:09 purpleidea

@purpleidea I'm hesitant to go around libvirt and work directly on the filesystem, because then we don't have any safeguards preventing us from removing the base box if there are still VMs running based on it.

Working directly with /var/lib/libvirt/images also breaks if you're storing boxes in another storage pool, connecting to libvirt on another machine, etc. We're in an uncomfortable place where we want to have the same behavior as vagrant with virtualbox, but like mpalmer said earlier we've made vagrant-libvirt too configurable to do this.

Sep 22 '14 14:09 sciurus

@sciurus I agree... I think we should use the libvirt mechanism to remove the file from /var/lib/libvirt/images/...

Sep 22 '14 16:09 purpleidea

I agree that removing the image from libvirt upon vagrant destroy is absolutely not what users expect so we shouldn't be doing that. the place where it would belong best would be upon vagrant box remove. iiuc vagrant doesn't provide enough hooks for providers to clean things up their own way during that action?

Sep 22 '14 21:09 lelutin

@lelutin the problem is knowing where to clean up.

When you run vagrant box add the box is placed in ~/.vagrant.d/boxes.

Later you run vagrant up in the context of a Vagrantfile, which tells you how to connect to libvirt (either by specifying the various provider options or leaving them out to use the defaults). If the box hasn't been uploaded to libvirt yet, this is done, otherwise the existing verion is uses.

I could run vagrant box add, then run vagrant up with two Vagrantfiles that upload the same box to qemu:///system and qemu+ssh://example.com/system.

Now when I run vagrant box remove vagrant knows it needs to delete it from ~/.vagrant.d/boxes, but how does it know to connect to qemu:///system and qemu+ssh://example.com/system to remove it from there?

Sep 22 '14 22:09 sciurus

Only after writing the above does it occur to me, why does a Vagrantfile have to be the only place where information about how to connect to libvirt is stored? What if there was a file somewhere in ~/.vagrant.d for each box that recorded the libvirt URIs to which that box had been uploaded? We could update this when needed during vagrant up and reference this during vagrant box remove.

What do you think of the idea @mpalmer?

Sep 22 '14 23:09 sciurus

@sciurus That idea has a lot of merit to it. I'm pretty sure Vagrant won't let you remove a box that's in use anywhere within its purview, and if you happen to have used the image for a non-Vagrant-managed VM, libvirt will have something to say about it, I believe (so the code would need to catch that error condition and warn the user that the image wasn't removed from location). Looks like a winner to me. Nice job thinking of it!

Sep 22 '14 23:09 mpalmer

Only after writing the above does it occur to me, why does a Vagrantfile have to be the only place where information about how to connect to libvirt is stored?

Excellent point!

What if there was a file somewhere in ~/.vagrant.d for each box that recorded the libvirt URIs to which that box had been uploaded? We could update this when needed during vagrant up and reference this during vagrant box remove.

I agree with @mpalmer - this sounds like a great idea! There could be a new directory ~/.vagrant.d/boxes/$BOX_NAME/$VERSION/libvirt/hypervisors.d, with one entry per libvirt hypervisor containing a copy of the image.

And it would be used not only by vagrant box remove, but also by vagrant box add --force when replacing an existing box.

Another possibility would be to do it the other way around, i.e. rather than record the URI after the upload, configure the URI prior to upload, e.g.:

Allow libvirt URIs to be registered under ~/.vagrant.d/libvirt/hosts.d, where the filename is the URI descriptor and the file contains the URI itself (and any other connection configuration which may be needed)
Populate ~/.vagrant.d/libvirt/hosts.d/localhost automatically
Extend vagrant box add/remove with a --host option which can reference any file in that directory but defaults to localhost
Allow Vagrantfiles to reference the same URI descriptor

So by default, the user would need no extra work in order to use libvirt on localhost.

I guess the main differences with your approach are:

Pro: Allows updating/removal of a box in/from individual hypervisors
Pro: Extra layer of indirection allows Vagrantfiles to use a different hypervisor URI without modification (a bit like git's insteadOf feature allows you to redirect to a different repo without changing the git-remote configuration)
Con: Maybe slightly more demanding on the user when remote hypervisors are in use, since they need to add the file to hosts.d first.

Sep 23 '14 00:09 aspiers

@aspiers your second idea is interesting.

Would we pull all of the provider options (driver, host, etc) from the Vagrantfile and store them in hosts.d instead?

When running vagrant up, would the user specify the --host to choose where to create the VM?

Sep 28 '14 13:09 sciurus

Would we pull all of the provider options (driver, host, etc) from the Vagrantfile and store them in hosts.d instead?

That's not what I had in mind; when I wrote "Allow libvirt URIs to be registered", I meant manually (except the localhost case, which would be prepopulated out of the box). BTW does Vagrant's plugin framework allow plugins to add new subcommands? E.g. it would be nice to support stuff like

vagrant libvirt host add kvm1 --uri qemu+ssh://[email protected]/system

rather than requiring the user to edit config files under hosts.d directly.

When running vagrant up, would the user specify the --host to choose where to create the VM?

Yeah, that's one way it could work, e.g. --host kvm1 would obtain the hypervisor's URI (and any other info needed) from ~/.vagrant.d/libvirt/hosts.d/kvm1. But not just vagrant up! It would also work for vagrant box add/remove, which is why it's a solution to this issue.

It could also be made possible to specify which host to use via a directive in the Vagrantfile. Support for the existing provider options should be kept for backwards-compatibility; in fact, they could override any settings configured via the hosts.d file (which would default to localhost if --host wasn't specified).

So for example, if ~/.vagrant.d/libvirt/hosts.d/kvm1 contained:

host=kvm1.mydomain.com
connect_via_ssh=true
username=root

and Vagrantfile contained:

config.vm.provider :libvirt do |libvirt|
  libvirt.id_ssh_key_file = "my-ssh-key.dsa"
  username=libvirt
end

and you typed vagrant up --host kvm1 or vagrant box remove foo --host kvm1, it would use the following configuration:

host=kvm1.mydomain.com
connect_via_ssh=true
libvirt.id_ssh_key_file = "my-ssh-key.dsa"
username=libvirt

Sep 29 '14 00:09 aspiers

@aspiers sounds like we have the same understanding of how your proposal would work. Yes, we can add new subcommands.

Both my idea and @aspiers refinement are a bigger project than I'm planning to bite off, but I'm happy to provide testing and feedback to anyone who works on this.

Sep 29 '14 01:09 sciurus

Extend vagrant box add/remove with a --host option which can reference any file in that directory but defaults to localhost

Related to this, I am intrigued by the possibility of no longer storing boxes in ~/.vagrant.d/boxes/ at all but instead having vagrant box always operate directly on a libvirt storage pool.

Nov 01 '14 17:11 sciurus

I could run vagrant box add, then run vagrant up with two Vagrantfiles that upload the same box to qemu:///system and qemu+ssh://example.com/system.

This usecase seems a bit out of scope for what vagrant was intended to be. @sciurus I would be interested in a more detailed example to understand when you need this.

Related to this, I am intrigued by the possibility of no longer storing boxes in ~/.vagrant.d/boxes/ at all but instead having vagrant box always operate directly on a libvirt storage pool.

I like this approach. As purpleidea stated, saving the image twice seems like an inefficient way to go. Most users will not use multiple and/or remote hypervisors.

Jan 16 '15 19:01 dennisklein

Related to this, I am intrigued by the possibility of no longer storing boxes in ~/.vagrant.d/boxes/ at all but instead having vagrant box always operate directly on a libvirt storage pool. I like this approach. As purpleidea stated, saving the image twice seems like an inefficient way to go.

True, it's a nice thought, but ...

Most users will not use multiple and/or remote hypervisors.

In other words, some users will use multiple hypervisors, in which case at least one is remote. And the problem is that Vagrant doesn't support a model where boxes can be stored in different locations. You'd need to extend commands like vagrant box list so that they would support a --hypervisor option or similar, but only for the libvirt provider. And I'm not sure if that's even possible or something Mitchell would want to entertain, because it's a fundamentally different model. Currently box storage is flat, and local.

Jan 16 '15 19:01 aspiers

@dennisklein this isn't about something I need. In that comment I was just trying to explain what the current design allows and how it makes pool cleanup difficult.

If we were going by what I need we would rip out support for remote libvirt altogether. :smile:

Jan 16 '15 21:01 sciurus

I think what we need here is a two pronged approach:

store information about which libvirt URIs have been uploaded to and query the user (y/n) if they'd like to remove them from libvirt on vagrant box remove.
upload the box to the libvirt storage pool based on hash+name
- since versioning isn't supported for boxes that don't get pulled from atlas #294 doesn't help us in all scenarios (see this mail list thread)
- compute a hash (one time upon vagrant box add) and name the volume in the libvirt storage pool accordingly
  - so if a box was originally going to be named 'cdkv2_vagrant_box_image_0.img' in the pool it could now be named cdkv2_vagrant_box_image_0_fb28e33f098abaf356940fdb3aaae417.img

Does this seem reasonable?

Jun 02 '16 14:06 dustymabe

Store information about which libvirt URIs have been uploaded to and query the user (y/n) if they'd like to remove them from libvirt on vagrant box remove.

I would be happy if we can just fix this as our first attempt.

I can see some interesting discussion in the issue, but we do not have a solution yet. I think we have increased the expected fix's scope more than what can be done by a contributor in a reasonable time frame.

Jun 02 '16 15:06 LalatenduMohanty

I would be happy if we can just fix this as our first attempt.

@LalatenduMohanty I think we need both. It's possible that the user will either answer "no" to removing from the libvirt provider or that a libvirt host is down during the time of the vagrant box remove. Identifying by content hash would mitigate this corner case as well.

Jun 02 '16 16:06 dustymabe

@dustymabe @LalatenduMohanty Check out sections 4 and 5 here: https://github.com/hollodotme/Helpers/blob/master/Tutorials/vagrant/self-hosted-vagrant-boxes-with-versioning.md It should be possible host w/ versioning. I did I quick test locally, it does seem to work...

Jun 02 '16 16:06 jasonbrooks