OpenStack and provider networks

VM images in OpenStack (and in most other clouds) need to do two things to get an instance configured after boot:

  1. Get an IPv4 address through DHCP.
  2. Download the initial configuration by sending a request to 169.254.169.254, as shown below. A tool such as cloud-init takes care of this; more information can be found in the Amazon EC2 documentation.
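
From inside a booted instance you can query the metadata service yourself. The two URLs below, the EC2-compatible path and the OpenStack-native path, assume a default Nova metadata API:

vm$ curl http://169.254.169.254/latest/meta-data/instance-id
vm$ curl http://169.254.169.254/openstack/latest/meta_data.json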

In a “normal” OpenStack deployment with a Network node that performs L3 functions (routing), requests to 169.254.169.254 are redirected with NAT to the metadata-proxy, which in turn proxies the requests to nova-api.
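
You can see this redirect on the network node by inspecting the NAT table inside the router's network namespace. The namespace name, chain name and port 9697 shown here are typical defaults rather than values verified on this setup:

$ sudo ip netns exec qrouter-<router-id> iptables -t nat -S | grep 169.254.169.254
-A neutron-l3-agent-PREROUTING -d 169.254.169.254/32 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 9697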

If you want to use OpenStack merely as a management layer on top of compute and storage, with only a light touch of networking (skipping self-service networks), you use provider networks. The OpenStack admin defines these networks statically: the NIC to use, VLAN ID, subnet, gateway, … This means that the default gateway is not controlled by OpenStack, so metadata requests cannot be redirected.
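
For reference, creating such a provider network with the openstack client looks roughly like this; the physical network name physnet1, VLAN 123 and the 192.0.2.0/24 subnet are made-up examples:

$ openstack network create --provider-network-type vlan --provider-physical-network physnet1 --provider-segment 123 provider-vlan123
$ openstack subnet create --network provider-vlan123 --subnet-range 192.0.2.0/24 --gateway 192.0.2.1 provider-subnet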

OpenStack has a solution for that: use the OpenStack DHCP service and set enable_isolated_metadata to true in /etc/neutron/dhcp_agent.ini. This will pass a host route for 169.254.169.254/32 with the DHCP offer (option 121, classless static routes). The metadata-proxy will also listen for requests to this IP in the DHCP server network namespace on the network node.
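
A minimal sketch of that setting (restart the DHCP agent afterwards; the service name differs per distribution), plus a check from inside a VM that the host route actually arrived. The next hop 192.0.2.2 is the DHCP port address and is only an example; the exact ip route output varies per guest OS:

# /etc/neutron/dhcp_agent.ini on the node running the DHCP agent
[DEFAULT]
enable_isolated_metadata = true

vm$ ip route | grep 169.254.169.254
169.254.169.254 via 192.0.2.2 dev eth0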

Ceph, OpenStack and fstrim

Ceph is a popular storage backend for OpenStack. It allows you to use the same storage for images, ephemeral disks and volumes. Each Ceph image is also thinly provisioned (by default in blocks of 4M). Until recently, once storage was allocated, the filesystem in the virtual machine had no way to release it back. In recent Ceph and OpenStack releases it is possible to use fstrim inside the virtual machine to release disk blocks. fstrim was not supported in older operating systems such as Ubuntu 14.04, but Ubuntu 16.04 and CentOS 7 both support it.
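
Note that the trim/discard commands only reach Ceph if the guest disk is attached with discard support. A sketch of the configuration commonly used for this, assuming a Glance image named centos7 (your deployment may already have this in place):

# nova.conf on the compute nodes
[libvirt]
hw_disk_discard = unmap

$ openstack image set --property hw_scsi_model=virtio-scsi --property hw_disk_bus=scsi centos7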

Let's try it:

In the virtual machine, 5.4G is used. Before running df I ran sync and fstrim /.

vm$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G  5.4G   45G  11% /

The real usage of the VM's disk (a diff against the stock CentOS 7 image):

$ rbd diff vms/ae445f4c-38e7-446d-9edd-618533632469_disk | awk '{ SUM += $2 } END { print SUM/1024/1024 " MB" }'
7467.23 MB

This is already quite a bit more than what df reports. The virtual machine has been in use for some time, so probably not all data deleted by the filesystem has been released back to Ceph yet.

Add a file of 1G and sync it to disk:

vm$ dd if=/dev/zero of=test.img bs=4M count=250
250+0 records in
250+0 records out
1048576000 bytes (1.0 GB) copied, 1.71766 s, 610 MB/s
vm$ sync
vm$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G  6.4G   44G  13% /

The real usage increased by 824M (from 7467.23 MB to 8291.23 MB). This is not the full 1G, but close:

$ rbd diff vms/ae445f4c-38e7-446d-9edd-618533632469_disk | awk '{ SUM += $2 } END { print SUM/1024/1024 " MB" }'
8291.23 MB

When we delete the file, only a few MB are released from Ceph storage:

vm$ rm -f test.img 
vm$ sync
vm$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G  5.4G   45G  11% /

Ceph usage:

$ rbd diff vms/ae445f4c-38e7-446d-9edd-618533632469_disk | awk '{ SUM += $2 } END { print SUM/1024/1024 " MB" }'
8283.23 MB

After fstrim:

vm$ fstrim /

Ceph usage:

$ rbd diff vms/ae445f4c-38e7-446d-9edd-618533632469_disk | awk '{ SUM += $2 } END { print SUM/1024/1024 " MB" }'
7107.23 MB

This is even less than before the 1G test file was created. That is probably because fstrim also released filesystem blocks from data that was deleted earlier.

Ceph rbd image size

Another Ceph-related command: getting the disk usage of an rbd-based image. There is no simple command to get the real disk usage, because in an OpenStack deployment with Ceph your rbd images are often a snapshot of an OS image or another disk. On the Ceph blog I found the following:

$ rbd diff rbd/tota | awk '{ SUM += $2 } END { print SUM/1024/1024 " MB" }'

rbd/tota is the path to the disk, where rbd is the pool name and tota the image name. In an OpenStack deployment you will probably see the pools vms, images and volumes. You can get a listing of the images in each pool with:

$ rbd -p $poolname ls

For example:

$ rbd -p volumes ls
volume-bb85c32b-a003-405d-8029-cf02b0e9eec9
volume-003e8ed8-1e9a-4344-9b97-34ee440a1f35
volume-006c6bed-0407-4334-9bfa-678fc3c2de90
volume-01652b4d-1d70-487e-9b8f-a5bcf6f7fc8f
volume-02489e8f-12fa-4a62-ada5-999fdae7c1f0
volume-036f1366-4df1-4154-8c21-968a29e09db1

In the volumes pool (managed by Cinder) the default name is volume-${volume id}. For virtual machine ephemeral disks (managed by Nova) this is ${vm id}_disk, and for VM images (managed by Glance) it is just the image id, without any prefix or suffix.

Getting the real size of an rbd image then becomes, for example:

$ rbd diff volumes/volume-a5caf687-bbb0-45ca-a423-7a2ce2f5004d | awk '{ SUM += $2 } END { print SUM/1024/1024 " MB" }'
52847.6 MB
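
If you want this number for every image in a pool, a small shell loop over the listing works. This is just a convenience sketch reusing the commands above:

$ for img in $(rbd -p volumes ls); do echo -n "$img: "; rbd diff volumes/$img | awk '{ SUM += $2 } END { print SUM/1024/1024 " MB" }'; done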