Preparing an install of CentOS to become a VMware template.
Since centralisation and virtual machine infrastructure are somewhat new to ITMS, though familiar to some, we are homing in on single solutions for systems administration that should provide “wins” in terms of short-order provisioning of machines for services/applications, development, testing and research/student environments.
We have virtual machine environments in several forms, and some colleagues, myself included, have been preparing for the great day when we converge our infrastructure into two very reliable data centres. This week my team of developers was discussing the old days of installing Windows (not me…) from 20 floppy disks with the occasional bad sector. A week of that constituted work and was common practice. In the GNU/Linux world, and probably Windows, which is losing ground in the server room, dev ops are constantly struggling to get away from that and to move to instant provisioning of machines ready to run services. It is a problem of scale. At DMU, at the last count, there were around 800 servers. Re-provisioning those by installing one operating system at a time would take a long time. With some preparation now, around agreed practices, we can speed up the move to the new infrastructure.
Why am I looking at this now? There is an immediate need to create a server with a LAMP stack on it for DMU Global. That ties in with a need for a WordPress (LAMP) install and a requirement for an LDAP server to support the Library’s OpenAthens LA service. The requirements have much in common, and there are tonnes of choices about the best approach. The requirements are not simply related to the common software components but also to disk use/partitioning and the security of the servers. There is also the opportunity to create a template that can be used by other systems administrators in ITMS. I hope to reduce the amount of work that needs to be done post-install so that colleagues can get on with the meat of setting up applications.
When I managed the web team in the Library we insisted on secure keys, passphrases and encrypted sessions for both the command line and file transfer. This is pretty much a unique practice in the university, but passwords are proving to be a weak form of authentication and I think that secure keys are the way to go. There is some flexibility in how SSH can be set up. It is possible to allow passwords to be used from a restricted set of IP addresses. This is something that we should discuss. Passwords need to be changed if someone leaves the business; it might be easier to revoke a secure key.
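As a sketch of the restricted-password idea, sshd supports per-source rules in /etc/ssh/sshd_config; the address range here is a placeholder, not a real network of ours:

```
# keys only by default
PasswordAuthentication no
PubkeyAuthentication yes

# allow passwords only from a trusted management subnet (placeholder range)
Match Address 10.0.0.0/24
    PasswordAuthentication yes
```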
For fifteen-odd years I have separated out custom software, the application (web server) and data from the OS. This has the advantage of being able to update the operating system, or even change it, without touching those local aspects. A disadvantage is that those changes need to be reflected on the data side: another systems administrator would have to know about the changes, or have my skills and experience to ‘divine’ how the machine is set up, and it is harder for those changes to survive an upgrade. Of course, another advantage is that hackers and root kits cannot rely on the usual assumptions of a default install. Part 2 of this series will detail those changes but the blog entry will be internal to DMU only.
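To make the separation concrete, a sketch of the mount layout in /etc/fstab terms (device and volume names are illustrative, not the actual layout, which Part 2 will cover):

```
# OS on its own logical volume
/dev/vg00/root  /      ext4  defaults  1 1
# custom software and the web application outside the OS tree
/dev/vg00/usr1  /usr1  ext4  defaults  1 2
# data, e.g. DBMS files, on its own volume
/dev/vg00/dbms  /dbms  ext4  defaults  1 2
```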
GNU/Linux is faster when it is paravirtualised, that is, when code is shared between the guest virtual machine and the host. Simplistically, more of the CPU is used in the traditional sense, giving an efficiency that virtualising via hardware cannot achieve. Every solution has its own software for doing this, installed in the guest. In our preparation for the Great Convergence we have used KVM, VirtualBox and, very recently, oVirt. VMware has its own solution too. We had hoped to move our virtual machines from their original home to the new VMware infrastructure by exporting and importing, but if this paravirtualisation software exists on the machine it may confuse things in the VMware environment. A LAMP stack will work without the software, so I am leaving it out of the install for the template. It could be added later if, for example, access to a USB dongle is needed. We are wedded to VMware for a time; hopefully, something like oVirt or OpenStack will be considered later on. The guest tools will help bits spin faster, but the administration overhead of moving between environments might be something we want to avoid. The choice for Windows might be different.
Back to security. Some favour the use of sudo and multiple accounts with passwords. I have a feeling I will not win this one. My small team was very used to being themselves locally but logging in as root to remote machines. All machines had a small number of accounts and only root had a password. This is unusual and might have added to our success, in that it is not the natural assumption: hackers and root kits look for accounts with simple passwords, and we made it impossible to log in via any of those accounts. Of course, we only have to manage the root account on those machines. Choices like this, local firewall configuration and secure keys/passphrases have probably saved us a mountain of trouble as well as increasing the uptime of our services.
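One way to express that login policy, again as a hedged sshd_config sketch rather than a finished configuration:

```
# only root may log in remotely, and only with a key
PermitRootLogin without-password
AllowUsers root
# no password authentication for anyone
PasswordAuthentication no
```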
In our requirements gathering, working with HP, it became obvious that one of our practices would change. Before virtualisation we ran virtual servers in the Apache web server: either one IP address with multiple CNAME aliases, using HTTP/1.1 name-based hosting, or multiple IP addresses, in order to home several web applications with their own sub/domain names. While this feels like a package one administrator would be comfortable with, multiple administrators or new colleagues would have to use their skills to understand the install, with any problems that might bring. Virtual machine infrastructure makes for very light virtual machines, and running more of them, with one service per machine, makes administration easier: one meaningful sub/domain, one machine and one configuration. Separating out services lets us organise them more easily, including housing them within hosts and backing them up. If we think about hybrid cloud solutions and bursting, it makes sense to simplify virtual machines. We used to have a dedicated IP address for the machine and separate IP addresses for the web servers on it. We will assume, for now, that we will have one IP address for the virtual machine and the service running on it.
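With one service per VM, the Apache side collapses to a single minimal configuration. A sketch, assuming the Apache 2.4 shipped with CentOS 7 (the name and paths are placeholders; 2.2 uses different access directives):

```
# httpd.conf fragment: one machine, one name, one service
ServerName example.dmu.ac.uk
DocumentRoot "/usr1/www"
<Directory "/usr1/www">
    AllowOverride None
    Require all granted
</Directory>
```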
There is a choice to be made about disk usage or consumption. PostgreSQL is a better database management system than MySQL, but MySQL is very popular and some software using LAMP only works with MySQL. I could leave the choice to the next administrator, or make both available and rely on the storage solution to take care of duplicated blocks of data across multiple virtual machines. If the next administrator knows that the software is installed she can configure the machine to use it and skip the download. I have been using PostgreSQL and MySQL for many years. The greatest advantage is that there are some default tunings that can be made to both DBMSs which will be consistent across all virtual machines if I make the changes in the template. An alternative to this is a wide tree of templates starting with the minimum install linked to many templates: a complete LAMP stack with MySQL, a complete LAMP stack with PostgreSQL, just PostgreSQL, just MySQL, Perl instead of PHP etc.
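As a sketch of what baked-in default tunings might look like, the values below are placeholders to be agreed, not recommendations:

```
# /etc/my.cnf fragment (MySQL): placeholder values
[mysqld]
innodb_buffer_pool_size = 256M
innodb_file_per_table   = 1

# postgresql.conf fragment (PostgreSQL): placeholder values
shared_buffers       = 256MB
effective_cache_size = 512MB
```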
Backups… I am going to use Amanda for now because I know I can support it for disaster recovery. I’m sure it will get swapped out later, but I do not know how backups would otherwise be provisioned in the meantime. Amanda is free and does not need a licence: quicker and cheaper.
On file system encryption: I am not encrypting anything now. This setup will allow the data partition to be encrypted later. If we want to encrypt the operating system partition then we need to separate /boot from /.
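A kickstart partitioning sketch that keeps /boot separate so the root volume could be encrypted later (sizes, names and filesystems are placeholders):

```
# kickstart fragment: /boot outside LVM so / can be encrypted later
part /boot --fstype=ext4 --size=500
part pv.01 --size=1 --grow
volgroup vg00 pv.01
logvol /     --vgname=vg00 --name=root --size=8192
logvol swap  --vgname=vg00 --name=swap --size=2048
logvol /usr1 --vgname=vg00 --name=usr1 --size=10240
```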
Time. We are using NTP as a belt-and-braces approach to keeping machines in sync. VMware could, within its infrastructure, guarantee the time, but if we teleport a machine to another infrastructure or burst it to a cloud we cannot guarantee the time is the same in the new environment.
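In the template this is just the stock ntpd configuration; an /etc/ntp.conf sketch (the public pool hosts are placeholders for whatever internal time source we agree on):

```
# /etc/ntp.conf fragment: keep the clock independent of the hypervisor
driftfile /var/lib/ntp/drift
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
```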
We are using rsyslog to record interesting changes to files locally. I have also set up the virtual machine template to forward what it logs to a remote server. Because this is done in the template, every virtual machine created from it will automatically report to the remote server. That server will be used to generate reports and warnings, and to help in any compromises should they occur.
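The forwarding itself is a single rsyslog directive; a sketch with a placeholder hostname:

```
# /etc/rsyslog.conf fragment: forward everything to the central log host
# @@ sends over TCP; a single @ would use UDP
*.* @@loghost.dmu.ac.uk:514
```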
I have created a script that can be run before the template virtual machine is shut down. It cleans up:
# remove the host keys so each clone generates its own
/bin/rm -f /etc/ssh/*key*
# clear rotated and compressed logs
/bin/rm -f /var/log/*-???????? /var/log/*.gz
# empty the audit and login records
/bin/cat /dev/null > /var/log/audit/audit.log
/bin/cat /dev/null > /var/log/wtmp
# drop the persistent network device rule so eth0 is re-detected
/bin/rm -f /etc/udev/rules.d/70*
# strip the MAC address and UUID so the clone gets its own
/bin/sed -i '/^\(HWADDR\|UUID\)=/d' /etc/sysconfig/network-scripts/ifcfg-eth0
# clear temporary files and root's shell history
/bin/rm -rf /tmp/*
/bin/rm -rf /var/tmp/*
/bin/rm -f ~root/.bash_history
and before the machine is shut down we should run ‘unset HISTFILE’ to prevent the current session’s history being saved.
- Need email for the root user (machine and web server) to go somewhere
- Who should own responsibility for backup of e.g. SQL and disk space?
- We send syslog events to a remote syslog server
- Need to adjust logrotate for non-default logs
- Add webalizer for web statistics later on?
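For the logrotate point above, a sketch of a drop-in for a non-default log; the path and retention are placeholders:

```
# /etc/logrotate.d/usr1-web (hypothetical drop-in)
/usr1/logs/access_log {
    weekly
    rotate 8
    compress
    missingok
    notifempty
}
```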
As the work progresses I will list the parts of the web that have influenced the design.
A year later (update for 2016)
We now have some experience of working with VMware at scale. We have gained experience in how resources are used, and some interesting things have come up. Backups are interesting: we have not yet implemented a solution (licensed or free) that will snapshot a MySQL database and the filesystem together so that we have a consistent backup. We therefore dump the SQL out nightly while the application is in maintenance mode and have that picked up by our backup solution. Some of our services are getting big! DMU Commons has grown by 50% in the last term; the previous size represented five years of the service. Backing up the VM takes, relatively, a long time: most VMs are 50GB, whereas The Commons is 200GB. We want to move the users’ content to a central store and mount it by NFS. This gives us quicker backups and the ability to tune the volume size easily. But WordPress keeps the application and the data under the same directory, so we need to engineer the disk layout to support WordPress as best we can. That is, it needs to make sense to the next tech who is asked to look at it. We are looking at:
- /, /tmp, /boot, swap on one volume group, disk, controller
- /dbms on one volume group, disk, controller to support MySQL and PostgreSQL
- /usr1 on one volume group, disk, controller for application/data
- /usr2, possibly, in case the service grows to a size at which user data should be moved out.
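The nightly consistent dump described above can be sketched as a cron drop-in; the script name, paths and database name are hypothetical placeholders rather than our actual setup:

```
# /etc/cron.d/nightly-sql-dump (hypothetical drop-in)
# dump-wordpress.sh writes the WordPress .maintenance file,
# runs mysqldump --single-transaction for a consistent InnoDB dump,
# then removes .maintenance; the backup run later picks up the dump.
30 2 * * * root /usr1/scripts/dump-wordpress.sh
```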
We are currently looking to implement SAP; SAP recommend separate disk controllers for performance reasons.
The service is being heavily used by both creators and consumers. We host a web analytics service on the same VM, and running reports uses lots of RAM and CPU. We therefore see a need to move the stats service away from this VM. This is another reason why services should be split, one per VM.