I personally have numerous hosts that I sometimes have to SSH to. It can get rather confusing and inefficient if you get lost among them.
I’m going to show you here how you can get your SSHing to be heaps more efficient with just 5 minutes of your time.
.ssh/config
In $HOME/.ssh/config I usually store all my hosts in such a way:
Host host1
    Port 1234
    User root
    HostName host1.potentially.very.long.domain.name.com

Host host2
    Port 5678
    User root
    HostName host2.potentially.very.long.domain.name.com

Host host3
    Port 9012
    User root
    HostName host3.potentially.very.long.domain.name.com
You obviously get the idea. So if I’d like to ssh to host2, all I have to do is:
ssh host2
That will ssh to root@host2.potentially.very.long.domain.name.com:5678 – saves a bit of time.
I usually manage all of my hosts in that file. Makes life simpler – you can even keep it in git if you feel like it…
Auto complete
I’ve added to my .bashrc the following:
_ssh_hosts() {
    # the word currently being completed
    local cur="${COMP_WORDS[COMP_CWORD]}"
    COMPREPLY=()
    # pull all the Host entries out of ~/.ssh/config
    local ssh_hosts=$(grep '^Host ' ~/.ssh/config | cut -d' ' -f2 | xargs)
    # complete host names only, not option flags
    [[ ! ${cur} == -* ]] && COMPREPLY=( $(compgen -W "${ssh_hosts}" -- ${cur}) )
}
complete -o bashdefault -o default -o nospace -F _ssh_hosts ssh 2>/dev/null \
|| complete -o default -o nospace -F _ssh_hosts ssh
complete -o bashdefault -o default -o nospace -F _ssh_hosts scp 2>/dev/null \
|| complete -o default -o nospace -F _ssh_hosts scp
Sweet. All that you have to do now is:
$ ssh TAB TAB
host1 host2 host3
We are a bit more efficient today.
SSH is amazing
Show me one unix machine today without SSH. It’s everywhere, for a reason.
OpenSSH specifically allows you to do so much with it. What would we have done without SSH?
OpenSSH Tunnelling and full VPN
Tunnelling with SSH is really cool – utilizing the secure SSH connection, you can secure virtually any TCP/IP connection using port forwarding (-R and -L):
http://www.openssh.org/faq.html#2.11
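For example, something along these lines (host2 is taken from the config above; the target hosts and ports here are made-up placeholders):
# local forward: reach a web server that only host2 can see, via localhost:8080
ssh -L 8080:intranet.example.com:80 host2
# remote forward: expose a service listening on your local port 3000 on host2's port 9090
ssh -R 9090:localhost:3000 host2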
However, for full VPN support you can use -w, which opens a tun/tap device on both ends of the connection, potentially letting you pass all of your network traffic via your SSH connection. In other words – full VPN support for free!!!
Server configuration
On the server, the configuration would be minimal:
- Allow tunnelling in sshd configuration
echo 'PermitTunnel=yes' >> /etc/ssh/sshd_config
service sshd reload
- Allow forwarding
iptables -I FORWARD -i tun+ -j ACCEPT
iptables -I FORWARD -o tun+ -j ACCEPT
iptables -I INPUT -i tun+ -j ACCEPT
iptables -t nat -I POSTROUTING -o EXTERNAL_INTERFACE -j MASQUERADE
echo 1 > /proc/sys/net/ipv4/ip_forward
That’s all!! Congratulations on your new VPN server!!
Client configuration (your personal linux machine)
These 2 commands will configure you with a very simple VPN (run as root!!!):
ssh -f -v -o Tunnel=point-to-point \
-o ServerAliveInterval=10 \
-o TCPKeepAlive=yes \
-w 100:100 root@YOUR_SSH_SERVER \
'/sbin/ifconfig tun100 172.16.40.1 netmask 255.255.255.252 pointopoint 172.16.40.2' && \
/sbin/ifconfig tun100 172.16.40.2 netmask 255.255.255.252 pointopoint 172.16.40.1
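If you want more than just the point-to-point link, you can also push routes through the tunnel. A rough sketch (the CAPS placeholders are mine, not real values):
# route a specific network via the server end of the tunnel (172.16.40.1):
route add -net 10.0.0.0 netmask 255.0.0.0 gw 172.16.40.1
# or go all in - keep a host route to the SSH server itself and make the tunnel your default route:
route add -host YOUR_SSH_SERVER_IP gw YOUR_CURRENT_GATEWAY
route del default
route add default gw 172.16.40.1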
The only downside of this awesome VPN is that you have to be root on both ends.
But this whole setup is rather clumsy, let’s use some UI for that, no?
NetworkManager-ssh
Somewhere in time, after working intensively at a company dealing with VPNs (but no SSH VPNs at all), I was looking at NetworkManager in my taskbar and thinking “Hey! There’s an OpenVPN, PPTP and IPSEC plugin for NetworkManager, why not build an SSH VPN plugin?”
And hell, why not?
I started searching the Internet frantically, believing that someone had already implemented that ingenious idea (like most good ideas), but except for one mailing list post from a few years back where someone suggested implementing it – nada.
Guess it’s my prime time. Within a week of forking the code of NetworkManager-openvpn (the NetworkManager OpenVPN plugin) I managed to get something that actually works (ssh-agent authentication only). I was surprised, because I had never dealt with the glib/gtk infrastructure, not to mention UI programming (I’m a pure backend/infrastructure developer for the most part).
And today?
I’m writing this post perhaps 2 months after I started development and committed my first alpha release. While writing this post I’m trying to submit NetworkManager-ssh to fedora (fedora-extras to be precise).
Getting into the bits and bytes behind it is unnecessary; all you have to know is that the source is available here:
https://github.com/danfruehauf/NetworkManager-ssh
It compiles easily into an RPM or DEB for your convenience. I urge you to give it a shot, and please open issues on GitHub if you find any.
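If you’d rather try it straight from source, something along these lines should do – the exact build targets may differ, so check the README in the repository:
git clone https://github.com/danfruehauf/NetworkManager-ssh.git
cd NetworkManager-ssh
./autogen.sh && ./configure && make && make install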
Cloud computing and being lazy
The need to create template images in our cloud environment is obvious, especially with Amazon EC2 offering an amazing API and spot instances at ridiculously low prices.
In the following post I’ll show what I am doing in order to prepare a “puppet-ready” image.
Puppet to the rescue
In my environment I have puppet configured and provisioning any of my machines. With puppet I can deploy anything I need – “if it’s not in puppet – it doesn’t exist”.
Coupled with Puppet dashboard the interface is rather simple for manually adding nodes. But doing stuff manually is slow. I assume that given the right base image I (and you) can deploy and configure that machine with puppet.
In other words, the ability to convert a bare machine to a usable machine is taken for granted (although it is heaps of work on its own).
Handling the “bare” image
Most cloud computing providers today give you an interface for starting/stopping/provisioning machines on their cloud.
The images the cloud providers supply are usually bare, such as CentOS 6.3 with nothing on it. Configuring an image like that requires some manual labour, as you can’t even auto-login to it without some random password or something similar.
Create a “puppet ready” image
So if I boot up a simple CentOS 6.x image, these are the steps I’m taking in order to configure it to be “puppet ready” (and I’ll do it only once per cloud computing provider):
# install EPEL, because it's really useful
rpm -q epel-release-6-8 || rpm -Uvh http://download.fedoraproject.org/pub/epel/6/`uname -i`/epel-release-6-8.noarch.rpm
# install puppet labs repository
rpm -q puppetlabs-release-6-6 || rpm -ivh http://yum.puppetlabs.com/el/6/products/i386/puppetlabs-release-6-6.noarch.rpm
# i usually disable selinux, because it's mostly a pain
setenforce 0
sed -i -e 's!^SELINUX=.*!SELINUX=disabled!' /etc/selinux/config
# install puppet
yum -y install puppet
# basic puppet configuration
echo '[agent]' > /etc/puppet/puppet.conf
echo ' pluginsync = true' >> /etc/puppet/puppet.conf
echo ' report = true' >> /etc/puppet/puppet.conf
echo ' server = YOUR_PUPPETMASTER_ADDRESS' >> /etc/puppet/puppet.conf
echo ' rundir = /var/run/puppet' >> /etc/puppet/puppet.conf
# run an update
yum update -y
# highly recommended is to install any package you might deploy later on
# the reason behind it is that it will save a lot of precious time if you
# install 'httpd' just once, instead of 300 times, if you deploy 300 machines
# also recommended is to run any 'baseline' configuration you have for your nodes here
# such as changing SSH port or applying common firewall configuration for instance
yum install -y MANY_PACKAGES_YOU_MIGHT_USE
# and now comes the cleanup phase, where we actually make the machine "bare", removing
# any identity it could have
# set machine hostname to 'changeme'
hostname changeme
sed -i -e "s/^HOSTNAME=.*/HOSTNAME=changeme" /etc/sysconfig/network
# remove puppet generated certificates (they should be recreated)
rm -rf /etc/puppet/ssl
# stop puppet, as you should change the hostname before it will be permitted to run again
service puppet stop; chkconfig puppet off
# remove SSH keys - they should be recreated with the new machine identity
rm -f /etc/ssh/ssh_host_*
# finally add your key to authorized_keys
mkdir -p /root/.ssh; echo "YOUR_SSH_PUBLIC_KEY" > /root/.ssh/authorized_keys
Power off the machine and create an image. This is your “puppet-ready” image.
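If EC2 is your cloud of choice, this step can be scripted too. A rough sketch with the awscli tool (the instance ID and image name here are placeholders):
# stop the instance and register an AMI from it
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 create-image --instance-id i-0123456789abcdef0 --name "centos6-puppet-ready"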
Using the image
Now you’re good to go, create a new image from that machine and any machine you’re going to create in the future should be based on that image.
When creating a new machine the steps you should follow are:
- Start the machine with the “puppet-ready” image
- Set the machine’s hostname
hostname=uga.bait.com
hostname $hostname
sed -i -e "s/^HOSTNAME=.*/HOSTNAME=$hostname/" /etc/sysconfig/network
- Run 'puppet agent --test' to generate a new certificate request
- Add the puppet configuration for the machine; for puppet dashboard it’ll be something similar to:
hostname=uga.bait.com
sudo -u puppet-dashboard RAILS_ENV=production rake -f /usr/share/puppet-dashboard/Rakefile node:add name=$hostname
sudo -u puppet-dashboard RAILS_ENV=production rake -f /usr/share/puppet-dashboard/Rakefile node:groups name=$hostname groups=group1,group2
sudo -u puppet-dashboard RAILS_ENV=production rake -f /usr/share/puppet-dashboard/Rakefile node:parameters name=$hostname parameters=parameter1=value1,parameter2=value2
- Authorize the machine in puppetmaster (if autosign is disabled)
- Run puppet:
# initial run, might actually change stuff
puppet agent --test
service puppet start; chkconfig puppet on
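For the record, here is a rough sketch of how these steps could be glued together into one script. This is not my actual script – the root SSH access to the new machine, autosigning on the puppetmaster and the dashboard paths are all assumptions:
#!/bin/bash
# usage: ./new_node.sh uga.bait.com group1,group2
hostname=$1
groups=$2

# set the hostname on the new machine (assumes root SSH access to it)
ssh root@$hostname "hostname $hostname; sed -i -e \"s/^HOSTNAME=.*/HOSTNAME=$hostname/\" /etc/sysconfig/network"

# register the node in puppet dashboard (run this part on the dashboard machine)
sudo -u puppet-dashboard RAILS_ENV=production \
    rake -f /usr/share/puppet-dashboard/Rakefile node:add name=$hostname
sudo -u puppet-dashboard RAILS_ENV=production \
    rake -f /usr/share/puppet-dashboard/Rakefile node:groups name=$hostname groups=$groups

# run puppet on the new machine and enable the service (assumes autosign is enabled)
ssh root@$hostname "puppet agent --test; service puppet start; chkconfig puppet on"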
This is 90% of the work if you want to quickly create usable machines on the fly, it shortens the process significantly and can be easily implemented to support virtually any cloud computing provider!
I personally have it all scripted and a new instance on EC2 takes me 2-3 minutes to load + configure. It even notifies me politely via email when it’s done.
I’m such a lazy bastard.
Continued from my previous article at:
https://bashinglinux.wordpress.com/2013/02/18/bumblebee-and-fc18-a-horror-show/
This is a little manual about running primus with the previous setup I’ve suggested.
Packages
Pretty simple:
yum install glibc-devel.x86_64 glibc-devel.i686 libX11-devel.x86_64 libX11-devel.i686
We should be good to go in terms of packages (both x86_64 and i686)
Download and compile primus
Clone from github:
cd /tmp && git clone https://github.com/amonakov/primus.git
Compiling for x86_64:
export PRIMUS_libGLd='/usr/lib64/libGL.so.1'
export PRIMUS_libGLa='/usr/lib64/nvidia/libGL.so.1'
LIBDIR=lib64 make
unset PRIMUS_libGLd PRIMUS_libGLa
And for i686 (32 bit):
export PRIMUS_libGLd='/usr/lib/libGL.so.1'
export PRIMUS_libGLa='/usr/lib/nvidia/libGL.so.1'
CXX=g++\ -m32 LIBDIR=lib make
unset PRIMUS_libGLd PRIMUS_libGLa
Running
Running with x86_64:
cd /tmp/primus && \
LD_LIBRARY_PATH=/usr/lib64/nvidia:lib64 ./primusrun glxspheres
Untested by me, but that should be the procedure for i686 (32 bit):
cd /tmp/primus && \
LD_LIBRARY_PATH=/usr/lib/nvidia:lib ./primusrun YOUR_32_BIT_OPENGL_APP
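If you don’t feel like typing the library path every time, a tiny wrapper might help. This assumes the /tmp/primus build location and the 64 bit case from above:
cat > /usr/local/bin/primusrun-nvidia << 'EOF'
#!/bin/bash
# wrapper around the locally built primusrun (64 bit libraries only)
export LD_LIBRARY_PATH=/usr/lib64/nvidia:/tmp/primus/lib64
exec /tmp/primus/primusrun "$@"
EOF
chmod +x /usr/local/bin/primusrun-nvidia
primusrun-nvidia glxspheres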
Preface
I’ve seen numerous posts about how to get bumblebee, optirun and nvidia to run on Fedora 18; the only problem was that all of them were using the open source (and somewhat slow) nouveau driver.
I wanted to use the official Nvidia binary driver which is heaps faster.
My configuration is a Lenovo T430s with an NVS 5200M.
Following is a tick list of things to do to get it running (at least on my configuration with FC18 x86_64).
Installing the Nvidia driver
The purpose of this paragraph is to show you how to install the Nvidia driver without overwriting your current OpenGL libraries. Simply download the installer and run:
yum install libbsd-devel dkms
./NVIDIA-Linux-x86_64-XXX.XX.run --x-module-path=/usr/lib64/xorg/nvidia --opengl-libdir=lib64/nvidia --compat32-libdir=lib/nvidia --utility-libdir=lib64/nvidia --no-x-check --disable-nouveau --no-recursion
Even though we ask it to disable nouveau, it still won’t – we’ll take care of that ourselves in a minute.
This method will not ruin all the good stuff in /usr/lib and /usr/lib64.
Disabling nouveau
Disabling nouveau is rather simple, we need to blacklist it and remove it from initrd:
echo "blacklist nouveau" > /etc/modprobe.d/nvidia.conf
dracut --force --omit-drivers nouveau /boot/initramfs-$(uname -r).img $(uname -r)
Good on us. You may either reboot now to verify that nouveau is out of the house, or manually rmmod it:
rmmod nouveau
Bumblebee and all the rest
Install VirtualGL:
yum --enablerepo=updates-testing install VirtualGL
Remove some rubbish xorg files:
rm -f /etc/X11/xorg.conf
Download bbswitch and install it with dkms:
cd /tmp && wget https://github.com/downloads/Bumblebee-Project/bbswitch/bbswitch-0.5.tar.gz
tar -xf bbswitch-0.5.tar.gz
cp -av bbswitch-0.5 /usr/src
ln -s /usr/src/bbswitch-0.5/dkms/dkms.conf /usr/src/bbswitch-0.5/dkms.conf
dkms add -m bbswitch -v 0.5
dkms build -m bbswitch -v 0.5
dkms install -m bbswitch -v 0.5
Download bumblebee and install:
cd /tmp && wget https://github.com/downloads/Bumblebee-Project/Bumblebee/bumblebee-3.0.1.tar.gz
tar -xf bumblebee-3.0.1.tar.gz
cd bumblebee-3.0.1
./configure --prefix=/usr --sysconfdir=/etc
make && make install
cp scripts/systemd/bumblebeed.service /lib/systemd/system/
sed -i -e 's#ExecStart=.*#ExecStart=/usr/sbin/bumblebeed --config /etc/bumblebee/bumblebee.conf#g' /lib/systemd/system/bumblebeed.service
chkconfig bumblebeed on
Bumblebee configuration is at /etc/bumblebee/bumblebee.conf, edit it to have this:
[bumblebeed]
VirtualDisplay=:8
KeepUnusedXServer=false
ServerGroup=bumblebee
TurnCardOffAtExit=false
NoEcoModeOverride=false
Driver=nvidia
[optirun]
VGLTransport=proxy
AllowFallbackToIGC=false
[driver-nvidia]
KernelDriver=nvidia
Module=nvidia
PMMethod=bbswitch
LibraryPath=/usr/lib64/nvidia:/usr/lib/nvidia:/usr/lib64/xorg/nvidia
XorgModulePath=/usr/lib64/xorg/nvidia/extensions,/usr/lib64/xorg/nvidia/drivers,/usr/lib64/xorg/modules
XorgConfFile=/etc/bumblebee/xorg.conf.nvidia
The default /etc/bumblebee/xorg.conf.nvidia which comes with bumblebee is ok, but in case you want to make sure, here is mine:
Section "ServerLayout"
Identifier "Layout0"
Option "AutoAddDevices" "false"
EndSection
Section "Device"
Identifier "Device1"
Driver "nvidia"
VendorName "NVIDIA Corporation"
Option "NoLogo" "true"
Option "UseEDID" "false"
Option "ConnectedMonitor" "DFP"
EndSection
Add yourself to bumblebee group:
groupadd bumblebee
usermod -a -G bumblebee YOUR_USERNAME
Restart bumblebeed:
systemctl restart bumblebeed.service
Testing
To run with your onboard graphics card:
glxspheres
Running with nvidia discrete graphics:
optirun glxspheres
Useful links
Inspiration from:
http://duxyng.wordpress.com/2012/01/26/finally-working-nvidia-optimus-on-fedora/
Download links:
https://github.com/Bumblebee-Project/Bumblebee/downloads
https://github.com/Bumblebee-Project/bbswitch/downloads
http://www.nvidia.com/Download/index.aspx
If it works for you, please let me know!!
And if it doesn’t, perhaps I could help.
I’ve decided to publish the infamous Bash scripting conventions.
Here they are:
https://github.com/danfruehauf/Scripts/tree/master/bash_scripting_conventions
Please comment, challenge and help me modify it. I’m very open to feedback.
It’s also been a long time since I’ve played any interactive computer game. Unfortunately a work colleague introduced me to EVE Online.
I’m usually playing EVE on Microsoft Windows, which I believe is the best platform for PC gaming.
It’s been a while since I dealt with WINE. In the old days WINE was very complicated to deal with.
I thought I should give it a try – EVE Online on CentOS.
This is a short, semi-tutorial post about how to run EVE Online on CentOS.
It’s fairly childish so even very young Linux users will be able to understand it easily.
Let’s go (as root):
# cat > /tmp/epel.conf <<EOF
[epel]
name=\$releasever - \$basearch - epel
baseurl=http://download.fedora.redhat.com/pub/epel/5/x86_64/
enabled=1
EOF
# yum -y -c /tmp/epel.conf install wine
Let’s get EVE Online (from now there’s no need for root user access):
$ cd /tmp
$ wget http://content.eveonline.com/EVE_Premium_Setup_XXXXXX_m.exe
XXXXXX is obviously the version number, which is subject to change.
Let’s install EVE:
$ wine /tmp/EVE_Premium_Setup_XXXXXX_m.exe
OK, here’s the tricky part: if you run it now, the EULA page will not display properly and you won’t be able to accept it. This is because it needs TrueType fonts.
We’ll need to install the msttcorefonts package – a quick look at Google will turn up instructions you can follow.
Let’s configure the fonts in wine:
$ for font_file in `rpm -ql msttcorefonts`; do ln -s $font_file /home/dan/.wine/drive_c/windows/Fonts; done
Run EVE:
$ wine "/home/dan/.wine/drive_c/Program Files/CCP/EVE/eve.exe"
It’ll also most likely add a desktop icon for you, in case you didn’t notice.
EVE works nicely with WINE – evidence that WINE has come a very long way since the last time I used it!!
I believe these instructions can be generalized quite easily for recent Fedora distros as well.
\o/
Feel free to contact me on this issue in case you encounter any problems.
I still remember my Linux nightmares from the previous century – wiping my whole HD while trying to multi-boot RedHat 5.0.
It was for a reason that they said you have to be a rocket scientist in order to install Linux properly.
Times have changed and Linux is easy to install. Perhaps two things are different: one is that, objectively, Linux became much easier to handle; the second is probably the fact that I gained much more experience.
In my opinion – one of the reasons that Linux became easier along the years is the improving support for various device drivers. For the home users – it is excellent news. However, for the SysAdmin who deals mostly with servers and some high-end devices, the headache, I believe, still exists.
If you thought that having a NIC without a driver is a problem, I can assure you that having a RAID controller without a driver is ten times the headache.
I bring you here the story of the RocketRAID device, how to remaster initrd and driver disks and of course, how to become a rocket scientist!
WTF
With CentOS 5.4 you get an ugly error in the middle of the installation saying you have no devices you can partition.
DOH!!! Because it discovered no HDs.
So now you’re asking yourself, where am I going? – Google of course.
RocketRAID 3530 driver page
And you discover there are drivers only for RHEL/CentOS 5.3. Oh! But there’s also source code!
It means we can do either of the two:
- Remaster initrd and insert the RocketRAID drivers where needed
- Create a new driver disk and use it
I’ll show how we do them both.
I’ll assume you have the RocketRAID driver compiled for the installation kernel.
In addition, I’m also going to assume you have a network installation that’s easy to remaster.
Remastering the initrd
What do we have?
# file initrd.img
initrd.img: gzip compressed data, from Unix, last modified: Sun Jul 26 17:39:09 2009, max compression
I’ll make it quicker for you. It’s a gzipped cpio archive.
Let’s open it:
# mkdir /tmp/initrd; gunzip -c initrd.img | (cd /tmp/initrd && cpio -idm)
12113 blocks
It’s open, let’s modify what’s needed.
- modules/modules.alias – Contains a list of PCI device IDs and the module to load
- modules/pci.ids – Common names for PCI devices
- modules/modules.dep – Dependency tree for modules (loading order of modules)
- modules/modules.cgz – The actual modules inside this initrd
Most of the work was done for us already in the official driver package from HighPoint.
Edit modules.alias and add there the relevant new IDs:
alias pci:v00001103d00003220sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003320sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003410sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003510sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003511sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003520sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003521sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003522sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003530sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003540sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003560sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00004210sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00004211sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00004310sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00004311sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00004320sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00004321sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00004322sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00004400sv*sd*bc*sc*i* hptiop
This was taken from the RHEL5.3 package on the HighPoint website.
So now the installer (anaconda) knows it should load hptiop for our relevant devices. But it needs the module itself!
Download the source package and do the usual configure/make/make install – I’m not planning to go into it. I assume you now have your hptiop.ko compiled against the kernel version the installation is going to use.
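Just for completeness, here is a rough sketch of what that build might look like. The Makefile layout and variables of the HighPoint package are assumptions on my part, so check its README:
# build against the installer's kernel (2.6.18-164.el5 here), not your running one
yum install -y gcc make kernel-devel-2.6.18-164.el5
tar -xf rr3xxx_4xxx-linux-src-v1.6-072009-1131.tar.gz
cd rr3xxx_4xxx-linux-src-v1.6
make KERNELDIR=/usr/src/kernels/2.6.18-164.el5-x86_64
find . -name hptiop.ko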
OK, so the real deal is in modules.cgz, let’s open it:
# file modules/modules.cgz
modules/modules.cgz: gzip compressed data, from Unix, last modified: Sat Mar 21 15:13:43 2009, max compression
# mkdir /tmp/modules; gunzip -c modules/modules.cgz | (cd /tmp/modules && cpio -idm)
41082 blocks
# cp /home/dan/hptiop.ko /tmp/modules/2.6.18-164.el5/x86_64
Now we need to repackage both modules.cgz and initrd.img:
# (cd /tmp/modules && find . -print | cpio -c -o | gzip -c9 > /tmp/initrd/modules/modules.cgz)
41083 blocks
# (cd /tmp/initrd && find . -print | cpio -c -o | gzip -c9 > /tmp/initrd-with-rr.img)
Great, use initrd-with-rr.img now for your installation, it should load your RocketRAID device!
A driver disk
Creating a driver disk is much cleaner in my opinion. You do not remaster a stock initrd just for a stupid driver.
So you ask what is a driver disk? – Without going into the bits and bytes, I’ll just say that it’s a brilliant way of incorporating a custom modules.cgz and modules.alias without touching the installation initrd at all!
I knew I couldn’t live quietly with the initrd remaster so choosing the driver disk (dd in short) option was inevitable.
As I noted before, HighPoint provided me only a RHEL/CentOS 5.3 driver disk (and binary), but they also provided the source. I knew it was a matter of some adjustments to get it to work also for 5.4.
It is much easier to approach the driver disk now as we are much more familiar with how the installation initrd works.
I’m lazy, I already created a script that takes the 5.3 driver package and creates a dd:
#!/bin/bash
# $1 - driver_package
# $2 - destination of driver disk
make_rocketraid_driverdisk() {
    local driver_package=$1; shift
    local destination=$1; shift
    local tmp_image=`mktemp`
    local tmp_mount_dir=`mktemp -d`
    local -i retval=1
    dd if=/dev/zero of=$tmp_image count=1 bs=1M && \
        mkdosfs $tmp_image && \
        mount -o loop $tmp_image $tmp_mount_dir && \
        tar -xf $driver_package -C $tmp_mount_dir && \
        umount $tmp_mount_dir
    retval=$?
    if [ $retval -eq 0 ]; then
        cp -aL $tmp_image $destination
        chmod 644 $destination
        echo "Driver disk created at: $destination"
    fi
    rm -f $tmp_image
    rmdir $tmp_mount_dir
    return $retval
}
make_rocketraid_driverdisk rr3xxx_4xxx-rhel_centos-5u3-x86_64-v1.6.09.0702.tgz /tmp/rr.img
Want it for 5.4? Easy. Just remaster the modules.cgz that’s inside rr3xxx_4xxx-rhel_centos-5u3-x86_64-v1.6.09.0702.tgz and replace the hptiop.ko in it with one built for the relevant kernel 🙂
Edit your kickstart to load the driver disk:
driverdisk --source=http://UGAT/HA/BAIT/INC/HighPoint/RocketRAID/3xxx-4xxx/rr3xxx-4xxx-2.6.18-164.el5.img
Make sure you have this line in the main section and not dynamically generated in your %pre section, as the driverdisk directive is processed before the %pre section runs.
The OS doesn’t boot after installation
You moron! This is because the installation kernel/initrd and the one that boots afterwards are not the same!
You can fix it in one of the 3 following ways:
- Recompile the CentOS/RHEL kernel and repackage it with the RocketRAID driver – pretty ugly, not to mention time consuming.
- Build a module RPM for the specific kernel version you’re going to use – very clean but also very time consuming!
- Just build the module for the relevant kernel in the %post section – my way.
In the %post section of your kickstart, add the following:
(cd /tmp && \
wget http://UGAT/HA/BAIT/INC/rr3xxx_4xxx-linux-src-v1.6-072009-1131.tar.gz && \
tar -xf rr3xxx_4xxx-linux-src-v1.6-072009-1131.tar.gz && \
cd rr3xxx_4xxx-linux-src-v1.6 && \
make install)
The next boot obviously has a different initrd image. Generally speaking, initrd creation is done after the %post section, so you shouldn’t worry about it too much…
Server should boot now. Go play with your 12x2TB RAID array.
I hope I could teach you something in this post. It was a hell of a war discovering how to properly do all of these.
Now if you’ll excuse me – I’ll be going to play with spaceships and shoot rockets!
Being the one that is responsible for the backups at work – I never compromised on anything.
As a SysAdmin you must:
- Backup
- Backup some more
- Test your backups
Shoot to maim
At first we mainly needed to backup our subversion repository. A pretty easy task for any SysAdmin.
What I would do is simply dump the repository at night, and scp it to two other workstations of developers in the company (I didn’t really have much of a choice in terms of other computers in the network).
It worked.
The golden goose is on the loose
After a while I managed to convince our R&D manager it was time for detachable backups. Detachable backups can save you in case the building is on fire or if someone decides to shoot a rocket at your building (unlikely even in Israel, but as a SysAdmin – never take any chances).
With the virtual threat of a virtual rocket that might incinerate all of our important information, we decided that the cheapest and most effective way of action is to purchase a tape drive and a few tapes. Mind you, the year is 2006 and portable HDs are expensive, uncommon and small.
Backing up to a tape has always been something on my TODO list that I had to tick.
During one of my previous jobs we had a tape archive that took care of it transparently; it was managed by a different team. Ever since, I’d had yearnings for that /dev/tape that’s totally sequential.
Very soon I discovered that these tapes are a plain headache:
- It is a mess in general to deal with the tapes as the access is sequential
- It’s slow!!!
- The only reasonable way to deal with the backups is with dumpe2fs – it’s an archaic tool and works only on the extX filesystem family!
- It takes a while to eject the tape, I can still remember the minutes of waiting in the servers room for the tape to eject, so I can deposit it at home
- The tapes tend to break! like any tape, the film tends to run away from the bearings, rendering the tape useless
Too bad our backup was far from being able to fit on a DVD media.
The glamour, the fortune, the pain
The tape backup held us good for more than 2 years. I was so happy the solution was robust enough to keep us running for that much time without the need of any major changes.
But portable USB HDs became cheaper and larger and it was time for a change. I was excited to receive two brand new and shiny 500GB HDs. I diligently worked on a new backup script. A backup script that would not be dependent on the filesystem type (hell! I wanted to use JFS!), a backup script that would keep snapshots going weeks back! A backup script that would rule them all!
This backup script will hopefully be published in one of my next posts.
I felt like king of the world, backups became easy, I was much more confident with the new backup as the files could be seen on the mounted HD easily, in contrast to the sequential tape and the binary filesystem dump.
Backups ran manually by me during the day. I inspected them carefully and was pleased.
It was time for the backup to take place at night. And so it was.
From time to time I would get in the backup log:
Input/output error
At first I didn’t pay much attention.
WTF?! Are my HDs broken?! No way – they are brand new, and it happened on both of them. But dmesg also showed some nasty information while accessing the HDs.
I started to trigger the backups manually at day time. Not a single error.
Backups went back to night time.
In the morning I would issue an ls:
# ls /media/backup
Input/output error
# ls /media/backup
daily.1 daily.2 daily.3 daily.4 daily.5 daily.6 daily.7
What the hell is going on around here?! The first command fails but the second succeeds?
The first command would also lag for a while, where the second breezed through. Only later did I discover this was a key hint.
My backup creates many hard links using “cp -al” (in order to preserve snapshots). I had a speculation that I might be messing up the filesystem structure with too many links to the same inode – unlikely, but I was shooting in all directions; I was clueless.
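To give you an idea of what I mean, here is a minimal illustration of the hard link snapshot trick – this is not my actual backup script, and /data is just a placeholder for whatever you back up:
cd /media/backup
rm -rf daily.7
for i in 6 5 4 3 2 1; do
    [ -d daily.$i ] && mv daily.$i daily.$((i+1))
done
# daily.1 becomes a hard-linked copy of yesterday's snapshot...
[ -d daily.2 ] && cp -al daily.2 daily.1
# ...and rsync then replaces only the files that actually changed
rsync -a --delete /data/ daily.1/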
So there I go, easing the backups up and eliminating the snapshot functionality. Guess what? – still errors on the backup.
What do I do next? Do I stay up at night just to witness the problem in real time?! Don’t laugh – a friend of mine actually had to do that once, on another occasion.
At this time I already introduced this issue to all of my fellow SysAdmin friends. None of them had any idea. I can’t blame them.
I was frustrated, even the archaic tape backups worked better than the HDs, is newer always better? – perhaps not.
I recreated the filesystem on the portable HDs as ext3 instead of JFS, maybe JFS is buggy?
I’ll save you the trouble. JFS is far from being buggy, and it also had nothing to do with crond.
We’ll show the unbelievers
For days I’d watch the nightly email the backup would produce, notice the failure and rerun it manually during the day. Until one day.
It struck me like lightning on a sunny day.
The second command would always succeed on the device. What if this HD is a little tired?
What if the portable HD goes to sleep and is having problems waking up?
It’s worth trying.
# sdparm --set=STANDBY=0 /dev/sdb
# sdparm --save /dev/sdb
What do you say? – It worked!
It appears that some USB HDs go to sleep and don’t wake up nicely when they should.
Should I file a bug about it? Was it the hardware that malfunctioned?
I was so happy this issue was solved – I never cared about either.
Maybe after crafting this post – it is time to care a little more though.
As the madmen play on words and make us all dance to their song…
I’m sitting at my desk, receiving the nightly email informing me the backup was successful. The portable HDs now also utilize an encrypted filesystem. The backup never fails.
I look at my watch, drink a glass of wine and rejoice.
Introduction
Landing in a new startup company has its cons and pros.
The pros being:
- You can do almost whatever you want
The cons:
- You have to do it from scratch!
The Developers
Linux developers are not dumb. They can’t be. If they were dumb, they couldn’t have developed anything on Linux. They might have been called developers on some other platforms.
I was confronted quite early with the question:
“Am I, as a SysAdmin, going to give those Linux developers root access on their machines?”
Why not:
- They can cause a mess and break their system in a second.
Take, for instance, a fellow developer (the chowner) who ran:
# chown -R his_username:his_group *
He came to me saying “My Linux workstation stopped working well!!!”
Later on I also discovered he was at /, when performing this command! 🙂
In his defence he added: “But I stopped the command quickly! After I saw the mistake!”
- And there’s no 2, I think this is the only main reason, given that these are actually people I generally trust.
Why yes:
- They’ll bother me less with small things such as mounting/umounting media.
- If they need to perform any other administrative action – they’ll learn from it.
- Heck, it’s their own workstation, if they really want, they’ll get root access, so who am I to play god with them?
Having chosen to let the developers rejoice with root access on their machines, I had to take some proactive actions in order to avoid unwanted situations down the road.
Installation
Your flavor of installation should be idempotent – even if a user destroys his workstation, you should be able to reinstall it and get back to exactly the same position.
Let’s take for example the chowner developer. His workstation was ruined. I never even thought of starting to change back permissions to their originals. It would cause much more trouble in the long run than any good.
We reinstalled his workstation and after 15 minutes he was happy again to continue development.
Automatic network installations are too easy to implement today on Linux. If you don’t have one, you must be living in the medieval times or so.
I can give you one suggestion though about partitioning – make sure your developers have a /home on a different partition. It’ll be easier when reinstalling to preserve /home and remove all the rest.
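For instance, in a kickstart it could look something like this (sizes and filesystems are placeholders, adjust to taste):
clearpart --all --initlabel
part /boot --fstype ext3 --size=200
part swap --size=4096
part / --fstype ext3 --size=20480
part /home --fstype ext3 --size=1 --grow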
Consolidating software
I consider installing non-packaged software on Linux a very dirty action.
The reasons for that are:
- You can’t uninstall it using standard ways
- You can’t upgrade it using standard ways
- You can’t keep track of it
In addition to installing packaged software, you must also have all your workstations and server synchronize against the same software repositories.
If user A installs software from repository A and user B from repository B, they might run into different behavior on their software.
Have you ever heard: “How come it works on my computer and doesn’t work on yours??”
As a SysAdmin, you must reduce the chances of this happening to zero.
How do you do it?
Well, if you’re using CentOS – set up a YUM repository and cache whatever packages you need from the various internet repositories out there.
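A hedged example of what such an internal repository definition might look like on the workstations (the URL is obviously a placeholder):
# /etc/yum.repos.d/internal.repo
[internal-base]
name=Internal mirror - base packages
baseurl=http://repo.internal.example.com/centos/$releasever/os/$basearch/
enabled=1
gpgcheck=1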
Debian? – just the same – just with apt.
Remember – if you have any software on workstations that is not well packaged or not well controlled – you’ll run into awkward situations very soon.
Today
Up until today, Linux developers in my company still possess their root access, but they barely use it. To be honest, I don’t think they even really need it. However, they have it. It is also about educating the developers that they are given root access because they are trusted. If they blow it, it’s mostly their fault, not yours.
I’ll continue to let them be root when needed. They have proved worthy so far.
And I’ll ask you another question – do you really think that someone who can’t handle his own workstation can be a good developer? Think again!