Archive for the ‘Linux’ Category

Primus (primusrun) and FC18   Leave a comment

Continued from my previous article at:
https://bashinglinux.wordpress.com/2013/02/18/bumblebee-and-fc18-a-horror-show/

This is a short manual for running primus on top of the setup I suggested in that article.

Packages

Pretty simple:

yum install glibc-devel.x86_64 glibc-devel.i686 libX11-devel.x86_64 libX11-devel.i686

We should be good to go in terms of packages (both x86_64 and i686).

Download and compile primus

Clone from github:

cd /tmp && git clone https://github.com/amonakov/primus.git

Compiling for x86_64:

export PRIMUS_libGLd='/usr/lib64/libGL.so.1'
export PRIMUS_libGLa='/usr/lib64/nvidia/libGL.so.1'
LIBDIR=lib64 make
unset PRIMUS_libGLd PRIMUS_libGLa

And for i686 (32 bit):

export PRIMUS_libGLd='/usr/lib/libGL.so.1'
export PRIMUS_libGLa='/usr/lib/nvidia/libGL.so.1'
CXX=g++\ -m32 LIBDIR=lib make
unset PRIMUS_libGLd PRIMUS_libGLa

Running

Running with x86_64:

cd /tmp/primus && \
LD_LIBRARY_PATH=/usr/lib64/nvidia:lib64 ./primusrun glxspheres

Untested by me, but this should be the procedure for i686 (32 bit):

cd /tmp/primus && \
LD_LIBRARY_PATH=/usr/lib/nvidia:lib ./primusrun YOUR_32_BIT_OPENGL_APP
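
If you don’t want to type the LD_LIBRARY_PATH incantation every time, a tiny wrapper can do it for you. This is only a sketch on top of the paths used above; the /tmp/primus location and the wrapper name are my own assumptions, adjust them to wherever you cloned and built primus:

#!/bin/bash
# primus-wrapper: run an OpenGL app through the locally built primus
# assumes primus was cloned and built in /tmp/primus as shown above
PRIMUS_DIR=/tmp/primus
# 64 bit libraries by default; pass -32 as the first argument for the 32 bit build
if [ "$1" = "-32" ]; then
	shift
	export LD_LIBRARY_PATH=/usr/lib/nvidia:$PRIMUS_DIR/lib
else
	export LD_LIBRARY_PATH=/usr/lib64/nvidia:$PRIMUS_DIR/lib64
fi
cd "$PRIMUS_DIR" && exec ./primusrun "$@"

Usage would then be something like ./primus-wrapper glxspheres, or ./primus-wrapper -32 YOUR_32_BIT_OPENGL_APP.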

Bumblebee and FC18 – a horror show   3 comments

Preface

I’ve seen numerous posts about how to get bumblebee, optirun and nvidia to run on Fedora 18; the only problem was that all of them were using the open source (and somewhat slow) nouveau driver.

I wanted to use the official Nvidia binary driver which is heaps faster.

My configuration is a Lenovo T430s with an NVS 5200M.

Following is a tick list of things to do to get it running (at least on my configuration with FC18 x86_64).

Installing the Nvidia driver

The purpose of this paragraph is to show you how to install the Nvidia driver without overwriting your current OpenGL libraries. Simply download the installer and run:

yum install libbsd-devel dkms
./NVIDIA-Linux-x86_64-XXX.XX.run --x-module-path=/usr/lib64/xorg/nvidia --opengl-libdir=lib64/nvidia --compat32-libdir=lib/nvidia --utility-libdir=lib64/nvidia --no-x-check --disable-nouveau --no-recursion

Even though we ask it to disable nouveau, it still won’t, so we will do that ourselves below.

This method will not ruin all the good stuff in /usr/lib and /usr/lib64.

Disabling nouveau

Disabling nouveau is rather simple, we need to blacklist it and remove it from initrd:

echo "blacklist nouveau" > /etc/modprobe.d/nvidia.conf 
dracut --force --omit-drivers nouveau /boot/initramfs-$(uname -r).img $(uname -r)

Good on us. You may either reboot now to verify that nouveau is out of the house, or manually rmmod it:

rmmod nouveau
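
And confirm that nothing nouveau-related is still loaded (the command should print nothing):

lsmod | grep nouveau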

Bumblebee and all the rest

Install VirtualGL:

yum --enablerepo=updates-testing install VirtualGL

Remove some rubbish xorg files:

rm -f /etc/X11/xorg.conf

Download bbswitch and install it with dkms:

cd /tmp && wget https://github.com/downloads/Bumblebee-Project/bbswitch/bbswitch-0.5.tar.gz 
tar -xf bbswitch-0.5.tar.gz
cp -av bbswitch-0.5 /usr/src
ln -s /usr/src/bbswitch-0.5/dkms/dkms.conf /usr/src/bbswitch-0.5/dkms.conf
dkms add -m bbswitch -v 0.5
dkms build -m bbswitch -v 0.5
dkms install -m bbswitch -v 0.5
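
Before moving on, it doesn’t hurt to verify that dkms really built the module and that it loads; a quick sanity check:

dkms status -m bbswitch -v 0.5
modprobe bbswitch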

Download bumblebee and install:

cd /tmp && wget https://github.com/downloads/Bumblebee-Project/Bumblebee/bumblebee-3.0.1.tar.gz
tar -xf bumblebee-3.0.1.tar.gz
cd bumblebee-3.0.1
./configure --prefix=/usr --sysconfdir=/etc 
make && make install 
cp scripts/systemd/bumblebeed.service /lib/systemd/system/
sed -i -e 's#ExecStart=.*#ExecStart=/usr/sbin/bumblebeed --config /etc/bumblebee/bumblebee.conf#g' /lib/systemd/system/bumblebeed.service
chkconfig bumblebeed on

Bumblebee configuration is at /etc/bumblebee/bumblebee.conf, edit it to have this:

[bumblebeed]
VirtualDisplay=:8
KeepUnusedXServer=false
ServerGroup=bumblebee
TurnCardOffAtExit=false
NoEcoModeOverride=false
Driver=nvidia

[optirun]
VGLTransport=proxy
AllowFallbackToIGC=false
[driver-nvidia]
KernelDriver=nvidia
Module=nvidia
PMMethod=bbswitch
LibraryPath=/usr/lib64/nvidia:/usr/lib/nvidia:/usr/lib64/xorg/nvidia
XorgModulePath=/usr/lib64/xorg/nvidia/extensions,/usr/lib64/xorg/nvidia/drivers,/usr/lib64/xorg/modules
XorgConfFile=/etc/bumblebee/xorg.conf.nvidia

The default /etc/bumblebee/xorg.conf.nvidia which comes with bumblebee is ok, but in case you want to make sure, here is mine:

Section "ServerLayout"
    Identifier "Layout0"
    Option "AutoAddDevices" "false"
EndSection

Section "Device"
    Identifier "Device1"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    Option "NoLogo" "true"
    Option "UseEDID" "false"
    Option "ConnectedMonitor" "DFP"
EndSection

Add yourself to the bumblebee group:

groupadd bumblebee
usermod -a -G bumblebee YOUR_USERNAME

Restart bumblebeed:

systemctl restart bumblebeed.service
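
Note that the new bumblebee group membership only takes effect on your next login. Before testing, make sure the daemon actually came up:

systemctl status bumblebeed.service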

Testing

To run with your onboard graphics card:

glxspheres

Running with nvidia discrete graphics:

optirun glxspheres
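
If you want to see the discrete card being powered on and off, peek at the bbswitch state while optirun is running; it should read ON during the run (whether it returns to OFF afterwards depends on your power management settings above):

cat /proc/acpi/bbswitch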

Useful links

Inspiration from:
http://duxyng.wordpress.com/2012/01/26/finally-working-nvidia-optimus-on-fedora/
Download links:
https://github.com/Bumblebee-Project/Bumblebee/downloads
https://github.com/Bumblebee-Project/bbswitch/downloads
http://www.nvidia.com/Download/index.aspx

If it works for you, please let me know!!
And if it doesn’t, perhaps I could help.

Bash Scripting Conventions   2 comments

I’ve decided to publish the infamous Bash scripting conventions.

Here they are:
https://github.com/danfruehauf/Scripts/tree/master/bash_scripting_conventions

Please comment, challenge and help me modify it. I’m very open to feedback.

Posted January 28, 2013 by malkodan in Bash, Linux, System Administration


EVE and WINE   1 comment

It’s also been a long time since I’ve played any interactive computer game. Unfortunately, a work colleague introduced me to EVE Online.
I usually play EVE on Microsoft Windows, which I believe is the best platform for PC gaming.

It’s been a while since I dealt with WINE. In the old days WINE was very complicated to deal with.
I thought I should give it a try – EVE Online on CentOS.

This is a short, semi-tutorial post about how to run EVE Online on CentOS.
It’s fairly childish so even very young Linux users will be able to understand it easily.

Let’s go (as root):

# cat > /tmp/epel.conf <<EOF
[epel]
name=\$releasever - \$basearch - epel
baseurl=http://download.fedora.redhat.com/pub/epel/5/x86_64/
enabled=1
EOF

# yum -y -c /tmp/epel.conf install wine

Let’s get EVE Online (from now there’s no need for root user access):

$ cd /tmp
$ wget http://content.eveonline.com/EVE_Premium_Setup_XXXXXX_m.exe

XXXXXX is obviously the version number, which is subject to change.

Let’s install EVE:

$ wine /tmp/EVE_Premium_Setup_XXXXXX_m.exe

OK, here’s the tricky part: if you run it now, the EULA page will not display properly and you won’t be able to accept it. This is because it needs TrueType fonts.
We’ll need to install the msttcorefonts package; a quick Google search suggests you can follow the instructions found here.
Let’s configure the fonts in wine:

$ for font_file in `rpm -ql msttcorefonts`; do ln -s $font_file /home/dan/.wine/drive_c/windows/Fonts; done
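
If your home directory isn’t /home/dan, the same loop can be written against $HOME; this is just a hedged variant of the command above that also creates the Fonts directory in case WINE hasn’t, and links only the actual font files:

$ mkdir -p "$HOME/.wine/drive_c/windows/Fonts"
$ for font_file in `rpm -ql msttcorefonts | grep -i '\.ttf$'`; do ln -sf "$font_file" "$HOME/.wine/drive_c/windows/Fonts"; done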

Run EVE:

$ wine "/home/dan/.wine/drive_c/Program Files/CCP/EVE/eve.exe"

It’ll also most likely add a desktop icon for you, in case you didn’t notice.

EVE works nicely with WINE – evidence that WINE has come a very long way since the last time I used it!!

I believe these instructions can be generalized quite easily for recent Fedora releases as well.
\o/

Feel free to contact me on this issue in case you encounter any problems.

Posted February 4, 2010 by malkodan in Linux


Rocket science   Leave a comment

I still remember my Linux nightmares of the previous century – trying to install Linux and wiping my whole HD while attempting to multi-boot RedHat 5.0.
It was for a reason that they said you have to be a rocket scientist in order to install Linux properly.
Times have changed and Linux is easy to install. Perhaps two things are different: one is that Linux objectively became much easier to handle, and the second is probably the fact that I gained much more experience.
In my opinion, one of the reasons Linux became easier over the years is the improving support for various device drivers. For home users this is excellent news. However, for the SysAdmin who deals mostly with servers and some high-end devices, the headache, I believe, still exists.
If you thought that having a NIC without a driver is a problem, I can assure you that having a RAID controller without a driver is ten times the headache.
I bring you here the story of the RocketRAID device, how to remaster initrd and driver disks and of course, how to become a rocket scientist!

WTF

With CentOS 5.4 you get an ugly error in the middle of the installation saying you have no devices you can partition.
DOH!!! Because it discovered no HDs.

So now you’re asking yourself, where am I going? – Google of course.
RocketRAID 3530 driver page

And you discover you have drivers only for RHEL/CentOS 5.3. Oh! but there’s also source code!
It means we can do either of the two:

  1. Remaster initrd and insert the RocketRAID drivers where needed
  2. Create a new driver disk and use it

I’ll show how we do them both.
I’ll assume you have the RocketRAID driver compiled for the installation kernel.
In addition, I’m also going to assume you have a network installation that’s easy to remaster.

Remastering the initrd

What do we have?

# file initrd.img
initrd.img: gzip compressed data, from Unix, last modified: Sun Jul 26 17:39:09 2009, max compression

I’ll make it quicker for you. It’s a gzipped cpio archive.
Let’s open it:

# mkdir initrd; gunzip -c initrd.img | (cd initrd && cpio -idm)
12113 blocks

It’s open, let’s modify what’s needed.

  • modules/modules.alias – Contains a list of PCI device IDs and the module to load
  • modules/pci.ids – Common names for PCI devices
  • modules/modules.dep – Dependency tree for modules (loading order of modules)
  • modules/modules.cgz – The actual modules inside this initrd

Most of the work was done for us already in the official driver package from HighPoint.
Edit modules.alias and add there the relevant new IDs:

alias pci:v00001103d00003220sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003320sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003410sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003510sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003511sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003520sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003521sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003522sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003530sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003540sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00003560sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00004210sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00004211sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00004310sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00004311sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00004320sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00004321sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00004322sv*sd*bc*sc*i* hptiop
alias pci:v00001103d00004400sv*sd*bc*sc*i* hptiop

This was taken from the RHEL5.3 package on the HighPoint website.

So now the installer (anaconda) knows it should load hptiop for our relevant devices. But it needs the module itself!
Download the source package and do the usual configure/make/make install – I’m not planning to go into it. I assume you now have your hptiop.ko compiled against the kernel version the installation is going to use.
OK, so the real deal is in modules.cgz, let’s open it:

# file modules/modules.cgz
modules/modules.cgz: gzip compressed data, from Unix, last modified: Sat Mar 21 15:13:43 2009, max compression
# mkdir /tmp/modules; gunzip -c modules/modules.cgz | (cd /tmp/modules && cpio -idm)
41082 blocks
# cp /home/dan/hptiop.ko /tmp/modules/2.6.18-164.el5/x86_64

Now we need to repackage both modules.cgz and initrd.img:

# (cd /tmp/modules && find . -print | cpio -c -o | gzip -c9 > /tmp/initrd/modules/modules.cgz)
41083 blocks
# (cd /tmp/initrd && find . -print | cpio -c -o | gzip -c9 > /tmp/initrd-with-rr.img)
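
A quick sanity check never hurts; the repackaged modules.cgz should now list the driver (expect a line ending with hptiop.ko):

# gunzip -c /tmp/initrd/modules/modules.cgz | cpio -t | grep hptiop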

Great, use initrd-with-rr.img now for your installation, it should load your RocketRAID device!

A driver disk

Creating a driver disk is much cleaner in my opinion. You do not remaster a stock initrd just for a stupid driver.
So you ask what is a driver disk? – Without going into the bits and bytes, I’ll just say that it’s a brilliant way of incorporating a custom modules.cgz and modules.alias without touching the installation initrd at all!
I knew I couldn’t live quietly with the initrd remaster so choosing the driver disk (dd in short) option was inevitable.
As I noted before, HighPoint provided me only a RHEL/CentOS 5.3 driver disk (and binary), but they also provided the source. I knew it was a matter of some adjustments to get it to work also for 5.4.
It is much easier to approach the driver disk now as we are much more familiar with how the installation initrd works.
I’m lazy, I already created a script that takes the 5.3 driver package and creates a dd:

#!/bin/bash

# $1 - driver_package
# $2 - destination of driver disk
make_rocketraid_driverdisk() {
        local driver_package=$1; shift
        local destination=$1; shift

        local tmp_image=`mktemp`
        local tmp_mount_dir=`mktemp -d`

        # retval stays 1 unless every step in the chain succeeds
        local -i retval=1
        dd if=/dev/zero of=$tmp_image count=1 bs=1M && \
        mkdosfs $tmp_image && \
        mount -o loop $tmp_image $tmp_mount_dir && \
        tar -xf $driver_package -C $tmp_mount_dir && \
        umount $tmp_mount_dir && \
        retval=0

        if [ $retval -eq 0 ]; then
                cp -aL $tmp_image $destination
                chmod 644 $destination
                echo "Driver disk created at: $destination"
        fi

        rm -f $tmp_image
        rmdir $tmp_mount_dir

        return $retval
}

make_rocketraid_driverdisk rr3xxx_4xxx-rhel_centos-5u3-x86_64-v1.6.09.0702.tgz /tmp/rr.img

Want it for 5.4? – easy. Just remaster the modules.cgz that’s inside rr3xxx_4xxx-rhel_centos-5u3-x86_64-v1.6.09.0702.tgz and replace it with a relevant hptiop.ko module 🙂

Edit your kickstart to load the driver disk:

driverdisk --source=http://UGAT/HA/BAIT/INC/HighPoint/RocketRAID/3xxx-4xxx/rr3xxx-4xxx-2.6.18-164.el5.img

Make sure you have this line in the main section and not meta-generated in your %pre section, as the driverdisk directive is processed before the %pre section.

The OS doesn’t boot after installation

You moron! This is because the installation kernel/initrd and the one that boots afterwards are not the same!
You can fix it in one of the following three ways:

  1. Recompile the CentOS/RHEL kernel and repackage it with the RocketRAID driver – pretty ugly, not to mention time consuming.
  2. Build a module RPM for the specific kernel version you’re going to use – very clean but also very time consuming!
  3. Just build the module for the relevant kernel in the %post section – my way.

In the %post section of your kickstart, add the following:

(cd /tmp && \
        wget http://UGAT/HA/BAIT/INC/rr3xxx_4xxx-linux-src-v1.6-072009-1131.tar.gz && \
        tar -xf rr3xxx_4xxx-linux-src-v1.6-072009-1131.tar.gz && \
        cd rr3xxx_4xxx-linux-src-v1.6 && \
        make install)

The next boot obviously has a different initrd image. Generally speaking, initrd creation is done after the %post section, so you should not worry about it too much…
Server should boot now. Go play with your 12x2TB RAID array.

I hope I could teach you something in this post. It was a hell of a war discovering how to do all of this properly.
Now if you’ll excuse me – I’ll be going to play with spaceships and shoot rockets!

Backups… all night?   1 comment

Being the one responsible for the backups at work – I never compromised on anything.
As a SysAdmin you must:

  1. Backup
  2. Backup some more
  3. Test your backups

Shoot to maim

At first we mainly needed to backup our subversion repository. A pretty easy task for any SysAdmin.
What I would do is simply dump the repository at night, and scp it to two other workstations of developers in the company (I didn’t really have much of a choice in terms of other computers in the network).
It worked.

The golden goose is on the loose

After a while I managed to convince our R&D manager it was time for detachable backups. Detachable backups can save you in case the building is on fire or if someone decides to shoot a rocket at your building (unlikely even in Israel, but as a SysAdmin – never take any chances).
With the virtual threat of a virtual rocket that might incinerate all of our important information, we decided that the cheapest and most effective way of action is to purchase a tape drive and a few tapes. Mind you, the year is 2006 and portable HDs are expensive, uncommon and small.

Backing up to a tape has always been something on my TODO list that I had to tick.
During one of my previous jobs we had a tape archive that took care of it transparently; it was managed by a different team. Ever since, I have had yearnings for the /dev/tape that’s totally sequential.
Very soon I discovered that these tapes are a plain headache:

  1. It is a mess in general to deal with the tapes as the access is sequential
  2. It’s slow!!!
  3. The only reasonable way to deal with the backups is with dump – an archaic tool that works only on the extX filesystem family!
  4. It takes a while to eject the tape; I can still remember the minutes of waiting in the server room for the tape to eject so I could deposit it at home
  5. The tapes tend to break! like any tape, the film tends to run away from the bearings, rendering the tape useless

Too bad our backup was far from being able to fit on DVD media.

The glamour, the fortune, the pain

The tape backup held us well for more than 2 years. I was so happy the solution was robust enough to keep us running for that much time without the need for any major changes.
But portable USB HDs became cheaper and larger and it was time for a change. I was excited to receive two brand new and shiny 500GB HDs. I diligently worked on a new backup script. A backup script that would not be dependent on the filesystem type (hell! I wanted to use JFS!), a backup script that would have snapshots going weeks back! A backup script that would rule them all!
This backup script will hopefully be published in one of my next posts.
I felt like king of the world, backups became easy, I was much more confident with the new backup as the files could be seen on the mounted HD easily, in contrast to the sequential tape and the binary filesystem dump.
I ran the backups manually during the day, inspected them carefully and was pleased.
It was time for the backup to take place at night. And so it was.

From time to time I would get this in the backup log:
Input/output error

At first I didn’t pay much attention.
WTF?! Are my HDs broken?! – no way, they are brand new, and it happened on both of them. But dmesg also showed some nasty information while accessing the HDs.
I started to trigger the backups manually at day time. Not a single error.

Backups went back to night time.
In the morning I would issue an ls:

# ls /media/backup
Input/output error
# ls /media/backup
daily.1 daily.2 daily.3 daily.4 daily.5 daily.6 daily.7

What the hell is going on here?! – the first command fails but the second succeeds?
The first command would also lag for a while, whereas the second breezed through. Only later did I discover that this was a key hint.

My backup creates many links using “cp -al” (in order to preserve snapshots), so I speculated I might be messing up the filesystem structure with too many links to the same inode – unlikely, but I was shooting in all directions, I was clueless.

So there I go, easing up the backups and eliminating the snapshot functionality. Guess what? – still errors on the backup.

What do I do next? Do I stay up at night just to witness the problem in real time?! Don’t laugh, a friend of mine actually had to do that once on another occasion.
By this time I had already presented this issue to all of my fellow SysAdmin friends. None of them had any idea. I can’t blame them.
I was frustrated, even the archaic tape backups worked better than the HDs, is newer always better? – perhaps not.
I recreated the filesystem on the portable HDs as ext3 instead of JFS, maybe JFS is buggy?
I’ll save you the trouble: JFS is far from buggy, and it also had nothing to do with crond.

We’ll show the unbelievers

For days I’d watch the nightly email the backup would produce, notice the failure and rerun it manually during the day. Until one day.
It struck me like lightning on a sunny day.

The second command would always succeed on the device. What if this HD is a little tired?
What if the portable HD goes to sleep and is having problems waking up?
It’s worth trying.

# sdparm --set=STANDBY=0 /dev/sdb
# sdparm --save /dev/sdb

What do you say? – It worked!
It appears that some USB HDs go to sleep and don’t wake up nicely when they should.
Should I file a bug about it? Was it the hardware that malfunctioned?
I was so happy this issue was solved – I never cared about either.
Maybe after crafting this post – it is time to care a little more though.

As the madmen play on words and make us all dance to their song…

I’m sitting at my desk, receiving the nightly email informing me the backup was successful. The portable HDs now also utilize an encrypted filesystem. The backup never fails.
I look at my watch, drink a glass of wine and rejoice.

Bash RPC   2 comments

Auto configuring complex cluster architectures is a task many of you will probably skip, using the excuse that it’s a one-time task and it’ll never repeat itself. WRONG!
Being as lazy as I could be, and needing to quickly deploy systems for hungry customers, I started out with a little infrastructure that helped me along the way to auto configure clusters of two nodes or more.
My use cases were:
1. RedHat cluster
2. LVS
3. Heartbeat
4. Oracle RAC

Auto configuring an Oracle RAC DB is not an easy task at all. However, with the proper infrastructure, it can become noticeably easier.
The common denominators for all cluster configurations I had to carry out were:
1. They had to run on more than one node
2. Except for entering a password once for all nodes, I didn’t want any interaction
3. They all consisted of steps that should run either on all nodes or on just one node
4. Sometimes you could logically split the task into a few phases of configuration, making it easier to comprehend the tasks you have to achieve

Even though it is not the tidiest piece of Bash code I’m going to post here, I’m very proud of it as it has saved me countless hours. I give you the skeleton of code, which is the essence of what I nicknamed Bash RPC. On top of this you should be able to easily auto configure various setups involving more than one computer.

Using it

The sample attached is a simple script that should be able to bake /home/cake on all relevant nodes.
In order to use the script properly, edit it using your favorite editor and stroll through the configuration_vars() function. Populate the array HOST_IP with your relevant hosts.
Now you can simply run the script.
I’m aware of the slight disadvantage that you can’t have your configuration come from the command line; on the other hand – when dealing with big, complex setups, do you really think your configuration can be defined in a single line of arguments?

Taming it

OK, this is the interesting part. Obviously no one needs to bake cakes on his nodes, it is truly pointless and was merely given as an example.
So how would you go about customizing this skeleton to your needs?
First and foremost, we must plan. Planning and designing is the key to every tech related activity you carry out, be it a SysAdmin task or a pure development task. While configuring your cluster for the first time, keep notes of the steps (and commands) you have to go through. Later on, try to logically separate the steps into phases. When you have them all, we can start hacking Bash RPC.
Start drawing your phases and steps, using functions in the form of PHASEx_STEPx.
Fill up your functions with your ideas and start testing! And that’s it!

How does it work?

Simplicity is the key for everything.
Bash RPC can be run in two ways – either running all phases (a full run) or running just one step.
If you give Bash RPC just one argument, it assumes it is a function you have to run. If no arguments are given it will run the whole script.
Have a look at run_function_on_node(). This function receives a node and the function it should run on it. It will copy the script to the destination node and invoke it there with the arguments it received.
And this is more or less the essence of Bash RPC. REALLY!

Oh, there’s one small thing. Sometimes you have to run things on just one host; in that case you can add a suffix of ___SINGLE_HOST to your steps. This will make sure the step runs on just one host (the first one you defined).
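
Putting the pieces together, here is a minimal sketch of the skeleton described above. It is not the original cluster-config-cake.sh – the function bodies and details are my own assumptions – but it shows the dispatch logic:

#!/bin/bash

# hosts taking part in the cluster (populate with your own)
declare -a HOST_IP=(192.168.0.1 192.168.0.2)
declare -r SCRIPT_PATH=`readlink -f $0`

# $1 - node
# $2 - function to run on that node
run_function_on_node() {
	local node=$1; shift
	local func=$1; shift
	# copy the script to the node and invoke just that one function there
	scp -q $SCRIPT_PATH root@$node:/tmp/`basename $SCRIPT_PATH` && \
	ssh root@$node /bin/bash /tmp/`basename $SCRIPT_PATH` $func
}

# run every PHASEx_STEPx function, on all nodes or on the first one only
run_all_phases() {
	local func node
	for func in `declare -F | awk '{print $3}' | grep '^PHASE' | sort`; do
		if [[ "$func" == *___SINGLE_HOST ]]; then
			run_function_on_node ${HOST_IP[0]} $func
		else
			for node in "${HOST_IP[@]}"; do
				run_function_on_node $node $func
			done
		fi
	done
}

# example step: "bake" /home/cake on every node
PHASE1_STEP1() {
	mkdir -p /home/cake
}

main() {
	# a single argument means we are on a remote node - run just that function
	if [ $# -eq 1 ]; then
		"$1"
	else
		run_all_phases
	fi
}

main "$@"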

I’m more than aware that this skeleton of Bash RPC can be polished some more and indeed I have a list of TODOs for this skeleton. But all in all – I really think this one is a big time saver.

Real world use cases

I consider this script a success mainly because of two real world use cases.
The first one is the act of configuring Oracle RAC from A to Z. Those of you who had to go through this nightmare can testify that configuring Oracle RAC takes 2-3 days (a modest estimation) of both a DBA and a SysAdmin working closely together. How about an unattended script running in the background, notifying you 45 minutes later that you have an Oracle RAC ready for service?

The second use case is my good friend and former colleague, Oren Held. Oren could easily take this skeleton and use it for auto configuring a totally different cluster using LVS over Heartbeat. He was even satisfied while using it. Oren never consulted me while performing this task – and this is the great achievement.

I hope you could use this code snippet and customize it for your own needs, continuing with the YOU CAN CREATE A SCRIPT FOR EVERYTHING attitude!!

Have a look at cluster-config-cake.sh to get an idea about how it’s being done.

Posted October 10, 2009 by malkodan in Bash, Linux, System Administration


Buying time   1 comment

Wearing both hats of developer and SysAdmin, I believe you shouldn’t work hard as a SysAdmin. It is for a reason that sometimes developers look at SysAdmins as inferior. It’s not because SysAdmins are really inferior, or have an easier job. I think it is mainly because a large portion of their workload could be automated. And if it can’t be automated (very few things), you should at least be able to do it quickly.

Obviously, we cannot buy time, but we can at least be efficient. In the following post I’ll introduce a few of the most useful aliases (or functions) I’ve written and used in the last few years. Some of them are development related while the others could make you a happy SysAdmin.
You may find them childish – but trust me, sometimes the most childish alias is your best friend.

Jumping on the tree

Our subversion trunk really reminds me of a tree sometimes. The trunk is very thick, but it has many branches and eventually many leaves. While dealing with the leaves is rare, jumping on the branches is very common. Many times I have found myself typing a lot of ‘cd’ commands, some longer than others, repeatedly, just to get to a certain place. Here my stupid aliases come to help me.

lib takes me straight to our libraries sub directory, while sim takes me to the simulators sub directory. Not to mention tr (shortcut for trunk), which takes me exactly to the sub directory where the sources are checked out. Oh, and pup, which takes me to my puppet root, on my puppet master. Yes, I’m aware of the fact that you are probably wondering “Hey, is this guy going to teach me something new today? – Aliases are for babies!!”, and I can identify. I didn’t come to teach you how to write aliases, I’m here to preach to you to start using aliases. Ask yourself how many useful day-to-day aliases you have really defined. Do you have any at all? – Don’t be shy to answer no.

*nix jobs are diverse and heterogeneous, but let’s see if I can encourage you to write some useful aliases after all.
In case you need some ideas for aliases, run the following:

$ history | tr -s ' ' | cut -d' ' -f3- | sort | uniq -c | sort -n

Yes, this will show the usage counts of your recent commands. Still not getting it? – OK, I’ll give you a hint, it should be similar to this:

$ alias recently_used_commands="history | tr -s ' ' | cut -d' ' -f3- | sort | uniq -c | sort -n"

If you did it – you’ve just kick-started your way to liberation. Enjoy.
As a dessert – my last childish alias:

$ alias rsrc='source ~/.bashrc'

Always useful if you want to re-source your .bashrc while working on some new aliases.

Two more things I must mention though:

  1. Enlarge your history size – I guess you can figure out how on your own, but there’s a snippet right after this list anyway.
  2. If you’re feeling generous – periodically collect the history files from your fellow team members (automatically of course, with another alias) and create aliases that will suit them too.
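
In case you’d rather not figure it out alone, two lines in your ~/.bashrc will do (the values here are just an example):

# keep a much longer history, both in memory and on disk
export HISTSIZE=100000
export HISTFILESIZE=100000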

The serial SSHer

Our network is on 192.168.8.0/24. Many times I’ve found myself issuing commands like:

$ ssh root@192.168.8.1
$ ssh root@192.168.8.2
$ ssh root@192.168.8.3

Dozens of these in a single day. It was really frustrating. One day I decided to put an end to it:

# function for easier ssh
# $1 - network
# $2 - subnet
# $3 - host in subnet
_ssh_specific_network() {
	local network=$1; shift
	local subnet=$1; shift
	local host=$1; shift
	ssh root@$network.$subnet.$host
}

# easy ssh to 192.168.8.0/24
# $1 - host
_ssh_net_192_168_8() {
	local host=$1; shift
	_ssh_specific_network 192.168 8 $host
}
alias ssh8='_ssh_net_192_168_8'

Splendid, now I can run the following:

$ ssh8 1

Which is equal to:

$ ssh root@192.168.8.1

Childish, but extremely efficient. Do it also for other commands you would like to use, such as ping, telnet, rdesktop and many others.

The cop

Using KDE’s konsole? – I really like DCOP. Let’s add some spice to the above function: we’ll rename the session to the host we ssh to, and then restore the session name after logging out:

# returns session name
_konsole_get_session_name() {
	# using dcop - obtain session name
	dcop $KONSOLE_DCOP_SESSION sessionName
}

# renames a konsole session
# $1 - session name
_konsole_rename_session_name() {
	local session_name=$1; shift
	_konsole_store_session_name `_konsole_get_session_name`
	dcop $KONSOLE_DCOP_SESSION renameSession "$session_name"
}

# store the current session name
_konsole_store_session_name() {
	STORED_SESSION_NAME=`_konsole_get_session_name`
}

# restores session name
_konsole_restore_session_name() {
	if [ x"$STORED_SESSION_NAME" != x ]; then
		_konsole_rename_session_name "$STORED_SESSION_NAME"
	fi
}

# function for easier ssh
# $1 - network
# $2 - subnet
# $3 - host in subnet
_ssh_specific_network() {
	local network=$1; shift
	local subnet=$1; shift
	local host=$1; shift
	# rename the konsole session name
	_konsole_rename_session_name .$subnet.$host
	ssh root@$network.$subnet.$host
	# restore the konsole session name
	_konsole_restore_session_name
}

Extend it as needed, this is only the tip of the iceberg! I can assure you that my aliases are much more complex than these.

Finders keepers

For the next one I’m not taking full credit – this one belongs to Uri Sivan – obviously one of the better developers I’ve met along the way.
Grepping cpp files is essential; many times I’ve found myself looking for a function reference in all of our cpp files.
The following usually does it:

$ find . -name "*.cpp" | xargs grep -H -r 'HomeCake'

But seriously, do I look like someone who likes to work hard?

# $* - string to grep
grepcpp() {
	local grep_string="$*";
	local filename=""
	find . -name "*.cpp" -exec grep -l "$grep_string" "{}" ";" | while read filename; do
		echo "=== $filename"
		grep -C 3 --color=auto "$grep_string" "$filename"
		echo ""
	done
}

OK, let’s generalize it:

# grepping made easy, taken from the suri
# grepext - grep by extension
# $1 - extension of file
# $* - string to grep
_grepext() {
	local extension=$1; shift
	local grep_string="$*"
	local filename=""
	find . -name "*.${extension}" -exec grep -l "$grep_string" "{}" ";" | while read filename; do
		echo "=== $filename"
		grep -C 3 --color=auto "$grep_string" "$filename"
		echo ""
	done
}

# meta generate the grepext functions
declare -r GREPEXT_EXTENSIONS="h c cpp spec vpj sh php html js"
_meta_generate_grepext_functions() {
	local tmp_grepext_functions=`mktemp`
	local extension
	for extension in $GREPEXT_EXTENSIONS; do
		echo "grep$extension() {" >> $tmp_grepext_functions
		echo '  local grep_string=$*' >> $tmp_grepext_functions
		echo '  _grepext '"$extension"' "$grep_string"' >> $tmp_grepext_functions
		echo "}" >> $tmp_grepext_functions
	done
	source $tmp_grepext_functions
	rm -f $tmp_grepext_functions
}
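
One thing to remember – and this is an assumption about how you’d wire it into your .bashrc, not part of the original snippet – the generator has to be invoked once, after the functions above are defined, before the greph/grepc/grepcpp/grepsh/... helpers exist:

_meta_generate_grepext_functions

After that, grepsh mktemp (for example) will grep all *.sh files for mktemp.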

After this, you have all of your C++/Bash/PHP/etc developers happy!

Time to showoff

My development environment is my theme park, here is my proof:

$ (env; declare -f; alias) | wc -l
2791

I encourage you to run this as well, if my line count is small and I’m bragging about something I shouldn’t – let me know!

Posted August 21, 2009 by malkodan in Bash, Linux, System Administration


Feeling at /home…   3 comments

The following took place more than a year ago, but it is still fresh in my mind. After a few colleagues urged me to write about it, I decided to finally do it. If the output of the commands does not match exactly what I had while dealing with it – bear with me; it’s far from being the point.

The horror

I took a break from work last year and decided to go and have some fun in NZ. Oh, did I have fun there!
There’s nothing more frustrating than returning to work, turning on your dusty computer and witnessing the following:

*** An error occurred during the filesystem check.
Give root password for maintenance (or type Control-D to continue):

Investigating it just a bit more, I came to the conclusion that my /home was not mounting. OMG!!! All of my personal customizations and some private data are in /home!
I must admit, it’s nothing that couldn’t be reproduced in a reasonable amount of time, but having my neat KDE customizations I didn’t want to start the process from the beginning. Think about losing your own /home, it’s no fun. I decided I wanted it back.
OK, so I’m running e2fsck:

# e2fsck /dev/sda5
e2fsck 1.39 (29-May-2006)
e2fsck: No such file or directory while trying to open /dev/sda5

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 

#

Oh man, how frustrating, e2fsck can’t read my superblock!
Something I did notice during boot up is that the HD was very noisy, in addition to a very slow boot process. It wasn’t a new HD and it had worked hard (I tested our FS infrastructure for video recording on it numerous times). Probably its time had come.

I wanted to feel at /home again.
I decided the smartest thing would be to first try and copy this whole partition aside, as I knew there was a hardware problem with the HD. After solving that one, I could hopefully handle the missing superblock problem much better.

Getting physical

So I quickly inserted a fresh new HD into my machine, disconnected the old faulty HD (it caused the computer to boot so slowly because of its defects) and issued a network install.
15 minutes later I’m back in Linux, with a bare new /home, and the faulty HD connected to it, slowing the computer like hell.
I was sure dd would come to the rescue:

# dd if=/dev/sda5 of=home-sweet-home.ext3 bs=4M

After a few minutes of anticipation and cranky HD noises, I’m left with:

dd: reading `/dev/sda5': Input/output error
0+0 records in
0+0 records out
# ls -l /home/home-sweet-home.ext3
-rw-r--r-- 1 root root 0 Jul  10 08:26 home-sweet-home.ext3

Great :(. I’m searching the net for an aggressive dd program, something that instead of giving up on bad sectors would fill them with zeros and continue on (hoping the defects on the HD are in a very specific place). I must admit I almost wrote something myself, but finally I found dd_rescue.
And off we go:

# dd_rescue -e 1 -A /dev/sda5 /home/home-sweet-home.ext3

It ran for hours! It was 65GB that dd_rescue had to tackle. With a dying HD that could take a lot of time. After more or less 8 hours I was back at my desktop, looking at my old home:

# ls -lh /home/home-sweet-home.ext3
-rw-r--r--   1 root   root        61G Jul  10 20:43 home-sweet-home.ext3
#

Being logical

OK, that’s it, I have my data. Time to dump the old HD and deal with the logical errors I still have with this partition dump. Mounting the partition gave me the same result as I pasted above: no superblock – no fun!
Oh! but ext3 always creates a few backup superblocks, maybe this is my lucky day where I will finally be able to use one of these backups. You are probably familiar with the following output:

# mke2fs /tmp/e2fsck-test
...
27 block groups
8192 blocks per group, 8192 fragments per group
1992 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729, 204801

Writing inode tables: done
Writing superblocks and filesystem accounting information: done
...

Now go figure out where your backup superblocks are. Trying the obvious 8193 and 32768 did not work for me. I knew there should be more backup superblocks. Google comes to the rescue again. I was quite close this time as well to writing a small C program that would search the partition dump for ext3 superblock signatures and tell me where the backup superblocks are. But then again, I figured I’m probably not the first one who needs such a utility, and here TestDisk came to the rescue.
I simply ran TestDisk, which revealed the remaining trustworthy superblocks on my damaged filesystem.
Later on I discovered that it is also possible to run mke2fs on a partition of the same size and see where the superblocks get written. However, I think that probing for the superblocks is much cleaner altogether.
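
For the record, a dry run of mke2fs can also print where the backup superblocks would be placed without writing anything; mind that this only matches if the original filesystem was created with default parameters, and that mke2fs will ask for confirmation since the target is a regular file and not a block device:

# mke2fs -n /home/home-sweet-home.ext3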

# mkdir /home2
# mount -o loop,ro,sb=884736 /home/home-sweet-home.ext3 /home2
#

Silence.

Did it work?

# ls -l /home2
drwxr-xr-x  41 dan    dan    8192 Feb  5 13:39 dan
drwx------   7 distcc distcc   88 Jul 26 15:24 distcc
drwxrwxrwx   2 nobody nobody    1 Feb  5 05:39 public
#

Wow, it did!
So how much of it was eventually damaged? – less than 1%!
I found a few garbled files I didn’t need anyway, but I was more than 99% back at /home.

Epilogue

Needless to say, ever since then I’ve been backing up my /home in a strict manner. But this is obviously not the point.
The simplicity of Linux in general and ext2/3 in specific is something we should adore. I wouldn’t want to imagine what would have happened on a different OS or a different filesystem (and please don’t start a flame war about it now…).

Conventions   11 comments

I like Bash, I really like Bash. It’s simple, it’s neat, it’s quick, it’s many things. No one will convince me otherwise. It’s not for big things though, I know, but for most system administration tasks it will suffice.
The problem with Bash begins where incompetent SysAdmins start to use it. They will most likely treat it poorly and lamely.
Conventions are truly needed when writing in Bash. In our subversion we have ~30,000 lines of bash. And HELL NO! our product is definitely not a lame conglomerate of Bash scripts gluing many pieces of binaries together. Bash is there for everything that C++ (in our case) shouldn’t do, things like:

  • Packaging and anything related to packaging (post scripts of packages for instance)
  • SysV service infrastructure
  • Backups
  • Customization of the development environment
  • Deployment infrastructure

Yes, so where were we? – Oh, how do we manage ~30,000 lines of Bash without everything getting messed up. For this, we need 3 main things:

  • Version control system (CVS, SVN, git, pick your kick)
  • Competent people
  • Coding conventions

Configuring a version control system is easy, especially if you are a competent developer, so I’ll skip to the 3rd one.

Conventions. Show me just one competent C/C++/Java/C# developer who would skip on using coding conventions and program like an ape. OK, I know, there might be some, but we both think the same thing about them. Scripting in Bash shouldn’t be an exception in that case. We should use strict scripting conventions on Bash in particular and on any other scripting language in general.
There’s nothing uglier and messier than a Bash script that is written without conventions (well, maybe Perl without conventions).

I’ll try in the following post to introduce my Bash scripting conventions and if you find them neat – feel free to adopt them. Up until today, sadly, I haven’t seen anyone suggesting any Bash scripting conventions.

Before starting any Bash script, I have the following skeleton:

#!/bin/bash

main() {
	: # your code goes here
}

main "$@"

Call me crazy, but without my main() function I ain’t going anywhere. Now we can start writing some bash. Once you have your main() it means you’re going to write code ONLY inside functions. I don’t care that it’s a scripting language – let’s be strict.

#!/bin/bash

# $1 - hostname
# $2 - destination directory
main() {
	local hostname=$1; shift
	local destination_directory=$1; shift
}

main "$@"

Need to receive any arguments in a function? – Do the following:

  • Document the variables above the function
  • Use shift after receiving each variable, and always use $1, it’s easier to later move the order of the variables

Notice that I’m using the local keyword to make sure the scope of the variable is only within its function. Be strict with variables. If for some reason you decide you need a global variable, it’s OK, but make sure it’s in capital letters and declared as needed:

# a read only variable
declare -r CONFIGURATION_FILE="/etc/named.conf"
# a read only integer variable
declare -i -r TIMEOUT=600

# inside a function
sample_function() {
	local -i retries=0
}

You are getting the point – I’m not going to make it any easier for you. Adopt whatever you like and remember that being strict in Bash yields code in higher quality, not to mention better readability. Feeling like writing a sloppy script today? – Go home and come back tomorrow.

Following is a script written according to the conventions I’m suggesting. This is a very simple script that simply displays a dialog and asks the user which host he would like to ping, then displays the result in a tailbox dialog. This is more or less what many of my scripts look like. I admit it took me a while to find a script which is not too long (under 100 lines) and can still represent some of the ideas I was mentioning.

I urge you to question my way and conventions and suggest some of your own, anyway, here it is:

#!/bin/bash

# dialog options (notice it's read only)
# notice as well it's in capital letters
declare -r DIALOG_OPTS="0 0 0"
declare -i -r OUTPUT_DIALOG_WIDTH=60

# ping timeout in seconds (using declare -i because it's an integer and -r because it's read only)
declare -i -r PING_TIMEOUT=5
# size of pings
declare -i -r DEFAULT_SIZE_OF_PINGS=56
# number of pings to perform
declare -i -r DEFAULT_NUMBER_OF_PINGS=1

# neatly pipes a command to a tailbox dialog
# here we can specify the parameters the function expects
# this is how i like to do it, perhaps there are smarter ways that may integrate better
# with doxygen or some other tools
# $1 - tailbox dialog height
# $2 - title
# $3 - command
pipe_command_to_tailbox_dialog() {
	# parameters extraction, always using $1, with `shift` right afterwards
	# in case you want to play with the order of parameters, just move the lines around
	local -i height=5+$1; shift
	local title="$1"; shift
	local command="$1"; shift

	# need a temporary file? - always use `mktemp`, please spare me the `/tmp/$$` stuff
	# or other insecure alternative to temporary file creations, use only `mktemp`
	local output_file=`mktemp`
	# run in a subshell and with eval, so we can pass pipes and stuff...
	# eval is a favorite of mine, it means you truly understand the Bash shell
	# nevertheless - it is really needed in this case
	(eval $command) >& $output_file &
	# need to obtain a pid? - it's surely an integer, so use 'local -i'
	local -i bg_job=$!

	# ok, show the user the dialog
	dialog --title "$title" --tailbox $output_file $height $OUTPUT_DIALOG_WIDTH

	# TODO my lame way of checking if a process is running on linux, anyone has a better way??
	# my way of specifying a 'TODO' is in the above line
	if [ -d /proc/$bg_job ]; then
		# if the process is stubborn, use 'kill -9'
		kill $bg_job || kill -9 $bg_job >& /dev/null
	fi
	# wait for process to end itself
	wait $bg_job

	# not cleaning up your temporary files is similar to a memory leak in C++
	rm -f $output_file
}

# pings a host with a nice dialog
ping_host_dialog() {
	local ping_params_tmp=`mktemp`
	# slice lines nicely, i have long commands, i'm not going even to mention
	# line indentation - that goes without saying
	# i like to use dialogs, it makes more people use your scripts
	if dialog --ok-label "Submit" \
		--form "Ping host" \
		$DIALOG_OPTS \
			"Address:" 1 1 "" 1 30 40 0 \
			"Size of pings:" 2 1 "$DEFAULT_SIZE_OF_PINGS" 2 30 40 0 \
			"Number of pings:" 3 1 "$DEFAULT_NUMBER_OF_PINGS" 3 30 40 0 2> $ping_params_tmp; then
		# ping_params_tmp will be empty if the user aborted the dialog...
		local address=`head -1 $ping_params_tmp | tail -1`
		# yet again if you expect an integer, use 'local -i'
		local -i size_of_pings=`head -2 $ping_params_tmp | tail -1`
		local -i number_of_pings=`head -3 $ping_params_tmp | tail -1`
	fi
	rm -f $ping_params_tmp

	# this is my standard way of checking if a variable is empty
	# may not be the prettiest way, but it surely catches the eye...
	if [ x"$address" != x ]; then
		pipe_command_to_tailbox_dialog 15 "Pinging host \"$address\"" "ping -c $number_of_pings -s $size_of_pings -W $PING_TIMEOUT \"$address\""
	fi
}

# main function, can't live without it
main() {
	ping_host_dialog
}

# although there are no parameters passed in this script, i still pass $* to main, as a habit
#main $*
# after Uri's comment, I'm fixing the following and calling "$@" instead
main "$@"

I don’t want this post to be too long (it is already quite long), but I think you got the idea. I still have some conventions I did not introduce here. In case you liked what you’ve seen – do not hesitate to contact me so I can provide you with a document describing all of my Bash scripting conventions.
