James Stanley


How to use GNU screen for ad-hoc cluster management

Fri 11 March 2016

If you have a cluster of machines, you can use GNU screen to run a management process on them all, monitor the output, and manually take over and repair any issues that come up on any individual machine.

1.) Run screen with a named session

$ screen -d -m -S jes

This starts a detached screen session, named jes.

2.) Create a window for each node of the cluster

(Thanks, Rich, for pointing out that seq -w can be used instead of --format=%02g.)

$ for i in `seq -w 1 50`; do
    screen -S jes -X screen ssh node$i
done

This opens a new window for each node inside the jes session, with an ssh connection to that node. Change `seq -w 1 50` to match your cluster. If your nodes don't have predictable names like this, loop over a file of hostnames instead: something like for node in `cat nodelist`, with ssh $node in place of ssh node$i, would suffice.
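Here is a minimal sketch of the node-list variant, written as a shell function. It assumes a plain-text file with one hostname per line and that the session from step 1 already exists. The function name open_node_windows is hypothetical. It also passes screen's -t option so each window is titled with its hostname, which makes the C-a " list easier to read.

```shell
# Hypothetical helper: open one screen window per host listed in a file.
# Assumes the session (e.g. "jes") was already started as in step 1.
# Usage: open_node_windows SESSION NODELIST_FILE
open_node_windows() {
    local session="$1" nodelist="$2" node
    # IFS= and -r read each line verbatim; "|| [ -n "$node" ]" also handles
    # a last line that has no trailing newline.
    while IFS= read -r node || [ -n "$node" ]; do
        # Skip blank lines and comment lines starting with '#'.
        case $node in
            ''|'#'*) continue ;;
        esac
        # -t titles the window with the hostname; the new window runs ssh.
        # stdin is redirected defensively so nothing in the loop body can
        # consume the rest of the node list.
        screen -S "$session" -X screen -t "$node" ssh "$node" < /dev/null
    done < "$nodelist"
}
```

For example, open_node_windows jes nodelist. The windows are still numbered from 1 in the order the hosts appear in the file, which is what the numbered loop in step 3 relies on.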

3.) Type your command into the shell on each node

$ for i in `seq 1 50`; do
    screen -S jes -p $i -X stuff "ls\n"
done

The stuff command tells screen to insert the text as if it had been typed. The "\n" is needed: without it, "ls" would be typed but left sitting at the shell prompt instead of being run. The "-p $i" argument tells screen which window to stuff the text into. Windows are addressed by number here, not by node name, so this time you must use seq (or similar) to generate the window numbers. The windows for your cluster nodes are numbered starting from 1, in the order they were created, and window 0 is the shell on the local machine that screen started in step 1.

Run whatever it is you gotta run, instead of ls.
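The loop above can be wrapped in a small function so you don't have to retype it for every command. This is a sketch: stuff_all is a hypothetical name, and it assumes the node windows are numbered 1 to COUNT as described above. The "\n" is passed through to screen literally, exactly as in the original loop.

```shell
# Hypothetical helper: type COMMAND plus a newline into windows 1..COUNT.
# Usage: stuff_all SESSION COUNT COMMAND
stuff_all() {
    local session="$1" count="$2" cmd="$3" i
    for i in $(seq 1 "$count"); do
        # -p selects the target window by number; stuff types the text.
        # Inside double quotes the shell leaves \n alone, so screen receives
        # the same "...\n" string as in the step 3 loop.
        screen -S "$session" -p "$i" -X stuff "$cmd\n"
    done
}
```

For example, stuff_all jes 50 'uptime' types uptime into all 50 node windows. Single quotes stop your local shell from expanding anything in the command before it reaches screen.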

4.) Attach to screen, monitor progress, and manually take over where necessary

$ screen -S jes -r

This attaches you to the screen session. Initially you'll be shown the window of the last node, and you can use it as a normal terminal: for example, press ^C if something goes wrong, and you'll be back at that node's shell where you can do whatever you need to.

You can repeatedly press C-a n to cycle through the windows and check on each one. Once you've finished with a node, exit its shell: this logs you out and closes the window, so you don't keep cycling past it.
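To jump straight to a particular node's window when you reattach, rather than starting on the last node, screen's -p option (which preselects a window by number or title) can be combined with -r. A sketch with a hypothetical attach_window helper:

```shell
# Hypothetical helper: reattach to SESSION with WINDOW already selected.
# WINDOW may be a window number or a window title.
# Usage: attach_window SESSION WINDOW
attach_window() {
    # -r is placed last so it is not followed by another argument.
    screen -S "$1" -p "$2" -r
}
```

For example, attach_window jes 7 drops you into node07's window, given the numbering from steps 2 and 3.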

Here is a screen cheatsheet that will help you navigate:

list windows: C-a "
switch window (0-9): C-a 0, C-a 1, etc.
detach from screen: C-a d
next/previous window: C-a n, C-a p
list screen sessions: screen -ls

How does this compare to clusterssh?

Unlike clusterssh, it doesn't spawn a new terminal window for every node in your cluster. It also makes it easier to run a slightly different command on each machine, by varying the text passed to stuff in step 3 (for example, by including $i in it). Apart from that, clusterssh is probably better.
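As a sketch of that kind of variation, the function below gives each node a different argument based on its window number. The script ./process-shard.sh and the function name stuff_per_node are hypothetical placeholders; substitute whatever your job actually runs.

```shell
# Hypothetical helper: send each node window a command that differs by node,
# here telling window i to process shard i out of COUNT.
# Usage: stuff_per_node SESSION COUNT
stuff_per_node() {
    local session="$1" count="$2" i
    for i in $(seq 1 "$count"); do
        # $i and $count are expanded locally, so every window gets its own
        # arguments; the trailing \n makes the remote shell run the line.
        screen -S "$session" -p "$i" -X stuff "./process-shard.sh $i $count\n"
    done
}
```

For example, stuff_per_node jes 50 splits the work across all 50 nodes without touching any of them by hand.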
