Distcc over SSH

Introduction
distcc allows you to distribute a compilation over several machines, significantly reducing build times. There is already an official Gentoo distcc guide, but it only covers how to set up distcc using its own daemon, distccd. Sometimes you have to use distcc over SSH instead, perhaps because of restrictive firewalls, or because you just don't want another daemon listening on the network. This guide covers how to configure distcc and Portage so that your emerge jobs are distributed by distcc using SSH.

Requirements

 * At least two computers running the same version of gcc on the same architecture (e.g. X86, SPARC), or not: see HOWTO Distcc server on Windows and TIP AMD64-x86-distcc for how to run compilation nodes on slightly different machines. The gcc version could be 4.5.x, where x can differ between the machines. For example, having gcc 4.5.3 on one machine and gcc 4.5.2 on another machine works fine, whereas mixing 4.4.x and 4.5.x does not.
 * A decision as to which machine should be the front end. This is the one you will run emerge on and the one that will do all the linking, run Autoconf, etc., so it should probably be the machine that will end up using the packages. The compilation nodes can be anything, as long as they have a version of gcc installed that is compatible with the one on the front end. In particular, they do not need any of the packages that the software you are compiling depends on.

Set up distcc user
The distcc ebuild creates a distcc user, but it cannot log in, as is standard secure practice for daemon accounts. We will now change that.

First, the distcc user needs a home directory. This should be done on all nodes.

On the compilation nodes, it also needs a valid shell:
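The two account changes above could be applied with usermod, roughly as sketched here. The home directory /etc/distcc is an assumption, inferred from the key path /etc/distcc/.ssh/id_dsa that appears later in this guide; /bin/sh is likewise just one possible valid shell, and the function name is made up for this example.

```shell
# Sketch only: run as root. The function name is hypothetical.
# Assumption: /etc/distcc is the home directory (inferred from the key path
# /etc/distcc/.ssh/id_dsa used elsewhere in this guide).
DISTCC_HOME=/etc/distcc

setup_distcc_account() {
    role=$1                                # "frontend" or "node"
    mkdir -p "$DISTCC_HOME/.ssh"           # home plus the directory for the keys
    usermod -d "$DISTCC_HOME" distcc       # all nodes: give the user a home
    if [ "$role" = "node" ]; then
        usermod -s /bin/sh distcc          # compilation nodes: a valid login shell
    fi
}
```

You would then call setup_distcc_account frontend on the front end and setup_distcc_account node on each compilation node.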

Then we set up our ssh keys, since distcc can't supply passwords at login. Do this on the front end machine.

When ssh-keygen asks for a passphrase, just hit enter so it sets none.
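If you would rather not answer the prompt by hand, ssh-keygen can be given the empty passphrase on the command line. The following is a minimal sketch, not necessarily the exact command this guide originally used: the function name is made up, and the default key type dsa is only chosen to match the id_dsa file name mentioned below. Recent OpenSSH releases have dropped DSA support, so you may need to pass another type such as ed25519 as the second argument.

```shell
# Sketch only: run on the front end. The function name is hypothetical.
generate_distcc_key() {
    keyfile=$1                 # e.g. /etc/distcc/.ssh/id_dsa
    keytype=${2:-dsa}          # assumption: matches the id_dsa name in this guide
    mkdir -p "$(dirname "$keyfile")"
    # -N "" sets an empty passphrase, so no passphrase prompt appears.
    ssh-keygen -t "$keytype" -N "" -f "$keyfile"
}
```

For example: generate_distcc_key /etc/distcc/.ssh/id_dsa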

Now, we need to distribute distcc's public key to all compilation nodes.
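However you get the public key file onto a compilation node, it ends up as one line in the distcc user's authorized_keys file there. A minimal sketch of that last step follows; the function name is made up, and the paths in the comments assume /etc/distcc is the distcc home directory. The key is only appended if it is not already listed, so running it twice does no harm.

```shell
# Sketch only: run on each compilation node. The function name is hypothetical.
authorize_key() {
    pubkey=$1                  # e.g. a copy of the front end's id_dsa.pub
    authfile=$2                # e.g. /etc/distcc/.ssh/authorized_keys
    mkdir -p "$(dirname "$authfile")"
    touch "$authfile"
    # Append only if this exact key line is not already present.
    if ! grep -qxF -- "$(cat "$pubkey")" "$authfile"; then
        cat "$pubkey" >> "$authfile"
    fi
}
```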

Because you will be compiling as root, you should check that root on the front end can ssh into the compilation nodes as the distcc user.

I had to copy /etc/distcc/.ssh/id_dsa to /root/.ssh/ to get this to work...

Method 2
You should append each new host to the old one...

Maybe this could be done in a more elegant way :)

Method 3
A more elegant way

Fix ssh permissions
Make sure ssh is happy with all permissions.

On all nodes,

On the front end node only,

On compilation nodes only,
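One way to cover all three cases is a small helper like the following. It is only a sketch: the function name is made up, the modes are the usual ones that ssh and sshd accept, and ownership is not touched (the files should belong to the distcc user). Files that do not exist on a given machine are simply skipped, so the same helper can be used on the front end and on the compilation nodes.

```shell
# Sketch only: the function name is hypothetical. Pass the distcc home directory.
fix_distcc_ssh_perms() {
    home=$1
    chmod go-w "$home"                     # all nodes: home not writable by others
    chmod 700 "$home/.ssh"                 # all nodes: .ssh private to its owner
    for f in "$home"/.ssh/id_*; do         # front end: the key pair, if present
        [ -e "$f" ] || continue
        case "$f" in
            *.pub) chmod 644 "$f" ;;       # public key may be world-readable
            *)     chmod 600 "$f" ;;       # private key must be owner-only
        esac
    done
    if [ -e "$home/.ssh/authorized_keys" ]; then
        chmod 600 "$home/.ssh/authorized_keys"   # compilation nodes
    fi
}
```

For example, assuming /etc/distcc is the home directory: fix_distcc_ssh_perms /etc/distcc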

Test ssh
Try to log in as the distcc user from your front end machine to a compilation node using the just-created keys:

If you get a prompt for a password, check syslog on the compilation node to see why sshd didn't like the ssh key. One possible reason could be that the ssh server does not allow empty passwords. Make sure that you set "PermitEmptyPasswords yes" in the sshd configuration on the compilation nodes.

If that fails and shows something like this:

[sshd] User distcc not allowed because account is locked

then unlock the distcc account on that compilation node.

If distcc was able to log in, then you need to collect the public ssh host keys of all compilation nodes so distcc doesn't get stuck waiting for you to confirm each host identity. A simple way to do this is to use ssh-keyscan:

chown portage:portage /var/tmp/portage/.ssh/known_hosts
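The scan that fills this known_hosts file could look roughly like the sketch below. The function name is made up and the node names are whatever your compilation nodes are called. Note that it overwrites the target file, and that the chown above is still needed afterwards.

```shell
# Sketch only: the function name is hypothetical.
collect_host_keys() {
    known_hosts=$1; shift      # e.g. /var/tmp/portage/.ssh/known_hosts
    mkdir -p "$(dirname "$known_hosts")"
    # The remaining arguments are the compilation nodes to scan.
    ssh-keyscan "$@" > "$known_hosts"
}
```

For example, with two hypothetical nodes: collect_host_keys /var/tmp/portage/.ssh/known_hosts node1 node2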

You should then manually verify each host key, perhaps by logging in at the physical console of each machine and running ssh-keyscan locally. Your paranoia may vary.

(HINT: /var/tmp/portage should not be used to store anything persistent, as it is a temporary directory and quite a few people mount it as tmpfs or randomized cryptfs! This needs some adapting to be workable!)

Making a wrapper for ssh
Unfortunately, distcc can't supply arguments to ssh, so we need a wrapper script it can call that supplies the correct arguments. Point your favorite editor to /etc/distcc/distcc-ssh (the path that DISTCC_SSH is set to in the Portage setup below) and enter:

and make sure the file is executable
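A wrapper of this kind only needs to call ssh with the identity file and pass everything else through. The sketch below writes such a script and makes it executable in one go; the function name is made up, and the exact ssh arguments in the original wrapper may have differed.

```shell
# Sketch only: the function name is hypothetical.
write_distcc_ssh_wrapper() {
    wrapper=$1                 # e.g. /etc/distcc/distcc-ssh
    key=$2                     # e.g. /etc/distcc/.ssh/id_dsa
    # Unquoted here-document: $key is expanded now, \$@ stays literal.
    cat > "$wrapper" <<EOF
#!/bin/sh
# Called by distcc in place of ssh; adds the identity file.
exec ssh -i "$key" "\$@"
EOF
    chmod 755 "$wrapper"
}
```

For example: write_distcc_ssh_wrapper /etc/distcc/distcc-ssh /etc/distcc/.ssh/id_dsa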

Setting up Portage
distcc is controlled by a couple of environment variables. One place to set such variables so that they will only be used for Portage is make.conf. Add the following lines at the bottom:

DISTCC_SSH="/etc/distcc/distcc-ssh"
DISTCC_HOSTS="localhost/2 distcc@/ [...]"

(see the distcc manual page for the syntax of the line).

To take advantage of all your new compilation power, you need to run a lot of jobs in parallel. Find the MAKEOPTS line, still in the same file, and change the number after "-j" to the sum of the allowed number of jobs on each node, e.g. -j6 for three machines with two jobs each.
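The sum can also be worked out mechanically from a DISTCC_HOSTS-style string. This is a minimal sketch with made-up node names; it only counts entries that carry an explicit /N job limit and ignores any others.

```shell
#!/bin/sh
# Sketch only: node1 and node2 are hypothetical host names.
hosts="localhost/2 distcc@node1/2 distcc@node2/2"

total=0
for spec in $hosts; do
    case "$spec" in
        */*)
            limit=${spec##*/}      # text after the last "/"
            limit=${limit%%,*}     # drop trailing options such as ",lzo"
            total=$((total + limit))
            ;;
    esac
done

echo "MAKEOPTS=\"-j${total}\""     # prints MAKEOPTS="-j6"
```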

Lastly, we need to tell Portage that we will be using this feature, so also add the following line below:
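Portage enables distcc through its FEATURES variable. Assuming no other features are set yet, the line would look like this; if you already have a FEATURES line, add distcc inside the existing quotes instead.

```shell
# make.conf fragment (sketch): enable Portage's distcc support.
FEATURES="distcc"
```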

Test drive
That should be it. To watch it in action, start an emerge on the front end and watch top on a compilation node: you should see some gcc processes owned by the distcc user. If it doesn't seem to work, try setting DISTCC_VERBOSE=1 in make.conf on the front end and see if you get any informative messages.

TODO

 * I'm not happy with the security implications of letting a daemon account log in, even if it is just with SSH keys. Try to find another way. [See my comments in "discussion and bugs" above.]

Response: The security of this approach is based on your ability to control access to the keys and access to the computer which has permission to log in. Anyone who has an account on the host computer could potentially gain access to the client computer(s). Other than that, this approach should be sufficiently secure. You can add another layer of security by using iptables to limit access based on the computer's IP address or network. You can also disable the login account whenever it is not in use. You could achieve a further layer of security by setting up the client computer as a special-purpose (untrusted) virtual machine, so that even if it is compromised the attacker wouldn't be able to do much. Such extreme measures aren't really needed; on the other hand, the isolation provided by a virtual machine could have other benefits.

A very different approach is to use ssh instead of NFS with the method shown in "Emerge on very slow systems".

Sharing SSH keys
Command to run to share SSH keys between two Linux servers, so that you can log in without a password prompt