Posted by kohsuke
on February 7, 2009 at 12:21 PM PST
Network boot is very convenient if you need to administer a large number of machines, like a Hudson cluster. How do you do this when you don't own a DHCP server?
Here at Sun, one of my jobs is to maintain our internal Hudson cluster of some 40 nodes. Among other things, part of the administration work involves setting up a new slave every so often, which means installing a new OS, configuring it, and adding it to the cluster. We need to support all kinds of different OSes, which adds an interesting complexity to the mix. We've got a considerable portion of this automated. Covering the entire spectrum of this automation would go far beyond a single blog post, so today I'll talk about just one part of it.
See, one thing I hate about setting up new systems is the OS installation. First, you have to have an installation CD. Because both OpenSolaris and Ubuntu keep producing a new release every half a year, that means constantly burning new CDs. In addition, people keep borrowing installation CDs, and some of them will inevitably forget to return them. Sometimes *I* forget to pull a CD out of a drive after installation, and at other times the CD drive is bad and I have to rip the computer apart to install a new one. And oh, what if you need to install two systems at once?
To ease all these pains, I came up with a way to network boot the installer. This way, I no longer need to keep the installation media around.
As some of you may know, pretty much all current x86 and SPARC systems come with a BIOS that supports network boot. Unfortunately, this mechanism requires that you control your DHCP server, because that is how a PC finds where to load the initial boot image from. Many corporate networks are managed by a dedicated team that controls DNS and DHCP tightly. This is certainly so at Sun, and a mere employee like me has no way to touch their DHCP server.
That's where gPXE comes to the rescue. This is a tiny shell that can be booted from a USB stick. It lets you specify where to load the image from and then does a PXE boot, as if those parameters had been given by your DHCP server. This way, you can have your PC network-boot from any server of your choice. This also works when your network boot server and the client PC are on different subnets.
Here are the steps to make this work.
Finding the right network driver for gPXE
First, you need to use a gPXE binary that contains the driver matching your NIC. To do this, you need to find out which network card the PC has. Normally, a company tends to buy the same PC in large numbers, so you only need to do this a few times.
There are several ways to do this. You can boot a Linux rescue CD and run lspci -nn. This shows the PCI vendor and device ID of the Ethernet card, which looks like [1002:203d] or something like that (basically a pair of 16-bit integers). On Windows, this value is available on a property page in Device Manager. For computers from Sun, the BigAdmin HCL list shows the details. Search by the model name, and look under the "PCI Information" tab.
Once you have noted it down, look this value up on the gPXE supported driver list page, and note down the driver ID. Then head over to ROM-o-matic.net, pick the driver that you just found, and get the USB boot image.
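If you have many machines to check, the lookup can be scripted. Here is a minimal sketch of a shell helper that pulls the ID pair out of lspci -nn output. The function name ethernet_pci_ids is my own invention, and it assumes the line contains the text "Ethernet controller", which may differ for some cards.

```shell
# Hypothetical helper: read `lspci -nn` output on stdin and print the
# vendor:device ID pair of every line describing an Ethernet controller.
# The class code (e.g. [0200]) has no colon, so it is not matched.
ethernet_pci_ids() {
    sed -n 's/.*[Ee]thernet controller.*\[\([0-9a-fA-F]\{4\}:[0-9a-fA-F]\{4\}\)\].*/\1/p'
}

# Usage on a live system:
#   lspci -nn | ethernet_pci_ids
```

The printed value is what you look up in the driver list.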
Writing this to a USB stick
You'll then need to write this image onto a USB stick. Pick a USB stick whose contents you can afford to lose, and write the image with dd if=path/to/downloaded.usb of=/dev/sdX. The "X" in sdX identifies the device of your USB stick. On my Linux machine, I find this out by inserting the USB stick into a port and then running "dmesg|tail". In the following output, you can see that my USB stick is recognized as /dev/sdf.
% dmesg | tail
[2072608.351840] sd 36:0:0:0: [sdf] Mode Sense: 0b 00 00 08
[2072608.351842] sd 36:0:0:0: [sdf] Assuming drive cache: write through
[2072608.353955] sd 36:0:0:0: [sdf] 1041920 512-byte hardware sectors (533 MB)
[2072608.354579] sd 36:0:0:0: [sdf] Write Protect is off
[2072608.354581] sd 36:0:0:0: [sdf] Mode Sense: 0b 00 00 08
[2072608.354583] sd 36:0:0:0: [sdf] Assuming drive cache: write through
[2072608.354588] sdf: sdf4
[2072608.355728] sd 36:0:0:0: [sdf] Attached SCSI removable disk
[2072608.355766] sd 36:0:0:0: Attached scsi generic sg6 type 0
[2072738.761766] usb 1-2: USB disconnect, address 3
Once this is done, run the sync command to make sure that the data is really written to the stick, then unplug it.
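The device lookup and the write can be wrapped in a couple of shell functions. This is only a sketch: the names last_removable_disk and write_image are made up for illustration, and the "Attached SCSI removable disk" message is taken from the dmesg output above, so other kernel versions may word it differently. Always double-check the device name before writing, because dd will happily overwrite the wrong disk.

```shell
# Hypothetical helper: read kernel log text on stdin and print the name
# (e.g. sdf) of the most recently attached removable disk.
last_removable_disk() {
    sed -n 's/.*\[\(sd[a-z]*\)\] Attached SCSI removable disk.*/\1/p' | tail -n 1
}

# Hypothetical helper: write an image to a device, refusing anything that
# is not a block device, then flush the data with sync.
# $1 = image file, $2 = target device (e.g. /dev/sdf)
write_image() {
    if [ ! -b "$2" ]; then
        echo "not a block device: $2" >&2
        return 1
    fi
    dd if="$1" of="$2" && sync
}

# Usage (run as root, after confirming the device name by eye):
#   dev=$(dmesg | last_removable_disk)
#   write_image path/to/downloaded.usb "/dev/$dev"
```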
Booting from USB
Put the USB stick into the target PC that you want to network boot. Often you need to go into the BIOS settings and change the boot sequence to be able to boot from USB. Once the system boots, you'll see a prompt from gPXE asking you to press "Ctrl+B". Do so, and you get a shell.
From this shell, type the following commands:
set next-server 220.127.116.11
set filename boot\x86\pxelinux.0
The two "set" commands tell the PC which TFTP server to fetch the image from and which file on that server to download. In my case, the TFTP server runs on Windows, so the file name is '\' separated.
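For reference, here is a sketch of a shell helper that prints a complete command sequence for this case, so you can keep it next to the keyboard. The function name is hypothetical. The surrounding "dhcp net0" and "autoboot" lines are my assumption about what goes around the two "set" commands, based on commands the gPXE shell provides, and net0 is assumed to be the first network interface. The address in the usage line is a documentation placeholder, not a real server.

```shell
# Hypothetical helper: print gPXE shell commands for the case where the
# network has a DHCP server but you want to override the boot server.
# $1 = TFTP server address, $2 = boot file path on that server
gpxe_dhcp_boot_commands() {
    printf 'dhcp net0\n'                 # assumption: obtain an address first
    printf 'set next-server %s\n' "$1"
    printf 'set filename %s\n' "$2"
    printf 'autoboot\n'                  # assumption: start the boot afterwards
}

# Usage:
#   gpxe_dhcp_boot_commands 192.0.2.10 'boot\x86\pxelinux.0'
```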
If your network doesn't have a DHCP server at all, use the following commands to configure the network card, then load the image and boot:
imgfetch -n img tftp://18.104.22.168/boot\x86\pxelinux.0
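Only the imgfetch line is shown above, so here is a hedged sketch of what a full sequence for the no-DHCP case might look like, again as a shell helper that prints the commands. The function name is hypothetical. The ifopen, net0/ip, net0/netmask, net0/gateway, imgload and imgexec lines are my assumptions drawn from the gPXE command set, not something shown in this post, and the addresses in the usage line are documentation placeholders.

```shell
# Hypothetical helper: print gPXE shell commands for a network with no
# DHCP server at all. The interface name net0 is an assumption.
# $1 = client IP, $2 = netmask, $3 = gateway,
# $4 = TFTP server address, $5 = boot file path on that server
gpxe_static_boot_commands() {
    printf 'ifopen net0\n'
    printf 'set net0/ip %s\n' "$1"
    printf 'set net0/netmask %s\n' "$2"
    printf 'set net0/gateway %s\n' "$3"
    printf 'imgfetch -n img tftp://%s/%s\n' "$4" "$5"
    printf 'imgload img\n'
    printf 'imgexec img\n'
}

# Usage:
#   gpxe_static_boot_commands 192.0.2.50 255.255.255.0 192.0.2.1 \
#       192.0.2.10 'boot\x86\pxelinux.0'
```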
At this point your PXE boot should be going. If I get more time, I'll blog about how we automate the rest of the steps.