CERN

Not so dedicated

So. Ehr. The iLO/management-cards in some of the nodes (BMC, Tyan-motherboard; mainly the fep*-nodes) have their own dedicated NIC (you can, of course, use it as a normal NIC if you want, even in parallel with the BMC). Previously, when everything was a flat network, you could basically plug a cable into any NIC and you’d get it up and going. This is still somewhat the case, but since the management-NIC is going to be on its own subnet, you’ll need to distinguish between BMC NICs and normal NICs.

Yesterday I discovered that some of the nodes have failover-functionality, so that if the dedicated BMC NIC fails, the BMC switches over to the normal NIC. Unfortunately there is no fail-back (if/when the BMC NIC comes back online); once the failover has been triggered, it stays that way until the BMC power-cycles. When redoing the network-layout, this caused a lot of BMCs to fail over and request their management IP on the PROD/DEV-network, which, of course, meant they didn’t get any IP at all.

Today I powered down all those nodes and pulled the power-cords, making the BMCs lose power. After about 5-10 seconds off, I turned them on again. So far, so good. Loads of BMCs came up correctly on the management-network. However, a large part still tried to get an IP on the PROD-network. I couldn’t figure out why; the switches are all properly configured. And then I found the pattern:

root@portal-ecs1:/etc/dsh/group# for host in $(cat prodcluster); do if ! ping -c1 -t2 "$host-mgmt"|grep -qi "bytes from"; then echo "$host-mgmt"; fi; done
cn010-mgmt
fepdimutrk3-mgmt
fepemcal0-mgmt
fepemcal1-mgmt
fepemcal2-mgmt
fepemcal3-mgmt
fepemcal4-mgmt
fephltout2-mgmt
feppmd1-mgmt
fepsdd0-mgmt
fepsdd1-mgmt
fepsdd2-mgmt
fepsdd3-mgmt
fepsdd4-mgmt
fepsdd5-mgmt
feptofa00-mgmt
feptofa02-mgmt
feptofa04-mgmt
feptofa06-mgmt
feptofa08-mgmt
feptofa10-mgmt
feptofa12-mgmt
feptofa14-mgmt
feptofa16-mgmt
feptofc00-mgmt
feptofc02-mgmt
feptofc04-mgmt
feptofc06-mgmt
feptofc10-mgmt
feptofc12-mgmt
feptofc14-mgmt
feptofc16-mgmt
feptpcco17-mgmt
feptrd14-mgmt

All these (with _maybe_ a few exceptions) are running the same motherboard, with the same IPMI/BMC-addon card, both from Tyan. This addon-card is needed to activate the BMC-features. The sad thing, though, is that it seems to only send its DHCP-requests out the main NIC, that is LAN1/eth1, regardless of whether it has link on the two other NICs. The ironic thing is that the source-MAC of the DHCPDISCOVERs is actually the one for eth0 (which is the dedicated management-NIC). So, with no help on Tyan’s homepages, and none of the available change-the-NIC-to-use-through-ipmitool tricks working, I decided to go the somewhat easy way, even though it’s not ideal:

Make eth1 the dedicated management-NIC, and use eth2 for the normal network. I don’t like using a Gbps-NIC for management, but it’s better than spending ages figuring out how to get the BMC to request IPs from the management-NIC. There have been some suggestions to flash the BIOS, NIC and IPMI/BMC-addon card, but this involves a lot of risk of bricking stuff, so I won’t go down that route. Not for now, at least.

So, tomorrow I’ll move the normal network from eth1 to eth2 on ~30 nodes. Nice spending time on something as useful as this! :-D
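Once the cables are moved, the check from earlier can be rerun to see which BMCs still don’t answer on the management-network. Here is a minimal sketch of that one-liner unrolled into a function. The function name `unreachable_mgmt` is made up for illustration, and it assumes Linux iputils `ping`, where `-W` is the reply timeout in seconds (there, `-t` sets the TTL, while on BSD-style `ping` it is a timeout). It also assumes blank lines and `#`-lines in the group file can be skipped, and it relies on the exit status of `ping` instead of grepping for "bytes from".

```shell
# Print "<host>-mgmt" for every host in a group file whose management
# name does not answer a single ping.
# Assumption: iputils ping (-c = packet count, -W = reply timeout in seconds).
unreachable_mgmt() {
    group_file=$1
    while IFS= read -r host; do
        # Skip blank lines and comment lines (assumed file format)
        case $host in
            ''|'#'*) continue ;;
        esac
        # ping exits non-zero when no reply was received;
        # </dev/null keeps it from touching the loop's input
        if ! ping -c1 -W2 "${host}-mgmt" </dev/null >/dev/null 2>&1; then
            echo "${host}-mgmt"
        fi
    done < "$group_file"
}
```

Usage would be something like `unreachable_mgmt /etc/dsh/group/prodcluster`; an empty result means every management name answered.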

Update: According to the picture below, it’s actually true; only eth1/LAN1 has the ability to use/have IPMI/BMC. That’s kinda LOL, considering you’ll be wasting a Gbps-NIC, when you have a 100Mbps-NIC available. GG, Tyan!