CERN

BMC-fail, solution

So, it seems I was correct (or at least somewhat). I found someone else with the same problem, and stumbled upon this explanation:

When power is applied to the power supply, the BMC powers on immediately. During the boot process the BMC (via U-Boot, which boots Linux on the BMC) checks whether the dedicated IPMI NIC port sees a link. If not, the shared NIC port will be used. The NIC port selected at BMC boot time will be the NIC port used until the BMC is power cycled, either through a direct BMC reboot or when power is removed from the power supply. Rebooting the system itself will do nothing to the BMC.

This creates a cabling-time race condition between plugging in the dedicated IPMI NIC and the power cable, which is very obnoxious. Or, for example, if you have a power outage and the BMC comes up before the switch does, the BMC will select the shared NIC in spite of the dedicated NIC being wired, and LAN IPMI access will, in the case of VLANed ports, be on the wrong network. We experience this more often than we like and find it quite frustrating.
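
To convince myself of the race, here is a small toy model of the behaviour described in the quote. It is only a sketch, not real BMC firmware: the `Bmc` class, the `port_after_power_on` helper, and the timing values are all made up for illustration. The one thing it takes from the explanation is that the link state is sampled once at BMC boot and the choice sticks until the BMC itself is restarted.

```python
from dataclasses import dataclass
from typing import Optional

DEDICATED = "dedicated"
SHARED = "shared"


@dataclass
class Bmc:
    """Hypothetical toy model of the NIC selection described above."""

    selected_port: Optional[str] = None  # None until the BMC has booted

    def boot(self, dedicated_link_up: bool) -> str:
        # The link state is sampled exactly once, at BMC boot time.
        self.selected_port = DEDICATED if dedicated_link_up else SHARED
        return self.selected_port

    def host_reboot(self) -> Optional[str]:
        # Rebooting the host system does not touch the BMC,
        # so the selected port stays whatever it was.
        return self.selected_port


def port_after_power_on(bmc_boot_s: float, link_up_s: float) -> str:
    """Return the port a freshly powered BMC ends up on.

    bmc_boot_s: seconds after power-on at which the BMC checks the link
    link_up_s:  seconds after power-on at which the dedicated port has link
    (both values are assumptions for the sake of the example)
    """
    bmc = Bmc()
    return bmc.boot(dedicated_link_up=link_up_s <= bmc_boot_s)


if __name__ == "__main__":
    # Power outage: the switch needs longer to come up than the BMC.
    print(port_after_power_on(bmc_boot_s=30, link_up_s=120))
    # Cable plugged into a live switch before power is applied.
    print(port_after_power_on(bmc_boot_s=30, link_up_s=0))
```

In this model, the only way off the shared port is to call `boot()` again while the dedicated link is up, which matches the quote: a BMC reboot or removing power helps, a host reboot does not.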

So, I guess I’ll have to power those nodes completely down :-(