CERN

Fixing stuff

I’ve spent the day fixing stuff, mainly the messed-up BMC setup on the fep* nodes. To summarize:

- Tyan BMC = fail
- It only uses LAN1 to send DHCP requests for its BMC interface
- So we have to use a Gbps NIC for management
- Inconsistent interface naming: LAN1 is sometimes eth0, other times eth1. Each node had to be checked manually; only a few could be scripted:

root@portal-ecs1:~# for host in feptofc00 feptofc02 feptofc04 feptofc06 feptofa08 feptofc10 feptofc12 feptofc14; do ssh $host "sed -i 's/eth1/eth2/g' /etc/network/interfaces"; done

- Glad I don’t have to do this again
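One thing to watch with the `sed` expression above: `s/eth1/eth2/g` also matches the `eth1` prefix of longer names such as `eth10`. That doesn't bite with only a few NICs per node, but it's cheap to guard against. A minimal sketch, assuming GNU sed (for the `\b` word boundary and `-i.bak` backup); `rename_iface` is a hypothetical helper, not what was actually run on the nodes:

```shell
#!/bin/sh
# Hypothetical helper: rename one interface in an ifupdown-style config file.
# Assumes GNU sed: \b is a word boundary, -i.bak keeps the old file as FILE.bak.
rename_iface() {
    local file="$1" old="$2" new="$3"
    # \b stops "eth1" from also matching the start of "eth10"
    sed -i.bak "s/\b${old}\b/${new}/g" "$file"
}

# Usage on a node (e.g. via ssh from portal-ecs1, as in the loop above):
#   rename_iface /etc/network/interfaces eth1 eth2
```

Keeping the `.bak` copy means a node whose LAN1 turned out to be the other interface can be rolled back with a single `mv`.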

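Current status of the cluster:

- 1 machine is down/gone (cn010); neither the host nor its BMC (cn010-mgmt) answers
- 3 machines are down (fepfmdaccorde, fephmpid0, fephmpid2), but their BMCs still answer on -mgmt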

root@portal-ecs1:~# for host in $(cat /etc/dsh/group/prodcluster); do if ! ping -c1 -w1 "$host"|grep -qi "bytes from"; then echo $host; fi; done
cn010
fepfmdaccorde
fephmpid0
fephmpid2
 
root@portal-ecs1:~# for host in $(cat /etc/dsh/group/prodcluster); do if ! ping -c1 -w1 "$host-mgmt"|grep -qi "bytes from"; then echo $host-mgmt; fi; done
cn010-mgmt
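The two loops above can be folded into a single pass that reports host and BMC state side by side, which separates a dead node (both silent, like cn010) from one whose BMC still answers (like the three fep* nodes). A minimal sketch under a few assumptions: the management name is always `<host>-mgmt` as in the second loop, blank lines and `#` comments in the dsh group file should be skipped, and `probe`, `classify` and `check_group` are hypothetical names. It relies on ping's exit status (iputils ping exits 0 only if a reply arrived) instead of grepping for "bytes from":

```shell
#!/bin/sh
# Single reachability probe: 0 if the name answered one ping within 1 s.
probe() {
    ping -c1 -w1 "$1" >/dev/null 2>&1
}

# Hypothetical helper: print "<host> host=up|down bmc=up|down".
classify() {
    local host="$1" h=down m=down
    probe "$host" && h=up
    probe "$host-mgmt" && m=up
    echo "$host host=$h bmc=$m"
}

# Classify every host in a dsh group file, skipping comments and blank lines.
check_group() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | while read -r host; do
        classify "$host"
    done
}

# Usage: show only nodes where something is wrong
#   check_group /etc/dsh/group/prodcluster | grep -v 'host=up bmc=up'
```

Going by the output above, that would list cn010 as `host=down bmc=down` and the three fep* nodes as `host=down bmc=up`.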