root@cn012:~# modprobe ipmi-devintf
root@cn012:~# modprobe ipmi-si
root@cn012:~# ipmitool mc reset cold
Sent cold reset command to MC
root@cn012:~# ping -c1 cn012-bmc
PING cn012-bmc.internal (10.162.64.23) 56(84) bytes of data.
64 bytes from cn012-bmc.internal (10.162.64.23): icmp_seq=1 ttl=63 time=0.359 ms

--- cn012-bmc.internal ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.359/0.359/0.359/0.000 ms
#!/bin/bash
for host in $(cat hosts); do
  if ping -c1 -w1 $host | grep -qi "bytes from"; then
    ssh $host "\
      skill -KILL -u sysmes; \
      if grep -qi '^sysmes' /etc/passwd; then \
        usermod -u 901 sysmes; \
      else \
        useradd -m -d /opt/sysmes -g 100 -s /bin/bash -u 901 sysmes; \
      fi; \
      if [ ! -d /opt/sysmes/.ssh/ ]; then \
        mkdir /opt/sysmes/.ssh; \
      fi; \
      chown -R sysmes:users /opt/sysmes; \
      passwd -l sysmes; \
      if ! dpkg --list | grep -qi 'ii sudo '; then \
        apt-get install --force-yes -y sudo; \
      fi; \
      if ! grep -qi '^sysmes' /etc/sudoers; then \
        printf '\n# Sudo for sysmes-user\nsysmes ALL = NOPASSWD: ALL\n\n' >> /etc/sudoers; \
      fi;"
    scp /opt/sysmes/.ssh/authorized_keys $host:/opt/sysmes/.ssh/authorized_keys
  fi
done
Last week most of my time went into writing this sync-users-to-these-nodes script. It’s needed since we’re not using LDAP anymore. Once the script is done, maintaining users could be done by 6-year-olds (yes, for real), which is somewhat easier than what we had with LDAP (where, literally, it took months before admins got admin-rights :-P).
So, anyways. The script is nearly done. A few parts are still missing (adding/removing groups, changing user info), but the rest is there: add user, delete user, reset password. Adding/deleting/changing groups isn’t planned to be implemented, as it’s fairly rare. The groups will still be synced if added manually, though.
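To give an idea of the shape of it, here’s a minimal, purely hypothetical sketch of what the add-user part could look like; the hosts-file, group and UID handling are assumptions, not the actual implementation;

#!/bin/bash
# Hypothetical sketch of the add-user part of the sync script.
# Usage: ./add-user.sh <username> <uid>
user=$1
uid=$2
for host in $(cat hosts); do
  if ping -c1 -w1 "$host" | grep -qi "bytes from"; then
    ssh "$host" "\
      if ! grep -qi '^$user:' /etc/passwd; then \
        useradd -m -g 100 -s /bin/bash -u $uid $user; \
      fi"
  fi
done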
The virtualization-cluster is also starting to get on its feet. This is where we’ll, over time, move almost all the infrastructure-machines; ns0, ns1, mon0, mon1, etc.
Today I also fixed the user-account for sysmes; disabling password-login and making it accessible by public key only. The entire production-cluster is done; still missing a few of the infrastructure-machines and the DEV-cluster, but we’re getting there.
root@portal-ecs1:~# for host in $(cat /etc/dsh/group/prodcluster); do if ping -c1 -w1 $host|grep -qi "bytes from"; then ssh $host "skill -KILL -u sysmes; usermod -u 901 sysmes; if [ ! -d /opt/sysmes/.ssh/ ]; then mkdir /opt/sysmes/.ssh; fi; chown -R sysmes:users /opt/sysmes; passwd -l sysmes"; scp /opt/sysmes/.ssh/authorized_keys $host:/opt/sysmes/.ssh/authorized_keys; fi; done
There are a lot of small things, like inconsistent sshd_config, that create all these small obstacles when trying to fix/achieve something. Somewhat annoying in the long run, but we’ll get there I guess; God didn’t make the world in one day, you know. (-:
Oh, yes, we also got 3 of the premium licenses for the switches, which is good. We’re still missing the 4th (IT didn’t have more), so during the next two-three days we’ll figure out how long it’ll take to get it. If it takes long, IT said they have a spare one installed in a lab-switch or something, which we could get. So unless something comes up, I’ll most likely configure the core-switches to be fully redundant using VRRP within a week or two.
I’ll also be spending the next few days upgrading BIOSes. We made a new image that _should_ work, so I’ll test it out tomorrow. If it works, then I’ll have 49 nodes to upgrade. Yay \o/
Then I’ll also fix host-based login in the PROD-cluster, which is somewhat easier to maintain than key-based, as you don’t need to generate/copy keys for each user you want to have password-less login for. You do need to maintain the host-list, but that we can sync/push from somewhere, using a pubkey for root, or something (since the pubkey can’t be “broken” as easily as the host-based setup can).
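For reference, a minimal sketch of what host-based SSH auth typically involves; the hostnames are just examples, and exact file locations may differ per distro (see sshd_config(5)/ssh_config(5));

# On each server, in /etc/ssh/sshd_config:
HostbasedAuthentication yes

# /etc/shosts.equiv -- hosts allowed in without a password:
portal-ecs1
cn006

# The server also needs the clients' host keys, e.g.:
ssh-keyscan portal-ecs1 cn006 >> /etc/ssh/ssh_known_hosts

# On each client, in /etc/ssh/ssh_config:
HostbasedAuthentication yes
EnableSSHKeysign yes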
Oh, and then we have the script for configuring DHCP/DNS; this is somewhat important as well. I kinda want to write this in Perl, so that I can learn it a bit, but we’ll see.
Since we have no way of knowing which nodes we want to upgrade just by looking at the hostnames, we need to find them some other way. We know they have 2 CPUs with 12 cores each, so the following should list the nodes we need;
root@portal-ecs1:/etc/dsh/group# for host in $(cat prodcluster); do if ping -c1 -w1 $host|grep -qi "bytes from"; then ssh $host 'if [ `cat /proc/cpuinfo|grep -i "model name"|wc -l` -eq 24 ]; then hostname; fi'; fi; done
cn006
cn007
cn008
cn009
cn011
cn012
cn013
cn020
cn021
cn022
cn023
cn024
cn025
cn026
cn027
cngpu000
cngpu001
cngpu002
cngpu003
cngpu004
cngpu005
cngpu006
cngpu007
cngpu008
cngpu009
cngpu010
cngpu011
cngpu012
cngpu013
cngpu014
cngpu015
cngpu016
cngpu017
cngpu018
cngpu019
cngpu020
cngpu021
cngpu022
cngpu023
cngpu024
cngpu025
cngpu026
cngpu027
cngpu028
cngpu029
cngpu030
cngpu031
cngpu032
cngpu033
49 in total.
The problem, however, is that, even though we live in 2011, you need to use DOS to upgrade it. Fair enough. But what about the built-in BIOS-upgrade in the BMC/IPMI? Well, that one actually bricks the node: at some point the BMC/IPMI goes through the BIOS, and so, while upgrading the BIOS, it loses connectivity to itself somehow. Brilliant. So, back to DOS. The nodes, obviously, have no floppy-drives, so we need to use a CD. They don’t have a CD-ROM drive either, so you’ll have to use either a USB-stick, a USB CD-ROM, or the BMC/IPMI’s built-in virtual CD-ROM, where you can mount .iso-files from an SMB-share. Quite nifty. Except that, so far, we haven’t found a CD-ROM driver for DOS that accepts the virtual CD-ROM. So then we can’t access the BIOS-upgrade software. Great. What about using the BMC/IPMI’s built-in virtual floppy-disk? That would have been a great idea, except that it’s limited to 1.44MB. Guess what? The new BIOS-firmware is 2.1MB. Wohooo!
So, for the moment we’re somewhat stuck. We’ll be looking into using a USB-stick, and maybe getting it to work that way. I guess it’s all about finding a driver that accepts the virtual virtual virtual floppy-disk virtualized as a CD-ROM on a USB-stick, or something. Hahaha.
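If the USB-stick route works out, it would probably be something along these lines; purely a hypothetical sketch, assuming we can find a bootable FreeDOS image (the image name, device and directory are placeholders);

# Hypothetical sketch: build a bootable FreeDOS USB-stick and copy the BIOS
# update onto it (image name, device and paths are placeholders).
dd if=freedos-usb.img of=/dev/sdX bs=1M && sync   # write the FreeDOS boot image
mount /dev/sdX1 /mnt                              # mount the FAT partition
cp bios-upgrade/* /mnt/                           # flash utility + the 2.1MB firmware
umount /mnt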
- Tyan BMC = fail
- It uses only LAN1 to send DHCP-requests for its BMC-interface
- Have to use a Gbps NIC for the management
- Inconsistent naming of interfaces; LAN1 is sometimes eth0, other times eth1. Each node had to be checked manually, and only a few nodes could be scripted;
root@portal-ecs1:~# for host in feptofc00 feptofc02 feptofc04 feptofc06 feptofa08 feptofc10 feptofc12 feptofc14; do ssh $host "sed -i 's/eth1/eth2/g' /etc/network/interfaces"; done
- Glad I don’t have to do this again
Current status of the cluster;
- 1 machine is down/gone (cn010)
- 3 machines down
root@portal-ecs1:~# for host in $(cat /etc/dsh/group/prodcluster); do if ! ping -c1 -w1 "$host"|grep -qi "bytes from"; then echo $host; fi; done
cn010
fepfmdaccorde
fephmpid0
fephmpid2
root@portal-ecs1:~# for host in $(cat /etc/dsh/group/prodcluster); do if ! ping -c1 -w1 "$host-mgmt"|grep -qi "bytes from"; then echo $host-mgmt; fi; done
cn010-mgmt
Yesterday I discovered that some of the nodes have failover-functionality, so that if the dedicated BMC NIC fails, it switches over to the normal NIC. Unfortunately there is no fallback (if/when the BMC NIC comes back online); once the failover has been triggered, it stays that way until the BMC power-cycles. When redoing the network-layout, this caused a lot of BMCs to fail over and request their management IP on the PROD/DEV-network, which, of course, meant they got no IP at all.
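In principle, a cold reset of the BMC through the in-band IPMI drivers (ipmitool mc reset cold, as in the cn012 transcript above) should also count as a power-cycle and make it re-select the NIC; a hypothetical loop over the affected nodes might look like this (the host-list name is made up);

# Hypothetical: cold-reset the BMC in-band on each affected node,
# instead of pulling the power cord ("failover-nodes" is an assumed host list).
for host in $(cat failover-nodes); do
  if ping -c1 -w1 "$host" | grep -qi "bytes from"; then
    ssh "$host" "modprobe ipmi-devintf; modprobe ipmi-si; ipmitool mc reset cold"
  fi
done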
Today I powered down all those nodes and pulled the power-cords, making the BMCs lose power. After about 5-10 seconds off, I turned them on again. So far, so good. Loads of BMCs came up correctly on the management-network. However, a large part still tried to get an IP on the PROD-network. I couldn’t figure out why; the switches are all properly configured. And then I found the pattern;
root@portal-ecs1:/etc/dsh/group# for host in $(cat prodcluster); do if ! ping -c1 -t2 "$host-mgmt"|grep -qi "bytes from"; then echo "$host-mgmt"; fi; done
cn010-mgmt
fepdimutrk3-mgmt
fepemcal0-mgmt
fepemcal1-mgmt
fepemcal2-mgmt
fepemcal3-mgmt
fepemcal4-mgmt
fephltout2-mgmt
feppmd1-mgmt
fepsdd0-mgmt
fepsdd1-mgmt
fepsdd2-mgmt
fepsdd3-mgmt
fepsdd4-mgmt
fepsdd5-mgmt
feptofa00-mgmt
feptofa02-mgmt
feptofa04-mgmt
feptofa06-mgmt
feptofa08-mgmt
feptofa10-mgmt
feptofa12-mgmt
feptofa14-mgmt
feptofa16-mgmt
feptofc00-mgmt
feptofc02-mgmt
feptofc04-mgmt
feptofc06-mgmt
feptofc10-mgmt
feptofc12-mgmt
feptofc14-mgmt
feptofc16-mgmt
feptpcco17-mgmt
feptrd14-mgmt
All of these (with _maybe_ a few exceptions) are running the same motherboard, with the same IPMI/BMC-addon card, both from Tyan. This addon-card is needed to activate the BMC-features. The sad thing, though, is that it seems to only send its DHCP-requests out of the main NIC, that is LAN1/eth1, regardless of whether it has link on the two other NICs. The ironic thing is that the source MAC of the DHCPDISCOVERs is actually the one for eth0 (which is the dedicated management-NIC). So, with no help from Tyan’s homepages, and none of the available change-the-NIC-through-ipmitool tricks working, I decided to go the somewhat easy way, even though it’s not ideal;
Make eth1 the dedicated management-NIC, and use eth2 for the normal network. I don’t like using a Gbps-NIC for management, but it’s better than spending ages figuring out how to get it to request IPs from the management-NIC. There have been some suggestions to flash the BIOS, the NIC and the IPMI/BMC-addon card, but this involves a lot of risk of bricking stuff, so I won’t go down that route. Not for now, at least.
So, tomorrow I’ll change eth1 to eth2 on ~30 nodes. Nice spending time on something as useful as this! :-D
Update: According to the picture below, it’s actually true; only eth1/LAN1 is able to carry the IPMI/BMC traffic. That’s kinda LOL, considering you’ll be wasting a Gbps-NIC when you have a 100Mbps-NIC available. GG, Tyan!
When power is applied to the power supply, the BMC powers on immediately. During the boot process the BMC (via Uboot which is booting Linux on the BMC) checks to see if the dedicated IPMI NIC port sees a link state. If not, the shared NIC port will be used. The NIC port selected at BMC boot time will be the NIC port used until the BMC is power cycled, either through a direct BMC reboot or when power is removed from the power supply. Rebooting the system itself will do nothing to the BMC.
This creates a cabling time race condition between plugging in the dedicated IPMI NIC and the power cable, which is very obnoxious. Or, for example, if you have a power outage and the BMC comes up before the switch does, the BMC will select the shared NIC in spite of the dedicated NIC being wired, and LAN IPMI access will, in the case of VLANed ports, be on the wrong network. We experience this more often than we like and find it quite frustrating.
So, I guess I’ll have to power those nodes completely down :-(
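To figure out which BMCs actually ended up with an address on the wrong network, something like this in-band check might do; a hypothetical sketch (the IPMI channel number, 1, is an assumption and may differ per board);

# Hypothetical: ask each reachable node which IP its BMC currently holds.
for host in $(cat /etc/dsh/group/prodcluster); do
  if ping -c1 -w1 "$host" | grep -qi "bytes from"; then
    echo -n "$host: "
    ssh "$host" "modprobe ipmi-devintf; modprobe ipmi-si; ipmitool lan print 1 | grep -E '^IP Address +:'"
  fi
done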