CERN / Not just one thing

So. It’s been a little while since the last post. There’s been some progress. Things are starting to fall into place.

Last week most of my time went into writing this sync-users-to-these-nodes script. This is needed since we’re not using LDAP anymore. Once the script is done, maintaining users could be done by 6-year-olds (yes, for real), which is somewhat easier than what we had when using LDAP (where, literally, it took months before admins got admin-rights :-P).

So, anyways. The script is nearly done. A few parts are still missing (adding/removing groups, changing userinfo), but the rest is there: add user, delete user, reset password. Adding/deleting/changing groups isn’t planned to be implemented, as it’s rarely needed. The groups will be synced even if added manually, though.
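The add/delete part of a sync script like this boils down to comparing the list of users that should exist with the list that actually exists on a node. Here is a minimal sketch of that idea, not the actual script: the two input files (one username per line), the function names, and the choice of `useradd -m` / `userdel -r` are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: plan which users to add or delete on a node.
# Assumes two plain-text files with one username per line.

# list_missing A B: print the lines of file A that do not appear in file B.
list_missing() {
    awk -v other="$2" '
        BEGIN { while ((getline line < other) > 0) seen[line] = 1 }
        !($0 in seen)
    ' "$1"
}

# plan_sync WANTED EXISTING: print the commands instead of running them,
# so the plan can be reviewed before anything is changed.
plan_sync() {
    # Users that are wanted but not yet on the node.
    list_missing "$1" "$2" | while read -r user; do
        echo "useradd -m $user"
    done
    # Users that are on the node but no longer wanted.
    list_missing "$2" "$1" | while read -r user; do
        echo "userdel -r $user"
    done
}
```

Printing the commands rather than executing them keeps the sketch harmless; piping the output to `sh` on the node would be one way to apply it.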

The virtualization-cluster is also starting to get on its feet. This is where we’ll, over time, move almost all infrastructure-machines; ns0, ns1, mon0, mon1, etc etc.

Today I also fixed the user-account for sysmes: removed password-login and made it accessible by public-key only. The entire production-cluster is done; a few of the infrastructure-machines and the DEV-cluster are still missing, but we’re getting there.

root@portal-ecs1:~# for host in $(cat /etc/dsh/group/prodcluster); do
    if ping -c1 -w1 "$host" | grep -qi "bytes from"; then
        ssh "$host" "skill -KILL -u sysmes; usermod -u 901 sysmes; if [ ! -d /opt/sysmes/.ssh/ ]; then mkdir /opt/sysmes/.ssh; fi; chown -R sysmes:users /opt/sysmes; passwd -l sysmes"
        scp /opt/sysmes/.ssh/authorized_keys "$host":/opt/sysmes/.ssh/authorized_keys
    fi
done

There are a lot of small things, like inconsistent sshd_config files, that create all these small obstacles when trying to fix/achieve something. Somewhat annoying in the long run, but we’ll get there I guess; God didn’t make the world in one day, you know. (-:

Oh, yes, we also got 3 of the premium licenses for the switches, which is good. Still missing the 4th (IT didn’t have more), so during the next two or three days we’ll figure out how long it’ll take to get it. If it takes long, IT said they had a spare one installed in a lab-switch or something, which we could get. So unless something comes up, I’ll most likely configure the core-switches to be fully redundant using VRRP within a week or two.

I’ll also be spending the next few days upgrading BIOSes. We made a new image that _should_ work, so I’ll test it out tomorrow. If it works, then I’ll have 49 nodes to upgrade. Yay \o/

Then I’ll also fix host-based login in the PROD-cluster, which is somewhat easier to maintain than key-based, as you don’t need to generate/copy keys for each user you want to have password-less login for. You do need to maintain the host-list, but this we can sync/push from somewhere, using pubkey for root, or something (since the pubkey can’t be “broken” as easily as the host-based can).
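For the host-list part, something like the following could generate the file to push out. This is only a sketch: it assumes the dsh group file has one host per line with optional `#` comments, and the function name is made up. With OpenSSH, the nodes would additionally need `HostbasedAuthentication yes` in sshd_config, the client side `HostbasedAuthentication yes` and `EnableSSHKeysign yes` in ssh_config, and the hosts' public keys in /etc/ssh/ssh_known_hosts.

```shell
#!/bin/sh
# Hypothetical sketch: turn a host list into content for /etc/ssh/shosts.equiv.
# Assumes one host per line in the input, with optional "#" comments.

# make_shosts_equiv FILE: print a cleaned host list, one host per line.
make_shosts_equiv() {
    # Drop comments, strip whitespace, drop empty lines, remove duplicates.
    sed -e 's/#.*//' -e 's/[[:space:]]//g' -e '/^$/d' "$1" | sort -u
}
```

Usage could look like `make_shosts_equiv /etc/dsh/group/prodcluster > shosts.equiv`, with the result copied to /etc/ssh/shosts.equiv on each node using root's pubkey login.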

Oh, and then we have the script for configuring DHCP/DNS; this is somewhat important as well. I kinda want to write this in Perl, so that I can learn it a bit, but we’ll see.
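Until the Perl version exists, the basic idea can be sketched in shell: keep one table of hosts and generate both the DHCP and the DNS entries from it. Everything here is an assumption for illustration: the three-column input format (hostname, MAC, IP), the function names, and the choice of ISC dhcpd host stanzas and zone-file style A records as output.

```shell
#!/bin/sh
# Hypothetical sketch: generate DHCP and DNS entries from one host table.
# Assumed input format, one host per line: "hostname MAC IP".
# Lines starting with "#" and lines without exactly three fields are skipped.

# gen_dhcp FILE: print one ISC dhcpd host stanza per host.
gen_dhcp() {
    awk '!/^#/ && NF == 3 {
        printf "host %s { hardware ethernet %s; fixed-address %s; }\n", $1, $2, $3
    }' "$1"
}

# gen_dns FILE: print one tab-separated A record per host.
gen_dns() {
    awk '!/^#/ && NF == 3 { printf "%s\tIN\tA\t%s\n", $1, $3 }' "$1"
}
```

Keeping a single table as the source means DHCP and DNS can’t drift apart, which is the main point of scripting this at all.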