Today at a client site, vCenter stopped responding. While troubleshooting, I discovered that the C: drive of the SQL server hosting the vCenter database had only 1.59 MB free. Deleting a 526 MB system state backup from late 2007 relieved that. Next, I found that the SQL DBAs had set the SQL login for the vCenter database to expire. The VMware admins confirmed that they had requested a login with no account expiration, and that the request had been approved through change control before the account was created. A non-expiring login is not the default setting, so it pays to confirm the configuration once the account exists.
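A drive at 1.59 MB free usually gets there gradually, so even a crude scheduled check would have given earlier warning. The following is a minimal sketch using Python's standard `shutil.disk_usage`. The 2 GiB threshold and the function name `free_space_ok` are hypothetical choices for illustration, not part of the client's setup.

```python
import shutil

# Hypothetical alert threshold (an assumption, not from the original setup): 2 GiB.
MIN_FREE_BYTES = 2 * 1024**3


def free_space_ok(path: str, min_free: int = MIN_FREE_BYTES) -> tuple[bool, int]:
    """Return (ok, free_bytes) for the volume that contains `path`."""
    free = shutil.disk_usage(path).free
    return free >= min_free, free


if __name__ == "__main__":
    # On the SQL server this would be "C:\\"; "." checks the current volume.
    target = "."
    ok, free = free_space_ok(target)
    status = "OK" if ok else "LOW"
    print(f"{target}: {free / 1024**2:.2f} MB free ({status})")
```

Run from a scheduled task, a "LOW" result could feed whatever alerting the site already uses. The point is simply that someone gets told before the database server runs dry.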
At the same site, we also discovered that the tag for a single VLAN was missing from 22 physical ports. The cause was a simple typo at the switch console, “1-2” instead of “1-24”, and it showed up as VMs intermittently failing to get IP addresses via DHCP. With good documentation of the vSwitch configurations and a clear diagram of the connections from the ESX hosts to the physical switch, we convinced the network admin to check and correct the switch configuration.
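The arithmetic is easy to miss at a console: “1-2” covers ports 1 and 2, while “1-24” covers 24 ports, so 22 ports were left without the VLAN. As a minimal sketch, assuming the intended port list is recorded in the documentation using the same range notation the switch console accepts, a few lines of Python can diff the documented range against what was actually entered. The names `expand_ports` and `missing_ports` are hypothetical, and the script does not talk to any switch; it only compares two range strings.

```python
def expand_ports(spec: str) -> set[int]:
    """Expand a port spec such as "1-24" or "1-4,7,9-10" into a set of port numbers."""
    ports: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            ports.update(range(start, end + 1))
        else:
            ports.add(int(part))
    return ports


def missing_ports(documented: str, configured: str) -> list[int]:
    """Ports the documentation says should carry the VLAN but the config omits."""
    return sorted(expand_ports(documented) - expand_ports(configured))


if __name__ == "__main__":
    # Documentation says ports 1-24 carry the VLAN; the console entry was "1-2".
    gap = missing_ports("1-24", "1-2")
    print(f"{len(gap)} ports missing the VLAN: {gap}")
```

Comparing sets rather than eyeballing the strings makes a one-character typo show up as a concrete list of ports, which is the kind of evidence that helped persuade the network admin here.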
Thankfully, both errors were fixed quickly, but both could have been avoided with careful checking and better feedback. Virtualization crosses many disciplines, and at client sites I routinely encounter resistance from admin groups that have become isolated from one another. At this site, the admin groups are starting to loosen up and cooperate, and incidents like today’s are helping them appreciate the need for coordination rather than insulation.