Automation tip — adjust a file on a lot of servers

I have a customer with 40 servers that perform a given function. They are a mix of physical machines and Solaris zones. I needed to adjust a file on each of those machines, and I was not about to ssh into each one, start up vi, and edit the file by hand.

Here is what I did instead:

for host in 1 2 3 4 5; do
  for zone in 1 2 3 4 5 6 7 8; do
    ssh -q "$host-$zone" 'perl -p -i -e "s/ReplaceMe/WithMe/g" /path/to/file'
  done
done

I’m confident that I’m not the first person to do this, but I thought it was creative all the same. It combines a Perl one-liner with two nested for loops for some nice system automation.
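One weakness of a bare loop like this is that a host that is down or rejects the login fails silently. The sketch below is one possible way to track failures. The function name `run_on_hosts` and the `RUNNER` variable are made up for illustration and are not part of the original loop.

```sh
#!/bin/sh
# Hypothetical helper: run the same remote command on every host listed on
# stdin (one per line) and report which hosts failed.
# RUNNER defaults to "ssh -q"; set RUNNER=echo for a dry run.
run_on_hosts() {
    cmd=$1
    failed=""
    while read -r host; do
        # </dev/null keeps ssh from swallowing the rest of the host list
        if ! ${RUNNER:-ssh -q} "$host" "$cmd" </dev/null; then
            failed="$failed $host"
        fi
    done
    if [ -n "$failed" ]; then
        echo "FAILED:$failed" >&2
        return 1
    fi
}
```

The nested loops can feed it the host list: have the inner loop `echo "$host-$zone"` and pipe the whole loop into `run_on_hosts` with the Perl one-liner as its argument. Running it first with `RUNNER=echo` prints what would be executed without touching any machine.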

See my post on consistency; this is a great example of why it is necessary.

ESXi – creating new virtual machines (servers) from the command line

I was able to get a server up and running at home again, and given what I want to do, ESXi is a good solution. When it comes to servers I prefer to work (a) from the command line and (b) over ssh. The first thing I did after installing ESXi was enable its “Tech Support Mode”, and then things got interesting. The command line of ESXi 4.1 is limited, yet powerful enough to do the job nicely. After some searching I learned how to create a new server on the command line, power it on and off, register it with ESX, and destroy it too.

To create and power on a new server I created the following script:

#!/bin/sh

## Most of this taken from http://www.vm-help.com/esx40i/manage_without_VI_client_1.php

mkdir $1

# First make the disk
vmkfstools -c 15G -a lsilogic $1/$1.vmdk

# Now output the vmx file
cat << EOF > $1/$1.vmx

config.version = "8"
virtualHW.version = "7"
vmci0.present = "TRUE"
displayName = "$1"
floppy0.present = "FALSE"
numvcpus = "2"
scsi0.present = "TRUE"
scsi0.sharedBus = "none"
scsi0.virtualDev = "lsilogic"
memsize = "256"
scsi0:0.present = "TRUE"
scsi0:0.fileName = "$1.vmdk"
scsi0:0.deviceType = "scsi-hardDisk"
ide1:0.present = "TRUE"
ide1:0.fileName = "/vmfs/volumes/datastore1/ISOs/install48-amd64.iso"
ide1:0.deviceType = "cdrom-image"
ethernet0.present = "TRUE"
ethernet0.virtualDev = "vmxnet"
ethernet0.features = "15"
ethernet0.networkName = "VM Network"
ethernet0.addressType = "generated"
ethernet1.present = "TRUE"
ethernet1.virtualDev = "vmxnet"
ethernet1.features = "15"
ethernet1.networkName = "VM Network 2"
ethernet1.addressType = "generated"
guestOS = "freebsd-64"
EOF

# Now register our new VM
vnum=`vim-cmd solo/registervm /vmfs/volumes/datastore1/$1/$1.vmx`
vim-cmd vmsvc/power.on $vnum

To power off and destroy a server I use a second script:

#!/bin/sh
# Look up the VM ID by name, then power the VM off and destroy it
vmid=`vim-cmd vmsvc/getallvms | grep "$1" | awk '{print $1}'`
vim-cmd vmsvc/power.off $vmid
vim-cmd vmsvc/destroy $vmid

Then I went ahead and wrote a one-liner to create 10 new machines and then destroy them. Once my completely automated OpenBSD installer is finished, I can adjust the creation of the machines to boot from the network. Then all I will have to do is run the script to create the machine and sit back and wait. In about 15 minutes, I’m guessing, I’ll have a working OpenBSD system. Since the install will be automated I can also fully customize the final result. If I were responsible for a group of, say, web servers and my company had just announced some awesome widget that everyone wants, I had better be prepared to deploy more servers quickly. Using the above I could easily accomplish that. After all, if our customers cannot use our website they will not be happy.
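The one-liner itself is not shown, so the following is only a sketch of what such a loop could look like. The function name `for_each_vm` and the script names in the usage note are hypothetical.

```sh
#!/bin/sh
# Hypothetical wrapper: run a per-VM script once for each of N numbered VMs.
#   $1 = script to run, $2 = VM name prefix, $3 = number of VMs
# Names are zero-padded (vm01 .. vm10) so that a grep for "vm01" cannot
# also match "vm10" when a script looks a VM up by name.
for_each_vm() {
    script=$1
    prefix=$2
    count=$3
    i=1
    while [ "$i" -le "$count" ]; do
        name=$(printf '%s%02d' "$prefix" "$i")
        # Stop at the first failure rather than carrying on blindly
        "$script" "$name" || return 1
        i=$((i + 1))
    done
}
```

Assuming the two scripts were saved as `create_vm.sh` and `destroy_vm.sh` (names invented here), `for_each_vm ./create_vm.sh test 10 && for_each_vm ./destroy_vm.sh test 10` would create test01 through test10 and then remove them again. The two-digit padding only protects against name collisions for up to 99 machines.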

To sum up in a single word…. CONSISTENCY

One word comes to mind when I think about how to run a data center: consistency! I have worked with many people and organizations over the years. Recently I have seen a fair number of issues, and when I tried to summarize them in one word, I picked consistency.

To my mind this means that, right or wrong, if you are going to do something, be consistent about it. If you’re using jumpstart or kickstart then put the environment in a revision control system, like CVS or Subversion. This way changes can be tracked and logged. Sometimes it is the simplest things that tip me off that, say, one system out of ten is different.

For example, when I’m deploying applications on many servers at the same time I use cluster ssh. Once connected I’ll ‘sudo su -’ so I can do what I need to do. If some servers have different root prompts, that is an immediate tip-off that the servers are not all the same.

How do you achieve consistency? Automated scripts/tools. When I deploy the applications I don’t do a lot by hand, except for running some scripts that install the various applications.
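Spotting the odd one out can be scripted as well. The sketch below is just one possible approach: it takes "host value" pairs, where the value might be a checksum of a config file, and prints the hosts that disagree with the majority. The name `find_outliers` is invented for this example.

```sh
#!/bin/sh
# Hypothetical helper: read "host value" pairs on stdin and print the hosts
# whose value differs from the most common one. Only the second field is
# compared; anything after it on the line is ignored.
find_outliers() {
    awk '
        { host[NR] = $1; val[NR] = $2 ""; count[$2 ""]++ }
        END {
            best = ""
            # Pick the value seen on the most hosts (ties resolve arbitrarily)
            for (v in count)
                if (best == "" || count[v] > count[best]) best = v
            for (i = 1; i <= NR; i++)
                if (val[i] != best) print host[i]
        }
    '
}
```

To feed it, a loop over the servers could print one line per host, for example `echo "$h $(ssh -q "$h" 'cksum < /path/to/file')"`, and pipe the result into `find_outliers`. Any hostname that comes back is worth a closer look.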

Now I’m off to continue the fun I’m having today with ESXi and OpenBSD. I’ve figured out how to create hosted servers from the command line, using ssh. Right now I can easily create an OpenBSD virtual server, power it on, and have the install started, all using ssh and the ESXi command line. Next up is to create a fully automated OpenBSD install routine. While the installer is simple and easy, it does require someone to answer questions. I want a fully automated and customized environment. I did this a few years ago and am now going to revisit and improve it.

FBI Supply chain compromised :)

http://blogs.csoonline.com/the_fbi_supply_chain_illustrated

Funny!

Monitor your traffic and egress filters

I’m reading this story and I quote:

Last year, for example, an unidentified defense contractor discovered 100 compromised systems on its network, and found that the intruders had been inside since at least 2007.

Hopefully they have now come to realize that monitoring your network (traffic patterns, rates, and so on) is very important too. In the past I have looked at a graph of traffic, say email messages over a 24-hour period, and seen that it was very high compared to previous data. Because I could see the change in the trend visually, I investigated further and found that there was indeed a problem.
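The same comparison can be automated in a crude way. This is a minimal sketch, assuming you already have one message count per period in a file. The function name `check_spike` and the default factor of 2 are arbitrary choices for illustration, not a recommended threshold.

```sh
#!/bin/sh
# Hypothetical check: stdin has one count per line, oldest first.
# Warn (and return 1) if the newest value is more than FACTOR times the
# mean of all earlier values. FACTOR is $1 and defaults to 2.
check_spike() {
    awk -v factor="${1:-2}" '
        { v[NR] = $1 }
        END {
            if (NR < 2) exit 0          # nothing to compare against yet
            for (i = 1; i < NR; i++) sum += v[i]
            mean = sum / (NR - 1)
            if (v[NR] > factor * mean) {
                printf "spike: %d vs mean %.1f\n", v[NR], mean
                exit 1
            }
        }
    '
}
```

Run from cron as something like `check_spike 2 < counts.txt` (the file name is made up), this would flag a period that is more than twice the historical average. It is no substitute for looking at the graphs, but it can prompt you to look.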

Many entities don’t discover a breach until someone from law enforcement tells them. By then, it’s too late.

In 2008, Mandiant investigated a breach at a law firm that was representing a client in a lawsuit related to China. The attackers were in the firm’s network for a year before the firm learned from law enforcement that it had been hacked. By then, the intruders harvested thousands of e-mails and attachments from mail servers. They also had access to every other server, desktop workstation and laptop on the firm’s network.

That is sad, because the tools exist for companies to detect and stop this type of thing in house. If you have something that is very important to your organization, put an air gap between it and the rest of your network. In other words, don’t connect it to your network; isolate it completely. That makes it more difficult to use, but what is the loss to your organization if the information leaves your company?

Stolen e-mail messages and documents are collected and stored on a staging server inside the company’s network before being encrypted with custom algorithms and compressed into an .rar file. The files are then siphoned out in small random bursts generally via normal protocols with spoofed headers to disguise the activity. In the case of the Google hack, the attackers used an SSL port but a custom protocol.

All applications should be forced to use a proxy; otherwise their traffic should not be allowed to enter or leave your network. A while ago, while taking the SANS Hacker Techniques and Incident Handling class, I learned about a tool that lets you craft ping packets. That in itself is not amazing, but what I did not know at the time was that ping packets (ICMP echo request/reply) can carry a payload, i.e. data. In other words, if I had the motivation I could write a script that takes the data I want to smuggle out of a network and gets it all out via ping. It would take a while, but it would work. Egress filtering is a must. The first rule set of a firewall should be deny all. Only open up what is absolutely required, and only to a limited set of devices. APT requires defense in depth and more.  :)