CS530 S08
TR 11:40-12:55
Olin 245
Architecture of Large-Scale Information Systems

Getting Started on AWS

The CS530 projects will use Amazon Web Services (AWS) to give us practical experience in using web services and especially clusters. Amazon has graciously offered to provide us with a prepaid AWS account to support the course. This is a single account, to be shared among all the CS530 students.

1. Conventions for Sharing a Single AWS Account

Because we will all be sharing a single Amazon AWS account, it will be possible (with some effort) for students to decrypt and read one another's Amazon machine image files, or to steal one another's static content files. This is not fundamentally different from the bad old days when CS programming projects were done on a batch computing system and printouts were delivered in sorted piles in public terminal rooms. The University Academic Integrity Policy applies to everything we put into AWS, and you are expected to follow this policy scrupulously.

Also as a consequence of sharing a single account, we will need to modify some of the procedures from the Amazon documentation so different students' projects will not interfere with one another. The issues all concern names: bucket names, keypair names, image name prefixes and security group names must each include your own netid, as in the examples used throughout this document (kp-ab123-gsg, gp-ab123-xxx, im-ajd28-gsg, and the bucket edu-cornell-cs-cs530-ajd28).

The following sections describe what you need to do to set up your system to use the shared AWS account, in particular pointing out which parts of the Amazon "Getting Started" documentation you should not do. The examples follow Linux / MacOS syntax and filename conventions, with occasional hints for Windows users. Long commands may be split across several lines in this documentation for readability; your command interpreter won't let you do that, so type each one on a single line.

2. Downloading AWS Account Information

To get the shared AWS account information,
    log on to CMS
    go to Project 2
    and download the file.

When you are done you will have a directory
    ~/.aws
or (for Windows users)
    c:\aws
containing two files: pk-xxxxxxxx.pem and the matching X.509 certificate file. Values for xxxxxxxx will come from the download. These files contain the private key and X.509 certificate used to authenticate to Amazon EC2.

You will also have modified your environment to contain
    AWS_ACCOUNT_ID=xxxxxxxx
or (for Windows users)
    AWS_ACCOUNT_ID=xxxxxxxx
The actual values for xxxxxxxx will come from the download, and the actual value for hhhhhhhh will be the path to your home directory. Later sections also rely on the environment variables AWS_KEY_ID, AWS_SECRET_KEY, EC2_PRIVATE_KEY and EC2_CERT.

All done! The following sections will discuss getting started with S3 and EC2 in some detail. Here is a general guideline:

If you go through the Amazon "Getting Started" documents for S3 or EC2 there will be places where you will be instructed to use the Amazon web site to sign up for a service or to create a new X.509 certificate. In other places you will be instructed to create a new keypair, to modify the rules of the default network security group, or to do some other thing that affects the global state of the AWS account. Clearly, if several students were to try this concurrently it would be a Bad Thing. So,

    You should ignore such instructions!

The AWS account has already been set up; the AccountID, KeyID, Secret Key and X.509 certificate have already been created, and you have installed them on your machine. Your bucket names, keypair names, image names and security group names should always be constructed according to the conventions described in Section 1, and you should avoid the default network security group altogether.

3. Getting Started with S3

The Amazon "Getting Started" document for S3 is here. Read the first couple of sections. As stated earlier, ignore the section "Subscribing to the Service" -- we've already subscribed, and the KeyID and Secret Key are already in your environment. The next section, "Authenticating", can be skimmed over quickly for now, as the techniques it describes are built in to any API library you might choose to use.
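
Since everything that follows depends on the shared-account values being in your environment, a quick check can save some confusion. The following is a minimal sketch, not part of the course setup: it assumes the five variable names that this document mentions, and the function name missing_aws_vars is made up for illustration.

```shell
# List the shared-account variables that are empty or unset.
# The names are the ones used in this document; the check is only a sketch.
missing_aws_vars() (
    for name in AWS_ACCOUNT_ID AWS_KEY_ID AWS_SECRET_KEY EC2_PRIVATE_KEY EC2_CERT; do
        # Indirect lookup: read the variable whose name is held in $name
        eval "value=\${$name:-}"
        [ -n "$value" ] || printf '%s\n' "$name"
    done
)

missing=$(missing_aws_vars)
if [ -n "$missing" ]; then
    echo "Not set: $(echo $missing)"
else
    echo "All shared-account variables are set."
fi
```

If anything is reported as not set, revisit Section 2 before continuing.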

The Java code samples in the remaining sections assume you are using this Amazon S3 Library in Java. I have tried this library in Java 1.5 and it seems okay. But the example drivers need changes so that the test bucket name follows our naming conventions. Make these changes to both S3Test.java and S3Driver.java, in the obvious places. The code will then create a test bucket name using our bucket naming conventions with your own netid, avoiding name collisions with other users.
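
As an illustration of the bucket naming idea, here is a small sketch that derives a per-student bucket name from a netid. The prefix edu-cornell-cs-cs530- is an assumption copied from the example bucket that appears later in this document, and bucket_for_netid is a hypothetical helper name.

```shell
# Build a per-student bucket name.
# Assumption: the prefix matches the example bucket used later in this document.
bucket_for_netid() {
    # Normalize the netid to lower case so the same student always gets the same name
    printf 'edu-cornell-cs-cs530-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

bucket_for_netid ab123
```

The same string would be what the modified Java drivers use as their test bucket name.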
Make sure you can run both drivers.

Eventually (in fact, for Section 4 below) you will need a suite of command line tools (a "shell") to manipulate your S3 buckets and objects. A couple of these are available, for example a Java one here. Or you may want to write a few simple tools yourself to gain experience, for example using the Amazon S3 Library in Java that the Amazon documentation relies on. I (Demers) personally prefer the jets3t toolkit, as I find its API a bit more natural. Both libraries work.
There are also a couple of S3 shells available for Linux.

4. Getting Started with EC2

The Amazon "Getting Started" document for EC2 is here. Unlike the S3 document, this one is written in a step-by-step "cookbook" style. As always, you should ignore the early sections about signing up for AWS S3 and EC2 services, as this is already done. Notes about the remaining sections follow.

Setting up the Tools

Follow the instructions for downloading the ec2 command line tools and "Telling the Tools Where They Live" (setting the EC2_HOME environment variable).
The final step of this section, "Telling the Tools Who You Are" (which sets EC2_PRIVATE_KEY and EC2_CERT), should already be covered by the environment changes from Section 2.

The Amazon online documentation includes a reference manual for the command line tools. It is a Really Good Idea to read the manual page for each command as you are about to use it, to make sure you understand what it is about to do.

Running an Instance

This section includes a step for "Generating a Keypair", which must be changed to conform to our shared account naming conventions. The Amazon instructions tell you to name your keypair "gsg-keypair". (The "gsg" part presumably stands for "Getting Started Guide.") Instead, you should use a netid-specific name following our naming conventions; for example,
    kp-ab123-gsg
where the ab123 part should be replaced by your own netid. In each of the remaining steps that requires the name of an EC2 keypair, substitute "kp-ab123-gsg" for "gsg-keypair".

The Amazon document instructs you to store the private key of the keypair in a local file. The logon step where you connect to your instance using an ssh client requires the name of this file, so put it someplace where you can find it, for example
    ~/.aws/id-rsa-kp-ab123-gsg
Again, the name you use for this file is arbitrary, but you need some convention that will enable you to find the RSA private key file associated with each of your EC2 keypair names.

The Network Security Group is another important issue that is not well covered in the Amazon document. Every EC2 instance runs in a named security group that you specify when you start the instance. The security group has a set of firewall rules that control network connectivity between instances in the group and instances outside it. If you start an instance without explicitly specifying a security group, the instance runs in a predefined group named "default". Clearly, it would be a Bad Idea for concurrent users of a shared account to have instances running in the (same) default security group; so we use the naming convention described above for security groups.
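
The keypair, key file and group names above all follow one pattern, so they can be generated rather than typed. This is a minimal sketch under that assumption: the three helper names are hypothetical, and the patterns are simply the ones visible in the examples (kp-netid-tag, gp-netid-tag, and ~/.aws/id-rsa- followed by the keypair name).

```shell
# Name helpers following the patterns shown in the examples above.
# Arguments for the first two: netid, then a short tag of your choosing.
keypair_name() { printf 'kp-%s-%s\n' "$1" "$2"; }
group_name()   { printf 'gp-%s-%s\n' "$1" "$2"; }

# Local file holding the RSA private key for a given keypair name.
key_file_for() { printf '%s/.aws/id-rsa-%s\n' "$HOME" "$1"; }

kp=$(keypair_name ab123 gsg)
echo "$kp"
group_name ab123 test
key_file_for "$kp"
```

Keeping the key file name derived from the keypair name means you never have to remember which file goes with which keypair.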

You should create a new security group for the remainder of this exercise using a command like
    ec2-add-group gp-ab123-xxx -d "yyyyyyyy"
where, as above, ab123 should be replaced by your own netid, xxx by a string to make the name unique among group names you define, and yyyyyyyy by a short description of the group. For example,
    ec2-add-group gp-ajd28-test -d "test group for getting started"
You can check to make sure this worked by typing
    ec2-describe-groups gp-ab123-xxx
or just
    ec2-describe-groups
which will list all groups that have been defined by anyone using the shared account.

For some of the later steps in this exercise you will need to specify the group name explicitly rather than allowing it to default to "default". The command that actually starts your instance is the first of these. For example,
    ec2-run-instances ami-2bb65342 -g gp-ajd28-test -k kp-ajd28-gsg
starts an instance in the specified group gp-ajd28-test.

In the next step, "Authorizing Network Access," you need to specify your group name in place of "default". For example,
    ec2-authorize gp-ajd28-test -P tcp -p 22
    ec2-authorize gp-ajd28-test -P tcp -p 80
open the standard TCP ports for ssh (22) and HTTP (80) in the group gp-ajd28-test.

At this point, you should be able to connect to your instance with the ssh command from the Amazon documentation, using the name of the RSA private key file you saved when you created your EC2 keypair, and the external network address assigned to your running instance, for example
    ssh -i ~/.aws/id-rsa-kp-ajd28-gsg root@ec2-67-202-33-73.compute-1.amazonaws.com
Well, "Congratulations!" You have started an instance. As the Amazon docs warn you, DO NOT go away without remembering to shut down your instance -- the account is charged for the instance for as long as it continues to run.
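
To make it harder to fall back on the "default" group by accident, the commands for this step can be generated from your netid. The sketch below only prints the commands and never contacts AWS; print_start_commands is a hypothetical helper, and it assumes your keypair is named kp-netid-gsg as suggested above.

```shell
# Print (do not run) the commands for starting an instance in your own group.
# Arguments: netid, group tag, AMI id.
print_start_commands() (
    if [ $# -ne 3 ]; then
        echo "usage: print_start_commands NETID TAG AMI" >&2
        exit 1
    fi
    netid=$1; tag=$2; ami=$3
    group="gp-$netid-$tag"
    echo "ec2-add-group $group -d \"group $tag for $netid\""
    echo "ec2-run-instances $ami -g $group -k kp-$netid-gsg"
    # One ec2-authorize per port: ssh, then HTTP
    for port in 22 80; do
        echo "ec2-authorize $group -P tcp -p $port"
    done
)

print_start_commands ajd28 test ami-2bb65342
```

Because the group name is always built from the netid, every printed command carries an explicit group of your own.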

Creating an Image

For this part of the exercise there is a convention that may not be obvious: examples that use the prompt string
    prompt>
are commands you execute on your local machine, while examples using the prompt string
    #
are commands you execute on a running EC2 instance to which you have logged in as the root user with ssh. It is important to keep this straight.

In addition, we need to worry about naming conventions for Amazon Machine Image (AMI) bundles. The procedure used in the Amazon document gives the image files default names, and so does not support bundling more than one AMI into the same bucket.

First, you need to copy the private key and X.509 certificate associated with the (shared) account up to the running machine instance you want to bundle. Using our naming and environment conventions, a command that begins
    prompt> scp -i ~/.aws/id-rsa-kp-ajd28-gsg ${EC2_PRIVATE_KEY} ${EC2_CERT}
and ends with the destination on the instance will upload these files to the /mnt directory of the running instance.
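
The copy step can be sketched with the destination written out. Everything below is an assumption made for illustration: the destination form root@host:/mnt is inferred from the ssh example and the /mnt directory mentioned above, the fallback file names are placeholders, and print_scp_command is a hypothetical helper that prints the command instead of running it.

```shell
# Placeholder values so the sketch prints something without the course setup.
EC2_PRIVATE_KEY=${EC2_PRIVATE_KEY:-$HOME/.aws/pk-xxxxxxxx.pem}
EC2_CERT=${EC2_CERT:-$HOME/.aws/cert-xxxxxxxx.pem}

# Print the scp command that copies the account key and certificate to /mnt.
print_scp_command() (
    keyfile=$1   # RSA private key file saved when the keypair was created
    host=$2      # external address of the running instance
    echo "scp -i $keyfile $EC2_PRIVATE_KEY $EC2_CERT root@${host}:/mnt"
)

print_scp_command "$HOME/.aws/id-rsa-kp-ab123-gsg" ec2-67-202-33-73.compute-1.amazonaws.com
```

Check the printed line against the Amazon document before running it yourself.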

At this point you are ready to ask the EC2 instance to bundle itself. As you can see from the Amazon document, the default bundling command creates a number of files with names of the form
    image.foo.bar
in the /mnt directory of the instance. These names appear in the S3 bucket where the AMI is stored, preventing you from creating more than one AMI in the same bucket. This is a Bad Thing. To avoid it, you need to add a common prefix to the name of each file in the bundle using the -p option to the ec2-bundle-vol command. This is the "Image Name Prefix" discussed in our naming conventions in Section 1. The command
    # ec2-bundle-vol -d /mnt -p im-ajd28-gsg
bundles the image to a collection of files on /mnt; all the file names will begin with the prefix "im-ajd28-gsg" rather than "image".

The command also takes private key, certificate and account ID arguments. There, xxxxxxxx must be replaced by the names of the .pem files that were uploaded using scp above, and iiiiiiii must be replaced by your AWS Account ID, the value of the environment variable AWS_ACCOUNT_ID on your local machine. Sadly, these values don't appear in the environment of the instance, where you are executing the ec2-bundle-vol command, so you will need to cut and paste.
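
One way to reduce the cutting and pasting is to build the whole command line on your local machine, where the environment variables exist, and paste it into the instance in one piece. The sketch below assumes the -k, -c and -u options of ec2-bundle-vol for the private key, certificate and account ID (check the tools reference before relying on them) and assumes the .pem files were copied to /mnt; print_bundle_command is a hypothetical helper.

```shell
# Print an ec2-bundle-vol command line for pasting into the instance.
# Argument: the image name prefix, for example im-ab123-gsg.
print_bundle_command() (
    prefix=$1
    if [ -z "${EC2_PRIVATE_KEY:-}" ] || [ -z "${EC2_CERT:-}" ] || [ -z "${AWS_ACCOUNT_ID:-}" ]; then
        echo "set EC2_PRIVATE_KEY, EC2_CERT and AWS_ACCOUNT_ID first" >&2
        exit 1
    fi
    # Only the file names matter on the instance; the files live in /mnt there
    key=$(basename "$EC2_PRIVATE_KEY")
    cert=$(basename "$EC2_CERT")
    echo "ec2-bundle-vol -d /mnt -k /mnt/$key -c /mnt/$cert -u $AWS_ACCOUNT_ID -p $prefix"
)

print_bundle_command im-ab123-gsg || echo "(environment not set up yet)"
```

Run this at the prompt> side and paste the result at the # side.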

The next step is to upload the AMI to S3. The example command in the Amazon document does not reflect our use of an image name prefix. In the manifest file name, the default "image" must be replaced by your image name prefix. For example, the command
    # ec2-upload-bundle -b edu-cornell-cs-cs530-ajd28 -m /mnt/im-ajd28-gsg.manifest.xml
was used to upload an AMI to my own personal bucket. The aws-key-id and aws-secret-key arguments of the command need to be replaced by their true values. These are the values of the environment variables AWS_KEY_ID and AWS_SECRET_KEY on your own machine, but again, as the upload command is being run on the EC2 instance, the environment variables will not be available and you will have to cut and paste.

Once you have successfully uploaded your AMI you no longer need your running instance; you can shut it down with the command
    # /sbin/shutdown -h now
on the instance itself, or you can use the AWS command
    prompt> ec2-terminate-instances i-nnnnn
which is probably more reliable.

The final step of this lengthy process is to register your AMI so you can start it in a new instance. Again the Amazon documentation needs to be modified to get the manifest file name right. For example, the command
    prompt> ec2-register edu-cornell-cs-cs530-ajd28/im-ajd28-gsg.manifest.xml
could be used to register the image we uploaded in the previous step.

At this point you have a registered AMI and can try running it. The command given in the Amazon document has the same issue we discussed when running a public AMI: for a shared account, you should never start an instance in the "default" group, so the command to run your instance should specify one of your own group names, for example
    prompt> ec2-run-instances ami-5bae4b32 -g gp-ajd28-test
Comparing this command to the one used to start a public instance, note there is no longer a keypair ( -k ) argument -- the bundled image implicitly uses the same keypair that it was created with.
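
The upload and register steps share the bucket and manifest names, so deriving both from the netid keeps them consistent. This is a sketch only: it assumes the -a and -s options of ec2-upload-bundle for the key ID and secret key, leaves them as the literal placeholders aws-key-id and aws-secret-key, and uses the prompt convention described earlier. print_publish_commands is a hypothetical helper that prints the commands without running them.

```shell
# Print the upload and register commands for one bundled image.
# Arguments: netid, then the tag used in the image name prefix.
print_publish_commands() (
    netid=$1; tag=$2
    bucket="edu-cornell-cs-cs530-$netid"      # assumed bucket naming pattern
    manifest="im-$netid-$tag.manifest.xml"    # prefix replaces the default "image"
    # "# " lines run on the instance, "prompt> " lines on your local machine
    echo "# ec2-upload-bundle -b $bucket -m /mnt/$manifest -a aws-key-id -s aws-secret-key"
    echo "prompt> ec2-register $bucket/$manifest"
)

print_publish_commands ajd28 gsg
```

Replace the two placeholders by hand when you paste the first line into the instance.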