Hosting containers on AWS for teaching

Scripts to deploy multiple docker containers simultaneously for teaching. The documentation is below. New here? Have a look at the tutorial.

A note on security

Connections between participants and their containers go over http and are therefore not encrypted. Keep the following in mind:

No sensitive data in the hosted container
Do not let participants choose their own password
Be careful with downloading executable files
Do not re-use the IP of the host machine
If possible, restrict the IP addresses from which the host machine can be reached

We are currently developing scripts and docs on how to host containers with a reverse proxy over https. You can find them here: https://github.com/GeertvanGeest/AWS-docker-nginx

Preparation

Start an AWS EC2 instance with an Ubuntu AMI. If you are new to this, here is a good place to get started.

Here’s a repository that automates the start of an EC2 instance with terraform.

If docker is not pre-installed in your chosen AMI (it is pre-installed in e.g. the Ubuntu deep learning base AMI), install docker on the instance:

curl https://get.docker.com | sh
sudo usermod -a -G docker ubuntu # ubuntu is the user with root access
sudo service docker start

You can add the above code to the user data script that runs when the instance is launched. Otherwise, log out and log in again to be able to use docker without sudo.

After that, clone this repository:

git clone https://github.com/GeertvanGeest/AWS-docker.git

Generate credentials

You can generate credentials from a comma-separated list of users with two columns: first name and last name. Do not include a header line with column names. Here’s an example:

Jan,de Wandelaar
Piet,Kopstoot
Joop,Zoetemelk
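A missing comma or a stray extra field in this list is easy to overlook. As a minimal sketch, assuming the two-column format described above, the hypothetical helper check_user_list below (not part of this repository) prints every non-empty line that does not have exactly two non-empty comma-separated fields. It cannot recognise a header line, so check for that by eye.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of AWS-docker): sanity-check a user list.
# Assumption: each non-empty line is "first name,last name", no header line.

# check_user_list USER_LIST
# Prints malformed lines with their line number; returns 1 if any were found.
check_user_list() {
  awk -F, '
    NF == 0 { next }                        # ignore empty lines
    NF != 2 || $1 == "" || $2 == "" {       # wrong field count or empty field
      printf "line %d: %s\n", NR, $0
      bad = 1
    }
    END { exit bad }                        # 0 if everything looked fine
  ' "$1"
}

# Usage: check_user_list examples/user_list_credentials.txt && echo "user list OK"
```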

Run the script generate_credentials.sh like this (use -l to specify the user list):

./generate_credentials \
-l examples/user_list_credentials.txt \
-o ./credentials \
-p 9001 \
-a 18.192.64.150

The option -o specifies an output directory in which the following files are created:

input_docker_start.txt: A file that serves as input to deploy the docker containers on the server
user_info.txt: A file with user names, passwords and links that can be used to communicate credentials to the participants

The option -p is used to generate an individual port for each user. Ports are assigned in increasing order, starting from the value of -p. So, in the example above, the first user gets port 9001, the second 9002, the third 9003, etc. Be aware that ports 7000, 9000 and 10000 are reserved for the admin containers (vscode, rstudio and jupyter, respectively)!

The option -a specifies the address on which the container will be hosted. This is used to generate the address per user. If you have already started your instance (or if you have an elastic IP), specify the public IPv4 address here.
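To make these options concrete, here is a hypothetical re-implementation of the core of generate_credentials. It is a sketch under our own assumptions, not the repository's code: the username rule is inferred from the example output in the next section (Jan,de Wandelaar becomes jdewandelaar), the password length (14 random bytes, base64-encoded) is a guess, and make_input and make_links are illustrative names.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL sketch of what generate_credentials produces; the real script
# may differ. Assumptions: username = first letter of the first name + last
# name, lowercased, spaces removed; password = 14 random bytes in base64.

# make_input USER_LIST START_PORT
# Prints one tab-delimited line per user: PORT, USERNAME, PASSWORD.
make_input() {
  local port=$2 first last initial user pass
  # "|| [ -n ... ]" also handles a last line without a trailing newline
  while IFS=, read -r first last || [ -n "$first" ]; do
    [ -z "$first" ] && continue                    # skip empty lines
    initial=$(printf '%s' "$first" | cut -c1)
    user=$(printf '%s%s' "$initial" "$last" | tr -d ' ' | tr '[:upper:]' '[:lower:]')
    pass=$(head -c 14 /dev/urandom | base64)       # 20 characters
    printf '%s\t%s\t%s\n' "$port" "$user" "$pass"
    port=$((port + 1))                             # next user, next port
  done < "$1"
}

# make_links INPUT_FILE ADDRESS
# Prints each username with the http link to that user's container.
make_links() {
  awk -v addr="$2" 'BEGIN { FS = "\t" } { printf "%s\thttp://%s:%s\n", $2, addr, $1 }' "$1"
}

# Usage:
#   make_input examples/user_list_credentials.txt 9001 > input_docker_start.txt
#   make_links input_docker_start.txt 18.192.64.150
```

With the example user list, make_input assigns ports 9001, 9002 and 9003, and make_links then prints links such as http://18.192.64.150:9001.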

Deploying containers

The output generated by generate_credentials consists of two files. One of them is called input_docker_start.txt and should look like this (if you wish, you can also choose to generate this yourself):

9001    jdewandelaar    OZDRqwMRmkjKzM48v+I=
9002    pkopstoot   YTnSh6SmhsVUe+aC2HY=
9003    jzoetemelk  LadwVbiYY4rH0S5TjeI=

Each line is used to start a container hosted on the specified port (first column) for the specified user (second column), accessible with the specified password (third column). Once deployed, the jupyter notebook, rstudio server or vscode server will be available at [HOST IP]:[PORT]. If you want to have both rstudio server and jupyter notebook running on the same instance, you can generate two tab-delimited files (one for rstudio and one for jupyter) and give them the same passwords for convenience. Note that each container uses a single port, so the two files must contain different ports!
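One way to build the second file is to copy the first one and shift every port by a fixed offset, which keeps usernames and passwords identical. A minimal sketch with a hypothetical helper shift_ports; the offset of 100 in the usage line is an arbitrary choice, so pick one that keeps the new ports clear of the admin ports and of the first file's ports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a tab-delimited input file (PORT, USER, PASSWORD)
# and add a fixed offset to every port; usernames and passwords stay the same.

# shift_ports INPUT_FILE OFFSET
shift_ports() {
  awk -v offset="$2" 'BEGIN { FS = OFS = "\t" } { $1 = $1 + offset; print }' "$1"
}

# Usage:
#   shift_ports credentials_rstudio/input_docker_start.txt 100 \
#     > credentials_jupyter/input_docker_start.txt
```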

Deploy containers based on jupyter notebook

Prepare an image that you want to use for the course. This image should be based on a jupyter notebook container, e.g. jupyter/base-notebook, and should be available from dockerhub.

Run the script run_jupyter_notebooks on the server:

run_jupyter_notebooks \
-i jupyter/base-notebook \
-u examples/credentials_jupyter/input_docker_start.txt \
-p test1234

Here, -i is the image tag, -u is the input_docker_start.txt file generated by generate_credentials, and -p is the password for the admin container. No username is required to log on to a jupyter notebook.

To access the admin container, go to [HOST IP]:10000

Deploy containers based on Rstudio server

Prepare an image that you want to use for the course. This image should be based on a rocker image, e.g. rocker/rstudio, and should be available from dockerhub.

Run the script run_rstudio_server on the server:

run_rstudio_server \
-i rocker/rstudio \
-u examples/credentials_rstudio/input_docker_start.txt \
-p test1234

See above for the meaning of the options.

The username to log on to rstudio server is rstudio.

To access the admin container, go to [HOST IP]:9000

Deploy containers based on vscode server

Prepare an image that you want to use for the course. This image should be based on the linuxserver/code-server image, and should be available from dockerhub.

In the Dockerfile, you can install code-server extensions with /usr/local/bin/install-extension.

Run the script run_vscode_server on the server:

run_vscode_server \
-i linuxserver/code-server \
-u examples/credentials_vscode/input_docker_start.txt \
-p test1234

See above for the meaning of the options.

To access the admin container, go to [HOST IP]:7000

Restricting resource usage

To prevent overcommitment of the server, it can be convenient to restrict resource usage per participant. You can do that with the options -c and -m, which are passed to the --cpus and --memory options of docker run. Use them like this:

run_rstudio_server \
-i rocker/rstudio \
-u examples/credentials_rstudio/input_docker_start.txt \
-p test1234 \
-c 2 \
-m 4g

This results in a hard limit of 2 CPUs and 4 GB of memory for each user. By default, the limits are 2 CPUs and 16 GB of memory. These restrictions are not applied to the admin container.
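Because the limits apply per user, it helps to multiply them by the number of participants and compare the result with the instance. The hypothetical helper check_capacity below does that arithmetic. It assumes whole-number limits (CPUs and GB) and ignores the admin container, which has no limits. For example, 20 users with -c 2 -m 4g could use up to 40 CPUs and 80 GB in total.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the summed per-user limits with the host.
# Assumptions: integer CPU and memory (GB) limits; admin container ignored.

# check_capacity N_USERS CPUS_PER_USER MEM_GB_PER_USER HOST_CPUS HOST_MEM_GB
# Prints the totals; returns 0 only if both totals fit on the host.
check_capacity() {
  local n=$1 cpu=$2 mem=$3 host_cpu=$4 host_mem=$5
  local need_cpu=$((n * cpu))
  local need_mem=$((n * mem))
  echo "requested: ${need_cpu} CPUs, ${need_mem} GB; available: ${host_cpu} CPUs, ${host_mem} GB"
  [ "$need_cpu" -le "$host_cpu" ] && [ "$need_mem" -le "$host_mem" ]
}

# Usage on the server (nproc and free are standard Linux tools):
#   n=$(wc -l < credentials/input_docker_start.txt)
#   check_capacity "$n" 2 4 "$(nproc)" "$(free -g | awk '/^Mem:/ {print $2}')" \
#     || echo "warning: the limits add up to more than the instance has"
```

Exceeding the CPU total is usually tolerable, since CPU time is shared between containers; exceeding the memory total is riskier, because the host can run out of memory when many participants are busy at once.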

Container & volume infrastructure

There are three volumes mounted to each container:

The volume data is mounted to /data. This volume is meant to harbour read-only data (e.g. raw data).
The volume group_work is mounted to /group_work. The group volume is meant as a shared directory, where everybody can read and write.
Each user has a personal volume, named after the username (output of generate_credentials). This volume is mounted to /home/rstudio/workdir/ for rstudio, /home/jovyan/workdir for jupyter, and /config/project for vscode.
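To illustrate how ports, passwords, volumes and resource limits fit together, the dry-run sketch below prints (but does not run) one docker run command per user. It is a hypothetical illustration, not the code of run_rstudio_server: the container names and the read-only (:ro) mount of /data are assumptions. Port 8787 and the PASSWORD variable are the defaults of the rocker/rstudio image.

```shell
#!/usr/bin/env bash
# DRY-RUN sketch: prints one docker run command per line of
# input_docker_start.txt. Not necessarily what run_rstudio_server does;
# container names and the read-only /data mount are assumptions.

# print_rstudio_runs INPUT_FILE IMAGE CPUS MEMORY
print_rstudio_runs() {
  local image=$2 cpus=$3 mem=$4 tab port user pass
  tab=$(printf '\t')
  while IFS="$tab" read -r port user pass || [ -n "$port" ]; do
    [ -z "$port" ] && continue
    # rocker/rstudio listens on 8787 and takes its password from PASSWORD
    echo "docker run -d --name $user -p $port:8787" \
      "-e PASSWORD=$pass --cpus $cpus --memory $mem" \
      "-v data:/data:ro" \
      "-v group_work:/group_work" \
      "-v $user:/home/rstudio/workdir $image"
  done < "$1"
}

# Usage: print_rstudio_runs examples/credentials_rstudio/input_docker_start.txt rocker/rstudio 2 4g
```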

Below you can find an example of the container infrastructure. Blue squares are containers, yellow are volumes. Arrows indicate accessibility.

[Figure: container infrastructure]

How to use admin privileges

The admin container (i.e. with sudo rights) is available at port 10000 for the jupyter containers, 9000 for the rstudio containers and 7000 for the vscode containers. The regular users' containers are available at the ports specified in the tab-delimited input file.
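Before deploying, it can be useful to check that no user port clashes with one of these admin ports or is used twice, also across files when several services run on one instance. A sketch with a hypothetical helper check_ports, assuming tab-delimited input files with the port in the first column:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check one or more input files for ports that clash
# with the admin containers (7000 vscode, 9000 rstudio, 10000 jupyter) or
# that occur more than once (within or across files).

# check_ports INPUT_FILE...
# Prints each problem; returns 1 if any problem was found.
check_ports() {
  awk '
    BEGIN { FS = "\t"; admin[7000] = admin[9000] = admin[10000] = 1 }
    NF == 0 { next }                     # ignore empty lines
    { p = $1 + 0 }                       # port as a number
    p in admin {
      printf "%s: port %s is reserved for an admin container\n", FILENAME, $1
      bad = 1
    }
    seen[p]++ == 1 {                     # report on the second occurrence
      printf "%s: port %s is used more than once\n", FILENAME, $1
      bad = 1
    }
    END { exit bad }
  ' "$@"
}

# Usage: check_ports credentials_rstudio/input_docker_start.txt credentials_jupyter/input_docker_start.txt
```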

You can check out a user volume with mount_user_volume.sh:

./mount_user_volume.sh user01

This will create an ubuntu container with direct access to the personal volume of user01. As an alternative to this ubuntu container, you can mount the user volume in any other container.

Stopping services

You can stop all services (containers and volumes) with the script stop_services.sh.

Setting up a backup

With the script backup_s3.sh you can sync files from the docker volumes to S3. It syncs the shared volume group_work and the individual user volumes. In order to run the script, first configure the AWS CLI on the server:

aws configure

More info about configuring the AWS CLI here.
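The exact commands used by backup_s3 are not described here. As a rough dry-run sketch of the idea, the hypothetical helper below prints one aws s3 sync command for group_work and one per user volume. It assumes Docker's default data root, where a volume's files live in /var/lib/docker/volumes/<name>/_data (reading them requires root); bucket and directory names are placeholders.

```shell
#!/usr/bin/env bash
# DRY-RUN sketch, not the code of backup_s3: print the aws s3 sync commands
# that would copy the shared volume and every user volume to S3.
# Assumption: Docker's default data root (/var/lib/docker/volumes).

# print_backup_cmds INPUT_FILE BUCKET DIRECTORY
print_backup_cmds() {
  local input=$1 bucket=$2 dir=$3 root=/var/lib/docker/volumes
  local tab port user pass
  tab=$(printf '\t')
  echo "aws s3 sync $root/group_work/_data s3://$bucket/$dir/group_work"
  # one user volume per line of the input file (second column = username)
  while IFS="$tab" read -r port user pass || [ -n "$user" ]; do
    [ -z "$user" ] && continue
    echo "aws s3 sync $root/$user/_data s3://$bucket/$dir/$user"
  done < "$input"
}

# Usage: print_backup_cmds credentials/input_docker_start.txt my-bucket backup
```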

After that, you can set up a cronjob to sync these files regularly. The script scripts/backup_s3_cronjob.sh calls backup_s3.sh and can be used in your cronjob. To do this, first edit scripts/backup_s3_cronjob.sh and fill in the placeholders:

#!/usr/bin/env bash

cd /home/ubuntu
AWS-docker/backup_s3 \
-u [CREDENTIALS input_docker_start.txt] \
-s [EXISTING S3 BUCKET] \
-e [DIRECTORY IN THE BUCKET (newly created)] \
2>> cronjob.err

Now run:

crontab -e

Then add a cronjob. For example, to run the backup every hour, add this line (use the full path to the cronjob script, and make sure the script is executable):

0 * * * * /home/ubuntu/backup_s3_cronjob.sh
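If you prefer not to edit the crontab interactively, the line can also be added from a script. The sketch below uses a hypothetical helper add_cron_line that appends the line to the current crontab only if it is not there yet; crontab -l prints the current crontab and crontab - installs a new one from standard input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the current crontab on stdin and print it with
# LINE appended, unless an identical line is already present.

# add_cron_line LINE
add_cron_line() {
  local line=$1 current
  current=$(cat)
  [ -n "$current" ] && printf '%s\n' "$current"   # keep existing entries
  # -x: whole-line match, -F: fixed string (so "*" is not a pattern)
  if ! printf '%s\n' "$current" | grep -qxF -- "$line"; then
    printf '%s\n' "$line"
  fi
}

# Usage (crontab -l fails if there is no crontab yet, hence 2>/dev/null):
#   crontab -l 2>/dev/null \
#     | add_cron_line '0 * * * * /home/ubuntu/backup_s3_cronjob.sh' \
#     | crontab -
```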