Dockerizing Production-Quality APIs
Updated: Jun 17, 2019
Where to begin? I always love learning new technologies, especially when my job forces my hand. In this case, I was initially using Docker to develop a facial recognition API, naively at best. Virtual environments just weren't going to cut it once I was pulling in packages such as dlib, torch, opencv, numpy, celery... I'll stop there. Making it all production quality became a journey, and one I wanted to share so others can gain insights from it.
Here I will cover:
Multi-container communication between application-ready Docker images
Application deployment tools, commands, and configurations for AWS
Links to tutorials and posts that I used as reference when developing
Boilerplate repository, as I cannot show all the code I used
This repository will be referenced continually, so be sure to dig around!
Let us begin. I had a Dockerfile that was pretty naive. I utilized the concept of a base image that contained the necessary packages and a runner image that would update application code for ease of deployment. This was all well and good to get things running, but I knew I would hit a crossroads and eventually need more efficient deployment. Initially, the stack contained uwsgi, nginx, and supervisor as the manager, with Flask as the web framework. Then asynchronous tasks came into play and I needed celery and rabbitmq integrations. This was that point. Elastic Beanstalk's health went severe every time I deployed.
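As a rough illustration of that base/runner split (the image names, base image tag, and package list here are hypothetical, not the repo's actual values), the pattern looks something like:

```dockerfile
# --- Dockerfile.base: heavy, rarely-rebuilt dependency layer ---
FROM python:3.6-slim
RUN apt-get update && apt-get install -y build-essential cmake \
 && rm -rf /var/lib/apt/lists/*
COPY requirements.txt /tmp/requirements.txt
# dlib, torch, opencv, numpy, celery, ... are pinned here so the layer caches
RUN pip install -r /tmp/requirements.txt

# --- Dockerfile (runner): rebuilt on every deploy, only copies fresh code ---
FROM myregistry/api-base:latest
COPY . /app
WORKDIR /app
CMD ["uwsgi", "--ini", "uwsgi.ini"]
```

A redeploy then only invalidates the thin runner layers, not the slow-to-build dependency layer.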
Thus began the multi-container development. Tutorials such as this and this really paved the way for building Dockerfiles ready for container communication. You'll see these Dockerfiles under their respective folders within /dock. Let's move on to the order of operations for getting containers to talk, and how I used docker-compose to manage this.
Flask and Nginx
Initially, supervisor managed nginx and uwsgi to serve the Flask application from one location. Now I needed the two to communicate from two different containers. I was able to get enough information from those initial tutorials to do so. In the nginx container, flask.conf is added to /etc/nginx/conf.d/ so that a server block can reverse proxy to my uwsgi service. The uwsgi.ini is important here since it connects my application code, environment modules, and variables to a socket. You'll find this in my base folder. This socket needed reconfiguration when moving to multi-container. It is pretty intuitive once you inspect the code.
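The reverse-proxy wiring can be sketched like this (the port and upstream name are illustrative; the repo's flask.conf and uwsgi.ini hold the real values):

```nginx
# flask.conf — dropped into /etc/nginx/conf.d/ in the nginx container
server {
    listen 80;
    location / {
        include uwsgi_params;
        # bridged network: address the flask container by its link name;
        # host network: this becomes 127.0.0.1:8080 instead
        uwsgi_pass flask:8080;
    }
}
```

On the flask side, uwsgi.ini binds the app to the matching socket (e.g. `socket = :8080`), which is the piece that needed reconfiguring in the move to multi-container.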
Celery and RabbitMQ
You'll see in the first tutorial that celery is used to communicate with a message broker, rabbitmq. To pick out key points: the broker_url is managed by an environment variable that is automatically set by linking these two containers; that variable is RABBIT_PORT_5672_TCP. Under bridged networking, rabbitmq exposes 5672 to its own user and host. I'll get into this more later, but since I wanted to run under the host network I had to do some manual configuration for this to fully work. Thus there are .env files within these containers that set the user, default password, and nodename for celery to link to. The code provided in default.py will actually handle both bridged and host networks; just substitute your username for ubuntu. You'll see how tooling limits my ability to always run on host, so this configuration was a big win.
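A sketch of how such a default.py can resolve the broker for both network modes (everything beyond RABBIT_PORT_5672_TCP is an assumption; the user/password variable names are borrowed from the rabbitmq image's RABBITMQ_DEFAULT_* conventions):

```python
import os

def broker_url():
    """Build celery's broker_url for either network mode (sketch)."""
    user = os.environ.get("RABBITMQ_DEFAULT_USER", "ubuntu")
    password = os.environ.get("RABBITMQ_DEFAULT_PASS", "guest")
    # Bridged network: Docker's link mechanism injects RABBIT_PORT_5672_TCP,
    # e.g. "tcp://172.17.0.2:5672". Host network: fall back to localhost.
    linked = os.environ.get("RABBIT_PORT_5672_TCP")
    host = linked.replace("tcp://", "") if linked else "localhost:5672"
    return "amqp://{}:{}@{}//".format(user, password, host)
```

The same function works unchanged whichever network_mode the compose file is toggled to.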
Flask and Celery
Joining these two became more of a task. The decision to separate celery from the main application code raised the issue of mimicking the same environment, with the same modules and packages, that my application used; the background tasks would be useless if I could not do so. The first requirement was to access the same application code from the celery container. After taking memory and portability into account, I landed on using Docker volumes to share code across containers. Here is Docker's official documentation on doing so. A nice find for further sharing environments came with this post here. It is a nice hack for sharing environment variables, since I was throwing around access and secret keys that my background tasks needed as well. These obviously could not be set in plain text.
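In docker-compose terms, the shared-volume idea looks roughly like this (service, image, volume, and env file names are made up for illustration):

```yaml
# sketch: one named volume carries the application code to both services
version: '2'
services:
  flask:
    image: myregistry/api:latest
    volumes:
      - appcode:/app            # populates /app with the application code
  celery:
    image: myregistry/celery:latest
    volumes:
      - appcode:/app            # same code, so tasks import the same modules
    env_file:
      - dock/celery.env         # keys and secrets stay out of the yaml
volumes:
  appcode:
```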
To manage all containers I utilized docker-compose, following this tutorial. Along with my yaml file in the dock folder, I also included management commands in the management folder that use click and subprocess to run commands like so: ./manage compose up. This is equivalent to running docker-compose -f dock/docker-compose.yml up. You will also see that some containers, such as nginx and celery, are run via scripts. The commands themselves are no different from what I would set as the entry commands for the container; the result, however, is that other processes can finish initializing before they try to connect.
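The repo wires this through click; to keep this sketch dependency-free it parses argv by hand instead, so only the compose file path is taken from the post and everything else is hypothetical:

```python
import subprocess
import sys

COMPOSE_FILE = "dock/docker-compose.yml"  # path from the post

def compose_args(action):
    """Expand `./manage compose <action>` into a docker-compose invocation."""
    return ["docker-compose", "-f", COMPOSE_FILE] + action.split()

def main(argv):
    # the real manage script defines click subcommands; this mirrors `compose`
    if argv and argv[0] == "compose":
        action = " ".join(argv[1:]) or "up"
        return subprocess.call(compose_args(action))
    print("usage: ./manage compose [up|down|...]")
    return 2

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

The win is ergonomic: every invocation picks up the -f flag and file path automatically.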
Following production-quality guidelines for Docker, my main takeaway was performance. Thus I needed to set up the networking to run over the host network rather than Docker's default bridge network. Using this bible of information, I learned the differences as well as the keys to configuring the correct setup. Here is also a nice Stack Overflow post to further explain the rationale of this decision. You can toggle the network_mode in my docker-compose yaml just by commenting and uncommenting.
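The toggle amounts to one line per service, roughly (service and image names illustrative):

```yaml
# comment/uncomment network_mode to switch networks
services:
  flask:
    image: myregistry/api:latest
    network_mode: host          # faster: no NAT through the docker0 bridge
    # network_mode: bridge      # default; requires explicit port mappings
    # ports:
    #   - "80:80"
```

Note the trade-off: under host networking, ports bind directly on the host, so port mappings and container link names no longer apply.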
Now onto deploying to AWS.
Let me first list the options I saw: standard AWS ECS, ecs-cli, and Dockerrun.aws.json via Elastic Beanstalk. These were all the methods I experimented with for deployment. Standard ECS uses a tasks.json to define tasks and can manage services. ecs-cli actually utilizes a yaml file similar to the one developed for docker-compose. This is a great idea and was my first hope, but due to its relatively recent introduction and ongoing development it was not viable. First and foremost, I had to write a separate yaml file for it, since it does not support all the parameters given in docker-compose. So my eventual option was to use the Dockerrun.aws.json. This had its own pitfalls, see below, but it is very viable and automates enough for me to be able to rely on it. Read this for more information, and especially check out the Amazon Resources Created by Elastic Beanstalk section. The first two options required more configuration, like adding a load balancer and manual service management, if you want to be able to expose the endpoint.
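For orientation, a minimal Dockerrun.aws.json v2 skeleton for a stack like this might look like the following (account ID, region, image names, and memory figures are placeholders):

```json
{
  "AWSEBDockerrunVersion": 2,
  "containerDefinitions": [
    {
      "name": "nginx",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/nginx:latest",
      "essential": true,
      "memory": 128,
      "portMappings": [{ "hostPort": 80, "containerPort": 80 }],
      "links": ["flask"]
    },
    {
      "name": "flask",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/api:latest",
      "essential": true,
      "memory": 512
    }
  ]
}
```

Elastic Beanstalk translates this into an ECS task definition behind the scenes, which is why it mirrors the tasks.json vocabulary rather than compose's.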
Sidebar: I use Amazon ECR to store images. The authentication is much more manageable and I like containing my development environments. Check out deploy.py in the management folder to see how I tag, push, and untag images for clean rollouts.
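The tag/push/untag cycle can be sketched like so (the registry URL is a placeholder, and this is a loose reconstruction rather than the repo's deploy.py):

```python
import subprocess

REGISTRY = "123456789012.dkr.ecr.us-east-1.amazonaws.com"  # placeholder

def rollout_commands(image, tag):
    """Commands for one clean rollout: tag for ECR, push, untag locally."""
    remote = "{}/{}:{}".format(REGISTRY, image, tag)
    return [
        ["docker", "tag", "{}:latest".format(image), remote],
        ["docker", "push", remote],
        ["docker", "rmi", remote],  # drop the local alias so tags don't pile up
    ]

def rollout(image, tag):
    for cmd in rollout_commands(image, tag):
        subprocess.check_call(cmd)
```

Untagging after the push keeps `docker images` readable after many deploys without deleting the underlying layers.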
The two points of contention when deploying to AWS were volumes and networking; as I said above, running on the host network is preferred. To start, let's look at volume management. As you know, I used a volume to talk to celery and be able to run async tasks. Here is Amazon's guide on declaring volumes in Dockerrun.aws.json files. You will laugh after reading that and then seeing this. Here is the insight I used to answer that question. This took me a while to figure out, especially when testing by deploying. Networking was my next consideration when choosing my method of deployment, so seeing this Amazon issue hurt. I had already given up on the other methods of deployment for the reasons listed above, so I guess I'll wait for Dockerrun.aws.json files to support host networking (pretty sure ecs-cli doesn't support it either). I even tried to create multiple upstream options for nginx, since the upstream differs between host and bridged networks, using these two posts. Not a huge back-breaker, since my application still runs in acceptable time, and thankfully I have celery to handle, in the background, the tasks that do not.
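The volume punchline: unlike compose's named volumes, a Dockerrun v2 volume has to resolve to a path on the host instance. A fragment of the shape involved (paths and names illustrative):

```json
{
  "volumes": [
    { "name": "appcode", "host": { "sourcePath": "/var/app/current/app" } }
  ],
  "containerDefinitions": [
    {
      "name": "celery",
      "mountPoints": [
        { "sourceVolume": "appcode", "containerPath": "/app" }
      ]
    }
  ]
}
```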
The deploy command is simplified here. I deploy a zip containing the Dockerrun.aws.json as well as .ebextensions, which let me customize nginx limits, environment variable placeholders, and Docker configurations for Elastic Beanstalk. Alternatively, you can deploy just the Dockerrun.aws.json by pointing the artifact configuration in your .elasticbeanstalk/config.yml at it.
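That artifact option amounts to a couple of lines in the EB CLI config (the zip path here is illustrative):

```yaml
# .elasticbeanstalk/config.yml — have `eb deploy` upload a prebuilt zip
# instead of zipping the repository contents itself
deploy:
  artifact: dist/deploy.zip
```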
Hope you made it all the way here. I wrote this post intending to explain the higher-level overview of my thought process and the key issues I came across, rather than to show you step by step how to use these technologies. I figured I would let the code speak for itself, so be sure to dive into the repository. The ReadMe further details the management commands and their functionality. Some commands will output what would happen but of course will not do anything in reality. Contact me if you have more questions, but hack around before asking them!