To Deploy in AWS ☁️
You need a Kubernetes cluster, with kubectl configured to access that cluster. On AWS, we use Amazon Elastic Kubernetes Service (Amazon EKS) for this.
- Please refer to the Amazon EKS documentation on how to set things up
- Make sure you have the AWS CLI installed and configured as well
Also, make sure Docker is installed and running in your environment
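As a quick sanity check before continuing, the prerequisites above can be probed from Python. This is only a minimal sketch: the helper name `find_missing_tools` is hypothetical, and it only confirms that the `kubectl`, `aws` and `docker` executables are on your PATH. It does not verify that Docker is running or that kubectl can reach your cluster.

```python
import shutil

# Command-line tools named in the prerequisites above.
REQUIRED_TOOLS = ("kubectl", "aws", "docker")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("kubectl, aws and docker were all found on PATH.")
```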
Set up Paradigm
In a terminal that has the kubectl access described above, follow the steps below.
- (Recommended) Create a new Python environment with your preferred environment manager
- Clone this repo
git clone https://github.com/ParadigmAI/paradigm.git
- Go into the directory
cd paradigm
- Make the installation script executable
chmod +x install-aws.sh
- Run the installation script
./install-aws.sh
- Validate that Paradigm was properly installed
paradigm --help
Now let's move into your ML project folder
Your folder can contain one or more scripts/notebooks that you want to execute as steps in an ML pipeline.
From here we follow a basic example project just to make it easier to explain the commands. Please change the necessary parameters according to your project.
- The preferred directory structure is as follows. In the example below, p1, p2 and p3 represent the names of your Python scripts or notebooks. (Refer to examples/basic.)
- IMPORTANT - Note the requirements.<file name> files. You have to create a text file with that specific naming only for the scripts or notebooks that have additional dependencies. It becomes the requirements.txt for that step. We promise this is the only file addition before taking your ML code to production.
- Example:
- 📁 project_root
- 📄 p1.py
- 📄 p2.ipynb
- 📄 p3.py
- 📄 requirements.p1
- 📄 requirements.p3
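To make the naming convention concrete, here is a minimal sketch that inspects a project folder laid out like the example above. The function `describe_steps` is a hypothetical helper and not part of Paradigm. It assumes each step is either `<step>.py` or `<step>.ipynb`, and that `requirements.<step>` exists only for steps with additional dependencies.

```python
from pathlib import Path


def describe_steps(project_root, steps):
    """Map each step name to its source file and optional requirements file."""
    root = Path(project_root)
    layout = {}
    for step in steps:
        # A step is either a script (p1.py) or a notebook (p2.ipynb).
        candidates = [root / f"{step}{ext}" for ext in (".py", ".ipynb")]
        source = next((p for p in candidates if p.is_file()), None)
        if source is None:
            raise FileNotFoundError(f"no {step}.py or {step}.ipynb in {root}")
        # requirements.<step> is only present for steps with extra dependencies.
        reqs = root / f"requirements.{step}"
        layout[step] = (source.name, reqs.name if reqs.is_file() else None)
    return layout
```

For the example tree, `describe_steps("project_root", ["p1", "p2", "p3"])` would pair p1 and p3 with their requirements files and report `None` for p2, which has no extra dependencies.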
- Now we are ready to let Paradigm prepare things before deploying to Kubernetes. Include the scripts/notebooks you want as steps in the below command. This command containerizes your code.
paradigm launch --steps p1 p2 p3 --region_name us-east-1
- As the final step, deploy the pipeline with the below command.
paradigm deploy --steps p1 p2 --dependencies "p2:p1,p3:p2|p1" --deployment p3 --deployment_port 8000 --deployment_memory 2Gi --output workflow.yaml --name pipe1 --region_name us-east-1
In the above command:
- --steps should specify all steps, except any step that should be run as a service, e.g., an API endpoint.
- --dependencies "p2:p1,p3:p2|p1" defines the graph structure (DAG) of how the steps should be run. In this example, we are stating that step p2 is dependent on p1 and step p3 is dependent on both p2 and p1.
- --deployment p3 defines a service that needs to be run at the end of the pipeline. Hence, we don't mention it under --steps.
- --deployment_port is defined if the above service is exposed via a specific port internally.
- --deployment_memory specifies the amount of memory required for the deployment step.
- --name can be any name that you want to give this particular pipeline.
- --region_name is the AWS region that you want to use.
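To illustrate how a --dependencies string describes a DAG, the sketch below parses one and derives a valid execution order. It is not Paradigm's own parser. The format is inferred from the single example above (commas separate entries, a colon separates a step from its prerequisites, and a pipe separates multiple prerequisites), and the function names are hypothetical. It uses `graphlib` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


def parse_dependencies(spec):
    """Parse "p2:p1,p3:p2|p1" into {"p2": {"p1"}, "p3": {"p2", "p1"}}."""
    graph = {}
    for entry in spec.split(","):
        if not entry.strip():
            continue  # tolerate an empty spec or a trailing comma
        step, _, parents = entry.partition(":")
        graph[step.strip()] = {p.strip() for p in parents.split("|") if p.strip()}
    return graph


def execution_order(spec):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(parse_dependencies(spec)).static_order())


if __name__ == "__main__":
    print(execution_order("p2:p1,p3:p2|p1"))  # ['p1', 'p2', 'p3']
```

Because p3 depends on p2 and p2 depends on p1, the only valid order for this example is p1, then p2, then p3.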
- (OPTIONAL) You can use the Argo UI to observe all pipelines and get logs. For that, first make it accessible via your browser by running the below command.
kubectl -n paradigm port-forward deployment/argo-server 2746:2746
- Now in your local browser, go to
http://localhost:2746
- (OPTIONAL) In case you want to delete the running service and deployment, use the following commands. <deployment_step> is the name of the file that has the deployment code.
kubectl delete deployment deploy-<deployment_step> -n paradigm
kubectl delete service deploy-<deployment_step> -n paradigm
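As an illustration of how the `<deployment_step>` placeholder is filled in, the sketch below builds both delete commands for a given step. The helper `cleanup_commands` is hypothetical, and it assumes the step name is the value you passed to --deployment (p3 in the example above).

```python
def cleanup_commands(deployment_step, namespace="paradigm"):
    """Build the two kubectl delete commands for a deployed step."""
    resource = f"deploy-{deployment_step}"
    return [
        ["kubectl", "delete", kind, resource, "-n", namespace]
        for kind in ("deployment", "service")
    ]


if __name__ == "__main__":
    for cmd in cleanup_commands("p3"):
        print(" ".join(cmd))
```

Each command is returned as an argument list, so it could be passed to `subprocess.run(cmd, check=True)` once you have confirmed it targets the right resources.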