Batch Jobs

<aside> 💡 Tip: You don’t need to start a batch job to edit and publish it. Just open it and start editing in the browser editor.

</aside>

Overview

Batch jobs in Ploutos are Python programs and scripts that you can create and launch from the browser and execute in the cloud. What sets Ploutos apart is how little setup this takes: open a browser, bring or write your code, and press Run. You don’t have to set up one or more instances, install packages and dependencies, configure inter-node communication, ports and security, or tail logs. With a single click you can train a PyTorch or TensorFlow model on multiple GPUs across multiple nodes. Let’s dig deeper.

Create batch job projects

<aside> 💡 Create an environment before you create a batch job. You will need to select it when configuring the job.

</aside>

Creating a batch job is simple. On the home page, click Create + and select Job. You can start with an empty job or upload one from your computer, your Drive or GitHub. The screenshot below shows the modal window where you make this choice:

[Screenshot: job source selection modal]

Once you select the project source (Empty, Filesystem, Drive or GitHub), pick the project root directory and a main file. Execution begins in the main file; the platform launches the project by calling:

YOUR_ENV/python YOUR_MAIN_FILE.py YOUR_ARGUMENTS
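Because the platform runs the main file as a script and appends YOUR_ARGUMENTS to the command line, the main file can read those arguments with Python’s standard `argparse` module. The sketch below is a hypothetical `train.py` main file; the `--epochs` and `--lr` options are illustrative assumptions, not options defined by Ploutos.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments the platform appends after the main file name."""
    parser = argparse.ArgumentParser(description="Example batch job entry point")
    # Hypothetical options, for illustration only.
    parser.add_argument("--epochs", type=int, default=1, help="number of training epochs")
    parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    # argv=None makes argparse read sys.argv[1:], i.e. YOUR_ARGUMENTS.
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    for epoch in range(args.epochs):
        print(f"epoch {epoch + 1}/{args.epochs} (lr={args.lr})")
    return args


if __name__ == "__main__":
    # Execution starts here when the platform runs, e.g.: python train.py --epochs 3
    main()
```

With YOUR_ARGUMENTS set to `--epochs 3 --lr 0.01`, the script prints one line per epoch. Accepting an optional `argv` keeps the functions easy to test outside the platform.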

You must also select an Environment. An environment is user-defined and is essentially a conda YAML file. You can also attach secrets to an environment. The platform installs the conda environment and exposes the secrets as environment variables. Note that you must define an environment yourself before you can select it. To create one, open Environments in the sidebar and press Create + at the top left.
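Since secrets attached to an environment reach the job as environment variables, the code can read them through `os.environ` instead of hard-coding credentials. A minimal sketch, assuming each secret is exported under the name you gave it; the secret name in the usage note is hypothetical. Checking every required secret up front gives a clear error at startup rather than a confusing failure halfway through a run.

```python
import os


def require_secrets(names):
    """Return the requested secrets as a dict, failing fast if any are missing.

    Assumes each secret attached to the environment is exported as an
    environment variable with the same name.
    """
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"missing secrets: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

For example, calling `require_secrets(["WANDB_API_KEY"])` (a hypothetical secret name) at the top of the main file returns the key, or stops the job immediately with a message naming the missing secret. Avoid printing secret values in logs.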

Besides Job Settings, a user can control three other groups of settings: Compute Settings, Distributed Settings and Collaboration Settings.

Under Compute Settings, the user chooses the compute resources (CPU and memory) and the GPUs the job needs, and can also add extra disk capacity.
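Because compute is chosen in the job settings rather than in code, it can help to log what the job actually received when it starts. A minimal sketch using only the standard library; the GPU count comes from `torch.cuda.device_count()` and is reported only if PyTorch is installed in your environment, which is an assumption about your environment file.

```python
import os
import shutil


def describe_resources(path: str = "/") -> dict:
    """Report the CPUs, disk space and GPUs visible to this process."""
    usage = shutil.disk_usage(path)
    info = {
        "cpus": os.cpu_count(),  # may be None if the count cannot be determined
        "disk_total_gb": round(usage.total / 1e9, 1),
        "disk_free_gb": round(usage.free / 1e9, 1),
        "gpus": 0,
    }
    try:
        import torch  # optional: only present if the environment installs PyTorch

        info["gpus"] = torch.cuda.device_count()
    except ImportError:
        pass  # no PyTorch, so GPUs are left reported as 0
    return info


if __name__ == "__main__":
    print(describe_resources())
```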

Under Distributed Settings, the user can toggle on Distributed Training. This feature is designed for the PyTorch and TensorFlow distributed training modules: toggle it on and select the number of nodes to train the model on.

<aside> 💡 The training code only needs to use torch.distributed (for example, DistributedDataParallel) or TensorFlow’s tf.distribute package. The Ploutos platform is integrated with both and sets up the rank, master address, ports, NCCL communication and related configuration.

</aside>
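PyTorch’s default `env://` initialization finds the other processes through environment variables. A minimal sketch, assuming Ploutos exports the same variables that `torchrun` sets (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`, `MASTER_ADDR`, `MASTER_PORT`); this page does not name the variables, so treat them as an assumption and confirm them in your job’s logs. The defaults let the same file also run as a single process.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DistInfo:
    rank: int          # global rank of this process across all nodes
    local_rank: int    # index of this process on its node (used to pick a GPU)
    world_size: int    # total number of processes
    master_addr: str   # address of the rank-0 process
    master_port: int   # port the rank-0 process listens on

    @property
    def is_main(self) -> bool:
        # Typically only rank 0 writes checkpoints and summary logs.
        return self.rank == 0


def read_dist_info(env=None) -> DistInfo:
    """Read torchrun-style variables, defaulting to a single-process run.

    Assumption: the platform exports these names, which are the ones
    torch.distributed's env:// init method and torchrun use.
    """
    env = os.environ if env is None else env
    return DistInfo(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
        master_addr=env.get("MASTER_ADDR", "127.0.0.1"),
        master_port=int(env.get("MASTER_PORT", 29500)),
    )


if __name__ == "__main__":
    info = read_dist_info()
    print(f"rank {info.rank}/{info.world_size} (local {info.local_rank}), "
          f"master {info.master_addr}:{info.master_port}")
```

In the training script itself, `torch.distributed.init_process_group(backend="nccl")` reads the same variables when no `init_method` is given, and `info.local_rank` is the GPU index you would pass to `DistributedDataParallel(model, device_ids=[info.local_rank])`.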

Collaboration Settings control who can edit the code in real time with the author. You can invite users to join a live editing session and collaborate with them as you type.

Editing batch jobs

Editing a batch job means editing its code files in the browser. The Ploutos code editor is built on Monaco, the same editor component that powers VS Code.

The editor has built-in auto-completion (code generation is coming soon). To create, edit or delete files and folders, right-click in the directory navigation tree.
