Azure.Batch.Toolkit


Tutorial 2: The Toolkit Job Object Model

The purpose of this tutorial is to introduce the various components in the Toolkit's Job Object Model.

When you are done, you should have:

  1. An understanding of the object model of jobs and workloads
  2. An understanding of how to compose workloads incrementally

Remember, you should have run the Getting Set Up tutorial first!

Reference and Namespace

The first thing to do is to reference the toolkit, which is packaged in Batch.Toolkit.dll.

The next thing is to open the namespace, which is Batch.Toolkit.

The DSL is found in the Batch.Toolkit.DSL module.

#r "Batch.Toolkit.dll"
open Batch.Toolkit
open Batch.Toolkit.DSL

Object Model: Command

A Command is the basic unit of execution. This is typically the name of your executable file or batch script.

As we've seen before, a Command can come in one of two flavours:

  • A SimpleCommand is just a string containing the command line of the executable you want to run. It can include spaces and static arguments after the executable name.
  • A ParametrizedCommand pairs a command line template with a collection of parameter names, which can be replaced by the toolkit with a range of values. Surround the parameter name with % in the command line to identify it as a placeholder.

You can create instances of these as follows:

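// A fixed command line, run as-is.
let simpleCommand = SimpleCommand "echo 'Hello, World!'"
// A template: %user% is a placeholder for each value supplied for "user".
let parametrizedCommand = { Command = "echo 'Hello, %user%'"; Parameters = ["user"] }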
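
To make the placeholder convention concrete, here is a minimal sketch of how a %name% placeholder might be expanded for one set of values. The types are local stand-ins that mirror the shapes used in this tutorial, and expand is a hypothetical helper, not a toolkit function; the toolkit's own substitution logic may differ.

```fsharp
// Local stand-ins mirroring the tutorial's types, so that this sketch runs
// without Batch.Toolkit.dll. They are NOT the toolkit's own definitions.
type ParametrizedCommand = { Command : string; Parameters : string list }

type Command =
    | SimpleCommand of string
    | ParametrizedCommand of ParametrizedCommand

// Hypothetical helper: replace each %name% placeholder with the value bound
// to that name. Parameters without a binding are left untouched.
let expand (bindings : Map<string, string>) (command : Command) : string =
    match command with
    | SimpleCommand line -> line
    | ParametrizedCommand pc ->
        let substitute (line : string) name =
            match Map.tryFind name bindings with
            | Some value -> line.Replace("%" + name + "%", value)
            | None -> line
        List.fold substitute pc.Command pc.Parameters

let greeting =
    ParametrizedCommand { Command = "echo 'Hello, %user%'"; Parameters = ["user"] }

let expanded = expand (Map.ofList [ ("user", "John") ]) greeting
// expanded = "echo 'Hello, John'"
```

A SimpleCommand passes through unchanged, which is why both flavours can share one Command type.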

Object Model: CommandWithErrorHandler

A CommandWithErrorHandler is a construct which allows you to specify a Command to execute, and optionally a Command to run if it fails. You can use this to represent an error-recoverable operation, or an operation paired with its compensation.

You can create an instance of a CommandWithErrorHandler as follows:

let recoverableCommand = { Try = parametrizedCommand; OnError = Some simpleCommand }

Object Model: CommandSet

The toolkit provides a way to collect groups of commands so we can build up a more complex workload.

A CommandSet has a MainCommands block, which is a list of CommandWithErrorHandler objects, and a FinallyCommands block, which is a list of Commands.

You can create an instance of a CommandSet as follows:

let adiosCommand = SimpleCommand "echo 'Goodbye, Cruel World!'"
let sayHelloAndGoodbye = { MainCommands = [recoverableCommand]; FinallyCommands = [adiosCommand]}

You can use the CommandSet object to model a block of work which needs to be done by a task, followed by a block of work to finalize the work.

The FinallyCommands block is useful to specify commands to copy up the results of the work done into Azure storage, for example.

Note : Compositionality

The CommandSet type is augmented with functions that make it a structure called a "monoid": it has an empty value and an associative + operator. This means that any number of CommandSet objects can be combined into a single CommandSet.

The implication of this is that CommandSet objects can be reused compositionally - new CommandSet objects can be built up from many, pre-developed CommandSets.

let copyResultsToAzureCommand = SimpleCommand "run-this-command-to-copy-results-to-azure"
let copyResultsToAzure = 
    {
         MainCommands = []
         FinallyCommands = [copyResultsToAzureCommand]
    }

let combinedCommandSet = sayHelloAndGoodbye + copyResultsToAzure

combinedCommandSet will now have the following structure:

{ 
    MainCommands = 
        [
            recoverableCommand
        ] 
    FinallyCommands = 
        [
            adiosCommand
            copyResultsToAzureCommand
        ]
}
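
The combination shown above can be sketched with a stand-in type. This is only an illustration of the monoid shape (a Zero value plus a + operator that concatenates each block), written under the assumption that the toolkit concatenates in left-to-right order as the structure above suggests; the types are local stand-ins, not the toolkit's definitions.

```fsharp
// Stand-in types mirroring the tutorial's shapes, so the sketch runs without
// Batch.Toolkit.dll. They are not the toolkit's own definitions.
type Command = SimpleCommand of string

type CommandWithErrorHandler = { Try : Command; OnError : Command option }

type CommandSet =
    { MainCommands : CommandWithErrorHandler list
      FinallyCommands : Command list }
    // Identity element: combining with Zero changes nothing.
    static member Zero : CommandSet = { MainCommands = []; FinallyCommands = [] }
    // Combination: concatenate each block, keeping the left operand first.
    static member (+) (a : CommandSet, b : CommandSet) : CommandSet =
        { MainCommands = a.MainCommands @ b.MainCommands
          FinallyCommands = a.FinallyCommands @ b.FinallyCommands }

let hello =
    { MainCommands = [ { Try = SimpleCommand "echo hello"; OnError = None } ]
      FinallyCommands = [] }
let goodbye = { MainCommands = []; FinallyCommands = [ SimpleCommand "echo goodbye" ] }
let upload = { MainCommands = []; FinallyCommands = [ SimpleCommand "echo uploading" ] }

// Associativity: the grouping does not matter, only the order of operands.
let leftGrouped = (hello + goodbye) + upload
let rightGrouped = hello + (goodbye + upload)
```

Because grouping is irrelevant, small pre-built CommandSets can be combined in any nesting and still yield the same final workload.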

Note : Script Generation

You only need to supply the CommandSet itself. The toolkit automatically generates a script that does the following:

  • For each CommandWithErrorHandler in the MainCommands block
    • Execute the Try part of the construct
    • Check if the error status represents a failure
    • Execute the OnError part of the construct only if an error was signalled
  • For each Command in the FinallyCommands block
    • Execute the command

The toolkit will craft an instance of an Azure Batch CloudTask, attach the script file as its resource, and set its CommandLine to run the script.
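
As a rough illustration of the steps listed above, the following sketch turns a CommandSet into the lines of a Windows batch script. Everything here is an assumption made for illustration: the stand-in types, the generateScript helper, and the use of "if errorlevel 1" as the failure check (which treats an exit code of 1 or higher as an error). The script the toolkit actually emits may look quite different.

```fsharp
// Stand-in types; not the toolkit's own definitions.
type Command = SimpleCommand of string
type CommandWithErrorHandler = { Try : Command; OnError : Command option }
type CommandSet =
    { MainCommands : CommandWithErrorHandler list
      FinallyCommands : Command list }

let commandLine (SimpleCommand line) = line

// One Try/OnError pair becomes the Try line, optionally followed by a
// guarded handler line that only runs when the Try command failed.
let mainLines (c : CommandWithErrorHandler) =
    match c.OnError with
    | None -> [ commandLine c.Try ]
    | Some handler ->
        [ commandLine c.Try
          sprintf "if errorlevel 1 %s" (commandLine handler) ]

// Hypothetical generator: main block first, then the finally block.
let generateScript (commands : CommandSet) =
    let main = commands.MainCommands |> List.collect mainLines
    let finallyLines = commands.FinallyCommands |> List.map commandLine
    String.concat "\r\n" (main @ finallyLines)

let script =
    generateScript
        { MainCommands =
            [ { Try = SimpleCommand "echo working"
                OnError = Some (SimpleCommand "echo failed") } ]
          FinallyCommands = [ SimpleCommand "echo done" ] }
// script contains three lines:
//   echo working
//   if errorlevel 1 echo failed
//   echo done
```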

Object Model: LocalFiles & UploadedFiles

A Batch task consists of commands and resources. A resource is a file associated with the task.

Since the task executes on a remote machine, there are two types of files:

  • LocalFiles are files that are located locally. You can refer to them using System.IO.FileInfo instances.
  • UploadedFiles are files that have already been uploaded and are located in Azure. You can refer to them using Microsoft.Azure.Batch.FileStaging.ResourceFile instances.

Like CommandSets, LocalFiles and UploadedFiles are also monoids so you can build them up incrementally.

let localFiles = LocalFiles [ System.IO.FileInfo @"resources\7zip.msi" ]
let emptyRemoteFiles = UploadedFiles []

Because they are monoids, a whole list of LocalFiles or UploadedFiles values can be folded into a single value.
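
A minimal sketch of such a fold, using a stand-in LocalFiles type with the same shape (a list of FileInfo wrapped in a single union case, with Zero and +). The file names other than 7zip.msi are hypothetical, and the stand-in is not the toolkit's own type.

```fsharp
open System.IO

// Stand-in mirroring the shape of the toolkit's LocalFiles type.
type LocalFiles =
    | LocalFiles of FileInfo list
    static member Zero : LocalFiles = LocalFiles []
    static member (+) (a : LocalFiles, b : LocalFiles) : LocalFiles =
        match a, b with
        | LocalFiles xs, LocalFiles ys -> LocalFiles (xs @ ys)

let installers = LocalFiles [ FileInfo @"resources\7zip.msi" ]
// Hypothetical extra files, only here to give the fold something to do.
let inputs = LocalFiles [ FileInfo @"resources\input.csv" ]
let tools = LocalFiles [ FileInfo @"resources\tool.exe" ]

// Fold a list of LocalFiles into one, starting from the empty value.
let allFiles = List.fold (+) LocalFiles.Zero [ installers; inputs; tools ]

let fileCount =
    match allFiles with
    | LocalFiles files -> List.length files
```

Since the type exposes Zero and +, List.sum [ installers; inputs; tools ] would be an equivalent way to write the same fold.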

Note : File upload

The toolkit automatically includes a file upload phase, where the files associated with a WorkloadUnitTemplate (and ultimately with a CloudTask object) are uploaded into a container specified by the StagingContainerName member of the StorageConfiguration object.

Tutorial 1 has an example of how the StorageConfiguration object is set up and used.

Object Model: WorkloadUnitTemplate

We can collect together a CommandSet and a LocalFiles collection into something that forms the recipe for a single unit of computation to be executed in Batch.

This object is named WorkloadUnitTemplate. It is a template because the Commands in the CommandSet are potentially parametrized.

The toolkit expands the WorkloadUnitTemplate into a separate CloudTask object for each unique set of parameter values.

let simpleWorkloadUnitTemplate = 
    {
        WorkloadUnitRunElevated = true
        WorkloadUnitCommandSet = combinedCommandSet
        WorkloadUnitLocalFiles = localFiles
    }

Object Model: WorkloadArguments

As we have seen, the CommandSet member of a WorkloadUnitTemplate is composed of multiple Commands. Any or all of these Commands can be ParametrizedCommands, and each ParametrizedCommand may have multiple parameters.

We need a way to define the range of values to be assigned, in turn, to each parameter.

The WorkloadArguments object allows us to define these collections and thereby specify the "parametric sweep" of the workload.

The WorkloadArguments object is effectively a dictionary that maps each key (the parameter name, a string) to the collection of values that parameter should take.

You can construct one as follows:

let workloadNames = 
    [ 
        ("user", set ["John"; "Ivan"; "Mark"])
    ] 
    |> Map.ofSeq 
    |> WorkloadArguments
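
To illustrate what a "parametric sweep" amounts to, here is a small sketch that enumerates every combination of parameter values. It assumes each parameter name maps to a set of string values; sweep is a hypothetical helper written for this example, not a toolkit function, and the "greeting" parameter is invented for illustration.

```fsharp
// Hypothetical helper: build the cartesian product of all parameter values.
// Each resulting map binds every parameter name to exactly one value.
let sweep (arguments : Map<string, Set<string>>) : Map<string, string> list =
    // Extend every partial binding with each value of one more parameter.
    let extend (partials : Map<string, string> list) (name, values : Set<string>) =
        [ for partial in partials do
            for value in values do
                yield Map.add name value partial ]
    arguments |> Map.toList |> List.fold extend [ Map.empty ]

let combinations =
    sweep (Map.ofList [ ("user", set [ "John"; "Ivan" ])
                        ("greeting", set [ "Hello"; "Hi" ]) ])
// 2 users x 2 greetings = 4 combinations
```

Under this reading, each combination would correspond to one unit of work, so adding a value to any parameter multiplies the number of units accordingly.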

The WorkloadArguments object is also a monoid!

You can combine entire groups of argument lists incrementally, and it will merge in parameter names and values.
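
One plausible way such a merge could behave is sketched below with a stand-in type: parameter names present on either side are kept, and values for a shared name are unioned. The union semantics are an assumption made for this example; the toolkit's own + operator may resolve shared names differently.

```fsharp
// Stand-in with the same shape as the toolkit's WorkloadArguments type.
type WorkloadArguments =
    | WorkloadArguments of Map<string, Set<string>>
    static member Zero : WorkloadArguments = WorkloadArguments Map.empty
    static member (+) (a : WorkloadArguments, b : WorkloadArguments) : WorkloadArguments =
        match a, b with
        | WorkloadArguments left, WorkloadArguments right ->
            // Assumed semantics: union the value sets of shared names,
            // and carry over names that appear on only one side.
            right
            |> Map.fold (fun merged name values ->
                match Map.tryFind name merged with
                | Some existing -> Map.add name (Set.union existing values) merged
                | None -> Map.add name values merged) left
            |> WorkloadArguments

let firstBatch = WorkloadArguments (Map.ofList [ ("user", set [ "John" ]) ])
let secondBatch =
    WorkloadArguments (Map.ofList [ ("user", set [ "Ivan" ]); ("greeting", set [ "Hi" ]) ])

let merged = firstBatch + secondBatch
// merged binds "user" to {"Ivan"; "John"} and "greeting" to {"Hi"}
```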

Object Model: WorkloadSpecification

We are now finally able to define what our complete workload is going to look like:

  • A template for the computation and computation-specific resources
  • A set of parameter ranges

In fact, we can generalize further and support multiple WorkloadUnitTemplates, and introduce the concept of files shared across all WorkloadUnitTemplates as well.

The toolkit can express such an object as a CloudJob:

let workload = 
    {
        WorkloadUnitTemplates = [simpleWorkloadUnitTemplate]
        WorkloadCommonLocalFiles = []
        WorkloadArguments = workloadNames
    }

For good measure, the WorkloadSpecification is also a monoid!

We can incrementally build up simple workloads, and smash them together to make a complex workload, facilitating compositional re-use.

We have seen how we can incrementally build up a workload from smaller bits (each of which may reuse previously defined pieces), and define a batch CloudJob.

Other tutorials will cover the Pool Object Model, and describe how you can execute the workload itself against a pool of Azure Virtual Machines.
