Tutorial 2: The Toolkit Job Object Model
The purpose of this tutorial is to introduce the various components in the Toolkit's Job Object Model.
When you are done, you should have:
- An understanding of the object model of jobs and workloads
- An understanding of how to compose workloads incrementally
Remember, you should have run the Getting Set Up tutorial first!
Reference and Namespace
The first thing to do is to reference the toolkit, which is packaged in Batch.Toolkit.dll.
The next thing is to open the namespace, which is Batch.Toolkit.
The DSL is found in the
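In an F# script, referencing the assembly and opening the namespace might look like the following. This is a sketch that assumes Batch.Toolkit.dll sits next to the script; adjust the path to wherever the assembly lives on your machine.

```fsharp
// Assumption: Batch.Toolkit.dll is in the same folder as this script.
#r "Batch.Toolkit.dll"

open Batch.Toolkit
```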
Object Model: Command
A Command is the basic unit of execution. This is typically the name of your executable file or batch script.
As we've seen before, a Command can come in one of two flavours:
- A SimpleCommand is just a string with the command line of the executable you want to run. You can have spaces and static arguments passed to the executable name as part of a simple command.
- A ParametrizedCommand pairs a command line template with a collection of parameter names, which can be replaced by the toolkit with a range of values. Surround the parameter name with % in the command line to identify it as a placeholder.
You can create instances of these as follows:
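A minimal, self-contained sketch of the two flavours follows. The types are local stand-ins that mirror the Command union; the record fields Command and Parameters, and the render helper, are assumptions made for illustration and are not confirmed parts of the toolkit.

```fsharp
// Local stand-ins; the record fields below are assumptions, not the toolkit's confirmed shape.
type ParametrizedCommand = { Command : string; Parameters : string list }

type Command =
    | SimpleCommand of string
    | ParametrizedCommand of ParametrizedCommand

// A simple command: just a command line with static arguments.
let simpleCommand = SimpleCommand "echo Hello, World"

// A parametrized command: %user% marks a placeholder named "user".
let parametrizedCommand =
    ParametrizedCommand { Command = "echo Hello, %user%"; Parameters = [ "user" ] }

// Hypothetical helper: substitute each %name% placeholder with a concrete value.
let render (values : Map<string, string>) command =
    match command with
    | SimpleCommand line -> line
    | ParametrizedCommand p ->
        let substitute (line : string) (name : string) =
            line.Replace("%" + name + "%", values.[name])
        List.fold substitute p.Command p.Parameters
```

Calling render with a map that binds "user" to a value replaces the placeholder in the template, which is the kind of substitution the toolkit performs for each value in a parameter's range.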
Object Model: CommandWithErrorHandler
A CommandWithErrorHandler is a construct which allows you to specify a Command to execute, and optionally a Command to run if it fails. You can use this to represent an error-recoverable operation; or an operation paired with its compensation.
You can create an instance of a CommandWithErrorHandler as follows:
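As a sketch, the construct can be pictured as a record with a Try command and an optional OnError command. The types below are simplified local stand-ins, and the executable names are hypothetical.

```fsharp
// Simplified stand-in for the Command type.
type Command = SimpleCommand of string

// A command to try, plus an optional command to run if it fails.
type CommandWithErrorHandler = { Try : Command; OnError : Command option }

// An operation paired with its compensation (hypothetical executable names).
let recoverableCommand =
    { Try = SimpleCommand "process.exe input.dat"
      OnError = Some (SimpleCommand "echo processing failed") }

// An operation with no error handler at all.
let fireAndForget = { Try = SimpleCommand "echo done"; OnError = None }
```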
Object Model: CommandSet
The toolkit provides a way to collect groups of commands so we can build up a more complex workload.
A CommandSet has a MainCommands block, which is a list of CommandWithErrorHandler objects, and a FinallyCommands block, which is a list of Commands.
You can create an instance of a CommandSet as follows:
You can use the CommandSet object to model a block of work which needs to be done by a task, followed by a block of work to finalize the work.
The FinallyCommands block is useful to specify commands to copy up the results of the work done into Azure storage, for example.
Note : Compositionality
The CommandSet type is augmented with functions to make it a structure called a "Monoid". This means that a set of CommandSet objects can be combined into a single CommandSet.
The implication of this is that CommandSet objects can be reused compositionally - new CommandSet objects can be built up from many, pre-developed CommandSets.
combinedCommandSet will now have the following structure:
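A minimal sketch of this kind of composition, using simplified local stand-ins rather than the toolkit's own types. The Zero and (+) members shown here (concatenating each block in order) are an assumed implementation of the monoid, and the command lines are hypothetical.

```fsharp
// Simplified stand-ins; the toolkit's real definitions may differ.
type Command = SimpleCommand of string
type CommandWithErrorHandler = { Try : Command; OnError : Command option }

type CommandSet =
    { MainCommands : CommandWithErrorHandler list
      FinallyCommands : Command list }
    // Monoid identity: an empty command set.
    static member Zero = { MainCommands = []; FinallyCommands = [] }
    // Monoid operation (assumed): concatenate both blocks, preserving order.
    static member (+) (a : CommandSet, b : CommandSet) =
        { MainCommands = a.MainCommands @ b.MainCommands
          FinallyCommands = a.FinallyCommands @ b.FinallyCommands }

let helloCommand = { Try = SimpleCommand "echo Hello"; OnError = None }
let adiosCommand = { Try = SimpleCommand "echo Adios"; OnError = None }

// One set contains only main work, the other only finalization.
let sayHelloAndGoodbye = { MainCommands = [ helloCommand; adiosCommand ]; FinallyCommands = [] }
let copyResultsToAzure = { MainCommands = []; FinallyCommands = [ SimpleCommand "copy-results.cmd" ] }

let combinedCommandSet = sayHelloAndGoodbye + copyResultsToAzure
```

In this sketch the combined value carries the two echo commands in its MainCommands block and the copy command in its FinallyCommands block, and adding CommandSet.Zero to it leaves it unchanged.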
Note : Script Generation
You only need to supply the construct itself. The toolkit will automatically generate a script to:
- For each CommandWithErrorHandler in the MainCommands block:
  - Execute the Try part of the construct
  - Check if the error status represents a failure
  - Execute the OnError part of the construct only if an error was signalled
- For each Command in the FinallyCommands block:
  - Execute the command
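The steps above can be sketched as a small generator that turns a CommandSet into script lines. This is an illustration only: the types are simplified stand-ins, and the Windows batch syntax emitted here is an assumption, since the toolkit's actual script format is not shown in this tutorial.

```fsharp
// Simplified stand-ins for the toolkit types.
type Command = SimpleCommand of string
type CommandWithErrorHandler = { Try : Command; OnError : Command option }
type CommandSet = { MainCommands : CommandWithErrorHandler list; FinallyCommands : Command list }

// Extract the command line from a command.
let text (SimpleCommand line) = line

// Emit batch-file lines: run Try, then run OnError only if the exit code signals failure;
// afterwards run every command in the FinallyCommands block.
let scriptLines (commands : CommandSet) =
    [ for c in commands.MainCommands do
        yield text c.Try
        match c.OnError with
        | Some handler -> yield sprintf "if %%errorlevel%% neq 0 (%s)" (text handler)
        | None -> ()
      for f in commands.FinallyCommands do
        yield text f ]

// Hypothetical command set used to exercise the generator.
let sample =
    { MainCommands = [ { Try = SimpleCommand "a.exe"; OnError = Some (SimpleCommand "echo failed") } ]
      FinallyCommands = [ SimpleCommand "upload.cmd" ] }

let sampleScript = scriptLines sample
```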
The toolkit will craft an instance of an Azure Batch CloudTask, attach the script file as its resource, and set its CommandLine to run the script.
Object Model: LocalFiles & UploadedFiles
A Batch task consists of commands and resources. A resource is a file associated with the task.
Since the task executes on a remote machine, there are two types of files:
- LocalFiles are files that are located locally. You can refer to them using System.IO.FileInfo instances.
- UploadedFiles are files that have already been uploaded and are located in Azure. You can refer to them using Microsoft.Azure.Batch.FileStaging.ResourceFile instances.
Like CommandSets, LocalFiles and UploadedFiles are also monoids so you can build them up incrementally.
The LocalFiles and UploadedFiles types are also monoidally foldable.
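As a sketch of what incremental building and folding can look like, the stand-in below follows the shape of LocalFiles (a single case wrapping a FileInfo list, with Zero and (+) members). The member implementations are assumed, and the file names are hypothetical; the files do not need to exist for the example to run.

```fsharp
open System.IO

// Stand-in for the toolkit's LocalFiles type; member bodies are assumptions.
type LocalFiles =
    | LocalFiles of FileInfo list
    static member Zero = LocalFiles []
    static member (+) (a : LocalFiles, b : LocalFiles) =
        let (LocalFiles xs), (LocalFiles ys) = a, b
        LocalFiles (xs @ ys)

// Two independently defined collections (hypothetical file names).
let executables = LocalFiles [ FileInfo "process.exe" ]
let inputs = LocalFiles [ FileInfo "input1.dat"; FileInfo "input2.dat" ]

// Combine two collections directly...
let pairwise = executables + inputs

// ...or fold a whole list of collections; List.sum uses the Zero and (+) members.
let localFiles = List.sum [ executables; inputs ]
```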
Note : File upload
The toolkit automatically includes a file upload phase, where the files associated with a WorkloadUnitTemplate (and ultimately with a CloudTask object) are uploaded into the container specified by the StagingContainerName member of the StorageConfiguration object.
Tutorial 1 has an example of how the StorageConfiguration object is set up and used.
Object Model: WorkloadUnitTemplate
We can collect together a CommandSet and a LocalFiles collection into something that forms the recipe for a single unit of computation to be executed in Batch.
This object is named WorkloadUnitTemplate, bearing in mind that the Commands in the CommandSet are potentially parametrized.
The toolkit expresses the WorkloadUnitTemplate as a separate CloudTask object for each unique set of parameter values.
Object Model: WorkloadArguments
As we have seen, the CommandSet member of a WorkloadUnitTemplate is composed of multiple Commands. Any or all of these Commands can be ParametrizedCommands, and each ParametrizedCommand may have multiple parameters.
We need a way to define the range of values to be assigned, in turn, to each parameter.
The WorkloadArguments object allows us to define these collections and thereby specify the "parametric sweep" of the workload.
The WorkloadArguments object is effectively a dictionary mapping the key (the parameter name - a string) to a list of values (the range of parameter values).
You can construct one as follows:
The WorkloadArguments object is also a monoid!
You can combine entire groups of argument lists incrementally, and it will merge in parameter names and values.
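The sketch below illustrates both ideas: merging argument groups, and expanding them into the individual parameter sets of a sweep. The stand-in type follows the shape of WorkloadArguments (a map from parameter name to a set of values). The merge rule (union the value sets of a shared name) and the combinations helper are assumptions made for illustration, not the toolkit's confirmed behaviour.

```fsharp
// Assumed merge rule: keep every name, and union the value sets of names present on both sides.
let mergeRanges (x : Map<string, Set<string>>) (y : Map<string, Set<string>>) =
    let addRange acc key values =
        let existing = defaultArg (Map.tryFind key acc) Set.empty
        Map.add key (Set.union existing values) acc
    Map.fold addRange x y

// Stand-in for the toolkit's WorkloadArguments type.
type WorkloadArguments =
    | WorkloadArguments of Map<string, Set<string>>
    static member Zero = WorkloadArguments Map.empty
    static member (+) (a : WorkloadArguments, b : WorkloadArguments) =
        let (WorkloadArguments x), (WorkloadArguments y) = a, b
        WorkloadArguments (mergeRanges x y)

let users = WorkloadArguments (Map.ofList [ "user", set [ "John"; "Ringo" ] ])
let greetings = WorkloadArguments (Map.ofList [ "greeting", set [ "Hello"; "Adios" ] ])

// Combine the two groups into one sweep definition.
let workloadArguments = users + greetings

// Hypothetical helper: one name-to-value map per unique combination of values.
let combinations (WorkloadArguments ranges) =
    let extend (partials : Map<string, string> list) name (values : Set<string>) =
        [ for partial in partials do
            for value in values do
                yield Map.add name value partial ]
    Map.fold extend [ Map.empty ] ranges

let parameterSets = combinations workloadArguments
```

With two users and two greetings, the sweep expands into four parameter sets, each of which would correspond to one instantiation of a parametrized command.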
Object Model: WorkloadSpecification
We are now finally able to define what our complete workload is going to look like:
- A template for the computation and computation-specific resources
- A set of parameter ranges
In fact, we can generalize further and support multiple WorkloadUnitTemplates, and introduce the concept of files shared across all WorkloadUnitTemplates as well.
The toolkit can express such an object as a CloudJob:
For good measure, the WorkloadSpecification is also a monoid!
We can incrementally build up simple workloads, and smash them together to make a complex workload, facilitating compositional re-use.
We have seen how we can incrementally build up a workload from smaller bits (each of which may reuse previously defined pieces), and define a batch CloudJob.
Other tutorials will cover the Pool Object Model, and describe how you can execute the workload itself against a pool of Azure Virtual Machines.