Tutorial 2: The Toolkit Job Object Model
The purpose of this tutorial is to introduce the various components in the Toolkit's Job Object Model.
When you are done, you should have:
- An understanding of the object model of jobs and workloads
- An understanding of how to compose workloads incrementally
Remember, you should have run the Getting Set Up tutorial first!
Reference and Namespace
The first thing to do is to reference the toolkit. Note that it is packaged in Batch.Toolkit.dll.
The next thing is to open the namespace. Note that this is Batch.Toolkit.
The DSL is found in the
Object Model: Command
A Command is the basic unit of execution. This is typically the name of your executable file or batch script.
As we've seen before, a Command can come in one of two flavours:
- A SimpleCommand is just a string with the command line of the executable you want to run. You can have spaces and static arguments passed to the executable name as part of a simple command.
- A ParametrizedCommand pairs a command line template with a collection of parameter names, which can be replaced by the toolkit with a range of values. Surround the parameter name with % in the command line to identify it as a placeholder.
You can create instances of these as follows:
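A minimal sketch of the two flavours, using stand-in types defined locally rather than the toolkit's own. The record field names (Command, Parameters) and the render helper are assumptions made for illustration; only the two union cases and the %name% placeholder convention come from the text above.

```fsharp
// Stand-in types for illustration; the toolkit's own definitions may differ.
type ParametrizedCommand =
    { Command : string            // template, with placeholders such as %user%
      Parameters : string list }  // names of the placeholders in the template

type Command =
    | SimpleCommand of string
    | ParametrizedCommand of ParametrizedCommand

let simpleCommand = SimpleCommand "echo 'Hello World'"

let parametrizedCommand =
    ParametrizedCommand
        { Command = "echo 'Hello %user%'"
          Parameters = [ "user" ] }

// Hypothetical helper: replace each %name% placeholder with a concrete value.
let render (values : Map<string, string>) (command : Command) : string =
    match command with
    | SimpleCommand line -> line
    | ParametrizedCommand pc ->
        let substitute (line : string) (name : string) =
            match Map.tryFind name values with
            | Some value -> line.Replace("%" + name + "%", value)
            | None -> line
        List.fold substitute pc.Command pc.Parameters

// prints: echo 'Hello Ada'
printfn "%s" (render (Map.ofList [ "user", "Ada" ]) parametrizedCommand)
```

A placeholder with no supplied value is left untouched in this sketch, which makes a missing argument easy to spot in the rendered command line.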
Object Model: CommandWithErrorHandler
A CommandWithErrorHandler is a construct which allows you to specify a Command to execute, and optionally a Command to run if it fails. You can use this to represent an error-recoverable operation; or an operation paired with its compensation.
You can create an instance of a CommandWithErrorHandler as follows:
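As a sketch, such a construct can be modelled as a record with a mandatory Try command and an optional OnError command. The types below are local stand-ins, and the record shape is an assumption; the part names Try and OnError follow the wording used elsewhere in this tutorial.

```fsharp
// Stand-in types for illustration; parametrized commands are omitted for brevity.
type Command =
    | SimpleCommand of string

// Try always runs; OnError is the optional recovery (or compensation) command.
type CommandWithErrorHandler =
    { Try : Command
      OnError : Command option }

// An operation paired with a command to run if it fails.
let recoverableCommand =
    { Try = SimpleCommand "echo 'Hello World'"
      OnError = Some (SimpleCommand "echo 'Something went wrong'") }

// An operation with no error handler at all.
let unguardedCommand =
    { Try = SimpleCommand "echo 'Adios'"
      OnError = None }
```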
Object Model: CommandSet
The toolkit provides a way to collect groups of commands so we can build up a more complex workload.
A CommandSet has a MainCommands block, which is a list of CommandWithErrorHandler objects; and a FinallyCommands block, which is a list of Commands.
You can create an instance of a CommandSet as follows:
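A minimal sketch of a CommandSet value, again with local stand-in types. The block names MainCommands and FinallyCommands come from the description above; the record shapes and the command strings are assumptions.

```fsharp
// Stand-in types for illustration; the toolkit's own definitions may differ.
type Command = SimpleCommand of string

type CommandWithErrorHandler =
    { Try : Command
      OnError : Command option }

type CommandSet =
    { MainCommands : CommandWithErrorHandler list   // the work itself
      FinallyCommands : Command list }              // runs after the main block

let helloCommand = { Try = SimpleCommand "echo 'Hello World'"; OnError = None }
let adiosCommand = { Try = SimpleCommand "echo 'Adios'"; OnError = None }

let sayHelloAndGoodbye =
    { MainCommands = [ helloCommand; adiosCommand ]
      FinallyCommands = [ SimpleCommand "echo 'Finished'" ] }
```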
You can use the CommandSet object to model a block of work which needs to be done by a task, followed by a block of work to finalize the work.
The FinallyCommands block is useful to specify commands to copy up the results of the work done into Azure storage, for example.
Note: Compositionality
The CommandSet type is augmented with functions to make it a structure called a "Monoid". This means that a set of CommandSet objects can be combined into a single CommandSet.
The implication of this is that CommandSet objects can be reused compositionally - new CommandSet objects can be built up from many, pre-developed CommandSets.
combinedCommandSet will now have the following structure:
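The composition can be sketched with stand-in types that carry a Zero value and a + operator. The combination rule used here (concatenating each block in order) is an assumption about how the toolkit merges CommandSets, and the command strings are placeholders.

```fsharp
// Stand-in types for illustration; the toolkit's own definitions may differ.
type Command = SimpleCommand of string

type CommandWithErrorHandler =
    { Try : Command
      OnError : Command option }

type CommandSet =
    { MainCommands : CommandWithErrorHandler list
      FinallyCommands : Command list }
    // Identity element: combining with Zero changes nothing.
    static member Zero : CommandSet =
        { MainCommands = []; FinallyCommands = [] }
    // Assumed combination rule: concatenate each block, preserving order.
    static member ( + ) (a : CommandSet, b : CommandSet) : CommandSet =
        { MainCommands = a.MainCommands @ b.MainCommands
          FinallyCommands = a.FinallyCommands @ b.FinallyCommands }

let sayHelloAndGoodbye =
    { MainCommands = [ { Try = SimpleCommand "echo 'Hello'"; OnError = None } ]
      FinallyCommands = [] }

let copyResultsToAzure =
    { MainCommands = []
      FinallyCommands = [ SimpleCommand "echo 'copy results'" ] }

// Combine two sets directly, or fold a whole list of sets with List.sum.
let combinedCommandSet = sayHelloAndGoodbye + copyResultsToAzure
let viaSum = List.sum [ sayHelloAndGoodbye; copyResultsToAzure ]
```

Under this rule the combined value holds the main command from sayHelloAndGoodbye and the finally command from copyResultsToAzure. Because the type exposes Zero and +, the standard List.sum function works on it without any extra code.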
Note: Script Generation
You only need to supply the construct itself. The toolkit will automatically generate a script to:
- For each CommandWithErrorHandler in the MainCommands block
  - Execute the Try part of the construct
  - Check if the error status represents a failure
  - Execute the OnError part of the construct only if an error was signalled
- For each Command in the FinallyCommands block
  - Execute the command
The toolkit will craft an instance of an Azure Batch CloudTask, attach the script file as its resource, and set its CommandLine to run the script.
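The steps above can be sketched as a small generator that turns a CommandSet into script lines. This is not the toolkit's generator: the stand-in types, the generateScript name, and the choice of Windows batch syntax (IF ERRORLEVEL 1) for the error check are all assumptions made for illustration.

```fsharp
// Stand-in types for illustration; the toolkit's own definitions may differ.
type Command = SimpleCommand of string

type CommandWithErrorHandler =
    { Try : Command
      OnError : Command option }

type CommandSet =
    { MainCommands : CommandWithErrorHandler list
      FinallyCommands : Command list }

let text (SimpleCommand line) = line

// Hypothetical generator: emits one script line per step, in Windows batch style.
let generateScript (commands : CommandSet) : string list =
    let mainLines =
        commands.MainCommands
        |> List.collect (fun c ->
            match c.OnError with
            // Run the handler only when the Try command signalled an error.
            | Some handler -> [ text c.Try; "IF ERRORLEVEL 1 " + text handler ]
            | None -> [ text c.Try ])
    // Finally commands are appended unconditionally after the main block.
    let finallyLines = commands.FinallyCommands |> List.map text
    mainLines @ finallyLines

let script =
    generateScript
        { MainCommands =
            [ { Try = SimpleCommand "work.exe"; OnError = Some (SimpleCommand "cleanup.exe") }
              { Try = SimpleCommand "report.exe"; OnError = None } ]
          FinallyCommands = [ SimpleCommand "upload.exe" ] }
```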
Object Model: LocalFiles & UploadedFiles
A Batch task consists of commands and resources. A resource is a file associated with the task.
Since the task executes on a remote machine, there are two types of files:
- LocalFiles are files that are located locally. You can refer to them using System.IO.FileInfo instances.
- UploadedFiles are files that have already been uploaded and are located in Azure. You can refer to them using Microsoft.Azure.Batch.FileStaging.ResourceFile instances.
Like CommandSets, LocalFiles and UploadedFiles are also monoids so you can build them up incrementally.
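A sketch of incremental composition for LocalFiles, using a locally defined stand-in: a single-case union over a FileInfo list with Zero and + members. The concatenation rule and the file names are assumptions. UploadedFiles is left out because it needs the Azure Batch SDK.

```fsharp
open System.IO

// Stand-in type for illustration; the toolkit's own definition may differ.
type LocalFiles =
    | LocalFiles of FileInfo list
    static member Zero : LocalFiles = LocalFiles []
    // Assumed combination rule: concatenate the two file lists.
    static member ( + ) (a : LocalFiles, b : LocalFiles) : LocalFiles =
        match a, b with
        | LocalFiles xs, LocalFiles ys -> LocalFiles (xs @ ys)

// FileInfo only records a path; the files do not need to exist here.
let inputs = LocalFiles [ FileInfo("input1.txt"); FileInfo("input2.txt") ]
let tools = LocalFiles [ FileInfo("tool.exe") ]

// Build up the collection with +, or fold a whole list of collections.
let localFiles = inputs + tools
let folded = List.sum [ inputs; LocalFiles.Zero; tools ]

// Helper for inspecting the result.
let names (LocalFiles files) = files |> List.map (fun f -> f.Name)
```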
The LocalFiles and UploadedFiles types are also monoidally foldable.
Note: File upload
The toolkit automatically includes a file upload phase, where the files associated with a WorkloadUnitTemplate (and ultimately with a CloudTask object) are uploaded into a container specified by the StagingContainerName member of the StorageConfiguration object.
Tutorial 1 has an example of how the StorageConfiguration object is set up and used.
Object Model: WorkloadUnitTemplate
We can collect together a CommandSet and a LocalFiles collection into something that forms the recipe for a single unit of computation to be executed in Batch.
This object is named WorkloadUnitTemplate, bearing in mind that the Commands in the CommandSet are potentially parametrized.
The toolkit expands the WorkloadUnitTemplate into a separate CloudTask object for each unique set of parameter values.
Object Model: WorkloadArguments
As we have seen, the CommandSet member of a WorkloadUnitTemplate is composed of multiple Commands. Any or all of these Commands can be ParametrizedCommands, and each ParametrizedCommand may have multiple parameters.
We need a way to define the range of values to be assigned, in turn, to each parameter.
The WorkloadArguments object allows us to define these collections and thereby specify the "parametric sweep" of the workload.
The WorkloadArguments object is effectively a dictionary mapping the key (the parameter name, a string) to a list of values (the range of parameter values).
You can construct one as follows:
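A minimal sketch, assuming WorkloadArguments wraps a map from parameter name to a set of string values. The type is a local stand-in, and the parameter names, values, and the expand helper are hypothetical. The helper illustrates what a "parametric sweep" means here: one assignment per unique combination of parameter values.

```fsharp
// Stand-in type for illustration; the toolkit's own definition may differ.
type WorkloadArguments =
    | WorkloadArguments of Map<string, Set<string>>

let workloadArguments =
    [ "user", Set.ofList [ "John"; "Ivan" ]
      "greeting", Set.ofList [ "Hello"; "Hi" ] ]
    |> Map.ofList
    |> WorkloadArguments

// Hypothetical helper: every combination of one value per parameter.
let expand (WorkloadArguments ranges) : Map<string, string> list =
    let addParameter (combos : Map<string, string> list) (name : string) (values : Set<string>) =
        [ for combo in combos do
            for value in values do
                yield Map.add name value combo ]
    Map.fold addParameter [ Map.empty ] ranges

// Two users times two greetings gives four parameter assignments.
let sweep = expand workloadArguments
```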
The WorkloadArguments object is also a monoid!
You can combine entire groups of argument lists incrementally, and it will merge in parameter names and values.
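One way such a merge could behave is sketched below with a local stand-in type. The rule shown (new parameter names are added, and values for a shared name are unioned) is an assumption about the toolkit's behaviour, not a statement of it.

```fsharp
// Stand-in type for illustration; the toolkit's own definition may differ.
type WorkloadArguments =
    | WorkloadArguments of Map<string, Set<string>>
    static member Zero : WorkloadArguments = WorkloadArguments Map.empty
    static member ( + ) (a : WorkloadArguments, b : WorkloadArguments) : WorkloadArguments =
        match a, b with
        | WorkloadArguments left, WorkloadArguments right ->
            // Assumed merge rule: union the value sets of shared parameter names.
            let merge (acc : Map<string, Set<string>>) (name : string) (values : Set<string>) =
                match Map.tryFind name acc with
                | Some existing -> Map.add name (Set.union existing values) acc
                | None -> Map.add name values acc
            WorkloadArguments (Map.fold merge left right)

let users = WorkloadArguments (Map.ofList [ "user", Set.ofList [ "John" ] ])

let moreArguments =
    WorkloadArguments (
        Map.ofList [ "user", Set.ofList [ "Ivan" ]; "greeting", Set.ofList [ "Hi" ] ])

// "user" now ranges over both names, and "greeting" is merged in.
let combined = users + moreArguments
```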
Object Model: WorkloadSpecification
We are now finally able to define what our complete workload is going to look like:
- A template for the computation and computation-specific resources
- A set of parameter ranges
In fact, we can further generalize and support multiple WorkloadUnitTemplates, and introduce the concept of files shared across all WorkloadUnitTemplates as well.
The toolkit can express such an object as a CloudJob.
For good measure, the WorkloadSpecification is also a monoid!
We can incrementally build up simple workloads, and smash them together to make a complex workload, facilitating compositional re-use.
We have seen how we can incrementally build up a workload from smaller bits (each of which may reuse previously defined pieces), and define a batch CloudJob.
Other tutorials will cover the Pool Object Model, and describe how you can execute the workload itself against a pool of Azure Virtual Machines.