Experiment Configuration#
Concepts#
Sites#
The sites section of the configuration defines the resources to be provisioned on the different sites/testbeds for the experiment.
Software#
The software section defines the software to be installed on the provisioned resources. Docker, Apptainer, and Ollama are currently supported.
Deployment#
The deployment section defines the cluster to be deployed on the provisioned resources. HTCondor is currently supported.
Experiments#
The experiments section defines the experiments to be run on the provisioned resources.
Example#
# -------------------------------------------------------------------------
#
# Clone the repository from GitHub,
# git clone https://github.com/pegasus-isi/kiso-plankifier-experiment.git
# Install Kiso and its dependencies,
# pip install kiso[chameleon]
# Check the experiment configuration.
# kiso check
# Set up the experiment.
# kiso up
# Run the experiment.
# kiso run
# Destroy the experiment.
# kiso down
# See: https://github.com/pegasus-isi/kiso-plankifier-experiment.README.md
#
# -------------------------------------------------------------------------
name: plankifier-experiment
deployment:
  htcondor:
    - kind: central-manager
      labels:
        - central-manager-daemon
    - kind: submit
      labels:
        - submit-daemon
    - kind: execute
      labels:
        - execute-cloud-daemon
    - kind: execute
      labels:
        - execute-edge-daemon
      config_file: config/execute.conf
software:
  apptainer:
    labels:
      - execute-cloud-daemon
sites:
  - kind: chameleon-edge
    walltime: "04:00:00"
    lease_name: edge-lease
    rc_file: secrets/edge-app-cred-oac-edge-openrc.sh
    resources:
      machines:
        - labels:
            - execute-edge-daemon
          machine_name: raspberrypi4-64
          count: 1
          container:
            name: execute
            image: pegasus/plankifier
  - kind: chameleon
    walltime: "04:00:00"
    lease_name: tacc-lease
    rc_file: secrets/tacc-app-cred-oac-edge-openrc.sh
    key_name: mayani-mac-mini
    image: CC-Ubuntu18.04
    resources:
      machines:
        - labels:
            - central-manager-daemon
            - submit-daemon
            - execute-cloud-daemon
          flavour: compute_zen3
          number: 1
          image: CC-Ubuntu22.04
      networks:
        - sharednet1
experiments:
  - kind: pegasus
    name: plankifier-experiment
    count: 1
    main: ./workflow.py
    submit_node_labels:
      - submit-daemon
    inputs:
      - labels:
          - execute-edge-daemon
        src: bin/train.py
        dst: /srv/plankifier/
      - labels:
          - execute-edge-daemon
        src: bin/predict.py
        dst: /srv/plankifier/
    setup:
      - labels:
          - submit-daemon
        script: |
          chmod +x workflow.py
      - labels:
          - execute-edge-daemon
        script: |
          chmod +x /srv/plankifier/train.py /srv/plankifier/predict.py
    outputs:
      - labels:
          - submit-daemon
        src: ~kiso/kiso-plankifier-experiment/output/count.txt
        dst: ./
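In the example above, labels tie the sections together: the deployment and software sections name labels that are attached to machines in the sites section. The following is a minimal sketch of that relationship, not Kiso's own validation logic. It copies a trimmed subset of the example into a Python dict and reports any label that is referenced but not attached to a machine. The helper names (`site_labels`, `referenced_labels`, `undefined_labels`) are hypothetical, and whether `kiso check` performs an equivalent check is not stated here.

```python
# Trimmed subset of the example configuration, written as a Python dict.
config = {
    "deployment": {
        "htcondor": [
            {"kind": "central-manager", "labels": ["central-manager-daemon"]},
            {"kind": "submit", "labels": ["submit-daemon"]},
            {"kind": "execute", "labels": ["execute-cloud-daemon"]},
            {"kind": "execute", "labels": ["execute-edge-daemon"]},
        ]
    },
    "software": {"apptainer": {"labels": ["execute-cloud-daemon"]}},
    "sites": [
        {
            "kind": "chameleon-edge",
            "resources": {"machines": [{"labels": ["execute-edge-daemon"]}]},
        },
        {
            "kind": "chameleon",
            "resources": {
                "machines": [
                    {
                        "labels": [
                            "central-manager-daemon",
                            "submit-daemon",
                            "execute-cloud-daemon",
                        ]
                    }
                ]
            },
        },
    ],
}


def site_labels(cfg):
    """Collect every label attached to a machine in the sites section."""
    return {
        label
        for site in cfg["sites"]
        for machine in site["resources"]["machines"]
        for label in machine["labels"]
    }


def referenced_labels(cfg):
    """Collect labels named by the deployment and software sections.

    Only the shapes used in the example are handled: a list of HTCondor
    daemons and software entries that are objects with a "labels" key.
    """
    labels = set()
    for daemon in cfg["deployment"]["htcondor"]:
        labels.update(daemon["labels"])
    for software in cfg["software"].values():
        labels.update(software["labels"])
    return labels


def undefined_labels(cfg):
    """Return labels that are referenced but not attached to any machine."""
    return referenced_labels(cfg) - site_labels(cfg)


print(sorted(undefined_labels(config)))  # prints []
```

If a daemon referenced a label that no machine carries, it would show up in the returned set, which is a convenient thing to inspect before provisioning.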
Schema#
Kiso experiment configuration#
type | object
properties |
name | A suitable name for the experiment | type: string
sites | Define all the resources to be provisioned | type: array; minItems: 1
software | Software to be installed on the resources | type: object; additionalProperties: False
deployment | Workload management system to be installed on the resources | type: object; additionalProperties: False
experiments | Define all the experiments to be executed | type: array; minItems: 1
additionalProperties | False
Site Definition#
oneOf |
allOf |
|
allOf |
||
allOf |
||
allOf |
Experiment Definition#
htcondor#
Specify how and on which resources HTCondor should be installed
type |
object |
|
properties |
||
|
Specify which resource will have the central manager and its configuration |
|
type |
string |
|
enum |
central-manager, execute, submit, personal |
|
|
||
|
type |
string |
additionalProperties |
False |
|
Vagrant Configuration Schema#
type |
object |
||
properties |
|||
|
Default VM hypervisor to use (default: libvirt) |
||
type |
string |
||
enum |
libvirt, virtualbox |
||
|
Base image to use (default: generic/debian11) |
||
type |
string |
||
|
SSH user to use (default: root) |
||
type |
string |
||
|
Prepend this prefix to box names |
||
type |
string |
||
|
Extra config to pass (in vagrant DSL) |
||
type |
string |
||
|
Vagrant Resource |
||
type |
object |
||
properties |
|||
|
type |
array |
|
items |
|||
uniqueItems |
True |
||
|
type |
array |
|
items |
|||
additionalProperties |
False |
||
|
const |
vagrant |
|
additionalProperties |
False |
||
Vagrant Network#
type |
object |
||
properties |
|||
|
type |
string |
|
|
Labels Schema |
||
A list of labels that identify the resources. The values are strings that can’t start with ‘kiso.’ and can contain alphanumeric characters, dots, underscores, and hyphens |
|||
type |
array |
||
items |
type |
string |
|
pattern |
^(?!kiso\\.)[a-zA-Z0-9._-]+$ |
||
minItems |
1 |
||
uniqueItems |
True |
||
additionalProperties |
False |
||
Vagrant Compute#
type |
object |
||
properties |
|||
|
VM hypervisor to use |
||
type |
string |
||
enum |
libvirt, virtualbox |
||
|
Base image to use |
||
type |
string |
||
|
SSH user to use |
||
type |
string |
||
|
Extra config to pass (in vagrant DSL) |
||
type |
string |
||
|
Labels Schema |
||
A list of labels that identify the resources. The values are strings that can’t start with ‘kiso.’ and can contain alphanumeric characters, dots, underscores, and hyphens |
|||
type |
array |
||
items |
type |
string |
|
pattern |
^(?!kiso\\.)[a-zA-Z0-9._-]+$ |
||
minItems |
1 |
||
uniqueItems |
True |
||
|
type |
number |
|
|
type |
string |
|
|
type |
string |
|
enum |
tiny, small, medium, big, large, extra-large |
||
|
|||
additionalProperties |
False |
||
Vagrant Flavour#
type |
object |
|
properties |
||
|
type |
number |
|
type |
number |
additionalProperties |
False |
|
Chameleon Configuration Schema#
type |
object |
||
properties |
|||
|
#/resources |
||
|
type |
string |
|
|
type |
string |
|
|
type |
string |
|
|
#/os_allocation_pool |
||
|
type |
boolean |
|
|
type |
array |
|
items |
type |
string |
|
|
type |
boolean |
|
|
type |
string |
|
|
#/os_network |
||
|
#/os_subnet |
||
|
type |
string |
|
|
const |
chameleon |
|
additionalProperties |
True |
||
Chameleon Edge Configuration Schema#
type |
object |
|
properties |
||
|
type |
string |
|
type |
string |
|
walltime in HH:MM format |
|
type |
string |
|
format |
walltime |
|
|
#/resources |
|
|
const |
chameleon-edge |
additionalProperties |
True |
|
FABRIC Configuration Schema#
type |
object |
||
properties |
|||
|
type |
string |
|
|
Walltime in HH:MM format. Defaults to 24:00 |
||
type |
string |
||
format |
walltime |
||
|
Name of the site to deploy the node on. Defaults to UCSD. |
||
type |
string |
||
|
Base image to use (default: default_rocky_8) |
||
type |
string |
||
|
Prefix to use for the name of the nodes. Default: fabric |
||
type |
string |
||
|
FABRIC Resource |
||
type |
object |
||
properties |
|||
|
type |
array |
|
items |
|||
minItems |
1 |
||
uniqueItems |
True |
||
|
type |
array |
|
items |
|||
additionalProperties |
False |
||
|
const |
fabric |
|
additionalProperties |
False |
||
FABRIC Network#
type |
object |
||
properties |
|||
|
Labels Schema |
||
A list of labels that identify the resources. The values are strings that can’t start with ‘kiso.’ and can contain alphanumeric characters, dots, underscores, and hyphens |
|||
type |
array |
||
items |
type |
string |
|
pattern |
^(?!kiso\\.)[a-zA-Z0-9._-]+$ |
||
minItems |
1 |
||
uniqueItems |
True |
||
|
type |
string |
|
minLength |
2 |
||
oneOf |
|||
FABRIC Fabnetv4 Network#
type |
object |
|
properties |
||
|
const |
FABNetv4 |
|
type |
string |
minLength |
3 |
|
|
||
FABRIC Fabnetv6 Network#
type |
object |
|
properties |
||
|
const |
FABNetv6 |
|
type |
string |
minLength |
3 |
|
|
||
FABRIC Fabnetv4Ext Network#
type |
object |
|
properties |
||
|
const |
FABNetv4Ext |
|
type |
string |
minLength |
3 |
|
|
||
FABRIC Fabnetv6Ext Network#
type |
object |
|
properties |
||
|
const |
FABNetv6Ext |
|
type |
string |
minLength |
3 |
|
|
||
FABRIC L2Bridge Network#
type |
object |
|
properties |
||
|
const |
L2Bridge |
|
type |
string |
minLength |
3 |
|
|
type |
string |
format |
ip |
|
|
||
FABRIC L2STS Network#
type |
object |
|
properties |
||
|
const |
L2STS |
|
type |
string |
minLength |
3 |
|
|
type |
string |
minLength |
3 |
|
|
type |
string |
format |
ip |
|
|
||
FABRIC Compute#
type |
object |
||
properties |
|||
|
Name of the site to deploy the node on. Defaults to UCSD. |
||
type |
string |
||
|
Base image to use |
||
type |
string |
||
|
type |
array |
|
items |
|||
minItems |
1 |
||
|
type |
array |
|
items |
|||
minItems |
1 |
||
|
Labels Schema |
||
A list of labels that identify the resources. The values are strings that can’t start with ‘kiso.’ and can contain alphanumeric characters, dots, underscores, and hyphens |
|||
type |
array |
||
items |
type |
string |
|
pattern |
^(?!kiso\\.)[a-zA-Z0-9._-]+$ |
||
minItems |
1 |
||
uniqueItems |
True |
||
|
type |
number |
|
|
type |
string |
|
enum |
tiny, small, medium, big, large, extra-large |
||
|
|||
additionalProperties |
False |
||
FABRIC GPU Component#
type |
object |
|
properties |
||
|
enum |
TeslaT4, RTX6000, A30, A40 |
FABRIC Storage Component#
type |
object |
||
oneOf |
properties |
||
|
const |
NVME |
|
|
enum |
P4510 |
|
|
type |
string |
|
minLength |
2 |
||
properties |
|||
|
type |
string |
|
minLength |
2 |
||
|
const |
Storage |
|
|
enum |
NAS |
|
|
type |
boolean |
|
default |
False |
||
FABRIC NIC Component#
type |
object |
||
properties |
|||
|
type |
string |
|
minLength |
2 |
||
oneOf |
properties |
||
|
const |
SharedNIC |
|
|
enum |
ConnectX-6 |
|
properties |
|||
|
const |
SmartNIC |
|
|
enum |
ConnectX-5, ConnectX-6 |
|
FABRIC Flavour#
type |
object |
|
properties |
||
|
Number of cores in the node. Default: 2 cores |
|
type |
integer |
|
|
Amount of RAM in the node. Default: 8 GB |
|
type |
integer |
|
|
Amount of disk space on the node. Default: 10 GB |
|
type |
integer |
|
additionalProperties |
False |
|
Apptainer Software Configuration#
Specify on which resources the Apptainer runtime should be installed
type |
object |
|
properties |
||
|
||
|
type |
string |
additionalProperties |
False |
|
Docker Software Configuration#
Specify on which resources the Docker runtime should be installed
type |
object |
|
properties |
||
|
||
|
type |
string |
additionalProperties |
False |
|
Ollama Software Configuration#
type |
array |
items |
|
minItems |
1 |
Ollama Configuration#
Specify on which resources the Ollama service should be installed and what models should be pulled
type |
object |
||
properties |
|||
|
|||
|
A list of Ollama models to be installed |
||
type |
array |
||
items |
type |
string |
|
minItems |
1 |
||
|
|||
additionalProperties |
False |
||
HTCondor Deployment Configuration#
Specify how and on which resources HTCondor should be installed
type |
array |
items |
|
minItems |
1 |
HTCondor Daemon Configuration#
Specify how and on which resources HTCondor should be installed
type |
object |
|
properties |
||
|
Specify which resource will have the central manager and its configuration |
|
type |
string |
|
enum |
central-manager, execute, submit, personal |
|
|
||
|
type |
string |
additionalProperties |
False |
|
Shell Experiment Schema#
type |
object |
|
properties |
||
|
const |
shell |
|
A suitable name for the experiment |
|
type |
string |
|
|
A description of the experiment |
|
type |
string |
|
|
Define all scripts to be executed on the remote machine |
|
type |
array |
|
items |
#/$defs/script |
|
|
Define all output files to be copied from the remote machine |
|
type |
array |
|
items |
#/$defs/location |
|
additionalProperties |
False |
|
Pegasus Workflow Experiment Schema#
type |
object |
||
properties |
|||
|
const |
pegasus |
|
|
A suitable name for the experiment |
||
type |
string |
||
|
A description of the experiment |
||
type |
string |
||
|
The number of times the experiment should be run |
||
type |
integer |
||
minimum |
1 |
||
default |
1 |
||
|
The script that executes the experiment |
||
type |
string |
||
|
A list of arguments to be passed to the main script |
||
type |
array |
||
items |
type |
string |
|
|
Checks the status of the experiment every poll_interval seconds |
||
type |
integer |
||
default |
60 |
||
|
If the experiment takes longer than timeout seconds, it is considered failed |
||
type |
integer |
||
default |
600 |
||
|
Define all input files to be copied to the remote machine |
||
type |
array |
||
items |
#/$defs/location |
||
|
Define all setup scripts to be executed on the remote machine |
||
type |
array |
||
items |
|||
|
|||
|
Define all scripts to be executed after the experiment |
||
type |
array |
||
items |
|||
|
Define all output files to be copied from the remote machine |
||
type |
array |
||
items |
#/$defs/location |
||
additionalProperties |
False |
||
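The `poll_interval` and `timeout` descriptions above suggest a simple polling loop. The sketch below illustrates those semantics under stated assumptions: `is_done` is a hypothetical callback standing in for whatever status check is used, and the timeout is assumed to be measured from the moment waiting starts. The defaults of 60 and 600 seconds are the ones listed in the schema. This is an illustration only, not Kiso's implementation.

```python
import time


def wait_for_experiment(is_done, poll_interval=60, timeout=600,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll is_done() every poll_interval seconds.

    Returns True as soon as is_done() reports completion, or False once
    timeout seconds have elapsed without completion (the experiment is
    then considered failed). clock and sleep are injectable so the loop
    can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)


# Demo: the third status check reports completion.
attempts = iter([False, False, True])
print(wait_for_experiment(lambda: next(attempts), poll_interval=0, timeout=5))  # prints True
```

With the defaults, an experiment that never finishes is checked roughly every minute and reported as failed after ten minutes, so long-running workflows would likely need a larger `timeout`.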
setup#
Labels Schema#
A list of labels that identify the resources. The values are strings that can contain alphanumeric characters, dots, underscores, and hyphens
type |
array |
|
items |
type |
string |
pattern |
^[a-zA-Z0-9._-]+$ |
|
minItems |
1 |
|
uniqueItems |
True |
|
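The constraints of the Labels Schema (at least one item, unique items, and the character pattern) can be checked with a few lines of Python. The sketch below is illustrative only; `validate_labels` is a hypothetical helper, not part of Kiso. It uses the pattern from this section by default and also defines the stricter variant that appears in the site schemas, written with a single backslash on the assumption that the doubled backslash in the rendered schema is string escaping.

```python
import re

# Pattern from the Labels Schema section.
LABEL_PATTERN = re.compile(r"^[a-zA-Z0-9._-]+$")
# Stricter variant from the site schemas: labels must not start with "kiso.".
SITE_LABEL_PATTERN = re.compile(r"^(?!kiso\.)[a-zA-Z0-9._-]+$")


def validate_labels(labels, pattern=LABEL_PATTERN):
    """Return a list of problems; an empty list means the labels are valid."""
    if not isinstance(labels, list) or not labels:
        return ["labels must be a non-empty list"]
    errors = [
        f"invalid label: {label!r}"
        for label in labels
        if not isinstance(label, str) or not pattern.fullmatch(label)
    ]
    # Uniqueness is only checked once every item is a well-formed string.
    if not errors and len(set(labels)) != len(labels):
        errors.append("labels must be unique")
    return errors


print(validate_labels(["submit-daemon", "execute-edge-daemon"]))
print(validate_labels(["kiso.internal"], SITE_LABEL_PATTERN))
```

The first call returns an empty list, while the second reports `kiso.internal` as invalid because of the reserved prefix.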
Variables Schema#
A map of variable names to values. Variable names can contain alphanumeric characters and underscores
type |
object |
||
patternProperties |
|||
|
oneOf |
type |
string |
type |
integer |
||
type |
number |
||
additionalProperties |
False |
||
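As a rough illustration of the Variables Schema, the sketch below checks a mapping of variable names to values. The exact regular expression used for the names is not shown above, so `VARIABLE_NAME` is an assumption derived from the prose description (alphanumeric characters and underscores). The accepted value types (string, integer, number) follow the table. `validate_variables` is a hypothetical helper, not part of Kiso.

```python
import re

# Assumed pattern: the schema's own patternProperties regex is not shown above.
VARIABLE_NAME = re.compile(r"[A-Za-z0-9_]+")


def validate_variables(variables):
    """Return a list of problems; an empty list means the mapping is valid."""
    if not isinstance(variables, dict):
        return ["variables must be a mapping"]
    errors = []
    for name, value in variables.items():
        if not isinstance(name, str) or not VARIABLE_NAME.fullmatch(name):
            errors.append(f"invalid variable name: {name!r}")
        # bool is a subclass of int in Python, so it is rejected explicitly.
        elif isinstance(value, bool) or not isinstance(value, (str, int, float)):
            errors.append(f"unsupported value for {name}: {value!r}")
    return errors


print(validate_variables({"EPOCHS": 10, "learning_rate": 0.5, "model": "plankton"}))
print(validate_variables({"bad-name": 1}))
```

The first call returns an empty list. The second reports `bad-name`, since a hyphen falls outside the assumed name pattern.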
Shell Script Schema#
type |
object |
|
properties |
||
|
||
|
The executable (shebang) to be used to run the script |
|
type |
string |
|
default |
/bin/bash |
|
|
The script to be executed |
|
type |
string |
|
additionalProperties |
False |
|
File Upload/Download Location Schema#
type |
object |
|
properties |
||
|
||
|
The src file to be copied |
|
type |
string |
|
|
The destination to which the src file should be copied. This must be a directory |
|
type |
string |
|
additionalProperties |
False |
|