Concepts#
Labels#
Labels are user-defined strings that you assign to machines when provisioning sites. They act as selectors — every other section (software, deployment, experiments) references machines by label rather than by name or IP address.
This indirection means your software and experiment configuration stays the same regardless of which testbed you run on; only the sites section changes.
How they work
Assign one or more labels to a machine in the sites section.
Reference those labels in software, deployment, and experiments to target those machines.
sites:
  - kind: vagrant
    resources:
      machines:
        - labels:
            - submit # you define this name
          flavour: large
          number: 1
        - labels:
            - execute # and this one
          flavour: large
          number: 2
software:
  docker:
    labels:
      - submit # install Docker only on machines tagged "submit"
deployment:
  htcondor:
    - kind: submit
      labels:
        - submit # configure HTCondor submit node on "submit" machines
    - kind: execute
      labels:
        - execute # configure HTCondor execute nodes on "execute" machines
experiments:
  - kind: pegasus
    submit_node_labels:
      - submit # run the workflow from the "submit" machine
A machine can have multiple labels, and a label can match multiple machines (e.g., number: 3 with labels: [execute] gives you three execute nodes, all reachable by the execute label).
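The many-to-many relationship between labels and machines can be sketched in a few lines of Python. This is an illustration of the concept only, not Kiso's implementation: the `resolve_labels` helper and the `machine-<i>-<j>` host names are hypothetical, and the `machines` list simply mirrors the example above as plain dictionaries.

```python
from collections import defaultdict

# Hypothetical in-memory form of the `machines` list from the example above.
machines = [
    {"labels": ["submit"], "flavour": "large", "number": 1},
    {"labels": ["execute"], "flavour": "large", "number": 2},
]


def resolve_labels(machine_specs):
    """Map each label to the (hypothetical) hosts that carry it."""
    hosts_by_label = defaultdict(list)
    for spec_index, spec in enumerate(machine_specs):
        # `number` asks for that many identical machines.
        for copy_index in range(spec.get("number", 1)):
            host = f"machine-{spec_index}-{copy_index}"  # illustrative name only
            # Every label on the spec selects every machine created from it.
            for label in spec["labels"]:
                hosts_by_label[label].append(host)
    return dict(hosts_by_label)


hosts = resolve_labels(machines)
print(hosts["submit"])   # ['machine-0-0']
print(hosts["execute"])  # ['machine-1-0', 'machine-1-1']
```

Because the other sections only ever see the label, swapping the vagrant site for another testbed changes which hosts end up in each list, but not the keys used to look them up.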
Sites#
This section defines the resources to provision on each site (testbed) for the experiment. Kiso currently supports the following testbeds.
FABRIC#
Note
pip install kiso[fabric] # Install Kiso with FABRIC
Example#
sites:
  - kind: fabric
    rc_file: secrets/fabric_rc
    walltime: "02:00:00"
    resources:
      machines:
        - labels:
            - submit
          site: FIU
          image: default_rocky_8
          flavour: big
          number: 1
          gpus:
            - model: TeslaT4
          storage:
            - kind: NVME
              model: P4510
              mount_point: /mnt/nvme
            - kind: Storage
              model: NAS
              name: kiso-fabric-integration
              auto_mount: true
      networks:
        - labels:
            - v4
          kind: FABNetv4
          site: FIU
          nic:
            kind: SharedNIC
            model: ConnectX-6
Hint
For a complete schema reference see FABRIC Configuration Schema
Vagrant#
Note
pip install kiso[vagrant] # Install Kiso with Vagrant
Example#
sites:
  - kind: vagrant
    backend: virtualbox
    box: bento/rockylinux-9
    user: vagrant
    config_extra: 'config.vm.synced_folder ".", "/vagrant", disabled: true'
    resources:
      machines:
        - labels:
            - execute
          backend: virtualbox
          box: bento/rockylinux-9
          user: vagrant
          flavour: "large"
          number: 2
      networks:
        - labels:
            - r1
          cidr: "172.16.42.0/16"
Hint
For a complete schema reference see Vagrant Configuration Schema
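When choosing a cidr value for a network, it can help to check which address range the string actually denotes. The sketch below uses only Python's standard ipaddress module on the value from the example above. How Kiso or Vagrant interprets a value with host bits set is not stated here, so treat this as a way to inspect the value, not as a description of their behaviour.

```python
import ipaddress

cidr = "172.16.42.0/16"  # value taken from the Vagrant example above

# strict=False masks off the host bits instead of raising ValueError.
network = ipaddress.ip_network(cidr, strict=False)
print(network)                # 172.16.0.0/16
print(network.num_addresses)  # 65536

# The default (strict=True) rejects an address whose host bits are set.
try:
    ipaddress.ip_network(cidr)
    has_host_bits = False
except ValueError:
    has_host_bits = True
print(has_host_bits)  # True
```

Under a /16 prefix the third octet (42) is part of the host portion, so the string denotes the whole 172.16.0.0/16 range. A value such as "172.16.42.0/24" would denote only the 172.16.42.x addresses.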
Chameleon#
Note
pip install kiso[chameleon] # Install Kiso with Chameleon
Example#
sites:
  - kind: chameleon
    walltime: "04:00:00"
    lease_name: tacc-lease
    rc_file: secrets/chi-tacc-app-cred-openrc.sh
    key_name: mayani-mac-mini
    image: CC-Ubuntu18.04
    resources:
      machines:
        - labels:
            - submit
          flavour: compute_zen3
          number: 2
          image: CC-Ubuntu22.04
      networks:
        - sharednet1
Hint
For a complete schema reference see Chameleon Configuration Schema
Chameleon Edge#
Note
pip install kiso[chameleon] # Install Kiso with Chameleon
Example#
sites:
  - kind: chameleon-edge
    walltime: "04:00:00"
    lease_name: edge-lease
    rc_file: secrets/chi-edge-app-cred-openrc.sh
    resources:
      machines:
        - labels:
            - central-manager
          machine_name: raspberrypi4-64
          count: 1
          container:
            name: execute
            image: rockylinux:8
Hint
For a complete schema reference see Chameleon Edge Configuration Schema
Software#
In this section, we define the software to be installed on the provisioned resources. Currently, we support installing Docker, Apptainer, and Ollama.
Apptainer#
Example#
software:
  apptainer:
    labels:
      - submit
Hint
For a complete schema reference see Apptainer Software Configuration
Docker#
Example#
software:
  docker:
    labels:
      - submit
Hint
For a complete schema reference see Docker Software Configuration
Ollama#
Example#
software:
  ollama:
    - labels:
        - large-model
      models:
        - gpt-oss:20b
      environment:
        OLLAMA_MAX_QUEUE: 512
    - labels:
        - small-model
      models:
        - qwen3.5:2b
      environment:
        OLLAMA_CONTEXT_LENGTH: 8192
Hint
For a complete schema reference see Ollama Software Configuration
Deployment#
In this section, we define the cluster to be deployed on the provisioned resources. Currently, we support deploying HTCondor.
HTCondor#
Example#
deployment:
  htcondor:
    - kind: central-manager
      labels:
        - central-manager
      # Optionally, define a custom Condor configuration file
      # config_file: config/cm-condor_config
    # Optionally, define one or more execute node configurations
    - kind: execute
      labels:
        - execute
      # Optionally, define a custom Condor configuration file
      # config_file: config/exec-condor_config
    # Optionally, define one or more submit node configurations
    - kind: submit
      labels:
        - submit
      # Optionally, define a custom Condor configuration file
      # config_file: config/submit-condor_config
    # Optionally, define one or more personal HTCondor node configurations
    - kind: personal
      labels:
        - edge-1
      # Optionally, define a custom Condor configuration file
      # config_file: config/personal-condor_config
Hint
For a complete schema reference see HTCondor Deployment Configuration
Experiments#
This section defines the experiments to run on the provisioned resources. Kiso currently supports the following experiment types.
Shell#
Example#
experiments:
  - kind: shell
    name: shell-experiment
    description: An experiment to print a message
    # Optionally, specify input files and the nodes to copy them to before the experiment
    inputs:
      - labels:
          - submit
        src: name.txt
        dst: ~kiso
    # Specify which scripts to run and the nodes to run them on
    scripts:
      - labels:
          - submit
        script: |
          #!/bin/bash
          echo "Hello, world!" | tee hello.txt
    # Optionally, specify output files and the nodes to copy them from after the experiment
    outputs:
      - labels:
          - submit
        src: hello.txt
        dst: output
Hint
For a complete schema reference see Shell Experiment Schema
Pegasus#
Example#
experiments:
  - kind: pegasus
    name: process-experiment
    description: A Pegasus workflow
    # Number of times to run the experiment
    count: 1
    # Script to run the Pegasus workflow
    main: bin/main.sh
    # The node from which the workflow will be submitted
    submit_node_labels:
      - submit
    # Optionally, specify input files and the nodes to copy them to when setting up the environment
    # By default, the directory containing the experiment.yml file will be copied to all provisioned nodes
    inputs:
      - labels:
          - execute
        src: README.md
        dst: ~kiso/kiso-process-experiment
    # Optionally, specify scripts to run, and the nodes to run them on, to set up the environment
    setup:
      - labels:
          - submit
        executable: /bin/bash
        script: |
          #!/bin/bash
          echo "Setup script here"
    # Optionally, specify scripts to run, and the nodes to run them on, after the experiment
    post_scripts:
      - labels:
          - submit
        executable: /bin/bash
        script: |
          #!/bin/bash
          echo "Post script here"
    # Optionally, specify output files and the nodes to copy them from after the experiment
    # By default, the Pegasus workflow submit directory will be copied to the local machine
    outputs:
      - labels:
          - submit
        src: ~kiso/kiso-process-experiment
        dst: local-machine
Hint
For a complete schema reference see Pegasus Workflow Experiment Schema
Advanced Multi-Site Experiment#
# -------------------------------------------------------------------------
#
# Clone the repository from GitHub,
#   git clone https://github.com/pegasus-isi/kiso-plankifier-experiment.git
# Install Kiso and its dependencies,
#   pip install kiso[chameleon]
# Check the experiment configuration.
#   kiso check
# Set up the experiment.
#   kiso up
# Run the experiment.
#   kiso run
# Destroy the experiment.
#   kiso down
# See: https://github.com/pegasus-isi/kiso-plankifier-experiment.README.md
#
# -------------------------------------------------------------------------
name: plankifier-experiment
deployment:
  htcondor:
    - kind: central-manager
      labels:
        - central-manager-daemon
    - kind: submit
      labels:
        - submit-daemon
    - kind: execute
      labels:
        - execute-cloud-daemon
    - kind: execute
      labels:
        - execute-edge-daemon
      config_file: config/execute.conf
software:
  apptainer:
    labels:
      - execute-cloud-daemon
sites:
  - kind: chameleon-edge
    walltime: "04:00:00"
    lease_name: edge-lease
    rc_file: secrets/edge-app-cred-oac-edge-openrc.sh
    resources:
      machines:
        - labels:
            - execute-edge-daemon
          machine_name: raspberrypi4-64
          count: 1
          container:
            name: execute
            image: pegasus/plankifier
  - kind: chameleon
    walltime: "04:00:00"
    lease_name: tacc-lease
    rc_file: secrets/tacc-app-cred-oac-edge-openrc.sh
    key_name: mayani-mac-mini
    image: CC-Ubuntu18.04
    resources:
      machines:
        - labels:
            - central-manager-daemon
            - submit-daemon
            - execute-cloud-daemon
          flavour: compute_zen3
          number: 1
          image: CC-Ubuntu22.04
      networks:
        - sharednet1
experiments:
  - kind: pegasus
    name: plankifier-experiment
    count: 1
    main: ./workflow.py
    submit_node_labels:
      - submit-daemon
    inputs:
      - labels:
          - execute-edge-daemon
        src: bin/train.py
        dst: /srv/plankifier/
      - labels:
          - execute-edge-daemon
        src: bin/predict.py
        dst: /srv/plankifier/
    setup:
      - labels:
          - submit-daemon
        script: |
          chmod +x workflow.py
      - labels:
          - execute-edge-daemon
        script: |
          chmod +x /srv/plankifier/train.py /srv/plankifier/predict.py
    outputs:
      - labels:
          - submit-daemon
        src: ~kiso/kiso-plankifier-experiment/output/count.txt
        dst: ./
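In a multi-site configuration like this one, a mistyped label is easy to miss because labels are defined in sites but used in three other sections. The following sketch shows one way to cross-check them. It is an illustration under stated assumptions, not part of Kiso: the `collect_labels` and `undefined_labels` helpers are hypothetical, the configuration is a trimmed copy of the example above written as a Python dictionary, and whether `kiso check` performs a similar validation is not stated here.

```python
def collect_labels(node):
    """Recursively gather values under `labels` or `submit_node_labels` keys."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key in ("labels", "submit_node_labels"):
                found.update(value)
            else:
                found |= collect_labels(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_labels(item)
    return found


def undefined_labels(config):
    """Return labels used outside `sites` that no machine in `sites` defines."""
    defined = set()
    for site in config.get("sites", []):
        for machine in site.get("resources", {}).get("machines", []):
            defined.update(machine.get("labels", []))
    referenced = set()
    for section in ("software", "deployment", "experiments"):
        referenced |= collect_labels(config.get(section, {}))
    return sorted(referenced - defined)


# Trimmed copy of the multi-site example; only label-bearing keys are kept.
config = {
    "sites": [
        {"kind": "chameleon-edge",
         "resources": {"machines": [{"labels": ["execute-edge-daemon"]}]}},
        {"kind": "chameleon",
         "resources": {
             "machines": [{"labels": ["central-manager-daemon",
                                      "submit-daemon",
                                      "execute-cloud-daemon"]}],
             "networks": ["sharednet1"]}},
    ],
    "software": {"apptainer": {"labels": ["execute-cloud-daemon"]}},
    "deployment": {"htcondor": [
        {"kind": "central-manager", "labels": ["central-manager-daemon"]},
        {"kind": "submit", "labels": ["submit-daemon"]},
        {"kind": "execute", "labels": ["execute-cloud-daemon"]},
        {"kind": "execute", "labels": ["execute-edge-daemon"]},
    ]},
    "experiments": [{
        "kind": "pegasus",
        "submit_node_labels": ["submit-daemon"],
        "inputs": [{"labels": ["execute-edge-daemon"], "src": "bin/train.py"}],
        "outputs": [{"labels": ["submit-daemon"], "dst": "./"}],
    }],
}

print(undefined_labels(config))  # []

# A deliberately mistyped label in the software section is reported.
broken = dict(config, software={"apptainer": {"labels": ["execute-cloud"]}})
print(undefined_labels(broken))  # ['execute-cloud']
```

The walk is deliberately generic so that it covers both the mapping form used by apptainer and docker and the list form used by ollama and htcondor.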