Concepts

Labels

Labels are user-defined strings that you assign to machines when provisioning sites. They act as selectors — every other section (software, deployment, experiments) references machines by label rather than by name or IP address.

This indirection means your software and experiment configuration stays the same regardless of which testbed you run on; only the sites section changes.

How they work

  1. Assign one or more labels to a machine in the sites section.

  2. Reference those labels in software, deployment, and experiments to target those machines.

sites:
  - kind: vagrant
    resources:
      machines:
        - labels:
            - submit    # you define this name
          flavour: large
          number: 1
        - labels:
            - execute   # and this one
          flavour: large
          number: 2

software:
  docker:
    labels:
      - submit          # install Docker only on machines tagged "submit"

deployment:
  htcondor:
    - kind: submit
      labels:
        - submit        # configure HTCondor submit node on "submit" machines
    - kind: execute
      labels:
        - execute       # configure HTCondor execute nodes on "execute" machines

experiments:
  - kind: pegasus
    submit_node_labels:
      - submit          # run the workflow from the "submit" machine

A machine can have multiple labels, and a label can match multiple machines (e.g., number: 3 with labels: [execute] gives you three execute nodes, all reachable by the execute label).
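
As a minimal sketch of this many-to-many relationship, the fragment below puts two labels on a single group of three machines. It reuses only keys that appear in the example above. The label name container is hypothetical, and installing both Docker and Apptainer is purely illustrative.

```yaml
sites:
  - kind: vagrant
    resources:
      machines:
        - labels:
            - execute     # shared by all three machines in this group
            - container   # hypothetical second label on the same machines
          flavour: large
          number: 3       # three machines, each carrying both labels

software:
  docker:
    labels:
      - container         # selects the three machines through the second label
  apptainer:
    labels:
      - execute           # selects the same three machines through the first label
```

Either label selects the same three machines, so each section can use whichever name best describes the role it cares about.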

Sites

In this section, we define the resources to be provisioned on the different sites/testbeds for the experiment. Currently, we support the following testbeds.

FABRIC

Note

pip install kiso[fabric] # Install Kiso with FABRIC

Example

sites:
  - kind: fabric
    rc_file: secrets/fabric_rc
    walltime: "02:00:00"
    resources:
      machines:
        - labels:
            - submit
          site: FIU
          image: default_rocky_8
          flavour: big
          number: 1
          gpus:
            - model: TeslaT4
          storage:
            - kind: NVME
              model: P4510
              mount_point: /mnt/nvme
            - kind: Storage
              model: NAS
              name: kiso-fabric-integration
              auto_mount: true
      networks:
        - labels:
            - v4
          kind: FABNetv4
          site: FIU
          nic:
            kind: SharedNIC
            model: ConnectX-6

Hint

For a complete schema reference see FABRIC Configuration Schema

Vagrant

Note

pip install kiso[vagrant] # Install Kiso with Vagrant

Example

sites:
  - kind: vagrant
    backend: virtualbox
    box: bento/rockylinux-9
    user: vagrant
    config_extra: 'config.vm.synced_folder ".", "/vagrant", disabled: true'
    resources:
      machines:
        - labels:
            - execute
          backend: virtualbox
          box: bento/rockylinux-9
          user: vagrant
          flavour: "large"
          number: 2
      networks:
        - labels:
            - r1
          cidr: "172.16.42.0/16"

Hint

For a complete schema reference see Vagrant Configuration Schema

Chameleon

Note

pip install kiso[chameleon] # Install Kiso with Chameleon

Example

sites:
  - kind: chameleon
    walltime: "04:00:00"
    lease_name: tacc-lease
    rc_file: secrets/chi-tacc-app-cred-openrc.sh
    key_name: mayani-mac-mini
    image: CC-Ubuntu18.04
    resources:
      machines:
        - labels:
            - submit
          flavour: compute_zen3
          number: 2
          image: CC-Ubuntu22.04
      networks:
        - sharednet1

Hint

For a complete schema reference see Chameleon Configuration Schema

Chameleon Edge

Note

pip install kiso[chameleon] # Install Kiso with Chameleon

Example

sites:
  - kind: chameleon-edge
    walltime: "04:00:00"
    lease_name: edge-lease
    rc_file: secrets/chi-edge-app-cred-openrc.sh
    resources:
      machines:
        - labels:
            - central-manager
          machine_name: raspberrypi4-64
          count: 1
          container:
            name: execute
            image: rockylinux:8

Hint

For a complete schema reference see Chameleon Edge Configuration Schema

Software

In this section, we define the software to be installed on the provisioned resources. Currently, we support installing Docker, Apptainer, and Ollama.

Apptainer

Example

software:
  apptainer:
    labels:
      - submit

Hint

For a complete schema reference see Apptainer Software Configuration

Docker

Example

software:
  docker:
    labels:
      - submit

Hint

For a complete schema reference see Docker Software Configuration

Ollama

Example

software:
  ollama:
    - labels:
        - large-model
      models:
        - gpt-oss:20b
      environment:
        OLLAMA_MAX_QUEUE: 512

    - labels:
        - small-model
      models:
        - qwen3.5:2b
      environment:
        OLLAMA_CONTEXT_LENGTH: 8192

Hint

For a complete schema reference see Ollama Software Configuration
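
The Apptainer, Docker, and Ollama examples each show a single entry under the top-level software key. A combined configuration would presumably look like the sketch below. It reuses only keys and values from those examples. That all three entries can be declared together is an assumption, so check the schema references before relying on it.

```yaml
software:
  docker:
    labels:
      - submit            # Docker on machines labelled "submit"
  apptainer:
    labels:
      - submit            # Apptainer on the same machines
  ollama:                 # Ollama takes a list, one entry per label group
    - labels:
        - large-model
      models:
        - gpt-oss:20b
```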

Deployment

In this section, we define the cluster to be deployed on the provisioned resources. Currently, we support deploying HTCondor.

HTCondor

Example

deployment:
  htcondor:
    - kind: central-manager
      labels:
        - central-manager
      # Optionally, define a custom Condor configuration file
      # config_file: config/cm-condor_config

    # Optionally, define one or more execute node configurations
    - kind: execute
      labels:
        - execute
      # Optionally, define a custom Condor configuration file
      # config_file: config/exec-condor_config

    # Optionally, define one or more submit node configurations
    - kind: submit
      labels:
        - submit
      # Optionally, define a custom Condor configuration file
      # config_file: config/submit-condor_config

    # Optionally, define one or more personal HTCondor node configurations
    - kind: personal
      labels:
        - edge-1
      # Optionally, define a custom Condor configuration file
      # config_file: config/personal-condor_config

Hint

For a complete schema reference see HTCondor Deployment Configuration

Experiments

In this section, we define the experiments to be run on the provisioned resources. Currently, we support the following experiment types.

Shell

Example

experiments:
  - kind: shell
    name: shell-experiment
    description: An experiment to print a message
    # Optionally, specify input files and the nodes to copy them to before the experiment
    inputs:
      - labels:
          - submit
        src: name.txt
        dst: ~kiso

    # Specify which scripts to run and the nodes to run them on
    scripts:
      - labels:
          - submit
        script: |
          #!/bin/bash
          echo "Hello, world!" | tee hello.txt

    # Optionally, specify output files and on which node to copy them from after the experiment
    outputs:
      - labels:
          - submit
        src: hello.txt
        dst: output

Hint

For a complete schema reference see Shell Experiment Schema

Pegasus

Example

experiments:
  - kind: pegasus
    name: process-experiment
    description: A Pegasus workflow
    # Number of times to run the experiment
    count: 1
    # Script to run the Pegasus workflow
    main: bin/main.sh
    # The node from which the workflow will be submitted
    submit_node_labels:
      - submit

    # Optionally, specify input files and the nodes to copy them to when setting up the environment
    # By default, the directory containing the experiment.yml file will be copied to all provisioned nodes
    inputs:
      - labels:
          - execute
        src: README.md
        dst: ~kiso/kiso-process-experiment

    # Optionally, specify which scripts to run, and on which nodes, to set up the environment
    setup:
      - labels:
          - submit
        executable: /bin/bash
        script: |
          #!/bin/bash
          echo "Setup script here"

    # Optionally, specify which scripts to run, and on which nodes, after the experiment
    post_scripts:
      - labels:
          - submit
        executable: /bin/bash
        script: |
          #!/bin/bash
          echo "Post script here"

    # Optionally, specify output files and on which node to copy them from after the experiment
    # By default, the Pegasus workflow submit directory will be copied to the local machine
    outputs:
      - labels:
          - submit
        src: ~kiso/kiso-process-experiment
        dst: local-machine

Hint

For a complete schema reference see Pegasus Workflow Experiment Schema

Advanced Multi-Site Experiment

# -------------------------------------------------------------------------
#
# Clone the repository from GitHub,
#   git clone https://github.com/pegasus-isi/kiso-plankifier-experiment.git
# Install Kiso and its dependencies,
#   pip install kiso[chameleon]
# Check the experiment configuration.
#   kiso check
# Set up the experiment.
#   kiso up
# Run the experiment.
#   kiso run
# Destroy the experiment.
#   kiso down
# See the README.md in https://github.com/pegasus-isi/kiso-plankifier-experiment
#
# -------------------------------------------------------------------------
name: plankifier-experiment

deployment:
  htcondor:
    - kind: central-manager
      labels:
        - central-manager-daemon
    - kind: submit
      labels:
        - submit-daemon
    - kind: execute
      labels:
        - execute-cloud-daemon
    - kind: execute
      labels:
        - execute-edge-daemon
      config_file: config/execute.conf

software:
  apptainer:
    labels:
      - execute-cloud-daemon

sites:
  - kind: chameleon-edge
    walltime: "04:00:00"
    lease_name: edge-lease
    rc_file: secrets/edge-app-cred-oac-edge-openrc.sh
    resources:
      machines:
        - labels:
            - execute-edge-daemon
          machine_name: raspberrypi4-64
          count: 1
          container:
            name: execute
            image: pegasus/plankifier

  - kind: chameleon
    walltime: "04:00:00"
    lease_name: tacc-lease
    rc_file: secrets/tacc-app-cred-oac-edge-openrc.sh
    key_name: mayani-mac-mini
    image: CC-Ubuntu18.04
    resources:
      machines:
        - labels:
            - central-manager-daemon
            - submit-daemon
            - execute-cloud-daemon
          flavour: compute_zen3
          number: 1
          image: CC-Ubuntu22.04
      networks:
        - sharednet1

experiments:
  - kind: pegasus
    name: plankifier-experiment
    count: 1
    main: ./workflow.py
    submit_node_labels:
      - submit-daemon
    inputs:
      - labels:
          - execute-edge-daemon
        src: bin/train.py
        dst: /srv/plankifier/
      - labels:
          - execute-edge-daemon
        src: bin/predict.py
        dst: /srv/plankifier/
    setup:
      - labels:
          - submit-daemon
        script: |
          chmod +x workflow.py
      - labels:
          - execute-edge-daemon
        script: |
          chmod +x /srv/plankifier/train.py /srv/plankifier/predict.py
    outputs:
      - labels:
          - submit-daemon
        src: ~kiso/kiso-plankifier-experiment/output/count.txt
        dst: ./