Experiment Configuration#

Concepts#

Sites#

In this section, we define the resources to be provisioned on the different sites/testbeds for the experiment.

Software#

In this section, we define the software to be installed on the provisioned resources. Currently, we support installing Docker, Apptainer, and Ollama.

Deployment#

In this section, we define the cluster to be deployed on the provisioned resources. Currently, we support deploying HTCondor.

Experiments#

In this section, we define the experiments to be run on the provisioned resources.

Example#

# -------------------------------------------------------------------------
#
# Clone the repository from GitHub,
#   git clone https://github.com/pegasus-isi/kiso-plankifier-experiment.git
# Install Kiso and its dependencies,
#   pip install kiso[chameleon]
# Check the experiment configuration.
#   kiso check
# Set up the experiment.
#   kiso up
# Run the experiment.
#   kiso run
# Destroy the experiment.
#   kiso down
# See: https://github.com/pegasus-isi/kiso-plankifier-experiment.README.md
#
# -------------------------------------------------------------------------
name: plankifier-experiment

deployment:
  htcondor:
    - kind: central-manager
      labels:
        - central-manager-daemon
    - kind: submit
      labels:
        - submit-daemon
    - kind: execute
      labels:
        - execute-cloud-daemon
    - kind: execute
      labels:
        - execute-edge-daemon
      config_file: config/execute.conf

software:
  apptainer:
    labels:
      - execute-cloud-daemon

sites:
  - kind: chameleon-edge
    walltime: "04:00:00"
    lease_name: edge-lease
    rc_file: secrets/edge-app-cred-oac-edge-openrc.sh
    resources:
      machines:
        - labels:
            - execute-edge-daemon
          machine_name: raspberrypi4-64
          count: 1
          container:
            name: execute
            image: pegasus/plankifier

  - kind: chameleon
    walltime: "04:00:00"
    lease_name: tacc-lease
    rc_file: secrets/tacc-app-cred-oac-edge-openrc.sh
    key_name: mayani-mac-mini
    image: CC-Ubuntu18.04
    resources:
      machines:
        - labels:
            - central-manager-daemon
            - submit-daemon
            - execute-cloud-daemon
          flavour: compute_zen3
          number: 1
          image: CC-Ubuntu22.04
      networks:
        - sharednet1

experiments:
  - kind: pegasus
    name: plankifier-experiment
    count: 1
    main: ./workflow.py
    submit_node_labels:
      - submit-daemon
    inputs:
      - labels:
          - execute-edge-daemon
        src: bin/train.py
        dst: /srv/plankifier/
      - labels:
          - execute-edge-daemon
        src: bin/predict.py
        dst: /srv/plankifier/
    setup:
      - labels:
          - submit-daemon
        script: |
          chmod +x workflow.py
      - labels:
          - execute-edge-daemon
        script: |
          chmod +x /srv/plankifier/train.py /srv/plankifier/predict.py
    outputs:
      - labels:
          - submit-daemon
        src: ~kiso/kiso-plankifier-experiment/output/count.txt
        dst: ./
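
The example above ties its sections together through labels: machines under sites carry labels, and the software, deployment, and experiments sections refer to those labels. A minimal sketch of a cross-reference check follows. It assumes the configuration has already been loaded into a Python dict, and the helper names `collect_labels` and `undefined_labels` are hypothetical, not part of Kiso. The `kiso check` command is the supported way to check a configuration, and this sketch does not claim to reproduce what it does.

```python
# Hypothetical helpers for illustration only; not part of Kiso's API.
def collect_labels(node):
    """Recursively gather every string listed under a label-bearing key."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key in ("labels", "submit_node_labels"):
                found.update(value)
            else:
                found |= collect_labels(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_labels(item)
    return found


def undefined_labels(config):
    """Return labels referenced outside 'sites' that no site defines."""
    defined = collect_labels(config.get("sites", []))
    referenced = set()
    for section in ("software", "deployment", "experiments"):
        referenced |= collect_labels(config.get(section, {}))
    return referenced - defined


# A trimmed-down version of the example, written as a Python dict.
config = {
    "sites": [
        {"kind": "chameleon-edge",
         "resources": {"machines": [{"labels": ["execute-edge-daemon"]}]}},
        {"kind": "chameleon",
         "resources": {"machines": [{"labels": [
             "central-manager-daemon", "submit-daemon", "execute-cloud-daemon"]}]}},
    ],
    "software": {"apptainer": {"labels": ["execute-cloud-daemon"]}},
    "deployment": {"htcondor": [
        {"kind": "central-manager", "labels": ["central-manager-daemon"]},
        {"kind": "submit", "labels": ["submit-daemon"]},
    ]},
    "experiments": [{"kind": "pegasus", "submit_node_labels": ["submit-daemon"]}],
}

print(undefined_labels(config))
```

An empty result means every referenced label is attached to at least one machine. A misspelled label in any other section would show up in the returned set.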

Schema#

Kiso experiment configuration#

type

object

properties

  • name

A suitable name for the experiment

type

string

  • sites

Define all the resources to be provisioned

type

array

items

Site Definition

minItems

1

  • software

Software to be installed on the resources

type

object

properties

  • apptainer

Apptainer Software Configuration

  • docker

Docker Software Configuration

  • ollama

Ollama Software Configuration

additionalProperties

False

  • deployment

Workload management system to be installed on the resources

type

object

properties

  • htcondor

HTCondor Deployment Configuration

additionalProperties

False

  • experiments

Define all the experiments to be executed

type

array

items

Experiment Definition

minItems

1

additionalProperties

False

Site Definition#

Experiment Definition#

htcondor#

Specify how and on which resources HTCondor should be installed

type

object

properties

  • kind

Specify which resource will have the central manager and its configuration

type

string

enum

central-manager, execute, submit, personal

  • labels

Labels Schema

  • config_file

type

string

additionalProperties

False

Vagrant Configuration Schema#

type

object

properties

  • backend

Default VM hypervisor to use (default: libvirt)

type

string

enum

libvirt, virtualbox

  • box

Base image to use (default: generic/debian11)

type

string

  • user

SSH user to use (default: root)

type

string

  • name_prefix

Prepend this prefix to box names

type

string

  • config_extra

Extra config to pass (in vagrant DSL)

type

string

  • resources

Vagrant Resource

type

object

properties

  • networks

type

array

items

Vagrant Network

uniqueItems

True

  • machines

type

array

items

Vagrant Compute

additionalProperties

False

  • kind

const

vagrant

additionalProperties

False

Vagrant Network#

type

object

properties

  • cidr

type

string

  • roles

Labels Schema

A list of labels that identify the resources. The values are strings that can’t start with ‘kiso.’ and can contain alphanumeric characters, dots, underscores, and hyphens

type

array

items

type

string

pattern

^(?!kiso\.)[a-zA-Z0-9._-]+$

minItems

1

uniqueItems

True

additionalProperties

False
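
The negative lookahead in the pattern above is what reserves the `kiso.` prefix. A minimal sketch with Python's `re` module shows its effect. It assumes the pattern is meant to be read with a single escaped dot, and `is_valid_label` is a hypothetical helper name used only for illustration.

```python
import re

# Pattern taken from the schema, read with a single escaped dot (assumption).
LABEL_RE = re.compile(r"^(?!kiso\.)[a-zA-Z0-9._-]+$")


def is_valid_label(label: str) -> bool:
    """Return True when the label matches the schema pattern."""
    return LABEL_RE.match(label) is not None


for candidate in ["execute-edge-daemon", "kiso.internal", "kisobox", "bad label"]:
    print(candidate, is_valid_label(candidate))
```

Only the literal prefix `kiso.` is rejected, so a label such as `kisobox` still passes. The space in `bad label` fails the character class rather than the lookahead.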

Vagrant Compute#

type

object

properties

  • backend

VM hypervisor to use

type

string

enum

libvirt, virtualbox

  • box

Base image to use

type

string

  • user

SSH user to use

type

string

  • config_extra_vm

Extra config to pass (in vagrant DSL)

type

string

  • roles

Labels Schema

A list of labels that identify the resources. The values are strings that can’t start with ‘kiso.’ and can contain alphanumeric characters, dots, underscores, and hyphens

type

array

items

type

string

pattern

^(?!kiso\.)[a-zA-Z0-9._-]+$

minItems

1

uniqueItems

True

  • number

type

number

  • name_prefix

type

string

  • flavour

type

string

enum

tiny, small, medium, big, large, extra-large

  • flavour_desc

Vagrant Flavour

additionalProperties

False

Vagrant Flavour#

type

object

properties

  • core

type

number

  • mem

type

number

additionalProperties

False

Chameleon Configuration Schema#

type

object

properties

  • resources

#/resources

  • key_name

type

string

  • image

type

string

  • user

type

string

  • allocation_pool

#/os_allocation_pool

  • configure_network

type

boolean

  • dns_nameservers

type

array

items

type

string

  • gateway

type

boolean

  • gateway_user

type

string

  • network

#/os_network

  • subnet

#/os_subnet

  • prefix

type

string

  • kind

const

chameleon

additionalProperties

True

Chameleon Edge Configuration Schema#

type

object

properties

  • lease_name

type

string

  • rc_file

type

string

  • walltime

walltime in HH:MM format

type

string

format

walltime

  • resources

#/resources

  • kind

const

chameleon-edge

additionalProperties

True
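
The schema describes `walltime` as HH:MM, while the example configuration earlier on this page uses "04:00:00". The exact grammar of the `walltime` format is not spelled out here. The sketch below therefore assumes that a seconds field is optional and converts either form to seconds. `walltime_to_seconds` is a hypothetical helper, not a Kiso function.

```python
import re

# Assumption: hours and minutes are required, seconds are optional.
WALLTIME_RE = re.compile(r"^(\d{1,3}):([0-5]\d)(?::([0-5]\d))?$")


def walltime_to_seconds(walltime: str) -> int:
    """Convert 'HH:MM' or 'HH:MM:SS' to a number of seconds."""
    match = WALLTIME_RE.match(walltime)
    if match is None:
        raise ValueError(f"not a walltime: {walltime!r}")
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds or 0)


print(walltime_to_seconds("04:00:00"))
print(walltime_to_seconds("24:00"))
```

Quoting the value in YAML, as the example does, keeps it a string. Some YAML parsers would otherwise read an unquoted value such as 04:00:00 as a number.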

FABRIC Configuration Schema#

type

object

properties

  • rc_file

type

string

  • walltime

Walltime in HH:MM format. Defaults to 24:00

type

string

format

walltime

  • site

Name of the site to deploy the node on. Defaults to UCSD.

type

string

  • image

Base image to use (default: default_rocky_8)

type

string

  • name_prefix

Prefix to use for the name of the nodes. Default: fabric

type

string

  • resources

FABRIC Resource

type

object

properties

  • networks

type

array

items

FABRIC Network

minItems

1

uniqueItems

True

  • machines

type

array

items

FABRIC Compute

additionalProperties

False

  • kind

const

fabric

additionalProperties

False

FABRIC Network#

type

object

properties

  • roles

Labels Schema

A list of labels that identify the resources. The values are strings that can’t start with ‘kiso.’ and can contain alphanumeric characters, dots, underscores, and hyphens

type

array

items

type

string

pattern

^(?!kiso\.)[a-zA-Z0-9._-]+$

minItems

1

uniqueItems

True

  • name

type

string

minLength

2

oneOf

FABRIC Fabnetv4 Network

FABRIC Fabnetv6 Network

FABRIC Fabnetv4Ext Network

FABRIC Fabnetv6Ext Network

FABRIC L2Bridge Network

FABRIC L2STS Network

FABRIC Fabnetv4 Network#

type

object

properties

  • kind

const

FABNetv4

  • site

type

string

minLength

3

  • nic

FABRIC NIC Component

FABRIC Fabnetv6 Network#

type

object

properties

  • kind

const

FABNetv6

  • site

type

string

minLength

3

  • nic

FABRIC NIC Component

FABRIC Fabnetv4Ext Network#

type

object

properties

  • kind

const

FABNetv4Ext

  • site

type

string

minLength

3

  • nic

FABRIC NIC Component

FABRIC Fabnetv6Ext Network#

type

object

properties

  • kind

const

FABNetv6Ext

  • site

type

string

minLength

3

  • nic

FABRIC NIC Component

FABRIC L2Bridge Network#

type

object

properties

  • kind

const

L2Bridge

  • site

type

string

minLength

3

  • cidr

type

string

format

ip

  • nic

FABRIC NIC Component

FABRIC L2STS Network#

type

object

properties

  • kind

const

L2STS

  • site_1

type

string

minLength

3

  • site_2

type

string

minLength

3

  • cidr

type

string

format

ip

  • nic

FABRIC NIC Component

FABRIC Compute#

type

object

properties

  • site

Name of the site to deploy the node on. Defaults to UCSD.

type

string

  • image

Base image to use

type

string

  • gpus

type

array

items

FABRIC GPU Component

minItems

1

  • storage

type

array

items

FABRIC Storage Component

minItems

1

  • roles

Labels Schema

A list of labels that identify the resources. The values are strings that can’t start with ‘kiso.’ and can contain alphanumeric characters, dots, underscores, and hyphens

type

array

items

type

string

pattern

^(?!kiso\.)[a-zA-Z0-9._-]+$

minItems

1

uniqueItems

True

  • number

type

number

  • flavour

type

string

enum

tiny, small, medium, big, large, extra-large

  • flavour_desc

FABRIC Flavour

additionalProperties

False

FABRIC GPU Component#

type

object

properties

  • model

enum

TeslaT4, RTX6000, A30, A40

FABRIC Storage Component#

type

object

oneOf

properties

  • kind

const

NVME

  • model

enum

P4510

  • mount_point

type

string

minLength

2

properties

  • name

type

string

minLength

2

  • kind

const

Storage

  • model

enum

NAS

  • auto_mount

type

boolean

default

False

FABRIC NIC Component#

type

object

properties

  • name

type

string

minLength

2

oneOf

properties

  • kind

const

SharedNIC

  • model

enum

ConnectX-6

properties

  • kind

const

SmartNIC

  • model

enum

ConnectX-5, ConnectX-6

FABRIC Flavour#

type

object

properties

  • core

Number of cores in the node. Default: 2 cores

type

integer

  • mem

Amount of RAM in the node. Default: 8 GB

type

integer

  • disk

Amount of disk space in the node. Default: 10 GB

type

integer

additionalProperties

False

Apptainer Software Configuration#

Specify on which resources the Apptainer runtime should be installed

type

object

properties

  • labels

Labels Schema

  • version

type

string

additionalProperties

False

Docker Software Configuration#

Specify on which resources the Docker runtime should be installed

type

object

properties

  • labels

Labels Schema

  • version

type

string

additionalProperties

False

Ollama Software Configuration#

type

array

items

Ollama Configuration

minItems

1

Ollama Configuration#

Specify on which resources the Ollama service should be installed and what models should be pulled

type

object

properties

  • labels

Labels Schema

  • models

A list of Ollama models to be installed

type

array

items

type

string

minItems

1

  • environment

Variables Schema

additionalProperties

False

HTCondor Deployment Configuration#

Specify how and on which resources HTCondor should be installed

type

array

items

HTCondor Daemon Configuration

minItems

1

HTCondor Daemon Configuration#

Specify how and on which resources HTCondor should be installed

type

object

properties

  • kind

Specify which resource will have the central manager and its configuration

type

string

enum

central-manager, execute, submit, personal

  • labels

Labels Schema

  • config_file

type

string

additionalProperties

False

Shell Experiment Schema#

type

object

properties

  • kind

const

shell

  • name

A suitable name for the experiment

type

string

  • description

A description of the experiment

type

string

  • scripts

Define all scripts to be executed on the remote machine

type

array

items

#/$defs/script

  • outputs

Define all output files to be copied from the remote machine

type

array

items

#/$defs/location

additionalProperties

False

Pegasus Workflow Experiment Schema#

type

object

properties

  • kind

const

pegasus

  • name

A suitable name for the experiment

type

string

  • description

A description of the experiment

type

string

  • count

The number of times the experiment should be run

type

integer

minimum

1

default

1

  • main

A script that executes the experiment

type

string

  • args

A list of arguments to be passed to the main script

type

array

items

type

string

  • poll_interval

Checks the status of the experiment every poll_interval seconds

type

integer

default

60

  • timeout

If the experiment takes longer than timeout seconds, it is considered failed

type

integer

default

600

  • inputs

Define all input files to be copied to the remote machine

type

array

items

#/$defs/location

  • setup

Define all setup scripts to be executed on the remote machine

type

array

items

setup

  • submit_node_labels

Labels Schema

  • post_scripts

Define all scripts to be executed after the experiment

type

array

items

setup

  • outputs

Define all output files to be copied from the remote machine

type

array

items

#/$defs/location

additionalProperties

False
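
The `poll_interval` and `timeout` properties together describe a polling loop: check the status at a fixed interval and give up once the deadline passes. A minimal sketch of that pattern follows. `wait_for_experiment`, `is_done`, and `FakeClock` are hypothetical names, and Kiso's actual implementation may differ. The clock and sleep functions are injectable so the loop can be exercised without waiting.

```python
import time


def wait_for_experiment(is_done, poll_interval=60, timeout=600,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll is_done() every poll_interval seconds.

    Returns True as soon as is_done() reports completion, or False once
    timeout seconds have elapsed without completion.
    """
    deadline = clock() + timeout
    while True:
        if is_done():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)


class FakeClock:
    """Simulated time source so the example runs instantly."""

    def __init__(self):
        self.now = 0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


clock = FakeClock()
finished = wait_for_experiment(lambda: clock.now >= 180,
                               sleep=clock.sleep, clock=clock.time)
print(finished, clock.now)
```

With the schema defaults of 60 and 600 seconds, a workflow that needs longer than ten minutes would be reported as failed under this reading, so long-running workflows would need a larger `timeout`.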

setup#

Labels Schema#

A list of labels that identify the resources. The values are strings that can contain alphanumeric characters, dots, underscores, and hyphens

type

array

items

type

string

pattern

^[a-zA-Z0-9._-]+$

minItems

1

uniqueItems

True
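
Beyond the per-item pattern, the Labels Schema also constrains the list itself: at least one item and no duplicates. A minimal sketch that checks all three rules in plain Python follows. `label_list_errors` is a hypothetical helper. A JSON Schema validator applied to the real schema would be the authoritative check.

```python
import re

LABEL_RE = re.compile(r"^[a-zA-Z0-9._-]+$")


def label_list_errors(labels):
    """Return a list of human-readable problems; empty means valid."""
    if not isinstance(labels, list):
        return ["labels must be an array"]
    errors = []
    if len(labels) < 1:
        errors.append("at least one label is required (minItems: 1)")
    for label in labels:
        if not isinstance(label, str) or not LABEL_RE.match(label):
            errors.append(f"invalid label: {label!r}")
    # Only strings are compared for uniqueness; other types are already reported.
    strings = [label for label in labels if isinstance(label, str)]
    if len(set(strings)) != len(strings):
        errors.append("labels must be unique (uniqueItems: true)")
    return errors


print(label_list_errors(["submit-daemon", "execute-cloud-daemon"]))
print(label_list_errors(["submit-daemon", "submit-daemon"]))
```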

Variables Schema#

A map of variable names to values. Variable names can contain alphanumeric characters and underscores

type

object

patternProperties

  • ^[a-zA-Z0-9_]+$

oneOf

type

string

type

integer

type

number

additionalProperties

False
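
A minimal sketch of the same rules in plain Python follows: names must match the pattern, and values must be a string, an integer, or a number. `variable_errors` is a hypothetical helper, and the sketch simply treats the three value types as alternatives. It is a simplified reading of the schema, not a replacement for a JSON Schema validator.

```python
import re

NAME_RE = re.compile(r"^[a-zA-Z0-9_]+$")


def variable_errors(variables):
    """Return a list of problems found in a name-to-value mapping."""
    errors = []
    for name, value in variables.items():
        if not isinstance(name, str) or not NAME_RE.match(name):
            errors.append(f"invalid variable name: {name!r}")
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, (str, int, float)):
            errors.append(f"unsupported value for {name!r}: {value!r}")
    return errors


# Variable names below are made up for illustration.
print(variable_errors({"HTTP_PORT": 8080, "MODE": "fast", "RATIO": 0.5}))
print(variable_errors({"bad-name": "x", "FLAGS": ["a", "b"]}))
```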

Shell Script Schema#

type

object

properties

  • labels

Labels Schema

  • executable

The executable (shebang) to be used to run the script

type

string

default

/bin/bash

  • script

The script to be executed

type

string

additionalProperties

False
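
One plausible reading of `executable` is that it becomes the shebang line placed in front of the `script` body. The sketch below illustrates that reading only. `render_script` is a hypothetical helper, and how Kiso actually invokes the script is not shown on this page.

```python
def render_script(entry):
    """Compose the script text, falling back to the schema default executable."""
    executable = entry.get("executable", "/bin/bash")  # default from the schema
    return f"#!{executable}\n{entry['script']}"


# Setup entry taken from the example configuration on this page.
entry = {
    "labels": ["submit-daemon"],
    "script": "chmod +x workflow.py\n",
}
print(render_script(entry))
```

Setting `executable` to another interpreter path would, under this reading, let the same `script` field hold code in another language.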

File Upload/Download Location Schema#

type

object

properties

  • labels

Labels Schema

  • src

The src file to be copied

type

string

  • dst

The destination to which src should be copied. This must be a directory

type

string

additionalProperties

False
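
Because `dst` must be a directory, the copied file presumably keeps its base name inside that directory. This matches the example configuration, where `bin/train.py` is copied to `/srv/plankifier/` and later referenced as `/srv/plankifier/train.py`. A minimal sketch of that path resolution follows, with `destination_path` as a hypothetical helper name.

```python
from pathlib import PurePosixPath


def destination_path(src: str, dst: str) -> str:
    """Join the destination directory with the base name of the source file."""
    return str(PurePosixPath(dst) / PurePosixPath(src).name)


print(destination_path("bin/train.py", "/srv/plankifier/"))
print(destination_path("bin/predict.py", "/srv/plankifier/"))
```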