Fuzzball Documentation

Fuzzfile Syntax Guide

Workflows are described by Fuzzfiles, which are YAML 1.2 files (with the restriction that a single file cannot contain multiple YAML documents) with specific fields and syntax used to define them. Below is a hierarchical tree of the fields that can be used to define a Fuzzfile, organized as they should appear.

# $ fuzzball workflow validate workflow-syntax-example.yaml
# This example, named "workflow-syntax-example.yaml", has been validated.

version: v1
volumes:
  ephemeral-volume-name:
    type: EPHEMERAL
    ingress:
      - source:
          uri: s3://bucket-name-1/information.tar.gz
          secret: secret://account/ACCOUNT_S3_SECRET
        destination:
          uri: file://inputs/information.tar.gz
      - source:
          uri: https://website.com/data/other-information.tar.gz
        destination:
          uri: file://other-information.tar.gz
    egress:
      - source:
          uri: file://results/new-information.tar.gz
        destination:
          uri: s3://bucket-name-1/new-information.tar.gz
          secret: secret://account/ACCOUNT_S3_SECRET
      - source:
          uri: file://other-new-data.tar.gz
        destination:
          uri: s3://bucket-name-2/other-new-data.tar.gz
          secret: secret://account/ACCOUNT_S3_SECRET
jobs:
  one:
    image:
      uri: docker://your.registry.com/path/to/container-one:stable
      secret: secret://account/ACCOUNT_REGISTRY_SECRET
    command: [python3, script.py, /data/inputs/information.tar.gz]
    mounts:
      ephemeral-volume-name:
        location: /data
    policy:
      timeout:
        execute: 2h
      retry:
        attempts: 1
    resource:
      annotations:
        gpu_compute_capability: sm80
        cpu_arch: epyc
      cpu:
        cores: 32
        affinity: NUMA
        sockets: 1
        threads: false
      memory:
        size: 92GB
        by-core: false
      devices:
        nvidia.com/gpu: 1
        cerebras.net/wse2: 1
  two-a:
    image:
      uri: docker://your.registry.com/path/to/container-two-a:stable
      secret: secret://account/ACCOUNT_REGISTRY_SECRET
    command: [tar, -xf, --no-replace-root, /data/other-information.tar.gz]
    requires: [one]
    mounts:
      ephemeral-volume-name:
        location: /data
  two-b:
    image:
      uri: docker://alpine:latest
    command: ['/bin/sh', '-c', 'software-executable --command-line-option /data/other-information']
    requires: [one]
    mounts:
      ephemeral-volume-name:
        location: /data
    resource:
      cpu:
        cores: 2
        affinity: CORE
        sockets: 1
        threads: false
      memory:
        size: 6GB
        by-core: false
      devices:
        nvidia.com/gpu: 1
    multinode:
      implementation: openmpi
      nodes: 2
  three:
    image:
      uri: docker://alpine:latest
    command:
      [mpirun, -np, 8, mpi-program, /data/newly-generated-directory/outputs-from-prior-workflow-job]
    requires: [two-a, two-b]
    mounts:
      ephemeral-volume-name:
        location: /data
    resource:
      cpu:
        cores: 2
        affinity: NUMA
        sockets: 1
        threads: false
      memory:
        size: 10GB
        by-core: false
    task-array:
      start: 1
      end: 1000
      concurrency: 50
  four:
    image:
      uri: docker://your.registry.com/path/to/container-three:stable
      secret: secret://account/ACCOUNT_REGISTRY_SECRET
    command:
      [
        '/bin/sh',
        '-c',
        'mkdir -p /data/results && tar -czvf /data/results/new-information.tar.gz /data/outputs/ && tar -czvf /data/other-new-data.tar.gz /data/newly-generated-directory/outputs-from-prior-workflow-job',
      ]
    requires: [three]
    mounts:
      ephemeral-volume-name:
        location: /data
  • version: Denotes the workflow syntax version.

  • volumes: Volumes denote the storage volumes that jobs can interact with. This section contains the specifications for each volume as well as all the workflow data ingress and egress.

    • <volume name>: A custom name for the volume that is used to refer to it in the mounts: section of workflow jobs. Could be, for example, something like v1: or data:.

      • type: Declares the type of the volume being created. There is currently one possible type: EPHEMERAL. An ephemeral volume persists for the life of the workflow run - anything not egressed from an ephemeral volume by the end of a workflow will be lost, as the volume is destroyed when the workflow ends. Ephemeral volumes are well suited to cases where data is stored somewhere other than on-prem storage and needs to be downloaded first - as when using S3 or a similar cloud object storage provider.
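
At its smallest - assuming ingress: and egress: are optional, which the field hierarchy suggests but does not state outright - an ephemeral volume definition could be as simple as the following sketch (the name scratch is illustrative):

volumes:
  scratch:
    type: EPHEMERAL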

      • ingress: Declares the section that defines data movement into the workflow. ingress is mostly used to declare locations from which data will be downloaded into an ephemeral volume. A good example of an ingress is pulling a dataset hosted on S3 into a workflow being managed by Fuzzball.

        • source: Begins the description of the source of a file being ingressed.

          • uri: The actual path to the data being moved into the workflow via the ingress. Can be of the form s3:// or http[s]://, ex. s3://bucket-name/information.tar.gz or https://website.com/data/information.tar.gz.

          • secret: Part of the secrets/credentials functionality; describes the access keys needed to access the endpoint.

        • destination: Begins the description of a destination on the given workflow storage volume for a file being ingressed.

          • uri: The path on the volume where the ingressed data should be placed. Should be of the form file://, ex. file://inputs/information.tar.gz. This, prepended with the mounts: location specified in the jobs, gives the location of the data inside the workflow job container. For example, a file information.tar.gz placed into an inputs folder (as shown above with file://inputs/information.tar.gz) on the data volume being accessed by a job that mounts the volume at /data would be found in the container at /data/inputs/information.tar.gz.
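
To make that path mapping concrete, here is a minimal, hypothetical pairing (the volume name data and job name step-one are illustrative) showing how an ingress destination combines with a job's mount location:

volumes:
  data:
    type: EPHEMERAL
    ingress:
      - source:
          uri: https://website.com/data/information.tar.gz
        destination:
          uri: file://inputs/information.tar.gz # path on the volume
jobs:
  step-one:
    image:
      uri: docker://alpine:latest
    mounts:
      data:
        location: /data # the volume is mounted here
    # inside the container: /data/inputs/information.tar.gz
    command: [ls, -l, /data/inputs/information.tar.gz]
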
      • egress: Declares the section that defines data movement out of the workflow. When the workflow completes and all jobs are done, the egress triggers as the last thing the workflow does before destroying the data volume, offloading data from the specified locations on the volume to the specified external destinations.

        • source: Begins the description of the source of a file being egressed.

          • uri: The path on the volume to the data being moved out of the workflow via the egress. Should be of the form file://, ex. file://results/new-information.tar.gz.

        • destination: Begins the description of the destination of a file being egressed.

          • uri: The external path that the egressed data will be delivered to. Can be of the form s3://, ex. s3://bucket-name/new-information.tar.gz.

          • secret: Part of the secrets/credentials functionality; describes the access keys needed to access the endpoint.

  • jobs: This is where the actual compute jobs that the workflow will perform are defined. A workflow can contain as many jobs as needed, or as few as a single one, and jobs can be used to accomplish and run all kinds of tasks. Jobs can be created for things like untarring some ingressed data, running scientific software directly as it would be run on the command line, or even calling scripts in Python or other languages that perform their own processing or runs of software.

    • <job name>: An arbitrary job name. This name is what shows up as the job's name under commands like fuzzball workflow status. Can be anything - untar-data, run-training-alexnet, build-gene-database, etc. - as needed to describe what the job is doing.

      • image: Begins the description of the container image a job will use.

        • uri: Defines where the container image the job will run in is located. Supported schemes are oras:// for Apptainer (SIF) images and docker:// for Docker/OCI containers. The image contains all the software that the job using it needs - which could be scientific software, compilers/interpreters, MPI/other parallelism suites, etc.

        • secret: Part of the secrets/credentials functionality that describes credentials needed to access the registry hosting the container, if necessary. For example, privately hosted registries, GitLab or similar registries, or pulling certain containers from Docker Hub or the NVIDIA NGC may all require credentials to be specified.

      • command: The actual command that a job will run. This could be, for example, a Bash shell command, a script, or a call to an executable for a piece of scientific software. The command will be executed in the job container specified in the image: section, with other constraints applied as specified in the other job subfields. Examples of common command field patterns are command: [python3, script.py], command: ["/bin/sh", "-c", "ls && tar -xf --no-replace-root information.tar.gz"], or command: [mpi-program, -in, /path/to/inputs.txt].
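
As an illustration of the two common command forms (the job and file names here are placeholders): the exec-style list runs the program directly with each element as one argument, while wrapping with /bin/sh -c hands a single string to a shell, enabling features like && chaining:

jobs:
  exec-form:
    image:
      uri: docker://alpine:latest
    # no shell involved; each list element is one argument
    command: [tar, -xf, /data/inputs/information.tar.gz, -C, /data/inputs]
  shell-form:
    image:
      uri: docker://alpine:latest
    # the whole command line is a single string interpreted by /bin/sh
    command: ['/bin/sh', '-c', 'cd /data/inputs && tar -xf information.tar.gz && ls']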

      • multinode: How a job should implement multinode parallelism through MPI or PGAS/GASNet. More info can be found in the Fuzzball Workflow MPI Guide. A sketch follows the subfields below.

        • implementation: The specific multinode wrapper implementation that Fuzzball should use. There are two MPI-based options, Open MPI (openmpi) and MPICH (mpich), and one PGAS/GASNet option (gasnet).

        • nodes: How many nodes matching the job’s specified resource requirements the multinode job should spin up.
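
A minimal multinode sketch (the program name mpi-program is a placeholder and the resource figures are arbitrary; each requested node matches the job's resource: request):

jobs:
  mpi-job:
    image:
      uri: docker://alpine:latest
    command: [mpirun, -np, 8, mpi-program, /data/inputs.txt]
    multinode:
      implementation: openmpi # or mpich / gasnet
      nodes: 2                # two nodes, each matching the resource: request
    resource:
      cpu:
        cores: 4
        affinity: NUMA
        sockets: 1
        threads: false
      memory:
        size: 8GB
        by-core: false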

      • task-array: How a job should implement embarrassingly parallel workloads. More information can be found in the Fuzzball Task Arrays guide. A sketch follows the subfields below.

        • start: The starting task ID of the task array.

        • end: The ending task ID of the task array.

        • concurrency: The number of Compute nodes that should be used to process the tasks in the task array.
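
As a sketch, the task-array block from the example above describes 1000 tasks (IDs 1 through 1000, inclusive) spread across 50 compute nodes, so on average each node works through about 1000 / 50 = 20 tasks over the life of the array:

task-array:
  start: 1        # first task ID
  end: 1000       # last task ID (inclusive): 1000 tasks in total
  concurrency: 50 # up to 50 compute nodes process tasks at once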

      • mounts: This specifies where the job will access the volumes defined previously in the workflow file.

        • <volume name>: The name of the volume as previously defined in the volumes section.

          • location: The path inside the job container that the mount will be attached at. For example, a file information.tar.gz placed into an inputs folder (ex. with file://inputs/information.tar.gz) on a volume named data, accessed by a job that mounts that volume at /data, would be found in the container at /data/inputs/information.tar.gz.
      • policy: Defines specific rules and parameters related to how the job should run. A sketch follows the subfields below.

        • timeout: Begins the timeout description for the job.

          • execute: How long the job should be allowed to run before failing due to timeout. Can be specified in units such as minutes (e.g. 30m) or hours (e.g. 2h, as in the example above).
        • retry: Rules related to rerunning a workflow step in case of failure.

          • attempts: The number of retry attempts to perform before failing the step completely.
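
Putting the two together, a sketch of a policy block (the values are placeholders, not recommendations):

policy:
  timeout:
    execute: 30m # fail the job if it runs longer than 30 minutes
  retry:
    attempts: 3  # retry up to 3 times before failing the step completely
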
      • resource: Defines the resources the job will obtain before it runs. Resources are meant to be specified based on the context the workflow is running in and may be shared between jobs in that context. Fuzzball’s internal scheduler and resource manager will find where the desired resources are available, obtain them, and then start running the job on them. If the specified resources are unavailable, the workflow will halt at the job until resources are available. This will be reflected in the fuzzball workflow status output as a job waiting in the Pending state.

        • annotations: Sets specific, custom attributes of the desired compute node. Essentially a custom way to select nodes by attributes like CPU architecture, GPU compute capability, or specialized hardware accelerators. For example, knights-landing, sm80, or gaudi.

        • cpu: Specifies how the job will utilize CPUs. A sketch follows the subfields below.

          • cores: How many CPU cores the job will attempt to claim when it starts.

          • affinity: Whether the workflow will attempt to claim any CPU core on a node (CORE), CPUs from the same socket on a node (SOCKET), or CPUs within the same NUMA (Non-uniform memory access) domain (NUMA). NUMA is a system architecture that co-locates each CPU core (or group of CPU cores, potentially) with a part of the overall system memory that the co-located core(s) have a physically and logically shorter path to, which can lead to drastic speedups for some workloads. A “socket” is a physical interface on the motherboard of the compute node that a CPU can fit into; if there is more than one socket, then the node can have more than one CPU. For example, a node may have two sockets, each with an 8 core CPU in them, for a total of 16 CPU cores on the system. If affinity is set to CORE, then the workflow will attempt to claim the requested number of cores from any CPU on the system, regardless of socket or NUMA domain. If affinity is set to SOCKET, then it will attempt to claim the requested number of cores from those available on a single CPU in a single socket. If affinity is set to NUMA, then it will attempt to minimize the number of utilized NUMA nodes - which typically also means minimizing the number of utilized sockets.

          • sockets: The number of sockets that the job should try to use.

          • threads: Whether or not hardware threads should be used. Hyper-threading allows two threads to run on each core, but has varied results with high performance computing workloads; some see a performance increase, but many see no speedup or even a performance decrease.
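
A sketch of a cpu block (the figures are placeholders) requesting eight cores packed onto a single socket with hardware threads disabled:

cpu:
  cores: 8         # claim 8 CPU cores
  affinity: SOCKET # all 8 cores must come from a single socket
  sockets: 1       # use at most one socket
  threads: false   # do not use hardware (hyper-) threads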

        • memory: Specifies how the job will use RAM. A sketch follows the subfields below.

          • size: The amount of RAM that the job will attempt to claim.

          • by-core: Whether the requested memory should be allocated per core or per node.
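
A sketch of a memory block, under the assumption that by-core: true makes size apply per claimed core (so a job requesting cores: 4 would claim 4 x 2GB = 8GB in total), while by-core: false makes size the total for the whole job:

memory:
  size: 2GB     # per-core amount when by-core is true (assumption)
  by-core: true # with cores: 4, this would total 8GB (assumption)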

        • devices: Specifies which and how many devices the job will use. A device request takes the form <vendor>/<type>: <n>; for example, to request an NVIDIA GPU:

          • nvidia.com/gpu: 1
      • requires: What other jobs have to complete before this job can start. This provides an easy-to-specify interface for setting up jobs that depend on the results of other jobs to run correctly - it defines the workflow as a directed acyclic graph (DAG), meaning a graph with no cycles (which would represent circular, impossible-to-fulfill job dependencies) and no backtracking; jobs can only flow forward through the path of execution. If a job has no dependencies defined with requires:, it will be treated as an entry point to the workflow and started when the workflow begins. A sketch follows below.
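
A sketch of a fan-out/fan-in dependency graph (job names and commands are placeholders): prepare runs first as the entry point, analyze-a and analyze-b run in parallel after it, and collect waits for both:

jobs:
  prepare:
    image:
      uri: docker://alpine:latest
    command: ['/bin/sh', '-c', 'echo preparing']
  analyze-a:
    image:
      uri: docker://alpine:latest
    command: ['/bin/sh', '-c', 'echo branch a']
    requires: [prepare]
  analyze-b:
    image:
      uri: docker://alpine:latest
    command: ['/bin/sh', '-c', 'echo branch b']
    requires: [prepare]
  collect:
    image:
      uri: docker://alpine:latest
    command: ['/bin/sh', '-c', 'echo collecting']
    requires: [analyze-a, analyze-b]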

      • network: Specifies parameters related to how a network namespace for the container should be established.

        • isolated: Creates an isolated network for the container to use, allowing the fuzzball workflow port-forward command to forward a workflow job container port back to the user's Fuzzball CLI. A bridge to the open internet is provided, so outbound connectivity is maintained, as in the default configuration.
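
Assuming isolated takes a boolean (an assumption based on the field name; consult the full network documentation for the exact form), a sketch might look like:

jobs:
  web-ui:
    image:
      uri: docker://alpine:latest
    command: ['/bin/sh', '-c', 'some-server --port 8080'] # placeholder server
    network:
      isolated: true # assumed boolean form; enables fuzzball workflow port-forward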