Fuzzball Documentation

Executing a BLAST Workflow

The Fuzzfile below uses BLAST tooling to fetch one set of sequences to query and another set to build a database, builds the database, and then runs the query. It writes its results to results/blastp.out, which can be saved through data egress to a path defined by an S3 URI (the egress section is commented out in the Fuzzfile; to enable it, uncomment it and fill in the destination URI and secret). The jobs in the workflow use the BLAST and edirect containers published by NCBI on Docker Hub.

version: v1
volumes:
  blast-volume:
    name: blast-volume
    reference: volume://user/ephemeral
    # egress:
    #   - source:
    #       uri: file://results/blastp.out
    #     destination:
    #       uri:
    #       secret:
jobs:
  create-dirs:
    image:
      uri: docker://ncbi/blast:2.12.0
    command: [mkdir, blastdb, queries, fasta, results, blastdb_custom]
    cwd: /data
    resource:
      cpu:
        cores: 1
        affinity: NUMA
      memory:
        size: 1GiB
    mounts:
      blast-volume:
        location: /data
  retrieve-query-sequence:
    image:
      uri: docker://ncbi/edirect:20.6
    command: ["/bin/sh", "-c", "efetch -db protein -format fasta -id P01349 > P01349.fsa"]
    cwd: /data/queries
    resource:
      cpu:
        cores: 1
        affinity: NUMA
      memory:
        size: 1GiB
    mounts:
      blast-volume:
        location: /data
    requires: [create-dirs]
  retrieve-database-sequences:
    image:
      uri: docker://ncbi/edirect:20.6
    command: ["/bin/sh", "-c", "efetch -db protein -format fasta -id Q90523,P80049,P83981,P83982,P83983,P83977,P83984,P83985,P27950 > nurse-shark-proteins.fsa"]
    cwd: /data/fasta
    resource:
      cpu:
        cores: 1
        affinity: NUMA
      memory:
        size: 1GiB
    mounts:
      blast-volume:
        location: /data
    requires: [create-dirs]
  make-blast-database:
    image:
      uri: docker://ncbi/blast:2.12.0
    command: ["/bin/sh", "-c", "makeblastdb -in /data/fasta/nurse-shark-proteins.fsa -dbtype prot -parse_seqids -out nurse-shark-proteins -title 'Nurse shark proteins' -taxid 7801 -blastdb_version 5"]
    cwd: /data/fasta
    resource:
      cpu:
        cores: 1
        affinity: NUMA
      memory:
        size: 1GiB
    mounts:
      blast-volume:
        location: /data
    requires: [retrieve-query-sequence, retrieve-database-sequences]
  run-blast:
    image:
      uri: docker://ncbi/blast:2.12.0
    command: [blastp, -num_threads, 8, -query, /data/queries/P01349.fsa, -db, /data/fasta/nurse-shark-proteins, -out, /data/results/blastp.out]
    cwd: /data
    resource:
      cpu:
        cores: 8
        affinity: NUMA
      memory:
        size: 30GiB
    mounts:
      blast-volume:
        location: /data
    requires: [make-blast-database]

You can run this workflow either through the GUI or the CLI.

The GUI instructions come first, followed by the CLI instructions.

If you click “Workflow Editor” and “Create New”, you will see a blank page in the workflow editor.

Fuzzball create new workflow section

Now you can either click the ellipses (...) menu in the lower right and select “Edit YAML” or simply press e on your keyboard. An editor with a Fuzzfile stub will appear.

Fuzzball editor newly opened

You can delete the current contents and paste in the workflow definition from above.

Fuzzball editor copied and pasted BLAST workflow

Pressing “save” returns you to the interactive workflow editor, which now shows the BLAST workflow graph instead of a blank page. The Fuzzball GUI automatically validates the YAML for syntax errors.

Fuzzball workflow editor with BLAST

Note that the two retrieve-* jobs depend only on create-dirs, so they can proceed in parallel if enough resources are available.
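The ordering implied by the requires fields can be sketched in a few lines of Python. This is only an illustration of the dependency graph, not a description of how Fuzzball schedules jobs: the dictionary is copied by hand from the Fuzzfile above, the helper name execution_waves is hypothetical, and real scheduling also depends on available resources.

```python
from graphlib import TopologicalSorter

# Job dependencies copied by hand from the `requires` fields of the Fuzzfile.
requires = {
    "create-dirs": [],
    "retrieve-query-sequence": ["create-dirs"],
    "retrieve-database-sequences": ["create-dirs"],
    "make-blast-database": ["retrieve-query-sequence", "retrieve-database-sequences"],
    "run-blast": ["make-blast-database"],
}


def execution_waves(deps):
    """Group jobs into waves; every job in a wave has all its dependencies met."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs that could start now
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave finished
    return waves


waves = execution_waves(requires)
for number, jobs in enumerate(waves, start=1):
    print(number, jobs)
```

The second wave contains both retrieve-* jobs, which is why they appear side by side in the workflow graph.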

Submitting your workflow to Fuzzball with the GUI is easy. Simply press the triangular “Start Workflow” button in the lower right corner of the workflow editor. You will be prompted to provide an optional descriptive name for your workflow.

Fuzzball name workflow and submit screen

Now you can click on “Start Workflow” in the lower right corner of the dialog box and your workflow will be submitted. If you click “Go to Status” you can view the workflow status page. The screenshot below shows the status page for a hello world workflow submission.

Fuzzball workflow status page

To retrieve logs produced by this workflow, select a job within the workflow such as make-blast-database, and click the “Logs” option on the right.

Fuzzball workflow dashboard showing logs from the make-blast-database

To run this workflow through the CLI you will need access to the Fuzzball CLI. You can install it using the Fuzzball CLI installation instructions.

First, create a Fuzzfile named blast.yaml with the contents above using the text editor of your choice.

You can start this workflow using the CLI by running the following command:

$ fuzzball workflow start blast.yaml
Workflow "8ae68827-4bce-45c6-ab0c-9f086a8052fb" started.
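If you script the submission, you will need the workflow UUID for the later describe and log commands. A minimal Python sketch follows, assuming the confirmation line always has the form shown in the sample above (the wording may differ between CLI versions); the helper name extract_workflow_id is hypothetical.

```python
import re

# Pattern for the confirmation line in the sample output above.
# Assumption: the CLI prints the UUID in double quotes followed by "started."
STARTED_LINE = re.compile(r'Workflow "([0-9a-f-]{36})" started\.')


def extract_workflow_id(cli_output):
    """Return the workflow UUID from the start command's output, or None."""
    match = STARTED_LINE.search(cli_output)
    return match.group(1) if match else None


sample = 'Workflow "8ae68827-4bce-45c6-ab0c-9f086a8052fb" started.'
workflow_id = extract_workflow_id(sample)
print(workflow_id)
```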

You can monitor the workflow’s status by running the following command:

$ fuzzball workflow describe <workflow uuid>
Name:      blast.yaml
Email:     bphan@ciq.co
UserId:    e554e134-bd2d-455b-896e-bc24d8d9f81e
Status:    STAGE_STATUS_FINISHED
Created:   2024-06-18 09:37:23AM
Started:   2024-06-18 09:37:23AM
Finished:  2024-06-18 09:44:21AM
Error:


Stages:
KIND     | STATUS   | NAME                                 | STARTED               | FINISHED
Workflow | Finished | 8ae68827-4bce-45c6-ab0c-9f086a8052fb | 2024-06-18 09:37:23AM | 2024-06-18 09:44:21AM
Volume   | Finished | blast-volume                         | 2024-06-18 09:37:24AM | 2024-06-18 09:37:45AM
Image    | Finished | docker://ncbi/blast:2.12.0           | 2024-06-18 09:37:24AM | 2024-06-18 09:41:20AM
Job      | Finished | create-dirs                          | 2024-06-18 09:41:35AM | 2024-06-18 09:41:40AM
Job      | Finished | retrieve-query-sequence              | 2024-06-18 09:41:56AM | 2024-06-18 09:42:04AM
Job      | Finished | retrieve-database-sequences          | 2024-06-18 09:41:58AM | 2024-06-18 09:42:06AM
Job      | Finished | make-blast-database                  | 2024-06-18 09:42:21AM | 2024-06-18 09:42:28AM
Job      | Finished | run-blast                            | 2024-06-18 09:43:38AM | 2024-06-18 09:43:45AM
File     | Finished | file://results/blastp.out ->         | 2024-06-18 09:44:00AM | 2024-06-18 09:44:05AM
         |          | s3://co-ciq-m...                     |
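To see where the time went, you can compute per-stage durations from the Stages table. The Python sketch below parses rows copied from the sample output above; it assumes the column layout and timestamp format shown there, and the helper name stage_durations is hypothetical.

```python
from datetime import datetime

# Rows copied from the sample `fuzzball workflow describe` output above.
SAMPLE_ROWS = """\
KIND     | STATUS   | NAME                                 | STARTED               | FINISHED
Image    | Finished | docker://ncbi/blast:2.12.0           | 2024-06-18 09:37:24AM | 2024-06-18 09:41:20AM
Job      | Finished | create-dirs                          | 2024-06-18 09:41:35AM | 2024-06-18 09:41:40AM
Job      | Finished | retrieve-query-sequence              | 2024-06-18 09:41:56AM | 2024-06-18 09:42:04AM
Job      | Finished | retrieve-database-sequences          | 2024-06-18 09:41:58AM | 2024-06-18 09:42:06AM
Job      | Finished | make-blast-database                  | 2024-06-18 09:42:21AM | 2024-06-18 09:42:28AM
Job      | Finished | run-blast                            | 2024-06-18 09:43:38AM | 2024-06-18 09:43:45AM
"""

# Timestamp format as it appears in the sample (12-hour clock, AM/PM suffix).
TIME_FORMAT = "%Y-%m-%d %I:%M:%S%p"


def stage_durations(table_text):
    """Map each stage name to its duration in seconds."""
    durations = {}
    for line in table_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 5:
            continue  # skip blank lines and wrapped continuation rows
        _kind, _status, name, started, finished = fields
        try:
            start = datetime.strptime(started, TIME_FORMAT)
            end = datetime.strptime(finished, TIME_FORMAT)
        except ValueError:
            continue  # skip the header row and rows without timestamps
        durations[name] = (end - start).total_seconds()
    return durations


durations = stage_durations(SAMPLE_ROWS)
for name, seconds in durations.items():
    print(f"{name}: {seconds:.0f}s")
```

In this sample run, each job took only a few seconds, while pulling the BLAST image took close to four minutes.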

You can view the output logged by the workflow with the fuzzball workflow log command, passing the workflow UUID and a job name. For example, the following command prints the logs from the make-blast-database job:

$ fuzzball workflow log <workflow uuid> make-blast-database
Building a new DB, current time: 06/18/2024 16:42:26
New DB name:   /data/fasta/nurse-shark-proteins
New DB title:  Nurse shark proteins
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 7 sequences in 0.244187 seconds.