HPC-T-Annotator

Interface Guide

Welcome to the user guide for the HPC-T-Annotator interface. This guide will help you understand and utilize the features of the interface to configure and generate a customized software package for parallelization tasks involving annotation software. Follow the steps below to effectively use the interface.

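The web interface consists of two panels: an upper panel for the workload manager settings and a bottom panel for the alignment software settings.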

This guide shows two example configurations of the upper panel: one with the SLURM scheduler and one with no scheduler (None).

Step 1: Workload Manager Settings

Select a workload manager to configure settings:

Based on your selection, different configuration options will be displayed.

Base Configuration Settings

Configure basic settings when no workload manager is selected:

  • Job name: Specify the name of the job.
  • Number of processes: Set the desired number of processes. It must not exceed the number of sequences in the input file.
  • Threads: Specify the number of threads for processing.
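The limit on the number of processes can be checked before filling in the form. The sketch below counts the FASTA header lines (those starting with ">") and compares the count with the requested number of processes. The function names count_sequences and check_processes are illustrative and are not part of HPC-T-Annotator.

```shell
# Count the sequences in a FASTA file by counting header lines (">").
count_sequences() {
    grep -c '^>' "$1"
}

# Succeed only if the requested number of processes does not exceed
# the number of sequences in the input file.
check_processes() {
    fasta=$1
    processes=$2
    sequences=$(count_sequences "$fasta")
    if [ "$processes" -gt "$sequences" ]; then
        echo "too many processes: $processes > $sequences sequences" >&2
        return 1
    fi
    echo "ok: $processes processes for $sequences sequences"
}
```

For example, check_processes /path/to/input.fa 10 succeeds only if the file holds at least 10 sequences.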

Below is a screenshot as an example of the SLURM settings.

HPC-T-Annotator example

In the case where 'None' is selected in the upper panel, the parameters to be set are the following:

screenshot

Step 2: Alignment Software Settings

In the bottom panel, the first step is to select the aligner to use (BLAST or DIAMOND). Once the software is selected, the following parameters need to be filled in the form:

  • Select an annotation tool: Choose either BLAST or DIAMOND as the annotation tool.
  • Choose a tool: Select the specific tool to use (e.g., Blastp or Blastx).
  • Database absolute path: Provide the absolute path to the reference database.
  • Input file absolute path: Specify the absolute path to the input FASTA or Multi-FASTA file for annotation.
  • Outdir absolute path: Specify the absolute path to the output directory.
  • Binary absolute path: Enter the absolute path to the annotation software binary.
  • Outformat: Define the output format for annotation results using double quotes (e.g., "6 qseqid sseqid...").
  • Additional options: Enter any extra parameters required by the annotation tool.

Note that the Additional options field accepts only options related to the computation; options that set the number of threads or the input/output file names are not accepted. The options -p and -o for DIAMOND, and -out and -num_threads for BLAST, are therefore not accepted. Enter the options as you would on the command line, separated by spaces. For example, you can specify "-evalue 1e-5 -max_target_seqs 10" for a BLAST execution.
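The restriction on the Additional options field can be checked locally before submitting the form. The sketch below rejects only the four options named above; the function name check_additional_options is hypothetical, and how the interface itself validates the field is not described in this guide. Long-form spellings of the same options are not covered.

```shell
# Succeed if the option string contains none of the options that the
# guide lists as not accepted for the given tool ("blast" or "diamond").
check_additional_options() {
    tool=$1
    opts=$2
    case $tool in
        diamond) forbidden="-p -o" ;;
        blast)   forbidden="-out -num_threads" ;;
        *) echo "unknown tool: $tool" >&2; return 2 ;;
    esac
    (
        set -f  # disable globbing while the option string is split into words
        for word in $opts; do
            for bad in $forbidden; do
                if [ "$word" = "$bad" ]; then
                    echo "option not accepted: $word" >&2
                    exit 1
                fi
            done
        done
        exit 0
    )
}
```

For example, check_additional_options blast "-evalue 1e-5 -max_target_seqs 10" succeeds, while check_additional_options blast "-num_threads 4" fails.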

HPC-T-Annotator example

Generating the Software Package

After configuring the settings for both workload manager and annotation software, you can:

  • Click the Reset the Fields button to clear all entered data.
  • Click the Generate button to create the customized software package.

Once generated, you can download the software package in TAR format.

The following steps must be performed on the HPC cluster.

Extract the TAR archive and run

After downloading the TAR archive (and, if needed, uploading it to the HPC cluster), extract it and then run the start script.

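# Create the target folder first: tar -C requires an existing directory
mkdir -p /path/to/myfolder/hpc-t-annotator
tar -xf hpc-t-annotator.tar -C /path/to/myfolder/hpc-t-annotator
cd /path/to/myfolder/hpc-t-annotator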

Then, if you are on an HPC cluster with the SLURM workload manager:

sbatch start.sh

Otherwise, if your system has no workload manager:

bash start.sh
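If you are unsure which variant of the package you generated, the start script can be inspected before launching it. The sketch below assumes that a package generated for SLURM carries #SBATCH directives in start.sh; this guide does not confirm that, so treat it as an assumption. The function name pick_launcher is illustrative.

```shell
# Print the command to launch a start script with: "sbatch" if the
# script contains #SBATCH directives, "bash" otherwise.
# Assumption: SLURM packages carry #SBATCH lines in start.sh.
pick_launcher() {
    if grep -q '^#SBATCH' "$1"; then
        echo sbatch
    else
        echo bash
    fi
}
```

A possible use is "$(pick_launcher start.sh)" start.sh, run from the extracted package folder.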

Once the entire computation process has ended (check the general.log file for the status), the final result will be in the tmp/final_blast.tsv file.

Some useful monitoring commands

Check that all jobs are finished

Once all jobs have finished, you can inspect the log by running this command:

cat general.log

That will display something like this:

Starting timestamp#2023-08-05 15:22:02
Input file: ./input/input.fa
Processes: 300
Out-format: 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
Diamond: no
Tool: blastx
Binary: /g100/home/userexternal/larcioni/BLAST-2.14.0+/bin/blastx
Database: /g100_scratch/userexternal/larcioni/DATABASES/NR/blast/blast
Sequences: 36985
Average runtime: 8:35:38

Max runtime: 16:13:09

Min runtime: 3:36:23

Ending timestamp#2023-08-06 21:13:40
Total elapsed time: 05:51:38
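The log format shown above lends itself to simple scripted checks. The sketch below assumes that the "Ending timestamp#" line is written only when the whole run has completed, which the sample log suggests but this guide does not state. The function names run_finished and log_field are illustrative.

```shell
# Succeed if the log contains an "Ending timestamp#" line.
# Assumption: that line appears only once the whole run has completed.
run_finished() {
    grep -q '^Ending timestamp#' "$1"
}

# Print the value of the first "Key: value" line in the log,
# e.g. log_field general.log Sequences
log_field() {
    sed -n "s/^$2: //p" "$1" | head -n 1
}
```

For example, run_finished general.log && log_field general.log "Average runtime" prints the average runtime only after the run has ended.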

Check for errors

You can check for job errors by running these commands:

# For general errors
cat general.err
# For control script errors
cat control.err
# For specific job errors
cat tmp/*/general.err
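With many processes, printing every error file can be noisy. The sketch below lists only the error files that are not empty, using the file locations from the commands above. It assumes that an empty error file means no error was reported, which this guide does not state explicitly. The function name nonempty_error_files is illustrative.

```shell
# Print every error file under the given package folder that is non-empty.
nonempty_error_files() {
    dir=${1:-.}
    for f in "$dir"/general.err "$dir"/control.err "$dir"/tmp/*/general.err; do
        # -s is true only for existing files with size > 0; this also
        # skips the unexpanded glob pattern when no job folder exists.
        if [ -s "$f" ]; then
            echo "$f"
        fi
    done
}
```

Running nonempty_error_files from the package folder prints nothing when all error files are empty or missing.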

A short example showing a simple use case of HPC-T-Annotator on an HPC cluster.

Step-by-step example

Assume you have to run the BLASTX software on a transcriptome file input.fa against the NR database.

Let

  • /home/user/assembly/input.fa be the transcriptome file's absolute path and
  • /home/user/DATABASES/NR be the database's absolute path.

Suppose also that you have an HPC cluster, with the SLURM workload manager, at your disposal. Both the transcriptome file and the database (and the executable binary as well) are already available on the cluster. You can run HPC-T-Annotator on your cluster by generating all the necessary files through our interface, filling out the form as follows.

HPC-T-Annotator example HPC-T-Annotator example

The images above show the filled HPC-T-Annotator interface. In this context, we have assumed that:

  • Job name: "my_test_job".
  • Account name: "job_account".
  • Serial partition: "g100_all_serial".
  • Parallel partition: "g100_usr_prod".
  • Number of Processes: 10.
  • Number of Threads: 48.
  • Wall time (hours): 1.
  • Memory per process (GB): 15.

and the following software settings:

  • Alignment Software: Diamond.
  • Tool: BLASTX.
  • Database: /home/user/DATABASES/NR.
  • Input file: /home/user/assembly/input.fa.
  • Outdir: /home/user/assembly.
  • Binary: /home/user/bin/diamond.
  • Outformat: "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore".
  • Additional options: -k 5 --ultra-sensitive.
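As a rough sanity check on the workload manager settings above, the aggregate request can be estimated with simple arithmetic. The sketch assumes that every process runs with the full number of threads and the full per-process memory at the same time; this guide does not confirm how the scheduler request is actually composed, so the totals are an upper-bound estimate only.

```shell
# Values taken from the example settings above.
processes=10
threads=48
mem_per_process_gb=15

# Assumption: each process uses the full thread count and memory concurrently.
total_threads=$((processes * threads))
total_mem_gb=$((processes * mem_per_process_gb))

echo "total threads: $total_threads"
echo "total memory (GB): $total_mem_gb"
```

Comparing these totals with the limits of the chosen partition can help avoid a request that the cluster would refuse.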

Now click the "Generate" button, and the interface automatically generates the software package in TAR format. After that, you can upload the package to your HPC cluster using the scp Unix command as follows:


scp hpc-t-annotator.tar user@cluster_domain:path/hpc-t-annotator.tar

Where cluster_domain is the domain name of the HPC cluster, path is the path where the software package is uploaded, and user is the username of your account on the HPC cluster.


Next, extract the software package using the tar Unix command as follows:


tar -xf hpc-t-annotator.tar && rm hpc-t-annotator.tar

After that, you can run the start.sh script on the HPC cluster using the following command:


sbatch start.sh

After the computation process has ended (check the general.log file for the status), the final result will be in the /home/user/assembly/final_blast.tsv file.