HPC-T-Annotator

Post-processing Notebooks

Transcriptome annotation produces large alignment files from a variety of analysis and alignment tools, and raw tabular output alone rarely conveys the structure of such datasets. These notebooks extract the relevant information from the alignment results and visualize it, helping users interpret the annotation accurately.

AnnoDegsReport: Multi-database Annotation Summary

Group: Transcriptomic data analysis

Input: All BLAST/Diamond alignment TSV results (blastx and blastp against all databases) and the DESeq2 output table.

This notebook generates summary reports in XLSX or TSV format, consolidating the annotation results obtained for a specific transcriptome by running alignment software (Diamond and BLAST) against various sequence databases. It can also create links for the accession numbers matched by the input transcript sequences, pointing to the corresponding resources on the NCBI and UniProt portals.
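The consolidation step can be illustrated with a minimal sketch. It is not the notebook's own code. It assumes the alignment files use the standard 12-column BLAST tabular layout (query ID first, subject ID second, bit score last), that subject IDs are bare accessions, and that the two link patterns below are the ones wanted. The function names, database names, and toy accessions are hypothetical.

```python
import csv
import io

# Assumed link patterns; adjust to the portals and ID formats actually in use.
LINK_TEMPLATES = {
    "uniprot": "https://www.uniprot.org/uniprotkb/{acc}/entry",
    "ncbi": "https://www.ncbi.nlm.nih.gov/protein/{acc}",
}


def best_hits(handle):
    """Map each query ID to the (subject ID, bit score) pair with the top bit score."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip comment lines and malformed rows
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def summarize(results, portals):
    """Build one row per transcript with the best hit and a link per database.

    results: {database name: open TSV handle}
    portals: {database name: key into LINK_TEMPLATES}
    """
    per_db = {db: best_hits(handle) for db, handle in results.items()}
    databases = sorted(per_db)
    header = ["transcript"]
    for db in databases:
        header += [f"{db}_hit", f"{db}_link"]
    rows = [header]
    queries = sorted(set().union(*(hits.keys() for hits in per_db.values())))
    for query in queries:
        row = [query]
        for db in databases:
            if query in per_db[db]:
                acc = per_db[db][query][0]
                row += [acc, LINK_TEMPLATES[portals[db]].format(acc=acc)]
            else:
                row += ["", ""]  # no hit for this transcript in this database
        rows.append(row)
    return rows


def write_tsv(rows, handle):
    """Write the summary rows as tab-separated text."""
    csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)


# Toy data: hypothetical transcript IDs, database names, and accessions.
swissprot = io.StringIO(
    "t1\tP12345\t98.5\t200\t3\t0\t1\t600\t1\t200\t1e-100\t400\n"
    "t1\tQ99999\t80.0\t200\t40\t0\t1\t600\t1\t200\t1e-50\t250\n"
)
refseq = io.StringIO(
    "t1\tNP_000001.1\t97.0\t200\t6\t0\t1\t600\t1\t200\t1e-95\t390\n"
    "t2\tNP_000002.1\t60.0\t150\t60\t0\t1\t450\t1\t150\t1e-20\t120\n"
)
summary = summarize(
    {"swissprot": swissprot, "refseq": refseq},
    {"swissprot": "uniprot", "refseq": "ncbi"},
)
out = io.StringIO()
write_tsv(summary, out)
print(out.getvalue())
```

Keeping only the highest-scoring hit per transcript is one possible consolidation rule; the notebook may apply a different one. Writing XLSX instead of TSV would require a third-party spreadsheet library, which this sketch leaves out.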

Open on GitHub

AnnoRate: Hit Rate Comparison

Group: Transcriptomic data analysis

Input: Transcriptome (pep and cds) and Diamond/BLAST alignment results (in TSV format).

This notebook computes the percentage of input sequences that obtain a hit in each database, which indicates how well a database covers the input transcriptome. It accepts a FASTA-format transcriptome file and TSV-format alignment output from tools such as Diamond or BLAST, calculates hit percentages, and generates a summary table. The results help users assess database coverage and specificity and select the most suitable database for a specific analysis.
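The hit percentage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's implementation: sequence IDs are taken as the first word of each FASTA header, the query ID is assumed to be the first TSV column, and a sequence counts as a hit if it appears at least once in the alignment file. All function names and the toy data are hypothetical.

```python
import csv
import io


def fasta_ids(handle):
    """Collect sequence IDs (first word after '>') from a FASTA stream."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.add(header.split()[0])
    return ids


def hit_query_ids(handle):
    """Collect the query IDs (first column) found in a tabular alignment file."""
    ids = set()
    for row in csv.reader(handle, delimiter="\t"):
        if row and not row[0].startswith("#"):
            ids.add(row[0])
    return ids


def hit_rate(fasta_handle, tsv_handle):
    """Percentage of input sequences with at least one alignment."""
    all_ids = fasta_ids(fasta_handle)
    hits = hit_query_ids(tsv_handle) & all_ids
    return 100.0 * len(hits) / len(all_ids) if all_ids else 0.0


# Toy data: four hypothetical transcripts, two of which have hits.
FASTA_TEXT = ">t1 example\nATGGCC\n>t2\nATGAAA\n>t3\nATGTTT\n>t4\nATGCCC\n"
TSV_TEXT = (
    "t1\tP12345\t98.5\t200\t3\t0\t1\t600\t1\t200\t1e-100\t400\n"
    "t1\tQ99999\t80.0\t200\t40\t0\t1\t600\t1\t200\t1e-50\t250\n"
    "t3\tP12345\t45.2\t120\t60\t2\t1\t360\t5\t124\t1e-10\t80\n"
)
rate = hit_rate(io.StringIO(FASTA_TEXT), io.StringIO(TSV_TEXT))
print(f"{rate:.1f}% of sequences have a hit")
```

Running `hit_rate` once per database on the same transcriptome yields the figures for a comparison table. Counting each query once, regardless of how many subjects it matches, keeps the percentage from being inflated by multi-hit transcripts.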

Open on GitHub

AnnoViz: A Jupyter Notebook for Enhanced Annotation Result Interpretation

Group: Transcriptomic data analysis

Input: A single BLAST/Diamond alignment TSV result.

This notebook takes the output file generated by annotation software (BLAST or Diamond) and creates graphs that help interpret the annotation and alignment results. It offers insight into the transcripts that mapped to the database.
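The specific plots are not listed here, so the following is only an illustration of the kind of summary such a graph could be built on: a distribution of percent identity across all alignments. It assumes the percent identity sits in the third TSV column, as in the standard BLAST tabular layout. The function name and the toy rows are hypothetical, and a text bar chart stands in for a real plotting library.

```python
import csv
import io
from collections import Counter

BIN_WIDTH = 10  # percent identity per bin
N_BINS = 10     # bins cover 0-100%


def identity_histogram(handle):
    """Count alignments per percent-identity bin (column 3 of the TSV)."""
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3 or row[0].startswith("#"):
            continue  # skip comment lines and malformed rows
        pident = float(row[2])
        # An identity of exactly 100 falls into the last bin (90-100%).
        counts[min(int(pident // BIN_WIDTH), N_BINS - 1)] += 1
    return counts


# Toy alignment rows with hypothetical IDs.
tsv = io.StringIO(
    "t1\tP12345\t98.5\t200\t3\t0\t1\t600\t1\t200\t1e-100\t400\n"
    "t2\tQ99999\t91.0\t180\t16\t0\t1\t540\t1\t180\t1e-80\t300\n"
    "t3\tP12345\t45.2\t120\t60\t2\t1\t360\t5\t124\t1e-10\t80\n"
    "t4\tQ99999\t100.0\t90\t0\t0\t1\t270\t1\t90\t1e-40\t180\n"
)
counts = identity_histogram(tsv)
for b in range(N_BINS):
    low, high = b * BIN_WIDTH, (b + 1) * BIN_WIDTH
    print(f"{low:3d}-{high:3d}% {'#' * counts[b]}")
```

The same counts could be passed to any plotting library to draw a bar chart.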

Open on GitHub

MultiVenn: A Jupyter Notebook for Comparing Annotation Results Across Databases Using Venn Diagrams

Group: Transcriptomic data analysis

Input: Species-specific Diamond/BLAST annotation TSV results across multiple databases.

Venn diagrams are a widely used visualization tool in many areas of research, including bioinformatics. They show the overlap between different sets or categories of data. In omics data analysis, Venn diagrams can help identify hits that are common or unique across multiple databases or analyses, giving insight into the relationships between datasets. Variations on the traditional Venn diagram, such as Euler diagrams and Edwards-Venn diagrams, offer alternative visual representations of the same data.
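The counts behind such a diagram can be sketched with plain set operations. This is a minimal illustration, not the notebook's code: it assumes each database has already been reduced to the set of transcript IDs that obtained a hit (for example, the first column of each TSV file). The database names, transcript IDs, and function name are hypothetical.

```python
from collections import Counter, defaultdict


def venn_regions(hit_sets):
    """Count transcripts in each exclusive Venn region.

    hit_sets: {database name: set of transcript IDs with a hit}
    Returns a Counter keyed by the frozenset of databases sharing the hit.
    """
    membership = defaultdict(set)
    for db, queries in hit_sets.items():
        for query in queries:
            membership[query].add(db)
    return Counter(frozenset(dbs) for dbs in membership.values())


# Toy data: hypothetical databases and transcript IDs.
hit_sets = {
    "db_a": {"t1", "t2", "t3"},
    "db_b": {"t2", "t3", "t4"},
    "db_c": {"t3", "t5"},
}
regions = venn_regions(hit_sets)
for region, n in sorted(regions.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(" & ".join(sorted(region)), n)
```

Each key is an exclusive region: a transcript found in all three databases is counted only under the three-way intersection, not again under the pairwise overlaps. These are the numbers a Venn diagram normally displays in each area.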

Open on GitHub