Genome annotation

Figure 1: The basic steps of genome annotation. Image courtesy of PB Works

Genome annotation is the process of identifying genes and their coding regions and attaching biological information to them. Annotation is essential to understanding the significance of DNA and to improving our knowledge of biological processes. There are three steps to genome annotation: identification of non-coding regions of the genome, gene prediction, and attachment of biological information to sequences.

Annotation happens at the nucleotide level, the protein level, and the process level. Nucleotide-level annotation is the localisation of genes, genetic landmarks, and other markers, and the discovery of what role each element plays in the genome. Protein-level annotation consists of compiling a catalog of proteins, naming them, and assigning each a function. Process-level annotation is the most challenging part of genome annotation. It consists of discovering which building blocks in the genome relate to processes such as cell death, the cell cycle, and metabolism.

Nucleotide-level and protein-level annotation are both structural annotation, whereas process-level annotation is considered functional annotation. Structural annotation is the identification of genomic elements, including gene structure, coding regions, the localisation of open reading frames (ORFs), and the localisation of regulatory motifs. Functional annotation is the attachment of biological information to genomic elements, including expression, biochemical function, biological function, regulation, and interactions.
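As an illustration of ORF localisation, the following is a minimal Python sketch, not a production gene finder. It assumes the standard start codon ATG and the stop codons TAA, TAG, and TGA, scans only the three forward reading frames, and ignores the reverse strand and introns. The function name `find_orfs` and the `min_codons` parameter are hypothetical names chosen for this example.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    `start` is the index of the A in ATG; `end` is exclusive and
    includes the stop codon. `min_codons` counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open start codon, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next start
    return orfs

print(find_orfs("CCATGAAATTTTAGGG"))
```

Real annotation tools also consider the reverse complement, alternative start codons, and statistical models of coding sequence, so the output here should be read as candidate regions only.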

Genome annotation can be performed manually, which requires human expertise and experimental verification, or automatically, which is done purely through computer analysis. Both methods co-exist and use the same annotation pipeline. A pipeline is a series of elements in which the output of one element is the input of the next.
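The pipeline idea can be sketched in a few lines of Python. This is a toy example under stated assumptions: the three stage functions (`normalise`, `predict_starts`, `attach_labels`) and the helper `run_pipeline` are hypothetical stand-ins invented for illustration, and they do not represent any real annotation tool. The point is only that each stage consumes what the previous stage produced.

```python
def normalise(raw):
    """Stage 1: strip whitespace and upper-case the raw sequence."""
    return "".join(raw.split()).upper()

def predict_starts(seq):
    """Stage 2 (toy stand-in for gene prediction): record every ATG position."""
    starts = [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]
    return {"sequence": seq, "starts": starts}

def attach_labels(record):
    """Stage 3 (toy stand-in for functional annotation): label each candidate."""
    labelled = dict(record)  # copy so the input record is not mutated
    labelled["features"] = [
        {"start": s, "note": "candidate start codon"} for s in record["starts"]
    ]
    return labelled

def run_pipeline(data, stages):
    """Feed the output of each stage into the next one, in order."""
    for stage in stages:
        data = stage(data)
    return data

result = run_pipeline("cc atg aaa\natg tag", [normalise, predict_starts, attach_labels])
print(result["starts"])
```

Because every stage is an ordinary function, a manual curation step could be added to the list of stages without changing the others.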

BLAST, or Basic Local Alignment Search Tool, is widely used in genome annotation. BLAST is an algorithm that compares primary biological sequences, such as nucleotide or amino-acid sequences, so that similar sequences can be identified and later annotated. However, scientists today use additional information to better differentiate between genes that share the same basic annotations.
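To give a feel for what "local alignment" means, here is a minimal Python sketch of a Smith-Waterman style local alignment score. This is not BLAST itself: BLAST uses a faster heuristic, whereas the dynamic programming below solves the underlying local alignment problem exactly. The function name `local_alignment_score` and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]  # first column is always 0 in local alignment
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere in either sequence.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

print(local_alignment_score("TTACGTTT", "GGACGGG"))
```

A high score between a newly sequenced region and an already annotated sequence suggests, but does not prove, that the two share a function.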

Genome annotation is still a challenge for many scientists, who are only in the early stages of understanding how all the parts fit together. However, a variety of software tools have been developed to help view and share genomic annotations.
