What is RetrogeneDB?

RetrogeneDB is a database of retrocopy annotations in sequenced eukaryotic genomes. It is the first database to contain retrocopy data on such a large scale (currently 62 genomes from Ensembl release 73). Retrocopies tend to be very poorly annotated in public genomic repositories (NCBI, Ensembl), and the main retrocopy databases (RCPedia, Pseudogene.org) are limited to a few model organisms. The retrocopies in RetrogeneDB were detected by our custom pipeline, which was designed to keep the false-positive rate low. RetrogeneDB lets users easily search for retrocopies and their parental genes using various criteria, ranging from similarity to the parental gene to expression levels.

What should I know to use RetrogeneDB?

Coordinate system:
We use 0-based coordinates (i.e. the first position in the chromosome has the coordinate 0). This is the same convention as the one used by UCSC, and it differs from the one used by Ensembl (which is 1-based).
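When comparing RetrogeneDB coordinates with Ensembl, every position must be shifted by one. The sketch below shows the conversion. The helper names are hypothetical, and the interval function assumes that RetrogeneDB intervals follow the UCSC/BED half-open convention (exclusive end); that assumption is not stated above, so check it against the actual data before relying on it.

```python
# Hypothetical helpers for converting between RetrogeneDB-style 0-based
# coordinates and Ensembl-style 1-based coordinates.

def zero_to_one_based(pos0: int) -> int:
    """Convert a single 0-based position to its 1-based equivalent."""
    if pos0 < 0:
        raise ValueError("0-based positions cannot be negative")
    return pos0 + 1


def one_to_zero_based(pos1: int) -> int:
    """Convert a single 1-based position to its 0-based equivalent."""
    if pos1 < 1:
        raise ValueError("1-based positions start at 1")
    return pos1 - 1


def half_open_to_ensembl(start0: int, end0: int) -> tuple[int, int]:
    """Convert a 0-based half-open interval [start0, end0) into a
    1-based closed interval [start1, end1].

    ASSUMPTION: end0 is exclusive, as in UCSC/BED files. Under that
    convention only the start shifts by one; the end value is unchanged.
    Retrocopy intervals are assumed to be non-empty.
    """
    if end0 <= start0:
        raise ValueError("interval must have positive length")
    return start0 + 1, end0


# The first base of a chromosome is 0 in RetrogeneDB and 1 in Ensembl.
print(zero_to_one_based(0))            # 1
print(half_open_to_ensembl(100, 200))  # (101, 200)
```

Under the half-open assumption, the interval [100, 200) covers 0-based bases 100 to 199, which are bases 101 to 200 in Ensembl's 1-based closed notation.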
Parental gene data:
Data on parental genes and their homology were imported from Ensembl. As a result, some data may be missing (for example, many genes have no gene symbol assigned).
Parental gene uncertainty:
In some cases, two or more genes give equally good alignments to the retrocopy. In such situations, the parental gene is assigned at random.
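The tie-breaking rule can be illustrated with a short sketch. This is a hypothetical illustration, not the RetrogeneDB pipeline's code: the function name, the gene IDs and the idea of a single numeric alignment score per candidate (higher is better) are all assumptions.

```python
import random
from typing import Optional


def assign_parental_gene(scores: dict[str, float],
                         rng: Optional[random.Random] = None) -> str:
    """Pick a parental gene from hypothetical alignment scores.

    `scores` maps gene IDs to an alignment score (higher is better).
    If several genes share the best score, one of them is chosen
    uniformly at random.
    """
    if not scores:
        raise ValueError("no candidate parental genes")
    rng = rng or random.Random()
    best = max(scores.values())
    # Exact equality is used here to define "equally good"; sorting the
    # tied IDs makes the choice reproducible when a seeded RNG is passed.
    tied = sorted(gene for gene, score in scores.items() if score == best)
    return rng.choice(tied)


candidates = {"GENE_A": 0.97, "GENE_B": 0.97, "GENE_C": 0.91}
print(assign_parental_gene(candidates, random.Random(42)))  # GENE_A or GENE_B
```

Passing a seeded `random.Random` makes the assignment repeatable between runs, while still picking among the tied candidates at random.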
Conserved ORF:
A retrocopy is considered to have a conserved ORF if it contains neither frameshifts nor premature stop codons.
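A minimal sketch of such a check is shown below. It assumes that indels come from an alignment of the retrocopy to its parental coding sequence (the `indel_lengths` input is hypothetical). It also assumes the standard genetic code's stop codons (TAA, TAG, TGA). An indel whose length is not a multiple of 3 is treated as a frameshift. This illustrates the definition above, not the pipeline's actual implementation.

```python
# Stop codons of the standard genetic code (an assumption; other codes differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_frameshift(indel_lengths: list[int]) -> bool:
    """An indel whose length is not a multiple of 3 shifts the reading frame.

    Insertions and deletions may be given as positive or negative lengths;
    Python's % operator gives the same frame test for either sign.
    """
    return any(length % 3 != 0 for length in indel_lengths)


def has_premature_stop(cds: str) -> bool:
    """Scan complete in-frame codons, ignoring the final (expected) stop."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def has_conserved_orf(cds: str, indel_lengths: list[int]) -> bool:
    """Conserved ORF: no frameshifting indels and no premature stop codons."""
    return not has_frameshift(indel_lengths) and not has_premature_stop(cds)


print(has_conserved_orf("ATGAAATAA", [3, -6]))  # True
print(has_conserved_orf("ATGTAAAAATAA", []))    # False: premature TAA
print(has_conserved_orf("ATGAAATAA", [2]))      # False: frameshift
```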
Saccharomyces cerevisiae genome:
The yeast genome was not included in RetrogeneDB because no retrocopies were detected in it.

What are the future plans?

In the near future we plan to improve RetrogeneDB by:

  • Including retrocopy annotations for genomes from the Ensembl Genomes databases, covering metazoans, plants, fungi and protists
  • Adding RNA-Seq libraries for a diverse set of species
  • Including expression estimation based on EST data
  • Calculating orthology relationships between retrocopies
  • Importing sequence variation data from The 1000 Genomes Project
  • Importing the regulatory elements data from ENCODE and modENCODE