OrthoDB

user guide

Go to OrthoDB >>

# Terminology

TIP

# Orthologs

Orthologs are genes in different species that evolved from a common ancestral gene by speciation. If one or both of these genes were duplicated after the speciation, the resulting copies are all termed co-orthologs, or simply orthologs.

TIP

# Orthologous group / level-of-orthology

When more than two species are considered, there is more than one speciation event, and we refer to all descendants of a particular single gene of the last common ancestor of these species as orthologs, or an orthologous group. Our operational definition thus refers to a specific radiation in the phylogeny of a set of species, termed the level-of-orthology.

TIP

# Ortholog functions

It is a reasonable hypothesis that orthologs retain the functions of their ancestral gene ("by tradition"), though there are examples of gains and losses of gene function. A statement of gene orthology, however, refers to the evolutionary relation of the genes, not to whether their functions were kept or altered.

TIP

# Paralogs

Paralogs are genes that evolved by duplication inside a genome. The notions of orthologs and paralogs are not mutually exclusive: paralogs are co-orthologs if the duplication occurred after the speciation, and are not if it occurred earlier.

# Standalone OrthoLoger software

OrthoLoger, the standalone OrthoDB pipeline for delineation of orthologs, is freely available here.

# Searching OrthoDB content

OrthoDB can be queried by practically anything shown on its pages. A search pattern can be a gene/protein name or symbol, accession identifier, annotation keyword, species or clade name, metabolite name, OrthoDB internal gene id, etc. We indexed gene/protein identifiers from more than 100 databases, including UniProtKB, Ensembl, GenBank, and RefSeq, as well as organism- or clade-focused databases such as NextProt, MGI, FlyBase, VectorBase, WormBase, and ZFIN. Functional descriptors such as GeneOntology, InterPro, and KEGG are also indexed. BLAST-like search using an amino-acid sequence pattern is also supported.

An OrthoDB search can be performed at two levels of granularity, either orthologous group (OG) or gene. The first yields a list of relevant OGs, while the second outputs the single most relevant gene along with its description, followed by a list of orthologous genes from multiple organisms, sorted by evolutionary proximity.

TIP

To get a gene-centric view with the available annotations and a list of pair-wise orthologs, switch from Text to get Gene on the left of the search input.

TIP

To query specifically for a numeric NCBI gene ID, switch from Text to NCBI ID on the left of the search input.

TIP

To query for EC numbers, use double quotes, e.g. "3.1.1.-".

# Text query format

In plain text mode, the search algorithm delivers OGs matching all keywords, case-insensitively. For both single keywords and phrases, users can take advantage of a Google-like autocomplete lookup, which activates after the first three characters of each word entered. The autocomplete matches the characters case-insensitively anywhere in the word, which allows users to pinpoint composite words in addition to conventional left-anchored matches. For example, given ‘transferase’ as a prompt, the search returns suggestions for various transferases [aminotransferase, methyltransferase, etc.].

The query can be more complex and supports logical operations to combine multiple keywords; for example, ‘-’ or ‘!’ is interpreted as logical NOT, which enables queries like [kinase !tyrosine]. To match a complete phrase, use double quotation marks, e.g. [“Cytochrome P450”]; the same applies to querying EC numbers, e.g. [“3.1.1.-”].

Using the ‘Advanced’ panel, one can filter the results by organismal taxonomy and/or the level of orthology, by selecting the appropriate nodes on the species tree, and/or by the phyloprofile of member genes, e.g. present in >90% of the species. The search algorithm matches OGs containing genes in ‘at least’ the organisms selected on the tree, usually along with many others. For even more precision, a clade can be excluded from this taxonomic selection by adding a negated taxonomic node name in the text search widget; e.g. the text search pattern ‘kinase !Metazoa’ with the Eukaryota level selected delivers very specific kinases that are not present in Metazoa and similar organisms.

Searching for a particular gene, with get Gene selected, expects a pinpointing, unambiguous pattern, usually a gene identifier, either an OrthoDB or an external one, e.g. UniProt accession number P12345. The most relevant gene is returned; no provision is made to disambiguate, present, or mine the search results if the pattern matches more than one gene. When searching by an omnipresent term, e.g. gene symbol ABCD, precision can always be gained with one or several supplemental criteria, such as an exact species name, e.g. "Homo sapiens". Please note that neither autocompletion nor the OrthoDB taxonomic tree selection works (yet) in this mode.

  • Use double quotation marks to match a phrase, e.g. "Cytochrome P450"
  • Take advantage of the autocomplete lookup feature
  • For logical NOT, use - or !, e.g. kinase !tyrosine
  • For logical OR, use |, e.g. protease | peptidase
  • Logical operator AND is implicit, e.g. sodium transporter actually means sodium AND transporter (if not quoted)
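
As a minimal sketch of how the operators listed above compose into a query string, the Python snippet below builds a few patterns and percent-encodes them for use in a URL. The helper names `phrase`, `exclude`, `any_of` and `all_of` are hypothetical and not part of OrthoDB; only the operator syntax is taken from the list above.

```python
from urllib.parse import quote


def phrase(text: str) -> str:
    """Wrap text in double quotes so it is matched as a complete phrase."""
    return f'"{text}"'


def exclude(keyword: str) -> str:
    """Prefix a keyword with ! (logical NOT)."""
    return f"!{keyword}"


def any_of(*keywords: str) -> str:
    """Join keywords with | (logical OR)."""
    return " | ".join(keywords)


def all_of(*terms: str) -> str:
    """Join terms with spaces; AND is implicit between unquoted terms."""
    return " ".join(terms)


queries = [
    all_of("kinase", exclude("tyrosine")),  # kinase !tyrosine
    phrase("Cytochrome P450"),              # "Cytochrome P450"
    any_of("protease", "peptidase"),        # protease | peptidase
    phrase("3.1.1.-"),                      # EC number, quoted
]

for q in queries:
    # quote() percent-encodes spaces, quotes and operators for use in a URL.
    print(q, "->", quote(q, safe=""))
```

Keeping the operators in small helpers makes it harder to forget the quotes around phrases and EC numbers when queries are assembled programmatically.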

OrthoDB can be queried by homology to a protein sequence: switch from Text to Sequence on the left of the search input and paste the query protein sequence without a header line.

# Advanced options

# Phyloprofile

The result list of Orthologous Groups can be filtered for

  • universality, i.e. having member genes in all species of the selected taxonomic node, or a fraction of them, e.g. present in all, in >90% or >80% of the species.
  • gene copy-number (duplicability), requiring them to have only single-copy orthologs in all species of the selected taxonomic node, or a fraction of them, e.g. single-copy in all, in >90% or >80% of the species.

You can combine any presence filter with any copy-number filter to refine your results, e.g. present in >90% AND single-copy in >80% of species.
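
The logic of combining a presence filter with a copy-number filter can be sketched as follows. This is only an illustration under stated assumptions, not OrthoDB's implementation: the function names are hypothetical, the input is a made-up mapping from species to gene copy number within one group, and a threshold of 1.0 is read as "all species" while lower thresholds are read as strictly greater than the fraction (">90%").

```python
def phyloprofile(copy_counts: dict[str, int]) -> tuple[float, float]:
    """Return (fraction of species with at least one copy,
    fraction of species with exactly one copy)."""
    n = len(copy_counts)
    if n == 0:
        raise ValueError("no species given")
    present = sum(1 for c in copy_counts.values() if c >= 1)
    single = sum(1 for c in copy_counts.values() if c == 1)
    return present / n, single / n


def meets(fraction: float, threshold: float) -> bool:
    """Assumed reading: 1.0 means 'in all species', otherwise strictly greater."""
    return fraction >= 1.0 if threshold >= 1.0 else fraction > threshold


def passes(copy_counts: dict[str, int],
           min_present: float = 0.9,
           min_single_copy: float = 0.8) -> bool:
    """Apply both filters together (logical AND)."""
    present, single = phyloprofile(copy_counts)
    return meets(present, min_present) and meets(single, min_single_copy)


# Made-up group: 17 single-copy species, 2 multi-copy species, 1 species absent.
counts = {f"sp{i}": 1 for i in range(17)}
counts.update({"sp17": 2, "sp18": 3, "sp19": 0})

print(phyloprofile(counts))
print(passes(counts))
```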

# Select species

You can tailor your search by using the expandable species tree to select a radiation point or particular sets of species.

  • Expand or collapse any node on the tree by clicking on the filled arrows or node names.
  • Select all species at a node by clicking on the unfilled box next to the node name, or
  • select specific species by clicking on the unfilled box next to the species name. You may also add species to the list of selected species to display by typing the species name in the search box and selecting from the autocompleted options. As you add or remove species from the expandable species tree, the Species to display box above it will automatically update to reflect your selections.

# Search at (level-of-orthology)

OrthoDB Orthologous Groups are hierarchical, being delineated at the major radiations along the species phylogeny. This makes it possible to resolve orthologs at a particular level-of-orthology: considering many distantly related species delineates fewer, more general (inclusive) orthologous groups containing all the descendants of the ancestral gene, while examining only sets of more closely related species produces many fine-grained orthologous groups of mostly one-to-one relations.

TIP

The level-of-orthology can be adjusted after the species or clades of interest have been selected (see Select species).

# Species to display

By default, only genes from model species are shown in detail for the returned Orthologous Groups in the Orthologs by organism section of the results. This can be changed to a custom set of species under Species to display.

# Results

WARNING

Results of an OrthoDB query are shown, according to the selected search mode (Text or get Gene), as either a list of relevant Orthologous Groups or a gene-centric view of the most relevant gene.

Each detailed record of an Orthologous Group has the following sections:

# Functional descriptions

OrthoDB provides tentative functional annotations of groups of orthologs and mapping to functional categories by summarizing functional gene annotations, extensively collected from other public resources. Annotation of genes is complicated and contains errors. Although in many cases OrthoDB makes such errors in the underlying data apparent, discordant annotations should be considered with caution.

# Evolutionary descriptions

The evolutionary annotations of the orthologs remain a distinguishing feature of OrthoDB.

# Phyletic Profile

is a summary of the ortholog presence (from universal to species-specific) and copy-numbers (single/multi-copy counts).

# Evolutionary Rate

is a measure of whether this Orthologous Group exhibits appreciably higher or lower levels of sequence divergence, derived from quantification of the relative divergence among its member genes. It is computed for each orthologous group as the average of inter-species identities normalized to the average identity of all inter-species best reciprocal hits, computed from pairwise alignments of protein sequences. The relative rate is indicated by the position of the black star along the scale from slow (blue) to fast (red) rates.

# Gene Architecture

shows median and standard deviation values of protein lengths and exon counts for each orthologous group, effectively describing a consensus gene architecture (for those genes with available data).

# Orthologs by organism

WARNING

This section can be very long. Use navigation arrows on the left to go to the beginning or the end of the record, or the cross to collapse the detailed view to the condensed view.

Condensed view for each gene includes gene/protein ID, UniProt ID, short description, number of amino acids (AAs), number of exons, and associated InterPro domains.

TIP

For the length (AAs) and exon counts (Exons) listed for each gene, the exclamation mark (!) indicates differences from consensus (left: shorter, right: longer, !: 1 stdev, !!: 2 stdev).
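
The flagging rule above can be sketched in a few lines of Python. This is a hypothetical illustration, not the code used by the website: the function name is made up, and it is an assumption that the marks are assigned at deviations of at least 1 and at least 2 standard deviations from the group median.

```python
def deviation_flag(value: float, median: float, stdev: float) -> tuple[str, str]:
    """Return (marks, side) for a gene's length or exon count.

    marks: "" (within 1 stdev), "!" (at least 1 stdev) or "!!" (at least 2 stdev)
    side:  "left" if the value is below the consensus, "right" if above,
           "" if no mark applies
    """
    if stdev <= 0:
        return "", ""
    z = (value - median) / stdev
    if abs(z) >= 2:
        marks = "!!"
    elif abs(z) >= 1:
        marks = "!"
    else:
        return "", ""
    return marks, "left" if z < 0 else "right"


# Made-up consensus: median length 400 AAs, standard deviation 50.
print(deviation_flag(300, 400, 50))
print(deviation_flag(460, 400, 50))
print(deviation_flag(410, 400, 50))
```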

# Double-arrow icon

expands the view, if clicked, to show the annotations available for a given gene, with links to source databases.

Available annotations of InterPro domains are displayed for each protein member, ordered from the N to the C terminus. Click on the grey magnifying glass icon to query OrthoDB for groups containing proteins with the same domains. To search for specific domain architectures, enter an ordered list of InterPro identifiers separated only by commas into the Text Search field.

TIP

# View protein fasta

retrieves the corresponding protein sequences in Fasta format. Group ID, gene, organism, and other useful details are contained in the header of each sequence. This information can be saved as a file by right-clicking on the link followed by "save link as...".

TIP

# View CDS fasta

retrieves the corresponding CDS sequences in Fasta format in a similar way.

TIP

# View Tab Delimited

retrieves the corresponding ortholog information as tab delimited text. This information can be saved as a file by right-clicking on the link followed by "save link as...".

WARNING

Note that retrieving sequences/info in Fasta/Tab format is limited to a maximum of 30000 groups

# Sibling Groups

Related orthologous groups at the same level-of-orthology are defined according to their common InterPro domain annotations. The top 5 groups are listed with their percentage overlap in terms of common InterPro domains, and the complete list of related groups may be retrieved by clicking the Show all siblings link.

# Uploading and analyzing your own sequences

# Register

In order to be able to upload sequences for a custom analysis, you need to register:

  • Click on the "Register" link on the top right part of the OrthoDB webpage.
  • Enter your login details in the form that appears.

# Data upload

Upload a fasta file with the sequences to be analyzed

After logging in, you can upload your sequences using the "Own data mapping" link (next to "Help").

After clicking on "Own data mapping", click on the "Upload" button and select your fasta-formatted file. Be aware that the file should contain amino acid sequences only.

After uploading is finished you will have to enter a species name in the corresponding field.

The next step is to select where to map your sequences. This level of orthology can be selected either automatically or manually. If Auto is selected, BUSCO is used to find an appropriate level.
In manual mode, select species by clicking on the Advanced button next to the search button and selecting a clade from the tree. Click on Advanced again to return.

Click on "Run analysis" to add your job to the mapping queue. When the job starts, the status should change from "CREATED" to something else depending on your setup.

When the mapping has finished without error, the status changes to "DONE".

If there is some kind of error, the state will be "ERROR" or something more informative. In particular when BUSCO has been used to determine where to map, the error may be any of the following:

| Message | Description |
| --- | --- |
| ERROR | server side error |
| ERROR_BUSCO:io | server side error |
| ERROR_BUSCO:bad_setup | server side error |
| ERROR_BUSCO:no_result | BUSCO failed |
| POOR_BUSCO:AT<score> | BUSCO successful but score is too low |

If either of the last two errors occurs, rerun the analysis using a manually selected orthology level. If it still fails (or on any other error), contact OrthoDB support at support[at]orthodb.org.

# Retrieve results

Download the results in a plain text file

Click on the "Download" button to get the mapping results. The name of this file contains all the mapping information:

  • node_XXX: where "XXX" is the NCBI taxon ID of the mapping node.
  • subnode_AAA_BBB_CCC_DDD_EEE: where AAA, BBB, CCC, DDD, and EEE are the NCBI taxon IDs for the selected species.
  • taxid_YYY: where YYY is a temporary taxon ID for your species.
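
As a rough sketch, the three components described above can be pulled out of a file name with regular expressions. The exact separators and ordering of the components in real file names are not specified here, so the example name below is hypothetical, as is the function name.

```python
import re


def parse_mapping_filename(name: str) -> dict:
    """Extract the node, subnode and taxid components from a results file name."""
    # The lookbehind keeps "node_" from matching inside "subnode_".
    node = re.search(r"(?<!sub)node_(\d+)", name)
    subnode = re.search(r"subnode_(\d+(?:_\d+)*)", name)
    taxid = re.search(r"taxid_(\d+)", name)
    return {
        "node": int(node.group(1)) if node else None,
        "subnode": [int(t) for t in subnode.group(1).split("_")] if subnode else [],
        "taxid": int(taxid.group(1)) if taxid else None,
    }


# Hypothetical file name, used only to illustrate the parsing.
example = "node_33208_subnode_9606_10090_7227_taxid_1234567.txt"
print(parse_mapping_filename(example))
```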

The mapping file contains 9 fields:

  • Ortholog group name
  • Gene name
  • Ortholog type; for mapped sequences this field is a number >=10 and <20.
  • Length of the matching region (in amino acids).
  • Start coordinate of the match.
  • End coordinate of the match.
  • Score of the match.
  • Normalized score of the match.
  • E-value of the match.
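
A minimal sketch for reading one record of the mapping file into a typed structure is shown below. It assumes the nine fields are tab-separated and appear in the order listed above; the class and function names, as well as the sample line, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class MappingHit:
    og_name: str
    gene_name: str
    ortholog_type: int
    match_length: int      # in amino acids
    start: int
    end: int
    score: float
    normalized_score: float
    e_value: float


def parse_mapping_line(line: str) -> MappingHit:
    """Parse one line of the mapping file (assumed to be tab-separated)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    og, gene, otype, length, start, end, score, norm, evalue = fields
    return MappingHit(og, gene, int(otype), int(length), int(start), int(end),
                      float(score), float(norm), float(evalue))


def is_mapped(hit: MappingHit) -> bool:
    """Mapped sequences carry an ortholog type >= 10 and < 20."""
    return 10 <= hit.ortholog_type < 20


# Made-up record, for illustration only.
sample = "124at33208\tmy_gene_1\t10\t250\t5\t254\t480.5\t1.92\t1e-120\n"
hit = parse_mapping_line(sample)
print(hit, is_mapped(hit))
```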

# Comparative Charts

This OrthoDB online tool allows generation of a comparative overview of the gene content across selected genomes. The total gene counts and the fractions of orthologs among these species show the level of relatedness among the genomes, highlighting the "universal" core of genes and the ones evolving under single-copy constraint [PMID:21148284].

You can select up to 20 species in the right panel to be included in the comparative genomics chart. The colors, patterns, etc. can be customised from the "Configure chart" tab in the right panel. The fractions shown are hyperlinked to the corresponding Orthologous Groups from which the gene counts were made. The tailored chart can then be exported as publication-quality vector graphics.

Explore an example

# Bookmarking

Search results can be saved by simply bookmarking the result page or saving the URL text.

You can also drag & drop the bookmarklet link under Bookmark OrthoDB at the right side under the search field to the browser toolbar for easy OrthoDB search next time with the same settings. You can later just highlight a keyword somewhere on a web page and click on the saved bookmarklet to search OrthoDB for this keyword.

# API

The OrthoDB data can be programmatically accessed using

  1. a URL based interface.
  2. Python API - OrthoDB-py
  3. R - OrthoDB-R

The documentation below covers the URL-based interface. For the Python and R interfaces, please refer to the respective packages listed above. At the moment the R API is being prepared for inclusion in Bioconductor.

With the URL-based interface, the data can be retrieved using the following:

# URL

https://data.orthodb.org/v12/CMD?ARG1="value"&ARG2="value"&...

where CMD is a command and all ARGx are arguments to that specific command. Below follows a description of the available commands with arguments.
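
The URL pattern above can be assembled programmatically. The following is a small sketch using only the Python standard library; `build_url` and `BASE_URL` are illustrative names, not part of any OrthoDB package, and the snippet only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.orthodb.org/v12"


def build_url(command: str, **params) -> str:
    """Compose an API URL from a command and its arguments.

    Arguments set to None are dropped; values are percent-encoded.
    """
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}/{command.lstrip('/')}"
    return f"{url}?{query}" if query else url


print(build_url("fasta", id="32204at9721", species=9721))
print(build_url("tree"))
```

Letting `urlencode` handle the escaping avoids hand-written query strings, which is where characters such as `&`, spaces and quotes tend to cause trouble.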

WARNING

NOTE the request rate is limited to 1 request/second for the following URLs:
/blast
/tab
/fasta
If the rate is too high, some of the requests will fail with a 503 error.
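
One way to stay within this limit on the client side is to space out successive calls. The class below is a sketch of such a throttle, assuming a minimum interval of one second between requests; the name `Throttle` and its interface are hypothetical, and the clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time


class Throttle:
    """Ensure that successive calls to wait() are at least min_interval apart."""

    def __init__(self, min_interval: float = 1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage: call throttle.wait() immediately before each /blast, /tab or /fasta request.
throttle = Throttle(min_interval=1.0)
```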

# Data Formats

All data is returned in JSON format, except for /fasta, /tab and /og_description. JSON is widely supported by many languages. An overview with many examples can be found here.

The JSON returned is of the generic format:

          {
             "data"   : JSON object containing nested objects/arrays with the data
             "status" : "ok" or "error"
             "message": a message explaining the error, if any
          }
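
A response in this generic format can be unwrapped with a few lines of Python. This is a minimal sketch: `unwrap` and `OrthoDBError` are hypothetical names, and responses that are not JSON objects (for example a bare string) are simply returned as decoded.

```python
import json


class OrthoDBError(RuntimeError):
    """Raised when a response reports status "error"."""


def unwrap(payload: str):
    """Decode a JSON response and return its "data" part.

    Raises OrthoDBError with the server message when the status is not "ok".
    """
    doc = json.loads(payload)
    if not isinstance(doc, dict):
        return doc  # e.g. a bare string
    if doc.get("status") != "ok":
        raise OrthoDBError(doc.get("message") or "request failed")
    return doc.get("data")


print(unwrap('{"data": ["124at33208"], "status": "ok"}'))
```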

# The OrthoDB organism, gene and orthologous group (OG) identifier patterns :

# Organism id

Generic form is taxid_version

  • taxid is the NCBI taxonomy id, extended with an organism-dedicated suffix

    Example: 10090_0

# Gene id

Generic form is taxid_version:number

  • taxid is the NCBI taxonomy id, extended with an organism-dedicated suffix
  • number is a unique left-zero-padded hexadecimal integer value
    Example: 10090_0:000d08

# OG id

Generic form CLIDatCLADE

  • CLID is a numerical cluster id, as evaluated by the OrthoDB clustering algorithm
  • CLADE is NCBI tax id of the clade/level of orthology
    Example: 124at33208
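
The three identifier patterns described above can be told apart with regular expressions. The sketch below is an illustration based solely on the generic forms and examples given here; the pattern and function names are made up.

```python
import re

# taxid_version, e.g. 10090_0
ORGANISM_ID = re.compile(r"^(?P<taxid>\d+)_(?P<version>\d+)$")
# taxid_version:number, where number is zero-padded hexadecimal, e.g. 10090_0:000d08
GENE_ID = re.compile(r"^(?P<taxid>\d+)_(?P<version>\d+):(?P<number>[0-9a-fA-F]+)$")
# CLIDatCLADE, e.g. 124at33208
OG_ID = re.compile(r"^(?P<clid>\d+)at(?P<clade>\d+)$")


def classify(identifier: str) -> str:
    """Return "gene", "organism", "og" or "unknown" for an identifier."""
    if GENE_ID.match(identifier):
        return "gene"
    if ORGANISM_ID.match(identifier):
        return "organism"
    if OG_ID.match(identifier):
        return "og"
    return "unknown"


for ident in ("10090_0", "10090_0:000d08", "124at33208"):
    print(ident, "->", classify(ident))

# The numeric part of a gene id can be read as a hexadecimal integer.
m = GENE_ID.match("10090_0:000d08")
print(int(m["number"], 16))
```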

# NOTE

prior to OrthoDB v10 the OG ids were of the form FFFVVCCCCII, where

  • FFF either EOG (eukaryota) or POG (prokaryota)
  • VV OrthoDB version (09 for both v9 and v9.1)
  • CCCC unique identifier for each clade
  • II unique cluster identifier within the clade
    Example: EOG091G06KN

Please also note that there is no guaranteed inheritance of gene or OG identifiers between releases, e.g. some gene or OG ids might appear in later releases with totally unrelated content.

# Using the API

Interaction with the OrthoDB API can be done using any of:

  • a web browser
  • a GUI utility, e.g. FileZilla
  • a command line utility, like curl
  • a programming language subsystem/module, e.g. RCurl in R or requests in Python

Note that the command line utility wget is currently not fully supported.

TIP

Linux: curl is usually installed by default
Windows: curl
Mac: curl is usually installed natively, otherwise look here

Example to download the sequences of a given cluster into the local file data.fs:

curl "https://data.orthodb.org/v12/fasta?id=32204at9721&species=9721" -L -o data.fs

The URL is quoted so that the shell does not interpret the & character; -L follows redirects and -o names the output file.

# API Commands

# /orthodb_release_id

  • Arguments: NONE

  • Returns: Double-quoted string with OrthoDB data/API version.

  • Description: This retrieves the OrthoDB data/API version.

Example:

curl https://data.orthodb.org/v12/orthodb_release_id -L -o odb_version.dat

# /tree

  • Arguments: NONE

  • Returns: A JSON object containing the entire OrthoDB taxonomic tree

  • Description: This retrieves the entire OrthoDB taxonomic tree, i.e. a hierarchy of clades and leaf taxons (OrthoDB organisms).

Example:

curl https://data.orthodb.org/v12/tree -L -o tree.dat

# /species

  • Arguments:
    clade - an OrthoDB clade/level, e.g. 99
    level - same as clade

  • Returns: A JSON object containing an array with OrthoDB leaf taxons, i.e. organisms.

  • Description: This retrieves a flat, non-hierarchical array of all leaf taxons (OrthoDB organisms). The results might optionally be filtered by parameter clade to include only the underneath leaf taxons.

Example:

curl "https://data.orthodb.org/v12/species?clade=99" -L -o taxons.dat

# /search

  • Arguments:
    query - a text query pattern, with logic operators (see above), sought anywhere in OG data, including the OG identifier itself
    gid - NCBI gene identifier, e.g. 1, of a gene required to be a part of each matched OG
    ncbi - same as gid
    species - taxonomic filter: the NCBI tax id of a clade (expanded to all leaf taxons) or a CSV list of leaf taxons (i.e. OrthoDB organism ids), all required to be a part of each matched OG
    level - the NCBI tax id of the clade (taxonomic level) of orthology at which each matched OG is built
    universal - phyloprofile filter "present in": 1.0, 0.9 or 0.8, specifying gene universality as a fraction of all species in the clade where the matched OG is built
    singlecopy - phyloprofile filter "single-copy in": 1.0, 0.9 or 0.8, specifying whether the gene is present as a single copy, same logic as above
    skip - number of hits to skip to reach the next chunk, if the result is paginated
    take - maximum number of OG ids on a page, default 100, maximum 10000
    counts_only - return only the count of matches

  • Returns:
    A JSON object with an array of OG ids (key data), as well as another array of the same OGs with some additional info (key bigdata)

  • Description:
    This retrieves all OGs matching a given query, i.e. a text pattern passed via the parameter query, optionally with additional phyloprofile or taxonomic filtering criteria.
    At least one of the three parameters query, species or level must be given; combining them narrows the result. A request with only the species parameter returns all OGs that contain orthologous genes from all organisms listed explicitly or obtained implicitly (by expanding a given clade to its leaf taxons). A request with only the level parameter returns all OGs built at this level (clade) of orthology.
    Please note that the result is always truncated at the value of the parameter take (or its default). The top-level key count contains the total number of matches, so the user can iterate through them as needed using the parameters take and skip.

Example:

curl "https://data.orthodb.org/v12/search?query=p450&take=2&level=33208&singlecopy=0.8" -L -o search.dat
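
Iterating over a paginated /search result with take and skip can be sketched as a generator. To keep the example free of network access, the actual request is delegated to a caller-supplied function; `iter_og_ids`, `fetch_page` and the fake fetcher below are hypothetical names, and the sketch assumes the decoded response carries the keys data and count as described above.

```python
from typing import Callable, Iterator


def iter_og_ids(fetch_page: Callable[[int, int], dict], take: int = 100) -> Iterator[str]:
    """Yield all OG ids of a /search result, page by page.

    fetch_page(skip, take) must perform the request and return the decoded JSON.
    """
    skip = 0
    while True:
        page = fetch_page(skip, take)
        ids = page.get("data", [])
        yield from ids
        skip += len(ids)
        # Stop on an empty page or once the reported total has been reached.
        if not ids or skip >= page.get("count", 0):
            break


# Stand-in for a real request, serving five made-up OG ids.
ALL_IDS = [f"{i}at33208" for i in range(5)]


def fake_fetch(skip: int, take: int) -> dict:
    return {"status": "ok", "count": len(ALL_IDS), "data": ALL_IDS[skip:skip + take]}


print(list(iter_og_ids(fake_fetch, take=2)))
```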

# /genesearch

  • Arguments:
    query - a text query pattern, with logic operators (see above), sought anywhere in gene data, including the gene identifier itself
    gid - NCBI gene identifier, e.g. 1
    ncbi - same as gid

  • Returns:
    A nested JSON object with a gene-centric view of the best matching gene. The response consists of a section with the gene info and a section with the organism info, followed by a list (if available) of orthologous genes in model organisms at several cascading levels of orthology. The top-level keys are:

    gene: {},
    organism: {},
    organism_xref: "",
    nb_genes_matched_the_query: integer,
    orthologs_in_model_organisms: [],
    genes_and_clusters_statistics: {}

  • Description:
    A pinpointed query pattern is expected. Good examples are the OrthoDB gene id 9606_0:0017fc or the UniProt id P12345 supplied via the query parameter, or 1 supplied via the gid parameter. By contrast, passing the NCBI gid 1 via query yields a very large number of matches. When searching by an omnipresent term, e.g. gene symbol ABCD, precision can always be gained with additional, more focused criteria, such as an exact species name, e.g. "Homo sapiens". Only the most relevant gene (with supplemental info) is returned by default; the total number of matches is reported under the key nb_genes_matched_the_query in the returned JSON object, so the user can iterate over a few matches via the take and skip parameters.

Example:

curl "https://data.orthodb.org/v12/genesearch?query=9606_0:0017fc" -L -o gene_data.json

# /blast

  • Arguments:
    seq - protein sequence pattern, at least 20 aa, without fasta-header

  • Returns:
    A nested JSON object containing a gene-centric view of the best matching gene, the same as delivered by /genesearch. It contains basic gene data supplemented with gene xrefs, etc., followed by a list (if available) of its orthologs in model organisms at several cascading levels of orthology.

  • Description:
    This finds the best matching gene for the given sequence pattern using the RapSearch2 algorithm. No provision is made to disambiguate search results if several genes (from several organisms) match with equal probability, e.g. when a too short and omnipresent sequence pattern is given.

Example:

curl "https://data.orthodb.org/v12/blast?seq=MGDSHEDTSATVPEAVAEEVSLFSTTDIVLF" -L -o blast.dat

# /group

  • Arguments:
    id - OrthoDB OG id

  • Returns:
    A nested JSON object with some info about the OG. The top level keys are:

    id: "",
    name: "",
    tax_id: integer,
    ECnumber: [],
    public_id: "",
    level_name: "",
    KEGGpathway: [],
    interpro_domains: [],
    phyletic_profile: {},
    evolutionary_rate: decimal,
    gene_architecture: {},
    biological_process: [],
    cellular_component: [],
    molecular_function: [],
    functional_category: [],
    unclassified GO terms: []

  • Description:
    This returns detailed annotation of the given OG, without listing its genes.
    Please use /orthologs to list the genes in an OG.

Example:

curl "https://data.orthodb.org/v12/group?id=4977at9604" -L -o group.dat

# /og_description

  • Arguments:
    id - OrthoDB OG id
    clade - an OrthoDB clade/level NCBI tax id

  • Returns:
    Tab-separated records with OG annotation, preceded by a header line. The columns are:
    cluster_id
    cog
    description
    molfunction_go
    bioprocess_go
    ec
    kegg
    interpro

  • Description:
    This retrieves detailed annotation information on the given OG, or on all OGs in the given clade.
    If id is given, the response contains a single line describing the OG.
    If clade is given, the response contains multiple lines describing all OGs at the clade/level.

Example:

curl "https://data.orthodb.org/v12/og_description?clade=943" -L -o group.dat

# /orthologs

  • Arguments:
    id - either an OrthoDB OG id or an OrthoDB gene id
    species - an OrthoDB leaf taxon, or a CSV list of them, or an upper clade
    species2 - another OrthoDB leaf taxon
    clade - an OrthoDB clade/level NCBI tax id

  • Returns:
    A nested JSON object.
    If id is given, the response contains an array of entries, each containing a nested array of orthologous genes (including any multi-copies), along with the organism description. The top-level keys of each entry are:

    genes: [],
    organism: {},
    genes_and_clusters_statistics: {}

    If species and species2 are given, the response is an array of nested arrays of orthologous genes.

  • Description:
    If id is an OG id, this returns all (orthologous) genes in the given OG, optionally filtered by species.
    If id is a gene id, this returns all (orthologous) genes in all OGs at all levels of orthology, optionally filtered by species.
    If no id is given, but species and species2 are, this returns all (orthologous) genes between the two organisms at the clade level of orthology, which defaults to the LCA of the two given taxons. Unlike in id-driven requests, in this mode species must be a single OrthoDB leaf taxon, i.e. an organism id, e.g. 9606_0.

Example:

curl "https://data.orthodb.org/v12/orthologs?id=4977at9604" -L -o orthologs.dat

# /ogdetails

  • Arguments:
    id - OrthoDB gene id

  • Returns:
    JSON object with detailed information on the given gene id, top-level keys are:

    aas: integer,
    upkws: [],
    xrefs: [],
    gene_id: {},
    uniprot: [],
    interpro: [],
    entrezgene: [],
    entrezprotein: [],
    genomic_coordinates: {}

  • Description:
    Retrieves detailed info on the given gene id. Please note that /ogdetails will be renamed to /geneinfo in future releases.

Example:

curl "https://data.orthodb.org/v12/ogdetails?id=9606_0:0017fc" -L -o ogdetails.dat

# /siblings

  • Arguments:
    id - OrthoDB cluster id
    take - maximum number of returned siblings

  • Returns:
    A list of OrthoDB OG ids

  • Description:
    Retrieves all siblings of the given OG.

Example:

curl "https://data.orthodb.org/v12/siblings?id=4977at9604" -L -o siblings.dat

# /fasta

  • Arguments:
    id - either an OrthoDB OG id or an OrthoDB gene id
    species - taxonomic filter: an OrthoDB leaf taxon, or a CSV list of them, or an upper clade
    seqtype - either protein (default) or cds

  • Returns:
    AA or CDS sequences in fasta format, with a header line consisting of the OrthoDB gene id followed by a JSON-like string with pertinent info about the gene

  • Description:
    If id is a gene id, this returns the fasta-formatted sequence of this gene, either protein or cds, as given by seqtype.
    If id is an OG id, this returns the fasta-formatted sequences of all genes in the OG, either protein or cds, optionally filtered by species.
    If no id is given, but the species parameter is, this returns the fasta-formatted sequences, either protein or cds, of all genes in this organism. Unlike in id-driven requests, in this mode species must be a single OrthoDB leaf taxon, i.e. an organism id, e.g. 9606_0.

Example:

curl "https://data.orthodb.org/v12/fasta?species=100_0" -L -o data.fs
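
A downloaded fasta file can be read back with a short parser. The sketch below assumes that the gene id and the JSON-like description in the header are separated by whitespace; because the description is only "JSON-like", it is kept as a raw string rather than decoded. The function name and the sample records (including the key inside the header) are made up.

```python
import io
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (gene_id, header_info, sequence) for each fasta record."""
    gene_id, info, chunks = None, "", []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if gene_id is not None:
                yield gene_id, info, "".join(chunks)
            # Split the header into the gene id and the rest of the line.
            parts = line[1:].strip().split(None, 1)
            gene_id = parts[0] if parts else ""
            info = parts[1] if len(parts) > 1 else ""
            chunks = []
        elif line and gene_id is not None:
            chunks.append(line.strip())
    if gene_id is not None:
        yield gene_id, info, "".join(chunks)


# Made-up records, for illustration only.
sample = (
    '>9606_0:0017fc {"example_key":"value"}\n'
    "MGDSH\n"
    "EDTSA\n"
    ">9606_0:0017fd {}\n"
    "MK\n"
)
for record in read_fasta(io.StringIO(sample)):
    print(record)
```

With a real download, the same function can be applied to an open file, e.g. `with open("data.fs") as handle: records = list(read_fasta(handle))`.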

# /tab

  • Arguments:
    id - OrthoDB OG id
    species - taxonomic filter: an OrthoDB leaf taxon, or a CSV list of them, or an upper clade

  • Returns:
    Tab-separated records with gene annotation, preceded by a header line. The columns are:
    pub_og_id
    og_name
    level_taxid
    organism_taxid
    organism_name
    pub_gene_id
    description

  • Description:
    This returns a table with gene annotation for all genes in the OG, optionally filtered by species.

Example:

curl "https://data.orthodb.org/v12/tab?id=4977at9604" -L -o data.tsv
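
The tab-delimited output can be processed with Python's csv module. As a small sketch, the function below counts the genes per organism, relying on the column names listed above; the function name and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def genes_per_organism(tsv_text: str) -> Counter:
    """Count the rows (genes) per organism_name in /tab output."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return Counter(row["organism_name"] for row in reader)


# Made-up table with the documented header line.
sample = (
    "pub_og_id\tog_name\tlevel_taxid\torganism_taxid\torganism_name\tpub_gene_id\tdescription\n"
    "4977at9604\texample group\t9604\t9606_0\tHomo sapiens\t9606_0:000001\texample gene A\n"
    "4977at9604\texample group\t9604\t9606_0\tHomo sapiens\t9606_0:000002\texample gene B\n"
    "4977at9604\texample group\t9604\t9598_0\tPan troglodytes\t9598_0:000001\texample gene C\n"
)
print(genes_per_organism(sample))
```

`csv.QUOTE_NONE` is used so that quote characters inside descriptions are treated as ordinary text.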

# RDF

This SPARQL 1.1 endpoint serves OrthoDB data. The OrthoDB release 11.0 consists of 6'568M RDF triples describing evolutionary and functional properties of 101'009'660 genes from 28'055 organisms clustered in 11'677'084 orthologous groups on 1003 taxonomic levels. The data can be explored using the Umaka data browser, as well as via WIDOCO-generated documentation. An introduction to the RDF data model is also available.

# Downloads

OrthoDB data are also available as flat files for download from here. This is the recommended way (instead of data offloads via the API) if the user intends to process really large parts of the OrthoDB data locally.

WARNING

Use API (Application Programming Interface) to download data if the data set is not too large.

# FAQ

# How can I ..?

..will come soon..

# Contact

Email: support[at]orthodb.org

Join the OrthoDB-News mailing list (low traffic).

# Funding

  • UNIGE
  • SIB
  • SNSF

# Previous versions

# Cite us

OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity. D Kuznetsov, F Tegenfeldt, M Manni, M Seppey, M Berkeley, EV Kriventseva, EM Zdobnov. NAR, Nov 2022, doi:10.1093/nar/gkac996. PMID:36350662
