OrthoDB
user guide
# Terminology
TIP
# Orthologs
Orthologs are genes in different species that evolved from a common ancestral gene by speciation. If one or both of these genes were duplicated after the speciation they are all termed co-orthologs, or just orthologs.
TIP
# Orthologous group / level-of-orthology
When more than two species are considered, there is more than one speciation event, and we refer to all descendants of a particular single gene of the last common ancestor of these species as orthologs, or an orthologous group. Thus our operational definition refers to a specific radiation in the phylogeny of a set of species, termed the level-of-orthology.
TIP
# Ortholog functions
It is a reasonable hypothesis that orthologs retain the functions of their ancestral gene ("by tradition"), though there are examples of gene function gains and losses. A statement of gene orthology, however, refers to the evolutionary relation of the genes, not to retained or altered functions.
TIP
# Paralogs
Paralogs are genes that evolved by duplication within a genome. The notions of orthologs and paralogs are not mutually exclusive: paralogs are co-orthologs if they were duplicated after the speciation, and are not if they were duplicated earlier.
# Standalone OrthoLoger software
OrthoLoger, the standalone OrthoDB pipeline for delineation of orthologs, is freely available.
# Searching OrthoDB content
OrthoDB can be queried by practically anything shown on its pages. A search pattern might be a gene/protein name or symbol, accession identifier, annotation keyword, species/clade name, metabolite name, OrthoDB internal gene id, etc. We indexed various gene/protein identifiers from more than 100 databases, including UniProtKB, Ensembl, GenBank and RefSeq, as well as organism/clade-focused databases like NextProt, MGI, FlyBase, VectorBase, WormBase, ZFIN, etc. Functional descriptors, like GeneOntology, InterPro, KEGG, etc., are also indexed. BLAST-like search using an amino-acid sequence pattern is also supported.
An OrthoDB search can be performed at two levels of granularity, either orthologous group (OG) or gene. The first yields a list of relevant OGs, while the second outputs the single most relevant gene along with its description, followed by a list of orthologous genes from multiple organisms, sorted by evolutionary proximity.
TIP
To get a gene-centric view providing the available annotations and a list of pair-wise orthologs, switch from Text to get Gene on the left of the search input.
TIP
To query specifically for a numeric NCBI gene ID, switch from Text to NCBI ID on the left of the search input.
TIP
To query for EC numbers - use double quotes, e.g. "3.1.1.-".
# Text query format
The search algorithm in the plain text mode delivers OGs matching all keywords, in case-insensitive mode. For both single keywords and phrases users can take advantage of a Google-like autocomplete lookup, self-activating after the first three characters of each word entered. The autocomplete matches the characters case-insensitively anywhere in the word. This allows users to pin-point composite words, in addition to conventional left-anchored matches. For example, given 'transferase' as a prompt, the search will return suggestions for various transferases [aminotransferase, methyltransferase, etc.].
The query can be more complex and supports logical operations to combine multiple keywords; for example, '-' or '!' is interpreted as a logical NOT, which enables queries like [kinase !tyrosine]. To match a complete phrase, use double quotation marks, e.g. ["Cytochrome P450"]; the same applies when querying EC numbers, e.g. ["3.1.1.-"].
Using the 'Advanced' panel one can filter the results by organismal taxonomy and/or the level of orthology, by selecting the appropriate nodes on the species tree, and/or by the member gene phyloprofile, e.g. present in >90% of the species. The search algorithm matches OGs containing genes in at least the organisms selected on the tree, usually along with many others. For even more precision, it is possible to negate a certain clade in the above-mentioned taxonomic selection by adding a taxonomic node name in the text search widget, e.g. the text search pattern 'kinase !Metazoa' with the Eukaryota level selected delivers very specific kinases not present in Metazoa and similar organisms.
Searching for a particular gene, with get Gene selected, expects a pinpointing, non-ambiguous pattern, usually a gene identifier, either an OrthoDB or an external one, e.g. UniProt accession number P12345. Only the most relevant gene is output; no provision is made to disambiguate, present or mine the search results if the search pattern matches more than one gene. Precision while searching by an omnipresent term, e.g. gene symbol ABCD, can always be gained by one or several supplemental criteria, like the exact species name, e.g. "Homo sapiens". Please note that neither autocompletion nor OrthoDB taxonomic tree selection works (yet) in this mode.
- Use double quotation marks to match a phrase, e.g. "Cytochrome P450"
- Take advantage of the autocomplete lookup feature
- For logical NOT use - or !, e.g. kinase !tyrosine
- For logical OR use |, e.g. protease | peptidase
- Logical AND is implicit, e.g. sodium transporter actually means sodium AND transporter (if not quoted)
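The operators listed above can also be composed programmatically, for instance before a pattern is passed to the API described later in this guide. The following is a minimal Python sketch; the helper names (`phrase`, `exclude`, `any_of`, `all_of`) are hypothetical and not part of any OrthoDB client, and the sketch does not try to model how OrthoDB resolves precedence between OR and the implicit AND.

```python
from urllib.parse import quote


def phrase(text: str) -> str:
    """Wrap text in double quotes so it is matched as a complete phrase."""
    return f'"{text}"'


def exclude(term: str) -> str:
    """Prefix a term with '!' (logical NOT)."""
    return f"!{term}"


def any_of(*terms: str) -> str:
    """Join terms with '|' (logical OR)."""
    return " | ".join(terms)


def all_of(*terms: str) -> str:
    """Join terms with spaces (implicit logical AND)."""
    return " ".join(terms)


if __name__ == "__main__":
    print(all_of("kinase", exclude("tyrosine")))  # kinase !tyrosine
    print(any_of("protease", "peptidase"))        # protease | peptidase
    # Percent-encode the pattern before placing it in a URL.
    print(quote(phrase("Cytochrome P450")))       # %22Cytochrome%20P450%22
```

Percent-encoding matters mostly for quoted phrases and for the `|` and `!` characters, which shells and URL parsers may otherwise interpret.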
# Sequence Search
OrthoDB can be queried by homology to a protein sequence: switch from Text to Sequence on the left of the search input and paste the query protein sequence without a header line.
# Advanced options
# Phyloprofile
The result list of Orthologous Groups can be filtered for
- universality, i.e. having member genes in all species of the selected taxonomic node, or a fraction of them, e.g. present in all, in >90% or >80% of the species.
- gene copy-number (duplicability), i.e. having only single-copy orthologs in all species of the selected taxonomic node, or a fraction of them, e.g. single-copy in all, in >90% or >80% of the species.
You can combine any presence filter with any copy-number filter to refine your results, e.g. present in >90% AND single-copy in >80% of species.
# Select species
You can tailor your search by using the expandable species tree to select a radiation point or particular sets of species.
- Expand or collapse any node on the tree by clicking on the filled arrows or node names.
- Select all species at a node by clicking on the unfilled box next to the node name, or
- select specific species by clicking on the unfilled box next to the species name.
You may also add species to the list of selected species to display by typing the species name in the search box and selecting from the autocompleted options.
As you add or remove species from the expandable species tree, the Species to display box above it will automatically update to reflect your selections.
# Search at (level-of-orthology)
OrthoDB Orthologous Groups are hierarchical, being delineated at the major radiations along the species phylogeny. This makes it possible to resolve orthologs at a particular level-of-orthology: considering many distantly-related species delineates fewer, more general (inclusive) orthologous groups containing all the descendants of the ancestral gene, while examining only sets of more closely-related species produces many fine-grained orthologous groups of mostly one-to-one relations.
TIP
The level-of-orthology can be adjusted after the species or clades of interest have been selected (see Select species).
# Species to display
By default, only genes from model species are shown in detail for returned Orthologous Groups in the Orthologs by organism section of results.
This can be changed to a custom set via Species to display.
# Results
WARNING
Results of an OrthoDB query, according to the search mode selected (Text or get Gene), are shown as either a list of relevant Orthologous Groups or a Gene-centric view for the most relevant gene.
Each detailed record of an Orthologous Group has the following sections:
# Functional descriptions
OrthoDB provides tentative functional annotations of groups of orthologs and mapping to functional categories by summarizing functional gene annotations, extensively collected from other public resources. Annotation of genes is complicated and contains errors. Although in many cases OrthoDB makes such errors in the underlying data apparent, discordant annotations should be considered with caution.
# Evolutionary descriptions
The evolutionary annotations of the orthologs remain a distinguishing feature of OrthoDB.
# Phyletic Profile
is a summary of the ortholog presence (from universal to species-specific) and copy-numbers (single/multi-copy counts).
# Evolutionary Rate
is a measure of whether this Orthologous Group exhibits appreciably higher or lower levels of sequence divergence, derived from quantification of the relative divergence among its member genes. It is computed for each orthologous group as the average of inter-species identities normalized to the average identity of all inter-species best reciprocal hits, computed from pairwise alignments of protein sequences. The relative rate is indicated by the position of the black star along the scale from slow (blue) to fast (red) rates.
# Gene Architecture
shows median and standard deviation values of protein lengths and exon counts for each orthologous group, effectively describing a consensus gene architecture (for those genes with available data).
# Orthologs by organism
WARNING
This section can be very long. Use navigation arrows on the left to go to the beginning or the end of the record, or the cross to collapse the detailed view to the condensed view.
Condensed view for each gene includes gene/protein ID, UniProt ID, short description, number of amino acids (AAs), number of exons, and associated InterPro domains.
TIP
For the length (AAs) and exon counts (Exons) listed for each gene, the exclamation mark (!) indicates differences from consensus (left: shorter, right: longer, !: 1 stdev, !!: 2 stdev).
# Double-arrow icon
expands the view, when clicked, to show the annotations available for a given gene, with links to source databases.
Available InterPro domain annotations are displayed for each protein member, ordered from the N to the C terminus. Click on the grey magnifying glass icon to query OrthoDB for groups containing proteins with the same domains. To search for specific domain architectures, enter an ordered list of InterPro identifiers, separated by commas only, into the Text Search field.
TIP
# View protein fasta
retrieves the corresponding protein sequences in Fasta format. Group ID, gene, organism, and other useful details are contained in the header of each sequence. This information can be saved as a file by right-clicking on the link followed by "save link as...".
TIP
# View Tab Delimited
retrieves the corresponding ortholog information as tab delimited text. This information can be saved as a file by right-clicking on the link followed by "save link as...".
WARNING
Note that retrieving sequences/info in Fasta/Tab format is limited to a maximum of 30000 groups
# Sibling Groups
Related orthologous groups at the same level-of-orthology are defined according to their common InterPro domain annotations. The top 5 groups are listed with their percentage overlap in terms of common InterPro domains, and the complete list of related groups may be retrieved by clicking the Show all siblings link.
# Uploading and analyzing your own sequences
# Register
In order to be able to upload sequences for a custom analysis, you need to register:
- Click on the "Register" link on the top right part of the OrthoDB webpage.
- Enter your login details in the form that will appear.
# Data upload
Upload a fasta file with the sequences to be analyzed
After logging in, you can upload your sequences using the "Own data mapping" link (next to "Help").
After clicking on "Own data mapping", click on the "Upload" button and select your fasta-formatted file. Be aware that the file should contain amino acid sequences only.
After uploading is finished you will have to enter a species name in the corresponding field.
The next step is to select where to map your sequences. This level of orthology can be selected either automatically or manually.
If Auto is selected, BUSCO is used to find an appropriate level.
In manual mode, click on the Advanced button next to the search button and select a clade from the tree.
Click on Advanced again to return.
Click on "Run analysis" to add your job to the mapping queue. When the job starts, the status should change from "CREATED" to something else depending on your setup.
When mapping has finished without error, the status will change to "DONE".
If an error occurs, the status will be "ERROR" or something more informative.
In particular, when BUSCO has been used to determine where to map, the error may be any of the following:
Message | Description |
---|---|
ERROR | server side error |
ERROR_BUSCO:io | server side error |
ERROR_BUSCO:bad_setup | server side error |
ERROR_BUSCO:no_result | busco failed |
POOR_BUSCO:AT<score> | busco successful but score is too low |
If either of the last two errors is generated, rerun the analysis using a manually selected orthology level. If it still fails (or for any other error), contact OrthoDB support at support[at]orthodb.org.
# Retrieve results
Download the results in a plain text file
Click on the "Download" button to get the mapping results. The name of this file contains all the mapping information:
- node_XXX: where "XXX" is the NCBI taxon ID of the mapping node.
- subnode_AAA_BBB_CCC_DDD_EEE: where AAA, BBB, CCC, DDD, and EEE are the NCBI taxon IDs for the selected species.
- taxid_YYY: where YYY is a temporary taxon ID for your species.
The mapping file contains 9 fields:
- Ortholog group name
- Gene name
- Ortholog type; for mapped sequences this field is a number >=10 and <20.
- Length of the matching region (in amino acids).
- Start coordinate of the match.
- End coordinate of the match.
- Score of the match.
- Normalized score of the match.
- E-value of the match.
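The nine fields listed above map naturally onto a small record type. The following Python sketch assumes that the fields are tab-separated, one hit per line, which the guide does not state explicitly; the names `MappingHit`, `parse_mapping_line` and `is_mapped` are hypothetical, and the sample line is invented purely for illustration.

```python
from typing import NamedTuple


class MappingHit(NamedTuple):
    og_name: str             # 1. ortholog group name
    gene_name: str           # 2. gene name
    ortholog_type: float     # 3. ortholog type (>=10 and <20 for mapped sequences)
    match_length: int        # 4. length of the matching region (amino acids)
    start: int               # 5. start coordinate of the match
    end: int                 # 6. end coordinate of the match
    score: float             # 7. score of the match
    normalized_score: float  # 8. normalized score of the match
    e_value: float           # 9. E-value of the match


def parse_mapping_line(line: str, sep: str = "\t") -> MappingHit:
    """Parse one line of the mapping file (delimiter is an assumption)."""
    fields = line.rstrip("\r\n").split(sep)
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    og, gene, otype, length, start, end, score, norm, evalue = fields
    return MappingHit(og, gene, float(otype), int(length), int(start),
                      int(end), float(score), float(norm), float(evalue))


def is_mapped(hit: MappingHit) -> bool:
    """True if the ortholog type falls in the range used for mapped sequences."""
    return 10 <= hit.ortholog_type < 20


if __name__ == "__main__":
    # Invented sample line, not real OrthoDB output.
    sample = "124at33208\tmy_gene_1\t10\t250\t1\t250\t480.5\t1.92\t1e-120"
    hit = parse_mapping_line(sample)
    print(hit.og_name, hit.match_length, is_mapped(hit))
```

If the downloaded file turns out to use a different delimiter, only the `sep` argument needs to change.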
# Comparative Charts
This OrthoDB online tool generates a comparative overview of the gene content across selected genomes. The total gene counts and the fractions of orthologs among these species show the level of relatedness among the genomes, highlighting the "universal" core of genes and the ones evolving under single-copy constraint [PMID:21148284].
You can select up to 20 species on the right panel to be included in the comparative genomics chart. The colors, patterns, etc. can be customised from the "Configure chart" tab on the right panel. The fractions shown are hyperlinked to the corresponding Ortholog Groups from which the gene counts were made. The tailored chart can then be exported as publication-quality vector graphics.
Explore an example
# Bookmarking
Search results can be saved by simply bookmarking the result page or saving the URL text.
You can also drag & drop the bookmarklet link under Bookmark OrthoDB, at the right side under the search field, to the browser toolbar for easy OrthoDB searches next time with the same settings.
You can later just highlight a keyword somewhere on a web page and click on the saved bookmarklet to search OrthoDB for this keyword.
# API
The OrthoDB data can be programmatically accessed using
- a URL based interface.
- python API - OrthoDB-py
- R - OrthoDB-R
The documentation below is for the URL based interface. For the python and R interfaces, the user is referred to the packages named above. At the moment the R API is being prepared for inclusion in Bioconductor.
With the URL based interface, data are retrieved using requests of the following form:
# URL
https://data.orthodb.org/v12/CMD?ARG1="value"&ARG2="value"&...
where CMD is a command and all ARGx are arguments to that specific command. Below follows a description of the available commands with arguments.
WARNING
NOTE the request rate is limited to 1 request/second for the following URLs:
/blast
/tab
/fasta
If the rate is too high, some of the requests will fail with a 503 error.
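One way to stay under this limit from a script is to space out the calls on the client side. The sketch below is a generic throttle, not something provided by OrthoDB; the class name `Throttle` is hypothetical, and the clock and sleep functions are injectable only so that the behaviour can be checked without waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval: float = 1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous call was released

    def wait(self) -> float:
        """Sleep as long as needed; return the number of seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


# Usage idea: create one throttle and call throttle.wait() immediately
# before every request to /blast, /tab or /fasta.
throttle = Throttle(min_interval=1.0)
```

A script that still receives a 503 response could additionally wait and retry, but that policy is left to the caller.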
# Data Formats
All data is returned in JSON format, except for /fasta, /tab and /og_description. JSON data is widely supported by many languages. An overview with many examples can be found here.
The JSON returned is of the generic format:
{
"data" : JSON object containing nested objects/arrays with the data
"status" : "ok" or "error"
"message": eventually a message explaining the error
}
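A client will usually want to check the status field before touching the data. The following Python sketch does that for a response body already read into a string; `OrthoDBError` and `unwrap` are hypothetical names, and the sketch assumes that a response that is not a JSON object (such as the plain string returned by /orthodb_release_id) should simply be passed through.

```python
import json


class OrthoDBError(RuntimeError):
    """Raised when a response reports a status other than "ok"."""


def unwrap(body: str):
    """Return the "data" member of a response, or raise on an error status."""
    doc = json.loads(body)
    if not isinstance(doc, dict):
        # Not the generic envelope (e.g. a bare JSON string): return as is.
        return doc
    if doc.get("status") != "ok":
        raise OrthoDBError(doc.get("message") or "unknown error")
    return doc.get("data")


if __name__ == "__main__":
    print(unwrap('{"data": ["124at33208"], "status": "ok"}'))
```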
# The OrthoDB organism, gene and orthologous group (OG) identifier patterns
# Organism id
Generic form is taxid_version
- taxid is the NCBI taxonomy id, extended with an organism-dedicated suffix
Example: 10090_0
# Gene id
Generic form is taxid_version:number
- taxid is the NCBI taxonomy id, extended with an organism-dedicated suffix
- number is a unique left-zero-padded hexadecimal integer value
Example: 10090_0:000d08
# OG id
Generic form CLIDatCLADE
- CLID is a numerical cluster id, as evaluated by the OrthoDB clustering algorithm
- CLADE is NCBI tax id of the clade/level of orthology
Example: 124at33208
# NOTE
prior to OrthoDB v10 the OG ids were of the form FFFVVCCCCII, where
- FFF either EOG (eukaryota) or POG (prokaryota)
- VV OrthoDB version (09 for both v9 and v9.1)
- CCCC unique identifier for each clade
- II unique cluster identifier within the clade
Example: EOG091G06KN
Please also note that there is no guaranteed inheritance of gene or OG identifiers between past and future releases, e.g. some gene or OG ids might appear in later releases with totally unrelated content.
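The current identifier patterns described above can be recognised with simple regular expressions. The Python sketch below is derived only from the generic forms and examples given in this section; the function name `classify_id` is hypothetical, upper-case hexadecimal digits are accepted as an assumption, and the pre-v10 OG ids are deliberately reported as unknown.

```python
import re

# Patterns inferred from the generic forms above (an interpretation, not a spec).
ORGANISM_ID = re.compile(r"\d+_\d+")             # taxid_version, e.g. 10090_0
GENE_ID = re.compile(r"\d+_\d+:[0-9a-fA-F]+")    # taxid_version:number (hex)
OG_ID = re.compile(r"\d+at\d+")                  # CLIDatCLADE, e.g. 124at33208


def classify_id(identifier: str) -> str:
    """Return "gene", "organism", "og" or "unknown" for an identifier string."""
    if GENE_ID.fullmatch(identifier):
        return "gene"
    if ORGANISM_ID.fullmatch(identifier):
        return "organism"
    if OG_ID.fullmatch(identifier):
        return "og"
    return "unknown"


if __name__ == "__main__":
    for example in ("10090_0", "10090_0:000d08", "124at33208", "EOG091G06KN"):
        print(example, "->", classify_id(example))
```

Such a check only validates the shape of an identifier; it says nothing about whether the identifier exists in a given release.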
# Using the API
Interacting with the OrthoDB API can be done using any of:
- a web browser
- a GUI utility, e.g. FileZilla
- a command line utility, like curl
- a programming language subsystem/module, e.g. RCurl in R or requests in Python
Note that currently the command line utility wget is not fully supported.
TIP
Linux: curl is installed by default
Windows: curl
Mac: curl is usually installed natively, otherwise look here
Example to download the sequences of a given cluster into the local file data.fs (the URL is quoted so that the shell does not interpret the & character):
curl "https://data.orthodb.org/v12/fasta?id=32204at9721&species=9721" -L -o data.fs
Note the difference in options for specifying output file.
# API Commands
# /orthodb_release_id
Arguments:
NONE
Returns: Double-quoted string with OrthoDB data/API version.
Description: This retrieves the OrthoDB data/API version.
curl https://data.orthodb.org/v12/orthodb_release_id -L -o odb_version.dat
# /tree
Arguments:
NONE
Returns: A JSON object containing the entire OrthoDB taxonomic tree
Description: This retrieves the entire OrthoDB taxonomic tree, i.e. a hierarchy of clades and leaf taxons (OrthoDB organisms).
curl https://data.orthodb.org/v12/tree -L -o tree.dat
# /species
Arguments:
- clade - an OrthoDB clade/level, e.g. 99
- level - same as clade
Returns: A JSON object containing an array with OrthoDB leaf taxons, i.e. organisms.
Description: This retrieves a flat, non-hierarchical array of all leaf taxons (OrthoDB organisms). The results may optionally be filtered by the parameter clade to include only the leaf taxons underneath it.
curl https://data.orthodb.org/v12/species?clade=99 -L -o taxons.dat
# /search
Arguments:
- query - a text query pattern, with logic operators (see above), sought anywhere in OG data, including the OG identifier itself
- gid - NCBI gene identifier, e.g. 1, of a gene required to be a part of each matched OG
- ncbi - same as gid
- species - taxonomic filter, the NCBI tax id of a clade (expanded to all leaf taxons) or a CSV list of leaf taxons (i.e. OrthoDB organism ids), all required to be a part of each matched OG
- level - the NCBI tax id of a clade (taxonomic level) of orthology, at which each matched OG is built
- universal - phyloprofile filter "present in", one of 1.0, 0.9, 0.8, specifying gene universality as a fraction of all species in the clade where the matched OG is built
- singlecopy - phyloprofile filter "single-copy in", one of 1.0, 0.9, 0.8, specifying whether the gene is present as a single copy, same logic as above
- skip - number of hits to skip to the next chunk, if the result is paginated
- take - maximum number of OG ids on a page, default 100, maximum 10000
- counts_only - return only the count of matches
Returns: a JSON object with an array of OG ids (key data), as well as another array of the same OGs, with some info (key bigdata)
Description: This retrieves all OGs matching a given query pattern, passed via the parameter query, with optional additional phyloprofile or taxonomic filtering criteria.
At least 1 of the 3 parameters (query, species or level) must be given. Combining them narrows the result. A request with only the species parameter returns all OGs in which orthologous genes of all organisms, explicitly listed or implicitly evaluated (after expansion of the given clade to leaf taxons), take part. A request with only the level parameter returns all OGs built at this level (clade) of orthology.
Please note that the result is always truncated at the value of the parameter take (its default if not given). The top-level key count contains the total number of matches, so the user may iterate through them as needed, using the parameters take and skip.
curl "https://data.orthodb.org/v12/search?query=p450&take=2&level=33208&singlecopy=0.8" -L -o search.dat
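The same request can be assembled in Python, together with the skip offsets needed to page through a long result. This is a minimal sketch using only the standard library; `search_url` and `page_offsets` are hypothetical helper names, and the sketch builds URLs without sending any request.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.orthodb.org/v12"


def search_url(**params) -> str:
    """Build a /search URL; at least one of query, species, level is required."""
    if not any(key in params for key in ("query", "species", "level")):
        raise ValueError("give at least one of: query, species, level")
    return f"{BASE_URL}/search?{urlencode(params)}"


def page_offsets(total: int, take: int = 100) -> range:
    """Values of skip needed to cover `total` matches in pages of `take`."""
    return range(0, total, take)


if __name__ == "__main__":
    print(search_url(query="p450", take=2, level=33208, singlecopy=0.8))
    # With the total taken from the top-level key "count" of a first response:
    for skip in page_offsets(total=250, take=100):
        print(search_url(query="p450", level=33208, take=100, skip=skip))
```

`urlencode` also percent-encodes spaces, quotes and operators in the query pattern, so phrases such as "Cytochrome P450" can be passed unchanged.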
# /genesearch
Arguments:
- query - a text query pattern, with logic operators (see above), sought anywhere in gene data, including the gene identifier itself
- gid - NCBI gene identifier, e.g. 1
- ncbi - same as gid
Returns: A nested JSON object with a gene-centric view of the best matching gene. The response consists of a section with the gene info and the organism info, followed by a list (if available) of orthologous genes in model organisms at several cascading levels of orthology. The top-level keys are:
gene: {},
organism: {},
organism_xref: "",
nb_genes_matched_the_query: integer,
orthologs_in_model_organisms: [],
genes_and_clusters_statistics: {}
Description: A pinpointed query pattern is expected. OrthoDB gene id 9606_0:0017fc or UniProt id P12345 supplied via the query parameter, or 1 supplied via the gid parameter, are good examples. By contrast, specifying NCBI gid 1 via query yields a very large number of matches. Precision while searching by an omnipresent term, e.g. gene symbol ABCD, can always be gained by additional, more focused criteria, like the exact species name, e.g. "Homo sapiens". Only the most relevant gene (with supplemented info) is returned by default; however, the total number of matches can be checked via the key nb_genes_matched_the_query in the returned JSON object, so the user may iterate over a few matches via the take and skip parameters.
curl https://data.orthodb.org/v12/genesearch?query=9606_0:0017fc -L -o gene_data.json
# /blast
Arguments:
- seq - protein sequence pattern, at least 20 aa, without fasta-header
Returns: A nested JSON object containing a gene-centric view of the best matching gene, same as delivered by /genesearch. It contains basic gene data supplemented with gene xrefs, etc., followed by a list (if available) of its orthologs in model organisms at several cascading levels of orthology.
Description: This finds the best matching gene according to the given sequence pattern using the RapSearch2 algorithm. No provision is made to disambiguate search results if several genes (from several organisms) match with equal probability, e.g. when a too short and omnipresent sequence pattern is given.
curl https://data.orthodb.org/v12/blast?seq=MGDSHEDTSATVPEAVAEEVSLFSTTDIVLF -L -o blast.dat
# /group
Arguments:
- id - OrthoDB OG id
Returns: A nested JSON object with some info about the OG. The top level keys are:
id: "",
name: "",
tax_id: integer,
ECnumber: [],
public_id: "",
level_name: "",
KEGGpathway: [],
interpro_domains: [],
phyletic_profile: {},
evolutionary_rate: decimal,
gene_architecture: {},
biological_process: [],
cellular_component: [],
molecular_function: [],
functional_category: [],
unclassified GO terms: []
Description: This returns detailed annotation on the given OG, without listing its genes. Please use /orthologs to list the genes in an OG.
curl https://data.orthodb.org/v12/group?id=4977at9604 -L -o group.dat
# /og_description
Arguments:
- id - OrthoDB OG id
- clade - an OrthoDB clade/level NCBI tax id
Returns: Tab-separated records with OG annotation, preceded by a header line. The columns are:
- cluster_id
- cog
- description
- molfunction_go
- bioprocess_go
- ec
- kegg
- interpro
Description: This retrieves detailed annotation information on the given OG, or all OGs in the given clade.
If id is given, the response contains a single line describing the OG.
If clade is given, the response contains multiple lines describing all OGs at the clade/level.
curl https://data.orthodb.org/v12/og_description?clade=943 -L -o group.dat
# /orthologs
Arguments:
- id - either OrthoDB OG id or OrthoDB gene id
- species - an OrthoDB leaf taxon, or a CSV list of them, or an upper clade
- species2 - another OrthoDB leaf taxon
- clade - an OrthoDB clade/level NCBI tax id
Returns: A nested JSON object.
If id is given, the response contains an array of entries, each containing a nested array of orthologous genes (including possible multi-copies), along with the organism description. The top-level keys of each entry are:
genes: [],
organism: {},
genes_and_clusters_statistics: {}
If species and species2 are given, the response is an array of nested arrays of orthologous genes.
Description:
- If id is an OG id, this returns all (orthologous) genes in the given OG, optionally filtered by species.
- If id is a gene id, this returns all (orthologous) genes in all OGs at all levels of orthology, optionally filtered by species.
- If no id is given, but species and species2 are, this returns all (orthologous) genes between the two organisms at the clade level of orthology, defaulting to the LCA of the two given taxons. Unlike in id-driven requests, in this mode species must be a single OrthoDB leaf taxon, i.e. organism id, e.g. 9606_0.
curl https://data.orthodb.org/v12/orthologs?id=4977at9604 -L -o orthologs.dat
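Because /orthologs behaves differently depending on which arguments are present, a small helper can make the two modes explicit. The Python sketch below only builds URLs; `orthologs_url` is a hypothetical name, and passing clade only in the pairwise mode is an interpretation of the description above rather than a documented restriction.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://data.orthodb.org/v12"


def orthologs_url(id_: Optional[str] = None, species: Optional[str] = None,
                  species2: Optional[str] = None,
                  clade: Optional[int] = None) -> str:
    """Build an /orthologs URL for either the id-driven or the pairwise mode."""
    params = {}
    if id_ is not None:
        # id-driven mode: OG id or gene id, optionally filtered by species.
        params["id"] = id_
        if species is not None:
            params["species"] = species
    elif species is not None and species2 is not None:
        # Pairwise mode: both values must be single organism ids.
        params["species"] = species
        params["species2"] = species2
        if clade is not None:
            params["clade"] = clade
    else:
        raise ValueError("give id, or both species and species2")
    return f"{BASE_URL}/orthologs?{urlencode(params)}"


if __name__ == "__main__":
    print(orthologs_url(id_="4977at9604"))
    print(orthologs_url(species="9606_0", species2="10090_0"))
```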
# /ogdetails
Arguments:
- id - OrthoDB gene id
Returns: JSON object with detailed information on the given gene id. The top-level keys are:
aas: integer,
upkws: [],
xrefs: [],
gene_id: {},
uniprot: [],
interpro: [],
entrezgene: [],
entrezprotein: [],
genomic_coordinates: {}
Description: Retrieve detailed info on the given gene id. Please note that /ogdetails will be renamed /geneinfo in future releases.
curl https://data.orthodb.org/v12/ogdetails?id=9606_0:0017fc -L -o ogdetails.dat
# /siblings
Arguments:
- id - OrthoDB cluster id
- take - maximum number of returned siblings
Returns: a list of OrthoDB OG ids
Description: Retrieve all siblings of the given OG.
curl https://data.orthodb.org/v12/siblings?id=4977at9604 -L -o siblings.dat
# /fasta
Arguments:
- id - either OrthoDB OG id or OrthoDB gene id
- species - taxonomic filter, an OrthoDB leaf taxon, or a CSV list of them, or an upper clade
- seqtype - either protein (default) or cds
Returns: AA or CDS sequences in fasta format, with a header line consisting of the OrthoDB gene id, followed by a JSON-like string with pertinent info about the gene
Description:
- If id is a gene id, this returns the fasta-formatted sequence of this gene, either protein or cds, as given by seqtype.
- If id is an OG id, this returns fasta-formatted sequences of all genes in the OG, either protein or cds, optionally filtered by species.
- If no id, but the species parameter is given, this returns fasta-formatted sequences, either protein or cds, of all genes in this organism. Unlike in id-driven requests, in this mode species must be a single OrthoDB leaf taxon, i.e. organism id, e.g. 9606_0.
curl https://data.orthodb.org/v12/fasta?species=100_0 -L -o data.fs
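Once a file such as data.fs has been downloaded, its records can be read with a few lines of Python. The sketch below assumes that the gene id is the first whitespace-delimited token of each header line and keeps the remainder of the header as an uninterpreted string, since the exact layout of the JSON-like part is not specified here; `parse_fasta` is a hypothetical name and the sample text is invented.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str, str]]:
    """Return (gene_id, header_info, sequence) for each record in the text."""
    records: List[Tuple[str, str, str]] = []
    gene_id = None
    info = ""
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if gene_id is not None:
                records.append((gene_id, info, "".join(chunks)))
            parts = line[1:].split(None, 1)  # gene id, then the rest
            gene_id = parts[0] if parts else ""
            info = parts[1] if len(parts) > 1 else ""
            chunks = []
        elif gene_id is not None:
            chunks.append(line)  # sequences may span several lines
    if gene_id is not None:
        records.append((gene_id, info, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Invented sample, not real OrthoDB output.
    sample = ('>9606_0:0017fc {"example_key":"example value"}\n'
              "MGDSHEDTSA\nTVPEAVAEEV\n")
    for gid, header_info, seq in parse_fasta(sample):
        print(gid, len(seq), header_info)
```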
# /tab
Arguments:
- id - OrthoDB OG id
- species - taxonomic filter, an OrthoDB leaf taxon, or a CSV list of them, or an upper clade
Returns: Tab-separated records with gene annotation, preceded by a header line. The columns are:
- pub_og_id
- og_name
- level_taxid
- organism_taxid
- organism_name
- pub_gene_id
- description
Description: This returns a table with gene annotation for all genes in the OG, optionally filtered by species.
curl https://data.orthodb.org/v12/tab?id=4977at9604 -L -o data.tsv
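The tab-separated output can be loaded with Python's csv module, using the header line to name the columns. In the sketch below, `parse_tab` and `genes_by_organism` are hypothetical helper names, quoting is disabled on the assumption that descriptions may contain literal quote characters, and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def parse_tab(text: str) -> List[Dict[str, str]]:
    """Parse /tab output into a list of dicts keyed by the header columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return list(reader)


def genes_by_organism(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group the pub_gene_id values by organism_name."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["organism_name"]].append(row["pub_gene_id"])
    return dict(grouped)


if __name__ == "__main__":
    # Invented sample using the documented column names.
    sample = (
        "pub_og_id\tog_name\tlevel_taxid\torganism_taxid\t"
        "organism_name\tpub_gene_id\tdescription\n"
        "4977at9604\texample group\t9604\t9606_0\t"
        "Homo sapiens\t9606_0:0017fc\texample description\n"
    )
    print(genes_by_organism(parse_tab(sample)))
```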
# RDF
This SPARQL 1.1 endpoint serves OrthoDB data. OrthoDB release 11.0 consists of 6'568M RDF triples describing evolutionary and functional properties of 101'009'660 genes from 28'055 organisms clustered in 11'677'084 orthologous groups on 1003 taxonomic levels. The data can be explored using the Umaka data browser, as well as via WIDOCO-generated documentation. An intro to the RDF data model is also available.
# Downloads
OrthoDB data are also available as flat files for download. This is the recommended way (instead of data offloads via the API) if the user intends to locally process really large parts of the OrthoDB data.
WARNING
Use API (Application Programming Interface) to download data if the data set is not too large.
# FAQ
# How can I ..?
..will come soon..
# Contact
Email: support[at]orthodb.org
Join the OrthoDB-News mailing list (low traffic).
# Funding
- UNIGE
- SIB
- SNSF
# Previous versions
# Cite us
OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity. D Kuznetsov, F Tegenfeldt, M Manni, M Seppey, M Berkeley, EV Kriventseva, EM Zdobnov, NAR, Nov 2022, doi:10.1093/nar/gkac996. PMID:36350662