About GODomainMiner

Families of related proteins and their functions can be described systematically using common classifications and ontologies such as Pfam and GO. However, many proteins consist of multiple domains, and each domain, or some combination of domains, can be responsible for a particular molecular function. Identifying which domains should be associated with a specific function is therefore a non-trivial task. We describe a general approach for the computational discovery of associations between different sets of annotations, which formalises the problem as a bipartite graph enrichment problem in the setting of a tripartite graph. We call this approach "CODAC" (for COmputational Discovery of Direct Associations using Common Neighbours). As one application of this approach, we describe "GODomainMiner", which associates GO terms with protein domains. We used GODomainMiner to predict GO-domain associations between each of the three GO namespaces (molecular function, MF; biological process, BP; cellular component, CC) and the Pfam, SCOP, and CATH domain classifications. Overall, GODomainMiner gives up to a 17-fold average enrichment of GO-domain associations compared to the existing GO annotations in these domain classifications. These associations could potentially be used to annotate many of the protein chains in the PDB and protein sequences in UniProt whose domain composition is known but which currently lack GO annotation.
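
The common-neighbour idea can be illustrated with a small Python sketch. It assumes that proteins are the shared neighbours linking GO terms to domains in the tripartite graph, and it scores each GO-domain pair with a hypergeometric upper-tail test on the number of proteins the two annotations share. The hypergeometric test, the `alpha` cutoff, and the function names (`invert`, `hypergeom_sf`, `associate`, `annotate`) are hypothetical illustrative choices, not GODomainMiner's actual scoring scheme or API. The final helper shows how such associations might transfer GO terms to a chain from its known domain composition.

```python
from collections import defaultdict
from itertools import product
from math import comb


def invert(annotations):
    """Map each label (GO term or domain) to the set of proteins carrying it."""
    index = defaultdict(set)
    for protein, labels in annotations.items():
        for label in labels:
            index[label].add(protein)
    return index


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def associate(protein_go, protein_domains, alpha=0.05):
    """Score GO-domain pairs by their common neighbours (shared proteins).

    Hypothetical scoring: a pair is kept if the hypergeometric upper-tail
    probability of seeing at least this many shared proteins is <= alpha.
    """
    N = len(set(protein_go) | set(protein_domains))  # protein universe
    go_index = invert(protein_go)
    dom_index = invert(protein_domains)
    results = []
    for go, dom in product(go_index, dom_index):
        shared = go_index[go] & dom_index[dom]
        if not shared:
            continue  # no common neighbour, so no candidate association
        p = hypergeom_sf(len(shared), N, len(go_index[go]), len(dom_index[dom]))
        if p <= alpha:
            results.append((go, dom, len(shared), p))
    return sorted(results, key=lambda r: r[3])  # most significant first


def annotate(associations, chain_domains):
    """Transfer GO terms to a chain from the domains it contains."""
    return {go for go, dom, _, _ in associations if dom in chain_domains}


if __name__ == "__main__":
    # Toy data: P0-P3 carry GO:A and domain PF1; P4-P9 carry GO:B and PF2.
    protein_go = {f"P{i}": {"GO:A"} for i in range(4)}
    protein_go.update({f"P{i}": {"GO:B"} for i in range(4, 10)})
    protein_domains = {f"P{i}": {"PF1"} for i in range(4)}
    protein_domains.update({f"P{i}": {"PF2"} for i in range(4, 10)})
    assoc = associate(protein_go, protein_domains)
    for row in assoc:
        print(row)
    print(annotate(assoc, {"PF2"}))
```

Because an unannotated chain is scored only through the domains it contains, the same association table can be reused for every chain or sequence whose domain composition is known.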

People Involved