Introduction to BioCyc and Pathway Tools
BioCyc is a database collection and website that couples rich, high-quality data with extensive bioinformatics tools. The Pathway Tools software, which underlies BioCyc, can be installed at your site to create BioCyc-like databases for genomes of interest. The software computes metabolic reconstructions for sequenced genomes and can be used to create quantitative metabolic models.
BioCyc Benefits: Better Science, Faster
BioCyc accelerates your science:
- Curators author mini-reviews for genes [example] and pathways [example] from the literature, which save scientists time in searching, reading, synthesizing, and reconciling errors and disagreements in the literature
- BioCyc integrates many types of data to form a one-stop shop, placing genes in their pathway and regulatory context and including protein features, GO terms, and gene essentiality
- BioCyc visualization tools speed information uptake and produce publication-quality figures for pathways, genome comparisons, and more
- Accurate and high-quality information curated from 146,000 publications enables:
  - Better hypothesis generation
  - More complete experimental interpretation
  - Higher quality publications and grant proposals
- BioCyc's tools enable unique analyses:
  - Pathway-based analysis of metabolomics data
  - Visualization of transcriptomics data on zoomable metabolic charts
  - Metabolic route search
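To make the idea of a metabolic route search concrete, the sketch below finds a shortest chain of reactions linking two metabolites in a toy network using breadth-first search. It is only an illustration: the source does not describe the algorithm BioCyc actually uses, and the reaction names, metabolite labels, and the `find_route` helper are all hypothetical.

```python
from collections import deque

# Hypothetical toy network: each reaction converts one substrate to one product.
NATIVE_REACTIONS = {"r1": ("A", "B"), "r2": ("B", "C")}
# Hypothetical extra reactions that could be borrowed from a reference database.
EXTRA_REACTIONS = {"x1": ("C", "D")}


def find_route(reactions, start, goal):
    """Return the shortest list of reaction names from start to goal, or None."""
    # Index the reactions by substrate so neighbors can be looked up quickly.
    by_substrate = {}
    for name, (substrate, product) in reactions.items():
        by_substrate.setdefault(substrate, []).append((name, product))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        metabolite, path = queue.popleft()
        if metabolite == goal:
            return path
        for name, product in by_substrate.get(metabolite, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, path + [name]))
    return None  # no route exists in this network


if __name__ == "__main__":
    print(find_route(NATIVE_REACTIONS, "A", "D"))
    print(find_route({**NATIVE_REACTIONS, **EXTRA_REACTIONS}, "A", "D"))
```

In this toy case no route from A to D exists in the native network, but one appears once the extra reaction is merged in, which mirrors the idea of supplementing an organism's network with reactions from a reference database.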
BioCyc Database Collection
The BioCyc collection of Pathway/Genome Databases (PGDBs) provides a reference on the genomes, metabolic pathways, and (in some cases) regulatory networks of thousands of sequenced organisms. Each database combines information from several sources:
- Computational inferences: Our Pathway Tools software predicts the metabolic pathways of an organism, predicts which genes code for missing enzymes in metabolic pathways, predicts protein complexes, and predicts operons. We compute orthologs across BioCyc databases.
- Imported data: BioCyc integrates information from other bioinformatics databases, such as protein feature and Gene Ontology information from UniProt, gene-essentiality datasets from OGEE, and regulatory information from RegTransBase.
- Manual curation: The curated databases (called Tier 1 and Tier 2 PGDBs) have received literature-based curation to enter new gene functions, pathways, protein complexes, regulation, and more. Curated PGDB entries include mini-review summaries, thousands of literature citations, and evidence codes.
  - The EcoCyc DB is the result of more than 20 person-years of effort to enter information from 43,000+ E. coli articles about gene function, metabolism, transport, and regulatory processes.
  - The MetaCyc DB describes metabolic pathways, enzymes, and metabolites from all domains of life, curated from 76,000+ publications.
The EcoCyc database is freely available to all users because its curation is supported by NIH funding. The database for the cyanobacterium Arthrospira platensis NIES-39 is also free and serves as an example of a Tier 3 database. The other BioCyc databases are available via subscription, which supports their curation. To obtain free access to the other BioCyc databases for teaching purposes, please click here.
BioCyc data files may be downloaded to your site, and BioCyc data can be queried via web services.
BioCyc.org Bioinformatics Tools
BioCyc.org provides a suite of bioinformatics tools (see Tools menu) for accessing and analyzing the BioCyc databases. The tools provide search and visualization, omics data analysis, and comparative genomics and comparative pathway analysis:
- Search: Multiple search tools enable users to find genes, pathways, and metabolites of interest, which are presented in corresponding information pages. Most searches apply to the currently selected organism database, which can be changed with the "Change Current Database" button at the top of most pages. There are two ways to search across multiple databases: (1) use Tools → Search → Cross Organism Search, or (2) in commands such as Tools → Search → Search Genes, Proteins, and RNAs, select "Search across multiple organisms/databases" under the list of buttons.
- Visualization: A variety of visualization tools are provided, such as metabolic-pathway diagrams and zoomable diagrams depicting the complete metabolic chart of each organism [example].
- Genome Browser: The BioCyc genome browser [example] enables visual genome exploration and analysis of positional genome datasets via tracks.
- Omics Data Analysis: Tools include statistical over-representation analysis and visualization of gene expression, proteomics, or metabolomics data on metabolic-chart diagrams [example] and on the Omics Dashboard [example].
- SmartTables: Provide biologist-friendly analysis capabilities for groups of genes or metabolites that are stored in your BioCyc account.
- Metabolic Route Search: Search for reaction paths connecting specified metabolites in the metabolic network, with the option of adding new reactions from the MetaCyc DB.
- Comparative Analysis: Tools include comparison of pathways, metabolites, transporters, and regulatory networks -- see menu command Analysis → Comparative Analysis and the new Comparative Genome Dashboard at Analysis → Comparative Genome Dashboard.
- Sequence Analysis: Extract sequences, run BLAST searches and sequence pattern searches, and perform multiple alignments.
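As a rough illustration of the statistical over-representation analysis mentioned under Omics Data Analysis, the sketch below computes a one-sided hypergeometric p-value for a single pathway. This is a common textbook approach, shown under the assumption that it resembles the kind of test involved; the source does not state which statistic BioCyc applies, and the function name `enrichment_p_value` and its parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(population, pathway_size, sample, hits):
    """Probability of seeing at least `hits` pathway members by chance.

    population   -- total number of genes considered
    pathway_size -- number of those genes that belong to the pathway
    sample       -- number of genes in the list of interest
    hits         -- number of genes in the list that belong to the pathway
    """
    max_hits = min(pathway_size, sample)
    total = comb(population, sample)  # all ways to draw the list of interest
    # Sum the hypergeometric probabilities for k = hits .. max_hits.
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 genes in total, 3 in the pathway, 3 in the list, all 3 hit.
    print(enrichment_p_value(10, 3, 3, 3))
```

A small p-value suggests that the pathway is represented in the gene list more often than random sampling would explain. In practice the p-values for many pathways would also need a multiple-testing correction, which this sketch omits.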
Pathway Tools Software
Pathway Tools is an enterprise genome and pathway data management tool and is among the most extensive bioinformatics software packages. It is the software used to create BioCyc databases, and it powers the BioCyc.org website and several additional websites. Its capabilities are described in detail here. Please click here to see Pathway Tools testimonials. Pathway Tools can run both as a desktop application and as a web server.
Installing Pathway Tools at your site brings these advantages:
- Install a private local set of BioCyc PGDBs on your intranet
- Create new PGDBs from your own genome data, generating metabolic reconstructions, operon inferences, and more
- Apply its extensive search, visualization, and analysis tools to your own genome data
- Edit PGDBs interactively to add new gene functions and pathways
- Build quantitative metabolic flux models using Flux-Balance Analysis with the MetaFlux tool
How to Learn More About BioCyc
The following additional resources describe the BioCyc site:
- Guided Tour: Presents the different information types present in BioCyc
- Webinars: Online videos describing how to use BioCyc
- Website User's Guide: Instructions for using the BioCyc Website
- BioCyc User Guide: Information about the data content of BioCyc DBs
- PGDB Concepts Guide: The ideas behind BioCyc
- Publications: Articles about BioCyc databases and the Pathway Tools software
Definitions of Terminology on the BioCyc Website
Here we define a few key terms. See the glossary for more definitions.
Pathway/Genome Database (PGDB). A database that describes:
- The genome of an organism -- its chromosome(s), genes, and genome sequence
- The product of each gene
- The metabolic network of the organism -- its pathways, reactions, enzymes, and metabolites
- The transporter complement of the organism
- The regulatory network of the organism, including its operons, transcription factors, and the interactions between transcription factors and their small-molecule ligands and DNA binding sites
Tier 1 PGDB. PGDBs in Tier 1, such as EcoCyc, MetaCyc, and HumanCyc, have received at least one year of literature-based curation by scientists. More information about curation practices is available in the Curator Guide.
Tier 2 PGDB. PGDBs in Tier 2 were generated by the PathoLogic program, which predicted their metabolic pathways, their operons (for bacteria only), their protein complexes, and some missing enzymes in their predicted pathways (pathway hole fillers). The resulting PGDBs underwent manual review to remove detectable false-positive pathway predictions and to perform refinements such as defining protein complexes. They also underwent a period of literature-based curation, for example to enter metabolic pathways that had been experimentally elucidated in the organism but were not inferred by PathoLogic. [list of Tier 2 PGDBs]
Tier 3 PGDB. PGDBs in Tier 3 were generated by PathoLogic, which predicted metabolic pathways, operons (for bacteria only), pathway hole fillers, and transport reactions. The resulting PGDBs did not undergo manual review of the pathway predictions, nor subsequent literature curation. Therefore, the pathway predictions should be treated with due caution. [list of Tier 3 PGDBs]
Pathway Tools Software. Pathway Tools is used to construct, update, visualize, query, and analyze PGDBs, such as those in the BioCyc collection. It is freely available to academics interested in creating PGDBs for organisms of interest to them. Components of Pathway Tools are:
- The Pathway/Genome Navigator supports querying, visualization, and analysis of PGDBs
- The Pathway/Genome Editors support interactive updating and refinement of PGDBs
- PathoLogic performs computational inferences such as pathway prediction
- MetaFlux enables creation of quantitative metabolic models from PGDBs
BioCyc. The collection of PGDBs at URL https://BioCyc.org/ is called the BioCyc Database Collection. EcoCyc and MetaCyc are component databases within the BioCyc collection.