Analyzing High Dimensional Toxicogenomic Data Using Consensus Clustering
Rapid development of high-throughput toxicogenomics technologies has created new approaches to screen environmental samples for mechanistic toxicity assessment. However, challenges remain in the analysis, especially the clustering, of the resulting high-dimensional data. Because there are no commonly accepted validation methods, it is difficult to compare clustering results between studies or to identify the key experimental or data features that affect the clustering results. We applied consensus clustering (CC), an approach that repeatedly clusters the input data through iterative resampling and identifies frequently occurring, high-confidence clusters. We used CC to analyze a set of high-dimensional, time-resolved transcriptomics data generated with our E. coli whole-cell array system for a diverse set of toxicants at different dose concentrations. The CC analysis allowed us to evaluate the robustness and sensitivity of the clustering results under a number of conditions that represent common variations in high-throughput experiments, including noisy data, subsets of treatments, subsets of reporter genes, and subsets of time points. We demonstrated the value of rich time-series data and underscored the importance of carefully selecting sampling times for a given experimental system. The results also indicated that temporal data compression using our proposed Transcriptional Effect Level Index (TELI) concept, followed by CC, largely conserved the cluster resolution. We also found that, for our high-throughput transcriptomics assay platform based on an ensemble of cellular stress responses, the size and composition of the reporter gene set are critical factors affecting the coherency of the resulting clusters. Taken together, these results demonstrate that more robust clustering approaches such as CC may be valuable in analyzing high-dimensional toxicogenomic data sets.
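The resampling idea behind consensus clustering can be illustrated with a small, generic sketch. This is not the authors' implementation: the abstract does not state which base clustering algorithm, subsampling fraction, number of resamples, or cluster-extraction rule was used. Here we assume plain k-means as the base clusterer, 80% subsampling without replacement, and a hypothetical rule that treats connected groups of items whose consensus value is at least 0.9 as "high-confidence" clusters. The function names (`kmeans`, `consensus_matrix`, `high_confidence_clusters`) are illustrative only. The consensus value for two items is the fraction of resamples, among those containing both, in which they landed in the same cluster.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means (assumed base clusterer); returns one label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Move each center to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(data, k, n_resamples=100, frac=0.8, seed=0):
    """Fraction of co-sampled resamples in which each pair of items co-clusters."""
    n = len(data)
    rng = random.Random(seed)
    together = [[0] * n for _ in range(n)]  # times i and j shared a cluster
    sampled = [[0] * n for _ in range(n)]   # times i and j were both subsampled
    m = max(k, int(round(frac * n)))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), m)  # subsample items without replacement
        labels = kmeans([data[i] for i in idx], k, rng)
        for (a, la), (b, lb) in combinations(zip(idx, labels), 2):
            sampled[a][b] += 1
            sampled[b][a] += 1
            if la == lb:
                together[a][b] += 1
                together[b][a] += 1
    return [
        [1.0 if i == j else (together[i][j] / sampled[i][j] if sampled[i][j] else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


def high_confidence_clusters(M, threshold=0.9):
    """Hypothetical extraction rule: connected components of pairs with M >= threshold."""
    n = len(M)
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if j not in seen and M[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(comp))
    return clusters
```

For two well-separated groups of toy profiles, such as `[(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]`, every resample should place each group together, so `high_confidence_clusters(consensus_matrix(data, k=2))` recovers the two groups. With real, noisy data, pairs with intermediate consensus values are the ones that fail to form stable, high-confidence clusters.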
In this study, the authors applied consensus clustering (CC), an approach that repeatedly clusters the input data through iterative resampling and identifies frequently occurring, high-confidence clusters. They used CC to analyze a set of high-dimensional, time-resolved transcriptomics data generated with an E. coli whole-cell array system for a diverse set of toxicants at different dose concentrations.
Peer Reviewed Journal Article
Method of Study: Computational and System Modeling
Environ. Sci. Technol., 2012, 46(15): 8413-8421
Environmental Science & Technology
Gao C, Weisman D, Gou N, Ilyin V, Gu AZ
Last updated on September 26, 2012
This work is supported in part by the Nanoscale Science and Engineering Initiative of the National Science Foundation under NSF Award Number EEC-0118007.