Building Quantitative Prediction Models for Tissue Residue of Two Explosives Compounds in Earthworms from Microarray Gene Expression Data

Gong, P; Loh, P; Barker, ND; Tucker, G; Wang, Nan; Zhang, C; Escalon, BL; Berger, B; Perkins, EJ

HERO ID 1506871
Reference Type Journal Article
Year 2012
Language English
PMID 21776976

In Press No
Journal Environmental Science & Technology
Volume 46
Issue 1
Page Numbers 19-26
Abstract Soil contamination near munitions plants and testing grounds is a serious environmental concern that can result in the formation of tissue chemical residue in exposed animals. Quantitative prediction of tissue residue still represents a challenging task despite long-term interest and pursuit, as tissue residue formation is the result of many dynamic processes including uptake, transformation, and assimilation. The availability of high-dimensional microarray gene expression data presents a new opportunity for computational predictive modeling of tissue residue from changes in expression profile. Here we analyzed a 240-sample data set with measurements of transcriptomic-wide gene expression and tissue residue of two chemicals, 2,4,6-trinitrotoluene (TNT) and 1,3,5-trinitro-1,3,5-triazacyclohexane (RDX), in the earthworm Eisenia fetida. We applied two different computational approaches, LASSO (Least Absolute Shrinkage and Selection Operator) and RF (Random Forest), to identify predictor genes and built predictive models. Each approach was tested alone and in combination with a prior variable selection procedure that involved the Wilcoxon rank-sum test and HOPACH (Hierarchical Ordered Partitioning And Collapsing Hybrid). Model evaluation results suggest that LASSO was the best performer of minimum complexity on the TNT data set, whereas the combined Wilcoxon-HOPACH-RF approach achieved the highest prediction accuracy on the RDX data set. Our models separately identified two small sets of ca. 30 predictor genes for RDX and TNT. We have demonstrated that both LASSO and RF are powerful tools for quantitative prediction of tissue residue. They also leave more unknown than explained, however, allowing room for improvement with other computational methods and extension to mixture contamination scenarios.
Doi 10.1021/es201187u
Wosid WOS:000298762900004
Is Certified Translation No
Dupe Override No
Comments Source: Web of Science WOS:000298762900004; Scopus URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84855324436&doi=10.1021%2fes201187u&partnerID=40&md5=9efc6cc21aa7277e5837d077c85f8563
Is Public Yes