United States Environmental Protection Agency
Health & Environmental Research Online (HERO)
HERO ID
3859070
Reference Type
Journal Article
Title
Doubly nonparametric sparse nonnegative matrix factorization based on dependent Indian buffet processes
Author(s)
Xuan, J; Lu, J; Zhang, G; Xu, RYD; Luo, X
Year
2017
Volume
29
Issue
5
Page Numbers
1835-1849
Language
English
PMID
28422690
DOI
10.1109/TNNLS.2017.2676817
Web of Science Id
WOS:000430729100035
URL
https://www.proquest.com/scholarly-journals/doubly-nonparametric-sparse-nonnegative-matrix/docview/2029143346/se-2?accountid=171501
Abstract
Sparse nonnegative matrix factorization (SNMF) aims to factorize a data matrix into two optimized nonnegative sparse factor matrices, which could benefit many tasks, such as document-word co-clustering. However, the traditional SNMF typically assumes the number of latent factors (i.e., dimensionality of the factor matrices) to be fixed. This assumption makes it inflexible in practice. In this paper, we propose a doubly sparse nonparametric NMF framework to mitigate this issue by using dependent Indian buffet processes (dIBP). We apply a correlation function for the generation of two stick weights associated with each column pair of factor matrices while still maintaining their respective marginal distribution specified by IBP. As a consequence, the generation of two factor matrices will be columnwise correlated. Under this framework, two classes of correlation function are proposed: 1) using bivariate Beta distribution and 2) using Copula function. Compared with the single IBP-based NMF, this paper jointly makes two factor matrices nonparametric and sparse, which could be applied to broader scenarios, such as co-clustering. This paper is seen to be much more flexible than Gaussian process-based and hierarchical Beta process-based dIBPs in terms of allowing the two corresponding binary matrix columns to have greater variations in their nonzero entries. Our experiments on synthetic data show the merits of this paper compared with the state-of-the-art models in respect of factorization efficiency, sparsity, and flexibility. Experiments on real-world data sets demonstrate the efficiency of this paper in document-word co-clustering tasks.
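Illustrative note (not part of the original record): the abstract's idea of coupling two stick weights per column pair while keeping each marginal as specified by the IBP can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' construction: it assumes the standard IBP stick-breaking form with Beta(alpha, 1) stick variables, swaps in a Gaussian copula with a single correlation `rho` as the correlation function, and truncates to a finite number of columns. All function and parameter names are hypothetical.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def correlated_stick_weights(num_columns, alpha_a, alpha_b, rho, rng):
    """Draw two column-wise correlated stick-weight sequences.

    Assumption: each marginal follows IBP stick-breaking with
    nu_k ~ Beta(alpha, 1) and mu_k = nu_1 * ... * nu_k.
    Assumption: the pair (nu_k_a, nu_k_b) is coupled by a Gaussian copula.
    """
    mu_a, mu_b = [], []
    prod_a = prod_b = 1.0
    for _ in range(num_columns):
        # Correlated standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Map to correlated uniforms; each is still marginally Uniform(0, 1).
        u1, u2 = normal_cdf(z1), normal_cdf(z2)
        # Inverse CDF of Beta(alpha, 1) is u ** (1 / alpha), so the
        # Beta(alpha, 1) marginals are preserved.
        prod_a *= u1 ** (1.0 / alpha_a)
        prod_b *= u2 ** (1.0 / alpha_b)
        mu_a.append(prod_a)
        mu_b.append(prod_b)
    return mu_a, mu_b


def sample_binary_matrix(num_rows, stick_weights, rng):
    """Each entry in column k is Bernoulli(mu_k), giving a sparse 0/1 mask."""
    return [[1 if rng.random() < mu else 0 for mu in stick_weights]
            for _ in range(num_rows)]


rng = random.Random(0)
mu_a, mu_b = correlated_stick_weights(8, alpha_a=2.0, alpha_b=2.0, rho=0.9, rng=rng)
Z_a = sample_binary_matrix(5, mu_a, rng)  # e.g. document-by-factor mask
Z_b = sample_binary_matrix(4, mu_b, rng)  # e.g. word-by-factor mask
```

Because the stick weights are running products of values in [0, 1], later columns are increasingly unlikely to be active, which is what allows the effective number of factors to be inferred rather than fixed. With `rho` near 1 the two masks tend to switch columns on and off together; with `rho = 0` they are independent.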
Keywords
Co-clustering; nonnegative matrix factorization; probability graphical model; text mining
Tags
IRIS
• Diisobutyl Phthalate (DIBP) Final
Database Searches
July 2017 Update
PubMed
New for this search
No Primary Data on Toxic Effects
Not chemical specific