Scalable Probabilistic Tensor Factorization for Binary and Count Data


7 Pages · 2015 · 281 KB · English



Piyush Rai*, Changwei Hu*, Matthew Harding†, Lawrence Carin*
*Department of Electrical & Computer Engineering, Duke University
†Sanford School of Public Policy & Department of Economics, Duke University
Durham, NC 27708, USA
{piyushrai, ch237, matthewharding, lcarin} [email protected]

Abstract

Tensor factorization methods provide a useful way to extract latent factors from complex multirelational data, and also to predict missing data. Developing tensor factorization methods for massive tensors, especially when the data are binary or count-valued (which is true of most real-world tensors), however, remains a challenge. We develop a scalable probabilistic tensor factorization framework that enables us to perform efficient factorization of massive binary and count tensor data. The framework is based on (i) the Pólya-Gamma augmentation strategy, which makes the model fully locally conjugate and allows closed-form parameter updates when data are binary or count-valued; and (ii) an efficient online Expectation Maximization algorithm, which allows processing data in small minibatches and facilitates handling massive tensor data. Moreover, various types of constraints on the factor matrices (e.g., sparsity, non-negativity) can be incorporated under the proposed framework, providing good interpretability, which can be useful for qualitative analyses of the results. We apply the proposed framework to analyze several binary and count-valued real-world data sets.

1 Introduction

Tensor factorization methods [Kolda and Bader, 2009] offer a useful way to learn latent factors from complex multiway data. These methods decompose the original tensor data into a set of factor matrices (one for each mode or "way" of the tensor), which can be used as a latent feature representation for the objects in each of the tensor modes, and can be used for other tasks, such as tensor completion. Among tensor factorization methods, probabilistic approaches [Chu and Ghahramani, 2009; Xu et al., 2013; Rai et al., 2014] are especially appealing because they provide a proper generative model of the data, which allows modeling different data types and handling missing data in a natural way.

Real-world tensor data are often binary or count-valued [Nickel et al., 2011; Chi and Kolda, 2012]. For example, a multirelational social network [Nickel et al., 2011] can be described as a three-way binary tensor with two modes denoting people and the third mode denoting the types of relationships. Likewise, from a database of research publications, one may construct a three-way (AUTHORS × WORDS × VENUES) count-valued tensor, where the three dimensions could be authors, words, and publication venues, and each entry of the tensor denotes the number of times an author used a specific word at a specific venue. Tensor factorization on this multiway data can be used for topic modeling on such a publications corpus (the latent factors would correspond to topics). Another application could be in recommender systems: having learned the latent factors of authors and venues, one can use these factors for author-author recommendation (for potential co-authors) or author-venue recommendation (recommending the most appropriate venues for a given author).

Although several tensor factorization methods have been proposed in recent years [Kolda and Bader, 2009; Chu and Ghahramani, 2009] and there has been significant recent interest in developing scalable tensor factorization methods [Kang et al., 2012; Inah et al., 2015; Papalexakis et al., 2012; Beutel et al., 2014], most of these methods treat data as real-valued, and are therefore inappropriate for handling binary and count data; also see Related Work (Section 7).

Motivated by the prevalence of such binary and count-valued tensors, we present a probabilistic tensor factorization framework which can handle binary and count-valued tensors, while being scalable for massive tensor data. Our starting point will be a conjugate, fully Bayesian model for both binary and count data, for which we develop an efficient Gibbs sampler. The framework is based on the Pólya-Gamma data
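The ideas in the introduction can be made concrete with a small sketch. The code below is an illustration only, not the authors' implementation: it builds a three-way count tensor from (author, word, venue) index triples, and reconstructs a tensor from one factor matrix per mode using a CP-style sum of rank-one terms. The function names (`count_tensor`, `cp_reconstruct`, `bernoulli_probs`), the toy data, and the use of a logistic link for binary entries are all assumptions made for the example; the excerpt does not specify the model's exact form.

```python
import numpy as np

def count_tensor(triples, shape):
    """Accumulate (author, word, venue) index triples into a dense count tensor."""
    tensor = np.zeros(shape, dtype=np.int64)
    for i, j, k in triples:
        tensor[i, j, k] += 1
    return tensor

def cp_reconstruct(U1, U2, U3, weights=None):
    """CP-style reconstruction from three factor matrices (one per mode).

    Entry (i, j, k) is sum_r weights[r] * U1[i, r] * U2[j, r] * U3[k, r].
    """
    rank = U1.shape[1]
    if weights is None:
        weights = np.ones(rank)
    return np.einsum("r,ir,jr,kr->ijk", weights, U1, U2, U3)

def bernoulli_probs(U1, U2, U3):
    """Assumed logistic link: map the reconstruction to P(entry = 1)."""
    return 1.0 / (1.0 + np.exp(-cp_reconstruct(U1, U2, U3)))

# Hypothetical toy data: author 0 used word 1 twice at venue 0, and so on.
triples = [(0, 1, 0), (0, 1, 0), (2, 0, 1)]
Y = count_tensor(triples, shape=(3, 2, 2))

# Random rank-2 factor matrices, standing in for learned latent factors.
rng = np.random.default_rng(0)
U1, U2, U3 = (rng.normal(size=(n, 2)) for n in (3, 2, 2))
P = bernoulli_probs(U1, U2, U3)
```

Each row of a factor matrix acts as the latent feature vector of one object in that mode (one author, one word, or one venue), which is what makes the factors usable for tasks such as recommendation or tensor completion.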

