Scalable Probabilistic Tensor Factorization for Binary and Count Data

Piyush Rai*, Changwei Hu*, Matthew Harding†, Lawrence Carin*
*Department of Electrical & Computer Engineering, Duke University
†Sanford School of Public Policy & Department of Economics, Duke University
Durham, NC 27708, USA
{piyushrai, ch237, matthewharding, lcarin} [email protected]

Abstract

Tensor factorization methods provide a useful way to extract latent factors from complex multirelational data, and also to predict missing data. Developing tensor factorization methods for massive tensors, especially when the data are binary or count-valued (which is true of most real-world tensors), however, remains a challenge. We develop a scalable probabilistic tensor factorization framework that enables us to perform efficient factorization of massive binary and count tensor data. The framework is based on (i) the Pólya-Gamma augmentation strategy, which makes the model fully locally conjugate and allows closed-form parameter updates when data are binary or count-valued; and (ii) an efficient online Expectation Maximization algorithm, which allows processing data in small minibatches and facilitates handling massive tensor data. Moreover, various types of constraints on the factor matrices (e.g., sparsity, nonnegativity) can be incorporated under the proposed framework, providing good interpretability, which can be useful for qualitative analyses of the results. We apply the proposed framework to analyze several binary and count-valued real-world data sets.

1 Introduction

Tensor factorization methods [Kolda and Bader, 2009] offer a useful way to learn latent factors from complex multiway data. These methods decompose the original tensor data into a set of factor matrices (one for each mode or "way" of the tensor), which can be used as a latent feature representation for the objects in each of the tensor modes, and can be used for other tasks, such as tensor completion. Among tensor factorization
methods, probabilistic approaches [Chu and Ghahramani, 2009; Xu et al., 2013; Rai et al., 2014] are especially appealing because they provide a proper generative model of the data, which allows modeling different data types and handling missing data in a natural way.

Real-world tensor data are often binary or count-valued [Nickel et al., 2011; Chi and Kolda, 2012]. For example, a multirelational social network [Nickel et al., 2011] can be described as a three-way binary tensor with two modes denoting people and the third mode denoting the types of relationships. Likewise, from a database of research publications, one may construct a three-way (AUTHORS × WORDS × VENUES) count-valued tensor, where the three dimensions could be authors, words, and publication venues, and each entry of the tensor denotes the number of times an author used a specific word at a specific venue. Tensor factorization on this multiway data can be used for topic modeling on such a publications corpus (the latent factors would correspond to topics). Another application could be in recommender systems; having learned the latent factors of authors and venues, one can use these factors for author-author recommendation (for potential co-authors) or author-venue recommendation (recommending the most appropriate venues for a given author).

Although several tensor factorization methods have been proposed in recent years [Kolda and Bader, 2009; Chu and Ghahramani, 2009] and there has been significant recent interest in developing scalable tensor factorization methods [Kang et al., 2012; Inah et al., 2015; Papalexakis et al., 2012; Beutel et al., 2014], most of these methods treat data as real-valued, and are therefore inappropriate for handling binary and count data; also see Related Work (Section 7). Motivated by the prevalence of such binary and count-valued tensors, we present a probabilistic tensor factorization framework which can handle binary and count-valued tensors, while being scalable for
massive tensor data. Our starting point will be a conjugate, fully Bayesian model for both binary and count data, for which we develop an efficient Gibbs sampler. The framework is based on the Pólya-Gamma data

