Web-Scale Language-Independent Cataloging of Noisy Product Listings for E-Commerce


11 Pages · 2017 · 6.59 MB · English

The cataloging of product listings through taxonomy categorization is a fundamental problem for any e-commerce marketplace, with applications ranging from personalized search recommendations to query understanding. However, manual and rule-based approaches to categorization are not scalable.


Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 969–979, Valencia, Spain, April 3-7, 2017. © 2017 Association for Computational Linguistics

Web-Scale Language-Independent Cataloging of Noisy Product Listings for E-Commerce

Pradipto Das, Yandi Xia, Aaron Levine, Giuseppe Di Fabbrizio, and Ankur Datta
Rakuten Institute of Technology, Boston, MA, 02110 USA
{pradiptodas, tsyandixia, aaronlevine}[email protected]
{giuseppedifabbrizio, ankurdatta}[email protected]

Abstract

The cataloging of product listings through taxonomy categorization is a fundamental problem for any e-commerce marketplace, with applications ranging from personalized search recommendations to query understanding. However, manual and rule-based approaches to categorization are not scalable. In this paper, we compare several classifiers for categorizing listings in both English and Japanese product catalogs. We show empirically that a combination of words from product titles, navigational breadcrumbs, and list prices, when available, improves results significantly. We outline a novel method using correspondence topic models and a lightweight manual process to reduce noise from mislabeled data in the training set. We contrast linear models, gradient boosted trees (GBTs) and convolutional neural networks (CNNs), and show that GBTs and CNNs yield the highest gains in error reduction. Finally, we show GBTs applied in a language-agnostic way on a large-scale Japanese e-commerce dataset have improved taxonomy categorization performance over current state-of-the-art based on deep belief network models.

1 Introduction

Web-scale e-commerce catalogs are typically exposed to potential buyers using a taxonomy categorization approach where each product is categorized by a label from the taxonomy tree. Most e-commerce search engines use taxonomy labels to optimize query results and match relevant listings to users' preferences (Ganti et al., 2010).

To illustrate the general concept, consider Fig. 1. A merchant pushes new men's clothing listings to an online catalog infrastructure, which then organizes the listings into a taxonomy tree. When a user searches for a denim brand, "DSquared2", the search engine first has to understand that the user is searching for items in the "Jeans" category. Then, if the specific items cannot be found in the inventory, other relevant items in the "Jeans" category are returned in the search results to encourage the user to browse further.

Figure 1: E-commerce platform using taxonomy categorization to understand query intent, match merchant listings to potential buyers as well as to prevent buyers from navigating away on search misses.

However, achieving good product categorization for e-commerce marketplaces is challenging. Commercial product taxonomies are organized in tree structures three to ten levels deep, with thousands of leaf nodes (Sun et al., 2014; Shen et al., 2012b; Pyo et al., 2016; McAuley et al., 2015). Unavoidable human errors creep in while uploading data using such large taxonomies, contributing to mislabeled listing noise in the data set. Even eBay, where merchants have a unified taxonomy, reported a 15% error rate in categorization (Shen et al., 2012b). Furthermore, most e-commerce companies receive millions of new listings per month from hundreds of merchants composed of wildly different formats, descriptions, prices and metadata for the same products. For instance, the two listings, "University of Alabama all-cotton non-iron dress shirt" and "U of Alabama 100% cotton no-iron regular fit shirt" by two merchants refer to the same product.

E-commerce systems trade off between classifying a listing directly into one of thousands of leaf node categories (Sun et al., 2014; ?) and splitting the taxonomy at predefined depths (Shen et al., 2011; ?) with smaller subtree models. In the latter case, there is another trade-off between the number of hierarchical subtrees and the propagation of error in the prediction cascade. Similar to (Shen et al., 2012b; Cevahir and Murakami, 2016), we classify product listings in two or three steps, depending on the taxonomy size. First, we predict the top-level category and then classify the listings using another one or two levels of subtree models selected by the previous predictions. For our large-scale taxonomy categorization experiments on product listings, we use two in-house datasets, a publicly available Amazon product dataset (McAuley et al., 2015), and a publicly available Japanese product dataset.
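
The two-step prediction cascade described in the introduction can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names `TokenNB` and `TwoStepCategorizer`, the toy titles, and the two-level taxonomy are all hypothetical, and the tiny Naive Bayes model is a stand-in for the linear models, GBTs, and CNNs the paper actually compares.

```python
import math
from collections import Counter, defaultdict


class TokenNB:
    """Tiny multinomial Naive Bayes over whitespace tokens.

    Assumption: a stand-in classifier for illustration only; it is not
    one of the models evaluated in the paper.
    """

    def __init__(self):
        self.label_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.label_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):  # sorted for determinism
            n_tokens = sum(self.token_counts[label].values())
            score = math.log(self.label_counts[label] / total_docs)
            for tok in tokens:
                # Add-one smoothing so unseen tokens do not zero the score.
                count = self.token_counts[label][tok] + 1
                score += math.log(count / (n_tokens + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class TwoStepCategorizer:
    """Predict the top-level category, then the leaf within that subtree."""

    def fit(self, titles, paths):
        # paths holds hypothetical (top_level, leaf) taxonomy labels.
        self.top_model = TokenNB().fit(titles, [top for top, _ in paths])
        grouped = defaultdict(lambda: ([], []))
        for title, (top, leaf) in zip(titles, paths):
            grouped[top][0].append(title)
            grouped[top][1].append(leaf)
        # One subtree model per top-level category.
        self.sub_models = {
            top: TokenNB().fit(sub_titles, leaves)
            for top, (sub_titles, leaves) in grouped.items()
        }
        return self

    def predict(self, title):
        top = self.top_model.predict(title)
        # The subtree model is selected by the first prediction, so an
        # error at the top level propagates down the cascade.
        leaf = self.sub_models[top].predict(title)
        return top, leaf


titles = [
    "slim fit denim jeans men",
    "cotton non-iron dress shirt",
    "wireless bluetooth headphones",
    "usb c charging cable",
]
paths = [
    ("Clothing", "Jeans"),
    ("Clothing", "Shirts"),
    ("Electronics", "Audio"),
    ("Electronics", "Cables"),
]

model = TwoStepCategorizer().fit(titles, paths)
print(model.predict("men denim jeans"))
```

Because each subtree model only sees listings from its own branch, it stays small, but a wrong top-level prediction cannot be recovered at the second step, which is the error-propagation trade-off mentioned above.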
