Word-Level Language Identification and Predicting Codeswitching Points in Swahili-English ...



Proceedings of the Second Workshop on Computational Approaches to Code Switching, pages 21–29, Austin, TX, November 1, 2016.



Proceedings of the Second Workshop on Computational Approaches to Code Switching, pages 21–29, Austin, TX, November 1, 2016. © 2016 Association for Computational Linguistics

Word-Level Language Identification and Predicting Codeswitching Points in Swahili-English Language Data

Mario Piergallini, Rouzbeh Shirvani, Gauri S. Gautam, and Mohamed Chouikha
Howard University, Department of Electrical Engineering and Computer Science
2366 Sixth St NW, Washington, DC 20059

Abstract

Codeswitching is a very common behavior among Swahili speakers, but of the little computational work done on Swahili, none has focused on codeswitching. This paper addresses two tasks relating to Swahili-English codeswitching: word-level language identification and prediction of codeswitch points. Our two-step model achieves high accuracy at labeling the language of words using a simple feature set combined with label probabilities on the adjacent words. This system is used to label a large Swahili-English internet corpus, which is in turn used to train a model for predicting codeswitch points.

1 Introduction

Language technology has progressed rapidly in many applications (speech recognition and synthesis, parsing, translation, sentiment analysis, etc.), but efforts have been focused mainly on large, high-resource languages and on monolingual data. Many tools have not been developed for low-resource languages, nor can they be applied to mixed-language data containing codeswitching. In many cases, dealing with low-resource languages requires the ability to deal with codeswitching. For example, it is quite common to codeswitch between the lingua franca and English in many former English colonies in Africa, such as Kenya, Zimbabwe and South Africa (Myers-Scotton, 1993b). Thus, expanding the reach of language technologies to users of these languages may require the ability to handle mixed-language data, depending on which domains it is intended for.

Codeswitching produces additional challenges for NLP due to the simple fact that monolingual tools cannot be applied to mixed-language data. Beyond that, codeswitching also has its own peculiarities and can convey meaning in and of itself, and these aspects are worthy of study as well. Codeswitching can be used to increase or decrease social distance, indicate something about a speaker's social identity or their stance towards the subject of discussion, or to draw attention to particular phrases (Myers-Scotton, 1993b). Sometimes, of course, it may simply indicate that the speaker does not know the word in the other language, or is not able to recall it quickly in this instance. Computational approaches to discourse analysis will require tools specific to codeswitching in order to be able to make use of these social meanings.

Multiple theories propose grammatical constraints on codeswitching (Myers-Scotton, 1993a), and computational approaches may contribute to providing stronger evidence for or against these theories (Solorio and Liu, 2008). These grammatical constraints also can inform the social interpretation of codeswitching. If a codeswitch occurs in a position that is less expected, it may be more likely to have been used for effect. Similarly, when a codeswitch occurs in a less likely context based on features of the discourse, this also affects the interpretation. The longer a discussion is carried out in a single language, the more likely it would seem that a switch indicates a change in the discourse. For example, Carol Myers-Scotton (1993b) analyzes a conversation where a switch to Swahili and then to English after small talk in the local language adds force to the speaker's rejection of a request. This type of switch could also be precipitated by a change in conversation topic, task (e.g. pre-class small talk transitioning into the beginning of lessons), location, etc.

By contrast, in conversations where participants switch frequently between languages, each individual switch carries less social meaning. In those situations, it is the overall pattern of codeswitching that conveys meaning (Myers-Scotton, 1993b). A model should be able to see this pattern and adjust the likelihood of switches accordingly. Being able to predict how likely a switch is to occur in a particular position may thus provide information to aid in the social analysis of codeswitching behavior.

In this paper, we will be introducing two corpora of Swahili-English data. One
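
As a rough illustration of the two tasks named in the abstract, the sketch below labels each word in two steps (a per-word guess, then a second pass that mixes in the first-pass probabilities of the adjacent words) and reads codeswitch points off the resulting labels. It is not the authors' implementation: this excerpt does not give their feature set or classifier, so the toy word lists, the fixed probabilities, the averaging rule, the `weight` parameter, and all function names here are assumptions made only for illustration.

```python
from typing import List

# Hypothetical toy lexicons (assumption); a real system would learn
# word-level evidence from labeled training data instead.
SWAHILI = {"nina", "kazi", "leo", "sana", "na", "ni", "habari"}
ENGLISH = {"very", "busy", "today", "the", "is", "and", "meeting"}


def first_pass(word: str) -> float:
    """Step 1: estimate P(word is Swahili) from the word alone."""
    w = word.lower()
    if w in SWAHILI:
        return 0.95
    if w in ENGLISH:
        return 0.05
    return 0.5  # unknown word: no evidence either way


def second_pass(probs: List[float], weight: float = 0.5) -> List[float]:
    """Step 2: mix each word's probability with its neighbours' average."""
    mixed = []
    for i, p in enumerate(probs):
        neighbours = [probs[j] for j in (i - 1, i + 1) if 0 <= j < len(probs)]
        context = sum(neighbours) / len(neighbours) if neighbours else p
        mixed.append((1 - weight) * p + weight * context)
    return mixed


def label(words: List[str]) -> List[str]:
    """Label each word 'sw' (Swahili) or 'en' (English)."""
    probs = second_pass([first_pass(w) for w in words])
    return ["sw" if p >= 0.5 else "en" for p in probs]


def switch_points(labels: List[str]) -> List[int]:
    """Indices of words whose language differs from the previous word's."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
```

For the input `["nina", "kazi", "nyingi", "leo", "very", "busy", "today"]`, the unknown word "nyingi" starts at 0.5 and is pulled towards Swahili by its neighbours, so `label` returns four `"sw"` labels followed by three `"en"` labels and `switch_points` returns `[4]`. A trained classifier with learned weights would replace the fixed averaging used here.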

