Developing an Online Indonesian Corpora Repository


7 Pages · 2012 · 2.25 MB · English



Ruli Manurung, Bayu Distiawan, and Desmond Darma Putra
Faculty of Computer Science, Universitas Indonesia, Depok 16424

Abstract

This paper describes efforts to develop an online repository of Indonesian corpora – and its associated functions and services – that has been designed to support a wide variety of use cases and applications. Two design considerations are ensuring sustainability and accessibility of the corpora, and enabling open enrichment through annotation. The presented model supports OLAC-compliant metadata, is built atop an OAIS-compliant core repository, and exposes data and functionality via RESTful web services. A prototype implementation is presented, which allows users to upload, browse, and search the collection, whose extensible content model currently supports POS tagging. The future plan is for language-independent aspects of the system to be packaged up and released as an open-source package to aid the development of corpora repositories for other languages.

Keywords: Indonesian, corpora, annotation, metadata, digital repositories

This research is part of a collaborative research project funded by ARC Discovery Grant DP0877595.

Copyright 2010 by Ruli Manurung, Bayu Distiawan, and Desmond Darma Putra

1 Introduction

Bahasa Indonesia, or simply Indonesian, is spoken by well over 100 million people, and yet there is a disproportionately small amount of available Indonesian language resources that would greatly support linguistics and language technology research. Recent work on Indonesian NLP resources and tools has started to bear results (Adriani and Manurung, 2008), but to further advance research, there is a need for a comprehensive, balanced, and wide-coverage collection of corpora (Arka et al., 2007).

Designing and building an online corpora repository is much more complex than simply uploading a set of text files into a folder accessible over the Internet. For it to be of support to the research community, careful consideration must be paid to the design of standards, protocols, metadata, and architecture. In this paper we present two main desiderata that inform the design of our corpora repository: the need to ensure sustainability and accessibility of the corpora (Section 2), and the enabling of open enrichment – through annotation –
