This study discusses the current state, direction, problems, and construction guidelines of a spoken-language corpus of North Korean. The North Korean spoken-language corpus is only about 1 percent the size of its South Korean counterpart, so the immediate priority should be securing quantity. Qualitative balance is difficult to achieve because of several practical challenges. The corpus guidelines themselves need to be designed as a whole, separately from those for a written-language corpus. They should take into account simplicity, the status of North Korean as a dialect, continuity with subsequent work, ease of use, and the quality of the construction process. In particular, dual transcription should be considered when transcribing the raw corpus, and morphological annotation should take into account both consistency with and differentiation from the written-language guidelines.