A Method for Producing On-line Text Databases in Traditional Mongolian and its Application to Text Retrieval

Dula MAN
Graduate School of Library, Information and Media Studies
University of Tsukuba
1-2 Kasuga, Tsukuba 305-8550, Japan
mandula@slis.tsukuba.ac.jp

Atsushi FUJII
Graduate School of Library, Information and Media Studies
University of Tsukuba
1-2 Kasuga, Tsukuba 305-8550, Japan
fujii@slis.tsukuba.ac.jp

Tetsuya ISHIKAWA
Graduate School of Library, Information and Media Studies
University of Tsukuba
1-2 Kasuga, Tsukuba 305-8550, Japan
ishikawa@slis.tsukuba.ac.jp

Abstract

Exchanging on-line information in the traditional Mongolian script is difficult because there is no standard electronization method. Although spelling and meaning in Mongolian can be determined from pronunciation, existing character codes are mainly based on spelling and cannot represent meaning. To resolve this problem, we propose an electronization method and an input/output interface for the traditional Mongolian script. Additionally, to expand an on-line text database in traditional Mongolian automatically, we propose a method for transliterating texts in modern Mongolian into traditional Mongolian; this is possible because both use the same pronunciation system but different letters. We apply our method to realize a full-text retrieval system for newspaper articles in traditional Mongolian.

Keywords: traditional Mongolian script, electronization of text, character codes,
transliteration, full-text retrieval systems

[The body text that follows the keywords is not recoverable from this extraction. Legible fragments: Unicode; TrueType Font; baatar, bagator; http:/www.ollo.mn; inverted file; stemming; boolean retrieval model, vector space model, probabilistic model; recall.]
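The abstract describes a full-text retrieval system, and the surviving body fragments mention an inverted file and the boolean retrieval model. As a rough illustration of how those two pieces fit together, the following is a minimal sketch and not the authors' implementation. It assumes documents are already romanized and whitespace-tokenized; the function names, the sample documents, and the tokens `t1` to `t3` are hypothetical, and stemming is omitted.

```python
from collections import defaultdict


def build_inverted_file(docs):
    """Map each whitespace-separated term to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def boolean_and(index, query):
    """Return the sorted ids of documents that contain every term in the query."""
    terms = query.split()
    if not terms:
        return []
    # .get() avoids creating empty entries for unknown terms in the defaultdict.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical sample collection, for illustration only (not data from the paper).
docs = {
    1: "bagator t1 t2",
    2: "bagator t3",
    3: "t1 t3",
}
index = build_inverted_file(docs)
```

With this index, `boolean_and(index, "bagator t1")` intersects the posting sets of both terms. Ranked models such as the vector space model would replace the intersection step with a scoring function over the same index.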