The solution to sort entries in the Vietnamese-Ede bilingual vocabulary database

DOI : 10.17577/IJERTV12IS060030


Le Hoang Thi My

University of Technology and Education, The University of Danang

Danang, Vietnam

Abstract: When querying a vocabulary database, presenting the data in ascending or descending order for each language is a criterion that should be considered when building the database. Sorting an English data table alphabetically with the Order by clause of an SQL statement is simple, because ASCII encoding and database management systems are designed around the English alphabet. For the ethnic minority languages of Vietnam in general, and the Ede language in particular, alphabetical ordering is not supported and has received little attention from researchers. As a result, it is difficult to present a data table in alphabetical order when writing database programs for the Ede language. To address this problem, this article proposes a solution for sorting Ede entries in the Vietnamese-Ede bilingual lexical database. The solution supports searching, investigating, and managing data, and building Ede data tables in Ede alphabetical order.

Keywords: Ede language processing, Unicode encoding, entry sorting, vocabulary database, data query

  1. INTRODUCTION

    All information processing activities on computers involve text processing. Sorting is the process of rearranging the elements of a set of objects in a given order, such as ascending or descending for a sequence of numbers, or alphabetical for words. Sorting is widely used in informatics applications, for example to order data so that it can be searched conveniently, or to arrange processing results for printing in reports. To address sorting for the Vietnamese and Ede languages in the Vietnamese-Ede vocabulary database [3], [4], this paper proposes a solution for arranging items in that database. The solution consists of the following steps:

    • First, encode Vietnamese and Ede letters into a contiguous code range so that encoded strings can be compared directly in programs.

    • Next, move the encoded entries into an array and sort the array in alphabetical order.

    • Finally, write each element's position in the sorted array into the sort index attribute of the corresponding item in the datastore, where the item is identified by decoding the array element.

    Thus, when items in the datastore later need to be sorted, we sort by the sort index attribute instead of by the item attribute.
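The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy alphabet `ALPHABET`, the helper names `encode` and `assign_sort_index`, and the sample entries are all hypothetical, and only the starting code point 1F00 is taken from the text.

```python
from typing import Dict, List

# Hypothetical toy alphabet order (an assumption for illustration only;
# the paper's tables cover the full Vietnamese and Ede alphabets).
ALPHABET = ["a", "à", "á", "â", "b", "c", "d", "e", "ê"]
BASE = 0x1F00  # start of the contiguous Unicode region named in the paper

# Step 1: each letter gets the code point BASE + its alphabetical position.
ENCODE: Dict[str, str] = {ch: chr(BASE + i) for i, ch in enumerate(ALPHABET)}


def encode(entry: str) -> str:
    """Map every known letter to its stand-in; other characters pass through."""
    return "".join(ENCODE.get(ch, ch) for ch in entry)


def assign_sort_index(entries: List[str]) -> Dict[str, int]:
    """Steps 2 and 3: sort by encoded form, then record each entry's rank."""
    ordered = sorted(entries, key=encode)
    return {entry: rank for rank, entry in enumerate(ordered)}


entries = ["ca", "bà", "ba", "âc", "ac"]
index = assign_sort_index(entries)
print(index)            # rank of each entry in the toy alphabetical order
print(sorted(entries))  # plain code point order, for comparison
```

In this toy order "âc" is ranked directly after "ac", whereas plain code point sorting places it after "ca". The stored rank plays the role of the sort index attribute.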

  2. METHOD OF ENCODING VIETNAMESE AND EDE LETTERS

    1. Encoding Vietnamese letters

      Each Vietnamese letter is mapped into a contiguous region of the Unicode encoding. The region selected for mapping ranges from 1F00 to 1F5E. This region was chosen because it is contiguous and its characters do not appear in Vietnamese documents. Table I maps the Vietnamese letters, in alphabetical order, to the extended Greek character area of the Unicode encoding.

      For example, the entry for the word "school" is encoded as a string of characters from this region.
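As a hedged sketch of this mapping, the snippet below encodes and decodes entries over a small subset of letters. The subset `LETTERS` and the resulting offsets are assumptions chosen for illustration and do not reproduce the exact codes of Table I. The names `encode` and `decode` are hypothetical. Input is normalized to NFC so that each accented letter is a single code point.

```python
import unicodedata

# Toy subset of letters in a Table I style order (assumption: these offsets
# are illustrative and are not the paper's actual code assignments).
LETTERS = ["a", "à", "ã", "á", "â", "b", "c", "d"]
REGION_START = 0x1F00

TO_CODE = {letter: chr(REGION_START + i) for i, letter in enumerate(LETTERS)}
FROM_CODE = {code: letter for letter, code in TO_CODE.items()}


def encode(entry: str) -> str:
    """Replace each known letter by its stand-in in the contiguous region."""
    entry = unicodedata.normalize("NFC", entry)
    return "".join(TO_CODE.get(ch, ch) for ch in entry)


def decode(encoded: str) -> str:
    """Inverse of encode: restore the original letters."""
    return "".join(FROM_CODE.get(ch, ch) for ch in encoded)


coded = encode("bà")
print([f"{ord(ch):04X}" for ch in coded])  # code points of the encoded entry
print(decode(coded) == "bà")               # round trip restores the entry
print("ã" < "á", encode("ã") < encode("á"))  # raw order vs. encoded order
```

Because the stand-ins are assigned in alphabetical order, an ordinary string comparison on the encoded form follows the intended order even where the raw code points do not.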

      TABLE I. MAPPING VIETNAMESE LETTERS INTO THE EXTENDED GREEK CHARACTER AREA

      Vietnamese characters, in alphabetical order (only the forms legible in this text version): a, à, ã, á, â, b, c, d, e, è, é, ê, f, g, h, i, ì, í, j, k, l, m, n, o, ò, õ, ó, ô, p, q, r, s, t, u, ù, ú, v, w, x, y, ý, z

      Extended Greek character area: consecutive code points 1F00, 1F01, 1F02, ... through 1F5C

    2. Ede language encoding

    Table IV, continued. Ede language characters: w, y. Extended Greek character area: consecutive code points 1F00, 1F01, 1F02, ... through 1F25.

    The Ede alphabet also belongs to the Latin family, with 76 characters including uppercase and lowercase forms, as shown in Table II [6]. Of these, 68 characters are basic components of almost all Unicode f [...] included in the Unicode encoding.

    TABLE II. EDE ALPHABET

    Consonants
    Uppercase: B, D, G, H, J, K, L, M, N, Ñ, P, R, S, T, W, Y
    Lowercase: b, d, g, h, j, k, l, m, n, ñ, p, r, s, t, w, y

    Vowels
    Uppercase: A, Â, E, Ê, I, O, Ô, U
    Lowercase: a, â, e, ê, i, o, ô, u

    Each letter of the Ede language is likewise mapped into a contiguous region of the Unicode encoding. The region selected for mapping ranges from 1F00 to 1F25. This region was chosen because it is contiguous and its characters do not appear in Ede documents.

    Unlike Vietnamese letters, Ede letters that are written as a combination code of two characters must be converted to a single character before being encoded, so that each is treated as one character when sorted. The rules for converting these letters to one character are shown in Table III.

    TABLE III. RULES FOR CONVERTING EDE LETTERS IN COMBINATION-CODE FORM INTO ONE CHARACTER
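The conversion of two-character Ede letters into one character before encoding can be sketched as follows. This is an illustration under explicit assumptions, not the paper's rule table. It assumes that a two-character letter is a base vowel followed by a combining breve (U+0306), and the pairing of particular letters with the stand-ins "!", "@", "#", "$" is hypothetical. `EDE_ORDER` is only a toy slice of the alphabet.

```python
import unicodedata

BREVE = "\u0306"  # COMBINING BREVE


def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


# Assumption: which letter each stand-in replaces is illustrative only.
TWO_CHAR_TO_ONE = {
    nfc("ê" + BREVE): "!",
    nfc("ô" + BREVE): "@",
    nfc("ơ" + BREVE): "#",
    nfc("ư" + BREVE): "$",
}

# Toy slice of an Ede-style order, with stand-ins in place of the
# two-character letters (hypothetical, not the full Table IV).
EDE_ORDER = ["e", "ê", "!", "g", "h", "o", "ô", "@"]
REGION_START = 0x1F00
TO_CODE = {ch: chr(REGION_START + i) for i, ch in enumerate(EDE_ORDER)}


def collapse(entry: str) -> str:
    """Replace every two-character letter by its one-character stand-in."""
    entry = nfc(entry.lower())
    for pair, single in TWO_CHAR_TO_ONE.items():
        entry = entry.replace(pair, single)
    return entry


def encode(entry: str) -> str:
    """Collapse first, then map each character into the contiguous region."""
    return "".join(TO_CODE.get(ch, ch) for ch in collapse(entry))


words = ["hg", "h\u00ea\u0306", "he"]
print(len(words[1]), len(encode(words[1])))  # 3 code points become 2
print(sorted(words, key=encode))             # order by encoded form
```

After collapsing, the letter built from "ê" plus a breve occupies a single position, so it sorts as one letter between "ê" and "g" in this toy order.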

  3. EXPERIMENTAL SORTING OF WORD ITEMS IN THE LEXICAL DATABASE

    To arrange the items in alphabetical order, we experimented with four basic sorting methods: bubble sort, insertion sort, selection sort, and quick sort [5], in order to select the method used to sort the items in the lexicon. The choice is based on execution time. Experimental results on 4 samples, with 10 runs per sample for each of the 4 sorting methods, are shown in Table V. The details of the individual runs are shown in Table VI.

    The results on the Vietnamese and Ede samples in Table V are the basis for choosing quick sort as the method for sorting the array that contains the encoded items.
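A timing comparison of this kind can be sketched as below. This is not the authors' test harness: the data is a synthetic sample of random strings over the 1F00 region, the sample size and run count are small placeholders, and the quick sort shown is a simple list-building variant rather than an in-place partitioning one.

```python
import random
import time
from typing import Callable, List


def bubble_sort(items: List[str]) -> List[str]:
    a = list(items)
    for end in range(len(a) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:  # already sorted, stop early
            break
    return a


def selection_sort(items: List[str]) -> List[str]:
    a = list(items)
    for i in range(len(a) - 1):
        smallest = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[smallest] = a[smallest], a[i]
    return a


def insertion_sort(items: List[str]) -> List[str]:
    a = list(items)
    for i in range(1, len(a)):
        current, j = a[i], i - 1
        while j >= 0 and a[j] > current:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = current
    return a


def quick_sort(items: List[str]) -> List[str]:
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    return quick_sort(less) + equal + quick_sort(greater)


def average_seconds(sort: Callable, data: List[str], runs: int = 10) -> float:
    """Average wall-clock time of `sort(data)` over `runs` repetitions."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        sort(data)
        total += time.perf_counter() - start
    return total / runs


random.seed(0)
# Synthetic stand-in for encoded entries (assumption: 500 six-letter strings).
sample = ["".join(chr(0x1F00 + random.randrange(38)) for _ in range(6))
          for _ in range(500)]
for sort in (bubble_sort, selection_sort, insertion_sort, quick_sort):
    assert sort(sample) == sorted(sample)
    print(f"{sort.__name__:15s} {average_seconds(sort, sample, runs=3):.4f} s")
```

Absolute timings depend on the machine and the data, so only the relative ranking of the methods is meaningful in such a sketch.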

    TABLE V. TEST RESULTS BY 4 SORTING METHODS

    Execution time (h:m:s)

    Pattern                     Number of experiments   Bubble sort   Direct selection sort   Insertion sort   Quick sort
    9,297 Ede entries           10                      0:0:02.820    0:0:01.479              0:0:00.657       0:0:00.106
    17,968 Ede entries          10                      0:0:09.477    0:0:04.315              0:0:04.240       0:0:0.188
    11,358 Vietnamese entries   10                      0:0:02.290    0:0:02.286              0:0:00.268       0:0:00.265
    34,375 Vietnamese entries   10                      0:1:14.227    0:0:02.286              0:0:13.450       0:0:00.760

    TABLE VI. DETAILS OF ATTEMPTS WITH 4 SORTING METHODS

    Execution time (h:m:s)

    Pattern                     Try      Bubble sort   Direct selection sort   Insertion sort   Quick sort
    9,297 Ede entries           1        0:0:02.952    0:0:01.492              0:0:00.603       0:0:00.100
                                2        0:0:02.961    0:0:01.510              0:0:00.664       0:0:00.099
                                3        0:0:02.783    0:0:01.500              0:0:00.595       0:0:00.103
                                4        0:0:02.901    0:0:01.479              0:0:00.624       0:0:00.111
                                5        0:0:02.696    0:0:01.495              0:0:00.631       0:0:00.110
                                6        0:0:02.705    0:0:01.450              0:0:00.587       0:0:00.104
                                7        0:0:02.670    0:0:01.540              0:0:00.715       0:0:00.099
                                8        0:0:03.008    0:0:01.483              0:0:00.703       0:0:00.111
                                9        0:0:02.725    0:0:01.423              0:0:00.723       0:0:00.110
                                10       0:0:02.804    0:0:01.414              0:0:00.730       0:0:00.117
                                Average  0:0:02.820    0:0:01.479              0:0:00.657       0:0:00.106
    17,968 Ede entries          1        0:0:09.925    0:0:04.484              0:0:04.829       0:0:0.162
                                2        0:0:08.757    0:0:04.420              0:0:04.807       0:0:0.163
                                3        0:0:08.539    0:0:04.699              0:0:03.490       0:0:0.207
                                4        0:0:09.811    0:0:05.045              0:0:03.802       0:0:0.196
                                5        0:0:09.371    0:0:03.874              0:0:03.725       0:0:0.165
                                6        0:0:10.452    0:0:03.900              0:0:04.463       0:0:0.162
                                7        0:0:09.145    0:0:04.124              0:0:04.845       0:0:0.199
                                8        0:0:09.067    0:0:03.889              0:0:04.876       0:0:0.197
                                9        0:0:10.217    0:0:04.405              0:0:03.741       0:0:0.230
                                10       0:0:09.487    0:0:04.318              0:0:03.829       0:0:0.205
                                Average  0:0:09.477    0:0:04.315              0:0:04.240       0:0:0.188
    11,358 Vietnamese entries   1        0:0:02.046    0:0:01.920              0:0:00.239       0:0:0.340
                                2        0:0:02.028    0:0:02.091              0:0:00.247       0:0:0.250
                                3        0:0:02.511    0:0:02.300              0:0:00.309       0:0:0.225
                                4        0:0:02.542    0:0:02.165              0:0:00.330       0:0:0.240
                                5        0:0:01.918    0:0:01.991              0:0:00.235       0:0:0.234
                                6        0:0:02.090    0:0:02.741              0:0:00.257       0:0:0.286
                                7        0:0:02.418    0:0:02.314              0:0:00.235       0:0:0.220
                                8        0:0:02.433    0:0:02.223              0:0:00.343       0:0:0.350
                                9        0:0:02.345    0:0:02.870              0:0:00.252       0:0:0.241
                                10       0:0:02.576    0:0:02.240              0:0:00.232       0:0:0.267
                                Average  0:0:02.290    0:0:02.286              0:0:00.268       0:0:00.265
    34,375 Vietnamese entries   1        0:1:14.166    0:0:18.910              0:0:12.168       0:0:0.795
                                2        0:1:13.985    0:0:19.425              0:0:13.462       0:0:0.686
                                3        0:1:14.374    0:0:17.841              0:0:14.679       0:0:0.826
                                4        0:1:13.956    0:0:19.410              0:0:14.835       0:0:0.748
                                5        0:1:14.126    0:0:21.091              0:0:12.963       0:0:0.795
                                6        0:1:14.212    0:0:17.862              0:0:12.651       0:0:0.875
                                7        0:1:14.028    0:0:20.420              0:0:13.806       0:0:0.842
                                8        0:1:13.825    0:0:18.798              0:0:13.868       0:0:0.592
                                9        0:1:15.006    0:0:18.688              0:0:12.731       0:0:0.717
                                10       0:1:14.589    0:0:19.983              0:0:13.338       0:0:0.733
                                Average  0:1:14.227    0:0:02.286              0:0:13.450       0:0:00.760

    Alternative characters for the Ede letters written with 2 characters (Table III): !, @, #, $

    The mapping of Ede letters and their corresponding conversion characters to the extended Greek character area is shown in Table IV.

    TABLE IV. MAPPING EDE LETTERS INTO THE EXTENDED GREEK CHARACTER AREA

    Ede language characters, in mapping order (only the forms legible in this text version): a, â, b, d, e, ê, !, g, h, i, j, k, l, m, n, ñ, o, ô, @, #, p, r, s, t, u, $

    Extended Greek character area: 1F00 through 1F25

    Fig 1. Result of executing sort command with Vietnamese entries

  4. EXPERIMENTAL RESULTS

    Currently, the Order by clause of a query statement sorts results on the Vietnamese string attribute itself. With accented letters, this does not produce the correct alphabetical arrangement in Vietnamese. The results of using the Order by clause in the SQL statement are shown in Figures 1 and 2. Figure 1 shows the result of executing the query Select Viet From VIET Order by Viet. Figure 2 shows the result of executing the query Select Viet From VIET Order by CS_SX. The CS_SX attribute is the sort index attribute added by the solution above for sorting items in the Viet-Ede datastore.
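The difference between the two queries can be reproduced on a small scale with an in-memory SQLite database. The table name VIET and the attributes Viet and CS_SX come from the text. Everything else is an assumption made for illustration: the choice of SQLite, the toy alphabet order, the sample entries, and the helper `encode`.

```python
import sqlite3

# Toy alphabet order (assumption); Table I defines the real one.
ORDER = ["a", "à", "ã", "á", "â", "b", "c"]
TO_CODE = {ch: chr(0x1F00 + i) for i, ch in enumerate(ORDER)}


def encode(entry: str) -> str:
    """Map each letter to its stand-in in the contiguous region."""
    return "".join(TO_CODE.get(ch, ch) for ch in entry)


entries = ["ca", "bá", "bã", "ba", "âc"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VIET (Viet TEXT PRIMARY KEY, CS_SX INTEGER)")
conn.executemany("INSERT INTO VIET (Viet) VALUES (?)", [(e,) for e in entries])

# Store each entry's position in the encoded, sorted array as its sort index.
for rank, entry in enumerate(sorted(entries, key=encode)):
    conn.execute("UPDATE VIET SET CS_SX = ? WHERE Viet = ?", (rank, entry))

plain = [row[0] for row in conn.execute("SELECT Viet FROM VIET ORDER BY Viet")]
indexed = [row[0] for row in conn.execute("SELECT Viet FROM VIET ORDER BY CS_SX")]
print(plain)    # order given by the stored character codes
print(indexed)  # order given by the sort index
conn.close()
```

With SQLite's default binary collation, the first query orders entries by their stored bytes, while the second follows the toy alphabet through the precomputed index.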

    Fig 2. Result of executing the command to sort Vietnamese entries with the sort index when coded

    Ede letters raise the same issue as Vietnamese letters. In addition, the Ede language requires handling letters that are written as a combination code. The results of using the Order by clause in the SQL statement are shown in Figures 3 and 4. Figure 3 shows the result of executing the query Select Ede From EDE Order by Ede. Figure 4 shows the result of executing the query Select Ede From EDE Order by CS_SX. The CS_SX attribute is the sort index attribute added by the solution above for sorting entries in the Viet-Ede datastore.

    Fig 3. The result of executing the sort command with Ede entries

    Fig 4. The result of executing the sort command on Ede entries with the sort index after encoding

  5. CONCLUSION

The solution for sorting items in the Vietnamese-Ede bilingual vocabulary database has been applied to the attributes containing Vietnamese entries and Ede entries. The results follow Vietnamese and Ede alphabetical order when the Order by clause is used in SQL queries on the Viet-Ede datastore.

This solution contributes to solving the problem of arranging Vietnamese entries and Ede entries in the Vietnamese-Ede bilingual vocabulary database in alphabetical order in data query statements that sort data.

As a next step, this solution will be integrated into applications that edit tables, such as Winword and Excel, to arrange columns or rows in Ede data tables.

REFERENCES

[1] Doan Van Phuc: Ede phonetics, Social Science, Hà Nội, 1996.

[2] Hoang Thi My Le, Vilavong Souksan, Phan Huy Khanh: Using Unicode in Encoding the Vietnamese Ethnic Minority Languages, Applying for the Ede Language, Proceedings of the International Conference on Knowledge and System Engineering, KSE 2013, Hanoi, pp. 137-148, 2013.

[3] Hoang Thi My Le, Phan Huy Khanh: The solution to build a Vietnamese-Ede bilingual vocabulary database based on the Vietnamese-Ede interaction model, No 5 (2), pp. 36-40, 2017.

[4] Le Hoang Thi My, Khanh Phan Huy: Deploying environment for processing Ede ethnic minority language in Vietnam, IEEE International Conference on System Science and Engineering (ICSSE), 2017.

[5] Robert Sedgewick: Algorithm, NXBKH & KT, 2003, https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-840.pdf

[6] Department of Education and Training DakLak: Ede Grammar, Education publisher, 2011.