 Open Access
 Total Downloads : 59
 Authors : Yashaswini Hegde , Padma S. K
 Paper ID : IJERTV8IS060728
 Volume & Issue : Volume 08, Issue 06 (June 2019)
 Published (First Online): 08072019
 ISSN (Online) : 22780181
 Publisher Name : IJERT
 License: This work is licensed under a Creative Commons Attribution 4.0 International License
VKTPY Prefix Hashed Trie for Indian Languages: A Case Study with Kannada Text Retrieval

Yashaswini Hegde, Padma S. K
Dept. of Information Science
Shree Jayachama Rajendra College of Eng.
Mysuru, India
Abstract The tries used in existing retrieval engines are unicode based and are more applicable to languages like English. When same method is applied to phonetic languages like Indian and South Asian languages they arent that efficient in terms of compact representation. Therefore, we have developed a novel VKTPY Prefix Hashed Trie (VKTPY PHT) based approach which is efficient in memory usage (compact representation) and a general methodology for the retrieval of text in Indian and south Asian languages.
In our approach we have made use of the fact that these languages are phonetic in nature and follows Brahmi/ Devanagari based scripts, which can be represented numerically by the Katapayaadi Sankhya Sutra (KTPY Rule). The enhancement proposed by us to this KTPY Rule is called Vistruta Katapayaadi Sankhya Sutra (VKTPY Rule). Using Our VKTPY Rule, we have developed an efficient encryption/decryption technique and have applied it to a south Indian language – Kannada as a case study. However, our technique is general and extendable to around 120 Indian and south Asian languages which follows Brahmi/Devanagari scripts.
Our proposed VKTPY PHT method first converts unicode Kannada to VKTPY encoded Kannada and stores them as prefixes, indexed by its first character set. By this, we get a 20% improvement in memory compared to existing unicode tries with same time complexity.
Our proposed VKTPY PHT is not restricted to text but can also be used in speech (spoken) enabled retrieval engines for all Indian languages. We have some preliminary results to show this capability. In short, the key benefits of our approach are memory efficiency and applicability to many Indian and south Asian languages.
Keywords Encryption; Compressed Trie; Kannada Prefix Hashing; NLP; Katapayaadi Sankhya Sutra ;

INTRODUCTION
The scripts of many Indian languages like Sanskrith,Hindi, Kannada, Telagu, Marathi, Gujarathi etc. are based on Brahmi/Devanagari scrips and are similar in their structure. These languages, that are official and spoken languages of highly populated Indian states, generate huge web content. For example Kannada, a spoken language of around 60 million people from Karnataka, a southern state of India, generate millions of web pages/documents. Hence it is important to study and explore the efficient techniques for text retrieval
Fig.1. Simple Trie
engines (search engines) with common/similar language representations and are phonetic in nature. In this paper we are proposing a new common representation for many Indian languages and showing its efficiency in terms of space and time, when collaborated with a Trie, a basic data structure and algorithm used in text retrieval engines, taking Kannada language as a use case.
The proposed VKTPY Rule (Vistruta Katapayadi Sankhya Sutra") is a encryption/decryption rule by which an Indian language based on Brahmi/Devanagari script can have its numerical representation. And VKTPY PHT (VKTPY Prefix hashed trie) is a space efficient prefix hashed trie which uses VKTPY Rule to encode the inputed unicode text.
VKTPY Rule is based on an ancient Indian technique called The Katapayaadi Sankhya Sutra (KTPY rule). KTPY Rule is a system of numerical notation, was used by ancient
Indian mathematicians and grammarians as a tool to convert alphabets to numerals and vice versa.
Tries are tree based fast data structures used to store and search variable length strings but are space intensive. They are also called as digital tree or Radix tree or Prefix tree. The keys of the Trie generally are characters and all the descendants of the nodes will share the common prefixes. The frequency of the prefixes are stored in the leaves of the Trie. Tries are used to preprocess the patterns to speed up the pattern matching queries with the search time proportional to the size of the pattern. The standard simple Trie for Unicode Kannada words is given in the Fig. 1. Here each node except the root is labeled with a Kannada character. A Path from the root to the leaves gives a word. The time complexity of a Trie for insertion, deletion and search is O(dm) where d is the size of a string and m is the size of a alphabet. The space complexity is O(n) where n is the total size of strings.
The compact tries like Radix and Patricia trees are space optimized variants of Trie. The compact Prefix Tries or fully compressed Trie or the Radix Trie are obtained by collapsing the single leaf nodes. In other words the only child will be merged with its parent.
The proposed VKTPY rule which is an extension KTPY rule [7] is capable of covering around 120 [1] Indian and south east Asian languages like Balinese, Javanese etc . These languages are based on Brahmi/Devanagari and have common script structures [2].
The Katapayaadi Sankhya Sutra ( ) – is a powerful encryption technique, a system of numerical notation, was used by ancient Indian mathematicians and grammarians as a tool to convert alphabets to numerals and vice versa. The oldest available evidence of the use of the KTPY rule is from Grahacaaranibandhana by Haridatta in 683 CE. [4]
The KaTaPaYaadi technique, groups the consonants of Kannada into four. Ka, Ta, Pa and Ya, are the group names represented by the beginning letters of these groups. The rule says kaadi nava,(from ka nine letters – Velar and Palatal Stops) taadi nava (from Ta nine letters – Retroex and dental stops), paadi pancha (5 letters from Pa Labial stops),yaadhyaShTa (8 letters from ya – Fricatives & Glides) assigning the values from 1 through 9 and 0 for the last letter of the group. Groups and numbers assigned for the consonants are given in Table 1. As shown in Table 1 KaTaPaYaadi can be seen as a mnemonic technique which helps to remember the numbers (i.e number to characters – decryption) and also like ASCII coding deriving numeric values to non numeric characters. [5] [3]
From Table 1 we can represent ka (k) as 1, sa(s) as 7, ma

as 5, na (n) as 0 and so on. So by KTPY rule the word gaNita () will become 356.
However KTPY rule has two major limitations.

No provision for unique representations of the words.

A word damita () also becomes 356 like a word gaNita ()
TABLE 1. KATAPAYAADI SANKHYA SUTRA (KTPYRULE)
h
Grp Name
1
2
3
4
5
6
7
8
9
0
Kagrp (Velar & Palatl)
k
kh
g
gh
ng
c
ch
j
jh
ny
Tagrp (Retroex & dental)
T
Th
D
Dh
N
t
th
d
dh
n
Pagrp (Labial)
p
ph
b
bh
m
Yagrp (Fricatives & Glides)
y
r
l
v
sh
Sh
s


No numerical representation possible for vowels, composite and conjunctive consonants of the words.

Words like agaNita (), uddaama ( ) cannot be represented as numbers, as characters like a and u are not assigned with any representative numbers.

Words having compound consonants like kShaNika (}) and a word with out compound consonants like kaNika ( ) both get same representation as 151.
These two drawbacks of the KTPY rule are addressed in the proposed Vistruta Katapayaadi Sankhya Sutra ( (VKTPY rule) ) . This V KTPY rule


Gives numerical representations to characters like KTPY rule.

Gives unique representation to all characters of Kannada alphabets including swara (vowels), vargeeya vyanjana (classified consonants), avargeeya vyanjana (miscellaneous consonants), yogavaahaka, ottakshara (compound consonants) and kaaguNita (Conjunctive consonants).

Gives same numerical representation to alphabetical characters of many Indian and south east Asian languages that use Barhmi/Devanagari.

Provides easy decryption rules.

The following Table 2 shows how this VKTPY rule make use of the natural groupings of Kannada alphabets such as swara, vargeeya vyanjana, avargeeya vyanjana,ottakshara and kaaguNita while representing them as numbers. Here each of these groups are numbered with bin numbers from 1 to 8 and each letter belonging to these bins are numbered from 1 to 9 and 0 expect pagrp which is numbered from 1 to 5. This technique, thus able to cover vowels, conjunctive and compound consonants. Our proposed technique has an unique way of representing each Kannada character by a two digit number where the first digit is a bin number and second digit is an index number of the character in that bin. The Table 2 details the VKTPY rule. By this rule, and using Table 2 the word gaNita ( ) can be represented as 13257226, Kannada( ) as 1120882023, and bhaashe ( ) as 34714781. The Table 3 gives comparative numerical representation by KTPY Rule and VKTPY Rule.
TABLE 2. KATAPAYAADI SANKHYA SUTRA EXTENDED (VKTPY
RULE)
Grp Name
1
2
3
4
5
6
7
8
9
0
Kagrp 1
k
kh
g
gh
ng
c
ch
j
jh
ny
Tagrp 2
T
Th
D
Dh
N
t
th
d
dh
n
Pagrp 3
p
Ph
b
bh
m
Yagrp 4
y
r
l
v
sh
Sh
s
h
L
Swara0 5
a
Aa
i
ii
u
uu
R
Ru
Swara1 6
e
ee
ai
o
O
ou
am
ah
Gunita1 7
Gunita2 8
A VKTPY Trie is a congruence of VKTPY rule and a Trie. It stores VKTPY codes as labels/edges of a Trie. Fig. 1. shows a simple Trie constructed for Kannada unicode words and Fig. 2. is its equivalent VKTPY Trie. VKTPY codes that are stored in the Trie structure saves around 30% of the space since each letter of Kannada unicode is a multi byte character (varies from 1 to 6 bytes) and the total code size of VKTPY encoded text will be far lesser compared to a Trie with Kannada unicode words.
VKTPY Prefixed Hashed Trie (VKTPY PHT) is a V KTPY Trie where the VKTPY encoded prefixes are stored as labels of the trie indexed by its first character in the node array at each internal nodes of the trie.
In VKTPY PHT, the nodes are being compressed by combining consonant conjunctives with their respective consonants because it is quite normal to treat such a combination as a single letter in Kannada lanaguage. The Fig. 3a. shows the combination of consonant conjuncts and its respective consonant in a Trie. The Fig. 3b. shows its equivalent in Kannada unicode. By comparing these two figures,it can be seen that our method further reduces the number of pointers in a Trie and thus gives more gain in using the memory space. Fig. 3c. is a fully compressed unicode Kannada tries.
TABLE 3. NUMERICAL REPRESENTAION BY KTPY RULE V/S V KTPY RULE
Kannada Words
KTPYRule representation
VKTPY Rule representation
(gaNita)
356
13257226
(damita)
356
28357226
(agaNita)
No representation
5113257226
(uddaama)
No representation
552887287135
(kShaNika)
151
118746257211
(kaNika)
151
11257211
Fig. 2. Equivalent VKTPY Trie
Fig. 3a. Compressed VKTPY Trie
Fig. 3b. Unicoded Equivalent Trie
Fig.3c. Fully compressed Knnada unicode Trie
Fig. 3a. Compressed VKTPY Trie
Fig. 3b. Unicoded Equivalent Trie
Fig.3c. Fully compressed Knnada unicode Trie
Fig.3. Compressed Tries


RELATED WORK
The Katapayadi sankhya sutra (KTPY Rule) though used in ancient time (1st CE), it is being considered by the computer scientists recently. This technique is being considered for its powerful encoding capabilities. Anand Raman from Massey University, New zealand, compared this ancient rule, with modern hashing method [6] where till today it is believed that the concept of hashing was considered by
H.P. Luhn of IBM in 1953. Subhash kak ,Regents professor of electrical and computer engineering at Oklahoma State University, observes that KTPY rules could be used like binary numbers [7] where he attributes the mapping of KTPY rules to binary values to Pingala, the brother of Panini a famous sanskrit grammarian. Further Subhash Kak along with
T.R.N Rao , in his book Computation in Ancient India published in 2016 [8], mentions the role of Katapayadi rule in Indian contributions to the science of computing while explaining Paninian structure of grammar and Indian logic.
Trie [14] that has been uccessfully used in information retrieval is an array of pointers – one for each character in an alphabet. Each leaf node is the end of a chain of trie nodes representing a string [27]. Tries are pretty fast with reasonable worstcase performance,and good for applications involving text management like pattern matching with kd digital trie [22], dictionary and text processing with double array listing [23], MSD Radix sorting and searching [24], compression [25] and text data mining with hashtrees [20]. Tries are spaceintensive, and thus makes it difficult to use with enormous volumes of strings [11]. Bell et al. in 1990 [25] shows that the space can be saved by reducing the number of trie nodes by omitting chains of tries that descend into a single leaf and thus changing their structure. They call it a compact trie [25]. In the Patricia trie Sedgewick, in 1998 [21] demonstrated that the compression is further possible by omitting all chains without branches, not just those that lead to single leaves. The ternary search tree [24] can save space by using a 3way trienodes for less than, equal to, and greater than comparisons, thus reducing the node size for sparse data. Trie compression [12] , trie compaction and heuristics [26] have also been applied.
Heinz et al. in 2002 successfully reduced the number oftrie nodes at little cost, by collapsing triechains into buckets that share a common prex. They called it the Bursttrie with buckets represented as linked lists with movetofront on access as suggested by [27]. They are then selectively burst into smaller buckets that are parented by a new trie.
Generally it is perceived that Prex Hash Trees are data structures for Distributed data bases. In 2004 Ramabhadran et al. [29] worked on Prex Hash Tree as an indexing structure over distributed hash tables, enabling queries over distributed hash table which uses look up interfaces to construct very ecient trie based data structures.
In the context of Kannada language, tries have been used to gure out the root words by eliminating its prexes and which are further used for the indexing purpose. Sumant Kulkarni and Srinath Srinivasa in 2013 worked on these possibilities and called this trie TrieIR [28].

VKTPY PREFIX HASHED TRIE : VKTPY PHT
VKTPY PHT is a compressed Trie having fewer branches. It is compressed by, collapsing all single nodes,by combining consonant conjunctives and also by allowing to store entire prexes which are indexed by rst characters of the prexes.
Here each node array is a structure consisting of its prex and a pointer to its child. Initially each word is treated as its prex and stored in the NAry node as shown in Fig.4. Each node of the VKTPY PHT is like a hash table with each pointer, pointing to NAry tries of the subsequent levels containing VKTPY prex codes as hash keys. Since V KTPY words with similar prexes are encrypted in a similar way they can be considered as hash keys for VKTPY PHT. If a word with same prex is hashed into the same bucket (node array), then the node is split with its prex as a label of the root and remaining length of the key and node key (subKey1 , subkey2) as labels of its children as shown in Fig.4. For example (KannadabavuTa) is a hashed key stored as a label but indexed with its rst digit in node 1. The subsequent insertion of key (kannadaasmite) hashed to the same node, causing a split in the index and also split in the node node1, into a internal node node2. The leaves leaf1 and leaf3 with its labels (baavuTa) as subKey1 and (asmite) as subKey2 are created respectively in the node node1, having (kannada) as its prex in the node node1, which is a parent node of the node node2. The key (kannadaakshara) is also hashed to the node node1, with the same index and the prex (kannada), navigates down the trie to reach node2. Now the words (asmite) and (akshara) are compared and the prex (a) becomes the label of their parent node node2. The remaining letters (kShara) as subKey1 and (smite) as subKey2 will become the labels of the leaves leaf2 and leaf3 of the new node node3. After each split, the label of the parent node will get updated if its prexKey(nodeKey) value changes due to new prex computed. Each insertions like this begins with a search in PHT. This search gets right PHT node (addingNode) where the new key can be inserted. The following Algorithm 1 searchPHT(), explains the steps in detail. And the Algorithm 2 initInsertionPHT() and Algorithm 3 insertionPHT() explains insertion of VKTPY Keys.
Fig. 4. VKTPYPHT Node Structure
Algorithm1: searchPHT()
search in VKTPY PHT for suitable node to add new key
Input : ROOT, K , V KTPY key

Initialize: Set HT ROOT , N currNode, subKey1 and subKey2 are part of nodeKey and K excluding their
common prex, lpfx the length of the prex

Get index from K

Get chains from HT.htable

Get childNode from chains

Set childNode.pfxKey as nodeKey

if nodeKey and K same then

return frequency count of the K

else

Get prefix, subKey1 ,subKey2, lpfx by calling getPrex(nodeKey, K)

end

while children exists do

Get index from subKey2

if index in children then

Get childNode from children

Set childNode.children to children

Set childNode.pfxKey to nodeKey

Navigate sub tries of PHT by getting

prefix, subKey1 ,subKey2 , lpfx on calling getPrex(nodeKey, sunKey2)

else

Key Not found

Break

end

return childNode,prefix,subKey1,subKey2,lpfx

end
Algorithm 2:initInsertionPHT() Variable initialization part of insertionPHT()

Initialize: Set HT ROOT , N currNode ,
nodeKey currentNode.PfxKey,
K is VKTPY key

insertPHT() invokes searchPHT() and K is searched
in the trie, if K is not found in the trie a new node with K is inserted into the trie in the appropriate
place, and returns an updated VKTPY populated trie. If k is found then frequency count of the K is incremented.searchPHT() returns common
prex between K and nodeKey, subKey1(nodeKey prex), subKey2 (V
KTPY Keyprex) and length of prex lpfx

N, keyIndex searchTrie(P,K);The return enumerates three cases

case1. N is root (Hash Table) when trie empty case2. N is node with having Hash chain

case3. N is the node where the keyIndex not found in its chains ;

Algorithm3: insertionPHT() adding VKTPY keys into VKTPY PHT_

if Hash chain Empty: case1
then Empty Hash Table N is created; Get index as rst character of key K

Create a new hash node hn containing N as its parent, index as the keyIndex of the hash chain and K the
key 3 return HT

if Hash chain exists :case2 then

Check indexs present in the chain search further with searchHT() returns addingNode, prefix,
subKey1, subKey2, lpfx

Get the children from the addingNode and prefixKey = nodeKey from the addingNode

if the nodeKey and KatapaKey are same then

Increment the frequency count of the prefixKey

else if prefix is same as nodeKey with empty sybKey2 then

Increment frequency count of the nodeKey

else if nodeKey is prefix and subKey2 is not empty then

create new hash node with addingNode
as parent and prefix as its key indexed by index of subKey2.

Do the same even if prefix is part of the katapaKey but index the new node with index of subkey1

else if prefix is empty but subKey2 is not empty then

create a new hash node with subKey2 and addingNode as its parent indexed by rst chracter of subKey2

else if none of the above condition satises and if addingNode has no children then

two hash nodes have been created one with subKey1 and other with subKey2 having addingNode as their parent indexed by their respective rst characters.

else if none of the above condit ion satises but addingNode had children. then

This indicates that the insertion is happening in between the nodes. Then check if insertion happening between root Hash Table HT and
addingNode or insertion is happening between two hash nodes.

else if insertion is is between root Hash table and a node then

insert the newly created hash node hn1 between Hash table and addingNode as its new child with prex as its nodeKey and create another hash node hn2 with subKey2 as its nodeKey and hn1 as its parent.

else
insertion is between two hash nodes

repeat the above step.parent of adding node will be a hash node instead of Hash table.
End 24 else

index not in chains :case3

create a hash node with Hash table as its parent and katapaKey as its nodeKey indexed by it rst
character.

End
To analyze these algorithms, we consider a very generic equation for Mary tries containing N numbers of words (keys) as given by Knuth [27]. Let us say AN be the average number of internal nodes. Then for N>=2 the equation to nd the average number of nodes is
TABLE 4: DETAILS OF THE DATASET
Data Set
Total words in Doc w
Number of unique words
u
Number of repeated words
r
Number of stop words
s
Total words4 ws = u+r
Prajavani articles
6300
2426
1465
2409
3891
Data Set
Total words in Doc w
Number of unique words
u
Number of repeated words
r
Number of stop words
s
Total words4 ws = u+r
Prajavani articles
6300
2426
1465
2409
3891
= 1 + ( !
) (
+. . . . )
1+…+=
1!…..!
1
The tokens of parsed documents are encoded with VKTPY rule and then inserted in to VKTPY PHT data structure. The
This equation can be rewritten by considering that k1 of the keys are in the first subtrie and kM are in Mth subtrie. We can rewrite the above eqation (1) with A0=A1=0 and if we sumup k2 to kM we get
!
results are provided under different sections.

Memory taken by VKTPY encryption and Kannada unicode
Our experiments show that Kannada encrypted using VKTPY rule takes less space compared to Kannada unicode.
= 1 + 1 (
1!. . . . . !
1+…+=
) 1
In the case of the unicode representation, the character size varies from 1 byte to 6 and in VKTPY encription method each character takes only symbolic 8 bits. The graph in Fig.5.
= 1 +
1
()
(1
)
1 2
shows that around 30% memory is saved just by using V KTPY encrypted code, for the text corpora developed by us [10].
If CN is the average total number of digits inspection needed to compare all N keys in the trie then C0 = C1 = 0 and equation becomes
= + 1 ( 1) 2
Simplifying all these complicated equations we can arrive at the following description as Knuth suggests in [27] such that the number of nodes needed to store N random keys in M ary trie with branching terminated for s keys is approximately
numNodes = N/(s \ln M) (1)
This equation (1) is valid for large N small s and small M and for a trie with M link field the equation further simplifis to equation "(2)
numNodes = N/(s \ln M) if s= M (2)


EXPERIMENTAL SCENARIOS AND RESULTS Both VKTPY encryption and VKTPY PHT have been
implemented in Python. Our work uses the data set created by political articles by Shekhar Guptas column RaaShtrakaaraNa publishing in renowned daily called PrajavaaNi () a Kannada news paper. There are around 44 articles containing around 6000 words. Selected stop words have been removed by own tool developed earlier [32]. Table 4 gives the details about the data set used.
Fig. 5. Kannada Unicode word size vs VKTPY encoded Kannada word size

Memory and time taken by VKTPY PHT and fully compressed Kannada unicode Trie
The fully Compressed Kannada unicode Trie (Fig.3c.) is compressed by converting long chains of singlechild nodes to one single node. And VKTPY PHT is further compressed by merging of nodes of consonants and conjunctive consonants. When this structure is experimented with the text corpora [10], we get almost same time for inserting and searching on an average with much less space requirement. The memory used vs time ticks for inserting VKTPY code into VKTPY PHT and insertion of Kannada unicode words into fully compressed unicode trie are shown in the Fig.6. Figure shows VKPTY PHT is more compressed as the number of words increases. Which means it is much more suitable for Big Data.
Fig.6: Insertion: Memory vs Time taken by VKTPY PHT and fully compressed Kannada unicode Trie
Table 5 gives the comparison of eld links (pointers) created by fully compressed unicode trie and VKTPY PHT. From this Table 4 we can conclude that number of nodes created in case of VKTPY PHT is less and around 20% memory is saved.
Fig.7. shows the comparison of average search time taken by VKTPY PHT and a fully compressed Kannada unicode Trie.
TABLE 5: DETAILS OF THE FIELD LINKS IN TRIES
Tries
Fully Compressed unicpde Trie
Proposed VKTPYPHT
Number of field links
197232
157304
Fig. 7. Search time taken by fully compressed Unicode Kannada Trie and VKTPY PHT
The experiments with prexed hashes shows that the encrypted code is suitable for hashing since it falls into good distribution among the nodes of the proposed data structure VKTPY PHT. Fig.8. shows the random sample of 100 words
with its sorted frequencies. The graph is tted for half Gaussian distribution. From this gure we can deduce that the spread is good and the maximum number of collisions do not exceed 20 hits for around 500 words.
Fig.8. Curve tted for 100 Kannada Words with sorted frequencies


CONCLUSIONS

Our study and experiments show that our proposed V KTPY encryption technique is a very efficient method. It not only saves the space greater than 30% but also enables several advantages due to its capability of representing the Kannada letters numerically and symbolically. One such advantage proved in our study is effectively compressed VKTPY PHT over fully compressed Kannada unicode trie. This space and time efficient VKTPY PHT, saves the memory usage by 20% with almost the same time complexity with added possibility of phonetic hashing. With our proposed VKTPY rules, it is easy to extend it for other Indian languages to enable cross language retrieval. In short our study indicates that the proposed VKTPY rule and VKTPY PHT are very efficient and effective in searching and retrieving information from Kannada documents.
REFERENCES

https://scriptsource.org/scr/Dev

https://en.wikipedia.org/wiki/Brahmic_scripts

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.18.9659&r ep=rep1&type=pdf

https://en.wikiedia.org/wiki/Katapayadi_system

https://mysteriesexplored.wordpress.com/2011/08/24/amazing encryptiontechnologyinancientindiathekatapayadishankya/

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.18.9659&r ep=rep1&type=pdf

Subhash Kak, 2000, Indian binary numbers and the Katapayadi notation, Annals of the Bhandarkar Oriental Research Institute, vol.81, 2000, pp.269272

T.R.N. Rao and Subhash kak, August 8, 2016, Computation in Ancient India, Paperback August 8, 2016

http://www.prajavani.net/news/category/22885.htm

https://drive.google.com/drive/folders/MFrrFZLNhr7F0Ge9XOLgW eTgIxL2IyX

Comer, D. 1979, Heuristics for trie index minimization, ACM trans. on Database Systems 4(3), 383395.

M. AlSuwaiyel and Ellis Horowitz, 1984, Algorithms for Trie Compaction, ACM Trans. Database Syst., v.9, pp.243263

Flajolet, P. & Puech, C. 1986, Partial match retrieval of multimedia data, Jour. of the ACM 33(2), 371407.
[14] 
Fredkin, E. 1960, Trie memory, Communications of the ACM 3(9), 490499. 
Homepage archive Volume 33 Issue 2, Pages 371407, ACM New York, NY, USA 

[15] 
Frigo, M., Leiserson, C. E., Prokop, H. & Ramachandran, S., 1999, 
[23] 
Aoe et al. 1992 , An Ecient Implementation of Trie Structures, 
Cacheoblivious algorithms,in IEEE Symp. on the Foundations of Computer Science, IEEE Computer Society Press, p. 285. 
SOFIWAREPRACTICE AND EXPERIENCE,VOL.22(9), 695 721 (SEPTEMBER 1992) 

[16] 
Fu, J. W. C., Patel, J. H. & Janssens, B. L.,1992, Stride directed 
[24] 
Jon L. Bentley , Robert Sedgewick , 1997, Fast Fast algorithms for 
prefetching in scalar processors, in Proc. Annual ACM/IEEE MICRO Int. Symp. on Microarchitecture, IEEE Computer Society 
sorting and searching strings, Proceeding SODA 97 Proceedings of the eighth annual ACMSIAM symposium on Discrete algorithms , 

Press, pp. 102110. 
Pages 360369, Society for Industrial and Applied Mathematics 

[17] 
Hallberg, J., Palm, T., & Brorsson, M. 2003, Cacheconscious allocation of pointerbased data structures revisited with hw/sw 
[25] 
Philadelphia, PA, USA Â©1997 Bell, T. C., Cleary, J. G. & Witten, I. H. 1990, Text Compression, 
prefetching, in 2nd Annual Workshop on Duplicating, 
PrenticeHall. ISBN:0139119914 

[18] 
Deconstructing, and Debunking. Harman, D., 1995, Overview of the second text retrieval conference 
[26] 
Comer, D. 1979, Heuristics for trie index minimization, ACM trans. on Database Systems 4(3), 383395. 
(TREC2),in Proc. Second Text Retrieval Conference, Pergamon 
[27] 
Knuth, D. E., 1998, The Art of Computer Programming: Sorting and 

[19] 
Press, Inc., pp. 271289. Heinz, S., Zobel, J. & Williams, H. E., 2002,Burst tries: A fast, 
[28] 
Searching, Vol. 3, second edn, AddisonWesley. Sumant Kulakari,Srinath Srinivasa ,TrieIR: Indexing and Retrieval 
ecient data structure for string keys, ACM trans. on Information 
Engine for Kannada Unicode Text,Digital Libraries: Social Media 

[20] 
Systems 20(2), 192223. Rakesh Agrawal and Ramakrishnan Srikant, 1994 , Fast Algorithms 
and Community Networks: 15th International Conference on Asia Pacic Digital Libraries, ICADL 2013, Bangalore, India, December 

for Mining Association Rules in Large Databases, Proceeding 
911, 2013. Proceedings (pp.2124) 

VLDB 94 Proceedings of the 20th International Conference on Very Large Data Bases Pages 487499, Morgan Kaufmann Publishers Inc. 
[29] 
Sriram Ramabhadran, Sylvia Ratnasam, Prex Hash Tree An Indexing Data Structure over Distributed Hash Tables, 2004 

SF,USA 
[30] 
http://ntzdevelop.blogspot.in/2011/03/phoneticalgorithms.html 

[21] 
Robert Sedgewick, 1998 , Algorithms in C++, Parts 14: 
[31] 
https://thottingal.in/blog/2009/07/26/indicsoundex/ 
Fundamentals, Data Structure, Sorting, Searching, Third Edition, 
[32] 
Y. Hegde, S. Kadambe and P. Naduthota, Sux stripping algorithm 

[22] 
Addison Wesley 1998, chapter 15. Philippe Flajolet, Claude Puech, April 1986 Partial match retrieval 
for Kannada information retrieval, 2013 International Conference on Advances in Computing, Communications and Informatics 
of multidimensional data, Journal of the ACM (JACM) JACM
(ICACCI), Mysore, 2013, pp. 527533. doi: 10.1109/ICACCI.2013.6637227