# State Machine based Framework for Genomic Analysis

DOI : 10.17577/IJERTV9IS010132



Lakshmi Bharathi S, Dr. Vidya Niranjan, Sudhamshu Mohan S

Mysore Road, R V Vidyanikethan, Bengaluru 560059, Karnataka, India

INTRODUCTION

The extensibility of Finite State Machines (FSMs) to disciplines such as networking, compiler design, and marketing is widely appreciated for yielding optimal and precise solutions to the respective problems [12]. Attributing the states of an FSM to biological data, specifically to genes, is rarely attempted because states and events corresponding to genomic data are difficult to visualize and represent. This constraint motivated the search for a transform through which representable genomic states and events could be realized. The subsequent sections of this paper introduce such a linear transform, under which the mapping of genes leads to mathematically representable states [17].

METHODOLOGY

1. Gene Representation

Let Γ = {G1, G2, …, GN} be the set of genes present in a genome, where G1, G2, …, GN are genes. Since a genome contains a very large number of genes, this representation can lead to a very large set.

Mathematically, Γ is a finite set of genes, however large it might be. We can therefore use mathematically convenient representations which are computationally tractable:

Γ = {G1, G2, …, GN}

*Figure: General flow of the framework.*

Here, δ(k) is impulsive only at the k-th index where a gene is present: δ(k) takes the value unity where gene-k exists and 0 elsewhere. This shifts the focus to the mathematical representation of the nucleotides A, T, G, and C.

Bergen and Antoniou proposed a method based on a complex representation, a parametric window function, and the STDFT to maximize the SNR (signal-to-noise ratio) when identifying the coding regions of genes:

A = 0.10 + 0.12j; T = -0.30 - 0.20j;

G = 0.45 - 0.19j; C = 0

*Figure: Graphical representation of the nucleotides.*
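The complex values quoted above can be applied directly as a lookup table. A minimal sketch in Python (the names `NUCLEOTIDE_MAP` and `gene_to_signal` are illustrative, not from the paper):

```python
# Complex-valued nucleotide mapping, using the Bergen-Antoniou values
# quoted in the text.
NUCLEOTIDE_MAP = {
    "A": complex(0.10, 0.12),
    "T": complex(-0.30, -0.20),
    "G": complex(0.45, -0.19),
    "C": complex(0.0, 0.0),
}

def gene_to_signal(sequence: str) -> list:
    """Map a nucleotide string to its complex-valued numerical signal."""
    return [NUCLEOTIDE_MAP[base] for base in sequence.upper()]
```

Each gene thus becomes a finite sequence of points on the complex plane, ready for the gene function and convolution steps that follow.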

2. Gene Function

The gene function f is complex in nature, such that f(G) leads to a numerical value V; f should be a convergent function.

Here, n is the position of a nucleotide within a gene, μ is the arithmetic mean of the gene, l is the number of nucleotides within the gene, μn is the mean of the nucleotide, and σ² is the variance of the nucleotide.

So each gene gets mapped to a number. The uniqueness of this value increases with the number of parameters used; if we include both the mean μ and the variance σ², the value we get will be more unique.
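The exact gene function is not fully specified in the text; a minimal sketch, assuming the gene's complex signal is collapsed via the arithmetic mean and variance mentioned above (combining them by simple addition is an illustrative choice, not the paper's formula):

```python
def gene_value(signal):
    """Collapse a gene's complex signal into a single value using the
    arithmetic mean and the variance of the nucleotide values.
    Summing mean and variance into one number is an assumption made
    here for illustration only."""
    l = len(signal)                      # l: number of nucleotides in the gene
    mean = sum(signal) / l               # arithmetic mean of the gene
    var = sum((x - mean) * (x - mean).conjugate() for x in signal) / l
    return mean + var                    # var has zero imaginary part
```

With more parameters folded in, collisions between distinct genes become less likely, which is the uniqueness property the text relies on.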

3. Representation using Gene Function

The genome can be represented as a right-handed sequence of gene values, starting from the origin and extending to valN.

Similarly, we can also represent the query sequence.

Now we have mathematically representable sequences and numbers which are unique.

4. Gene Convolution

Genome convolution is the convolution of these sequences.

*Figure: the genome sequence (G), with complex values G1, G2, …, GN, plotted along a reference axis; the query sequence Q(G), with values G1, G2, …, Gn, is plotted similarly.*

Here, G1, G2, …, GN are complex in nature.

Similarly, the reference gene RG corresponds to some value.

With all these assumptions, we need to find Φ1 and Φ2, where Φ1(G) and Φ2(G) are the resultants obtained by convolving the genomic sequence with the query sequence and the genomic sequence with the reference sequence, respectively. [13]

If Φ1(G) and Φ2(G) are of large length, we need to apply a transformation T(Φ1(G), Φ2(G)).

If the feature space of Φ1(G) and Φ2(G) is very large, we first reduce it to the best samples using the K-means algorithm.
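The two convolutions described above can be sketched in plain Python. `convolve`, `genome`, `query`, and `reference` are illustrative names for the genomic, query, and reference signals, not identifiers from the paper:

```python
def convolve(a, b):
    """Linear convolution of two complex-valued gene signals.
    Output length is len(a) + len(b) - 1, as for any linear convolution."""
    out = [0j] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# Phi1: genome convolved with the query sequence.
# Phi2: genome convolved with the reference sequence.
def gene_convolution(genome, query, reference):
    return convolve(genome, query), convolve(genome, reference)
```

For long signals an FFT-based convolution would be the usual optimization; the direct double loop shown here keeps the definition explicit.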

5. Scoring Mechanism

Now a scoring mechanism needs to be employed. This will be a rectangular matrix of order p × q, with entries

Spq = (Φ1(G) − Φ2(G)) / Δ

where Δ is the determinant.

We then need to make the matrix square, of order r × r where r = LCM(p, q), appending 0 in all the remaining entries.

Now the determinant needs to be calculated.
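The padding and determinant steps can be sketched as below. Note one caveat: padding purely with zeros, as the text describes, leaves zero rows whenever p ≠ q, so the determinant is always 0; this sketch therefore fills the appended diagonal entries with 1 (an assumption of this sketch, not the paper's method) so that the determinant stays informative:

```python
from math import gcd

def pad_to_lcm_square(matrix, p, q, diag_fill=1.0):
    """Embed a p x q matrix into an r x r matrix, r = LCM(p, q).
    Appended entries are 0 except the new diagonal, filled with
    diag_fill (an assumption; all-zero padding forces det = 0)."""
    r = p * q // gcd(p, q)
    out = [[0.0] * r for _ in range(r)]
    for i in range(p):
        for j in range(q):
            out[i][j] = matrix[i][j]
    for k in range(r):
        if k >= p or k >= q:
            out[k][k] = diag_fill
    return out

def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det
```

For a 2 × 3 score matrix, r = LCM(2, 3) = 6, so the padded matrix is 6 × 6 and its determinant reduces to that of the leading 2 × 2 block.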

6. States (Key Players)

Now we try to make the matrix upper/lower triangular, or try to diagonalize it (echelon forms).

The remaining entries are the key players of our analysis. If the matrix is diagonalizable, the analysis becomes easy. The judgement (J) from the key players will be based on learning methods.

Inputs for learning are

[Principal diagonal elements or Triangular matrix] + [Determinant obtained from matrix] + [Rank of the matrix]

Using machine learning techniques such as a multi-layer perceptron or an SVM, we can train on these inputs, involving the reference genes for a given genome together with some query sequences (around 30-40 samples).

Then by using this binary classifier we can make a decision for a particular analysis.
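The text proposes an SVM or multi-layer perceptron; as a minimal self-contained stand-in, the sketch below trains a perceptron-style linear binary classifier on feature vectors of the form [diagonal elements…, determinant, rank]. The function names and the perceptron substitution are assumptions of this sketch:

```python
def train_perceptron(samples, labels, epochs=50, lr=0.1):
    """Train a simple perceptron as a binary classifier.
    Each sample is a feature vector [diagonal elements..., determinant,
    rank]; labels are 0/1. A stand-in for the SVM/MLP in the text."""
    n = len(samples[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    """Binary decision for one feature vector."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

A real SVM (e.g. with an RBF kernel) would replace this for the 30-40-sample training regime the text mentions; the decision interface stays the same.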

7. SVM based Binary Classifier

| Training inputs | Output range |
| --- | --- |
| Ref Gene + Genome + Query sequence | Decide on output range of values |

II. CONSTRAINTS FOR THE APPLICATION OF THIS METHOD

1. Gene Representation

To mathematically represent a gene, the following parameters need to be decided.

1. All the nucleotides must be unique i.e., the number corresponding to each nucleotide must be different from others.

2. These numbers must be linearly independent and should not belong to the same linear space.

3. Orthogonality is the most preferred feature to introduce the uniqueness in the analysis.

4. Each nucleotide will be a point on the complex plane (the s-plane of the Laplace domain).

*Figure: a gene represented as points on the complex plane (real and imaginary axes).*

2. Gene Function

1. It should be convergent

2. It should be well-defined, continuous, and differentiable

3. Periodic and non-periodic properties need to be studied further

3. Scoring Mechanism

1. As this involves finding the key players (numbers), i.e., the triangular or diagonal elements, the values are highly uncorrelated.

2. Dividing the values by the determinant will normalize them.

III. APPLICATIONS

This idea could be developed into a framework for genomic analysis, primarily intended to characterize nucleotides, genes, genomes, and their expressions in a mathematically coherent way. [14] The inherent modularity of this framework allows the performance of each stage to be enhanced and improved independently. As the crux of the framework is principally derived from concepts of linear systems, viz. convolution, linear transformation (Cartesian association), K-means clustering, discrete differentiation, and characteristic equations, followed by a training method based on Support Vector Machines, the approach employed is robust, simple, conclusive, and reliable. [16]

The tangibility of this approach derives from the philosophy of linearization of the sample space. [13] This novel approach shapes and fits non-characterized biological data into a framework where one can apply any of the linear discriminative techniques, which are deterministic in nature, thus leading to conclusions that are reliable and specific. [18]

IV. FUTURE SCOPE

As this framework is modular and generic in nature, it could be enhanced and extended with techniques based on fuzzy logic, for better decisiveness and quantifiability of the conclusions. For huge training data sets, a layered, artificial-intelligence-based approach could be studied. For better characterization of gene functions, we can even adopt multi-parameter statistical modelling techniques based on curvature analysis.

V. REFERENCES

1. Francis Dutil, Joseph Paul Cohen, Martin Weiss, Georgy Derevyanko, Yoshua Bengio, "Towards Gene Expression Convolutions using Gene Interaction Graphs," arXiv:1806.06975v1 [q-bio.GN], 18 Jun 2018.

2. Michaël Defferrard, Xavier Bresson, Pierre Vandergheynst, "Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering," EPFL, Lausanne, Switzerland.

3. Yifei Chen, Yi Li, Rajiv Narayan, Aravind Subramanian and Xiaohui Xie, "Gene expression inference with deep learning," Bioinformatics, Advance Access published February 11, 2016.

4. Tanya Barrett, Dennis B. Troup, Stephen E. Wilhite, Pierre Ledoux, et al., "NCBI GEO: archive for high-throughput functional genomic data," Nucleic Acids Research, 2009, Vol. 37, Database issue, D885-D890, doi:10.1093/nar/gkn764.

5. David Warde-Farley, Sylva L. Donaldson, Ovi Comes, Khalid Zuberi, et al., "The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function," Nucleic Acids Research, 2010, Vol. 38, Web Server issue, doi:10.1093/nar/gkq537.

6. Minoru Kanehisa, Miho Furumichi, Mao Tanabe, Yoko Sato, et al., "KEGG: new perspectives on genomes, pathways, diseases and drugs," Nucleic Acids Research, 2017, Vol. 45, Database issue, D353-D361, doi:10.1093/nar/gkw1092.

7. James M. Heather, Benjamin Chain, "The Sequence of Sequencers: The History of Sequencing DNA," Genomics (2015), doi:10.1016/j.ygeno.2015.11.003.

8. Stanley H. Chan, "Constructing a sparse convolution matrix for shift varying image restoration problems," Proceedings of 2010 IEEE 17th International Conference on Image Processing.

9. Yanwei Pang, Manli Sun, Xiaoheng Jiang, and Xuelong Li, "Convolution in Convolution for Network in Network," IEEE Transactions on Neural Networks and Learning Systems.

10. S. H. Chan, "Constructing a sparse convolution matrix for shift varying image restoration problems," 2010 IEEE International Conference on Image Processing, Hong Kong, 2010, pp. 3601-3604.

11. Y. Pang, M. Sun, X. Jiang and X. Li, "Convolution in Convolution for Network in Network," in IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 5, pp. 1587-1597, May 2018.

12. R. Xi, M. Hou, M. Fu, H. Qu and D. Liu, "Deep Dilated Convolution on Multimodality Time Series for Human Activity Recognition," 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, 2018, pp. 1-8

13. L. Gao, P. Chen and S. Yu, "Demonstration of Convolution Kernel Operation on Resistive Cross-Point Array," in IEEE Electron Device Letters, vol. 37, no. 7, pp. 870-873, July 2016.

14. J. Shangguan, Y. Li, Y. Wang and H. Li, "Fast algorithm of modified cubic convolution interpolation," 2011 4th International Congress on Image and Signal Processing, Shanghai, 2011, pp. 1072-1075.

15. X. Gao and H. Xiong, "A hybrid wavelet convolution network with sparse-coding for image super-resolution," 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, 2016, pp. 1439-1443.

16. S. Shrivastava and P. Rawat, "High speed and delay efficient convolution by using Kogge Stone device," 2017 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, 2017, pp. 1-5.

17. B. Bipin and J. J. Nair, "Image convolution optimization using sparse matrix vector multiplication technique," 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Jaipur, 2016, pp. 1453-1457.

18. S. Jain and S. Saini, "High speed convolution and deconvolution algorithm (Based on Ancient Indian Vedic Mathematics)," 2014 11th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), Nakhon Ratchasima, 2014, pp. 1-5.

19. C. Radhakrishnan and W. K. Jenkins, "Modified Discrete Fourier Transforms for fast convolution and adaptive filtering," Proceedings of 2010 IEEE International Symposium on Circuits and Systems, Paris, 2010, pp. 1611-1614.

20. R. Krutsch and S. Naidu, "Monte Carlo method based precision analysis of deep convolution nets," 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP), Rennes, 2016, pp. 162-167.

21. S. Kambhampati, "Power efficient modulo convolution," 2016 International Conference on Inventive Computation Technologies (ICICT), Coimbatore, 2016, pp. 1-6.

22. P. Katkar, T. N. Sridhar, G. M. Sharath, S. Sivanantham and K. Sivasankaran, "VLSI implementation of fast convolution," 2015 Online International Conference on Green Engineering and Technologies (IC-GET), Coimbatore, 2015, pp. 1-5.

3. A. Khumaidi, E. M. Yuniarno and M. H. Purnomo, "Welding defect classification based on convolution neural network (CNN) and Gaussian kernel," 2017 International Seminar on Intelligent Technology and Its Applications (ISITIA), Surabaya, 2017, pp. 261-265.