Tax Aggressiveness Prediction Method with Neural Network and Logistic Regression

This study examines the power of neural network and logistic regression methods to predict tax aggressiveness. The research sample consists of companies whose shares were listed in the Indonesian Sharia Stock Index (ISSI) in the period 2011-2015, giving a total of 71 public companies in Indonesia. Data were obtained from the Indonesia Stock Exchange, and the sample was determined by purposive sampling. The independent variables are the maqashid sharia index, the corporate social responsibility disclosure index, company size, profitability, leverage, inventory intensity, and capital intensity. The analysis techniques are multiple regression, logistic regression, and neural networks. In the first stage, multiple regression is used to identify which independent variables can predict the level of tax aggressiveness. In the second stage, the logistic regression and neural network prediction models are compared to determine which gives higher accuracy. Based on the results of the analysis and discussion, the Neural Network method provides better prediction than logistic regression for both training data and testing data.


INTRODUCTION
This research attempts to predict tax aggressiveness by including the maqashid sharia index and the level of corporate social responsibility disclosure as predictors. The prediction model developed here is called the Sharia-based Islamic Tax Aggressive Prediction Model and Social Disclosure. The model also includes several control variables: company size, profitability, leverage, capital intensity, and inventory intensity. The study then determines which prediction model predicts the level of tax aggressiveness more accurately, comparing the classical model, represented by logistic regression, with the newer model, represented by the neural network.
Logistic regression analysis can be used to predict tax aggressiveness. In logistic regression, a logistic model describes the relationship between the predictors and the response and assigns objects to one of two response categories. Logistic regression has also been extended to responses with more than two categories, known as polychotomous logistic regression. In some literature, logistic regression is referred to as a classical model.
One classification method developed in the machine learning field is the Neural Network (NN). This model does not require a particular measurement scale or distribution for the predictors, which are called inputs in NN terminology. In general, NNs fall into two major groups depending on whether a response is present: supervised and unsupervised NNs. In this classification analysis, the NN belongs to the supervised group, because the learning process (function optimization) is guided by a response (the output classification). In some classification literature, this NN is regarded as part of the modern classification models.
The novelty of this study lies in applying neural networks and logistic regression to predict tax aggressiveness. To the authors' knowledge, no previous study has compared neural network and logistic regression methods for predicting tax aggressiveness. Based on this background, the research problem is: which method, logistic regression or the Neural Network (NN), provides higher predictive power?

OBJECTIVES OF THE STUDY
The purpose of this study is to compare logistic regression and the Neural Network. Both methods are applied using the SPSS version 20 statistical package, which provides facilities for data analysis with both methods. The data are divided into two groups, data for modeling (training) and data for evaluation (testing), with a training-to-testing ratio of 2:2. The classification accuracy of each method is then compared.

Logistic Regression
Logistic regression is a special form of regression in which the dependent variable is divided into two groups (binary), although the formulation can be extended to more than two groups. It is used to estimate a regression equation when the dependent variable is categorical. Binary logistic regression is used when the dependent variable has two categories, such as yes or no; when there are more than two categories, such as disagree, agree, and strongly agree, multinomial or ordinal logistic regression is used instead.
Many categorical response variables take only two values. For example, each company can be classified as bankrupt (default) or not bankrupt (non-default), coded 1 and 0 respectively. Such a response variable follows a Bernoulli distribution, a binary random variable characterized by the probability P(Y = 1).
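For reference, the standard binary logistic model and its likelihood can be written as follows, using X1 to X7 for the seven independent variables defined under Variable Descriptions and Indicators. This is the textbook form only; the coefficients estimated in this study are not reproduced here.

```latex
\pi(\mathbf{x}) = P(Y = 1 \mid \mathbf{x})
  = \frac{\exp\left(\beta_0 + \sum_{j=1}^{7} \beta_j X_j\right)}
         {1 + \exp\left(\beta_0 + \sum_{j=1}^{7} \beta_j X_j\right)}

\ln \frac{\pi(\mathbf{x})}{1 - \pi(\mathbf{x})}
  = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_7 X_7

L(\boldsymbol{\beta}) = \prod_{i=1}^{n} \pi(\mathbf{x}_i)^{y_i}\,\bigl(1 - \pi(\mathbf{x}_i)\bigr)^{1 - y_i},
  \qquad y_i \in \{0, 1\}
```

The coefficients are chosen to maximize L, and a company is classified as 1 when its estimated probability exceeds the chosen cut-off.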

Neural Network
… populations using a small number of examples (one or several) at a time. An NN is basically no different from a standard statistical model. NNs are used as statistical tools in various fields, including psychology, statistics, engineering, econometrics, and even physics. NNs are also used as models of cognitive processes by neuroscientists and cognitive scientists.
Basically, an NN is built from simple units, sometimes called neurons or cells by analogy with their biological counterparts. These units are linked by a series of weighted connections, and learning usually consists of modifying these connection weights. Each unit encodes a feature or characteristic of the pattern to be analyzed or used for prediction. The units are usually organized into several layers: the first is called the input layer, the last the output layer, and the middle layers (if any) the hidden layers. The information to be analyzed is fed to the first-layer neurons and then propagated to the second-layer neurons for further processing. The results are passed on to the next layer, and so on until the last layer. Each unit receives information from other units (or from the outside world through some devices), processes it, and converts it into the unit's output.
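The layer-by-layer propagation described above can be sketched as a single forward pass through a network with one hidden layer. This is a minimal illustration under stated assumptions: the layer sizes (7 inputs for X1 to X7, 3 hidden units, 1 output), the activations (tanh in the hidden layer, logistic sigmoid at the output), and all weights are hypothetical and do not reproduce the SPSS configuration used in this study.

```python
import math


def sigmoid(z):
    """Logistic activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each unit forms a weighted sum of all
    inputs plus its bias, then applies the activation function."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Propagate one observation from the input layer through one hidden
    layer to a single output unit, returning the estimated P(class = 1)."""
    hidden = dense(x, hidden_w, hidden_b, math.tanh)
    (p,) = dense(hidden, out_w, out_b, sigmoid)
    return p


# Hypothetical example: 7 inputs, 3 hidden units, 1 output unit.
x = [0.6, 0.4, 0.2, 0.05, 0.5, 0.3, 0.1]
hidden_w = [[0.1] * 7, [-0.2] * 7, [0.05] * 7]
hidden_b = [0.0, 0.1, -0.1]
out_w = [[0.5, -0.3, 0.8]]
out_b = [0.0]
p = forward(x, hidden_w, hidden_b, out_w, out_b)
print(f"P(tax aggressive) = {p:.3f}")
```

Training (which this sketch omits) would adjust `hidden_w`, `hidden_b`, `out_w`, and `out_b` to reduce the classification error on the training data.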

Literature Review
Studies of the influence of the Islamic maqashid index on tax aggressiveness are still relatively rare. The results of their study found that companies with a higher level of profitability tend to show lower tax aggressiveness. This is consistent with progressive income tax rates, under which companies with higher taxable income are charged a higher tax rate.
Many researchers have examined the effect of leverage on the level of tax aggressiveness; for example, Gupta (2014) found a negative influence of leverage on the level of tax aggressiveness. This is because a company bound by agreements with its creditors is less inclined to engage in tax aggressiveness.

Population and Sample
The population of this study consists of companies listed on the Indonesia Stock Exchange (IDX). Samples were selected from 2011 to 2015 according to certain criteria (purposive sampling). The data used in this study were taken from the Indonesian Capital Market Directory (ICMD) and from idx.co.id. In addition, indicators of corporate social responsibility disclosure were obtained from the website …

Variable Descriptions and Indicators
The research variables are divided into the dependent variable and the independent variables. The dependent variable is the level of tax aggressiveness (Y). The independent variables are the maqashid sharia index (X1), the corporate social responsibility disclosure index (X2), company size (X3), profitability (X4), leverage (X5), capital intensity (X6), and inventory intensity (X7).
1. The level of tax aggressiveness is the extent to which a company reduces the amount of income tax it pays each year. In this study, tax aggressiveness is proxied by the effective tax rate (ETR), measured as tax expense divided by income before tax. Companies that engage in tax aggressiveness are coded 1, and companies that do not are coded 0.
… and responsibility for products (9 indicators). The scores for the disclosure items are summed and divided by the total number of disclosure items expected for each indicator to obtain the disclosure score per indicator.
4. The company characteristics in this study are company size, profitability, leverage, capital intensity, and inventory intensity. Company size is measured by total sales. Profitability is measured by return on assets (ROA), the ratio of pre-tax profit to total assets. Leverage is measured by total liabilities divided by total assets. Capital intensity is measured by fixed assets divided by total assets. Inventory intensity is measured by inventory divided by total assets.
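The variable definitions above can be sketched as a few small functions. The field names in `FirmYear` are hypothetical, and so is the ETR cut-off used to assign code 1: the text does not state how the ETR is converted into the 0/1 label, so `threshold=0.25` is only a placeholder assumption. Rejecting loss-making firm-years (where ETR is undefined) is likewise an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class FirmYear:
    """Raw financial-statement items for one company in one year
    (hypothetical field names)."""
    tax_expense: float
    pretax_income: float
    total_sales: float
    total_assets: float
    total_liabilities: float
    fixed_assets: float
    inventory: float


def effective_tax_rate(f):
    """ETR = tax expense / income before tax (undefined for losses)."""
    if f.pretax_income <= 0:
        raise ValueError("ETR is undefined for non-positive pre-tax income")
    return f.tax_expense / f.pretax_income


def aggressiveness_label(f, threshold=0.25):
    """Code 1 (tax aggressive) if ETR is below a cut-off, else 0.
    The 0.25 cut-off is a hypothetical placeholder, not the study's rule."""
    return 1 if effective_tax_rate(f) < threshold else 0


def control_variables(f):
    """Company characteristics as defined in the text."""
    return {
        "SIZE": f.total_sales,
        "ROA": f.pretax_income / f.total_assets,
        "LEV": f.total_liabilities / f.total_assets,
        "CAPINT": f.fixed_assets / f.total_assets,
        "INVINT": f.inventory / f.total_assets,
    }


def disclosure_score(item_scores, expected_items):
    """Score for one disclosure indicator: summed item scores divided by
    the number of items expected for that indicator."""
    return sum(item_scores) / expected_items


firm = FirmYear(tax_expense=18.0, pretax_income=100.0, total_sales=500.0,
                total_assets=400.0, total_liabilities=200.0,
                fixed_assets=120.0, inventory=40.0)
print(effective_tax_rate(firm), aggressiveness_label(firm))
print(control_variables(firm))
print(disclosure_score([1, 0, 1, 1, 0, 1, 0, 1, 1], expected_items=9))
```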

Data Analysis Technique
In this study, the data are divided into two groups, data for modeling (training) and data for evaluation (testing), with a training-to-testing ratio of 2:2. Classification is then carried out with logistic regression and the Neural Network. Both methods are applied using SPSS version 20, which provides facilities for data analysis with both methods. The research period is 2011-2015. The training and testing data are arranged in two (2) ways, namely:
1. Same year: the X and Y variables come from the same period (training 2011-2012, testing 2013-2014).
2. Different years: the X variables come from the previous year and the Y variable from the following year (training: X 2011-2012, Y 2012-2013; testing: X 2012-2013, Y 2013-2014).
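The two arrangements of training and testing data can be sketched as follows. The panel layout (a dictionary keyed by `(firm, year)` holding an X vector and a Y label) and the function names are hypothetical; the year windows follow the periods stated in the text, including the fact that in the different-year scheme the X data for 2012 fall in both the training and the testing window.

```python
def same_year_split(panel, train_years=(2011, 2012), test_years=(2013, 2014)):
    """Scheme 1 (same year): X and Y both come from year t."""
    items = sorted(panel.items())
    train = [(x, y) for (_, year), (x, y) in items if year in train_years]
    test = [(x, y) for (_, year), (x, y) in items if year in test_years]
    return train, test


def lagged_split(panel, train_x_years=(2011, 2012), test_x_years=(2012, 2013)):
    """Scheme 2 (different years): X from year t is paired with the same
    firm's Y from year t + 1."""
    def pairs(x_years):
        out = []
        for (firm, year), (x, _) in sorted(panel.items()):
            following = panel.get((firm, year + 1))
            if year in x_years and following is not None:
                out.append((x, following[1]))
        return out

    return pairs(train_x_years), pairs(test_x_years)


# Toy panel: two hypothetical firms observed 2011-2014.
panel = {
    (firm, year): ([float(year - 2011)], year % 2)  # (X vector, Y label)
    for firm in ("A", "B")
    for year in range(2011, 2015)
}
train, test = same_year_split(panel)
lag_train, lag_test = lagged_split(panel)
print(len(train), len(test), len(lag_train), len(lag_test))
```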

Logistic Regression
Comparison of Data Training and Testing 2:2 (Same Year)
The Hosmer and Lemeshow test is used as a goodness-of-fit test to determine whether the model can be used to interpret the relationship between the level of tax aggressiveness and the seven independent variables. The hypotheses of the Hosmer and Lemeshow test are H0: the model fits (it is able to explain the empirical data), and H1: the model does not fit. H0 is not rejected if the p-value of the Hosmer and Lemeshow chi-square statistic is greater than 0.05. As shown in Table 1, the p-value is 0.537 > 0.05, so the null hypothesis cannot be rejected, which means that the model fits. The model summary shows that 20.7% (Cox and Snell) and 28% (Nagelkerke) of tax aggressiveness can be explained by the independent variables, as in Table 2, while the rest is explained by factors outside the model. The statistical results are presented in Table 3. Based on Table 3, the variables IMS, ICSR, SIZE, and ROA significantly influence the level of tax aggressiveness, while the other three variables (LEV, CAPINT, and INVINT) are not significant. This can be seen from Wald values greater than the critical table value or, equivalently, from significance values smaller than 0.05.
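The goodness-of-fit statistics reported above can be sketched in a few functions. This is an illustrative implementation, not SPSS's: the Hosmer-Lemeshow groups are formed by cutting the cases, sorted by predicted probability, into equal-sized chunks (SPSS may handle ties and group boundaries differently), and the chi-square p-value uses a closed form that is valid only for even degrees of freedom, which holds for the default of 10 groups (8 df). The toy probabilities and outcomes are randomly generated and hypothetical.

```python
import math
import random


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, closed form for even df only:
    exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df % 2:
        raise ValueError("this closed form needs an even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y, p, groups=10):
    """Compare observed and expected counts of 1s and 0s within groups of
    cases sorted by predicted probability; df = groups - 2."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        obs1 = sum(y[i] for i in idx)
        exp1 = sum(p[i] for i in idx)
        obs0, exp0 = len(idx) - obs1, len(idx) - exp1
        stat += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    return stat, chi2_sf_even_df(stat, groups - 2)


def log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y under probabilities p."""
    return sum(math.log(pi) if yi else math.log(1.0 - pi) for yi, pi in zip(y, p))


def pseudo_r2(ll_null, ll_model, n):
    """Cox and Snell R^2 and its Nagelkerke rescaling to a 0-1 range."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    return cox_snell, nagelkerke


# Hypothetical data: probabilities and outcomes drawn from them.
rng = random.Random(0)
p = [rng.uniform(0.05, 0.95) for _ in range(140)]
y = [1 if rng.random() < pi else 0 for pi in p]
stat, p_value = hosmer_lemeshow(y, p)
base = sum(y) / len(y)  # intercept-only model predicts the sample share
cs, nk = pseudo_r2(log_likelihood(y, [base] * len(y)), log_likelihood(y, p), len(y))
print(f"HL chi-square = {stat:.3f}, p = {p_value:.3f}")
print(f"Cox-Snell R2 = {cs:.3f}, Nagelkerke R2 = {nk:.3f}")
```

A p-value above 0.05 leads to the same decision as in the text: H0 (model fit) is not rejected.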
The logistic regression model for the effective tax rate (ETR) data with a 2:2 training and testing split is as follows: … It can also be written as the following equation: … The accuracy of the classification results of the regression model is shown in Table 4. Table 4 shows that the classification accuracy for the modeling data (2011-2012 period) with a 2:2 training and testing split using logistic regression is 70%. Of the 57 non-default companies observed, 37 were correctly predicted, an accuracy of 64.9%. Conversely, of the 83 default companies observed, 61 were correctly predicted, an accuracy of 73.5%.
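A classification table like Table 4 cross-tabulates observed and predicted classes after applying a cut-off to the predicted probabilities. In the sketch below, the 0.5 cut-off and the convention that a probability exactly at the cut-off counts as class 1 are assumptions, and the probabilities are placeholders chosen only so that the table reproduces the counts reported for the 2011-2012 training data.

```python
def classification_table(y_true, p_hat, cutoff=0.5):
    """Cross-tabulate observed vs predicted classes; a case is predicted
    as 1 when its estimated probability is at least `cutoff`."""
    counts = {(obs, pred): 0 for obs in (0, 1) for pred in (0, 1)}
    for y, p in zip(y_true, p_hat):
        counts[(y, 1 if p >= cutoff else 0)] += 1
    n0 = counts[(0, 0)] + counts[(0, 1)]
    n1 = counts[(1, 0)] + counts[(1, 1)]
    return {
        "table": counts,
        "pct_correct_0": 100 * counts[(0, 0)] / n0,
        "pct_correct_1": 100 * counts[(1, 1)] / n1,
        "overall": 100 * (counts[(0, 0)] + counts[(1, 1)]) / (n0 + n1),
    }


# 57 non-default (0) and 83 default (1) observations; placeholder
# probabilities give 37 and 61 correct predictions respectively.
y_true = [0] * 57 + [1] * 83
p_hat = [0.2] * 37 + [0.8] * 20 + [0.8] * 61 + [0.2] * 22
result = classification_table(y_true, p_hat)
print({k: round(v, 1) for k, v in result.items() if k != "table"})
```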
Next, the testing data (2013-2014 period) are classified with logistic regression. The results show a classification accuracy of 75%, as shown in Table 5.

Comparison of Data Training and Testing 2:2 (Different Years)
The analysis was continued with a 2:2 training and testing split in which the X and Y variables come from different years: the Y variable uses data from the following year, while the X variables use data from the previous year. The Hosmer-Lemeshow test shows that the model is fit, because its significance value of 0.946 is above 0.05; the results are given in Table 6. In the model summary, the Cox and Snell R Square and Nagelkerke R Square values show that 22.3% and 29.8%, respectively, of tax aggressiveness can be explained by the independent variables, while the rest is explained by factors outside the model. The model summary is shown in Table 7. The significance tests in Table 8 show that IMS, ICSR, SIZE, ROA, and CAPINT significantly influence tax aggressiveness, with significance values below 0.05 or 0.1, while LEV and INVINT have no significant effect. From Table 8, the logistic regression model can be written as follows: … The accuracy of the classification results of the regression model is shown in Table 9. Table 9 shows that the classification accuracy for the 2:2 training and testing split in different years with logistic regression is 67.9%. Of the 64 non-default companies observed, 42 were correctly predicted, an accuracy of 65.6%. Conversely, of the 76 default companies observed, 53 were correctly predicted, an accuracy of 69.7%.
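The significance tests in Tables 3 and 8 rest on the Wald statistic, the squared ratio of a coefficient to its standard error, compared with a chi-square distribution with 1 degree of freedom. The sketch below computes it with the standard library; for 1 df the upper-tail probability equals `math.erfc(sqrt(W / 2))`. The variable names and coefficient values are hypothetical illustrations, not the estimates from this study.

```python
import math


def wald_test(beta, se):
    """Wald statistic W = (beta / SE)^2 and its p-value under a chi-square
    distribution with 1 df, i.e. P(|Z| > |beta / SE|) for standard normal Z."""
    w = (beta / se) ** 2
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value


# Hypothetical coefficients and standard errors, for illustration only.
examples = {"variable_a": (0.9, 0.4), "variable_b": (0.3, 0.25)}
for name, (beta, se) in examples.items():
    w, p_value = wald_test(beta, se)
    if p_value < 0.05:
        verdict = "significant at 5%"
    elif p_value < 0.10:
        verdict = "significant at 10%"
    else:
        verdict = "not significant"
    print(f"{name}: Wald = {w:.3f}, p = {p_value:.4f} ({verdict})")
```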
Furthermore, the FD data are classified for the 2:2 testing data (where the X variables use the 2012-2013 period and the Y variable uses the 2013-2014 period) with logistic regression. The results show a classification accuracy of 72.1%, as shown in Table 10.

Neural Network
Comparison of Data Training and Testing 2:2 (Same Year)
Table 12 shows that the overall classification accuracy of tax aggressiveness for the modeling data (2011-2012 period) with a 2:2 training and testing split using the Neural Network is 88.6%. Of the 57 non-default companies observed, 52 were correctly predicted, an accuracy of 91.2%. Conversely, of the 83 default companies observed, 72 were correctly predicted, an accuracy of 86.7%. For the testing data (2013-2014 period), the prediction accuracy is 95%, as shown in Table 13.

Comparison of Data Training and Testing 2:2 (Different Years)
Table 14 shows the classification accuracy of the Neural Network using training data from the 2011-2012 period for the X variables and the 2012-2013 period for the Y variable. Based on Table 14, the prediction accuracy over all observations is 77.9%. Of the 64 non-default companies observed, 38 were correctly predicted, an accuracy of 59.4%. Conversely, of the 76 default companies observed, 71 were correctly predicted, an accuracy of 93.4%.
Furthermore, for the testing data, using the 2012-2013 period for the X variables and the 2013-2014 period for the Y variable, the prediction accuracy is 80.7%, as shown in Table 15. On average, the classification accuracy of tax aggressiveness for both training and testing data was higher with the Neural Network than with logistic regression.
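The per-class and overall percentages for the four training classification tables can be recomputed directly from the counts of correct predictions stated in the text. This is only an arithmetic check of those counts; the dictionary keys are descriptive labels introduced here, and percentages are rounded to one decimal place as in the text.

```python
# (correct non-default, total non-default, correct default, total default),
# counts as stated in the text for each training table.
REPORTED = {
    "LR, same year (Table 4)": (37, 57, 61, 83),
    "LR, different years (Table 9)": (42, 64, 53, 76),
    "NN, same year (Table 12)": (52, 57, 72, 83),
    "NN, different years (Table 14)": (38, 64, 71, 76),
}


def accuracies(c0, n0, c1, n1):
    """Percent correct for class 0, class 1, and overall (1 decimal)."""
    return (
        round(100 * c0 / n0, 1),
        round(100 * c1 / n1, 1),
        round(100 * (c0 + c1) / (n0 + n1), 1),
    )


for name, counts in REPORTED.items():
    print(name, accuracies(*counts))
```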

CONCLUSIONS
Tax aggressiveness is part of the tax planning that companies apply to minimize or reduce the amount of tax they must pay. It can be achieved either by lowering reported income or by increasing deductible expenses so that taxable income is reduced, which ultimately reduces the income tax the company must pay. Tax aggressiveness can take the form of illegal tax evasion or of tax avoidance, which does not violate the law but exploits loopholes in tax regulations. For more than 20 years, researchers have conducted empirical studies on the determinants of tax aggressiveness, with differing findings. Following Frank et al. (2009), this study does not assume that all tax-aggressive practices are unlawful. It follows previous studies which explain that the smaller the tax burden paid by a company, the more tax aggressiveness the company engages in.
This study found that the average classification accuracy of tax aggressiveness for the modeling (training) data was 68.95% for logistic regression and 83.25% for neural networks. These results indicate that, methodologically, the neural network is the better classification method for the training data. The average classification accuracy for the evaluation (testing) data was 73.55% for logistic regression and 87.85% for neural networks, indicating that the neural network is also the better classification method for the testing data. Both findings provide evidence that neural networks offer better predictive accuracy than logistic regression.
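The averages in the conclusion appear to be simple means of the overall accuracies from the same-year and different-year setups reported in the results. The sketch below recomputes them from those reported figures; it adds no new results, and the dictionary names are introduced here for illustration.

```python
# Overall accuracies (%) as reported in the text: (same year, different years).
TRAINING = {"Logistic regression": (70.0, 67.9), "Neural network": (88.6, 77.9)}
TESTING = {"Logistic regression": (75.0, 72.1), "Neural network": (95.0, 80.7)}


def average(values):
    """Arithmetic mean of a sequence of accuracies."""
    return sum(values) / len(values)


for label, table in (("training", TRAINING), ("testing", TESTING)):
    for method, accs in table.items():
        print(f"{method}, {label}: {average(accs):.2f}%")
```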
Future research can extend this study by adding other variables, such as Islamic governance, in relation to the level of tax aggressiveness. Further research could also focus on Islamic banks as research objects, so that the social responsibility disclosure variable and the maqashid sharia index can be adapted to the conditions of Islamic banking.