Experimental study of Descriptive and Inferential Analytics Approaches for Real Estate Buyers using ‘R’ Tools


Sameer Jain

Assistant Professor

National Institute of Construction Management and Research, Pune, India

Abstract – In the building industry, analytics has changed the fundamental pattern of data processing and forecasting. Delights, the lobbying firm of a newly established real estate buyer, is aiming to enter the Melbourne property market. To generate insights into various aspects of this booming market, senior management is keen to capitalize on large volumes of historical real estate data. Insights can vary by location, seasonality, price patterns, area features, land features, property features and numerous other aspects. The company has obtained a large dataset of nearly 20,000 real estate transactions in Melbourne from 2017. Huge quantities of structured and unstructured data are generated in the industry, and this knowledge can help a company make a game-changing decision. In this article, descriptive and inferential analytics approaches are used to extract insights into the real estate industry.

Keywords: Analytics; R; Real Estate; Inferential; Multiple Linear Regressions;


As a scientist, you have to collect experimental data and analyze it as part of the scientific method. In such experiments the data are often complicated, and they are understood with the help of graphs and statistical models. Graphics and modeling are done with computer software, and computing is just one of the skills a scientist needs. R is software used for analytics; learning it takes a range of skills, and it has great potential for the statistical analysis of data. R has many important features, such as libraries of built-in functions that cover basic to advanced statistics. R libraries are installed from the CRAN mirror sites, where R itself is also available.

R has many data mining algorithms, as well as data visualization and data manipulation mechanisms. It covers statistical applications from simple to complex, so that you can complete all your statistical training using R. It also treats topics in a consistent way: the programming you learn for, say, linear models is done the same way for non-linear models, which reduces your effort. This consistency is convenient and also aids the understanding of statistical modeling.

Modern statistics has simplified many problems through the use of graphics and computer-intensive functions and tools.


This section reviews the relevant literature on using analytics and big data for decision making on the basis of large and complex data sets, including the tools for collecting and analyzing structured, semi-structured and unstructured data. R is a programming language and software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. R is widely used among statisticians and data miners for developing statistical software and for data analysis [Wikipedia].

R is an implementation of the S programming language, created by two University of Auckland statisticians, Ross Ihaka and Robert Gentleman.

There are two explanations for the name R: (i) it is based on the first names of the two R authors, and (ii) it is a play on the name of S (the S programming language).

The KDnuggets [11] online newsletter conducted polls in 2012, 2013 and 2014 asking the question: What statistics/programming languages did you use for analytics/data mining/data science work? The results show that SQL, SAS, Python and R hold a commanding lead, and 91% of all respondents used one of them.

Big data analytics is a term for analyzing and managing real-time structured, semi-structured and unstructured data in order to support decision making and optimize processes [7,8].


Using descriptive research, this experiment applies statistical methodologies in the 'R' tool to analyze the factors that affect the real estate market in Melbourne and to forecast the value of homes there. This paper helps to provide the insights into the real estate market that the real estate firm needs for its move into Melbourne.

Step 1: Obtain the data

Set the working directory to your respective folder using the setwd() command in the R tool.

Load the dataset.

Melbourne_realestate = read.csv("Melbournerealestate.csv")

Step 2: Dataset Summary
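The summary step is not shown in the text. A minimal sketch of what it could look like follows. The data frame here is a small synthetic stand-in, and the column names Rooms, Distance, Type and Price are assumptions rather than the actual columns of Melbournerealestate.csv.

```r
# Hypothetical stand-in for Melbournerealestate.csv; the column names
# (Rooms, Distance, Type, Price) are assumptions for illustration only.
set.seed(1)
n <- 200
Melbourne_realestate <- data.frame(
  Rooms    = sample(1:5, n, replace = TRUE),
  Distance = round(runif(n, 1, 40), 1),
  Type     = factor(sample(c("h", "t", "u"), n, replace = TRUE)),
  Price    = round(exp(rnorm(n, mean = 13.5, sd = 0.5)))
)

print(dim(Melbourne_realestate))             # number of rows and columns
str(Melbourne_realestate)                    # storage type of each column
print(summary(Melbourne_realestate))         # quartiles and mean per numeric column, counts per factor level
print(colSums(is.na(Melbourne_realestate)))  # number of missing values per column
```

With the real file, the same four calls would be applied to the data frame returned by read.csv().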

Step 3: Evaluate data distribution
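One possible way to evaluate the distribution of a variable such as price is to compare histograms and a simple skewness measure before and after a log transformation. The sketch below uses synthetic, right-skewed prices as an assumption; the helper skewness() is defined here and is not part of base R.

```r
# Hypothetical right-skewed prices standing in for the price column.
set.seed(2)
price <- exp(rnorm(500, mean = 13.5, sd = 0.5))

# Moment-based sample skewness: close to 0 for a symmetric distribution.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_raw <- skewness(price)
skew_log <- skewness(log(price))
print(c(raw = skew_raw, log = skew_log))

# Side-by-side histograms of the raw and log-transformed prices.
op <- par(mfrow = c(1, 2))
hist(price, breaks = 30, main = "Price", xlab = "Price")
hist(log(price), breaks = 30, main = "log(Price)", xlab = "log(Price)")
par(op)
```

A markedly lower skewness after the transformation is one common reason to model log(price) rather than price.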

Step 4: Reduce Data Dimensions

Explore the correlation between the attributes in the dataset.
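A correlation matrix of the numeric attributes could be computed roughly as follows. This is a sketch on synthetic data: the column names and the effect sizes used to generate Price are assumptions chosen only so that the example runs.

```r
# Hypothetical data; column names and effect sizes are assumptions.
set.seed(3)
n <- 300
Rooms    <- sample(1:5, n, replace = TRUE)
Distance <- runif(n, 1, 40)
Landsize <- runif(n, 100, 1000)
Price    <- exp(13 + 0.25 * Rooms - 0.03 * Distance + rnorm(n, sd = 0.3))
homes    <- data.frame(Rooms, Distance, Landsize, Price)

# Keep only numeric columns, then compute pairwise Pearson correlations.
numeric_cols <- homes[sapply(homes, is.numeric)]
corr <- cor(numeric_cols, use = "complete.obs")
print(round(corr, 2))

# Absolute correlation of each attribute with Price, strongest first.
price_corr <- corr[, "Price"]
print(sort(abs(price_corr[names(price_corr) != "Price"]), decreasing = TRUE))
```

Attributes with a correlation near zero are candidates for removal when reducing the dimensions of the data.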

Plots to analyze the relationship:

As the plots show, the number of rooms is positively correlated with the price, while the distance of the property from the town is negatively correlated with it.

Figure 1: Distance and Price graph
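A plot of this kind could be drawn with base R graphics as sketched below. The data are synthetic, and Distance and Price are assumed column names; the trend line is an ordinary least-squares fit added only to make the direction of the relationship visible.

```r
# Hypothetical data; Distance and Price are assumed column names.
set.seed(4)
n <- 300
Distance <- runif(n, 1, 40)
Price    <- exp(13.8 - 0.03 * Distance + rnorm(n, sd = 0.3))

# Scatter plot of price against distance with a least-squares trend line.
plot(Distance, Price,
     xlab = "Distance from city (km)", ylab = "Price",
     main = "Distance and Price")
trend <- lm(Price ~ Distance)
abline(trend, col = "red", lwd = 2)

# A negative slope means that price falls as distance increases.
print(coef(trend)["Distance"])
```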

Multiple linear regression:

Multiple linear regression (MLR) models are used to model the relationship between the independent and dependent variables. The dependent variable is price, which can be affected by the remaining variables. Multiple linear regression helps to forecast continuous data, so the property's price can be predicted using the fitted multiple regression model.

Data Transformation:

We first read the data from the Melbourne real estate data file and generated a subset of numerical values to check for correlation. After that, the subset values are converted into factor variables and the multiple linear regression model is constructed. The price variable is converted to its logarithm so that its distribution better fits the other data.
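The transformation and model fit described above might look like the following sketch. The data are synthetic, and the column names, factor levels and coefficients are assumptions; the actual model formula used in the study is not given in the text.

```r
# Hypothetical data; column names, levels and coefficients are assumptions.
set.seed(5)
n <- 400
homes <- data.frame(
  Rooms     = sample(1:5, n, replace = TRUE),
  Distance  = runif(n, 1, 40),
  YearBuilt = sample(1900:2017, n, replace = TRUE),
  Type      = sample(c("h", "t", "u"), n, replace = TRUE),
  stringsAsFactors = FALSE
)
homes$Price <- exp(13 + 0.25 * homes$Rooms - 0.03 * homes$Distance +
                   0.002 * (homes$YearBuilt - 1900) + rnorm(n, sd = 0.3))

homes$Type     <- factor(homes$Type)  # categorical attribute as a factor
homes$LogPrice <- log(homes$Price)    # log-transform the skewed price

# Multiple linear regression of log price on the remaining attributes.
fit <- lm(LogPrice ~ Rooms + Distance + YearBuilt + Type, data = homes)
print(coef(fit))
```

Note that lm() names the coefficient of a factor level by joining the variable name and the level, so a factor Type with a level "t" produces a term called Typet.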

Interpretation of output

Significance of variables for predictor:

Because its p-value is greater than .05, the Typet variable is not significant; all the other variables have p-values below .05 and are therefore significant.

The adjusted R-squared value shows that this regression model explains 61 percent of the variability in price, with the rest due to randomness. The overall model is therefore significant, since p<.05 with an F-statistic of 760.88.
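The quantities discussed above (per-variable p-values, adjusted R-squared and the overall F-test) can all be read from the summary of a fitted model. The sketch below shows one way to extract them; the data are synthetic and the numbers it prints are unrelated to the values reported in this study.

```r
# Hypothetical data; names and coefficients are assumptions, and Type is
# generated with no effect on price purely for illustration.
set.seed(6)
n <- 400
homes <- data.frame(
  Rooms    = sample(1:5, n, replace = TRUE),
  Distance = runif(n, 1, 40),
  Type     = factor(sample(c("h", "t", "u"), n, replace = TRUE))
)
homes$LogPrice <- 13 + 0.25 * homes$Rooms - 0.03 * homes$Distance +
  rnorm(n, sd = 0.3)

fit <- lm(LogPrice ~ Rooms + Distance + Type, data = homes)
s   <- summary(fit)

# Per-coefficient p-values: terms with p > .05 are not significant.
p_values <- s$coefficients[, "Pr(>|t|)"]
print(round(p_values, 4))

# Share of variance explained, penalised for the number of predictors.
adj_r2 <- s$adj.r.squared

# Overall F-test: p-value from the F statistic and its degrees of freedom.
f <- s$fstatistic
overall_p <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
print(c(adj_r2 = adj_r2, F = unname(f["value"]), p = unname(overall_p)))
```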

Hypothesis testing:

Scenario 1:

To verify whether the property price depends more on distance than on the number of rooms.

H0: The price is the same regardless of the distance of the property from the city.

H1: The price is not the same; it varies with the distance of the property from the city.


The p-value is less than .05, so the null hypothesis is rejected. This suggests that the price is not identical and differs with distance, implying that a 3BHK property near the city is priced higher than one located about 15-20 km from the city.

Scenario 2:

To check whether the cost of the property is affected by car space, year built, land size and other variables.

H0: The price of the property is not influenced by car space, year built, land size and the other variables.

H1: The price of the property is influenced by these variables.

The p-value is lower than .05, so the null hypothesis is rejected: the price of the property is affected by these variables.

Scenario 3:

To measure whether the year of completion of the property affects the price of the property.

H0: The property's price and year of building are independent.

H1: The year of construction determines the price of the property.

Data collection:

Here we apply Pearson's chi-squared test to the grouped price values. Property prices are divided into five groups, ranging from cheap to extremely expensive. The year constructed is clustered by year of completion, from old buildings to new.



The test output gives p<0.05, which indicates that the price of the property is affected by the year constructed.
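A test of this kind can be sketched as follows. Because the hypotheses are stated in terms of independence, the sketch uses Pearson's chi-squared test of independence on a contingency table. The data are synthetic; the column names, the quantile-based price groups, the year cut points and the intermediate group labels are all assumptions, since the text does not give the actual boundaries.

```r
# Hypothetical data; column names, cut points and group labels are assumptions.
set.seed(7)
n <- 500
YearBuilt <- sample(1900:2017, n, replace = TRUE)
Price     <- exp(13 + 0.01 * (YearBuilt - 1900) + rnorm(n, sd = 0.3))

# Five price groups of equal size, from cheap to extremely expensive.
price_group <- cut(Price,
                   breaks = quantile(Price, probs = seq(0, 1, by = 0.2)),
                   labels = c("cheap", "moderate", "expensive",
                              "very expensive", "extremely expensive"),
                   include.lowest = TRUE)

# Year built clustered from old to new buildings.
year_group <- cut(YearBuilt,
                  breaks = c(1899, 1950, 1980, 2000, 2017),
                  labels = c("old", "mid", "recent", "new"))

# Pearson's chi-squared test of independence on the contingency table.
tab  <- table(price_group, year_group)
test <- chisq.test(tab)
print(tab)
print(test)
```

A p-value below .05 leads to rejecting the hypothesis that price group and year group are independent.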

Scenario 4:

To determine if the number of spaces for cars depends on the distance from the city.

H0: A space for a car is independent of distance from the city.

H1: Car spaces are affected by distance.


The test output gives p<0.05, so the number of car spaces depends on the distance from the city. The underlying hypothesis is that properties in the suburbs can have more car space than those in the city.
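The direction of such a dependence can be examined through the standardized residuals of the chi-squared test, as in the sketch below. The data are synthetic: the column names, the distance bands and the Poisson model used to generate car spaces are assumptions made only for illustration.

```r
# Hypothetical data; names, distance bands and the Poisson model are assumptions.
set.seed(8)
n <- 600
Distance <- runif(n, 1, 40)
Car      <- rpois(n, lambda = 0.8 + 0.05 * Distance)

# Group distance into bands and cap the car-space count at "3+".
dist_group <- cut(Distance, breaks = c(0, 10, 20, Inf),
                  labels = c("near", "mid", "far"))
car_group  <- factor(ifelse(Car >= 3, "3+", as.character(Car)),
                     levels = c("0", "1", "2", "3+"))

tab  <- table(dist_group, car_group)
test <- chisq.test(tab)
print(test$p.value)

# Standardized residuals: positive cells hold more homes than independence predicts.
print(round(test$stdres, 1))
```

In this synthetic example the "far" band shows a positive residual for three or more car spaces, which is the pattern the suburb hypothesis would predict.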

Scenario 5:

To determine if the number of car spaces is specified by the building size.

H0: Building size and space for cars are independent from each other.

H1: The amount of car space is influenced by the building's size.

P<0.05, which means the null hypothesis is rejected: the number of car spaces depends on the size of the building.


In this study, R was used because it is open-source software that can be used in place of paid packages such as SPSS, MS Excel, Minitab, Stata, etc. Using descriptive, inferential and multiple linear regression techniques, insights for the real estate sector are extracted from the above research. It shows that the price of property in Melbourne is affected by rooms, land size, year built and distance from the city. The other variables, such as car space and type of house, have weak correlations with price, and the number of car spaces depends on the size of the building. There remains considerable scope for further prediction and measurement using data visualization and classification techniques.


  1. Abbott, D. (2014). Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst, Wiley.

  2. Boehmke, Bradley C. and Jackson, Ross A. (2016). Unpacking the true cost of free statistical software. OR/MS Today, vol. 43, 26-27.

  3. Bradley Boehmke (2016). Data Wrangling with R, Springer.

  4. Deepali Arora, Piyush Malik (2015). Analytics: Key To Go From Hoarding Big Data To Deriving Value, IEEE.

  5. Dhanya Jothimani, Ravi Shankar and Surendra S. Yadav (2014). A Big Data Analytical Framework for Portfolio Optimization.

  6. H. J. Watson (2013). All about Analytics, International Journal of Business Intelligence Research, Vol. 4, No. 2, pp. 13-28.

  7. H. Chen, R. H. L. Chiang, and V. C. Storey (2012). Business Intelligence and Analytics: From Big Data to Big Impact, MIS Q., vol. 36, no. 4, pp. 1165-1188.

  8. Ihaka, Ross, and Robert Gentleman (1996). R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics 5, 299-314.

  9. J. L. Bale, D. Bele, R. Pirnat, V. A. Loncaric (2015). Business Intelligence In e-Learning, Florence, Italy, pp. 369-375.

  10. Krause, A. (2016). Reproducible research in real estate: A review and an example. Journal of Real Estate Practice and Education, 19(1), 69-85.

  11. Krause, A. & Lipscomb, C. A. (2016). The data preparation process in real estate: Guidance and review. Journal of Real Estate Practice and Education, 19(1), 15-42.

  12. Morandat, Floréal, Brandon Hill, Leo Osvald, and Jan Vitek (2012). Evaluating the design of the R language. In European Conference on Object-Oriented Programming, pp. 104-131. Springer Berlin Heidelberg.

  13. Nelson, M. L. (2009). Data-driven science: A new paradigm? Educause Review, 44(4), 6-12.

  14. P. Russom (2011). Big Data Analytics, TDWI Best Practices Report. Seattle: The Data warehousing, http://tdwi.org/research/2011/09/bestpractices-report- 4-big-data-analytices.aspx.

  15. Pollack, R. D., Klimberg, R. K., and Boklage, S. H. (2015). The true cost of free statistical software. OR/MS Today, vol. 42, 34-35.

  16. The R Project (2018) [Internet], R Project. Available from http://www.r-project.org [Accessed April 2019]

  17. S. Miller, S. Lucas, L. Irakliotis, M. Ruppa, T.Carlson and B.Perlowitz. (2012), Demystifying Big Data: A practical Guide to Transforming the Business of Government, Washington: Tech America Foundation.

  18. Taylor, J, Decision management systems: A practical guide to using business rules and predictive analytics. New York, NY: IBM Press.

  19. Yongho Ko And Seungwoo Han (2015). Big Data Analysis Based Practical Applications in Construction, Int'l Conf. on Advances in Big Data Analytics, pp.121-122.

  Internet Source / Web Source

  [a] www.r-project.org, accessed on 30 November 2020.

  [b] www.python.org/doc/essays/blurb/, accessed on 16 November 2020.

  [c] www.scala-lang.org/, accessed on 15 October 2020.

  [d] https://spark.apache.org/, accessed on 22 October 2020.

  [e] https://cran.r-project.org/web/packages/sparklyr/index.html, accessed on 25 December
