Strategic Insights into Vehicles Fuel Consumption Patterns: Innovative Approaches for Predictive Modeling and Efficiency Forecasting

Kajal Sheth; Dhvanil Patel; Gautam Swami

doi:10.17577/IJERTV13IS060063

Volume 13, Issue 06 (June 2024)


Open Access
Paper ID : IJERTV13IS060063
Published (First Online): 22-06-2024
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Kajal Sheth, New York Tech; Dhvanil Patel, Texas A&M University; Gautam Swami, Tulane University

Abstract: This study explores significant trends in fuel efficiency and carbon emissions within the passenger vehicle sector, utilizing extensive data from the U.S. Department of Energy and Environmental Protection Agency's fueleconomy.gov. Analyzing vehicle data from 1984 to the present, we identified key patterns in miles-per-gallon performance, tailpipe CO2 emissions, and economic impacts of vehicle efficiency. Our research highlights advancements in vehicle technology and shifts in consumer choices, underscoring their implications for environmental policy and sustainable transportation strategies. Employing advanced statistical analyses and machine learning in R and visualizations in Tableau, this study provides insights that support informed policy-making and strategic decisions in the transportation sector.

Keywords: Fuel efficiency; Carbon emissions; Transportation sector; Environmental policy; Predictive analytics

INTRODUCTION

The transportation sector is a significant contributor to global carbon emissions, accounting for nearly a quarter of direct CO2 emissions from fuel combustion [1]. The urgency to reduce these emissions has intensified under the pressures of climate change and environmental degradation. Advances in vehicle technology have shown promise in improving fuel efficiency and reducing emissions, yet the pace of improvement and adoption varies widely across vehicle types and regions. This study leverages a comprehensive dataset from the U.S. Department of Energy's official fuel economy information portal, fueleconomy.gov [2], to examine the evolution of fuel efficiency and carbon emissions within the passenger vehicle sector.

Despite abundant data on vehicle performance, there remains a gap in understanding the long-term trends and their implications for policy and consumer behavior. Most existing studies focus on short-term impacts or specific vehicle types, lacking a holistic view of the transportation sector's progress toward sustainability goals. This research aims to fill this gap by systematically analyzing fuel consumption patterns and emission trends over several decades, highlighting the technological and regulatory shifts that have influenced these trends. Specifically, the study seeks to identify key trends in fuel efficiency improvements, analyze the variance in emissions among different vehicle classes, and assess the economic implications of evolving fuel economy standards.

Using data curated and maintained by the U.S. Department of Energy's (DOE) Office of Energy Efficiency and Renewable Energy and supplemented by the U.S. Environmental Protection Agency (EPA), this analysis spans a broad spectrum of vehicles from 1984 to the present. The dataset includes detailed metrics such as miles-per-gallon in various driving contexts, tailpipe CO2 emissions, and potential cost savings across diverse vehicle classes, including sedans, SUVs, and trucks [2]. This rigorous dataset, derived from standardized laboratory tests, provides a reliable foundation for comparing vehicle performance equitably. By exploring these comprehensive data, the study aims to provide strategic insights that can inform both policy-making and consumer decisions, ultimately supporting the DOE and EPA's mandate under the Energy Policy Act of 1992 to deliver precise and actionable fuel economy data to the public [1].
METHODS
1. Exploratory Data Analysis
  
  In our exploratory data analysis (EDA), we thoroughly examined the dataset, which consists of 43,156 entries across 83 distinct attributes. This extensive dataset provided a robust framework for our analysis. Initially, we focused on generating summary statistics for key variables that are critical to our research questions. These statistics offered initial insights into the data's central tendencies and variability, setting the stage for a deeper, more focused analysis. The subsequent section presents these summary statistics, detailing the measures of central tendency, dispersion, and the distribution of these selected variables.
  
  Fig. 1. Summary Statistics for EDA
  
  To better understand the distribution of our categorical variables, namely Make, FuelType1, FuelType2, Supercharger, Turbocharger, and Number of Cylinders, we conducted a frequency analysis. Frequency tables were generated to depict the occurrence of each category within these variables, providing a clear visual representation of the data distribution.
  
  TABLE I. Frequency Table for FuelType1

  FuelType1	Freq
  Regular Gasoline	28,722
  Premium Gasoline	12,791
  Diesel	1,196
  Electricity	257
  Midgrade Gasoline	130
  Natural Gas	60

  TABLE II. Frequency Table for FuelType2

  FuelType2	Freq
  E85	1,477
  Electricity	206
  Natural Gas	20
  Propane	8

  TABLE III. Frequency Table for Number of Cylinders

  Number of Cylinders	Freq
  4	16,809
  6	14,858
  8	9,216
  5	774
  12	668
  3	317
  10	179
  2	61
  16	14
  
  The frequency tables provide a straightforward means to examine the distribution of categorical variables within the dataset. For instance, Table III shows that 4- and 6-cylinder engines are the most common configurations, whereas only 14 vehicles have 16 cylinders, highlighting the rarity of such configurations.
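As a rough illustration of how such a frequency table can be produced, the sketch below counts categories using Python's standard library rather than the authors' R workflow. The helper name `frequency_table` and the sample records are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count) pairs sorted by descending count."""
    return Counter(values).most_common()


# Hypothetical sample of a FuelType1 column, for illustration only.
sample = [
    "Regular Gasoline", "Premium Gasoline", "Regular Gasoline",
    "Diesel", "Regular Gasoline", "Premium Gasoline",
]

for category, count in frequency_table(sample):
    print(f"{category}\t{count}")
```

Sorting by descending count mirrors the ordering used in Tables I to III.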
  
  Following the analysis of categorical data, we proceed to examine continuous variables through visual means. Box plots have been utilized to depict the distribution of the Miles Per Gallon (MPG) attributes, tailpipe emissions, and fuel consumption rates. These visualizations are essential for understanding the spread and central tendencies of these variables, as well as for identifying outliers that may influence subsequent analyses.
  
  Fig. 2. Spread of MPG Attributes
  
  Fig. 3. Spread of Tailpipe Emissions
  
  Fig. 4. Average Fuel Economy by Fuel Type
  
  From the spread of the MPG attributes, we notice that the mean values of City MPG, Highway MPG, and Combined MPG are very close to each other, at around 20 miles per gallon. The spread of tailpipe emissions shows that the mean emissions in the dataset are around 400 grams per mile, and the average fuel consumption is around 15 barrels.
  
  The other variables we use in our dataset are the Model and the Make of the vehicles. The chart below shows the top manufacturers by the number of models. The top three manufacturers in the dataset are Mercedes-Benz, BMW, and Chevrolet. Cadillac, Suzuki, and Subaru have the fewest models in the dataset.
  
  Fig. 5. Top Manufacturers by Count of Model
  
  Two other key parameters for the analysis are the average MPG and the primary fuel type. Therefore, we plot the yearly trend of average MPG by fuel type. Intuitively, this trend is indicative of the improvement in efficiency in the transportation sector and vehicle technology in general. The results are shown below.
  
  Fig. 6. Average Fuel Economy by Fuel Type
  
  As can be seen in the figure above, gasoline and electric vehicles show the greatest improvement in fuel economy, while diesel and CNG vehicles show fluctuations. Although the scale on the y-axis is relative for each graph, in general, electric vehicles have the highest fuel economy, followed by CNG vehicles and diesel vehicles, with gasoline vehicles showing the lowest fuel economy. This is consistent with current trends in the transportation sector.
2. Data Manipulation and Cleaning
The raw dataset contained 83 variables, of which only 21 were relevant for the purpose of this analysis. Therefore, these were selected and stored in a new data frame. The volume of the vehicle was a parameter of interest for analysis, so six attributes for luggage and passenger volumes were summed to obtain the total volume of each vehicle. The final dataset used for all the analysis therefore contains 43,023 records and 16 variables. It was decided to keep all the records, not just the distinct ones, since some manufacturers release models with the same name for multiple years.

Next, the categorical variables were converted to factors for the analysis. These include Make, Model, FuelType1, FuelType2, Supercharger, and Turbocharger.

Next, the empty cells in various attributes were replaced with 0 or blank values to ensure consistency in the data. The summary statistics after cleaning the data are shown below:

Fig. 7. Summary Statistics after Data Manipulation

Next, the following 2 categorical variables were grouped together for narrowing down the scope of the analysis. This was done using regular expressions. The change in FuelType1 is described below:

TABLE IV. Cleaning of FuelType1 Variable

Unclean	Clean
Regular Gasoline	Gasoline
Premium Gasoline	Gasoline
Diesel	Diesel
Natural Gas	Natural Gas
Electricity	Electricity
NA	NA
Midgrade Gasoline	NA

The vehicle Class variable was also grouped from 34 unique values down to 5. These new levels are Other, Cars, Vans, Pickup Trucks, and SUVs.
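The FuelType1 grouping can be sketched with regular expressions as follows. This is a Python approximation, not the authors' R code. It assumes Table IV groups Regular and Premium Gasoline under Gasoline and maps Midgrade Gasoline and missing values to NA. The function name `clean_fuel_type` and the patterns are hypothetical.

```python
import re

# Patterns follow one reading of Table IV (an assumption): Regular and Premium
# Gasoline collapse to "Gasoline"; Diesel, Natural Gas and Electricity keep
# their names; anything else, including Midgrade Gasoline, becomes None (NA).
_FUEL_PATTERNS = [
    (re.compile(r"^(Regular|Premium) Gasoline$"), "Gasoline"),
    (re.compile(r"^Diesel$"), "Diesel"),
    (re.compile(r"^Natural Gas$"), "Natural Gas"),
    (re.compile(r"^Electricity$"), "Electricity"),
]


def clean_fuel_type(raw):
    """Map a raw FuelType1 value to its grouped label, or None for NA."""
    if raw is None:
        return None
    for pattern, label in _FUEL_PATTERNS:
        if pattern.match(raw.strip()):
            return label
    return None


for value in ["Regular Gasoline", "Premium Gasoline", "Midgrade Gasoline", None]:
    print(value, "->", clean_fuel_type(value))
```

The 34-to-5 grouping of the Class variable could follow the same pattern, but the paper does not list the original class names, so it is not reproduced here.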

STATISTICAL ANALYSIS

Efficiency of superchargers vs Turbochargers

This part aims to investigate the impact superchargers and turbochargers have on IC engine vehicles. Both these components are used to increase the combustion efficiency of the engine and increase the power output. While the supercharger draws power directly from the crankshaft, the turbocharger uses the exhaust gases created by combustion to power the compressor. This is an inferential type question which compares 2 independent attributes.

For the purpose of the analysis, the vehicles with either a turbocharger or a supercharger are filtered. The summary statistics are shown below:

TABLE V. Summary Statistics for Superchargers and Turbochargers

TurboCharger	SuperCharger	N	Mean	Median	SD
True	False	7822	22.23907	22	4.696939
True	True	74	23.93243	25	3.897070
False	True	831	18.21540	18	3.193450

Fig. 8. Comparison of Superchargers and Turbochargers

It can be observed that there is some intersection between the 2 attributes. The intersecting records are eliminated to ensure that the two groups are independent. The MPG values can be assumed to be normally distributed, and hence we can use a t-test to test the hypothesis. For this test, the hypotheses are:

H0: $\mu_{\text{Turbocharger}} = \mu_{\text{Supercharger}}$ (i.e. there is no difference in average fuel economies)
Ha: $\mu_{\text{Turbocharger}} \neq \mu_{\text{Supercharger}}$ (i.e. there is a difference in average fuel economies)

The results of the t-test are shown below:

Fig. 9. T-Test on Superchargers and Turbochargers

Due to the extremely small p-value, the null hypothesis is rejected. Therefore, there is significant statistical evidence that fuel economy is different for vehicles with superchargers and turbochargers. Out of the two, vehicles with turbocharger have significantly better fuel economy.
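To make the comparison concrete, the following sketch recomputes a two-sample t statistic from the summary statistics in Table V. It assumes Welch's unequal-variance form, which is the default of R's `t.test`; the paper does not state which variant was used. The p-value uses a normal approximation, which is reasonable only because the degrees of freedom are large. The function names are hypothetical.

```python
from math import erfc, sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def normal_two_sided_p(t):
    """Two-sided p-value from the standard normal (large-sample approximation)."""
    return erfc(abs(t) / sqrt(2))


# Rows of Table V with the overlapping (True, True) records excluded:
# turbocharger only vs supercharger only.
t_stat, df = welch_t(22.23907, 4.696939, 7822, 18.21540, 3.193450, 831)
p_approx = normal_two_sided_p(t_stat)
print(f"t = {t_stat:.2f}, df = {df:.0f}, approximate p = {p_approx:.3g}")
```

A positive t statistic here means the turbocharger-only group has the higher mean MPG, in line with the conclusion above.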

Fuel economy comparison

This section aims to investigate the difference in fuel economies for different classes of vehicles such as cars, vans, pickup trucks, SUVs, and others. This analysis is performed on IC engine vehicles only to ensure comparison between similar entities. This question aims to check what effect the utility of the vehicle has on the MPG. This is an inferential type question which compares the means of a common attribute between different groups.

Vehicles with FuelType1 as Gasoline, Diesel, or CNG and FuelType2 as E85, Propane, or CNG are retained. This ensures that no hybrid EV or battery EV is included in the analysis. The summary statistics are shown below:

TABLE VI. Summary Statistics for FuelType1

Class	N	Mean	Median	SD
Cars	22809	22.42799	21	5.216095
Other	4632	18.25712	18	4.951738
Pickup Trucks	6136	16.69801	16	3.099634
SUV	6625	19.55894	19	4.14002
Vans	2306	15.55421	15	2.739264

Fig. 10. Comparison of MPG by Vehicle Class

From the figure above, the average MPG values do not appear markedly different across classes. However, it is important to note that the data contains outliers, which may skew the observation. The data is independent and can be assumed to be normally distributed; therefore, we use the ANOVA method to statistically model the data and test the hypothesis. For this test, the hypotheses are:

H0: $\mu_{\text{Vans}} = \mu_{\text{SUV}} = \mu_{\text{Trucks}} = \mu_{\text{Cars}} = \mu_{\text{Others}}$ (i.e. all means are the same)
Ha: $\mu_{\text{Vans}} \neq \mu_{\text{SUV}} \neq \mu_{\text{Trucks}} \neq \mu_{\text{Cars}} \neq \mu_{\text{Others}}$ (i.e. at least some of the means are different)

To perform the ANOVA test, a linear regression is performed, and an ANOVA table is created using the anova function. The result is shown below:

Fig. 11. ANOVA table

Due to the extremely small p-value returned, the null hypothesis is rejected. Therefore, there is significant statistical evidence that fuel economy is different for different classes of IC engine vehicles. This difference also sheds light on the relation between MPG and the load-carrying capacity of the vehicle. Cars, which are mainly passenger vehicles, show the highest MPG, whereas trucks and vans, which are mainly cargo vehicles, show the lowest MPG values.
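The one-way ANOVA F statistic can be reconstructed from group sizes, means, and standard deviations alone. The sketch below does this in Python for the rows of Table VI. It is an illustration of the technique, not a reproduction of the authors' R output in Fig. 11, and the function name is hypothetical.

```python
def one_way_anova_from_summary(groups):
    """One-way ANOVA from per-group (n, mean, sd) tuples.

    Returns (F, df_between, df_within).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-group sum of squares: recovered from each sample variance.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# (N, Mean, SD) per vehicle class, copied from Table VI.
table_vi = [
    (22809, 22.42799, 5.216095),  # Cars
    (4632, 18.25712, 4.951738),   # Other
    (6136, 16.69801, 3.099634),   # Pickup Trucks
    (6625, 19.55894, 4.14002),    # SUV
    (2306, 15.55421, 2.739264),   # Vans
]

f_stat, df_b, df_w = one_way_anova_from_summary(table_vi)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

With groups this large, even modest differences in means produce a very large F statistic, which is consistent with the extremely small p-value reported.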

Engine Displacement and Vehicle Volume Correlation

This part aims to find out if there is a correlation between the engine displacement and the volume of the vehicle. Before performing a correlation test on these variables, we choose all the vehicles which are not electric vehicles and consider a vehicle volume of more than 20 cubic feet but less than 230 cubic feet to eliminate outliers. The null values in the engine displacement attribute are also ignored.

The summary statistics for the attribute vehicle volume show a total of 22788 rows with a mean of 120.23 cubic feet, standard deviation of 36.36 and a median vehicle volume of 111 cubic feet. These summary statistics are visualized using a box plot as shown below

Fig. 12. Box Plot for Vehicle Volume

Since the two attributes are not assumed to be normally distributed, we use the Kendall rank correlation test. The results of the test are shown below.

Fig. 13. Kendall's Rank Correlation Test

Tau is the Kendall correlation coefficient. The p-value for the test is 0.1385 and the tau value is -0.0068. Since the p-value exceeds 0.05 and tau is very close to zero, there is no evidence of a correlation between the two variables. A scatter plot of the two confirms this, as shown below.

Fig. 14. Correlation between Engine Displacement and Vehicle Volume

The above scatter plot is made using the ggpubr library function ggscatter which displays the correlation coefficient and the significance level on the plot. A regression line is added to the chart shown in red and the points are grouped by the volume of the vehicle.
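For readers unfamiliar with the statistic, the sketch below computes Kendall's tau (the tau-b variant, which adjusts for ties) by direct pair counting in Python. It is a minimal quadratic-time illustration with hypothetical data, not the routine used by the authors, and it does not compute the p-value.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b via pair counting (O(n^2), fine for small samples)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from every count
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied_x = concordant + discordant + tied_y_only  # pairs not tied on x
    untied_y = concordant + discordant + tied_x_only  # pairs not tied on y
    if untied_x == 0 or untied_y == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / sqrt(untied_x * untied_y)


# Hypothetical engine displacements (litres) and vehicle volumes (cubic feet).
displacement = [1.6, 2.0, 2.0, 3.5, 5.0]
volume = [110, 95, 130, 120, 105]
print(round(kendall_tau_b(displacement, volume), 3))
```

A value near zero, as reported in the paper, means concordant and discordant pairs occur about equally often.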
Forecast of tailpipe CO2 emissions

As of 2018, the transportation sector in the US accounted for ~28% of the total greenhouse gas (GHG) emissions. While electric vehicles (EVs) are expected to be widely adopted in the near future, it is necessary to reduce the tailpipe emissions from internal combustion (IC) vehicles in the meantime. Thus, the aim of this question was to analyze the trend in the tailpipe CO2 emissions from 1984 until 2020 and forecast the emissions until 2030.

To perform this analysis, the autoregressive integrated moving average (ARIMA) forecasting method was used. ARIMA models are widely used in time series forecasting. In this method, lagged values of the time series as well as lagged forecast errors are used to predict future values. Since EVs do not have CO2 tailpipe emissions, they were filtered out of the data frame using the filter function. The five makes with the highest count of vehicles in the raw data set were considered: Chevrolet, Ford, Dodge, GMC and Toyota. An average of tailpipe emissions was taken for each year (1984-2020) using the aggregate function. The yearly averages were then converted into a time series with a frequency of 1 year using the base R ts function. The auto.arima function from the forecast package was then applied to the time series, as it selects the best ARIMA model automatically rather than requiring a custom ARIMA specification. Finally, the forecast function was used to get the predicted tailpipe emissions for 10 years and this information was plotted.
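The data preparation and forecasting steps can be sketched in Python as follows. This is not the authors' pipeline: the paper does not report which ARIMA orders were selected, so the sketch assumes the simplest trending case, a random walk with drift, i.e. ARIMA(0,1,0) with drift, whose point forecast extends the last observation by the mean yearly change. The function names and sample records are hypothetical.

```python
from collections import defaultdict


def yearly_mean(records):
    """Average CO2 (g/mile) per model year; returns [(year, mean)] sorted by year."""
    buckets = defaultdict(list)
    for year, co2 in records:
        buckets[year].append(co2)
    return [(year, sum(vals) / len(vals)) for year, vals in sorted(buckets.items())]


def drift_forecast(series, horizon):
    """Point forecasts from a random walk with drift, ARIMA(0,1,0) with drift.

    The drift is the mean of the first differences, which equals
    (last - first) / (n - 1).
    """
    if len(series) < 2:
        raise ValueError("need at least two observations to estimate drift")
    drift = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + h * drift for h in range(1, horizon + 1)]


# Hypothetical (model year, tailpipe CO2 in g/mile) records for one make.
records = [
    (2017, 500.0), (2017, 520.0),
    (2018, 495.0), (2018, 505.0),
    (2019, 480.0), (2019, 500.0),
]

averages = yearly_mean(records)
series = [mean for _, mean in averages]
print(averages)
print(drift_forecast(series, horizon=2))
```

A model without drift would instead hold the last value constant, which is one way a flat forecast can arise; whether that explains any of the flat forecasts reported here cannot be determined from the paper.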

Fig. 15. Tailpipe CO2 Emissions Forecast for Top 5 Manufacturers

As seen from the plots above, Toyota and Ford have a decreasing trend in tailpipe CO2 emissions, while Chevrolet, Dodge, and GMC have a forecast that stays constant until 2030. Out of all the makes considered, Toyota has the lowest emissions, with the forecast expected to reach ~230 grams/mile by 2030. This is followed by Ford, which is expected to reach ~350 grams/mile. Chevrolet, Dodge, and GMC are forecasted to have CO2 tailpipe emissions in the range of 450-500 grams/mile by 2030. It should be noted that the data set only considers new models released every year. Thus, it can be concluded that the new cars released by Toyota are cleaner on average compared to other makes considered in this analysis.
City MPG and Highway MPG values for Gasoline vs Electric Vehicles

For this question, we wanted to find out if there is a difference between the city and highway MPG values for gasoline and electric vehicles in the dataset. To analyze this, Toyota vehicles were chosen to represent the gasoline vehicles and Tesla vehicles to represent the electric vehicles. For each vehicle, the city and highway MPG values are combined into a single ratio (City MPG/Highway MPG) so that both factors are accounted for. The statistical analysis is performed on a random sample of 100 gasoline vehicles and the first 100 electric vehicles. The summary statistics are shown below.

TABLE VII. Summary Statistics for Gasoline and EV MPG values

Group	Count	Mean	SD
EV_ratio	100	1.0014044	0.07194879
Gas_ratio	100	0.7903093	0.07833846

From the summary statistics shown in the table above, we notice the mean MPG ratio for electric vehicles is higher than the mean MPG ratio for gasoline vehicles. This is shown in the density chart below

Fig. 16. Density plot for Gasoline and EVs MPG Ratios

It can be noticed from the plot above that the mean MPG ratio for electric vehicles is higher than that for gasoline vehicles.

To perform the analysis for this question, we use the F-test and the T-test to check if the two groups have the same variances and if there is a significant difference between gasoline and electric vehicle MPG ratios.

The F-test is done to check if the two populations have the same variances. The hypotheses for this test are shown below.
- H0: $\sigma^2$ (Ratio of City MPG/Highway MPG for Gasoline) = $\sigma^2$ (Ratio of City MPG/Highway MPG for Electric Vehicles) [The variances of the two groups are the same]
- Ha: $\sigma^2$ (Ratio of City MPG/Highway MPG for Gasoline) $\neq$ $\sigma^2$ (Ratio of City MPG/Highway MPG for Electric Vehicles) [The variances of the two groups are different]

The T-test is done to check if the two populations have the same means. The hypotheses for this test are shown below.
- H0: $\mu$ (Ratio of City MPG/Highway MPG for Gasoline) = $\mu$ (Ratio of City MPG/Highway MPG for Electric Vehicles) [The means of the two groups are the same]
- Ha: $\mu$ (Ratio of City MPG/Highway MPG for Gasoline) $\neq$ $\mu$ (Ratio of City MPG/Highway MPG for Electric Vehicles) [The means of the two groups are different]

The results of the F-test and T-test performed are shown below.
  
  Fig. 17. F-Test Results for Gasoline vs EV MPG Ratios
  
  Fig. 18. T-test Result for Gasoline vs EV MPG Ratios
  
  The p-value of the F-test is 0.3987, which is greater than the significance level alpha = 0.05. We therefore fail to reject the null hypothesis: there is no significant difference between the variances of the gasoline and electric vehicle ratios. Therefore, we can use the T-test, which assumes equality of variances. The T-test result gives a p-value less than 0.05, from which we can conclude that the mean ratios of the two groups are significantly different. Since the mean ratio for electric vehicles is close to 1, we can also infer that the gap between City MPG and Highway MPG is smaller for electric vehicles than for gasoline vehicles.
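The two test statistics can be recomputed from the summary statistics in Table VII. The Python sketch below forms the variance ratio used by the F-test and the pooled-variance (equal-variance) t statistic. It stops short of p-values because the standard library has no F or t distribution functions. The function names are hypothetical, and the results are not taken from Figs. 17 and 18.

```python
from math import sqrt


def variance_ratio(sd1, sd2):
    """F statistic for comparing two variances: s1^2 / s2^2."""
    return sd1 ** 2 / sd2 ** 2


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic under the equal-variance assumption."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    t = (mean1 - mean2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Values copied from Table VII (EV_ratio first, Gas_ratio second).
ev_mean, ev_sd, ev_n = 1.0014044, 0.07194879, 100
gas_mean, gas_sd, gas_n = 0.7903093, 0.07833846, 100

f_ratio = variance_ratio(ev_sd, gas_sd)
t_ratio, df_ratio = pooled_t(ev_mean, ev_sd, ev_n, gas_mean, gas_sd, gas_n)
print(f"F = {f_ratio:.3f}, t = {t_ratio:.2f} on {df_ratio} degrees of freedom")
```

An F statistic close to 1 supports the equal-variance assumption, while a large t statistic reflects the gap between the two mean ratios.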
The trend of new vehicle models each year by fuel type

While gasoline has been the most popular fuel when it comes to passenger vehicles, the need to reduce GHG emissions in the transportation sector has led auto makers to diversify their portfolios to include cars that use alternative fuels. The aim of this question was to study the trend of vehicles by fuel types from 1984-2020.

To perform this analysis, the fuel type of each vehicle was identified. The fuel types included in this analysis were gasoline, hybrid electric vehicle (HEV), battery electric vehicle (BEV), diesel, natural gas and hybrid. Since the original dataset contained 2 variables for fuel types depending on whether the vehicle runs on 1 or 2 fuels, the mutate function was used to create a new column in the dataset which determined the fuel type of the vehicle. In this column, if the fuelType1 variable was gasoline and fuelType2 was empty, the result in the new column would return gasoline.

Similar code was written for diesel and natural gas. If the fuelType1 variable was gasoline and fuelType2 was electricity, then the result in the new column would return HEV. If the fuelType1 variable was electricity, then the result in the new column would return BEV. If the fuelType1 variable was gasoline and fuelType2 was E85 or propane or natural gas, then the result in the new column would return Hybrid. This was done using the if_else function which checks the logical condition of the inputs.
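The classification rules described above can be written compactly. The Python sketch below follows the stated conditions; the authors used `mutate` with `if_else` in R. The function name `classify_fuel`, the lower-casing of inputs, and the use of `None` for unmatched combinations are assumptions made for illustration.

```python
def classify_fuel(fuel_type1, fuel_type2=None):
    """Derive a single fuel category from the two fuel-type fields."""
    f1 = (fuel_type1 or "").strip().lower()
    f2 = (fuel_type2 or "").strip().lower()
    if f1 == "electricity":
        return "BEV"
    if f1 == "gasoline" and f2 == "electricity":
        return "HEV"
    if f1 == "gasoline" and f2 in {"e85", "propane", "natural gas"}:
        return "Hybrid"
    if f1 in {"gasoline", "diesel", "natural gas"} and f2 == "":
        return f1.title()  # single-fuel vehicles keep their fuel name
    return None  # combination not covered by the rules in the text


for pair in [("Gasoline", ""), ("Gasoline", "Electricity"),
             ("Electricity", ""), ("Gasoline", "E85"), ("Diesel", None)]:
    print(pair, "->", classify_fuel(*pair))
```

Checking the BEV rule first keeps the remaining conditions mutually exclusive.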

The data for year 2021 was filtered out as it did not have enough data points and thus could have misled the result. The count of vehicles was found using the tally function and each fuel type was filtered and plotted over time to see the trend. A linear regression line was added for gasoline, HEV and BEVs for further analysis.

Fig. 19. Count of Vehicle Type with Model Year and the Linear Model

As seen from the plot, vehicles were fueled by either gasoline or diesel until the early 1990s. After 2000, the portfolio of automakers widened in terms of the fuels used. Alternative fuels for vehicles were introduced around this time. From the plot, a steep increase in the number of HEVs and BEVs can also be seen. The regression line shows that there is a steep increase in the new models of HEVs and BEVs while new gasoline vehicles are on the decline. This shows the big picture in the auto industry which suggests a shift towards electrification and the use of cleaner fuels in general.
Groups based on Average MPG and Fuel Savings

This analysis was performed using k-means clustering, which is a machine learning method. K-means is a centroid-based clustering method which groups data points based on their distance to the nearest cluster center. Tableau Desktop was used to perform this analysis due to its ease of use and strong data visualization features. The variable youSaveSpend was placed in the Columns shelf and comb08 was placed in the Rows shelf. Averages of both were taken. Vehicle make was placed in the Label and Detail tabs under the Marks section. This showed the vehicle makes based on their average MPG and average fuel savings. Finally, a cluster was added to this model from the Analytics tab. Reference lines showing the averages of both variables were added to see which vehicle makes are above or below the average.

Note: The youSaveSpend variable stands for 5-year savings/spending compared to an average car. Negative savings indicate that money is spent as opposed to saved.

TABLE VIII. Summary Diagnostics for Cluster Analysis

Number of Clusters:	6
Number of Points:	138
Between-group Sum of Squares:	5.1752
Within-group Sum of Squares:	0.45609
Total Sum of Squares:	5.6313

TABLE IX. Cluster Analysis Results

Clusters	Centers
Clusters	Number of Items	Average MPG	Average Savings ($)
Cluster 1	2	9.3929	-16589.0
Cluster 2	43	16.928	-5015.5
Cluster 3	64	21.915	-1948.0
Cluster 4	23	13.188	-9340.2
Cluster 5	5	63.058	1076.9
Cluster 6	1	102.64	2743.0
Not Clustered	0

Fig. 20. K-Means Clustering for Avg. MPG and Savings by Vehicle Make

6 clusters were formed to group distinct characteristics of the data which also helped identify some of the outliers. Cluster 1 consists of only 2 vehicle makes and this represents makes which have the least average MPG and the lowest savings. On the other hand, cluster 6 consists of only 1 make, Tesla and represents the vehicle make with the highest MPG as well as the most savings over 5 years. Cluster 5 includes makes which mainly have EVs in their portfolio. Cluster 3 has some of the popular makes such as GM, Fiat, Nissan etc. which have decent mileage and are not very expensive to maintain either.

Cluster 2 includes makes such as BMW, Porsche, Audi, etc. which have some high-performance cars in their portfolio. Cluster 4 includes makes such as Lamborghini, Pagani, Bentley, etc. which have low mileage cars due to their high-horsepower engines and are also expensive to maintain on average.
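The clustering itself was done in Tableau, whose internals are not reproduced here. As a minimal sketch of the underlying technique, the Python code below runs Lloyd's k-means algorithm on (average MPG, average savings) pairs. The min-max rescaling is an assumption, made so that dollars do not dominate the distance. The make-level values and the choice of initial centroids are hypothetical.

```python
from math import dist


def min_max_scale(points):
    """Rescale every coordinate to [0, 1] so both variables weigh equally."""
    columns = list(zip(*points))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [tuple((v - lo) / span for v, lo, span in zip(p, lows, spans))
            for p in points]


def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    return labels, centroids


# Hypothetical (average MPG, average 5-year savings in $) per make.
makes = [(16.0, -5000.0), (17.5, -4800.0), (22.0, -2000.0),
         (21.0, -1900.0), (100.0, 2700.0), (95.0, 2500.0)]
scaled = min_max_scale(makes)
labels, centers = k_means(scaled, centroids=[scaled[0], scaled[2], scaled[4]])
print(labels)
```

Fixing the initial centroids keeps the sketch deterministic; practical implementations usually try several random starts.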

CONCLUSIONS

This study has meticulously analyzed the EPA fuel economy dataset to discern key trends and changes within the transportation sector over an extended period. The data, notably clean with minimal inconsistencies, facilitated a robust analysis of fuel economy trends, primarily focusing on miles-per-gallon (MPG) distributions and tailpipe CO2 emissions.

Our findings indicate that MPG values follow an approximately normal distribution with a right skew caused by outliers, namely electric vehicles, which exhibit exceptionally high MPG. These outliers were selectively filtered from the analysis to maintain focus on internal combustion engine vehicles. Statistical and machine learning models were employed to test hypotheses, and the group comparisons yielded statistically significant results.

Key insights from the study include:
- Vehicle Performance: Vehicles equipped with turbochargers generally demonstrated better fuel economy compared to those with superchargers.
- Class Comparison: Fuel economy varied significantly across different vehicle classes, inversely related to their load-carrying capacities. Passenger cars showed higher MPG compared to heavier vehicles like vans and trucks.
- Engine and Volume Correlation: There was a negligible correlation between engine displacement and vehicle volume, with a slightly negative trend.
- Fuel Type Trends: The analysis of fuel types showed that electric vehicles consistently outperformed others in terms of fuel economy, aligned with global shifts towards more sustainable vehicle technologies.
Additionally, the forecast for tailpipe CO2 emissions indicated a declining trend for manufacturers like Toyota and Ford, highlighting industry-leading practices in emissions reduction.

However, some manufacturers like Chevrolet, Dodge, and GMC showed little to no reduction, pointing to areas where further improvements are necessary.

The transportation sector's shift towards Battery Electric Vehicles (BEVs) and Hybrid Electric Vehicles (HEVs) represents a promising trend towards reducing greenhouse gas emissions. This shift is further evidenced by the growing proportion of these vehicles each year, contrasting with a decline in conventional gasoline vehicles.

Lastly, our cluster analysis revealed that electric vehicles, particularly those from manufacturers like Tesla, not only offer superior fuel economy but also present considerable savings over five years compared to average internal combustion engine cars.
REFERENCES

[1] US EPA, Office of Air and Radiation. (2015, December 29). "Sources of Greenhouse Gas Emissions." Overviews and Factsheets, US EPA. Available: https://www.epa.gov/ghgemissions/sources-greenhouse-gas-emissions
[2] U.S. Department of Energy. (n.d.). "Fuel Economy Web Services." Retrieved December 11, 2020, from https://www.fueleconomy.gov/feg/ws/index.shtml#vehicle
[3] Selva, P. (2019, February 18). "ARIMA Model: Complete Guide to Time Series Forecasting in Python." ML+. Available: https://www.machinelearningplus.com/time-series/arima-model-time-series-forecasting-python/
[4] Kassambara, A. (n.d.). "Correlation Analyses in R." Easy Guides, Wiki, STHDA. Retrieved December 11, 2020, from http://www.sthda.com/english/wiki/correlation-analyses-in-r
[5] Kassambara, A. (n.d.). "Correlation Test Between Two Variables in R." Easy Guides, Wiki, STHDA. Retrieved December 11, 2020, from http://www.sthda.com/english/wiki/correlation-test-between-two-variables-in-r
[6] Kassambara, A. (n.d.). "F-Test: Compare Two Variances in R." Easy Guides, Wiki, STHDA. Retrieved December 11, 2020, from http://www.sthda.com/english/wiki/f-test-compare-two-variances-in-r
[7] Kassambara, A. (n.d.). "ggpubr: Publication Ready Plots." Easy Guides, Articles, STHDA. Retrieved December 11, 2020, from http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/
[8] Kassambara, A. (n.d.). "Unpaired Two-Samples T-test in R." Easy Guides, Wiki, STHDA. Retrieved December 11, 2020, from http://www.sthda.com/english/wiki/unpaired-two-samples-t-test-in-r
