Towards A Smart City Concept - Machine Learning Smarts for the Estimation of Future Temperature Rise in Tabuk City

In the Middle East, especially in Saudi Arabia, temperature varies widely among the individual regions. As far as the city of Tabuk is concerned, no study has been conducted regarding climate change (the temperature rise) in the Tabuk region and its implications for society and for the flagship "Future Smart Cities" concept. In this paper, machine learning algorithms are used to predict future temperature values in the Tabuk region. The machine learning algorithms were trained on data collected from the real-time weather radar stations of the region. Different features from the dataset are used by the machine learning models to predict the future temperature. These features, for example humidity and pressure, affect how accurately the temperature can be predicted. Temperature prediction is modeled as a regression problem due to the nature of the data; therefore, different machine learning regression models were developed, namely an Artificial Neural Network (ANN)-based technique (Multi-layer Perceptron (MLP)), Decision Trees (DT), K-Nearest Neighbours (KNN), and Support Vector Regression (SVR). Encouragingly, the preliminary model evaluations utilizing Mean Absolute Error (MAE) yielded a high accuracy of 90% on the testing dataset. The findings are envisaged to inform decision-makers in the Climate and Weather Ministry of Tabuk City, potentially contributing to the advancement of the city's "Future Smart Cities" concept.


Introduction
As the world copes with the multifaceted challenges of climate change, understanding its localized impacts on diverse regions becomes increasingly crucial. Across the diverse landscapes of the Middle East, particularly in Saudi Arabia, individual regions experience significant temperature variations. This understanding is critical for informed decision-making, especially in the city of Tabuk. To achieve the "Future Smart Cities" vision, Saudi Arabia urgently needs more research on the specific impacts of rising temperatures on diverse regions like Tabuk. This gap is particularly concerning considering the projected temperature increase of 2.4°C to 3.4°C in Tabuk by 2050, according to some studies (Krishna, 2014; Sharif, 2015; Mahmoud et al., 2023). This warming threatens to exacerbate existing challenges like heat stress, water scarcity, and infrastructure damage (Albalawi et al., 2018; Albalawi, 2020; Anushka et al., 2020). Saudi Arabia's Vision 2030 envisions a future filled with sustainable and intelligent cities (Alyami, 2019). Achieving this vision necessitates accurate temperature prediction, especially for arid regions like Tabuk. Recognizing this urgency, Saudi Arabia is spearheading innovative climate resilience strategies, with smart city initiatives playing a crucial role.
Given the critical need for informed decision-making in Tabuk's smart city development, accurate monitoring and forecasting are fundamental. Climate change, with its undeniable negative consequences, necessitates a deep understanding of future temperature variations. This information empowers decision-makers to mitigate climate change effects and enhance infrastructure resilience (Rezvani et al., 2023). While traditional methods for temperature forecasting have yielded valuable results, achieving high accuracy consistently remains a challenge. This challenge is often tied to the quality and limitations of the underlying data, with many methods relying heavily on adherence to specific data quality standards and measures (Feigl et al., 2021). Recognizing these limitations, Machine Learning (ML) techniques have emerged as a promising opportunity for advancing temperature prediction capabilities (Sarkar et al., 2020; Rahayu et al., 2020). This is particularly relevant given the inherent difficulties associated with achieving high accuracy in this domain (Anushka et al., 2020; Cifuentes et al., 2020).
Among established ML techniques frequently employed in various environmental fields, Linear Regression (LR) stands out for its ability to address problems involving one or more variables (Zhu et al., 2023). For other regression problems, k-Nearest Neighbors (kNN), a non-parametric ML approach, offers a distinct solution (Malakouti, 2023). Support Vector Regression (SVR) presents another method to identify relationships between input and output data (Quan et al., 2022). Artificial neural networks (ANNs), such as the multi-layer perceptron (MLP), have also become valuable tools in meteorology, excelling at tasks like classification and forecasting (Azari et al., 2022; Nezhad et al., 2019). Their success lies in their ability to identify complex patterns in historical weather data, which is essential for prediction; to capture non-linear relationships between diverse weather variables; and to find optimal solutions within intricate weather models, ultimately tackling a broad range of ML problems (Gulrez, 2021; Poh et al., 2021). The wide availability of historical data further strengthens ANNs, making them an attractive choice for researchers globally (Anushka, MD, and Upaka, 2020; Sarkar et al., 2020; Rahayu et al., 2020; Nezhad et al., 2019).
Individual machine learning (ML) algorithms offer valuable tools for temperature prediction, but researchers strive for even higher accuracy. This pursuit has led to the exploration of hybrid models, which combine diverse ML techniques or integrate them with physical models (Hou et al., 2022). By synergistically leveraging the strengths of various approaches, these hybrid models hold the potential to overcome limitations inherent in single algorithms (Moosavia et al., 2020). One prominent ensemble learning technique employed in such hybrid models is stacking. Stacking involves combining predictions from multiple different ML models to create a more accurate final prediction (Wolff et al., 2020). This rests on the principle that individual models have unique strengths and weaknesses, and their collective power can be harnessed to achieve superior performance (e.g., improved accuracy and generalizability). Importantly, stacking can be particularly effective for tackling complex problems like temperature prediction, offering researchers a powerful tool in their quest for more accurate forecasts (Di Nunno et al., 2023).
This paper proposes a novel data-driven model for predicting future temperature values in the Tabuk region. Utilizing historical climate data and comparing different machine learning architectures, this model seeks to overcome existing limitations and contribute to improved climate change understanding and preparedness in the region. To achieve this, a diverse set of machine learning algorithms was employed, including Multi-layer Perceptron (MLP) or Artificial Neural Network (ANN), Decision Trees (DT), K-Nearest Neighbours (KNN), and Support Vector Regression (SVR), which is known for its strong performance on high-dimensional data. This analysis will culminate in the selection of the algorithm that delivers the best overall performance, paving the way for the development of a more powerful and accurate weather forecasting model tailored specifically for Tabuk City.

Methodology

Study Area Description:
Situated in northwest Saudi Arabia between 28°23′N and 28°39′N latitude and 36°35′E and 36°57′E longitude, the Tabuk region exhibits diverse topography with elevations ranging up to 2498 m above mean sea level (m.s.l.) (Figure 1). This study leveraged daily temperature observations and other weather parameters, including humidity and vapor pressure, spanning 1985 to 2015 from the Tabuk meteorological station (ID 40375), provided by the General Presidency of Meteorology and Environment Protection. With an average temperature of 27°C and sparse rainfall of 40 mm per year, Tabuk experiences pronounced aridity, with most precipitation concentrated in the winter months (November-January). Snowfall is a rare occurrence, happening only every few years (Albalawi et al., 2018). The weather station used for data collection is located in Tabuk City. The machine learning model trained on this data will provide temperature predictions primarily for Tabuk City and its immediate surroundings.

Machine Learning Model Development:
In this paper, we have developed an ensemble machine learning model that incorporates the effect of the DT, KNN, SVR, and MLP algorithms and produces an aggregated model for the prediction task. This study paves the way for robust and accurate temperature predictions, informing critical decision-making toward building a resilient and climate-smart future for Tabuk. The selection of the most suitable model for temperature prediction in Tabuk will depend on the features used in the model-building process (i.e., vapour pressure, wind speed, and humidity, as shown in Figure 2). Further, the performance of these algorithms is assessed based on the negative mean absolute error, which is sufficient to show the effectiveness of the model in the temperature prediction context.
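The aggregation idea can be illustrated with a minimal stacking sketch. It is not the authors' implementation: the paper does not give its code, so everything here is an assumption made for illustration, including the synthetic data, the helper names (`fit_knn`, `fit_stump`, `fit_meta`, `fit_stacking`), the use of only two simplified base learners (a k-nearest-neighbour regressor and a depth-1 tree standing in for DT) instead of the four models named above, and a least-squares meta-learner.

```python
import random

def fit_knn(X, y, k=3):
    """Fit a k-nearest-neighbour regressor; returns a predict(x) function."""
    def predict(x):
        order = sorted(range(len(X)),
                       key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
        nearest = order[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict

def fit_stump(X, y):
    """Fit a depth-1 regression tree that splits feature 0 at its median."""
    threshold = sorted(row[0] for row in X)[len(X) // 2]
    left = [t for row, t in zip(X, y) if row[0] < threshold]
    right = [t for row, t in zip(X, y) if row[0] >= threshold]
    overall = sum(y) / len(y)
    left_mean = sum(left) / len(left) if left else overall
    right_mean = sum(right) / len(right) if right else overall
    return lambda x: left_mean if x[0] < threshold else right_mean

def fit_meta(P, y, ridge=1e-9):
    """Least-squares weights (w1, w2) for two columns of base predictions."""
    a = sum(p[0] * p[0] for p in P) + ridge
    b = sum(p[0] * p[1] for p in P)
    d = sum(p[1] * p[1] for p in P) + ridge
    e = sum(p[0] * t for p, t in zip(P, y))
    f = sum(p[1] * t for p, t in zip(P, y))
    det = a * d - b * b
    return (e * d - b * f) / det, (a * f - b * e) / det

def fit_stacking(X, y, base_fitters, n_folds=5):
    """Train base models, then a meta-model on their out-of-fold predictions."""
    oof = [None] * len(X)
    for fold in range(n_folds):
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        models = [fit([X[i] for i in train_idx], [y[i] for i in train_idx])
                  for fit in base_fitters]
        # Predict only the held-out rows, so the meta-model sees honest predictions.
        for i in range(fold, len(X), n_folds):
            oof[i] = tuple(m(X[i]) for m in models)
    weights = fit_meta(oof, y)
    final_models = [fit(X, y) for fit in base_fitters]  # refit on all data

    def predict(x):
        return sum(w * m(x) for w, m in zip(weights, final_models))
    return predict, weights

# Synthetic stand-in for (humidity, wind speed) -> temperature; not the Tabuk data.
rng = random.Random(0)
X = [(rng.uniform(10, 60), rng.uniform(0, 10)) for _ in range(200)]
y = [40.0 - 0.3 * h + 0.5 * w + rng.gauss(0, 1) for h, w in X]

stacked_predict, weights = fit_stacking(X, y, [fit_knn, fit_stump])
print(weights, stacked_predict((35.0, 5.0)))
```

Fitting the meta-learner on out-of-fold predictions, rather than on predictions for rows the base models were trained on, keeps a base model that memorizes its training data from dominating the combination.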

Feature preparation for model development:
The success of any machine learning project depends heavily on carefully preparing the data. This crucial step, known as data pre-processing, ensures the data is clean, organized, and formatted correctly for analysis. This leads to better results and more accurate insights. In this case, our dataset includes key features like temperature, humidity, and wind speed, which are all directly related to predicting temperature (Figure 2). These features are fed into the geographical climate model, which considers the combined effect of humidity and wind speed on temperature.

Training and Testing the Models:
This paper presents a novel analysis of historical weather data collected from the Tabuk weather station in Saudi Arabia, spanning the past three decades. The dataset is divided into two distinct partitions: a training set comprising 80% of the data and a testing set encompassing the remaining 20%. This division ensures that the models are trained on a representative subset of the historical data while reserving a separate portion for unbiased evaluation of their predictive performance.
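As a minimal sketch of such an 80/20 partition, the snippet below shuffles the records with a fixed seed and then cuts them. The paper does not state whether the split was random or chronological, so the shuffling, the seed, the function name `train_test_split_80_20`, and the synthetic records are all assumptions made for illustration.

```python
import random

def train_test_split_80_20(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split it into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical daily records: (humidity, wind speed, vapor pressure, temperature).
records = [(30 + i % 40, 2 + i % 7, 8 + i % 5, 15 + i % 25) for i in range(1000)]

train, test = train_test_split_80_20(records)
print(len(train), len(test))  # 800 200
```

For time-ordered weather data, a chronological cut (training on the earlier years and testing on the later ones) is a common alternative, because it keeps future observations out of the training set.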

Evaluation Metrics for Model Prediction Performance:
The effectiveness of the proposed model was evaluated using two established tools: the learning curve and the Mean Absolute Error (MAE). MAE quantifies the average absolute deviation between predicted and actual temperatures. Its simplicity fosters interpretability, yet, because every error is weighted linearly, it downplays the impact of large errors relative to squared-error metrics. For a set of actual values (y_true) and predicted values (y_predicted), MAE is defined as:

\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_{\mathrm{true},i} - y_{\mathrm{predicted},i} \right| \]

where n is the total number of data points in the dataset, y_true_i is the actual value of the i-th data point, y_predicted_i is the value predicted by the model for the i-th data point, and Σ is the summation symbol, meaning that the absolute differences between actual and predicted values are summed over all data points. This formula calculates the average of the absolute differences between the model's predictions and the actual values, providing a measure of how accurate the model is overall. The lower the MAE, the better the model's performance.
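The calculation described above can be sketched in a few lines of Python. The temperature values are invented for illustration, and the function name `mean_absolute_error` is simply a descriptive choice here rather than a reference to the authors' code.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative temperatures in degrees Celsius (not taken from the Tabuk dataset).
actual = [25.0, 30.0, 28.0, 35.0]
predicted = [24.0, 32.0, 28.0, 31.0]

mae = mean_absolute_error(actual, predicted)
print(mae)   # 1.75  -> (1 + 2 + 0 + 4) / 4
print(-mae)  # -1.75 -> the "negative MAE" form, where values closer to zero are better
```

The negated form is what tools that always maximize a score tend to report, which is one plausible reading of the "negative mean absolute error" used in this paper.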

Results and Models Performance
Figure 3 depicts a promising learning curve for the MLP or ANN model. Both training and validation accuracy steadily increase with more data, showcasing its learning capacity. Training accuracy reaches 0.90, indicating strong performance. While validation accuracy peaks at 0.85, suggesting good generalization, the small gap (0.05) minimizes overfitting risk. The rapid initial rise shows a fast learning rate. Accuracy plateaus near 8000-10000 data points, suggesting further increases may not be significant. Notably, the 0.85 validation accuracy demonstrates promising generalization ability for real-world applications.
In contrast, a striking feature of the DT model curve is that the training accuracy reaches a perfect score of 1.000, suggesting complete memorization of the training data (Figure 3). However, this comes at the potential cost of overfitting, as evidenced by the gap between the flat blue training curve and the fluctuating green validation curve, which peaks at around 0.850 but dips as low as 0.825. While the validation accuracy remains respectable, it suggests the model might not generalize well to unseen data. The initial rise in validation accuracy followed by a plateau and slight decline indicates that the DT model might benefit from regularization techniques to mitigate overfitting and improve its generalizability. This could involve limiting the tree depth, implementing early stopping, or employing techniques like data augmentation to diversify the training data. Overall, the DT model exhibits a strong learning ability but raises concerns about overfitting.
A notable feature of the KNN model is that the training accuracy starts at a relatively high value of 0.92 and gradually increases to reach a peak of 0.93. This suggests strong initial learning and good performance on the training data. The validation curve also exhibits encouraging behavior, consistently rising from a starting point above 0.82 to reach its highest point of more than 0.86. This gradual upward trend indicates that the KNN model effectively balances learning from the training data with generalizability to unseen data, minimizing the risk of overfitting. Overall, the KNN model's learning curve paints a picture of effective learning, good generalization, and minimal overfitting risk.
While all models showed decent performance, the MLP or ANN consistently outperformed the others across various evaluation metrics, solidifying its position as the most effective model (Figure 3).
The SVR model stands out with its initially negative training accuracy, suggesting weak learning on the training data (Figure 3). While both training and validation curves improve, their peak values remain around 0.06, indicating limited performance. This highlights the need for further model exploration or optimization.
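How a learning curve of this kind is obtained can be sketched as follows: the model is refit on progressively larger portions of the training set, and a score is recorded on both the training portion and a fixed validation set. The paper does not name the score plotted as "accuracy"; this sketch assumes the coefficient of determination (R²), which would also be consistent with a score that can fall below zero. The k-nearest-neighbour model, the synthetic data, and all function names are illustrative assumptions.

```python
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination; negative when worse than predicting the mean."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def knn_predict(train_X, train_y, x, k=3):
    """Average the targets of the k training points closest to x."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def learning_curve(X_train, y_train, X_val, y_val, sizes, k=3):
    """Score a model refit on growing prefixes of the training set."""
    rows = []
    for n in sizes:
        sub_X, sub_y = X_train[:n], y_train[:n]
        train_pred = [knn_predict(sub_X, sub_y, x, k) for x in sub_X]
        val_pred = [knn_predict(sub_X, sub_y, x, k) for x in X_val]
        rows.append((n, r2_score(sub_y, train_pred), r2_score(y_val, val_pred)))
    return rows

# Synthetic stand-in for (humidity, wind speed) -> temperature; not the Tabuk data.
rng = random.Random(0)
X = [(rng.uniform(10, 60), rng.uniform(0, 10)) for _ in range(200)]
y = [40.0 - 0.3 * h + 0.5 * w + rng.gauss(0, 1) for h, w in X]

rows = learning_curve(X[:160], y[:160], X[160:], y[160:], sizes=[20, 40, 80, 160])
for n, train_score, val_score in rows:
    print(n, round(train_score, 3), round(val_score, 3))

# A constant prediction far from the data scores below zero:
print(r2_score([1.0, 2.0, 3.0], [5.0, 5.0, 5.0]))  # -13.5
```

Under this reading, a large gap between the two columns points to overfitting, while two low, close columns point to underfitting.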
To assess model generalization to unseen data, we calculated the mean and standard deviation of the negative Mean Absolute Error (MAE) scores for all four models (Figure 4). The figure highlights significant discrepancies in temperature forecasts from different models for the Tabuk station. Remarkably, the MLP (or ANN) model stands out with the lowest average MAE in magnitude, demonstrating its superior ability to capture underlying patterns in the data series. The stacked model also exhibits strong reliability in temperature prediction, bolstering the case for ensemble approaches in this domain.
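One common way to obtain a mean and standard deviation of negative MAE scores is k-fold cross-validation, sketched below for a single model. The paper does not describe its exact procedure, so the five folds, the k-nearest-neighbour model, the population standard deviation, the synthetic data, and the function names are assumptions made for illustration.

```python
import random
import statistics

def knn_predict(train_X, train_y, x, k=3):
    """Average the targets of the k training points closest to x."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def neg_mae_cv(X, y, n_folds=5, k=3):
    """Return one negative-MAE score per fold (closer to zero is better)."""
    scores = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        errors = [abs(y[i] - knn_predict(train_X, train_y, X[i], k))
                  for i in test_idx]
        scores.append(-sum(errors) / len(errors))
    return scores

# Synthetic stand-in for (humidity, wind speed) -> temperature; not the Tabuk data.
rng = random.Random(0)
X = [(rng.uniform(10, 60), rng.uniform(0, 10)) for _ in range(200)]
y = [40.0 - 0.3 * h + 0.5 * w + rng.gauss(0, 1) for h, w in X]

scores = neg_mae_cv(X, y)
print(statistics.mean(scores), statistics.pstdev(scores))
```

Repeating the same call for each candidate model gives comparable summaries: the model whose mean is closest to zero has the smallest average error, and a small standard deviation indicates that its error is stable across folds.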

Conclusion
This study investigated the application of various machine learning models, including MLP or ANNs, Decision Trees (DTs), K-Nearest Neighbours (KNNs), and Support Vector Regression (SVR), for temperature prediction within the context of smart cities. While previous works have explored SVM's potential for weather forecasting, our comparative analysis revealed that MLP or ANNs emerged as the most efficacious model for this specific task.
The inherent strengths of ANNs for smart city temperature prediction lie in their exceptional ability to navigate the intricacies of the task. Unlike simpler models like decision trees and k-nearest neighbors, ANNs can readily capture the inherent non-linearity of temperature fluctuations driven by complex interactions between diverse factors like air pressure, humidity, wind patterns, and seasonal variations. Smart city environments generate a variety of data from various sources, and ANNs excel at seamlessly integrating these diverse features into the prediction process, leading to more comprehensive and nuanced models compared to SVR, which often requires carefully pre-defined features.
The dynamic nature of smart cities necessitates adaptable models that can learn and refine themselves continuously, a challenge in which ANNs shine with their superior generalizability to unseen data compared to static models like decision trees and k-nearest neighbors. While DT and other models still hold merit in specific applications, our findings suggest that ANNs represent a valuable tool for temperature prediction in smart cities due to their exceptional ability to handle non-linearity, integrate diverse features, and adapt to dynamic environments. This advantage opens doors for numerous applications in smart city development, including weather forecasting, resource management, and even climate-adaptive infrastructure design.

Figure 1. Geographical location of Tabuk city and the surrounding region.

Figure 2. The training dataset used to develop a machine learning (ML) model to predict the temperature of the region. These features are consistent with the weather characteristics of the Tabuk region and are therefore treated as input feature vectors for the ML model. The weather characteristics are (a) temperature, (b) vapor pressure, (c) wind speed, and (d) relative humidity.

Figure 3. Learning curves and accuracy of the four models: (a) ANN/MLP, (b) Decision Tree, (c) KNN, and (d) SVR.

Figure 4. Machine learning model comparison between different algorithms. Performance is measured by the negative mean absolute error (MAE) for KNN, DT, SVR (or SVM), MLP (or ANN), and stacking. The stacking model is based on an ensemble learning technique.