1. Introduction
Dew point temperature (DPT) is the temperature to which air must be cooled, at constant pressure and water vapor content, to become saturated, so that water vapor begins to condense into liquid water. Precise and accurate estimation of DPT plays a significant role in solving agricultural problems, such as calculating the amount of available moisture in the air and estimating near-surface humidity [1]. DPT and relative humidity are commonly used to measure the air humidity level [2]. DPT can also be used to estimate the temperature of crops where frost is a concern [3]. Many studies have focused on the accurate estimation of DPT using regression methods. However, data-driven methods such as Gene Expression Programming (GEP) and the Adaptive Neuro-Fuzzy Inference System (ANFIS) have been developed to identify optimal functions and to model complex phenomena. Several studies have applied these methods in meteorology [4,5,6,7,8,9,10,11,12,13]. Shiri [
2] compared the capabilities of an artificial neural network (ANN) and GEP to estimate DPT using meteorological parameters at the Seoul and Incheon stations in South Korea. Two scenarios were used: in the first, the meteorological data of each station were used to estimate the DPT of the same station; in the second, the meteorological data of the adjacent station were used. The results showed that GEP was more accurate than ANN in both scenarios. In the second scenario, GEP gave the more accurate estimates of DPT at the Seoul station using the Incheon station parameters. The study also reported that DPT at the Seoul station could be estimated with proper accuracy from the average temperature and relative humidity of the Incheon station. Deka et al. [14] examined the ability of a support vector machine (SVM), ANN, and extreme learning machine (ELM) to estimate DPT at two stations in Iran. They showed that the ELM estimates were the closest to the observed DPT at both stations. In other research, Zounemat-Kermani [15] applied multiple linear regression (MLR) and an artificial neural network trained with the Levenberg–Marquardt algorithm (LMA–ANN) to estimate DPT values at Ontario Station, Canada. The LMA–ANN results matched the observational data well. Additionally, Jia et al. [16] investigated dew formation. For this purpose, they used meteorological data of average temperature, sunny hours, wind speed, saturated vapor pressure, relative humidity, and DPT values from the three stations of Dagot, Pohang, and Ulsan, South Korea. They reported that the effects of sunny hours, wind speed, and saturated vapor pressure were smaller than those of the other parameters. Therefore, it was possible to estimate DPT using average temperature and relative humidity. Attar et al. [17] used GEP, multivariate adaptive regression splines (MARS), and SVM models to estimate DPT in arid regions of Iran. Using the meteorological data of 13 synoptic stations during the 55 years (1996 to 2014) and defining 50 different scenarios, they concluded that the MARS model offered more accurate results than the other studied models. In a similar study, Mehdizadeh et al. [18] estimated DPT values in the cities of Tabriz and Urmia, in northwestern Iran, using the GEP method. They defined three scenarios (parameter-based, temperature-based, and periodicity-based) that considered the meteorological parameters of minimum, maximum, and mean air temperature, actual vapor pressure, and atmospheric pressure. Their results showed that actual vapor pressure was the most effective meteorological parameter for estimating DPT in the study area.
As the studies above show, over the last decade researchers have tried to estimate DPT values with suitable accuracy. The main purpose of the current study was therefore to implement three data-driven methods, namely GEP, the M5 model tree (M5), and support vector regression (SVR), in order to improve estimation accuracy and to develop explicit mathematical formulations for obtaining precise estimates of DPT values. To the best of our knowledge, the application of M5 to DPT estimation has not been reported in the literature. In other words, the goals of the study were (i) evaluating the performance of these models in estimating DPT, and (ii) investigating the role of climatic parameters in estimating DPT values. The rest of the paper is structured as follows: Section 2 describes the implemented methods, evaluation parameters, and characteristics of the study area; Section 3 discusses the obtained results; and, finally, the conclusions are presented in Section 4.
4. Results and Discussion
To meet the research objectives, daily average temperature, relative humidity, actual vapor pressure, wind speed, and sunny hours at the Tabriz synoptic station for the period 1998 to 2016 were collected from the Meteorological Organization of East Azerbaijan province, Iran. The statistical characteristics of the data are presented in Table 2. There is no standard rule for separating training and testing data. For example, Kurup and Dudani [33] used 63% of their data for model development, whereas Samadianfard et al. [4] and Samadianfard et al. [7] used 67%, and Deo et al. [34] used 70%. Thus, to develop the GEP, M5, and SVR models for estimating DPT, we divided the data into training (67%) and testing (33%) sets. The accuracy of the models in estimating DPT was then evaluated through Taylor diagrams. Additionally, the effects of the considered meteorological parameters were examined by defining 15 different input combinations (Table 3).
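The 67%/33% partition described above can be sketched as follows. This is only a minimal illustration, assuming the daily records are kept in chronological order and split at a single cut point; the text does not state whether the split was chronological or random, and the function name `split_train_test` is hypothetical.

```python
def split_train_test(records, train_fraction=0.67):
    """Split an ordered sequence into training and testing parts.

    Assumption: records are in chronological order, so the earliest
    train_fraction of them trains the model and the remainder tests it.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = round(len(records) * train_fraction)  # index of the first testing record
    return records[:cut], records[cut:]


# Usage with placeholder day indices instead of real meteorological records.
days = list(range(300))
train, test = split_train_test(days)
print(len(train), len(test))
```

A random split would serve the same purpose; the ordered split is chosen here only because it keeps the testing period separate in time from the training period.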
After performing the computations for the different input combinations, the accuracy of the considered models was determined in the testing phase based on the statistical criteria (Equations (9) and (10)) and Taylor diagrams. The obtained results are presented in Table 4.
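The two statistical criteria can be computed as in the sketch below. Equations (9) and (10) are not reproduced in this section, so the sketch assumes they are the usual root mean square error and the squared Pearson correlation coefficient; if the paper defines R² as the coefficient of determination instead, the second function would differ. The function names are illustrative.

```python
import math


def rmse(observed, estimated):
    """Root mean square error between observed and estimated values."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def r_squared(observed, estimated):
    """Squared Pearson correlation (assumed definition of R^2).

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov ** 2 / (var_o * var_e)
```

A lower RMSE and an R² closer to 1 both indicate closer agreement between estimated and measured DPT.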
As can be seen in Table 4, GEP-10, using the parameters T, RH, and S, with an RMSE of 0.96 degrees and R² of 0.902, performed best among the GEP models. SVR-6, with an RMSE of 0.44 degrees and R² of 0.996, gave the most accurate estimates among the SVR models. Furthermore, the best M5 estimation of DPT was obtained by M5-15, with an RMSE of 0.37 degrees and R² of 0.996, using all considered meteorological parameters as inputs. In other words, a comprehensive comparison of these models showed that M5-15 performed best in estimating DPT values, using the input combination of T, S, RH, W, and Vp. After the most accurate models for estimating DPT values were selected, their time series plots and scatterplots were prepared; they are shown in Figure 4 and Figure 5.
Table 4 and Figure 4 show that the estimation accuracy of M5-15 was higher than that of GEP-10 and SVR-6. The same conclusion regarding the high accuracy of the M5-15 model in estimating DPT at the Tabriz station can be drawn from Figure 5, where the points of the M5-15 model are less scattered around the bisector (1:1) line than the corresponding points of GEP-10 and SVR-6.
Furthermore, Taylor diagrams were used to examine the standard deviation and correlation of estimated and measured DPT values for the GEP, M5, and SVR models with different input parameters. The Taylor diagrams for these models are shown in Figure 6. The distance from the reference point (the green point) to each model's point is the centered RMSE [31]. Therefore, the most accurate model is the one whose point lies closest to the green point. According to Figure 6, M5-15 (the blue point) offered the most accurate estimates of DPT values at the Tabriz station.
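The quantities plotted on a Taylor diagram can be computed as in the sketch below. It assumes population statistics (division by n) and uses an illustrative function name. The distance interpretation follows from the law-of-cosines identity E'² = σ_o² + σ_e² − 2 σ_o σ_e R, where E' is the centered RMSE, σ_o and σ_e are the standard deviations of the observed and estimated series, and R is their correlation coefficient.

```python
import math


def taylor_statistics(observed, estimated):
    """Return (std_observed, std_estimated, correlation, centered_rmse)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    dev_o = [o - mean_o for o in observed]   # anomalies of the observations
    dev_e = [e - mean_e for e in estimated]  # anomalies of the estimates
    std_o = math.sqrt(sum(d * d for d in dev_o) / n)
    std_e = math.sqrt(sum(d * d for d in dev_e) / n)
    corr = sum(a * b for a, b in zip(dev_o, dev_e)) / (n * std_o * std_e)
    # Centered RMSE: RMSE after removing each series' mean, so a constant
    # bias in the estimates does not change it.
    crmse = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_e)) / n)
    return std_o, std_e, corr, crmse
```

Because the centered RMSE ignores any constant bias, it complements rather than replaces the RMSE values reported in the results table.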
One of the advantages of the GEP and M5 models, in comparison with other data-driven methods, is their ability to provide explicit relationships for calculating the output parameter. In the current study, Equation (12) was obtained for estimating DPT values using GEP-10, the most accurate GEP model.
Additionally, the M5-15 model produced the set of linear equations presented in Table 5 for estimating DPT values from the meteorological parameters T, S, RH, W, and Vp.
As previously mentioned, Deka et al. [14] used SVM, ANN, and ELM with the meteorological parameters of minimum, maximum, and average temperatures, relative humidity, atmospheric pressure, water vapor pressure, sunny hours, and solar radiation to estimate DPT in two cities of Kerman province, Iran. The minimum RMSE reported in that study was 0.49, obtained by the ELM method with minimum temperature and water vapor pressure as input parameters. In the present study, applying the M5 model with the meteorological parameters of average temperature, relative humidity, actual vapor pressure, wind speed, and sunny hours reduced the RMSE to 0.37, indicating the high accuracy of the M5 model tree for estimating DPT values. The output of the M5 model was a set of simple linear relationships that can be used to calculate DPT values easily, whereas the ELM has no such capability. Additionally, in another study, Baghban et al. [35] reported their maximum DPT estimation accuracy with an SVM model, with an RMSE value of 0.4. The M5-15 model in the current study was therefore more accurate than the SVM method proposed by Baghban et al. [35].
To investigate the influence of the input parameters on DPT estimation, the RMSE and R² were computed for different groupings of input variables. For this purpose, all utilized models, including GEP, M5, and SVR, were selected for sensitivity analysis (Table 6). For each model, the analysis showed the extent to which eliminating a variable affected the model accuracy. As shown in Table 6, the precision of all models decreased when any one of the T, RH, Vp, W, and S input parameters was removed from the modeling. Furthermore, T had the greatest effect on prediction accuracy. In other words, eliminating T caused a sharp increase in RMSE values in all studied models.
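The remove-one-input procedure described above can be sketched generically as follows. This is not the authors' implementation: `removal_sensitivity` is a hypothetical helper, and `evaluate_rmse` stands for any caller-supplied routine that trains a model (GEP, M5, or SVR) on the given subset of inputs and returns its testing RMSE.

```python
def removal_sensitivity(inputs, evaluate_rmse):
    """Rank inputs by how much the RMSE rises when each one is removed.

    inputs        : sequence of input names, e.g. ["T", "RH", "Vp", "W", "S"]
    evaluate_rmse : hypothetical callable that takes a tuple of input names,
                    fits a model on them, and returns the testing RMSE
    Returns a list of (name, rmse_increase) pairs, largest increase first.
    """
    baseline = evaluate_rmse(tuple(inputs))  # RMSE with all inputs present
    increase = {}
    for name in inputs:
        reduced = tuple(v for v in inputs if v != name)
        increase[name] = evaluate_rmse(reduced) - baseline
    return sorted(increase.items(), key=lambda item: item[1], reverse=True)
```

The input listed first in the returned ranking is the one whose removal degrades the model most, which in the analysis above corresponds to T.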
5. Conclusions
In the current study, three data-driven methods, GEP, M5, and SVR, were used to estimate DPT values at the Tabriz synoptic station, Iran. For this purpose, meteorological parameters for the period 1998 to 2016 were collected from the Meteorological Organization of East Azerbaijan province. In addition, 15 different input combinations were defined to study the effect of the meteorological parameters on the estimation of DPT values. The results revealed that SVR-6, using the two input parameters T and RH, and GEP-10, using the three parameters T, RH, and S, performed appropriately in estimating DPT values. Furthermore, the overall analysis of the studied methods showed that M5-15, using the five parameters T, S, RH, W, and Vp, had the best performance in estimating DPT values at the Tabriz station among all considered models and input combinations. To conclude, M5-15 is proposed as the most accurate method for estimating DPT values at the Tabriz synoptic station, Iran.