Short-term water demand forecasting using hybrid supervised and unsupervised machine learning model

Regression Tree (RT) forecasting models are widely used in short-term demand forecasting. Likewise, Self-Organizing Maps (SOM) models are known for their ability to cluster and organize unlabeled big data. Herein, a combination of these two Machine Learning (ML) techniques is proposed and compared to a standalone RT and a Seasonal Autoregressive Integrated Moving Average (SARIMA) models, in forecasting the short-term water demand of a municipality. The inclusion of the Unsupervised Machine Learning clustering model has resulted in a significant improvement in the performance of the Supervised Machine Learning forecasting model. The results show that using the output of the SOM clustering model as an input for the RT forecasting model can, on average, double the accuracy of water demand forecasting. The Mean Absolute Percentage Error (MAPE) and the Normalized Root Mean Squared Error (NRMSE) were calculated for the proposed models forecasting 1 h, 8 h, 24 h, and 7 days ahead. The results show that the hybrid models outperformed the standalone RT model, and the broadly used SARIMA model. On average, hybrid models achieved double accuracy in all 4 forecast periodicities. The increase in forecasting accuracy afforded by this hybridized modeling approach is encouraging. In our application, it shows promises for more efficient energy and water management at the water utilities.


Introduction
Marching forward toward true sustainability, reliable and resilient water systems are essential in a world facing the challenge of water scarcity. Smart decision making in water systems is one of the two keys a Smart Water consists of (Joong 2018). The application of water demand forecasting is crucial for optimal operation and control of Smart Water Grids (SWG) (Public Utilities Board Singapore 2016). Supplying water at lower cost, with less energy, and lighter load on the network infrastructure is a primary goal of water utilities. This goal is achieved through multiple practices and applications in the water supply system; one of which is an accurate forecast of the systems' future demand. Short-term water demand forecasting has been employed by a plethora of utilities, researchers, and developers to tackle the imbalance between supply and demand. Short-term water demand forecasting can be used to manage water pressure, control leakage, schedule pumping operations, system maintenance, and infrastructure development (Zhou et al. 2002). However, developing a water demand forecasting model is challenging. That is because the accuracy of the model output controls the efficiency of the system response (Jamieson et al. 2007).
Forecasting models have been a topic of significant interest to researchers and developers alike. A wide spectrum of forecasting models has been featured in the literature. Kozlowski et al. 2018;Donkor et al. 2014; House-Peters and Chang 2011 presented an extensive review of methods and models used in water demand forecasting. Zhang 2001 largely classified these models into two groups, linear and nonlinear. Linear models are used extensively owing to their simplicity and the practicality of the required data acquisition. The ease of implementation and ability to update make these models very attractive. Autoregressive Integrated Moving Average (ARIMA) and univariate time series analysis models have been proposed by many researchers (Zhou et al. 2000;Maidment and Miaou 1986;Maidment et al. 1985;Perry 1981;Hughes 1980) to forecast water demand. ARIMA models can be employed to forecast water demand; however, the accuracy of the forecast can be unsatisfactory (Kozlowski et al. 2018). To overcome the unsatisfactory results of the linear models, researchers developed nonlinear models. Nonlinear models deemed to better capture the nonlinear patterns in water demand. Nonlinear models are costly and difficult to develop, implement, and update. However, their capacity to analyze big data with multiple parameters and concurrently find the nonlinearity relations between variables, have make them powerful prediction tools. Artificial Neural Networks (ANN), nonlinear regression models, fuzzy logic, and other nonlinear models are among the most popular for forecasting water demand (Bennett et al. 2013;Boguadis et al. 2005;Adamowski and Karapataki 2010;Adamowski et al. 2012;Tiwari and Adamowski 2015;Mitrea et al. 2009;Ghiassi et al. 2008;Ghiassi et al. 2005;Hippert et al. 2001;Jain et al. 2001;Cutore et al. 2008;Nasseri et al. 2011).
Two sub-groups can be further distinguished: Standalone and hybrid. Standalone are models that forecast the demand using one technique, where hybrid models are a combination of more than one technique. Standalone linear and nonlinear models are the dominant in the application of demand forecasting; however, researchers have also combined two or more models. Shamseldin and O'Connor 2001 combined a datadriven ANN model and a deterministic model to forecast demand. Abebe and Price 2003 added an ANN prediction model to improve rainfall-runoff model forecasts. The inclusion of the ANN model in both hybrid models was to calculate innovations by being calibrated to residual error time-series. Hiroyuki et al. 2001 combined multi-layer perceptron (MLP) of ANN model and a fuzzy regression tree model to forecast demand in power grid. While the MLP model was employed to forecast one step ahead, the fuzzy regression tree model assisted by determining a split value. This value helped organizing the input data in classes according to specific data rules. These two fused models were proven effective in forecasting power systems load. Mori and Takahashi 2011 proposed another example of hybrid model. A Regression Tree (RT) model was fused with a Relevance Vector Machine (RVM) model. The RT model used some distinct characteristic similarities to classify data into clusters. The classification technique abated the RVM prediction task in smaller clusters with common characteristics. The hybrid model was employed to successfully forecast the electric load in Japan. Tiwari and Adamowski 2013 assembled multiple ANN models developed using bootstrap sampling and wavelet analysis. Their hybrid model was shown to outperform standalone ANN, ARIMA, and ARIMA with exogenous variables (i.e. ARIMAX) models when deployed to forecast water demand.
Despite that hybrid models were mostly developed to forecast electric load; researchers have highlighted strong similarities between water and electricity demand patterns and forecasting approaches (Perry 1981). These similarities are represented by: demand driving factors, demand trend and seasonality, economic and socio-economic parameters, etc. Table 1 highlights short-term load forecasting studies in both water and power grids. Models listed in this table are distinguished by their mathematical representation (i.e. linear/nonlinear) and performance (i.e. standalone/hybrid).
In this paper, short-term water demand forecasting application is achieved through a hybrid ANN model. The hybrid model consists of a clustering unsupervised and a predictive supervised machine learning models. The performance and complexity of the resulted fused model are investigated. The objective is to inspect whether adding an auxiliary clustering model to the forecasting model improves its performance or not. This is presented in the following flow: section 2 introduces the location and data of the studied water utility and its unique characteristics. Section 3 details the implemented forecasting methodology and forecasting models. The performance, computational load, and application of the hybrid model is presented and discussed in section 4. And finally, concluding remarks for this work is highlighted in section 5.

Study area and data
The data analyzed in this paper were collected from a water utility that services primarily rural areas located in Southwestern Ontario, Canada. The water utility supplies water to more than 65,000 residents, agricultural, and other commercial and industrial customers. On average, commercial greenhouses are the utility's dominant water consumers with an annual share of 80%. Since the legalization of cannabis in Canada in 2018, this annual percent share has been increasing according to the water utility's reports. The utility's reports show frequent requests by the commercial greenhouses on increasing their water demand drastically. This dynamic variance accompanied by other demand-driving factors have pressured the supply side, water supply systems, in the region.

Input data
The performance of a short-term water demand forecasting model is dependant on the input type and the temporal resolution of the fed data. Although a number of researchers (Boguadis et al. 2005;Ghiassi et al. 2008;Jain and Ormsbee 2002;Rice et al. 2017) showed that exogenous inputs can improve model predictability, a preliminary study (Bata et al. 2020) in the same area revealed that temporal inputs of the historical water demand are the main driver. The study has also highlighted that 4 recent months of 1-h resolution data is sufficient in the application of short-term water demand forecasting at the studied water utility. Therefore, the input data collected for this study is a 1-h resolution data that spans the last 4 months of 2017. Figure 1 shows the historical water outflow in (m 3 /hr) for the studied water utility during the months of August (top  (Zhou et al. 2000;Maidment and Miaou 1986;Maidment et al. 1985;Perry 1981;Hughes 1980) Nonlinear Standalone Short-term water demand forecast (Bennett et al. 2013;Boguadis et al. 2005;Adamowski and Karapataki 2010;Adamowski et al. 2012;Tiwari and Adamowski 2015;Mitrea et al. 2009;Ghiassi et al. 2008;Ghiassi et al. 2005;Hippert et al. 2001;Jain et al. 2001;Cutore et al. 2008;Nasseri et al. 2011 Hybrid Short-term water demand forecast (Tiwari and Adamowski 2013) left), September (top right), October (bottom left), and November (bottom right) of 2017. The water demand can be seen from this figure, where the water outflow is at minimum around midnight and increases to reach its peak around noon. Although water demand, reflecting water consumption, is usually at peak during early morning and early evening, this is not the case here. That is because 80% of the total utility's demand is consumed by commercial greenhouses. These commercial greenhouses have their own water storage facilities that help them avoid consuming expensive-water during peak hours (note that water price fluctuates based on time-of-use in this region). Also observed is the gradual decrease in peak demand between the months of August and November. This is due to the seasonality of grown crops at the commercial greenhouses.
A unique indicator (i.e. predictor) has been assigned to each data point to address the abovementioned demand patterns and observations. The first indicator is hour of the day. This indicator assigned a value between 1 (represents 12:00 am -01:00 am) and 24 (represents 11:00 pm -12:00 am) to each data point. The second indicator is day of the month (e.g. values ranged between 1 and 31 for the month of August). The third indicator is month of the year where each data point was assigned a value between 8 (for August) and 11 (for November).

Data correlation
The correlation between the target, the current water outflow (Q t ), and other input data (i.e. predictors) was calculated. Pearson Correlation Coefficient (PCC) was calculated using eq. 1. PCC evaluates the linear relationship between two variables, and it ranges between − 1 to 1, where 0 indicates no correlation. Table 2 shows the investigated predictors ranked according to their correlation strength with the water outflow. Table 2 reveals that the strongest predictor with the highest PCC value is the K value for the same day previous hour water outflow. Although this is the most correlated predictor, it was not used in most of the models in this paper. That is because this predictor would be practical to use only in the 1 h ahead forecast. If the forecast is to be performed for other periodicities (8 h, 24 h, and 1 week ahead in this study), the values Fig. 1 The utility hourly water outflow (August -November), 2017. A 4-month (August-November) hourly water outflow dataset is shown in this figure, measured in m 3 /hr. The hourly water outflow is the target dataset used for clustering, prediction, and performance assessment in the model for this predictor would be unknown. The same concept applies for other predictors chosen to train models that predicted the water demand 8 h, 24 h, and 1 week ahead.
Predictors ranked 2nd, 3rd, and 4th had a relatively strong correlation of approximately 0.8. These three predictors were used in training, testing, and validating the proposed models to forecast water demand 1 h, 8 h, 24 h, and 1 week ahead. Predictors ranked 5th and 6th had a relatively moderate correlation, however, they did not add any significance in models performance. This was also the case for the weakest studied correlated predictors ranked 7th and 8th.
Where, n: is the number of data points i: is the data point, ranges from i = 1 to i = n Xi: is the i th value of variable X Yi: is the i th value of variable Y

Forecasting methodology and models
This section presents a description of the forecasting methodology, the forecasting approach, the forecasting models, and the model performance assessment approach. Figure 2 illustrates the forecasting flowchart. The process begins with gathering available historical water demand data for the studied system. The data at this stage is usually raw. Here, raw data is referred to as pre-processed data. Raw data contains erroneous data (e.g. zero water outflow), noisy data (e.g. untrue water outflow), and/or missing data. Raw data is cleaned, smoothed, and imputed, where it becomes processed data. Then, 75% of the processed data, training set, is fed to two separate models, SARIMA model and Hybrid model. A description of the SARIMA model and the Hybrid model is presented in sections 3.1 and 3.2, respectively. After the models are trained and calibrated, 15% of the processed data is used for testing while the reminder of 10% is deployed to validate the models. This data division configuration was used based on the guidelines of (Hunter et al. 2012). At this point, models can be fed with the time ahead input data to predict the target, water outflow.  (Shumway and Stoffer 2000). The (p, d, q) non-seasonal order of the model is the number of Autoregressive (AR) parameters, differences, and Moving Average (MA) parameters. The (P, D, Q) s order of the seasonal order of the model is the AR parameters, differences, MA parameters, and periodicity. SARIMA model is formulated as (Shumway and Stoffer 2000): Where, Φ P (B S ): is the seasonal AR parameter of order P φ(B): is the ordinary non-seasonal AR parameter X t : is the measured time series denoted by time t δ: is the intercept Θ Q (B S ): is the seasonal MA parameter of order Q Fig. 2 Forecasting methodology flowchart. A flowchart of the steps developing, calibrating, and validating the models is shown. The historical water demand raw data is pre-processed, where missing and erroneous data are cleaned and imputed. Then, processed data is fed into the SARIMA model, whereas it is divided into two groups, target data and input data for the hybrid model. For the hybrid model, the target data is fed into the SOM model and the input data is sent directly to the RT model. In the SOM model, the target data is clustered and the output cluster number accompanied with the target data is added to the input data in the RT model. At this point, the RT model performs the prediction t time steps ahead. After predicting the target, the performance of the model is assessed (i.e. compared to the held back water demand data). If the performance is satisfactory, the model is implemented to forecast t time steps ahead. However, usually the desired performance cannot be attained from the first trial. In this case, more neurons can be added to the SOM model and/or more leaves and folds can be added to the RT model θ(B): is the ordinary non-seasonal MA parameter W t : is the usual Gaussian noise process Figure 3 shows the algorithm used to determine these parameters and develop SARIMA models in this paper. SARIMA seasonal and non-seasonal parameters are estimated iteratively through plotting the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF). SARIMA is a simple traditional model that can be trained and fitted on a small dataset. Arandia et al. 2016 proposed six SARIMA models that were trained by three small datasets (24 h and 7 days windows, and 15 min, 1 h, and 24 h resolutions). A 7-day window was showed to be sufficient to train the SARIMA model. In this paper, two SARIMA models with two different training windows were fitted to forecast water demand. These models are displayed in Table 3, where they are distinguished by their seasonal period.

Hybrid model
The hybrid model in this study comprises of two models, SOM model and RT model. The practice for the proposed hybrid model is to simply feed the output of the SOM clustering model, accompanied by other desired correlated inputs, to the RT forecasting model. SOM, also known as Kohonen Neural Networks (Kohonen 1982), is an unsupervised learning technique that reduces data dimensionality. SOM uses competitive learning to cluster input data into groups while preserving the topology and the distribution of the input data. Simply stated, an n-dimensional grid of neurons compete to win data points according to how close these points are in the input pattern. The patterns that are close in the input space will be mapped to units that are close in the output space (i.e. grid) (Bação et al. 2005). Figure 4 illustrates the mechanism of a 2-dimensional (2D) grid of 3 neurons competing to map input data into an output of 9 clusters. As shown in this figure, input data points (I 1 , I 2 , …, I n ) are fed to the network where a same initial weight is assigned. Each data point multiplied by its assigned weight is called a node. Then, the Euclidean distance is computed between each node and all competing neuron (3 neurons in this example). The data point in this specific node is won by only the neuron with the shortest computed distance. The neurons that did not win this data point are topologically mapped using a neighbourhood function. This function determines how close each neuron should be to other competing neurons. At the end of this mapping technique and before processing the next input data point, weights are adjusted according to the previous neighbourhood topology. By the end of the training, the input data is grouped in 9 clusters that have the same topology as the input space.
In this study, 2D-SOM are developed and implemented to cluster the water outflow input data using the algorithm shown in Fig. 5a. In the hybrid model, four 2D-SOM clustering models were developed. These four models vary with the number (N) of neurons used in the 2D grid layer. For example, HYB-2 N is a hybrid model with 2 neurons in each dimension of the 2D grid layer. For our purpose, N ranged between 2 and 5. If N were to be less than 2 (i.e. N = 1), the resulted number of clusters is N 2 which equals to 1. This 1 cluster would basically have the same input dataset topology. Therefore, the minimum number of neurons was set to 2. The upper limit in this study is 5 neurons was the limit in this study. That is because using 6 or more neurons did not yield in any significant efficiency increase in the performance of the SOM model and the overall hybrid model.
RT is a supervised learning technique that is used for prediction. RT is the numeric outcome model of the general classification and regression tree (CART) introduced by (Breiman et al. 1984). The model is constructed with an assembly of rules based on variables extracted from the dataset (i.e. predictors). These rules are represented by values that are selected to form the best possible splits to differentiate instances (i.e.  observations). Once a rule, also called decision, is selected, a split is applied at a specific node. This process continues to be applied to each node in the tree through a recursive procedure. RT models are obtained by repeatedly dividing the data space and fitting a simple prediction model within each split. As a result, the data division can be represented graphically as a decision tree (Loh 2011). This splitting process continues until a predefined limit is reached. This limit could be where no further information gain can be achieved. Alternatively, splitting can be left to continue where the tree is pruned at the end of the process. Pruning is a technique that establish stopping rules to prevent the growth of tree sections that do not seem to improve the accuracy of the predicting model. a b Fig. 5 a SOM model algorithm. This figure shows the algorithm used to develop SARIMA forecasting models. Processed target data is fed to the model where the model is trained, tested, and validated. After reaching a satisfying performance, the time ahead input data is read to predict the response (i.e. the cluster number). b RT model algorithm. This figure shows the algorithm used to develop RT forecasting models. Processed target data is fed to the model where the model is trained, tested, and validated. After reaching a satisfying performance, the time ahead input data is read to predict the response (i.e. the water demand) RT model development begins with feeding the input data to the tree root, then the data is filtered and sent to a branch and then to another branch until it reaches the leaf. The leaf is where the final decision is made, is called the Response. In this study, five RT models were developed to forecast 1 h, 8 h, 24 h, and 7 days ahead. The first RT model is a standalone model. This model is not fed any of the SOM output (i.e. no K inputs) as a model input. The rest of the models (HYB-N2, HYB-N3, HYB-N4, and HYB-N5) are hybridized models; all predictors are fed to the model every time the model predicts the future outflow demand. Although the standalone RT model is fed with fewer input parameters, however, all models in this group are structurally identical in terms of the input data time span, the number of tree leaves, and the crossvalidation folds. The RT model algorithm shown in Fig. 5b was used to develop the RT predicting model.

Model performance
The performance of the proposed models was measured based on the deviation of the predicted outflow from the actual outflow. Both over-estimated and under-estimated predictions were considered as inaccurate model performance, therefore, included in error measurements. The performance was measured with: (1) Mean Absolute Percentage Error (MAPE) and (2) Root Mean Squared Error (RMSE). The Normalized Root Mean Squared Error (NRMSE) was calculated and shown along with MAPE in Table 4. The inclusion of NRMSE was to account for the variation of the means of datasets used in forecasting the water outflow. Equations 3, 4, and 5 represent the MAPE, RMSE, and NRMSE, respectively. Here n denotes the number of data points, Y i is the data set mean, and Ŷ i and Y i represent the forecasted and the actual water outflow, respectively.

Models overall performance
The results for the three proposed models forecasting 1 h, 8 h, 24 h, and 7 days ahead are shown in Table 4. It can be observed that the forecast in all time horizons have a relatively better overall performance when RT, and HYB models are deployed compared to the simple linear SARIMA models. On average, the MAPE for RT model was 15% to 25% less compared to the SARIMA models, forecasting 1 h, 8 h, 24 h, and 7 days ahead. Likewise, HYB models had a MAPE of 35% to 70% less than SARIMA models forecasting the same four time horizons ahead. As can be seen, the nonlinear models have outperformed the linear SARIMA models. That is due to the following two reasons. First, the utility's outflow has a nonlinear relationship over service time, where the change in water demand per unit time is variable. RT and HYB models are naturally nonlinear models that have a higher capability in capturing nonlinear patterns in a time series. These models perform extensive computations to extract relationships between parameters in the current, previous, and subsequent time steps. On the other hand, SARIMA models are linear models that linearly approximate a time series with a trend and seasonality regardless of the time series' nature. This process of estimating trend and seasonality parameters grows into a harder task when the time series has a random nonlinear change in patterns. Although, it is apparent that RT and HYB models have the advantage over SARIMA in this study, the time series degree of nonlinearity might change that. With a lower degree of nonlinearity, SARIMA model can better capture the time series' trend and seasonality orders, which is translated to a better performance in the forecasting application. Secondly, RT and HYB models were fed with the correlated inputs mentioned in Table 2 in addition to the water outflow. Meanwhile, SARIMA models were trained and fit with the water outflow time series only. Even though the extra correlated parameters fed to the RT and HYB models are a rearrangement time series of the water outflow or the temporal data, these reproduced time series aided models in extracting interesting nonlinear patterns. Comparing RT and HYB models reveals that hybridizing the RT model has significantly improved its predictivity. Fusing RT model with SOM model has increased the accuracy of forecasting water outflow 1 h, 8 h, 24 h, and 7 days ahead. Table 4 shows that for forecasting water outflow 1 h ahead, MAPE for the RT model dropped by 25%, 41%, 56%, and 58% for HYB-N2, HYB-N3, HYB-N4, and HYB-N5, respectively. Similarly, with a small variance, MAPE for the 8 h, 24 h, and 7 days ahead forecast dropped by 25% to 55%, 16% to 55%, and 17% to 59%, respectively. The SOM clustering model grouped the water outflow dataset into smaller datasets with a smaller space. This has helped the hybridized RT model in extracting patterns from spaces with instances close in intensity.
Inspecting the HYB models divulges that the number of neurons in the SOM models affects its predictivity proportionally. That means HYB models with higher number of neurons performed with less forecasting error. For example, HYB-N5 with 5 neurons in the infused SOM model had a less MAPE than HYB-N4, HYB-N3, and HYB-N2 in all forecasting horizons. Here, increasing the number of neurons reaches a limit where it no longer affects the model performance. In this study, this limit was five neurons, where HYB-6 N model had approximately the same performance as HYB-N5 with a slight decrease of less than 0.5% in MAPE. This asymptote reached by the model is due to having most of datasets patterns explained by other used neurons. In other words, the model at that limit have had reduced data dimensionality to a level where the increase in information gain ratio is negligible.
The last observation that can be drawn from the overall performance results in Table 4 is regarding the time horizon. Here, a forecast time horizon refers to the number of hours ahead a water outflow is to be predicted. Water outflow is a variable function that depends on a plethora of factors. These factors are, but not limited to, the number and growth of the population, consumer type, consumption seasonality, water price, socioeconomic factors, etc. All these factors vary with time and therefore have a level of uncertainty. This can be noticed in the better performance of all proposed models on shorter time horizons. For instance, MAPE increased for all models forecasting 8 h ahead compared to 1 h ahead.

Computational load
Although model overall forecasting performance is a good measure for model accuracy, it does not incorporate model deployment complexity. Therefore, two more measures were calculated and considered for model selection. The first measure is the Akaike Information Criteria (AIC), which penalizes models that use more parameters. AIC consists of two terms, likelihood and number of parameters. The second measure is the time spent during data training steps. As a control of these two measures, all models were built using the same tool. The computational tool used in this research was HP Pavilion TS 14 Notebook PC with a 1.6 GHz Intel Core i5 processor and 8 GB memory. Table 5 displays the results of AIC and training time for the seven proposed models. AIC and training time results reveal that the SARIMA model had the lowest and most preferred performance. SARIMA-24 has had a relatively low AIC due to the shortest data span, 7 days, used to feed the model. SARIMA-168 with a slightly higher AIC and training time ranked second. RT model had a 20 -fold higher AIC, and triple training time compared to SARIMA model. This drastic increase in AIC was due to the extra inputs fed to the model which increased the number of parameters. HYB models had a 10% to 110% higher AIC, and 40% to 135% longer training time compared to the RT model. Again, the addition of the SOM model to the RT model increased the number of parameters which lead to more complex model.

Model application
Models predictivity was investigated using datasets that were not used in the training, testing, and validating of the proposed models. Figure 6 shows the performance of SARIMA, RT, and HYB models forecasting 1 h ahead for a held back dataset of a weekday in November 2017. Similar to the overall performance, nonlinear RT and HYB models have shown better forecasting performance compared to the linear SARIMA model. Also, when RT model performance is compared to HYB models performance, it can be seen that HYB-N5 and HYB-N4 have shown better fits over the tested time span. Here, it can be concluded again that: (1) the RT model performed better when fused with the SOM clustering model, and (2) the HYB model performance improved as the number of clusters in SOM was increased. Figures 7 presents the results of forecasting a weekend day for SARIMA, RT, HYB models 8 h ahead. Again, HYB models performed better than SARIMA and RT models,   Fig. 6, Hybrid models forecasting 8 h ahead have performed better compared to SARIMA and standalone RT models. Also, Hybrid models with more neurons outperformed those with less neurons on the tested dataset and HYB models with higher number of clusters showed the best performance among proposed models.

Concluding remarks
Three models were presented to forecast a water utility outflow 1 h, 8 h, 24 h, and 1 week ahead. The main objective of this study was to investigate the influence of fusing a Supervised and Unsupervised Machine Learning techniques in the application of short-term water demand forecasting. This hybrid model (i.e. HYB) comprised of RT forecasting model and SOM clustering model. The performance of the HYB models was assessed and compared to a standalone RT model, and a SARIMA model. In virtue of its adequate performance, simple structure, and wide use in the application of short-term forecasting, SARIMA model was appended as a baseline model.
The results of this study have highlighted the following concluding remarks: Fusing the RT Supervised Machine Learning model with the SOM Unsupervised Machine Learning model improved models predictivity. HYB models have shown better prediction performance, less forecasting error, when compared to the standalone RT model. The Mean Absolute Percentage Error (MAPE) for HYB models was shown to be 15% to 60% less than the MAPE for the standalone RT model. Increasing the number of clusters in HYB models has led to a significant decrease in forecasting error. For example, the MAPE dropped, on average, by 25% when the number of clusters increased from 4 (for N = 2) to 9 (for N = 3), and by 25% when the number of clusters increased from 9 (for N = 3) to 16 (for N = 4). Including more clusters will eventually not affect the performance after a steady state is reached. This can be seen when HYB models with16 clusters are compared to those of 25 clusters. Nonlinear models (i.e. RT, and HYB) have shown better forecasting performance than the simple linear SARIMA model. However, they are more complicated to build and interpret, and perform the forecast on a slower pace.
To conclude, the HYB model would be the best selection of the investigated models if forecasting accuracy is prioritized over model simplicity. A water utility should consider the implementation of HYB model in order to obtain an accurate forecast. This significant increase in forecasting accuracy could help the water utility meet its demand requirements efficiently, increase its service reliability and customer satisfaction. In particular, water treatment and distribution processes along with maintenance and system development could be improved through optimized pumping schedules and storage. Maintenance and system development could also be improved by having a better understanding of system loads. Decreasing the demand side uncertainties through accurate prediction could also help the water utility avoid over−/under-loading the water supply system. Although this paper only investigated the model performance on shortterm water demand load, the proposed model can also be applied to energy, commercial, and industrial loads.

Nomenclature
The following symbols are used in this chapter: RT = Regression tree SOM = Self-organizing map ANN = Artificial neural network SARIMA = Seasonal autoregressive integrated moving average MAPE = Mean absolute percentage error NRMSE = Normalized root mean squared error SWG = Smart water grids ARIMA = Autoregressive integrated moving average MLP = Multi-layer perceptron RVM = Relevance vector machine ARIMAX = Autoregressive integrated moving average with exogenous variables m3/hr. = Cubic meter per hour am = Ante Meridiem (i.e. Before midday) Q t = The current water outflow PCC = Pearson correlation coefficient K = The cluster number ACF = Autocorrelation function PACF = Partial autocorrelation function hr. = Hour f = Frequency S-24 = SARIMA model with a seasonal period equal to 24 h S-168 = SARIMA model with a seasonal period equal to 168 h n = A number (i.e. 1, 2, 3 … etc.) 2D = Two dimensional I n = Input number n N = Number of neurons HYB-2 N = A hybrid model with 2 neurons in each dimension of the grid HYB-3 N = A hybrid model with 3 neurons in each dimension of the grid HYB-4 N = A hybrid model with 4 neurons in each dimension of the grid HYB-5 N = A hybrid model with 5 neurons in each dimension of the grid RMSE = Root mean squared error Y i = Data set mean b Y i = Forecasted water outflow Y i = Actual water outflow AIC = Akaike information criterion mentioned authors represented a true collaborative effort in this publication. The author(s) read and approved the final manuscript.

Funding
This research was funded mutually by the Union Water Supply Systems and the Turbulence and Energy Lab, Ontario, Canada.

Availability of data and materials
The data used in this study was obtained from a water utility that does not want this proprietary data shared outside of the approved manuscript.