The electricity consumption of buildings is a significant factor in energy usage and cost management. Predicting the electricity consumption can help to optimize energy usage and minimize costs. It can also help to identify potential problems with energy usage and enable building managers to take corrective actions.
I. Data Collection:
Data collection is an essential step in predicting building electricity consumption. The aim is to identify and collect relevant data from various sources such as building energy management systems, utility bills, weather data, and occupancy data.
We are going to use some building characteristics as input data to predict consumption. Here’s a sample:
The data should be pre-processed to ensure that it is accurate and relevant. This involves cleaning and filtering the data, handling missing values, and ensuring that the data is in a format that can be used for analysis. The quality of the data is critical for accurate predictions.
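As a rough illustration of the cleaning steps described above, the sketch below drops records whose consumption is missing or non-positive and fills missing feature values with the column median. The column names (floor_area_m2, year_built, kwh) and all values are hypothetical, chosen only for the example, and median imputation is just one possible choice.

```python
from statistics import median

# Hypothetical building records; column names and values are assumptions.
raw = [
    {"floor_area_m2": 1200.0, "year_built": 1995, "kwh": 150000.0},
    {"floor_area_m2": None,   "year_built": 2005, "kwh": 98000.0},
    {"floor_area_m2": 800.0,  "year_built": 1980, "kwh": None},
    {"floor_area_m2": 950.0,  "year_built": 2010, "kwh": -5.0},
    {"floor_area_m2": 2000.0, "year_built": 1970, "kwh": 310000.0},
]

def clean(records, target="kwh"):
    """Filter unusable rows, then impute missing feature values."""
    # 1) Keep only rows with a known, positive target value.
    rows = [dict(r) for r in records
            if r[target] is not None and r[target] > 0]
    if not rows:
        return rows
    # 2) Fill missing feature values with the median of the observed ones.
    features = [name for name in rows[0] if name != target]
    for col in features:
        observed = [r[col] for r in rows if r[col] is not None]
        fill = median(observed)
        for r in rows:
            if r[col] is None:
                r[col] = fill
    return rows

cleaned = clean(raw)
for row in cleaned:
    print(row)
```

Whether to drop or impute a given field depends on how much data is missing and why, so the thresholds and strategy here should be treated as placeholders.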
II. Feature Engineering:
Feature engineering involves identifying features that can affect electricity consumption, such as temperature, humidity, time of day, day of the week, occupancy, and equipment usage. The aim is to extract, transform, and select features, for example by reducing dimensionality with principal component analysis (PCA) or by applying feature selection algorithms. This step helps the model capture the relevant patterns and trends in the data. Normalization is also important so that features are on the same scale, which matters most for distance-based models such as KNN.
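To make the normalization and PCA steps more concrete, here is a minimal sketch using NumPy. The feature matrix is synthetic (random temperature, humidity, hour, and occupancy columns are assumptions for illustration), and PCA is computed directly from the singular value decomposition rather than through a dedicated library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic features (assumed): temperature, humidity, hour of day, occupancy.
X = np.column_stack([
    rng.normal(20, 8, 200),
    rng.normal(55, 15, 200),
    rng.integers(0, 24, 200),
    rng.integers(0, 300, 200),
]).astype(float)

def standardize(X):
    """Rescale each column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X - mean) / std

def pca(X_scaled, n_components):
    """Project centered data onto its top principal components via SVD."""
    _, S, Vt = np.linalg.svd(X_scaled, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)  # share of variance per component
    return X_scaled @ Vt[:n_components].T, explained[:n_components]

X_scaled = standardize(X)
X_reduced, ratio = pca(X_scaled, n_components=2)
print(X_reduced.shape, ratio)
```

On real data, the mean and standard deviation would normally be computed on the training split only and then reused for the test split, so that no information leaks from the test set.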
III. Model Selection:
Model selection involves choosing an appropriate algorithm for predicting electricity consumption, such as linear regression, decision trees, or neural networks. The aim is to train and test several models to find the most accurate and efficient one for the prediction task. Accuracy can be evaluated with performance metrics such as mean squared error (MSE) or the coefficient of determination (R-squared), computed on data the model did not see during training. Hyperparameter tuning can further improve performance.
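The two metrics mentioned above are simple to compute by hand, which can help when interpreting them. The sketch below implements both in plain Python; the consumption values are made up for the example.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up consumption values (kWh) and predictions, for illustration only.
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 210.0, 240.0]

print(mse(y_true, y_pred))        # every error is 10, so the MSE is 100.0
print(r_squared(y_true, y_pred))  # 1 - 400 / 12500
```

MSE is expressed in squared units of the target, so it is hard to compare across datasets, whereas R-squared is unitless: 1 means perfect predictions and 0 means no better than predicting the mean.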
Not every dataset carries enough signal for ML models to be worthwhile. To check whether modeling is useful, we first compute a baseline for the dataset (for example, always predicting the mean consumption) and its R² score. If the baseline scores R² < 0 or R² << 1, there is room for improvement, and it is a good idea to train different ML models and see which one performs best against that baseline.
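One common way to build such a baseline, sketched below, is to predict the training-set mean for every test sample. This is an assumption about what "baseline" means here, and the numbers are invented for the example.

```python
from statistics import mean

def r2_score(y_true, y_pred):
    """Coefficient of determination computed on the given samples."""
    m = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Invented consumption values, split into train and test sets.
y_train = [120.0, 180.0, 95.0, 210.0, 160.0, 140.0]
y_test = [130.0, 200.0, 110.0]

# Baseline: ignore all features and always predict the training mean.
baseline_value = mean(y_train)
baseline_pred = [baseline_value] * len(y_test)
baseline_r2 = r2_score(y_test, baseline_pred)

print(baseline_r2)  # at or slightly below zero for a constant predictor
```

Any candidate model should beat this score on the same test set; if none does, the features probably carry little information about consumption.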
Based on the comparison of linear regression, K-Nearest Neighbors (KNN), and XGBoost models for predicting building energy consumption, the XGBoost model was found to be the best-performing model.
XGBoost is a powerful machine learning algorithm that is particularly effective in handling complex and high-dimensional datasets. It is a type of gradient boosting algorithm that uses a combination of decision trees to make predictions. This algorithm can capture non-linear relationships between variables and handle missing values effectively.
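To give an intuition for how gradient boosting combines decision trees, here is a deliberately simplified sketch in NumPy. It is not XGBoost itself: it uses one-split trees (stumps) on a single synthetic feature and plain squared-error residuals, whereas XGBoost adds regularization, second-order gradient information, and many optimizations. All names and data here are assumptions for illustration.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the threshold on a 1-D feature that best splits the residuals."""
    best = None
    for thr in np.unique(x)[:-1]:  # skip the max so both sides are non-empty
        left, right = residual[x <= thr], residual[x > thr]
        err = (((left - left.mean()) ** 2).sum()
               + ((right - right.mean()) ** 2).sum())
        if best is None or err < best[0]:
            best = (err, thr, left.mean(), right.mean())
    _, thr, left_value, right_value = best
    return thr, left_value, right_value

def fit_boosting(x, y, n_rounds=100, learning_rate=0.3):
    """Repeatedly fit a stump to the current residuals and add it in."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        thr, lv, rv = fit_stump(x, y - pred)
        pred = pred + learning_rate * np.where(x <= thr, lv, rv)
        stumps.append((thr, lv, rv))
    return base, stumps, learning_rate

def predict(model, x):
    """Sum the base value and the scaled contribution of every stump."""
    base, stumps, lr = model
    pred = np.full(len(x), base)
    for thr, lv, rv in stumps:
        pred = pred + lr * np.where(x <= thr, lv, rv)
    return pred

# Synthetic non-linear relationship between one feature and consumption.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = 100.0 + 50.0 * np.sin(x) + rng.normal(0.0, 5.0, size=200)

model = fit_boosting(x, y)
mse_boost = float(np.mean((y - predict(model, x)) ** 2))

# Straight-line fit on the same data, for comparison (training error only).
slope, intercept = np.polyfit(x, y, 1)
mse_linear = float(np.mean((y - (slope * x + intercept)) ** 2))
print(mse_boost, mse_linear)
```

Each round corrects part of what the previous rounds got wrong, which is how an ensemble of very weak trees ends up following a curve that a single straight line cannot. The errors printed here are measured on the training data, so they say nothing about generalization.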
In contrast, linear regression is a simple model that assumes a linear relationship between the input variables and the output variable. It may not perform well when there are non-linear relationships between variables, and it may also be sensitive to outliers in the data.
KNN is a non-parametric algorithm that predicts from the training examples most similar to the input. However, it may perform poorly on high-dimensional datasets, where distances between points become less informative (the “curse of dimensionality”).
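The contrast between a linear model and KNN can be seen on a small synthetic example. The sketch below implements both with NumPy and compares their test error on a non-linear, one-dimensional relationship; the data is generated for illustration and does not reproduce the comparison reported in this article.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5):
    """Average the targets of the k nearest training points (Euclidean)."""
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def linear_predict(X_train, y_train, X_query):
    """Ordinary least squares with an intercept term."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_query)), X_query]) @ coef

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

# Synthetic data: consumption follows a sine curve of one feature plus noise.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 10.0, size=(300, 1))
y = 100.0 + 50.0 * np.sin(X[:, 0]) + rng.normal(0.0, 5.0, size=300)

X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

knn_mse = mse(y_test, knn_predict(X_train, y_train, X_test, k=5))
lin_mse = mse(y_test, linear_predict(X_train, y_train, X_test))
print(knn_mse, lin_mse)
```

With a single informative feature, KNN follows the curve well while the straight line cannot. Adding many irrelevant features would dilute the distances KNN relies on, which is the high-dimensional weakness described above.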
IV. Conclusion:
In summary, the XGBoost model outperformed linear regression and KNN in predicting building energy consumption because it is a powerful algorithm that can handle complex and high-dimensional datasets, capture non-linear relationships between variables, and effectively handle missing values.