The machine learning regressor

Starting with v0.9.0, a new framework is proposed within EMHASS. It provides a machine learning module to predict values from a csv file using different regression models.

This API provides two main methods:

  • fit: To train a model with the passed data. This method is exposed with the regressor-model-fit end point.

  • predict: To obtain a prediction from a pre-trained model. This method is exposed with the regressor-model-predict end point.

A basic model fit

To train a model use the regressor-model-fit end point.

Some paramters can be optionally defined at runtime:

  • csv_file: The name of the csv file containing your data.

  • features: A list of features, you can provide new values for this.

  • target: The target, the value that has to be predicted.

  • model_type: Define the name of the model regressor that this will be used for. For example: heating_hours_degreeday. This should be an unique name if you are using multiple custom regressor models.

  • regression_model: The regression model that will be used. For now only this options are possible: LinearRegression, RidgeRegression, LassoRegression, RandomForestRegression, GradientBoostingRegression and AdaBoostRegression.

  • timestamp: If defined, the column key that has to be used for timestamp.

  • date_features: A list of ‘date_features’ to take into account when fitting the model. Possibilities are year, month, day_of_week (monday=0, sunday=6), day_of_year, day(day_of_month) and hour

Examples:

runtimeparams = {
    "csv_file": "heating_prediction.csv",
    "features": ["degreeday", "solar"],
    "target": "heating_hours",
    "regression_model": "RandomForestRegression",
    "model_type": "heating_hours_degreeday",
    "timestamp": "timestamp",
    "date_features": ["month", "day_of_week"]
    }

A correct curl call to launch a model fit can look like this:

curl -i -H "Content-Type:application/json" -X POST -d  '{"csv_file": "heating_prediction.csv", "features": ["degreeday", "solar"], "target": "hour", "regression_model": "RandomForestRegression", "model_type": "heating_hours_degreeday", "timestamp": "timestamp", "date_features": ["month", "day_of_week"], "new_values": [12.79, 4.766, 1, 2] }' http://localhost:5000/action/regressor-model-fit

A Home Assistant rest_command can look like this:

fit_heating_hours:
  url: http://127.0.0.1:5000/action/regressor-model-fit
  method: POST
  content_type: "application/json"
  payload: >-
    {
    "csv_file": "heating_prediction.csv",
    "features": ["degreeday", "solar"],
    "target": "hours",
    "regression_model": "RandomForestRegression",
    "model_type": "heating_hours_degreeday",
    "timestamp": "timestamp",
    "date_features": ["month", "day_of_week"]
    }

After fitting the model the following information is logged by EMHASS:

2024-04-17 12:41:50,019 - web_server - INFO - Passed runtime parameters: {'csv_file': 'heating_prediction.csv', 'features': ['degreeday', 'solar'], 'target': 'heating_hours', 'regression_model': 'RandomForestRegression', 'model_type': 'heating_hours_degreeday', 'timestamp': 'timestamp', 'date_features': ['month', 'day_of_week']}
2024-04-17 12:41:50,020 - web_server - INFO -  >> Setting input data dict
2024-04-17 12:41:50,021 - web_server - INFO - Setting up needed data
2024-04-17 12:41:50,048 - web_server - INFO -  >> Performing a machine learning regressor fit...
2024-04-17 12:41:50,049 - web_server - INFO - Performing a MLRegressor fit for heating_hours_degreeday
2024-04-17 12:41:50,064 - web_server - INFO - Training a RandomForestRegression model
2024-04-17 12:41:57,852 - web_server - INFO - Elapsed time for model fit: 7.78800106048584
2024-04-17 12:41:57,862 - web_server - INFO - Prediction R2 score of fitted model on test data: -0.5667567505914477

The predict method

To obtain a prediction using a previously trained model use the regressor-model-predict end point.

The list of parameters needed to set the data publish task is:

  • mlr_predict_entity_id: The unique entity_id to be used.

  • mlr_predict_unit_of_measurement: The unit_of_measurement to be used.

  • mlr_predict_friendly_name: The friendly_name to be used.

  • new_values: The new values for the features (in the same order as the features list). Also when using date_features, add these to the new values.

  • model_type: The model type that has to be predicted

Examples:

runtimeparams = {
    "mlr_predict_entity_id": "sensor.mlr_predict",
    "mlr_predict_unit_of_measurement": None,
    "mlr_predict_friendly_name": "mlr predictor",
    "new_values": [8.2, 7.23, 2, 6],
    "model_type": "heating_hours_degreeday"
}

Pass the correct model_type like this:

curl -i -H "Content-Type:application/json" -X POST -d '{"new_values": [8.2, 7.23, 2, 6], "model_type": "heating_hours_degreeday" }' http://localhost:5000/action/regressor-model-predict

or

curl -i -H "Content-Type:application/json" -X POST -d  '{"mlr_predict_entity_id": "sensor.mlr_predict", "mlr_predict_unit_of_measurement": "h", "mlr_predict_friendly_name": "mlr predictor", "new_values": [8.2, 7.23, 2, 6], "model_type": "heating_hours_degreeday" }' http://localhost:5000/action/regressor-model-predict

A Home Assistant rest_command can look like this:

predict_heating_hours:
  url: http://localhost:5001/action/regressor-model-predict
  method: POST
  content_type: "application/json"
  payload: >-
   {
    "mlr_predict_entity_id": "sensor.predicted_hours",
    "mlr_predict_unit_of_measurement": "h",
    "mlr_predict_friendly_name": "Predicted hours",
    "new_values": [8.2, 7.23, 2, 6],
    "model_type": "heating_hours_degreeday"
    }

After predicting the model the following information is logged by EMHASS:

2024-04-17 14:25:40,695 - web_server - INFO - Passed runtime parameters: {'mlr_predict_entity_id': 'sensor.predicted_hours', 'mlr_predict_unit_of_measurement': 'h', 'mlr_predict_friendly_name': 'Predicted hours', 'new_values': [8.2, 7.23, 2, 6], 'model_type': 'heating_hours_degreeday'}
2024-04-17 14:25:40,696 - web_server - INFO -  >> Setting input data dict
2024-04-17 14:25:40,696 - web_server - INFO - Setting up needed data
2024-04-17 14:25:40,700 - web_server - INFO -  >> Performing a machine learning regressor predict...
2024-04-17 14:25:40,715 - web_server - INFO - Performing a prediction for heating_hours_degreeday
2024-04-17 14:25:40,750 - web_server - INFO - Successfully posted to sensor.predicted_hours = 3.716600000000001

The predict method will publish the result to a Home Assistant sensor.

Storing CSV files

Standalone container - how to mount a .csv files in data_path folder

If running EMHASS as Standalone container, you will need to volume mount a folder to be the data_path, or mount a single .csv file inside data_path

Example of mounting a folder as data_path (.csv files stored inside)

docker run -it --restart always -p 5000:5000 -e LOCAL_COSTFUN="profit" -v $(pwd)/data:/app/data -v $(pwd)/config_emhass.yaml:/app/config_emhass.yaml -v $(pwd)/secrets_emhass.yaml:/app/secrets_emhass.yaml --name DockerEMHASS <REPOSITORY:TAG>

Example of mounting a single csv file

docker run -it --restart always -p 5000:5000 -e LOCAL_COSTFUN="profit" -v $(pwd)/data/heating_prediction.csv:/app/data/heating_prediction.csv -v $(pwd)/config_emhass.yaml:/app/config_emhass.yaml -v $(pwd)/secrets_emhass.yaml:/app/secrets_emhass.yaml --name DockerEMHASS <REPOSITORY:TAG>

Add-on - How to store data in a csv file from Home Assistant

Change data_path

If running EMHASS-Add-On, you will likley need to change the data_path to a folder your Home Assistant can access. To do this, set the data_path to /share/ in the addon Configuration page.

Store sensor data to csv

Notify to a file

notify:
  - platform: file
    name: heating_hours_prediction
    timestamp: false
    filename: /share/heating_prediction.csv

Then you need an automation to notify to this file

alias: "Heating csv"
id: 157b1d57-73d9-4f39-82c6-13ce0cf42
trigger:
  - platform: time
    at: "23:59:32"
action:
  - service: notify.heating_hours_prediction
    data:
      message: >
        {% set degreeday = states('sensor.degree_day_daily') |float %}
        {% set heating_hours = states('sensor.heating_hours_today') |float | round(2) %}
        {% set solar = states('sensor.solar_daily') |float | round(3) %}
        {% set time = now() %}

          {{time}},{{degreeday}},{{solar}},{{heating_hours}}