Machine Learning – Predict Running Distance Based On Weather Condition Using Linear Regression

Machine Learning for Healthy Lifestyle Lovers

Nowadays, governments and NGOs are promoting a healthy lifestyle to raise public awareness. People around me, including my friends and colleagues, spend their time after work jogging, going to the gym, or swimming.

Today, I am going to share how we can predict running distance based on weather conditions, using a ready-made dataset from Kaggle. We will be using Azure Machine Learning Studio, a free yet comprehensive tool with drag-and-drop features, which is handy if you don’t like dealing with lots of commands or scripting. It has been discussed in our previous posts – “Getting Started with Machine Learning on Azure” and “Azure Machine Learning Studio & Data Analysis”.

Our main goal today is to predict running distance based on weather conditions, and the algorithm we will explore is linear regression.

What’s Linear Regression?

Linear regression is a basic and commonly used type of predictive analysis. The overall idea of regression is to examine two things: (1) does a set of predictor variables do a good job in predicting an outcome (dependent) variable? (2) Which variables in particular are significant predictors of the outcome variable, and in what way do they–indicated by the magnitude and sign of the beta estimates–impact the outcome variable? These regression estimates are used to explain the relationship between one dependent variable and one or more independent variables.

(Credit: Static Solutions)
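
To make the idea of beta estimates concrete, here is a minimal sketch in Python with NumPy. It is not part of the Azure ML Studio experiment. The data is made up, and the variable names (`temperature`, `humidity`, `distance`) and the "true" coefficients are assumptions chosen only for illustration. It fits an ordinary least squares model and prints the estimated betas, whose sign and magnitude tell us how each predictor moves the outcome.

```python
import numpy as np

# Hypothetical toy data: temperature (deg C) and humidity (%) as predictors,
# running distance (km) as the outcome. All numbers are made up.
rng = np.random.default_rng(seed=0)
temperature = rng.uniform(10, 35, size=200)
humidity = rng.uniform(30, 90, size=200)

# Assumed "true" relationship, used only to generate the toy outcome.
true_intercept, true_beta_temp, true_beta_hum = 8.0, -0.10, -0.02
noise = rng.normal(0, 0.3, size=200)
distance = (true_intercept
            + true_beta_temp * temperature
            + true_beta_hum * humidity
            + noise)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(temperature), temperature, humidity])

# Ordinary least squares: find beta that minimises ||X @ beta - distance||^2.
beta, *_ = np.linalg.lstsq(X, distance, rcond=None)
intercept, beta_temp, beta_hum = beta

print(f"intercept        = {intercept:.3f}")
print(f"beta_temperature = {beta_temp:.3f}")
print(f"beta_humidity    = {beta_hum:.3f}")
```

In this toy setup both estimated betas come out negative, which reads as "hotter or more humid weather goes with shorter runs", and the size of each beta is the estimated change in kilometres per unit of that predictor.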

Creating Experiment using Azure ML Studio

  1. Get Your Dataset Ready
    Let’s download the dataset from Kaggle Run Activities. I recommend trimming some of the columns to make things more straightforward later. In this case, I kept only the distance data and some weather-related columns.
  2. Upload Dataset to Azure ML Studio
    With the dataset trimmed, we now upload it to Azure Machine Learning Studio using the “+New” button at the bottom left.
  3. Create a Blank Experiment
    To begin our exploration, we create a blank experiment.
  4. Clean and Choose Data Columns
    Data often comes with missing cells or abnormal values. Therefore, we drag the “Clean Missing Data” component from the left panel onto the canvas and apply it to clean the data.
    After that, we choose the columns that we want to use to train the model and, later, to evaluate the trained model.
  5. Split Data for Training and Testing
    By splitting the big chunk of data into two parts, we keep one part for training the model and hold back the other for testing the trained model.
  6. Choose Linear Regression as Algorithm
    There are plenty of machine learning algorithms, and you can find them in the Microsoft Algorithm Cheat Sheet for Machine Learning, which you can download here. For today, we will be using the Linear Regression algorithm to predict the running distance.
  7. Partition and Sample
    Partition and Sample is a useful tool in machine learning because it can reduce the size of a dataset while maintaining the same ratio of values.
  8. Train Model
    Now, we select the column to predict, which in this case is “Distance (Raw)”.
  9. Score Model
    Scoring is widely used in machine learning to mean the process of generating new values, given a model and some new input.
  10. Evaluate Model
    This step determines whether the predictions are accurate, how large the error is, and whether overfitting has occurred, so that we can judge the quality of the model.
  11. Finalize and Save
    Finally, we have finished building the experiment and can save it.

  12. Run and Deploy
    Now, we can run the experiment and deploy it as a predictive web service. I think it is cool that Microsoft hosts everything in the cloud, so we do not need to maintain any infrastructure ourselves.
  13. Test Deployed Web Service
    Given a random set of weather inputs, the web service predicts that a person can run 3.9 km.
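
For readers who prefer code to drag-and-drop, the steps above can be sketched roughly in Python with NumPy. This is only an illustration of the same flow (clean, split, train, score, evaluate, predict), not what Azure ML Studio runs internally. The data is synthetic, and the column names, the 70/30 split, the mean-imputation choice, and the weather values passed in at the end are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical stand-in for the trimmed dataset (steps 1 and 2).
n_rows = 500
temperature = rng.uniform(10, 35, size=n_rows)   # deg C
humidity = rng.uniform(30, 90, size=n_rows)      # %
wind_speed = rng.uniform(0, 30, size=n_rows)     # km/h
# Made-up relationship between weather and distance (km), plus noise.
distance = (9.0 - 0.12 * temperature - 0.02 * humidity - 0.03 * wind_speed
            + rng.normal(0, 0.5, size=n_rows))

features = np.column_stack([temperature, humidity, wind_speed])
# Blank out roughly 5% of the feature cells to imitate missing data.
features[rng.random(features.shape) < 0.05] = np.nan

# Step 4: clean missing data by replacing each NaN with its column mean.
column_means = np.nanmean(features, axis=0)
features = np.where(np.isnan(features), column_means, features)

# Step 5: shuffle the rows and split them 70/30 into train and test sets.
order = rng.permutation(n_rows)
n_train = int(0.7 * n_rows)
train_idx, test_idx = order[:n_train], order[n_train:]


def add_intercept(x):
    """Prepend a column of ones so the model can learn an intercept."""
    return np.column_stack([np.ones(len(x)), x])


# Steps 6 and 8: train a linear regression with ordinary least squares.
beta, *_ = np.linalg.lstsq(
    add_intercept(features[train_idx]), distance[train_idx], rcond=None
)

# Step 9: score the held-out rows, i.e. generate predictions for them.
predicted = add_intercept(features[test_idx]) @ beta

# Step 10: evaluate the predictions against the true distances.
actual = distance[test_idx]
errors = predicted - actual
mae = np.mean(np.abs(errors))
rmse = np.sqrt(np.mean(errors ** 2))
r_squared = 1 - np.sum(errors ** 2) / np.sum((actual - actual.mean()) ** 2)
print(f"MAE = {mae:.2f} km, RMSE = {rmse:.2f} km, R^2 = {r_squared:.2f}")

# Step 13: predict for one new, made-up weather reading.
new_weather = np.array([[28.0, 70.0, 10.0]])  # temperature, humidity, wind
predicted_km = (add_intercept(new_weather) @ beta)[0]
print(f"Predicted distance: {predicted_km:.1f} km")
```

Evaluating on rows the model never saw during training is what lets the error figures hint at overfitting: if the test error is much worse than the training error, the model has probably memorised the training rows rather than learnt the pattern.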

Goal Achieved

By following the above, we have learnt about Linear Regression and how we can use Azure Machine Learning to predict the distance a person can run based on the weather condition.


For more upcoming AI topics, follow me on Twitter @hmheng, subscribe to my channel on YouTube, and find more slides on SlideShare.
