Machine Learning – Predict Running Distance Based On Weather Condition Using Linear Regression
Machine Learning for Healthy Lifestyle Lovers
Nowadays, governments and NGOs promote healthy lifestyles to raise public awareness. People around me, including my friends and colleagues, spend their time after work jogging, going to the gym, or swimming.
Today, I am going to share how we can predict running distance based on weather conditions using a ready-made dataset from Kaggle. We will be using Azure Machine Learning Studio, a free yet comprehensive tool with drag-and-drop features, which is handy if you don’t like dealing with lots of commands or scripting. It has been discussed in our previous posts, “Getting Started with Machine Learning on Azure” and “Azure Machine Learning Studio & Data Analysis”.
Our main goal today is to predict running distance based on weather conditions, and the algorithm we will explore is linear regression.
What’s Linear Regression?
Linear regression is a basic and commonly used type of predictive analysis. The overall idea of regression is to examine two things: (1) does a set of predictor variables do a good job in predicting an outcome (dependent) variable? (2) Which variables in particular are significant predictors of the outcome variable, and in what way do they–indicated by the magnitude and sign of the beta estimates–impact the outcome variable? These regression estimates are used to explain the relationship between one dependent variable and one or more independent variables.
(Credit: Static Solutions)
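To make the "magnitude and sign of the beta estimates" more concrete, here is a minimal sketch of a one-predictor linear regression fitted with ordinary least squares in plain Python. The temperature and distance values are made up for illustration and are not taken from the Kaggle dataset.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimising squared error for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: temperature in degrees Celsius vs. distance run in km.
temperatures = [10.0, 15.0, 20.0, 25.0, 30.0]
distances = [8.0, 7.0, 6.0, 5.0, 4.0]

b0, b1 = fit_simple_linear_regression(temperatures, distances)
print(f"intercept={b0:.2f}, slope={b1:.2f}")
```

In this toy data the slope comes out negative: each extra degree is associated with about 0.2 km less distance. The sign gives the direction of the relationship and the magnitude gives its strength per unit of the predictor.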
Creating Experiment using Azure ML Studio
- Get Your Dataset Ready
Let’s download the dataset from Kaggle Run Activities. I recommend trimming some of the columns to keep things straightforward later. In this case, I kept only the distance data and some weather-related columns.
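If you prefer to trim the columns with a script rather than a spreadsheet, the idea can be sketched as follows. This is only an illustration: apart from "Distance (Raw)", the column names and the sample rows are assumptions, so adjust them to match the actual file.

```python
import csv
import io

# Hypothetical sample standing in for the downloaded CSV file.
# Only "Distance (Raw)" is a column name taken from this post.
raw_csv = """Activity Id,Distance (Raw),Temperature,Humidity,Wind Speed,Device
1,5.2,24,80,10,watch
2,3.9,31,70,5,phone
"""

KEEP = ["Distance (Raw)", "Temperature", "Humidity", "Wind Speed"]


def trim_columns(csv_text, keep):
    """Return CSV text containing only the columns listed in `keep`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # extrasaction="ignore" silently drops every column not listed in `keep`.
    writer = csv.DictWriter(
        out, fieldnames=keep, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


trimmed = trim_columns(raw_csv, KEEP)
print(trimmed)
```

For a real file, you would read the text from disk instead of the inline string and write the result to a new CSV before uploading it.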
- Upload Dataset to Azure ML Studio
With the trimmed dataset ready, we upload it to Azure Machine Learning Studio using the “+New” button at the bottom left.
- Create a Blank Experiment.
To begin our exploration, we will have to create a blank experiment.
- Clean and Choose Data Columns
Data often comes with missing cells or abnormal values. Therefore, we can drag the “Clean Missing Data” component from the left panel and apply it to clean the data.
After that, we choose the columns that we want to use for training the model and, later, for evaluating the trained model.
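Conceptually, these two steps look something like the sketch below. It assumes missing values are replaced with the column mean, which is only one possible cleaning strategy (this post does not state which option was used), and the "Temperature" and "Device" columns are hypothetical.

```python
def fill_missing_with_mean(rows, column):
    """Return new rows where None in `column` is replaced by the column mean."""
    present = [row[column] for row in rows if row[column] is not None]
    mean = sum(present) / len(present)
    return [
        {**row, column: mean if row[column] is None else row[column]}
        for row in rows
    ]


def select_columns(rows, columns):
    """Keep only the named columns in every row."""
    return [{name: row[name] for name in columns} for row in rows]


# Hypothetical rows; the second one has a missing temperature reading.
rows = [
    {"Distance (Raw)": 5.0, "Temperature": 20.0, "Device": "watch"},
    {"Distance (Raw)": 4.0, "Temperature": None, "Device": "phone"},
    {"Distance (Raw)": 6.0, "Temperature": 30.0, "Device": "watch"},
]

cleaned = fill_missing_with_mean(rows, "Temperature")
selected = select_columns(cleaned, ["Distance (Raw)", "Temperature"])
print(selected)
```

Other strategies, such as dropping incomplete rows, would change the `fill_missing_with_mean` step but leave the column selection the same.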
- Split Data for Training and Testing the Model
By splitting the dataset, we set aside enough data both for training the model and for testing the trained model.
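A random split can be sketched in a few lines of Python. The 70/30 ratio and the fixed seed below are assumptions chosen for illustration, not settings taken from the experiment.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test)."""
    shuffled = list(rows)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in data: ten row identifiers instead of real activity records.
rows = list(range(10))
train, test = split_rows(rows)
print(len(train), len(test))
```

Every row ends up in exactly one of the two parts, so the model is later tested only on data it has not seen during training.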
- Choose Linear Regression as Algorithm
There are plenty of machine learning algorithms to choose from; the Microsoft Algorithm Cheat Sheet for Machine Learning, which is available for download, lists many of them. For today, we will use the Linear Regression algorithm to predict the running distance.
- Partition and Sample
Partition and sample is an important tool in machine learning because it reduces the size of a dataset while maintaining the same ratio of values.
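One way to read "maintaining the same ratio of values" is stratified sampling: take the same fraction from every group so the groups keep their proportions. The sketch below illustrates that idea only; the "Weather" column and the 50% fraction are hypothetical, and it does not claim to reproduce the exact behaviour of the Azure module.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, key, fraction, seed=0):
    """Sample `fraction` of the rows from each group defined by `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    rng = random.Random(seed)
    sample = []
    for members in groups.values():
        # Keep at least one row per group so small groups do not vanish.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical dataset: 60 sunny runs and 40 rainy runs.
rows = (
    [{"id": i, "Weather": "sunny"} for i in range(60)]
    + [{"id": i, "Weather": "rainy"} for i in range(60, 100)]
)

sample = stratified_sample(rows, key="Weather", fraction=0.5)
counts = Counter(row["Weather"] for row in sample)
print(counts)
```

The sample is half the size of the original, yet the sunny-to-rainy ratio stays at 60:40.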
- Train Model
Now, we select the column to predict, which in this case is “Distance (Raw)”.
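Under the hood, training a linear regression means finding the weights that best map the feature columns to the target column. Below is a minimal sketch that solves the least-squares normal equations in plain Python. The feature columns (temperature and humidity) and the synthetic, noise-free targets are assumptions for illustration; this is not necessarily how Azure ML Studio fits the model internally.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def train_linear_regression(features, targets):
    """Return [intercept, w1, w2, ...] from the equations (X^T X) w = X^T y."""
    # A leading 1.0 in every row lets the first weight act as the intercept.
    design = [[1.0] + list(row) for row in features]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical rows: (temperature in degrees Celsius, humidity in %).
features = [(20, 50), (25, 60), (30, 55), (22, 70), (28, 65)]
# Synthetic target standing in for "Distance (Raw)", generated without noise.
targets = [10 - 0.2 * t - 0.02 * h for t, h in features]

weights = train_linear_regression(features, targets)
print([round(w, 3) for w in weights])
```

Because the targets were generated from an intercept of 10 and coefficients of -0.2 and -0.02, the recovered weights land very close to those values. Real data contains noise, so the fit would only approximate the underlying relationship.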
- Score Model
Scoring is widely used in machine learning to mean the process of generating new values, given a model and some new input.
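For a linear model, scoring boils down to multiplying each input by its weight and adding the intercept. A minimal sketch, assuming hypothetical weights for temperature and humidity rather than the ones learned in this experiment:

```python
def score(weights, rows):
    """Apply a linear model [intercept, w1, w2, ...] to each feature row."""
    intercept, *coefs = weights
    return [intercept + sum(w * x for w, x in zip(coefs, row)) for row in rows]


# Hypothetical trained weights: intercept, temperature, humidity.
weights = [10.0, -0.2, -0.02]
# New, unseen inputs: (temperature in degrees Celsius, humidity in %).
new_rows = [(25, 50), (30, 100)]

predictions = score(weights, new_rows)
print(predictions)
```

Each prediction is the model's estimate of the running distance for that row of weather values.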
- Evaluate Model
This step determines whether the predictions are accurate, how much error there is, and whether overfitting occurs, so that we can judge the quality of the model.
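Common regression metrics for this step include the mean absolute error (MAE), the root mean squared error (RMSE), and the coefficient of determination (R²). The sketch below computes them by hand; the actual and predicted distances are made-up numbers for illustration.

```python
import math


def evaluate(actual, predicted):
    """Return MAE, RMSE, and R^2 for paired actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # R^2 compares the model against always predicting the mean.
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Hypothetical distances in km.
actual = [4.0, 5.0, 6.0, 7.0]
predicted = [4.5, 5.0, 5.5, 7.0]

metrics = evaluate(actual, predicted)
print(metrics)
```

A simple check for overfitting is to compute these metrics on both the training data and the test data: if the training error is much lower than the test error, the model has probably memorised the training set.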
- Finalize and Save
Finally, we have finished building the experiment and can save it.
- Run and Deploy
Now, we can run the experiment and deploy it as a predictive web service. I think it is cool that Microsoft hosts everything in the cloud, so we don’t need to maintain the infrastructure ourselves.
- Test Deployed Web Service
Based on a random input of weather conditions, the model predicts that a person can run 3.9 km.
By following the steps above, we have learnt about linear regression and how we can use Azure Machine Learning to predict the distance a person can run based on weather conditions.
Please follow me for more upcoming AI topics:
Follow me @ Twitter: @hmheng
Subscribe to My Channel @ YouTube: http://bit.ly/hmheng_yt
More slides @ SlideShare: https://www.slideshare.net/HiangMengHengMarvin