The taxi problem is a well-known introductory machine learning problem. The paper explains feature engineering, analysis, and the use of various regression algorithms to solve it. You can use it as a base for many regression and classification problems.

A second study (regression, random forest, XGBoost, i.e., extreme gradient boosting trees).
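As a rough illustration of comparing such models, the sketch below fits a linear regression and a random forest with scikit-learn on synthetic data and reports R^2 on a held-out split. The feature names (`distance_km`, `hour`), the target, and the relationship between them are invented for the example and are not taken from either study. XGBoost is left out to keep the sketch dependency-light; its `XGBRegressor` follows the same fit/score pattern.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for taxi data (assumption: not real trip records).
rng = np.random.default_rng(0)
n_trips = 500
distance_km = rng.uniform(0.5, 20.0, size=n_trips)
hour = rng.integers(0, 24, size=n_trips)
rush_hour = (hour >= 7) & (hour <= 9)

# Invented relationship: target grows with distance, plus a rush-hour bump and noise.
target = 3.0 * distance_km + 5.0 * rush_hour + rng.normal(0.0, 2.0, size=n_trips)

X = np.column_stack([distance_km, hour])
X_train, X_test, y_train, y_test = train_test_split(
    X, target, test_size=0.25, random_state=0
)

models = {
    "linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the training split and record R^2 on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, score in scores.items():
    print(f"{name}: R^2 = {score:.3f}")
```

Scoring on a held-out split rather than on the training data keeps the comparison honest, since tree ensembles can fit the training set almost perfectly.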

Standard error of the estimate - measures the typical distance between the estimated values and the real values.

R^2 - compares the distance of the estimates from the mean with the distance of the real values from the mean; 1 means no error, 0 means the model does no better than always predicting the mean.
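The two measures can be sketched in plain Python as follows. This assumes the common definitions: the standard error of the estimate as the square root of the residual sum of squares divided by the degrees of freedom, and R^2 as one minus the ratio of residual to total sum of squares. The function names, the `n_params` argument, and the sample values are hypothetical.

```python
import math


def standard_error_of_estimate(y_true, y_pred, n_params=2):
    """Square root of the residual sum of squares over the degrees of freedom.

    n_params is the number of fitted parameters (assumed 2 here:
    slope and intercept of a simple linear regression).
    """
    dof = len(y_true) - n_params
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(ss_res / dof)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: 1 is a perfect fit, 0 matches predicting the mean."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only.
y_true = [10.0, 12.0, 15.0, 20.0, 23.0]
y_pred = [11.0, 12.5, 14.0, 19.0, 24.5]

print(round(standard_error_of_estimate(y_true, y_pred), 3))
print(round(r_squared(y_true, y_pred), 3))
```

A model that always predicts the mean of `y_true` gets an R^2 of exactly 0 under this definition, and a model worse than that gets a negative value.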

Note: with regression prediction, it's best to create dummy variables (i.e., binary variables: exists or doesn't exist) from numeric variables that are really category labels, such as turning grid_number into grid_1, grid_2, etc.
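A minimal sketch of that conversion, using `pandas.get_dummies`. The `grid_number` column comes from the note above; the `distance_km` column and all values are made up for the example.

```python
import pandas as pd

# Hypothetical trips: grid_number is a cell label, not a quantity.
trips = pd.DataFrame(
    {
        "grid_number": [1, 2, 3, 2],
        "distance_km": [1.2, 3.4, 0.8, 2.1],
    }
)

# Replace grid_number with one 0/1 column per grid cell: grid_1, grid_2, grid_3.
encoded = pd.get_dummies(trips, columns=["grid_number"], prefix="grid", dtype=int)

print(encoded)
```

Without this step, a linear model would treat grid 3 as "three times" grid 1, which is meaningless for a label. For a linear regression with an intercept, passing `drop_first=True` drops one of the dummy columns and avoids perfectly collinear features.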
