Linear regression
SA Engine supports multiple linear regression through the linear_regression system model. It can both train a model on a dataset and compute estimates from a given weight vector. Training is done using gradient descent.
The linear_regression system model is loaded with the system_models:load() function.
system_models:load("linear_regression");
To train the regression model we use the linear_regression() function.
//plot: Text
set :s = (select vector of v
            from Vector v
           where v in csv:file_stream(system_models:folder("linear_regression") +
                                      "test/linear_reg2.csv"));
set :r = linear_regression(:s,        // training data
                           0.0000001, // learning rate
                           100,       // max iterations
                           [1],       // indices to use in prediction
                           2);        // index of value to predict
set :w = :r[1];
:r;
In the preceding example, we train the regression model on a set of 2D vectors loaded from the file linear_reg2.csv. We use a learning rate of 0.0000001 for the gradient descent algorithm, cap the number of iterations at 100, and specify that the first value in each vector is used to predict the second value.
The returned vector contains the weight vector and the sum of squared errors (SSE) for the regression model. The weight vector is the first element, which the example stores in :w.
When the data points are 2D vectors [x,y] and the linear regression model y = kx + m predicts y from x (as in the code example above), the weight vector returned by linear_regression() has the format [m,k].
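To make the training step concrete, here is a minimal Python sketch of batch gradient descent for the 2D case y = kx + m. It illustrates the general technique and is not SA Engine's implementation. The function name train_linear_regression, the zero initialization of the weights, and the toy dataset are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def train_linear_regression(
    data: Sequence[Sequence[float]],
    learning_rate: float,
    max_iterations: int,
) -> Tuple[List[float], float]:
    """Fit y = k*x + m to [x, y] rows with batch gradient descent.

    Returns ([m, k], sse), mirroring the [m, k] weight layout described
    in the text. Hypothetical helper, not part of SA Engine.
    """
    m, k = 0.0, 0.0  # assumed starting point
    for _ in range(max_iterations):
        grad_m = 0.0
        grad_k = 0.0
        for x, y in data:
            residual = (k * x + m) - y
            # Partial derivatives of the squared error for this point.
            grad_m += 2.0 * residual
            grad_k += 2.0 * residual * x
        # Step against the gradient of the summed squared error.
        m -= learning_rate * grad_m
        k -= learning_rate * grad_k
    sse = sum(((k * x + m) - y) ** 2 for x, y in data)
    return [m, k], sse


# Toy data generated from y = 2x + 1 (made up for illustration).
points = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0]]
weights, sse = train_linear_regression(points, learning_rate=0.01, max_iterations=5000)
print(weights, sse)
```

On this toy data the sketch converges to weights close to [1, 2] with an SSE near zero. Because the gradient for k scales with x, data with large x-values generally needs a much smaller learning rate to remain stable.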
To apply the regression model to a value we use the lr_estimate() function. It takes the weight vector and the input vector and returns the predicted value according to the regression model.
For example, training produced the weight vector [0.00000100622172569312,0.00180844720146722]. This means that for an x-value of 1500 we should get a y-value of approximately 1500 * 0.00180844720146722 + 0.00000100622172569312 = 2.713. Let's see if that is correct.
lr_estimate(:w,[1500]);
The prediction was as expected.
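The hand calculation above can be reproduced in a few lines of Python. This sketch assumes that an estimate is the intercept plus the weighted sum of the inputs, with the intercept stored first as in the [m,k] format. The helper name estimate is hypothetical and is not part of SA Engine.

```python
from typing import Sequence


def estimate(weights: Sequence[float], inputs: Sequence[float]) -> float:
    """Return weights[0] + sum(weights[i + 1] * inputs[i]).

    Assumed behavior for illustration: intercept first, then one
    coefficient per input value.
    """
    return weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))


# Weight vector [m, k] quoted in the text.
w = [0.00000100622172569312, 0.00180844720146722]
prediction = estimate(w, [1500])
print(round(prediction, 3))  # 2.713
```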
This means that applying the trained regression model to the training data should give predictions along a straight line. The following code plots the training data as a scatter plot and draws the model's regression line between the predictions for the smallest and largest x-values.
//plot: Multi plot
set :min = aggv(in(:s),#"BAG.MIN->OBJECT");
set :max = aggv(in(:s),#"BAG.MAX->OBJECT");
{
  "sa_plot": "Scatter plot",
  "size_axis": "none",
  "color_axis": 3,
  "axis_opts": {"xDomain": [:min[1], :max[1]],
                "yDomain": [:min[2], :max[2]]}
};
-- Draw the model regression line
{"sa_ovr": "", "space": 'data', "mode": 'replace',
 "lines": [{"points": [:min[1], lr_estimate(:w, [:min[1]]),
                       :max[1], lr_estimate(:w, [:max[1]])]}]};
-- Plot the dataset
select vector of y
  from Vector of Number v, Vector of Number y
 where v in :s
   and y = concat(v, [1]);
API
All linear regression functions are listed in the OSQL API.
