# DBSCAN

SA Engine supports the DBSCAN clustering algorithm through the `dbscan` system model.

System models are loaded with the `system_models:load()` function.

`system_models:load("dbscan");`

The first step is to generate a new instance of DBSCAN for your analysis. This is done with the function `dbscan:generate(Charstring name)`:

`dbscan:generate('my_test');`

The command generates a number of DBSCAN functions with the prefix `my_test:`. These include indexed stored functions that allow fast training and inference on the model, along with functions for populating the model with training data, training the model, and running inference on feature vectors.

To illustrate DBSCAN we will generate random training data points in circular shapes using the following function:

```
create function gen_datapoint(Number rad, Number noise, Number i)
                            -> Vector of Number
  as [rad * sin(i*2*pi()/100)+frand(noise)-noise/2,
      rad * cos(i*2*pi()/100)+frand(noise)-noise/2];
```
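
To make the geometry of `gen_datapoint` concrete, here is a minimal Python sketch of the same formula. It is not SA Engine code, and it assumes that `frand(noise)` behaves like a uniform draw from the interval from 0 to `noise`, which the source does not state explicitly. Under that assumption, subtracting `noise/2` centers the jitter on zero, so every point lands close to a circle of radius `rad`.

```python
import math
import random

def gen_datapoint(rad, noise, i):
    """Python analogue of the OSQL gen_datapoint function.

    Assumption: frand(noise) is modelled as a uniform draw from [0, noise],
    so subtracting noise/2 centres the jitter on zero.
    """
    angle = i * 2 * math.pi / 100            # 100 steps make one full turn
    jitter_x = random.uniform(0, noise) - noise / 2
    jitter_y = random.uniform(0, noise) - noise / 2
    return [rad * math.sin(angle) + jitter_x,
            rad * math.cos(angle) + jitter_y]

# 100 noisy points around the unit circle
points = [gen_datapoint(1.0, 0.2, i) for i in range(100)]
```

With `noise = 0.2`, each coordinate is shifted by at most 0.1, so no point is further than about 0.14 from the ideal circle.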

Try generating a vector of 100 random data points with:

```
//plot: Scatter plot
select Vector of gen_datapoint(1, 0.2, range(100));
```

The training set of data points is stored in the function `my_test:datapoints`. It is populated with the function `my_test:dbscan_add_data`. Let's populate it with two circular shapes, one with radius `1.0` and one with radius `0.5`:

```
my_test:dbscan_add_data(select Stream of gen_datapoint(1.0, 0.2, range(1000)));
my_test:dbscan_add_data(select Stream of gen_datapoint(0.5, 0.2, range(500)));
```

Now that we have populated our dataset with random points we can start by visualizing them as a scatter plot:

```
//plot: Scatter plot
select vector of { "cos(x)": v[1], "sin(x)": v[2] }
  from Vector v, Number n
 where v = my_test:datapoints(n);
```

Now we can train the DBSCAN model by calling the function `my_test:dbscan(Number eps, Number min_nbr)`, where `eps` is the maximum distance between two points for them to be classified as neighbors, and `min_nbr` is the minimum number of neighboring points required to form a cluster. For more details on these parameter settings, please read DBSCAN.

*Example:*

`my_test:dbscan(0.1, 3);`
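
To show how `eps` and `min_nbr` interact, here is a minimal pure-Python sketch of the textbook DBSCAN procedure. It is not SA Engine's implementation, and the names `region_query` and `dbscan` are illustrative only. One assumption to note: the sketch counts a point as its own neighbor when comparing against `min_nbr`; whether SA Engine does the same is not stated in this document.

```python
import math

def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]

def dbscan(points, eps, min_nbr):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED = None
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_nbr:
            labels[i] = -1                 # provisional noise; may become a border point
            continue
        cluster_id += 1                    # i is a core point, so it starts a new cluster
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id     # noise reachable from a core point joins as border
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_nbr:
                queue.extend(j_neighbors)  # j is also a core point, so keep expanding
    return labels

# Two tight groups of three points and one isolated point
points = [(0.0, 0.0), (0.05, 0.0), (0.0, 0.05),
          (1.0, 1.0), (1.05, 1.0), (1.0, 1.05),
          (5.0, 5.0)]
labels = dbscan(points, eps=0.1, min_nbr=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

A smaller `eps` or a larger `min_nbr` makes it harder for points to qualify as core points, which tends to split clusters and produce more outliers.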

The DBSCAN model is now trained, so let's have a look at the result. To visualize the clusters we use a scatter plot where each point is colored by its cluster. The points for each cluster are stored in the function `my_test:clustered_points(Number cluster_id)`. Two additional values in the input vectors to the scatter plot specify the sizes and colors of the points (i.e., each vector can have the format `[x,y,size,color]`). The color is specified as an integer in the interval *[-1,255]*.

*Example:*

```
//plot: Multi plot
{
  "sa_plot": "Scatter plot",
  "size_axis": "none",
  "color_axis": "cluster"
};
select vector of { "cos(x)": v[1],
                   "sin(x)": v[2],
                   "cluster": cid
                 }
  from Number cid, Number pid, Vector v
 where pid in my_test:clustered_points(cid)
   and v = my_test:datapoints(pid);
```

We can see two clusters here, the inner and the outer circle. Let's use this model to classify a stream of 2D data points. This is done with the function `my_test:dbscan_classify(Vector feature_vector, Number eps, Number minpts) -> Number`, which returns -1 if the point is an outlier and the cluster id if it belongs to a cluster.
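
The return contract (a cluster id, or -1 for an outlier) can be illustrated with a small Python sketch. The rule used here is hypothetical and is not taken from SA Engine: a new point is assigned to the most common cluster among the already clustered training points within `eps`, provided there are at least `minpts` such points, and is otherwise reported as an outlier. The function name `classify` and its extra arguments are illustrative only.

```python
import math
from collections import Counter

def classify(feature_vector, trained_points, trained_labels, eps, minpts):
    """Return a cluster id for feature_vector, or -1 if it looks like an outlier.

    Hypothetical rule (an assumption, not SA Engine's documented behaviour):
    collect the labels of clustered training points within eps, require at
    least minpts of them, and return the most common label.
    """
    nearby = [label
              for point, label in zip(trained_points, trained_labels)
              if label != -1 and math.dist(feature_vector, point) <= eps]
    if len(nearby) < minpts:
        return -1
    return Counter(nearby).most_common(1)[0][0]

# A tiny, hand-labelled training set with two clusters
trained_points = [(0.0, 0.0), (0.05, 0.0), (0.0, 0.05),
                  (1.0, 1.0), (1.05, 1.0), (1.0, 1.05)]
trained_labels = [0, 0, 0, 1, 1, 1]

print(classify((0.02, 0.02), trained_points, trained_labels, 0.1, 3))  # 0
print(classify((1.02, 1.02), trained_points, trained_labels, 0.1, 3))  # 1
print(classify((3.0, 3.0), trained_points, trained_labels, 0.1, 3))    # -1
```

Passing the same `eps` and `minpts` values that were used for training keeps the notion of "dense enough" consistent between training and classification.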

We generate a random data set by calling:

`select vector of gen_datapoint(bag(0.5, 1.0), 0.3, iota(1, 200))`

This generates 400 data points with the radii `0.5` and `1.0`. The noise is `0.3`.
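
The count of 400 is consistent with the function being applied to every combination of a radius from `bag(0.5, 1.0)` and an index from `iota(1, 200)`. The Python sketch below reproduces only that counting argument. It assumes that `iota(1, 200)` yields the integers 1 through 200 inclusive, which matches the stated total but is not spelled out in this document.

```python
import itertools

radii = [0.5, 1.0]        # stands in for bag(0.5, 1.0)
indices = range(1, 201)   # stands in for iota(1, 200), assumed inclusive at both ends

# Every (radius, index) combination corresponds to one generated data point
pairs = list(itertools.product(radii, indices))
print(len(pairs))         # 400, i.e. 2 radii * 200 indices
```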

*Simulated classification:*

```
//plot: Multi plot
{
  "sa_plot": "Scatter plot",
  "size_axis": "none",
  "color_axis": "cluster"
};
select vector of { "cos(x)": v[1],
                   "sin(x)": v[2],
                   "cluster": label
                 }
  from Vector v, Number label
 where v in gen_datapoint(bag(0.5, 1.0), 0.3, iota(1, 200))
   and label = my_test:dbscan_classify(v, 0.1, 3);
```

## Save the trained DBSCAN model

If you use SA Studio, either in the cloud or self-hosted, you can save the DBSCAN instance to a model with one of the following functions:

```
dbscan:save_model(Charstring instance, Charstring model)
dbscan:save_model(Charstring instance, Charstring model, Charstring file)
```

The first saves the DBSCAN instance as `master-weights.json` in the user model `model`. The second saves the DBSCAN instance to the file specified by `file`.

Note that `model` must exist in your user model directory. Otherwise the save fails with the error `No model named <model>`.

More information on how to work with models can be found in the Managing models guide and in Working with queries and models in the SA Studio Manual.

## API

All DBSCAN functions are listed in the OSQL API.