# Density-based spatial clustering of applications with noise (DBSCAN)

This model implements the DBSCAN clustering algorithm.

The first step is to generate a new instance of DBSCAN for your
analysis. This is done with the function
`dbscan:generate(Charstring name)`:

```
load_system_model("dbscan");
dbscan:generate('my_test');
```


The command will generate a number of DBSCAN functions with the prefix
`my_test:`. Indexed stored functions are generated to allow fast
training and inference on the model, along with functions for
populating the model with training data, training the model, and
inferring feature vectors.

To illustrate DBSCAN we will generate random training data points in circular shapes using the following function:

```
create function gen_datapoint(Number rad, Number noise, Number i)
                            -> Vector of Number
  as [rad * sin(i*2*pi()/100)+frand(noise)-noise/2,
      rad * cos(i*2*pi()/100)+frand(noise)-noise/2];
```

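
For readers who want to check the geometry outside SA Studio, the following Python sketch mirrors `gen_datapoint`. It assumes that `frand(noise)` draws a uniform random number between 0 and `noise`, so that subtracting `noise/2` centers the jitter around zero; that reading is an assumption. The `rng` parameter is hypothetical and is added only to make runs reproducible.

```python
import math
import random


def gen_datapoint(rad, noise, i, rng=random):
    """Return a noisy 2D point on a circle of radius `rad`.

    The index `i` selects the angle; 100 consecutive indices cover one
    full turn. Assumption: the jitter is uniform in [-noise/2, noise/2].
    """
    angle = i * 2 * math.pi / 100
    jitter_x = rng.uniform(0, noise) - noise / 2
    jitter_y = rng.uniform(0, noise) - noise / 2
    return [rad * math.sin(angle) + jitter_x,
            rad * math.cos(angle) + jitter_y]


# 100 points around the unit circle with noise 0.2
points = [gen_datapoint(1, 0.2, i) for i in range(100)]
```

Under this assumption every point lies within about `noise * 0.71` of the ideal circle, which is why the plotted shape is a ring of visible width and not a thin line.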

Try generating a vector of 100 random data points with:

```
select Vector of gen_datapoint(1, 0.2, range(100));
```


Here we notice that the automatic scaling makes the shape look oval
rather than circular. This can be fixed by changing the visualization to
**Multi plot** and prefixing the query with a **visual formatting**.
When the data is plotted, click the lock in the upper left corner. You
can then grab the small rectangle in the lower right corner and drag it
to resize the plot until the cluster looks circular.

*Example:*

```
{"sa_plot":"Scatter plot"};
select Vector of gen_datapoint(1, 0.2, range(100));
```


See Visualization for further details.

The training set of data points is stored in the function
`my_test:datapoints`. It is populated with the function
`my_test:dbscan_add_data`. Let's populate it with two random circular
shapes:

```
my_test:dbscan_add_data(select Stream of gen_datapoint(1,0.2,range(1000)));
my_test:dbscan_add_data(select Stream of gen_datapoint(0.5,0.2,range(500)));
```


Now that we have populated our dataset with random points we can start by visualizing them as a scatter plot:

```
select vector of { "cos(x)": v[1], "sin(x)": v[2] }
  from Vector v, Number n
 where v = my_test:datapoints(n);
```


Now that we have populated our training dataset with random 2D points,
we move on to training the DBSCAN model by calling the function
`my_test:dbscan(Number eps, Number minPts)`. `eps` is the maximum
distance between two points for them to be classified as neighbors, and
`minPts` is the minimum number of neighboring points required to form a
cluster. For more details on these parameter settings, see DBSCAN.

*Example:*

```
my_test:dbscan(0.1,3);
```

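
To make the roles of `eps` and `minPts` concrete, here is a minimal Python sketch of the textbook DBSCAN algorithm. It is not the implementation behind `my_test:dbscan`. It is a brute-force version for illustration, and the names `region_query` and `dbscan` are chosen here. The sketch counts a point as its own neighbor, as the original algorithm does. Whether the SA Studio model counts neighbors the same way is not stated in this document.

```python
import math


def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (itself included)."""
    return [j for j, q in enumerate(points)
            if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE            # may still become a border point later
            continue
        cluster_id += 1                  # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id   # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:   # j is a core point too: keep expanding
                queue.extend(j_neighbors)
    return labels


# Two tight groups and one isolated point
pts = [(0, 0), (0, 0.05), (0.05, 0), (1, 1), (1, 1.05), (1.05, 1), (5, 5)]
labels = dbscan(pts, eps=0.1, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

A larger `eps` merges nearby groups into one cluster, while a larger `minPts` turns sparse groups into noise.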

The DBSCAN model is now trained, so let's have a look at the
result. To visualize the clusters we use a scatter plot where each point
is colored by its cluster. The points for each cluster id are stored
in the function `my_test:clustered_points(Number cluster_id)`. Two
additional values in the input vectors to the scatter plot specify the
sizes and colors of the points, i.e. each vector can have the format
`[x,y,size,color]`. The color is specified as an integer in the
interval *[-1,255]*.

*Example:*

```
{
  "sa_plot": "Scatter plot",
  "size_axis": "none",
  "color_axis": "cluster"
};
select vector of { "cos(x)": v[1],
                   "sin(x)": v[2],
                   "cluster": cid
                 }
  from Number cid, Number pid, Vector v
 where pid in my_test:clustered_points(cid)
   and v = my_test:datapoints(pid);
```


We can see two clusters here, the inner and the outer circle. Let's
use this model to classify a stream of 2D data points. This is done
with the function
`my_test:dbscan_classify(Vector feature_vector, Number eps, Number minpts) -> Number`,
which returns -1 if the point is an outlier and the cluster id if it
belongs to a cluster.
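
The exact rule used by `my_test:dbscan_classify` is not described here. As an illustration only, the Python sketch below shows one plausible rule that is consistent with DBSCAN: a new point receives the cluster id of the nearest core point within `eps`, and -1 if there is none. The function name `classify` and the small training set are hypothetical.

```python
import math


def classify(point, train_points, train_labels, eps, min_pts):
    """Return the label of the nearest core point within eps, or -1 if none exists."""
    best_label, best_dist = -1, float("inf")
    for q, label in zip(train_points, train_labels):
        d = math.dist(point, q)
        if label == -1 or d > eps or d >= best_dist:
            continue
        # q only counts if it is a core point, i.e. has enough neighbors itself
        n_neighbors = sum(1 for r in train_points if math.dist(q, r) <= eps)
        if n_neighbors >= min_pts:
            best_label, best_dist = label, d
    return best_label


# Hypothetical training set that has already been clustered (-1 marks noise)
train_points = [(0, 0), (0, 0.05), (0.05, 0), (1, 1), (1, 1.05), (1.05, 1), (5, 5)]
train_labels = [0, 0, 0, 1, 1, 1, -1]

print(classify((0.02, 0.02), train_points, train_labels, 0.1, 3))  # 0
print(classify((1.02, 1.0), train_points, train_labels, 0.1, 3))   # 1
print(classify((3.0, 3.0), train_points, train_labels, 0.1, 3))    # -1
```

Under this rule a point close to a training outlier is still reported as an outlier, because noise points never act as core points.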

We generate a random data set by calling:

```
select vector of gen_datapoint(bag(0.5,1),0.3,iota(1,200));
```


This generates 400 data points, 200 for each of the radii 0.5 and 1, with a noise of 0.3.
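
The count of 400 follows if the call is evaluated once for every combination of a radius in `bag(0.5,1)` and an index from `iota(1,200)`, which is what the text above implies. The short Python sketch below spells out that arithmetic. It assumes `iota(1,200)` yields the integers 1 through 200 inclusive.

```python
from itertools import product

radii = [0.5, 1]            # the two values in bag(0.5,1)
indices = range(1, 201)     # assumed meaning of iota(1,200): 1, 2, ..., 200

# One call to gen_datapoint per (radius, index) combination
calls = list(product(radii, indices))
print(len(calls))  # 400
```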

*Simulated classification:*

```
{
  "sa_plot": "Scatter plot",
  "size_axis": "none",
  "color_axis": "cluster"
};
select vector of { "cos(x)": v[1],
                   "sin(x)": v[2],
                   "cluster": label
                 }
  from Vector v, Number label
 where v in gen_datapoint(bag(0.5,1),0.3,iota(1,200))
   and label = my_test:dbscan_classify(v, 0.1, 3);
```


# Save the trained DBSCAN model

You can save the DBSCAN instance `my_test` to a model by calling the function
`dbscan:save_model(Charstring instance, Charstring model)`. This will save the
DBSCAN instance as `master-weights.json` in the user model `model`.

Note that `model` must exist in your user model directory.
Otherwise the save will give the error `No model named <model>`.

For example, to save `my_test` into the user model `test`:

```
dbscan:save_model("my_test", "test");
```


Note: If you look into the generated files for the model `test`, you will find
a file named `master-weights.json`, which contains all the DBSCAN data for your
DBSCAN instance.

If you do not want to save the DBSCAN instance in the file `master-weights.json`,
you can use the function
`dbscan:save_model(Charstring instance, Charstring model, Charstring file)`,
which will save it in the file `file` under the model `model` instead.