Identifying sounds
This tutorial shows how to identify sounds in an audio stream. It uses the built-in database to store pre-recorded feature vectors of a number of sounds. A continuous query (CQ) is then run that identifies and reports whenever the audio stream contains one of these sounds.
The result of this tutorial was demonstrated in the video presented 2021-08-26 at the IEEE Smartcomp 2021 conference.
To begin with, let's create a table of feature vectors in the local database of our stream server (or edge client if you are using an Android device connected to the stream server):
create table sound_anomaly(id Charstring,
                           fv Vector of Integer,
                           unique(id));
The approach we take for identifying unusual sounds is to build a model that extracts feature vectors from the sound stream, and to store the feature vector of each sound anomaly named id in the table sound_anomaly. We identify a sound anomaly when a feature vector computed from the live audio stream is close to some pre-recorded feature vector in the table. We use euclid(v,w) to measure the distance between feature vectors v and w.
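To make the distance measure concrete, here is a minimal Python sketch of the Euclidean distance, written outside the stream server purely for illustration. The name euclid_distance is hypothetical, and it is an assumption that euclid(v,w) computes the same quantity for two vectors of equal length.

```python
import math

def euclid_distance(v, w):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    # Square root of the sum of squared element-wise differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))

print(euclid_distance([1, 2], [4, 6]))  # 5.0
```

Two feature vectors that list the same indices in the same order have distance 0, and the distance grows as the indices drift apart.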
In the previous part of the tutorial we computed vectors of sound intensities for different frequencies in the audio stream by using rfft(). In this tutorial we define a feature vector as the indices of the k strongest signals in each frequency spectrum, computed by the function:
create function top_indices(Vector of Number v, Integer k)
                          -> Vector of Integer
  /* Indices of `k` largest elements in `v` */
  as select Vector of i
       from Integer i, Number x
      where x = v[i]
      order by x desc
      limit k;
The body of the function top_indices(v,k) is an example of an OSQL vector query, i.e. a query returning a vector. Vector queries are defined by the keyword Vector just after the select. Notice how order by and limit are used to form a vector of the indices i of the strongest frequencies in v. Test it by calling:
top_indices([1,5,2,7], 3);
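As a cross-check of what the query is meant to compute, the same selection can be sketched in Python. This is an illustration only, not the stream server's implementation. The base parameter is hypothetical: base=1 assumes that vector positions are counted from 1, which may or may not match the convention of your system.

```python
def top_indices(v, k, base=1):
    """Positions of the k largest elements of v, strongest first.

    base=1 is an ASSUMPTION that positions are counted from 1;
    pass base=0 for Python-style positions.
    """
    # Sort positions by descending value. Python's sort is stable, so
    # equal values keep their original left-to-right order.
    order = sorted(range(len(v)), key=lambda i: v[i], reverse=True)
    return [i + base for i in order[:k]]

print(top_indices([1, 5, 2, 7], 3))          # [4, 2, 3]
print(top_indices([1, 5, 2, 7], 3, base=0))  # [3, 1, 2]
```

The largest value 7 comes first, followed by 5 and 2, so the result lists positions in order of decreasing strength rather than in ascending position.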
The next step is to define a function to produce a stream of feature vectors from the live raw audio stream:
create function sound_features() -> Stream of Vector of Integer
  /* Produce a sound feature vector by selecting the indices of the
     five strongest signals in the frequency spectrum of the audio
     stream */
  as select Stream of top_indices(e,5)
       from Vector of Number e
      where e in rfft(audio(512,16000))
        and max(e) > 0.1;
Notice the noise cancellation by max(e) > 0.1!
Test sound_features() visualized as a bar plot while whistling or clapping your hands! What happens when there is noise cancellation? Did it work? If not, fix it! Also visualize as text and remember how the feature vectors of whistling and humming look.
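The pipeline above (spectrum, threshold, top indices) can be tried offline with the following Python sketch. It is a rough illustration under several assumptions: magnitude_spectrum is a hypothetical stand-in for rfft() using a naive DFT scaled by the frame length, the frames are synthetic rather than recorded audio, positions are 0-based, and the threshold 0.1 is not necessarily comparable to the scale of the spectrum produced by the stream server.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes of the non-negative frequency bins of a real frame (naive DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))) / n
        for k in range(n // 2 + 1)
    ]

def sound_features(frames, k=5, threshold=0.1):
    """Yield the 0-based positions of the k strongest bins of each loud frame."""
    for frame in frames:
        spectrum = magnitude_spectrum(frame)
        if max(spectrum) > threshold:  # skip frames that are too quiet
            order = sorted(range(len(spectrum)),
                           key=lambda i: spectrum[i], reverse=True)
            yield order[:k]

# Synthetic input (hypothetical): one silent frame and one 2000 Hz tone,
# 64 samples each at 16000 Hz, so the tone falls exactly in bin 8.
n, rate = 64, 16000
silence = [0.0] * n
tone = [math.sin(2 * math.pi * 2000 * t / rate) for t in range(n)]

features = list(sound_features([silence, tone]))
print(len(features))    # 1, the silent frame is filtered out
print(features[0][0])   # 8, the bin of the 2000 Hz tone
```

The silent frame never reaches the feature step, which is the effect the threshold condition is meant to have on background noise.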
Now we define a function to find the identity id of the feature vector in table sound_anomaly that is closest to a feature vector fv and within distance radius of it:
create function closest_anomaly(Vector of Integer fv, Real radius)
                              -> Charstring
  /* Get the anomaly closest to the audio feature vector `fv`
     within the distance `radius` */
  as select s.id
       from Number dist, sound_anomaly s
      where dist = euclid(fv,s.fv)
        and dist <= radius
      order by dist, s.id
      limit 1;
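The lookup logic of the function can be sketched in Python as a nearest-neighbour search with a cut-off. This is an illustration only: the dictionary is a hypothetical stand-in for the table sound_anomaly, the stored vectors are made-up values, and returning None is assumed to correspond to an empty query result.

```python
import math

def closest_anomaly(table, fv, radius):
    """Return the id of the stored vector nearest to fv within radius, else None.

    table maps an id to its stored feature vector (stand-in for sound_anomaly).
    """
    candidates = []
    for sound_id, stored in table.items():
        dist = math.dist(fv, stored)  # Euclidean distance (Python 3.8+)
        if dist <= radius:
            candidates.append((dist, sound_id))
    # Smallest distance first; ties are broken by id, as in "order by dist, s.id".
    return min(candidates)[1] if candidates else None

# Made-up feature vectors, for illustration only.
table = {
    "Whistle": [40, 41, 39, 42, 38],
    "Hum": [5, 6, 4, 7, 3],
}

print(closest_anomaly(table, [40, 41, 39, 42, 37], 50))       # Whistle
print(closest_anomaly(table, [100, 100, 100, 100, 100], 10))  # None
```

A larger radius accepts more distant matches and therefore reports more sounds, at the price of more false identifications.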
To test closest_anomaly(fv, radius) we first need to populate the table sound_anomaly. Store the feature vector of a whistle with the database update:
insert into sound_anomaly values("Whistle", first(sound_features()));
The function first(s) returns the first element in stream s.
Wait a little bit after you have started the above update before you whistle. If the update returns before the whistle happens, the feature vector of some disturbance has been recorded by mistake.
Repeat the update for humming:
insert into sound_anomaly values("Hum",first(sound_features()));
Make sure that the table sound_anomaly has the correct feature vectors for both whistling and humming:
select * from sound_anomaly;
Now we can test our model by running this continuous query:
closest_anomaly(sound_features(),50);
See Streams in the reference documentation for more on functions over streams. For example, how do you remove the repetitions of detected features in the query above?
The next part of the tutorial explains how to develop models running on edge devices.