Identifying sounds

This tutorial shows how to identify sounds in an audio stream. It uses the built-in database to store pre-recorded feature vectors of a number of sounds. A continuous query (CQ) then identifies and reports when the audio stream contains any of these sounds.

The result of this tutorial was demonstrated in the video presented on 2021-08-26 at the IEEE Smartcomp 2021 conference.

To begin with, let's create a table of feature vectors in the local database of our stream server (or edge client if you are using an Android device connected to the stream server):

create table sound_anomaly(id Charstring,
                           fv Vector of Integer,
                           unique(id));

Our approach to identifying unusual sounds is to extract feature vectors from the sound stream and to store the feature vector of each sound anomaly, named by id, in sound_anomaly. We identify a sound anomaly when a feature vector computed from the live audio stream is close to some pre-recorded feature vector in the table. We use euclid(v,w) to measure the distance between feature vectors v and w.

In the previous part of the tutorial we computed vectors of sound intensities for different frequencies in the audio stream by using rfft(). In this tutorial we define a feature vector as the indices of the k strongest signals in each frequency spectrum by the function:

create function top_indices(Vector of Number v, Integer k)
                          -> Vector of Integer
  /* Indices of `k` largest elements in `v` */
  as select Vector of i
       from Integer i, Number x
      where x = v[i]
      order by x desc
      limit k;

The body of the function top_indices(v,k) is an example of an OSQL vector query, that is, a query returning a vector. Vector queries are marked by the keyword Vector just after the select. Notice how order by and limit are used to form a vector of the indices i of the strongest frequencies in v. Test it by calling:

top_indices([1,5,2,7], 3);
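
To make the intended semantics concrete, here is a minimal sketch of the same selection in plain Python (not OSQL). It assumes that the vector indices in the tutorial are 1-based; the `base` parameter is hypothetical and only there so the 0-based variant can be compared. How OSQL orders equal values is not stated in the tutorial, so the tie handling below is also an assumption.

```python
def top_indices(v, k, base=1):
    """Return the indices of the k largest elements of v, strongest first.

    `base` is the index of the first element; base=1 is an assumption
    about how the tutorial's vectors are indexed.
    """
    # Sort positions by value, largest first; equal values keep their
    # original relative order because Python's sort is stable.
    order = sorted(range(len(v)), key=lambda i: v[i], reverse=True)
    return [i + base for i in order[:k]]


print(top_indices([1, 5, 2, 7], 3))          # [4, 2, 3] with 1-based indices
print(top_indices([1, 5, 2, 7], 3, base=0))  # [3, 1, 2] with 0-based indices
```

Note that the result lists positions, not values: the largest value 7 sits at position 4 (1-based), so 4 comes first.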

The next step is to define a function to produce a stream of feature vectors from the live raw audio stream:

create function sound_features() -> Stream of Vector of Integer
  /* Produce a sound feature vector by selecting the indices of the
     five strongest signals in the frequency spectrum of the audio
     stream */
  as select Stream of top_indices(e,5)
       from Vector of Number e
      where e in rfft(audio(512,16000))
        and max(e) > 0.1;

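Notice the noise cancellation by the condition max(e) > 0.1: a spectrum is passed on only if its strongest component exceeds 0.1, so quiet background windows produce no feature vector.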
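
The pipeline can be imitated offline in plain Python to see what the threshold does. This is only a sketch under stated assumptions: it assumes that audio(512,16000) means windows of 512 samples at 16000 Hz, it replaces rfft() with a naive magnitude spectrum scaled by 1/N (the scaling used by the real rfft() is not given in the tutorial), and it uses 0-based bin indices. The helper `magnitude_spectrum` and the test signal are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(window):
    """Naive real-input DFT: magnitudes of bins 0..N/2, scaled by 1/N."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * b * t / n)
                for t, x in enumerate(window))) / n
        for b in range(n // 2 + 1)
    ]


def sound_features(samples, window_size=512, k=5, threshold=0.1):
    """Yield the k strongest bin indices (0-based) of each loud window."""
    for start in range(0, len(samples) - window_size + 1, window_size):
        spectrum = magnitude_spectrum(samples[start:start + window_size])
        if max(spectrum) > threshold:  # noise gate: skip quiet windows
            ranked = sorted(range(len(spectrum)),
                            key=lambda i: spectrum[i], reverse=True)
            yield ranked[:k]


# Hypothetical input: one silent window followed by one window of a 1 kHz tone.
rate = 16000
silence = [0.0] * 512
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(512)]
features = list(sound_features(silence + tone))
print(len(features), features[0][0])  # 1 32
```

The silent window is dropped by the threshold, and the tone lands in bin 1000 * 512 / 16000 = 32. The remaining four indices of that feature vector are rounding noise, which illustrates why a pure tone alone makes a weak five-element feature.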

Exercise

Test sound_features() visualized as a bar plot while whistling or clapping your hands! What happens when there is noise cancellation? Did it work? If not, fix it! Also visualize as text and note what the feature vectors of whistling and humming look like.

Now we define a function that finds the identity id of the feature vector in table sound_anomaly that is closest to a feature vector sound_fv and lies within the distance radius from it:

create function closest_anomaly(Vector of Integer sound_fv, Real radius)
                              -> Charstring
  /* Get the anomaly closest to the audio feature vector `sound_fv`
     within the distance `radius` */
  as select s.id
       from Number dist, sound_anomaly s
      where dist = euclid(sound_fv,s.fv)
        and dist <= radius
      order by dist, s.id
      limit 1;
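
The lookup is a nearest-neighbour search with a distance cut-off. As a rough illustration in plain Python (not OSQL), the sketch below uses a dictionary as a stand-in for the sound_anomaly table; the `anomalies` parameter and the feature vectors are hypothetical, and returning None for "no match" is a choice made for this sketch only.

```python
import math


def closest_anomaly(sound_fv, radius, anomalies):
    """Return the id of the stored vector nearest to sound_fv, or None.

    `anomalies` maps id -> feature vector and stands in for the
    sound_anomaly table.
    """
    candidates = [
        (math.dist(sound_fv, fv), anomaly_id)
        for anomaly_id, fv in anomalies.items()
    ]
    within = [c for c in candidates if c[0] <= radius]
    # Tuples compare by distance first, then by id, mirroring
    # `order by dist, s.id` followed by `limit 1`.
    return min(within)[1] if within else None


# Hypothetical feature vectors, for illustration only.
table = {"Whistle": [40, 41, 39, 42, 38], "Hum": [5, 6, 4, 7, 3]}
print(closest_anomaly([41, 40, 39, 43, 38], 50, table))       # Whistle
print(closest_anomaly([200, 201, 199, 202, 198], 50, table))  # None
```

Sorting on the id as a second key makes the result deterministic when two stored vectors are equally close.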

To test closest_anomaly(sound_fv, radius) we first need to populate the table sound_anomaly. Record the sound anomaly Whistle with this database update:

insert into sound_anomaly values("Whistle", first(sound_features()));

The function first(s) returns the first element in stream s.

Note

Wait a little bit after you have started the above update before you whistle. If the update returns before the whistle happens, the feature vector of some disturbance has been recorded by mistake.

To redo the recording, use update:

update sound_anomaly set fv=first(sound_features()) where id="Whistle";

Repeat the recording for humming:

insert into sound_anomaly values("Hum",first(sound_features()));

Make sure that the table sound_anomaly has the correct feature vectors for both whistling and humming:

select * from sound_anomaly;

Now we can test our model by running this continuous query:

closest_anomaly(sound_features(),50);

See Streams in the reference documentation for more on functions over streams. For example, how do you remove the repetitions of detected features in the query above?