Working with recorded data
Introduction
In the model development phase it is often inconvenient to work directly on data streams from devices in active production. You need to be able to replay scenarios and to have data that represents both the general cases and the edge cases you want your model to handle. Being able to work on both synthetic and recorded data is therefore often a requirement for a smooth and rigorous model development process and for reliable model performance.
Working with recorded data on CAN buses offers extra possibilities and is described in the CAN bus module documentation.
Basic recording and replaying of data
The simplest way to record data is to use
csv:write_file(Charstring file, Stream of Vector s)->Boolean. You
provide it with a stream of vectors and it saves the vectors as CSV to
a file (one vector per row).
To write the values 1-10 to a CSV file we can simply make vectors of
the numeric elements in the stream generated by the synthetic stream
generator diota(pace,l,u).
Example:
// Write query output to CSV
csv:write_file(sa_home() + "example_1.csv",
               (select Stream of [n]
                  from Integer n
                 where n in diota(0.1,1,10)))
If you want to send the elements to the output window while saving
them, you can provide the number 1 as the argument feedback to
csv:write_file(Charstring file, Number feedback, Stream of Vector s)->Stream of Vector.
This query will overwrite the file example_1.csv saved in the previous
query:
// Verbose write to CSV
csv:write_file(sa_home() + "example_1.csv",
               1,
               (select Stream of [n]
                  from Integer n
                 where n in diota(0.1,1,10)))
Replay the data by running the
csv:file_stream(Charstring file)->Stream of Vector function:
// Replay data from CSV
csv:file_stream(sa_home() + "example_1.csv")
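If you want to inspect or produce such files outside sa.engine, the one-vector-per-row layout can be sketched in Python. This is only an illustration, assuming the file is plain comma-separated text with integer values; the helper names write_vectors and read_vectors are hypothetical and not part of sa.engine.

```python
import csv
import io


def write_vectors(f, vectors):
    """Write each vector as one CSV row."""
    writer = csv.writer(f)
    for vec in vectors:
        writer.writerow(vec)


def read_vectors(f):
    """Yield each CSV row back as a vector of integers."""
    for row in csv.reader(f):
        yield [int(x) for x in row]


# An in-memory buffer stands in for example_1.csv in this sketch.
buf = io.StringIO()
write_vectors(buf, [[n] for n in range(1, 11)])
buf.seek(0)
replayed = list(read_vectors(buf))
```

Using a real file instead of the buffer only requires passing a file opened with newline="" to both helpers.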
Since csv:write_file() takes a stream of vectors as argument we can
save vectors of any dimension. This query saves three-dimensional
vectors built from three different streams using pivot() (with -1 as
the default value for each element).
// Write 3D vectors to CSV
csv:write_file(sa_home() + "example_2.csv",
               (select Stream of v
                  from Vector v, Stream of Integer s1,
                       Stream of Real s2, Stream of Real s3
                 where s1 = diota(0.1,1,10)
                   and s2 = heartbeat(0.1)
                   and s3 = simstream(0.1)
                   and v in pivot([s1,s2,s3], [-1,-1,-1])
                 limit 10))
Verify the results by replaying the saved file.
//plot: Line Plot
csv:file_stream(sa_home() + "example_2.csv")
Data with timestamps
You may want time information in the saved data. This is easily done
by saving the current time stamp string together with each result
element. The expression utc_time() returns the current wall time as a
UTC time stamp string. To save a vector with the UTC time stamp string
as its first value, we compute the current UTC time stamp string each
time a new element is returned from the result.
csv:write_file(sa_home() + "example_3.csv",
               (select Stream of [utc_time(), v1, v2]
                  from Vector v, Integer v1, Integer v2
                 where v in pivot([diota(0.1,1,10),diota(0.1,1,10)], [-1,-1])
                   and v = [v1,v2]))
Verify the results by replaying the saved file.
csv:file_stream(sa_home() + "example_3.csv")
You can use parse_iso_timestamp() to convert time stamp strings into
time points when reading the recorded data. To illustrate this we
create a function that reads the recorded data in example_3.csv and
outputs a stream of timestamped vectors.
create function replay_recorded_ts_stream()
                -> Stream of Timeval of Vector
  as select Stream of t
       from Timeval t, Vector v,
            Number v1, Number v2, Charstring tim
      where v in csv:file_stream(sa_home() + "example_3.csv")
        and [tim,v1,v2] = v
        and t = ts(parse_iso_timestamp(tim),[v1,v2])
Run the function by executing the following query:
replay_recorded_ts_stream()
As you can see, the function outputs a stream of the timestamped
vectors recorded in example_3.csv.
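Outside sa.engine, the same split of a recorded row into a time point and a value vector can be sketched in Python. The sketch assumes the time stamp strings are ISO 8601 with a trailing Z for UTC; the sample row and the helper name parse_row are hypothetical and not taken from example_3.csv.

```python
from datetime import datetime, timezone


def parse_row(row):
    """Split [timestamp, v1, v2] into (datetime, [v1, v2])."""
    stamp, *values = row
    # Rewrite a trailing "Z" as an explicit offset so that
    # fromisoformat also accepts it on Python versions before 3.11.
    when = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    return when, [float(x) for x in values]


# Hypothetical row; the real file may format its values differently.
sample_row = ["2024-01-15T10:30:00.250Z", "3", "3"]
when, values = parse_row(sample_row)
```

The result pairs a timezone-aware datetime with the remaining values, which mirrors the role of ts(parse_iso_timestamp(tim),[v1,v2]) in the function above.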
GPS example
As another application example, we'll look at how to replay GPS data
from a recorded file. We have provided a file gps.csv with GPS data
recorded during a drive through Uppsala, Sweden. Download it with the
following query:
http:download_file(
    "https://assets.streamanalyze.com/docs/guides/data/gps.csv",
    {}, sa_home() + "gps.csv")
First we create a function that replays the data at a specified pace.
create function replay_gps_stream(Real pace)
                -> Stream of Vector
  as select Stream of v
       from Vector v
      where v in csv:file_stream(sa_home() + "gps.csv", "read", pace);
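The idea of replaying recorded rows at a fixed pace can be sketched in Python as follows. This is not how sa.engine implements it; the sketch assumes that pace is the number of seconds to wait between consecutive rows, and the function name replay and the sample positions are hypothetical.

```python
import time


def replay(rows, pace, sleep=time.sleep):
    """Yield rows one at a time, pausing `pace` seconds between them."""
    for i, row in enumerate(rows):
        if i > 0:
            sleep(pace)  # no pause before the first row
        yield row


# Hypothetical positions; gps.csv may use a different column layout.
rows = [[59.8586, 17.6389], [59.8590, 17.6395]]

# Record the requested pauses instead of sleeping, to keep the demo fast.
pauses = []
out = list(replay(rows, 0.5, sleep=pauses.append))
```

Passing the sleep function as a parameter keeps the pacing logic testable without real delays.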
We then wrap the GPS values in GeoJSON records to be able to render the drive on a map.
create function geojson_stream(Number pace, Charstring name)
                -> Stream of Record
  as select Stream of geojson:point(p,
                                    {"persistent": true,
                                     "id": name,
                                     "style": {"label": name}})
       from Vector p
      where p in replay_gps_stream(pace);
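For readers unfamiliar with GeoJSON, a standard Point feature can be sketched in Python as below. The exact record that geojson:point() produces is not shown in this guide, so the structure here follows the general GeoJSON format rather than sa.engine's output; the helper name point_feature, the property names, and the coordinates are assumptions for illustration.

```python
import json


def point_feature(lat, lon, name):
    """Build a GeoJSON Feature holding a single Point."""
    return {
        "type": "Feature",
        # GeoJSON positions are ordered longitude first, then latitude.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"id": name, "label": name},
    }


# Hypothetical position in Uppsala.
feature = point_feature(59.8586, 17.6389, "car-01")
encoded = json.dumps(feature)
```

Note the longitude-first ordering, which is a common source of mistakes when converting recorded GPS rows by hand.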
Finally, we start the stream with GeoJSON visualization activated to see how the car drives around in Uppsala.
//plot: Geo JSON
geojson_stream(0.5, "car-01")
The GPS positioning is a bit jittery at first due to low initial accuracy but improves as the car starts driving towards the city center.
Conclusion
This guide has shown how to record data streams, and how to replay and work with recorded streams. As a next step we recommend reading the Advanced recording examples guide, where we try these concepts on a real edge device.