Introduction
OSQL extends SQL seamlessly with functions that compute results for given arguments. For example, there is a large library of mathematical functions built into the system and new functions can easily be defined. Each SA Engine system has an internal main-memory database where the functions' definitions are stored.
The simplest kind of query is to call some built-in function.
Example:
sin(3.14/2)*cos(3.14)
Run the function in your browser by clicking the play button play_circle
Documentation of functions and other SA Engine artifacts are stored in
the database as well. You can query the database about the
documentation of a function fn
by calling doc('fn')
. It returns
the documentation of a function as a string.
Example:
doc('sin')
You can edit the examples in the tutorial. Try changing sin
to tan
.
You can enter several OSQL statements by separating them with a ;
.
Example:
doc('atan2');
doc('cos')
If you don't know exactly the name of a function you can also pass a
string pattern to doc()
to return all functions matching the
string pattern in their names. In a string pattern a %
matches any
sequence of characters while a _
matches a single character.
Example:
doc('_tan%')
Functions may return sets of values. In the example the function doc
returned two different documentation strings.
The function sourcecode('fn')
where fn
can be a function name or
a string pattern returns the definitions of the matching functions.
Example:
sourcecode('tan')
The first line of a source code string contains the signature of
the function. It is a string denoting the function's arguments and
results. You can get just the signature of function or string pattern
fn
by calling the system function signature(fn)
.
Example:
signature('atan%')
Functions may have different definitions depending on the types of their arguments. Such functions are said to be overloaded on their types. Overloaded function have several signatures.
Example:
signature('range')
Some functions have corresponding infix operators. For example,
the function plus
has the infix operator +
so the queries 1+2
and plus(1,2)
both return the value 3.
You can call signature(op)
to get the signature of infix operators too.
Example:
signature('^')
The following function call returns the version of SA Engine you are running:
system_version()
Visit Documentation functions for a list of functions used for documenting the system.
Basic math functions are listed in Math.
Session variables
During sessions with SA Engine it is often practical to temporarily
save values of expressions in session variables prefixed with :
by using the set
statement, e.g.:
set :pi = 3.14
Usage:
sin(:pi/2)*cos(:pi)
The session variable :pi
is first assigned the value of 3.14 and
then used twice in a computation.
Session variables are not stored in the local database and disappear when you exit SA Engine. Use database tables to save data in the local database, as explained next.
Database tables
Numbers and other objects used in analysis models can be saved in database tables. Database tables are maintained and queried by standard SQL statements. OSQL is a seamless extension of SQL with functions, mathematical objects, and data streams.
Tables are created by the create table
statement.
Example:
create table Constants(name Charstring,
value Real,
unique (name))
The above statement creates a table called Constants
having two
columns, where values in column name
are strings and column
value
stores floating point numbers. The keyword unique
specifies
that the values of column name
are different in all table rows.
The create
table
syntax of
SQL is more elaborate. For example, strings need not have maximum
lengths in OSQL.
Tables are populated using SQL's insert
into
statement.
Example:
insert into Constants values('pi', 3.14),
('e', 2.71),
('freeze', 0.0),
('boil', 100.0)
Queries
Queries filter out and transform objects fulfilling some specific properties from collections of objects.
A populated table can be queried by SQL's select
statement
.
Examples:
select * from Constants
select value from Constants where name = 'pi'
select sin(value/2)*cos(value) from Constants where name = 'pi'
Tables can be updated using SQL's update
statement
.
Examples:
update Constants set value=asin(1)*2 where name = 'pi'
update Constants set value=exp(1) where name = 'e'
select * from Constants
The following query returns the names and values of the constants in
the table Constants
whose closest integer is 3:
select name, value
from Constants
where round(value)=3
SQL's select
statement
is a special
case of OSQL queries. A restriction with SQL's select statement is that
variables in queries must be bound only to rows in tables. In OSQL
queries, variables can be bound to any kind of objects, e.g. strings,
numbers, vectors, matrices, tensors, bags, streams, records or user
defined objects, . Combined with functions, this makes analytical models
in different domains very easy to express with OSQL.
The SQL subset of OSQL is introduced in the SQL Tutorial. You can skip that tutorial if you are already familiar with SQL.
Unlike SQL, you can make OSQL queries over any kind of objects by
specifying their types in the from
clause of the select statement.
Example: This query returns the set of odd elements among the integers between 10 and 15:
select i
from Integer i
where i in range(10)
and mod(i,2) = 1
Here, the function range(10)
returns the set of integers between 1
and 10, the operator i in range(10)
finds all elements i
in the
set, and the function mod(i,2)
computes the remainder of dividing
i
with 2.
Set queries
By default queries are set queries that return sets of objects where
the order is not guaranteed. For example, the function range(n)
returns the set of numbers from 1 up to n
.
The following query computes the set of divisors for the integer 143:
select d
from Integer d, Integer n
where d in range(2,ceiling(n/2))
and 0 = mod(n,d)
and n = 143
A function applied on a set in a query is called for each element of the set.
Example:
sin(range(10))
Try visualizing the result as a line plot by inserting the row
//plot: Line plot
before the query!
In general sets in OSQL may contain duplicates. Therefore, the term bag is used instead of set to indicate that duplicates are allowed.
Example: This query returns a bag rather than a set:
floor(range(10)/2)
Apply sin(x)
on the result of floor
and visualize by line plot!
Explicit bags can be created using the notation.
Example:
set :mybag = bag(1,2,2,3)
This query returns a bag of the even elements in the session variable
:mybag
:
select i
from Integer i
where i in :mybag
and mod(i,2)=0
Case expressions
Case expressions compute values having different definitions depending on conditions.
Example:
select case when x <= -40 then "too cold"
when x < 0 then "freezing"
when x < 18 then "cold"
when x < 23 then "plesant"
when x < 40 then "hot"
else "too hot"
end
from Real x
where x in 10*range(-4,4)
This illustrates how case expressions can be used for specifying decision rules based on values of variables.
Tests in case
expressions can be functions returning any kind of
value, not only logical values. A test succeeds unless its value is
null
or false
.
Example:
case when sqrt(-1) then 1
when 2 < sqrt(2) then 2
else 0 end
The case expression compute these values without
executing any actions (side effects). This is different from
conditional statements in conventional programming languages such as
if {condition} do-something; else do-something-else;
where different
side effects (actions) are executed based on a condition.
AND, OR and NOT
In the where
clause of a select statement, conditions connected with
and
operators restrict the query result. Analogously, or
clauses
extends the query result.
Example:
The following query returns the set of values of the vector
[1,2,3,4,5]
being less than 3 or larger than 4:
select i
from Integer i
where i in [1,2,3,4,5]
and (i<3 or i>4)
Notice that or
has higher priority than and
in queries, so
parentheses may be needed if both and
and or
are used in the same
query as in the example.
To negate disjunctive logical expressions we can use the expression
not l
that returns true
when l
is true:
Example:
select i
from Integer i
where i in [1,2,3,4,5]
and not i<=sqrt(10)
Queries over sets and bags are documented in Queries.
Defining functions
Analysis models are defined in terms of OSQL functions. You can
define your own OSQL functions by the create function
statement.
Example: The following function converts Celsius degrees to Fahrenheit:
create function ctof(Real c) -> Real
/* Convert a Celsius degree `c` to Fahrenheit */
as c * 9 / 5 + 32
OSQL functions are stored in the local database inside SA Engine.
Let's test ctof
:
ctof(10)
You can use your OSQL functions in queries, e.g.
select ctof(value) from Constants where name = 'boil'
You can also create derived functions, which are functions defined by a query.
Example:
create function constant(Charstring c) -> Real
as select value from Constants where name = c
ctof(constant('boil'))
The seamless OSQL extension of SQL allows variables to be bound
both to rows in tables and other kinds of objects. In the example
the variables value
and name
refer to columns in the table
Constants
, while the variable c
is the function's argument.
This derived function computes the divisors of integer n
:
create function divisors(Integer n) -> Bag of Integer
as select d
from Integer d
where d in range(2,ceiling(n/2))
and 0 = mod(n,d)
divisors(143)
Case expressions can be used in function definitions.
Example: The signum of a number can be defined as:
create function signum(Real x) -> Integer
as case when x > 0 then 1
when x = 0 then 0
else -1 end
signum(-3.1)
The built-in function sign(x)
does the same.
Another example of a function defined in terms of case expressions is the
built in function atan2(y,x)
:
sourcecode("atan2")
atan2(1,2)
Temporal functions
The built-in function local_time()
returns the current local wall
time as an ISO time stamp string.
Example:
local_time()
The current UTC time stamp string is returned by calling utc_time()
.
While local_time()
and utc_time()
return the current wall time as
time stamp strings, the function now()
returns the wall time as a
time point, which is an object of type Timeval
that internally
represents the number of microseconds since new year 1970.
Example:
now()
The function ts(Object o)->Timeval
returns a time stamped
object, which associates the current wall time point with o
also
using the type Timeval
. The object associated with a time stamped
object tv
is accessed by the function value(Timeval tv)->Object
,
while the function timestamp(Timeval tv)->Timeval
returns the time
point itself.
Examples:
set :ts = ts(1);
value(:ts)
timestamp(:ts)
local_time(:ts)
For more about functions over time visit Time in the reference documentation.
Vectors
The data type Vector
represents finite sequences of
objects with the usual [...]
notation.
Examples of vectors:
A vector of four integers:
[1,2,3,4]
A vector of two vectors:
[[1.1,2.0],[3,4]]
A vector of a string and a number:
['Tore',55]
You can access vector element by the notation v[i]
.
Example:
set :v = [2,4,6]
:v[2]
The e in v
operator can be used to get all elements e
in vector
v
.
Example: The following query returns the bag of even numbers in the vector
[1,2,3,2,4]
:
select e
from Integer e
where e in [1,2,3,2,4]
and mod(e,2) = 0
Select vector queries
Unlike a regular set query that returns a bag of values, a select vector query returns a vector of values.
Example: The following select vector query removes 2 from the vector
[1,2,3,2,4]
:
select Vector of a[i]
from Vector a, Integer i
where a = [1,2,3,2,4]
and a[i] != 2
order by i
The order by
clause normally should be present when constructing
vectors with a select vector query in order to exactly specify the
order of the elements in the vector. If no order by
clause is
present the order of the elements in the vector is arbitrarily chosen
by the system being the order that is the most efficient to produce.
Modify the above select vector query to also reverse the order of the
result thus returning [4,3,1]
,
The following select vector query returns a vector of prime numbers in
[197,143,2]
:
select Vector of n
from Integer n
where n in [197,143,2]
and not exists divisors(n)
Vector operators and functions
As for numbers there is a built-in library of vector operators and
functions. The mathematical operators +,-,*,/,^
can be used for
whole vectors or mixes of vectors and scalar values. A dot (.
) in
front of an operator indicates element-wise application of it on each
element of the vectors.
Examples:
The number of elements in a vector;
dim([1, 3, 2])
Vector addition:
[1,4,3,2]+[1,5,6,7]
Change visualization of the vectors to Bar plot
by adding //plot:
Bar plot
before the query.
Element wise vector multiplication:
[1, 2, 3, 4] .* [5, 6, 7, 8]
Combining vectors and scalars.
[1,4,3,2]-[0,2,3,4]+1
The sum of vector elements:
sum([1, 2, 3, 4])
The average of vector elements:
avg([1, 2, 3, 4])
The standard deviation of elements:
stdev([1, 2, 3, 4])
The median of the elements:
median([1, 4, 2, 3])
Go to Vectors in the reference documentation for documentation of vectors.
Arrays
An array is a multi-dimensional cube of numbers. It is
represented by type Array
. Arrays can be created with the function
array(Vector data)->Array
, where the (nested) vector data
contains
its numerical elements.
The number of dimensions of an array is called its rank and
returned by the function rank(Array a)->Integer
. Arrays with rank
one (1D arrays) are called numerical vectors, arrays with rank two
are called matrices, and arrays with higher dimension than two are
called tensors.
Examples:
Set variable :v
to numerical vector:
set :v = array([1,2,3])
Rank of :v
:
rank(:v)
The representation of numerical vectors is much more efficient than
using objects of type Vector
.
Examples:
Set variable :v
to matrix:
set :a = array([[1,2,3],[4,5,6]])
Rank of :a
:
rank(:a)
Value of :
:a[2,3]
Array format
The elements of an array have a format that specifies their
binary representations in memory. For example, elements of type
Integer
are represented as 64-bits integers having the format I64
,
and type Real
has the format F64
. The first argument of the
function array(Charstring fmt,Vector data)->Array
specifies the
elements' format.
Example: This call creates a 1-dimensional array of 16-bit integers:
array('I16',[1,2,3,4])
When called with one argument, array(Vector data)->Array
the
element format will be the most general type in data
, where
real numbers are considered more general than integers,
Examples:
array([1,2,3,4])
array([1.1,2,3,4])
Array shape
An array also has a shape being a vector of the number of elements
in each of its dimensions. The shape is computed by the function
shape(Array a)->Vector of Integer
.
Examples:
shape(array([1,2,3]))
shape(array([[1,2,3],[4,5,6]]))
Array operators and functions
As for vectors there is a built in library of array functions and operators.
Examples:
Numerical vector addition:
array([1, 2, 3, 4]) + array([5, 6, 7, 8])
The total sum of array elements:
sum(array([1, 2, 3, 4]))
sum(array([[1, 2], [3, 4]]))
Array addition:
array([[1,2],[3,4]]) + array([[2,3],[6,7]])
Element-wise multiplication:
array([[1,2],[3,4]]) .* array([[2,3],[6,7]])
The function dot(Array u, Array v)->Number
computes the scalar
product of the elements in arrays u
and v
of any shape.
Examples:
dot(array([1,2,3]),array([4,5,6]))
dot(array([[1,2],[3,4]]),array([[4,5],[6,7]]))
The function matmul(Array a, Array b)->Array
computes the array
multiplication of matrices a
and b
.
Example:
matmul(array([[1,2,3]]),array([[1,2],
[3,4],
[5,6]]))
Select array queries
Analogous to set and select vector queries, a select array query returns a single array as result.
Example: The following query returns a 4x5 array containing
integers elements e
with format U8
where :
select Array[i..4,j..5] of U8 e where e = i + j
Visit Arrays and tensors for more detailed descriptions of arrays.
Aggregate functions
Aggregate functions compute a single value from the elements in bags, vectors or arrays.
The aggregate function sum()
adds the elements in a bag, vector or array:
Examples:
sum(range(200)/2)
sum(array([1, 2, 3, 4]))
sum(array([[1, 2], [3, 4]]))
sum([1, 2, 3, 4])
The function avg()
computes the mean value of the elements in a bag,
vector or array.
Examples:
avg(array([1, 2, 3, 4]))
avg(array([[1, 2], [3, 4]]))
avg([1, 2, 3, 4])
The function stdev()
computes the standard deviation of the elements
in a bag, vector or array.
Examples:
stdev([1, 2, 3, 4])
The function max()
computes the largest element in a bag, vector or array.
Examples:
max(array([2, 4, 1, 1, 4, 3]))
max([2, 4, 1, 1, 4, 3])
The function min()
computes the smallest element in a bag, vector or array.
Examples:
min(array([2, 4, 1, 1, 4, 3]))
min([2, 4, 1, 1, 4, 3])
The function argmax()
computes the first largest element in a vector
or array.
Examples:
argmax(array([2, 4, 1, 1, 4, 3]))
argmax([2, 4, 1, 1, 4, 3])
The function argmin()
computes the first smallest element in a
vector or array.
Examples:
argmin(array([2, 4, 1, 1, 4, 3]))
argmin([2, 4, 1, 1, 4, 3])
Aggregate functions over bags can be used in SQL queries, e.g.
select sum(value) from Constants
See Aggregate functions in the reference documentation for more on aggregate functions.
The reduce function
The overloaded reduce function provides a closed formula mechanism for defining incremental aggregations over bags, vectors, arrays, and streams. The main principle is to apply a binary reductor function over successive elements and in a collection to accumulate values into an aggregated result: . For arrays, vectors and bags a single aggregated result is returned.
For example, the following query sums the elements of the vector
[1,2,3,4]
:
reduce([1,2,3,4],'+')
The function named +
is applied in between the elements of the
vector, so the expression is equivalent to:
1+2+3+4
This query sums the elements of a 2x2 matrix:
reduce(array([[1,2],[3,4]]),'+')
The following query sums the elements of a bag:
reduce(bag(1,2,3,4),'+')
The following query sums the odd natural numbers up to 20:
reduce((select i
from Integer i
where i in range(20)
and mod(i,2)=0),
'+')
You can define your own aggregate functions using reduce
. For
example, myprod(Bag of Integer b)
multiplies the elements in the bag
b
:
create function myprod(Bag of Integer b) -> Integer
as reduce(b, '*');
myprod(range(4))
Use myprod()
to multiply all even natural numbers up to 20.
Boolean and null values
The constant null
is used to represent unknown or missing
values. For example, the last element of the vector [1,2,null]
is
unknown.
A Boolean null
value is regarded as false. Thus, the query 2<1
returns null
, while 2>1
return true
. The only object of type
Boolean is thus true
. The constant false
is defined as null
casted to type Boolean
.
null
values in bags and vectors are regarded as missing values that
are ignored by aggregate functions.
Example:
sum([1,2,null])
Passing null
as argument in a function call will return null
without calling the function.
Example:
sin(sqrt(-1))
The expression exists x
returns true
if the expression x
is not
empty.
Example:
exists range(1,10000000000)
exists sqrt(-1)
Analogously, the expression not exists x
returns true
if the expression
x
is empty.
Example:
not exists sqrt(-1)
An OSQL query variable bound to the value null
is regarded as
unbound and therefore null
cannot be regraded as a regular
value. Instead, to test that an expression x
is null
, use the
not exists x
expression that returns true
if x
is null
.
Often it is practical to use a default value instead of null. For this
the expression can be used. It returns
the first not being null
.
Example:
coalesce(sqrt(-1),ln(-1),1)+1
To replace all null
s in a vector use
whennull(Vector v, Object dflt)->Vector
.
Example:
whennull([1,2,null,3],0)+2
The following query returns the set of prime numbers in the vector
[197,143,2]
using the function divisors(Integer n)->Bag of Integer
defined in section Defining
functions:
select n
from Integer n
where n in [197,143,2]
and not exists divisors(n)
The following query returns the set of prime numbers in the vector
[197,143,2]
:
select n
from Integer n
where n in [197,143,2]
and not exists divisors(n)
Distance functions
Distance computations are common in analytics and machine learning, for example to find clusters of data points being close to each other.
The most common distance function euclid(x,y)
computes the geometric
Euclidean distance
between two numerical vectors.
Examples:
euclid([0,0], [2,3])
euclid(array([0,0]), array([2,3]))
While the Euclidean distance computes the shortest distance between to
points in space the function manhattan(x,y)
computes the Manhattan
distance as the
distance in a rectangular grid of streets.
Examples:
manhattan([0,0], [2,3])
manhattan(array([0,0]), array([2,3]))
The general Minkowski
distance function
minkowski(v,w,r)
is also built-in. The following examples shows
different kinds of distances expressed in terms of minkowski(v,w,r)
.
Examples:
Manhattan distance ():
minkowski([1,2],[3,4], 1)
Euclidean distance ():
minkowski([1,2],[3,4], 2)
Chebyshev distance ():
minkowski([1,2],[3,4], +inf)
The notation +inf
represents positive infinity as a floating point
number, while -inf
is negative infinity.
The Fast Fourier Transform
The function rfft(a)
computes the real number Fast Fourier
Transform
(FFT) over numerical vector of real numbers.
Example:
rfft(array([1.0, 2, 3, 4]))
You can also apply rfft()
on vectors of numbers, even though this is
much slower as it will convert the vector to an array.
Example:
rfft([1.0, 2, 3, 4])
The inverse real FFT is computed with irfft(a)
.
Example:
irfft(rfft(array([1.0, 2, 3, 4])))
The later tutorial section Querying the microphone shows how user-defined functions over vectors and FFT can be used for filtering audio streams in real-time.
Undo changes
The system logs your changes. This means that SA Engine user can undo
all definitions in a session by the command rollback
.
Example:
First define a simple function:
create function myadd(Integer i)->Integer as i+1;
myadd(1)
If we issue the command rollback
the function myadd
will no longer
be defined:
rollback;
myadd(1)
This is very practical when building models, since buggy model
definitions can easily be undone by executing rollback
and then
re-enter corrected model code.
You can make a save point by executing the command commit
. The
next rollback will stop at the commit
.
Example:
create function myadd(Integer i)->Integer as i+1;
commit
rollback;
myadd(1);
When running SA Engine in the browser, all changes will be undone if you refresh the web browser. Also committed data will disappear. In Saving data permanently it is explained how to save the database in a file.
Refresh the browser and try to call myadd(1)
again.