Introduction

OSQL extends SQL seamlessly with functions that compute results for given arguments. For example, there is a large library of mathematical functions built into the system and new functions can easily be defined. Each SA Engine system has an internal local database where the functions' definitions are stored.

The simplest kind of query is to call some built-in function.

Example:

sin(3.14/2)*cos(3.14)

Exercise

Run the function in your browser by clicking the play button play_circle

Documentation of functions and other SA Engine artifacts are stored in the database as well. You can query the database about the documentation of a function fn by calling doc('fn'). It returns the documentation of a function as a string.

Example:

doc('sin')

Exercise

You can edit the examples in the tutorial. Try changing sin to tan.

You can enter several OSQL statements by separating them with a ;.

Example:

doc('atan2');
doc('cos')

If you don't know exactly the name of a function you can also pass a string pattern to doc() to return all functions matching the string pattern in their names. In a string pattern a % matches any sequence of characters while a _ matches a single character.

Example:

doc('_tan%')

Note

Functions may return sets of values. In the example the function doc returned two different documentation strings.

The function sourcecode('fn') where fn can be a function name or a string pattern returns the definitions of the matching functions.

Example:

sourcecode('tan')

The first line of a source code string contains the signature of the function. It is a string denoting the function's arguments and results. You can get just the signature of function or string pattern fn by calling the system function signature(fn).

Example:

signature('atan%')

Functions may have different definitions depending on the types of their arguments. Such functions are said to be overloaded on their types. Overloaded function have several signatures.

Example:

signature('range')

Some functions have corresponding infix operators. For example, the function plus has the infix operator + so the queries 1+2 and plus(1,2) both return the value 3.

You can call signature(op) to get the signature of infix operators too.

Example:

signature('^')

The following function call returns the version of SA Engine you are running:

system_version()

Visit Documentation functions for a list of functions used for documenting the system.

Basic math functions are listed in Math.

Session variables

During interactive sessions with SA Engine it is often practical to temporarily assign values of expressions to session variables prefixed with : by using the set statement, e.g.:

set :pi = 3.14

Usage:

sin(:pi/2)*cos(:pi)

The session variable :pi is first assigned the value of 3.14 and then used twice in a computation.

Note

Session variables are not stored in the local database and disappear when you refresh the browser or leave SA Studio sessions. Use database tables to save data permanently in the local database, as explained next.

Database tables

Numbers and other objects used in analysis models can be saved in database tables. Database tables are maintained and queried by standard SQL statements. OSQL is a seamless extension of SQL with functions, mathematical objects, and data streams.

Tables are created by the create table statement.

Example:

create table Constants(name   Charstring,
                       value  Real,
                       unique (name))

The above statement creates a table called Constants having two columns, where values in column name are strings and column value stores floating point numbers. The keyword unique specifies that the values of column name are different in all table rows.

Note

The create table syntax of SQL is more elaborate. For example, strings need not have maximum lengths in OSQL.

Tables are populated using SQL's insert into statement.

Example:

insert into Constants values('pi',     3.14),
                            ('e',      2.71),
                            ('freeze', 0.0),
                            ('boil',   100.0)

Queries

Queries filter out and transform objects fulfilling some specific properties from collections of objects.

A populated table can be queried by SQL's select statement.

Examples:

select * from Constants

select value from Constants where name = 'pi'

select sin(value/2)*cos(value) from Constants where name = 'pi'

Tables can be updated using SQL's update statement.

Examples:

update Constants set value=asin(1)*2 where name = 'pi'

update Constants set value=exp(1) where name = 'e'

select * from Constants

The following query returns the names and values of the constants in the table Constants whose closest integer is 3:

select name, value
  from Constants
 where round(value)=3

Note

SQL's select statement is a special case of OSQL queries. A restriction with SQL's select statement is that variables in queries must be bound only to rows in tables. In OSQL queries, variables can be bound to any kind of objects, e.g. strings, numbers, vectors, matrices, tensors, bags, streams, records or user defined objects, . Combined with functions, this makes analytical models in various domains very easy to express with OSQL.

Visit SQL Tables and Queries for an introduction to SQL for SA Engine. You can skip that if you are already familiar with SQL.

Note

The local database disappear when you refresh the browser or leave SA Studio. Section Saving data permanently explains how to save data permanently.

Set queries

In OSQL you can make set queries that select or return not only rows in tables, but sets of any kind of objects, by specifying their types in the from clause of the select statement.

Example: This query returns the set of odd elements among the integers between 10 and 15:

select i
  from Integer i
 where i in range(10)
   and mod(i,2) = 1

Here, the function range(10) returns the set of integers between 1 and 10, the operator i in range(10) finds all elements i in the set, and the function mod(i,2) computes the remainder of dividing i with 2.

Example: The following query computes the set of divisors for the integer 143:

select d
  from Integer d, Integer n
 where d in range(2,ceil(n/2))
   and 0 = mod(n,d)
   and n = 143

In general sets in queries may contain duplicates. Therefore, the term bag is used instead of set to indicate that duplicates are allowed.

Example: This query returns a bag rather than a set:

select floor(i/2)
  from Integer i
 where i in range(10)

A function applied on a set in a query is called for each element of the set.

Example:

select sin(range(10))

Queries having no from or where clauses are called expressions and can be specified without the keyword select.

Example: the previous query can also be expressed as:

sin(range(10))

tip

Try visualizing the result as a line plot by inserting the row //plot: Line plot before the query!

Exercise

Apply sin(x) on the result of floor and visualize by line plot!

The order of the objects returned by set queries is not guaranteed, unless you specify it using SQL's order by.

Example:

select x
  from Real x
 where x in sin(range(10))
 order by x

Queries may return sets of tuples where each tuple group two or more comma separated values.

Example:

select i, i+1
  from Integer i
 where i in range(5)

Values queries

Values queries construct explicit bags using the notation values v1, v2,....

Example:

values 1,3,2,4

A bag with a single value is logically the same as the single value itself.

Example:

values 123

Tuples can be specified in values queries by using the notation $(e_1,e_2,...e_n)$ .

Example: The following query constructs a bag of tuples representing a constant table having two columns:

values (1,'a'),(2,'b'),(3,'c')

A tuple with a single value is logically the same as the single value itself.

Example:

values ('a')

values ('a'),('b'),('c')

Bags in expressions can be expressed by enclosing a values query in parentheses.

Example:

set :mybag = (values 1,2,2,3)

:mybag

Case expressions

Case expressions compute values having different definitions depending on conditions.

Example:

select case when x <= -40 then "too cold"
            when x < 0    then "freezing"
            when x < 18   then "cold"
            when x < 23   then "plesant"
            when x < 40   then "hot"
            else "too hot"
        end
  from Real x
 where x in 10*range(-4,4)

This illustrates how case expressions can be used for specifying decision rules based on values of a variable x.

Tests in case expressions can be functions returning any kind of value, not only logical values. A test succeeds unless its value is null or false.

Example:

case when sqrt(-1)    then 1
     when 2 < sqrt(2) then 2
     else 0 end

Note

The case expression computes the values without executing any actions (side effects). This is different from conditional statements such as if {condition} do-something; else do-something-else; in conventional programming languages, where different side effects (actions) are executed based on a condition.

AND, OR and NOT

In the where clause of a select statement conditions connected with the and operator restrict the query result. Analogously, or clauses extends the query result.

Example:

The following query returns the set of values in the bag (values 1,2,3,4,5) being less than 3 or larger than 4:

select i
  from Integer i
 where i in (values 1,2,3,4,5)
   and (i<3 or i>4)

Notice that or has higher priority than and in queries, so parentheses may be needed if both and and or are used in the same query as in the example.

To negate disjunctive logical expressions we can use the expression not l that returns true when l is false:

Example:

select i
  from Integer i
 where i in (values 1,2,3,4,5)
   and not i<=sqrt(10)

Queries over sets and bags are documented in Queries.

Defining functions

Analysis models are defined in terms of OSQL functions. You can define your own OSQL functions by the create function statement.

Example: The following function converts Celsius degrees to Fahrenheit:

create function ctof(Real c) -> Real
  /* Convert a Celsius degree `c` to Fahrenheit */
  as c * 9 / 5 + 32

OSQL functions are stored in the local database inside SA Engine.

Let's test ctof:

ctof(10)

You can use your OSQL functions in queries, e.g.

select ctof(value) from Constants where name = 'boil'

You can also create derived functions, which are functions defined by a query.

Example:

create function constant(Charstring c) -> Real
  as select value from Constants where name = c

ctof(constant('boil'))

Note

The seamless OSQL extension of SQL allows variables to be bound both to rows in tables and other kinds of objects. In the example the variables value and name refer to column values in rows from the table Constants, while the variable c is the function's argument.

This derived function computes the divisors of integer n:

create function divisors(Integer n) -> Bag of Integer
  as select d
       from Integer d
      where d in range(2,ceil(n/2))
        and 0 = mod(n,d)

divisors(143)

Case expressions can be used in function definitions.

Example: The signum of a number can be defined as:

create function signum(Real x) -> Integer
  as case when x > 0 then 1
          when x = 0 then 0
          else -1 end

signum(-3.1)

The built-in function sign(x) does the same.

Another example of a function defined in terms of case expressions is the built in function atan2(y,x):

sourcecode("atan2")

atan2(1,2)

Temporal functions

The built-in function local_time() returns the current local wall time as an ISO time stamp string.

Example:

local_time()

The current UTC time stamp string is returned by calling utc_time().

While local_time() and utc_time() return the current wall time as time stamp strings, the function now() returns the wall time as a time point, which is an object of type Timeval that internally represents the number of microseconds since new year 1970.

Example:

now()

For more about functions over time visit Time in the reference documentation.

Sequences

The data type Vector represents finite sequences of objects with the usual [...] notation.

Examples of vectors:

A vector of four integers:

[1,2,3,4]

A vector of two vectors:

[[1.1,2.0],[3,4]]

A vector of a string and a number:

['Tore',55]

You can access vector element $v_i$ by the notation v[i].

Example:

set :v = [2,4,6]

:v[2]

The e in v operator can be used to get all elements e in vector v.

Example: The following query returns the bag of even numbers in the vector [1,2,3,2,4]:

select e
  from Integer e
 where e in [1,2,3,2,4]
   and mod(e,2) = 0

Select vector queries

A select vector query works exactly like a regular set query with the difference that it returns a vector of values rather than a bag of values.

Example:

select Vector of sin(i)
  from Integer i
 where i in range(3)
 order by i

The difference between a vector and bag is that the order of the elements in a vector is important while it is undefined in bags. Therefore the order by clause must be specified in select vector queries when a specific order of the elements is required. If no order by clause is present the order of the elements in the vector is arbitrarily chosen by the system being the order that is the most efficient to produce.

Example: This query returns a vector of sine values ordered by the values rather than by the arguments of sines as in the previous select vector query:

select Vector of sin(i)
  from Integer i
 where i in range(3)
 order by sin(i)

Example: The following select vector query removes 2 from the vector [1,2,3,2,4]:

select Vector of a[i]
  from Vector a, Integer i
 where a = [1,2,4,2,3]
   and a[i] != 2
 order by i

Exercise

Modify the above select vector query to return the remaining values of a in reverse value order, i.e. the vector [4,3,1].

The following select vector query calls the function divisors(Integer n)->Bag of Integer defined in section Defining functions to construct a vector of ordered prime numbers among 197, 143, and 2:

select Vector of n
  from Integer n
 where n in (values 197,143,2)
   and not exists divisors(n)
 order by n

Visit select vector queries in the reference documentation for more information about queries returning vectors.

Vector operators and functions

As for numbers there is a built-in library of vector operators and functions. The mathematical operators +,-,*,/,^ can be used for whole vectors or mixes of vectors and scalar values. A dot (.) in front of an operator indicates element-wise application of it on each element of the vectors.

Examples:

The number of elements in a vector;

dim([1, 3, 2])

Vector addition:

[1,4,3,2]+[1,5,6,7]

Exercise

Change visualization of the vectors to Bar plot by adding //plot: Bar plot before the query.

Element wise vector multiplication:

[1, 2, 3, 4] .* [5, 6, 7, 8]

Combining vectors and scalars.

[1,4,3,2]-[0,2,3,4]+1

The sum of vector elements:

sum([1, 2, 3, 4])

The average of vector elements:

avg([1, 2, 3, 4])

The standard deviation of elements:

stdev([1, 2, 3, 4])

The median of the elements:

median([1, 4, 2, 3])

Go to Vectors in the reference documentation for documentation of vectors.

Arrays

An array is a multi-dimensional cube of numbers. It is represented by type Array. Arrays can be created with the function array(Vector data)->Array, where the (nested) vector data contains its numerical elements.

Example:

array([[1,2,3],[4,5,6]])

The number of dimensions of an array is called its rank and returned by the function rank(Array a)->Integer. Arrays with rank one (1D arrays) are called numerical vectors, arrays with rank two are called matrices, and arrays with higher dimension than two are called tensors.

Example: set variable :v to a numerical vector:

set :v = array([1,2,3])

Rank of :v:

rank(:v)

Note

Vectors can also represent arrays with rank one. However, using numerical vectors is much more efficient than using objects of type Vector.

Examples:

Set variable :a to a 2x3 matrix:

set :a = array([[1,2,3],[4,5,6]])

Rank of :a:

rank(:a)

Value of $a_{2,3}$ :

:a[2,3]

You can select slices of arrays by specifying the wildcard character * in array index positions.

Example: This expression selects the third column of :a:

:a[*,3]

Array format

The elements of an array have a format that specifies their binary representations in memory. For example, elements of type Integer are represented as 64-bits integers having the format I64, and type Real has the format F64. The first argument of the function array(Charstring fmt,Vector data)->Array specifies the elements' format.

Example: This call creates a numerical vector of 16-bit integers:

array('I16',[1,2,3,4])

When called with one argument, array(Vector data)->Array the element format will be the most general type in data, where real numbers are considered more general than integers,

Examples:

array([1,2,3,4])

array([1.1,2,3,4])

Array shape

An array also has a shape being a vector of the number of elements in each of its dimensions. The shape is computed by the function shape(Array a)->Vector of Integer.

Examples:

shape(array([1,2,3]))

shape(array([[1,2,3],[4,5,6]]))

Array operators and functions

As for vectors there is a built in library of array functions and operators.

Examples:

Numerical vector addition:

array([1, 2, 3, 4]) + array([5, 6, 7, 8])

Add column one and three of :a:

:a[*,1] + :a[*,3]

The total sum of array elements:

sum(array([1, 2, 3, 4]))

sum(array([[1, 2], [3, 4]]))

Array addition:

array([[1,2],[3,4]]) + array([[2,3],[6,7]])

Element-wise multiplication:

array([[1,2],[3,4]]) .* array([[2,3],[6,7]])

The operation a * b computes matrix multiplication of matrices (i.e. 2D arrays) a and b.

Example:

array([[1,2,3]]) * array([[1,2],
                          [3,4],
                          [5,6]])

A numerical vector is converted to a row matrix when multiplied with another matrix.

Example:

array([1,2,3]) * array([[1,2],
                        [3,4],
                        [5,6]])

You can multiply constants with arrays of any shape (broadcasting).

Example:

2 * array([[1,2,3],
           [4,5,6],
           [7,8,9]])

Numerical vectors are transposed into row or column matrices when multiplied with matrices for conforming shapes.

Examples:

array([1,2]) * array([[3,4],[5,6]])

array([[3,4],[5,6]]) * array([1,2])

The function dot(Array u, Array v)->Number computes the scalar product of the elements in arrays u and v of any shape.

Examples:

dot(array([1,2,3]),array([4,5,6]))

dot(array([[1,2],[3,4]]),array([[4,5],[6,7]]))

Select array queries

Analogous to set queries and select vector queries, a select array query returns a single array as result.

Example: The following query returns a 4x5 matrix $A$ containing integers elements e with format U8 where $A_{i,j}=i+j$ :

select Array[i..4,j..5] of U8 e where e = i + j  

Visit Arrays and tensors for more detailed descriptions of arrays.

Aggregate functions

Aggregate functions compute a single value from the elements in bags, vectors or arrays.

The aggregate function sum() adds the elements in a bag, vector or array:

Examples:

sum(range(200)/2)

sum(array([1, 2, 3, 4]))

sum(array([[1, 2], [3, 4]]))

sum([1, 2, 3, 4])

The function avg() computes the mean value of the elements in a bag, vector or array.

Examples:

avg(array([1, 2, 3, 4]))

avg(array([[1, 2], [3, 4]]))

avg([1, 2, 3, 4])

The function stdev() computes the standard deviation of the elements in a bag, vector or array.

Examples:

stdev([1, 2, 3, 4])

The function max() computes the largest element in a bag, vector or array.

Examples:

max(array([2, 4, 1, 1, 4, 3]))

max([2, 4, 1, 1, 4, 3])

The function min() computes the smallest element in a bag, vector or array.

Examples:

min(array([2, 4, 1, 1, 4, 3]))

min([2, 4, 1, 1, 4, 3])

The function argmax() computes the first largest element in a vector or array.

Examples:

argmax(array([2, 4, 1, 1, 4, 3]))

argmax([2, 4, 1, 1, 4, 3])

The function argmin() computes the first smallest element in a vector or array.

Examples:

argmin(array([2, 4, 1, 1, 4, 3]))

argmin([2, 4, 1, 1, 4, 3])

Aggregate functions over bags can be used in SQL queries, e.g.

select sum(value) from Constants

See Aggregate functions in the reference documentation for more on aggregate functions.

The reduce function

The reduce function provides a closed formula mechanism for defining incremental aggregations over bags, vectors, arrays, and streams. In simple reduce the main principle is that a binary reducer function $r(a,e_i)$ is called for each element $e_i$ in a collection $C$ to compute a new value of an accumulator $a$ to be returned by $reduce(C,r)$ . For arrays, vectors and bags a single aggregated result is returned. For streams reduce returns an aggregated result for each element in the stream.

For example, the following query sums the elements of the values 1,2,3,4:

reduce((values 1,2,3,4),'+')

You can regard this as if the function named + is applied in between the elements of the vector, so the expression is equivalent to:

1+2+3+4

This query sums the elements of a 2x2 matrix:

reduce(array([[1,2],[3,4]]),'+')

The following query returns the largest element in a vector:

reduce([1,2,3,4],'max')

The following query sums the odd natural numbers up to 20:

reduce((select i
          from Integer i
         where i in range(20)
           and mod(i,2)=0),
       '+')

The reduce function is further explained in Incremental aggregation.

Boolean and null values

The constant null is used to represent unknown or missing values. For example, the last element of the vector [1,2,null] is unknown.

A Boolean null value is regarded as false. Thus, the query 2<1 returns null, while 2>1 return true. The only object of type Boolean is thus true. The constant false is defined as null casted to type Boolean.

null values in bags and vectors are regarded as missing values that are ignored by aggregate functions.

Example:

sum([1,2,null])

Passing null as argument in a function call will return null without calling the function.

Example:

sin(sqrt(-1))

The expression exists x returns true if the expression x is not empty.

Example:

exists range(1,10000000000)

exists sqrt(-1)

Analogously, the expression not exists x returns true if the expression x is empty.

Example:

not exists sqrt(-1)

An OSQL query variable bound to the value null is regarded as unbound and therefore null cannot be regraded as a regular value. Instead, to test that an expression x is null, use the not exists x expression that returns true if x is null.

Often it is practical to use a default value instead of null. For this the $coalesce(x_1,...,x_n)$ expression can be used. It returns the first $x_i$ not being null.

Example:

coalesce(sqrt(-1),ln(-1),1)+1

To replace all nulls in a vector use whennull(Vector v, Object dflt)->Vector.

Example:

whennull([1,2,null,3],0)+2

The following query returns the set of prime numbers among 197,143,2 using the function divisors(Integer n)->Bag of Integer defined in section Defining functions:

select n
  from Integer n
 where n in (values 197,143,2)
   and not exists divisors(n)

Distance functions

Distance computations are common in analytics and machine learning, for example to find clusters of data points being close to each other.

The most common distance function euclid(x,y) computes the geometric Euclidean distance between two numerical vectors.

Examples:

euclid([0,0], [2,3])

euclid(array([0,0]), array([2,3]))

While the Euclidean distance computes the shortest distance between to points in space the function manhattan(x,y) computes the Manhattan distance as the distance in a rectangular grid of streets.

Examples:

manhattan([0,0], [2,3])

manhattan(array([0,0]), array([2,3]))

The general Minkowski distance function minkowski(v,w,r) is also built-in. The following examples shows different kinds of distances expressed in terms of minkowski(v,w,r).

Examples:

Manhattan distance ( $L_1$ ):

minkowski([1,2],[3,4], 1)

Euclidean distance ( $L_2$ ):

minkowski([1,2],[3,4], 2)

Chebyshev distance ( $L_{infty}$ ):

minkowski([1,2],[3,4], +inf)

Note

The notation +inf represents positive infinity as a floating point number, while -inf is negative infinity.

The Fast Fourier Transform

The function rfft(a) computes the real number Fast Fourier Transform (FFT) over numerical vector of real numbers.

Example:

rfft(array([1.0, 2, 3, 4]))

The inverse real FFT is computed with irfft(a).

Example:

irfft(rfft(array([1.0, 2, 3, 4])))

You can also apply rfft() on vectors of numbers, even though this is much slower as it will convert the vector to an array.

Example:

rfft([1.0, 2, 3, 4])

The later tutorial section Querying the microphone shows how user-defined functions over vectors and FFT can be used for filtering audio streams in real-time.

Undo changes

The system logs your changes. This means that SA Engine user can undo all definitions in a session by the command rollback.

Example: First define a simple function:

create function myadd(Integer i)->Integer as i+1

If we issue the command rollback the function myadd will no longer be defined:

rollback;

myadd(1)

This is very practical when building models, since buggy model definitions can easily be undone by executing rollback and then re-enter corrected model code.

You can make a save point by executing the command commit. The next rollback will stop at the commit.

Example:

create function myadd(Integer i)->Integer as i+1

commit

rollback

myadd(1)

When running SA Engine in the browser, all changes will be undone if you refresh the web browser. Also committed data will disappear. In Saving data permanently it is explained how to save the database in a file.

Exercise

Refresh the browser and try to call myadd(1) again.

Introduction

Session variables​

Database tables​

Queries​

Set queries​

Values queries​

Case expressions​

AND, OR and NOT​

Defining functions​

Temporal functions​

Sequences​

Select vector queries​

Vector operators and functions​

Arrays​

Array format​

Array shape​

Array operators and functions​

Select array queries​

Aggregate functions​

The reduce function​

Boolean and null values​

Distance functions​

The Fast Fourier Transform​

Undo changes​

Session variables

Database tables

Queries

Set queries

Values queries

Case expressions

AND, OR and NOT

Defining functions

Temporal functions

Sequences

Select vector queries

Vector operators and functions

Arrays

Array format

Array shape

Array operators and functions

Select array queries

Aggregate functions

The reduce function

Boolean and null values

Distance functions

The Fast Fourier Transform

Undo changes