Queries

Queries retrieve objects having specified properties from the local database, streams or other data sources. A query can be one of the following:

It can be an expression to be evaluated to some result.
It can be a set query to search the database for a set of objects having properties fulfilling a query condition specified as a logical predicate.
It can be an array query that constructs numerical arrays fullfilling a query condition.
It can be a stream query that constructs a possibly infinite stream of elements fullfilling a query condition.
It can be a vector query that constructs an ordered sequence (vector) of objects fulfilling a query condition.

Function calls and expressions

A simple form of queries are expressions that call functions, for example:

sqrt(2.1);
1+2;
1+2 < 3+4;
"a" + 1

The built-in functions plus(), minus(), times(), and div() have infix syntax +,-,*,/ with the usual priorities.

Example:

   (sin(3.14)+sin(6.28)) * 0.5

is equivalent to:

   times(plus(sin(3.14),sin(6.28)),0.5)

The + operator is defined for both numbers and strings. For strings it implements string concatenation, e.g.

Example:

   "Hello" + " world"

In a function call, the types of the actual parameters must be the same as, or subtypes of, the types of the corresponding formal parameters.

Set queries

A select expression provides a very flexible way to specify a query returning a set of objects. A select expression has the format:

select <result>
from   <type extents>
where  <condition>

For example, assume we have the following popiulated database:

create type Department;
create function name(Department d) -> Charstring 
  as stored;
create Department(name) instances
       :toys     ("Toys"),
       :tools    ("Tools"),
       :food     ("Food");

create type Employee;
create function name(Employee e) -> Charstring
  as stored;
create function salary(Employee e) -> Real
  as stored;
create function dept(Employee e) -> Department d
  as stored;
create Employee(name,      salary, dept) instances
               ("Maja",    100,    :toys),
               ("Bill",    200,    :food),
               ("Bull",    300,    :tools),
               ("Pelle",   400,    :toys),
               ("Mons",    500,    :food),
               ("Olle",    500,    :toys),
               ("Birgitta",600,    :tools),
               ("Murre",   700,    :food);

The following query returns the names and salaries of the employees in the Toys department having higher salary than 500:

select name(e), salary(e)
  from Employee e, Department d
 where salary(e) > 400
   and dept(e) = d
   and name(d) = 'Tools'

In general the semantics of an OSQL query is as follows:

Form the Cartesian product of the type extents, which is the Cartesian product of all employees and departments in the example.
Restrict the Cartesian product by the condition in the where clause.
For each possible variable binding to elements in the restricted cartesian product, evaluate the results expressions to form a result tuple.

It would be very inefficient to directly use the above semantics to execute a query. It is therefore necessary for the system to do extensive query optimization to transform the query into an efficient execution plan, which is a program in an internal language called SLOG (Streamed LOGic).

OSQL also permits formulation of queries accessing indefinite extents. Such queries are not executable at all without query optimization. For example, the previous query could also have been formulated as:

select nm, s
  from Employee e, Department d, Charstring nm, Real s
 where salary(e) > 400
   and dept(e) = d
   and name(d) = 'Tools'
   and nm = name(e)
   and s = salary(e)

In this case, the Cartesian product of all employees, departments, strings, and real numbers is infinite, so the query is not executable without query optimization.

It may be impossible to excute some queries over infinite type extents. For example, the following query is not executable:

select r from Real r where r < 100

The guide Type extent and the solution domain explains more about how query results are formed by the combining the extents of types.

The value of a set query may contain duplicates, i.e. it actually returns a bag rather than a set.

For example, the result of the following query will contain duplicates if names of persons in the database are not unique:

select mod(i,3)
  from Integer i
 where i in range(10)

Duplicates are removed from the query result when the keyword distinct is specified. By specifying distinct the result from a set query is a set rather than a bag.

For example, this query removes the duplicates:

select distinct mod(i,3)
  from Integer i
 where i in range(10)

Predicates

The where clause in a set query specifies a selection filter as a logical predicate over variables. A predicate is an expression returning a boolean value. Predicates can be expressed using logical value comparison operators (>, *, + etc.) and functions returning boolean results. The boolean operators and and or can be used to combine boolean values into composite predicates.

Examples:

nm = name(e)

salary(e) > 400 and dept(e) = d and name(d) = "Tools"

The boolean operator and has precedence over or.

For example:

   a<2 and a>3 or b<3 and b>2

is equivalent to

  (a<2 and a>3) or (b<3 and b>2)

The comparison operators (=, !=, >, <=, and >=) are treated as binary predicates. You can compare objects of any type.

Predicates are allowed in the result of a set query.

The query returns true for all employees in the Toys department earning more than 200.

Nested function calls

If a function is applied on the result of a function returning a bag of values, the outer function is applied on each element of that bag, the bag is flattened. This is called Daplex semantics.

For example, consider the query: Example:

create function employees(Department d) -> Bag of Employee
  as select e
       from Employee e
      where dept(e) = d;

select name(employees(d))
  from Department d
 where name(d) = "Toys"

The function employees(d) returns a bag of employees, on which the function name() is applied. The normal semantics in SA Engine is that when a function (e.g. name()) is applied on a bag valued function (e.g. employees(d)) it will be applied on each element of the returned bag. In the example a bag of the names of the employees working in the Toys department are returned.

The in operator

If a function returns a bag the elements of that bag can be accessed using the in operator.

Example:

select name(e)
  from Employee e, Department d
    where e in employees(d)
      and name(d) = "Tools"

Tuple expressions

To retrieve the results of tuple valued functions in queries, use a tuple expression.

For example, the following query calls the function divide(Integer i, Integer j)->(Integer d, Integer r) to compute the remained of dividing 5 with 3:

select rem
  from Integer d, Integer rem
 where (d,rem) = divide(5,3)

Tuple expressions can also be used to assign the result of functions returning tuples, for example:

set (:d,:r)=divide(5,3);
:d + :r;

Into clause

The optional into clause specifies variables to be bound to the result.

Example:

select e into :e
  from Employee e
 where name(e) = 'Olle';
 
name(dept(:e));

The first query above retrieves the object representing the employee named 'Olle' into the session variable :e. The second query returns the name of Olle's department.

Notice that if the result bag contains more than one object the into variable(s) will be bound only to the first object in the bag. In the example, if more that one employee would have been named Olle, only the first one found will be assigned to :e.

If you wish to assign the entire result from a set query to a variable, enclose it in parentheses. The result will be a bag. The elements of the bag can then be extracted with the infix in operator or the in() function.

Example:

set :r = (select e
            from Employee e
           where salary(e) < 150)

Inspect :r with one of these equivalent queries:

select name(e) from Employee e where e in :r;
   
name(in(:r));

Aggregate functions in queries

Aggregate functions such as sum(), avg(), stdev(), min(), max(), count() are handled specially, not using Daplex semantics. They are applied on entire collections of values, rather than being applied for each element using the Daplex semantics of regular nested function calls.

For example:

select count(employees(d))
  from Department d
 where name(d) = "Toys"

In this case count() is applied on the function employees(d) returning a bag of all employees in department d whose name is Toys. The system uses the general rule that an argument is converted into a subquery when the argument of the calling function (e.g. count) is declared Bag of.

Another example:

select sum(salary(employees(d)))
  from Department d
 where name(d) = "Tools"

Here, first Daplex semantics is used to form the bag of employees in the Tools department and then the aggregate function sum adds together the salaries of its employees.

Aggregate functions are often used in grouped selections.

Aggregate functions can be applied on subqueries specified as nested set queries returning bags.

For example, this query returns the average salary of all employees:

avg(select salary(e) from Employee e)

Local variables in queries may be declared as bags, which means that the variable is bound to a subquery that can be used as subquery arguments to aggregate functions.

For example, this query gets the mean and standard deviation of all employees:

select  avg(b), stdev(b)
  from Bag of Integer b
  where b = (select salary(e)
               from Employee e)

Variables may be assigned to bags by assigning values of functions returning bags.

Example:

   set :e = (select employees(d) from Department d where name(d)='Toys');

   count(:e)

Bags are not explicitly stored in memory, but are generated when needed, for example when they are used in aggregate functions.

Example:

set :bigbag = range(1,10000000)

The statement above assigns :bigbag to a bag of 10^7^ numbers. The bag is not explicitly created through. Instead its elements are generated when needed, for example when passed to the aggregate function count():

count(:bigbag)

Ordered selections

The order by clause specifies that the result should be sorted by the specified sort key. The sort order is descending when desc is specified and ascending otherwise. A set query with an order-by-clause is called an ordered selection.

For example, the following query sorts the result descending based on the sort key salary(e):

select name(e), salary(e)
  from Employee e
  order by salary(e) desc

The sort key does not need to be part of the result. For example, the following query list the names of employees in descending order without showing their salaries:

select name(e)
  from Employee e
  order by salary(e) desc

Grouped selections

When analyzing data it is often necessary to group data, for example to get the sum of the salaries of employees per department. Such re-groupings are specified though the optional group by clause. It specifies on which expression in the select clause the data should be grouped and summarized. This is called a grouped selection.

Example:

select name(d), sum(salary(employees(d)))
  from Department d
 group by name(d)

A grouped selection is a set query with a group by clause present. The execution semantics of a grouped selection is different than for regular queries.

Example:

select name(d), sum(salary(e))
  from Department d, Employee e
 where dept(e)=d
 group by name(d)

Here the group by clause specifies that the result is grouped on the names of the departments in the database. After the grouping, for each department d the salaries of the employees e working at that department are summed using the aggregate function sum().

An element of a select clause of a grouped selection must be one of:

An element in the group key specified by the group by clause, which is name(d) in the example. The result is grouped on each group key. In the example the grouping is made over each department name so the group key is specified as name(d).
A call to an aggregate function, which is sum() in the example. The aggregate function is applied for the set of variable bindings specified by the group key. In the example the aggregate function sum() is applied on the set of values of salary(p) for the employees e working in department d, i.e. where dept(e)=d.

Contrast the above query to the regular (non-grouped) query:

select name(d), sum(salary(e))
  from Department d, Employee e
  where dept(e)=d

Without the grouping the aggregate function sum() is applied on the salary of each emplouyee 3, rather than the bag of salaries corresponding to the group key name(e) in the grouped selection.

The group key need not be part of the result. For example the following query returns the sum of the salaries for all departments without revealing the department names:

select sum(salary(e))
  from Department d, Employee e
  where dept(e)=d
  group by name(d)

Top-k queries

The optional limit clause limits the number of returned values from the set query. It is often used together with ordered selections to specify top-k queries returning the first few tuples in a set of objects based on some ranking.

For example, the following query returns the names and salaries of the 10 best payed employees:

select name(e), salary(e)
  from Employee e
  order by salary(e) desc
  limit 10

The limit can be any numerical expression.

For example, the following query retrieves the :k+3 lowest income earners, where :k is a variable bound to a numeric value:

set :k=2;

select name(e), salary(e)
  from Employee e
  order by salary(e) desc
  limit :k+3

Quantifier functions

The operator exists sq implements logical exist over a subquery sq.

Example

select name(d)
  from Department d
 where exists employees(d)

The expression not exists(sq) tests if a subquery sq returns empty result, i.e. negation.

Example:

select name(d)
  from Department d
 where not exists(select e 
                    from Employee e
                   where salary(e)<200
                     and dept(e)=d)

Functions

Set and bag functions

K-nearest functions.

Queries

Function calls and expressions​

Set queries​

Predicates​

Nested function calls​

The in operator​

Tuple expressions​

Into clause​

Aggregate functions in queries​

Ordered selections​

Grouped selections​

Top-k queries​

Quantifier functions​

Functions​

Function calls and expressions

Set queries

Predicates

Nested function calls

The in operator

Tuple expressions

Into clause

Aggregate functions in queries

Ordered selections

Grouped selections

Top-k queries

Quantifier functions

Functions