Queries

Queries retrieve objects in the local database, streams or external data sources. A query can be one of the following:

It can be an expression evaluated to some result.
It can be a standard SQL query to search the main-memory database inside SA Engine based of some SQL query condition.
It can be a set query to search the local database for a set of objects having properties fulfilling a query condition specified as a logical predicate.
It can be an select array query that constructs numerical arrays fullfilling a query condition.
It can be a select vector query that constructs an ordered sequence (vector) of objects fulfilling a query condition.
It can be a select stream query that constructs a possibly infinite stream of elements fullfilling a query condition.

Set queries

A set query provides a very flexible way to specify a query returning a set of objects using the general format:

select <result>
  from <variables>
 where <condition>

table queries is a particular kind of set queries that provide a very flexible syntax for searching data stored in tables.

Example: Let's define a new table Department.

drop table Department;        -- Remove old table if present
create table Department(      -- Create new table
       id Integer,
       name Charstring,
       unique(id));
insert into Department values -- Populate new table
(1, 'Toys'),
(2, 'Tools'),
(3, 'Food');

The following SQL query selects column id from table Department:

select id
  from Department d
 where d.name = 'Tools'

In SQL all data searched by select queries must be stored in tables referenced in the from clause. Therefore variables declared in the from clause always reference rows. For example the variable d in the query above range over rows in the table named Department.

OSQL provides a smooth extension to standard SQL by allowing variables in the from clause to be bound to any kind of object, including rows in tables, numbers, strings, arrays, etc.

Example: The following set query returns the numbers less than 10 having the product 6.

select m, n
  from Integer m, Integer n
 where n in range(10)
   and m * n = 6

The value of a set query may contain duplicates, i.e. it actually returns a bag rather than a set.

Example: The result of the result from following query contains duplicates.

select mod(i,3)
  from Integer i
 where i in range(10)

Duplicates are removed from the query result when the keyword distinct is specified so the result becomes a set rather than a bag.

Example: this query removes duplicates:

select distinct mod(i,3)
  from Integer i
 where i in range(10)

Into clause

The optional into clause specifies variables to be bound to the result.

Example:

select i into :i
  from Integer i
 where i in range(10)
   and mod(i,3)=0

:i

The query above retrieves all natural numbers up to 10 divisible with 3, i.e. values 3,6,9. When the result bag contains more than one object, as in this case, the into variable will be bound only to the first object in the bag.

If you wish to assign the entire result from a set query to a variable, enclose it in parentheses.

Example:

set :b = (select i
            from Integer i
           where i in range(10)
             and mod(i,3)=0)

:b

Predicates

The where clause in a set query specifies a select condition as a logical predicate over variables. A predicate is an expression returning a Boolean value.

Example: This query contains the predicate i=sqrt(4).

select i
  from Integer i
 where i = sqrt(4)

Predicates can be expressed as:

A logical comparison operator: = (equal), <> (not equal), >= (larger than), > (strict larger than) , <= (less than), and < (strict less than).
A call to a Boolean function having the result type Boolean.
A combination of predicates using the Boolean operators and, or, not, and exists to combine predicates into composite predicates.

Example: The following query has a composite predicate.

select i
  from Integer i
 where i + 1 > 8
   and (i in range(10) or i in range(12))
   and mod(i,6) = 0

Predicates my be used in the results of queries too.

Example:

select i<3
  from Integer i
 where i in range(5)

Negation

The predicat not p is true when the predicate p is false. It implements logical negation over a predicat p, $\neg sq$ . The predicate p is a negated predicate.

Example:

select i
  from Integer i
 where i in range(10)
   and not i > 5

The query optimizer removes negated predicates by replacing them with their complements when possible by using De Morgan's laws.

The complements of =, <>, >, >=, <, and <= are <>, =, <=, <, >=, and >, respectively.

Example: The query above can also be written as follows.

select i
  from Integer i
 where i in range(10)
   and i <= 5

The predicate p can be any simple of composite predicate.

Example:

select i
  from Integer i
 where i in range(10)
   and not (i < 9 and i > 3)

Using De Morgan's laws the query can also be expressed as:

select i
  from Integer i
 where i in range(10)
   and (i >= 9 or i <= 3)

Quantifiers

The operator exists sq returns true when the subquery sq returns a non-empty value. It implements existential quantification over a subquery sq, $\exists x | x \in sq$ .

Example: This query returns the integers between 1 and 5 that are also in the bag mod(range(5),3) being values 0,1,2.

select i
  from Integer i
 where i in range(5)
   and exists i in mod(range(5),3)

If we remove the term exists here, the query will return duplicates:

select i
  from Integer i
 where i in range(5)
   and i in mod(range(5),3)

The expression not exists(sq) tests if a subquery sq returns empty result, i.e. a negated subquery (logical $\neg\exists x|x \in sq$ ).

Example:

select i
  from Integer i
 where i in range(5)
   and not exists i in mod(range(5),3)

Negated subqueries can specify intersections of bags, in the example between the bags range(5) and mod(range(5),3).

Aggregate functions in queries

Aggregate functions such as sum(), avg(), stdev(), min(), max(), count() are handled specially, not using Daplex semantics. They are applied on entire collections of values, rather than being applied for each element using the Daplex semantics of regular nested function calls.

Example:

count(select i
        from Integer i
       where i in range(5)
         and mod(i,2)=0)

In this case count() is applied on a bag of numbers. The system uses the rule that an argument not flattened when the argument of the calling function (e.g. count) is declared as a Bag.

Example:

signature('count')

Aggregate functions are often used in grouped selections.

Aggregate functions are called with bags as arguments.

Example: Let's define a bag :b of the integers between one and ten.

set :b = (select i
            from Integer i
           where i in range(10))

This computes the mean of the values in :b:

avg(:b)

Variables in queries can be declared as bags bound to subqueries.

Example: this query gets the mean and standard deviation of the integers between 1 and 10:

select  avg(b), stdev(b)
  from Bag of Integer b
 where b = (select i
              from Integer i
             where i in range(10))

Bags are not explicitly stored in memory, but are generated when needed, for example when they are used in aggregate functions.

Example:

set :bigbag = range(1,1000000)

The statement above assigns :bigbag to a bag of a million integers without explicitly creating it. Instead its elements are generated when needed, for example inside the aggregate function sum():

sum(:bigbag)

Ordered selections

The order by clause specifies that the result should be sorted by the specified sort key. The sort order is descending when desc is specified and by default ascending, asc. A set query with an order-by-clause is called an ordered selection.

Example: The following query sorts the integers i between 1 and 10 paired with sin(i) based of the sort key sin(i):

select i, sin(i)
  from Integer i
 where i in range(10)
  order by sin(i) desc

The sort key does not need to be part of the result.

Example:

select i
  from Integer i
 where i in range(10)
 order by sin(i) desc

Grouped selections

When analyzing data it is often necessary to group data, for example to get the sum of the salaries of employees per department. Such re-groupings are specified though the optional group by clause. It specifies on which expression in the select clause the data should be grouped and summarized. This is called a grouped selection.

Example: Consider the following query without a group by clause:

select i
  from Integer i
 where i in mod(range(10),3)

To pair the different i values in the result with the sum of the corresponding i values, we can make this modified query with a group by clause:

select i, sum(i)
  from Integer i
 where i in mod(range(10),3)
 group by i

A grouped selection is a set query with a group by clause present. The execution semantics of a grouped selection is different than for regular queries. In the example the aggregate function sum() is not applied directly on the value of i but rather on the bag of values specified by the group by clause.

An element of the select clause of a grouped selection can be one of:

An element in the group key specified by the group by clause, which is i in the example. The result is grouped by each value of the group key. In the example the grouping is made over each integer in the bag mod(range(10),3).
A call to an aggregate function, which is sum(i) in the example. The aggregate function is called for the bag of variable bindings specified by the group key, in the example the bag of i such that i in mod(range(10),3).

The group key need not be part of the result.

Example:

select sum(i)
  from Integer i
 where i in mod(range(10),3)
 group by i

Top-k queries

The optional limit clause limits the number of returned values from the set query. It is often used together with ordered selections to specify top-k queries returning the first few tuples in a set of objects based on some ranking.

Example: The following query finds the two largest values of sin(i) for each integer ì between 1 and 10.

select i, sin(i) 
  from Integer i
 where i in range(10)
 order by sin(i) desc
 limit 2

The limit can be any numerical expression.

Example:

set :k=1

select i, sin(i) 
  from Integer i
 where i in range(10)
 order by sin(i) desc
 limit :k+2

Queries

Set queries​

Into clause​

Predicates​

Negation​

Quantifiers​

Aggregate functions in queries​

Ordered selections​

Grouped selections​

Top-k queries​

Set queries

Into clause

Predicates

Negation

Quantifiers

Aggregate functions in queries

Ordered selections

Grouped selections

Top-k queries