Queries
Queries retrieve objects in the local database, streams or external data sources. A query can be one of the following:
It can be an expression evaluated to some result.
It can be a standard SQL query to search the main-memory database inside SA Engine based of some SQL query condition.
It can be a set query to search the local database for a set of objects having properties fulfilling a query condition specified as a logical predicate.
It can be an select array query that constructs numerical arrays fullfilling a query condition.
It can be a select vector query that constructs an ordered sequence (vector) of objects fulfilling a query condition.
It can be a select stream query that constructs a possibly infinite stream of elements fullfilling a query condition.
Set queries
A set query provides a very flexible way to specify a query returning a set of objects using the general format:
select <result>
from <variables>
where <condition>
table queries is a particular kind of set queries that provide a very flexible syntax for searching data stored in tables.
Example: Let's define a new table Department.
drop table Department; -- Remove old table if present
create table Department( -- Create new table
id Integer,
name Charstring,
unique(id));
insert into Department values -- Populate new table
(1, 'Toys'),
(2, 'Tools'),
(3, 'Food');
The following SQL query selects column id from table Department:
select id
from Department d
where d.name = 'Tools'
In SQL all data searched by select queries must be stored in tables
referenced in the from clause. Therefore variables declared in the
from clause always reference rows. For example the variable d in the
query above range over rows in the table named Department.
OSQL provides a smooth extension to standard SQL by allowing variables
in the from clause to be bound to any kind of object, including rows
in tables, numbers, strings, arrays, etc.
Example: The following set query returns the numbers less than 10 having the product 6.
select m, n
from Integer m, Integer n
where n in range(10)
and m * n = 6
The value of a set query may contain duplicates, i.e. it actually returns a bag rather than a set.
Example: The result of the result from following query contains duplicates.
select mod(i,3)
from Integer i
where i in range(10)
Duplicates are removed from the query result when the keyword
distinct is specified so the result becomes a set rather than a
bag.
Example: this query removes duplicates:
select distinct mod(i,3)
from Integer i
where i in range(10)
Into clause
The optional into clause specifies variables to be bound to the result.
Example:
select i into :i
from Integer i
where i in range(10)
and mod(i,3)=0
:i
The query above retrieves all natural numbers up to 10 divisible with
3, i.e. values 3,6,9. When the result bag contains more than one
object, as in this case, the into variable will be bound only to
the first object in the bag.
If you wish to assign the entire result from a set query to a variable, enclose it in parentheses.
Example:
set :b = (select i
from Integer i
where i in range(10)
and mod(i,3)=0)
:b
Predicates
The where clause in a set query specifies a select condition as
a logical predicate over variables. A predicate is an
expression
returning a Boolean value.
Example: This query contains the predicate i=sqrt(4).
select i
from Integer i
where i = sqrt(4)
Predicates can be expressed as:
A logical comparison operator:
=(equal),<>(not equal),>=(larger than),>(strict larger than) ,<=(less than), and<(strict less than).A call to a Boolean function having the result type
Boolean.A combination of predicates using the Boolean operators
and,or,not, andexiststo combine predicates into composite predicates.
Example: The following query has a composite predicate.
select i
from Integer i
where i + 1 > 8
and (i in range(10) or i in range(12))
and mod(i,6) = 0
Predicates my be used in the results of queries too.
Example:
select i<3
from Integer i
where i in range(5)
Negation
The predicat not p is true when the predicate p is false. It
implements logical negation over a predicat p, .
The predicate p is a negated predicate.
Example:
select i
from Integer i
where i in range(10)
and not i > 5
The query optimizer removes negated predicates by replacing them with their complements when possible by using De Morgan's laws.
The complements of =, <>, >, >=, <, and <= are <>, =,
<=, <, >=, and >, respectively.
Example: The query above can also be written as follows.
select i
from Integer i
where i in range(10)
and i <= 5
The predicate p can be any simple of composite
predicate.
Example:
select i
from Integer i
where i in range(10)
and not (i < 9 and i > 3)
Using De Morgan's laws the query can also be expressed as:
select i
from Integer i
where i in range(10)
and (i >= 9 or i <= 3)
Quantifiers
The operator exists sq returns true when the subquery sq returns a
non-empty value. It implements existential
quantification
over a subquery sq, .
Example: This query returns the integers between 1 and 5 that are
also in the bag mod(range(5),3) being values 0,1,2.
select i
from Integer i
where i in range(5)
and exists i in mod(range(5),3)
If we remove the term exists here, the query will return duplicates:
select i
from Integer i
where i in range(5)
and i in mod(range(5),3)
The expression not exists(sq) tests if a subquery sq returns empty
result, i.e. a negated subquery (logical ).
Example:
select i
from Integer i
where i in range(5)
and not exists i in mod(range(5),3)
Negated subqueries can specify intersections of
bags, in the example between the bags
range(5) and mod(range(5),3).
Aggregate functions in queries
Aggregate functions such as sum(), avg(), stdev(), min(),
max(), count() are handled specially, not using Daplex
semantics.
They are applied on entire
collections
of values, rather than being applied for each element using the Daplex
semantics of regular nested function calls.
Example:
count(select i
from Integer i
where i in range(5)
and mod(i,2)=0)
In this case count() is applied on a
bag of numbers. The
system uses the rule that an argument not flattened when the
argument of the calling function (e.g. count) is declared as a
Bag.
Example:
signature('count')
Aggregate functions are often used in grouped selections.
Aggregate functions are called with bags as arguments.
Example: Let's define a bag :b of the integers between one and ten.
set :b = (select i
from Integer i
where i in range(10))
This computes the mean of the values in :b:
avg(:b)
Variables in queries can be declared as bags bound to subqueries.
Example: this query gets the mean and standard deviation of the integers between 1 and 10:
select avg(b), stdev(b)
from Bag of Integer b
where b = (select i
from Integer i
where i in range(10))
Bags are not explicitly stored in memory, but are generated when needed, for example when they are used in aggregate functions.
Example:
set :bigbag = range(1,1000000)
The statement above assigns :bigbag to a bag of a million integers
without explicitly creating it. Instead its elements are
generated when needed, for example inside the aggregate
function sum():
sum(:bigbag)
Ordered selections
The order by clause specifies that the result should be sorted by
the specified sort key. The sort order is descending when desc
is specified and by default ascending, asc. A set query with an
order-by-clause is called an ordered selection.
Example: The following query sorts the integers i between 1 and 10
paired with sin(i) based of the sort key sin(i):
select i, sin(i)
from Integer i
where i in range(10)
order by sin(i) desc
The sort key does not need to be part of the result.
Example:
select i
from Integer i
where i in range(10)
order by sin(i) desc
Grouped selections
When analyzing data it is often necessary to group data, for example
to get the sum of the salaries of employees per department. Such
re-groupings are specified though the optional group by clause. It
specifies on which expression in the select clause the data should be
grouped and summarized. This is called a grouped selection.
Example: Consider the following query without a group by clause:
select i
from Integer i
where i in mod(range(10),3)
To pair the different i values in the result with the sum of
the corresponding i values, we can make this modified
query with a group by clause:
select i, sum(i)
from Integer i
where i in mod(range(10),3)
group by i
A grouped selection is a set query with a group by clause
present. The execution semantics of a grouped selection is different
than for regular queries. In the example the aggregate function
sum() is not applied directly on the value of i but rather on the
bag of values
specified by the group by clause.
An element of the select clause of a grouped selection can be one of:
An element in the group key specified by the
group byclause, which isiin the example. The result is grouped by each value of the group key. In the example the grouping is made over each integer in the bagmod(range(10),3).A call to an aggregate function, which is
sum(i)in the example. The aggregate function is called for the bag of variable bindings specified by the group key, in the example the bag ofisuch thati in mod(range(10),3).
The group key need not be part of the result.
Example:
select sum(i)
from Integer i
where i in mod(range(10),3)
group by i
Top-k queries
The optional limit clause limits the number of returned values from
the set query. It is often used together with ordered
selections to specify top-k queries returning the first few tuples
in a set of objects based on some ranking.
Example: The following query finds the two largest values of
sin(i) for each integer ì between 1 and 10.
select i, sin(i)
from Integer i
where i in range(10)
order by sin(i) desc
limit 2
The limit can be any numerical expression.
Example:
set :k=1
select i, sin(i)
from Integer i
where i in range(10)
order by sin(i) desc
limit :k+2
