Queries
Queries retrieve objects having specified properties from the local database, streams or other data sources. A query can be one of the following:
It can be an expression to be evaluated to some result.
It can be a set query to search the database for a set of objects having properties fulfilling a query condition specified as a logical predicate.
It can be an array query that constructs numerical arrays fullfilling a query condition.
It can be a stream query that constructs a possibly infinite stream of elements fullfilling a query condition.
It can be a vector query that constructs an ordered sequence (vector) of objects fulfilling a query condition.
Function calls and expressions​
A simple form of queries are expressions that call functions, for example:
sqrt(2.1);
1+2;
1+2 < 3+4;
"a" + 1
The built-in functions plus()
, minus()
, times()
, and div()
have infix syntax +,-,*,/
with the usual priorities.
Example:
(sin(3.14)+sin(6.28)) * 0.5
is equivalent to:
times(plus(sin(3.14),sin(6.28)),0.5)
The +
operator is defined for both numbers and strings. For strings
it implements string concatenation, e.g.
Example:
"Hello" + " world"
In a function call, the types of the actual parameters must be the same as, or subtypes of, the types of the corresponding formal parameters.
Set queries​
A select expression provides a very flexible way to specify a query returning a set of objects. A select expression has the format:
select <result>
from <type extents>
where <condition>
For example, assume we have the following popiulated database:
create type Department;
create function name(Department d) -> Charstring
as stored;
create Department(name) instances
:toys ("Toys"),
:tools ("Tools"),
:food ("Food");
create type Employee;
create function name(Employee e) -> Charstring
as stored;
create function salary(Employee e) -> Real
as stored;
create function dept(Employee e) -> Department d
as stored;
create Employee(name, salary, dept) instances
("Maja", 100, :toys),
("Bill", 200, :food),
("Bull", 300, :tools),
("Pelle", 400, :toys),
("Mons", 500, :food),
("Olle", 500, :toys),
("Birgitta",600, :tools),
("Murre", 700, :food);
The following query returns the names and salaries of the employees in the Toys department having higher salary than 500:
select name(e), salary(e)
from Employee e, Department d
where salary(e) > 400
and dept(e) = d
and name(d) = 'Tools'
In general the semantics of an OSQL query is as follows:
Form the Cartesian product of the type extents, which is the Cartesian product of all employees and departments in the example.
Restrict the Cartesian product by the condition in the
where
clause.For each possible variable binding to elements in the restricted cartesian product, evaluate the
results
expressions to form a result tuple.
It would be very inefficient to directly use the above semantics to execute a query. It is therefore necessary for the system to do extensive query optimization to transform the query into an efficient execution plan, which is a program in an internal language called SLOG (Streamed LOGic).
OSQL also permits formulation of queries accessing indefinite extents. Such queries are not executable at all without query optimization. For example, the previous query could also have been formulated as:
select nm, s
from Employee e, Department d, Charstring nm, Real s
where salary(e) > 400
and dept(e) = d
and name(d) = 'Tools'
and nm = name(e)
and s = salary(e)
In this case, the Cartesian product of all employees, departments, strings, and real numbers is infinite, so the query is not executable without query optimization.
It may be impossible to excute some queries over infinite type extents. For example, the following query is not executable:
select r from Real r where r < 100
The guide Type extent and the solution domain explains more about how query results are formed by the combining the extents of types.
The value of a set query may contain duplicates, i.e. it actually returns a bag rather than a set.
For example, the result of the following query will contain duplicates if names of persons in the database are not unique:
select mod(i,3)
from Integer i
where i in range(10)
Duplicates are removed from the query result when the keyword
distinct
is specified. By specifying distinct
the result from a
set query is a set rather than a bag.
For example, this query removes the duplicates:
select distinct mod(i,3)
from Integer i
where i in range(10)
Predicates​
The where
clause in a set query specifies a selection filter as a
logical predicate over variables. A predicate is an
expression
returning a boolean value. Predicates can be expressed using logical value
comparison operators (>, *, +
etc.) and functions returning boolean
results. The boolean operators and
and or
can be used to combine
boolean values into composite predicates.
Examples:
nm = name(e)
salary(e) > 400 and dept(e) = d and name(d) = "Tools"
The boolean operator and
has precedence over or
.
For example:
a<2 and a>3 or b<3 and b>2
is equivalent to
(a<2 and a>3) or (b<3 and b>2)
The comparison operators (=, !=, >, <=, and >=) are treated as binary predicates. You can compare objects of any type.
Predicates are allowed in the result of a set query.
The query returns true
for all employees in the Toys department
earning more than 200.
Nested function calls​
If a function is applied on the result of a function returning a bag of values, the outer function is applied on each element of that bag, the bag is flattened. This is called Daplex semantics.
For example, consider the query: Example:
create function employees(Department d) -> Bag of Employee
as select e
from Employee e
where dept(e) = d;
select name(employees(d))
from Department d
where name(d) = "Toys"
The function employees(d)
returns a bag of employees, on which the
function name()
is applied. The normal semantics in SA Engine is
that when a function (e.g. name()
) is applied on a bag valued
function (e.g. employees(d)
) it will be applied on each element of
the returned bag. In the example a bag of the names of the employees
working in the Toys department are returned.
The in operator​
If a function returns a
bag the
elements of that bag can be accessed using the in
operator.
Example:
select name(e)
from Employee e, Department d
where e in employees(d)
and name(d) = "Tools"
Tuple expressions​
To retrieve the results of tuple valued functions in queries, use a tuple expression.
For example, the following query calls the function divide(Integer i, Integer j)->(Integer d, Integer r)
to compute the remained of dividing 5 with 3:
select rem
from Integer d, Integer rem
where (d,rem) = divide(5,3)
Tuple expressions can also be used to assign the result of functions returning tuples, for example:
set (:d,:r)=divide(5,3);
:d + :r;
Into clause​
The optional into
clause specifies variables to be bound to the result.
Example:
select e into :e
from Employee e
where name(e) = 'Olle';
name(dept(:e));
The first query above retrieves the object representing the employee
named 'Olle' into the session variable :e
. The second query returns
the name of Olle's department.
Notice that if the result bag contains more than one object the
into variable(s) will be bound only to the first object in the
bag. In the example, if more that one employee would have been named
Olle, only the first one found will be assigned to :e
.
If you wish to assign the entire result from a set query to a
variable, enclose it in parentheses. The result will be a bag. The
elements of the bag can then be extracted with the infix in
operator
or the in()
function.
Example:
set :r = (select e
from Employee e
where salary(e) < 150)
Inspect :r
with one of these equivalent queries:
select name(e) from Employee e where e in :r;
name(in(:r));
Aggregate functions in queries​
Aggregate functions such as sum()
, avg()
, stdev()
, min()
,
max()
, count()
are handled specially, not using Daplex
semantics. They are applied on entire
collections
of values, rather than being applied for each element using the Daplex
semantics of regular nested function calls.
For example:
select count(employees(d))
from Department d
where name(d) = "Toys"
In this case count()
is applied on the function employees(d)
returning a bag of all employees in department d
whose name is
Toys
. The system uses the general rule that an argument is
converted into a subquery when the argument of the calling
function (e.g. count
) is declared Bag of
.
Another example:
select sum(salary(employees(d)))
from Department d
where name(d) = "Tools"
Here, first Daplex semantics is used to form the bag of employees in
the Tools department and then the aggregate function sum
adds
together the salaries of its employees.
Aggregate functions are often used in grouped selections.
Aggregate functions can be applied on subqueries specified as nested set queries returning bags.
For example, this query returns the average salary of all employees:
avg(select salary(e) from Employee e)
Local variables in queries may be declared as bags, which means that the variable is bound to a subquery that can be used as subquery arguments to aggregate functions.
For example, this query gets the mean and standard deviation of all employees:
select avg(b), stdev(b)
from Bag of Integer b
where b = (select salary(e)
from Employee e)
Variables may be assigned to bags by assigning values of functions returning bags.
Example:
set :e = (select employees(d) from Department d where name(d)='Toys');
count(:e)
Bags are not explicitly stored in memory, but are generated when needed, for example when they are used in aggregate functions.
Example:
set :bigbag = range(1,10000000)
The statement above assigns :bigbag
to a bag of 10^7^ numbers. The
bag is not explicitly created through. Instead its elements are
generated when needed, for example when passed to the aggregate
function count()
:
count(:bigbag)
Ordered selections​
The order by
clause specifies that the result should be sorted by
the specified sort key. The sort order is descending when desc
is
specified and ascending otherwise. A set query with an order-by-clause
is called an ordered selection.
For example, the following query sorts the result descending based on
the sort key salary(e)
:
select name(e), salary(e)
from Employee e
order by salary(e) desc
The sort key does not need to be part of the result. For example, the following query list the names of employees in descending order without showing their salaries:
select name(e)
from Employee e
order by salary(e) desc
Grouped selections​
When analyzing data it is often necessary to group data, for example
to get the sum of the salaries of employees per department. Such
re-groupings are specified though the optional group by
clause. It
specifies on which
expression in
the select clause the data should be grouped and summarized. This is
called a grouped selection.
Example:
select name(d), sum(salary(employees(d)))
from Department d
group by name(d)
A grouped selection is a set query with a group by
clause
present. The execution semantics of a grouped selection is different
than for regular queries.
Example:
select name(d), sum(salary(e))
from Department d, Employee e
where dept(e)=d
group by name(d)
Here the group by
clause specifies that the result is grouped on the
names of the departments in the database. After the grouping, for each
department d
the salaries of the employees e
working at that
department are summed using the aggregate
function
sum()
.
An element of a select clause of a grouped selection must be one of:
An element in the group key specified by the
group by
clause, which isname(d)
in the example. The result is grouped on each group key. In the example the grouping is made over each department name so the group key is specified asname(d)
.A call to an aggregate function, which is
sum()
in the example. The aggregate function is applied for the set of variable bindings specified by the group key. In the example the aggregate functionsum()
is applied on the set of values ofsalary(p)
for the employeese
working in departmentd
, i.e. wheredept(e)=d
.
Contrast the above query to the regular (non-grouped) query:
select name(d), sum(salary(e))
from Department d, Employee e
where dept(e)=d
Without the grouping the aggregate function sum()
is applied on the
salary of each emplouyee 3
, rather than the
bag of
salaries corresponding to the group key name(e)
in the grouped
selection.
The group key need not be part of the result. For example the following query returns the sum of the salaries for all departments without revealing the department names:
select sum(salary(e))
from Department d, Employee e
where dept(e)=d
group by name(d)
Top-k queries​
The optional limit
clause limits the number of returned values from
the set query. It is often used together with ordered selections to
specify top-k queries returning the first few tuples in a set of
objects based on some ranking.
For example, the following query returns the names and salaries of the 10 best payed employees:
select name(e), salary(e)
from Employee e
order by salary(e) desc
limit 10
The limit can be any numerical expression.
For example, the following query retrieves the :k+3
lowest income
earners, where :k
is a variable bound to a numeric value:
set :k=2;
select name(e), salary(e)
from Employee e
order by salary(e) desc
limit :k+3
Quantifier functions​
The function some()
implements logical exist over a subquery.
Signature:
some(Bag sq) -> Boolean
Example
select name(d)
from Department d
where some(employees(d))
The function notany()
tests if a subquery sq returns empty result,
i.e. negation.
Signature
notany(Bag sq) -> Boolean
Example:
select name(d)
from Department d
where notany(select e
from Employee e
where salary(e)<200
and dept(e)=d)