
Model repositories

Guide specification
Guide type: Studio code
Requirements: None
Recommended reading: Managing models

SA Engine can be configured to use repositories other than the default one. To do so, add a repository record to the stored function models:repository(), which maps repository names to records.

Repository record

A repository record has the following format:

{
"url": "http(s)://url.to.your.repo:port",
"base_path": "/models/" | function(Charstring repo, Record rec,
Charstring model, Charstring version) -> Charstring,
"http_headers": {
"authorization": "token-for-auth",
"header2": "another header needed for connection",
...
"headerN": "Header n"
}
}
  • url: URL (protocol://host:port) of the remote repository
  • base_path: Can be either:
    1. A charstring that is appended to the URL before the full path is created: <url>/<base_path>/<model>/<version>
    2. A function that takes four arguments (repo, rec, model, and version) and returns the path to the model in the repository. When a function is used, it must return the full path that follows the URL.
  • http_headers: Record of headers for the HTTP requests. It can be used to set authorization headers and any other configuration the remote repository requires.
Note

By customizing base_path with a function you can use many services as a model repository. Later in this guide you will be shown how to use a GitHub repository as a model repository.
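To make the two forms of base_path concrete, here is a minimal sketch in Python (outside SA Engine, for illustration only) of how a request URL could be derived from a repository record. The function name resolve_model_url and the slash normalization are assumptions made for this sketch, not SA Engine's actual implementation.

```python
from typing import Callable, Union

# Hypothetical type for illustration: base_path is either a string or a
# callable taking (repo, record, model, version) and returning a path.
BasePath = Union[str, Callable[[str, dict, str, str], str]]


def resolve_model_url(repo: str, record: dict, model: str, version: str) -> str:
    """Build the request URL for a model version from a repository record."""
    url = record["url"].rstrip("/")
    base_path: BasePath = record["base_path"]
    if callable(base_path):
        # Function form: the function returns the full path after the URL.
        path = base_path(repo, record, model, version)
    else:
        # String form: <base_path>/<model>/<version> is appended to the URL.
        path = "/".join([base_path.strip("/"), model, version])
    # Assumption: exactly one slash separates the URL from the path.
    return url + "/" + path.lstrip("/")


string_record = {"url": "https://my.remote.host.com", "base_path": "/my_models/"}
function_record = {
    "url": "https://my.remote.host.com",
    "base_path": lambda repo, rec, model, version: "/custom_url/" + version + "/" + model,
}

print(resolve_model_url("my_http_repo", string_record, "model1", "1.2.3"))
print(resolve_model_url("my_http_repo", function_record, "model1", "1.2.3"))
```

The two records mirror the string and function configurations used in the examples in this guide.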

Model repository REST API

The REST API is very simple. It uses two methods, and only GET is required. POST is needed only when publishing from within an SA Engine instance.

Method | URL | Description
GET | {url}{base_path}/{model}/{version} | Download version of model as .s.fcz package
POST | {url}{base_path}/{model}/{version} | Upload version of model as .s.fcz package (only needed when publishing from within an SA Engine instance)
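As a rough illustration of how small such a repository can be, the following Python sketch serves the two routes from an in-memory dictionary. Everything beyond the URL layout is an assumption: the "/models/" base path, the status codes, the content type, the absence of authentication, and the idea that the package is sent as the raw request body are not specified by this guide.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

BASE_PATH = "/models/"  # assumed base path for this sketch
PACKAGES = {}           # (model, version) -> package bytes, kept in memory


def parse_model_path(path):
    """Return (model, version) for BASE_PATH<model>/<version>, else None."""
    if not path.startswith(BASE_PATH):
        return None
    parts = path[len(BASE_PATH):].strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        return None
    return parts[0], parts[1]


class RepoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Download: return the stored package, or 404 if it is unknown.
        data = PACKAGES.get(parse_model_path(self.path))
        if data is None:
            self.send_error(404, "Unknown model or version")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_POST(self):
        # Upload: store the request body under (model, version).
        key = parse_model_path(self.path)
        if key is None:
            self.send_error(400, "Expected <base_path>/<model>/<version>")
            return
        length = int(self.headers.get("Content-Length", 0))
        PACKAGES[key] = self.rfile.read(length)
        self.send_response(201)
        self.end_headers()


def serve(host="127.0.0.1", port=8080):
    """Run the sketch server until interrupted."""
    HTTPServer((host, port), RepoHandler).serve_forever()
```

Calling serve() starts the server locally. A repository record pointing at it would use "http://127.0.0.1:8080" as url and "/models/" as base_path.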

Example configuration with base path as string

With the following repository config and import statement:

set models:repository("my_http_repo") = {
"url": "https://my.remote.host.com",
"base_path": "/my_models/",
"http_headers": {
"Authorization": "Basic ...."
}
};

models:import("my_http_repo","model1","1.2.3");

The following request would be sent:

Method | URL
GET | https://my.remote.host.com/my_models/model1/1.2.3
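The Authorization value in the record above is elided. For servers that use standard HTTP Basic authentication, the value is the word Basic followed by the Base64 encoding of username:password. The Python sketch below shows one way to compute it. The helper name basic_auth_header is hypothetical, the credentials are placeholders, and whether your repository accepts Basic authentication is an assumption to check against your server.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Return an Authorization header for HTTP Basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": "Basic " + token}


# Placeholder credentials, not real ones.
print(basic_auth_header("Aladdin", "open sesame"))
# prints {'Authorization': 'Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=='}
```

The resulting string can be pasted into the http_headers record. Because Base64 is an encoding and not encryption, such a header should only be sent over https.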

Example configuration with base path as function

With the following repository config and import statement:

create function my_http_rewriter(Charstring repo, Record opts,
Charstring model, Charstring version)
-> Charstring
as "/custom_url/"+version+"/"+model;

set models:repository("my_http_repo") = {
"url": "https://my.remote.host.com",
"base_path": #"my_http_rewriter",
"http_headers": {
"Authorization": "Basic ...."
}
};

models:import("my_http_repo","model1","1.2.3");

The following request would be sent:

Method | URL
GET | https://my.remote.host.com/custom_url/1.2.3/model1
Note

If you wish to learn how we host the public SA Engine model repository, take a look at this GitHub repository.

GitHub as a model repository

This guide will walk you through how to use a GitHub repository as a model repository for SA Engine. The first part will use one of Stream Analyze's own public repositories for models while the later part of the guide will show you how you can use your own private repository with access tokens as well.

Using sa.repo.public

Stream Analyze hosts a public repository for SA Engine models. The public model repo is hosted at https://github.com/streamanalyze/sa.repo.public.

There is a built-in function for creating a GitHub repository configuration. All you need to do is choose a name for the model repository (e.g., sa.public) and supply the organization, the repository name, and, if the repository requires it, a personal access token.

models:add_github_repo("sa.public","streamanalyze", "sa.repo.public","");

Let's take a look at the repository record generated:

models:repository("sa.public");

That's all you need to do to add the public SA repository to your SA Engine instance. Now you can import models from sa.repo.public. Take a look at the following examples where we install and run version 1.0 of model1 and model2. Then we upgrade both models to 2.0 and run them instead.

models:import("sa.public","model1","1.0");
models:import("sa.public","model2","1.0");

model1();
model2();

models:import("sa.public","model1","2.0");
models:import("sa.public","model2","2.0");

model1();
model2();

Note

If you look at the structure of sa.repo.public, you will see that it is essentially the SA_HOME directory with the subfolders models and model_releases checked in.

SA/
├─ models/
│  ├─ model1/
│  │  ├─ master.osql
│  │  ...
│  └─ model2/
│     ├─ master.osql
│     ...
└─ model_releases/
   ├─ model1@1.0.s.fcz
   ├─ model1@2.0.s.fcz
   ├─ model2@1.0.s.fcz
   └─ model2@2.0.s.fcz

The model1@1.0.s.fcz file was generated by running the following command from the root of the repository:

sa.engine -f SA -o "models:create_release('model1','1.0'); quit;"
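The release files follow the pattern <model>@<version>.s.fcz. When scripting around the repository, for instance to check which releases already exist, a small helper can split a file name back into its parts. This Python sketch assumes only that naming pattern. The helper name parse_release_name is hypothetical and not part of SA Engine.

```python
from pathlib import PurePosixPath

RELEASE_SUFFIX = ".s.fcz"


def parse_release_name(filename):
    """Return (model, version) for '<model>@<version>.s.fcz', else None."""
    name = PurePosixPath(filename).name
    if not name.endswith(RELEASE_SUFFIX) or "@" not in name:
        return None
    # Split on the last '@' so the version is whatever follows it.
    model, version = name[: -len(RELEASE_SUFFIX)].rsplit("@", 1)
    if not model or not version:
        return None
    return model, version


print(parse_release_name("SA/model_releases/model1@1.0.s.fcz"))
# prints ('model1', '1.0')
```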
Exercise

Try creating a GitHub action that automatically generates a model release.

Using a private GitHub repository

Using a private repository from GitHub is essentially the same thing with three extra steps.

  1. Create a private repository
  2. Create a personal access token (PAT) that can read the contents of the private repository
  3. Supply the PAT as the fourth argument to the models:add_github_repo function

If possible, store the PAT in an environment variable and read it with getenv instead of writing the token into your code:

export SA_REPO_PAT=<secret-token>

// This won't work unless you have set the SA_REPO_PAT
// environment variable before starting this studio instance
models:add_github_repo("sa.private","streamanalyze", "sa.repo.private",
getenv("SA_REPO_PAT"));

GitHub repository - final words

That's all you need to know to use GitHub as a model repository. A similar approach can be used for any other Git-based repository with a REST API.

A GitHub model repository fits into a nice workflow where you create, test, and edit models locally and then push them to your remote so they can be imported onto SA Engines across your organization. However, because of the request limits on GitHub APIs, it is not suited to massive deployments.

In an upcoming section we will go through how to use Amazon Web Services S3, CloudFormation, and Lambda@Edge to create a super-scalable deployment repository.

Your private GitHub repository can be viewed as a development or staging environment for the scalable AWS version.