Model repositories
Guide specification | |
---|---|
Guide type: | Studio code |
Requirements: | None |
Recommended reading: | Managing models |
SA Engine can be configured to use repositories other than the default one. To do so, add a repository record to the stored function models:repository(), which maps repository names to records.
Repository record
A repository record has the following format:
{
"url": "http(s)://url.to.your.repo:port",
"base_path": "/models/" | function(Charstring repo, Record rec,
Charstring model, Charstring version) -> Charstring,
"http_headers": {
"authorization": "token-for-auth",
"header2": "another header needed for connection",
...
"headerN": "Header n"
}
}
url
: URL (protocol://host:port) of the remote repository.
base_path
: Can be either:
- A charstring, which is appended to the url before the full path is created: <url>/<base_path>/<model>/<version>
- A function that takes four arguments and returns the path to the model in the repository. When a function is used, it must return the full path that follows the url.
http_headers
: Record with headers for the HTTP requests. It can be used to set authorization headers and other configuration needed by the remote repository.
By customizing base_path with a function, you can use many services as a model repository. Later in this guide you will be shown how to use a GitHub repository as a model repository.
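The two forms of base_path can be illustrated with a small Python sketch. This is not SA Engine code: resolve_model_url is a hypothetical helper, and the way slashes are normalized here is an assumption chosen so that the results match the example configurations in this guide.

```python
from typing import Callable, Union

# base_path is either a string or a function of (repo, record, model, version).
BasePath = Union[str, Callable[[str, dict, str, str], str]]


def resolve_model_url(repo: str, record: dict, model: str, version: str) -> str:
    """Build the request URL for one model version (illustrative only)."""
    url = record["url"].rstrip("/")
    base_path: BasePath = record["base_path"]
    if callable(base_path):
        # A function must return the full path that follows the url.
        path = base_path(repo, record, model, version)
    else:
        # A string is combined as <url>/<base_path>/<model>/<version>.
        segments = [base_path.strip("/"), model, version]
        path = "/".join(s for s in segments if s)
    # Assumption: exactly one slash separates the url from the path.
    return url + "/" + path.lstrip("/")


string_record = {"url": "https://my.remote.host.com", "base_path": "/my_models/"}
function_record = {
    "url": "https://my.remote.host.com",
    "base_path": lambda repo, rec, model, version: "/custom_url/" + version + "/" + model,
}

print(resolve_model_url("my_http_repo", string_record, "model1", "1.2.3"))
# https://my.remote.host.com/my_models/model1/1.2.3
print(resolve_model_url("my_http_repo", function_record, "model1", "1.2.3"))
# https://my.remote.host.com/custom_url/1.2.3/model1
```

The function form is what makes it possible to target services whose URL layout does not follow the <model>/<version> convention.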
Model repository REST API
The REST API is very simple. It uses two methods, of which only GET is required; POST is needed only when publishing from within an SA Engine instance.
Method | URL | Description |
---|---|---|
GET | {url}{base_path}/{model}/{version} | Download version of model as .s.fcz package |
POST | {url}{base_path}/{model}/{version} | Upload version of model as .s.fcz package (only needed when publishing from within an SA Engine instance) |
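To make the two endpoints concrete, here is a minimal sketch of a server that answers them, written in Python with only the standard library. It is an illustration under assumptions, not the implementation used by Stream Analyze: the base path /models, the in-memory PACKAGES store, the status codes, and the names parse_model_path, RepoHandler, and demo_roundtrip are all hypothetical, and no authorization check is performed.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

BASE_PATH = "/models"  # hypothetical base path for this sketch
PACKAGES = {}          # (model, version) -> package bytes, kept in memory


def parse_model_path(path, base_path=BASE_PATH):
    """Return (model, version) for {base_path}/{model}/{version}, else None."""
    prefix = base_path.rstrip("/") + "/"
    if not path.startswith(prefix):
        return None
    parts = path[len(prefix):].split("/")
    if len(parts) != 2 or not all(parts):
        return None
    return parts[0], parts[1]


class RepoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Download one version of a model as a package.
        key = parse_model_path(self.path)
        data = PACKAGES.get(key) if key else None
        if data is None:
            self.send_error(404, "No such model version")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_POST(self):
        # Upload one version of a model as a package.
        key = parse_model_path(self.path)
        if key is None:
            self.send_error(404, "Unknown path")
            return
        length = int(self.headers.get("Content-Length", 0))
        PACKAGES[key] = self.rfile.read(length)
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def demo_roundtrip(payload=b"fake package bytes"):
    """Start the server on a free local port, upload a package, download it."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), RepoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d%s/model1/1.2.3" % (server.server_port, BASE_PATH)
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        upload = urllib.request.Request(url, data=payload, method="POST")
        opener.open(upload).close()
        with opener.open(url) as response:
            return response.read()
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo_roundtrip() uploads a placeholder payload and returns the same bytes from the GET request. A real repository would store actual .s.fcz packages on disk or in object storage and would validate the headers configured in http_headers.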
Example configuration with base path as string
With the following repository config and import statement:
set models:repository("my_http_repo") = {
"url": "https://my.remote.host.com",
"base_path": "/my_models/",
"http_headers": {
"Authorization": "Basic ...."
}
};
models:import("my_http_repo","model1","1.2.3");
The following request would be sent:
Method | URL |
---|---|
GET | https://my.remote.host.com/my_models/model1/1.2.3 |
Example configuration with base path as function
With the following repository config and import statement:
create function my_http_rewriter(Charstring repo, Record opts,
Charstring model, Charstring version)
-> Charstring
as "/custom_url/"+version+"/"+model;
set models:repository("my_http_repo") = {
"url": "https://my.remote.host.com",
"base_path": #"my_http_rewriter",
"http_headers": {
"Authorization": "Basic ...."
}
};
models:import("my_http_repo","model1","1.2.3");
The following request would be sent:
Method | URL |
---|---|
GET | https://my.remote.host.com/custom_url/1.2.3/model1 |
If you wish to learn how we host the public SA Engine model repository, take a look at this GitHub repository.
GitHub as a model repository
This guide will walk you through how to use a GitHub repository as a model repository for SA Engine. The first part will use one of Stream Analyze's own public repositories for models while the later part of the guide will show you how you can use your own private repository with access tokens as well.
Using sa.repo.public
Stream Analyze hosts a public repository for SA Engine models. The public model repo is hosted at https://github.com/streamanalyze/sa.repo.public.
There is a built-in function for creating a GitHub repository configuration. All you need to do is choose a name for the model repository (e.g., sa.public) and supply the organization, the repository name, and finally a personal access token if one is needed to access the repository.
models:add_github_repo("sa.public","streamanalyze", "sa.repo.public","");
To run this code block you must be logged in and your studio instance must be started.
Let's take a look at the repository record generated:
models:repository("sa.public");
That's all you need to do to add the public SA repository to your SA Engine instance. Now you can import models from sa.repo.public. Take a look at the following examples where we install and run version 1.0 of model1 and model2. Then we upgrade both models to 2.0 and run them instead.
models:import("sa.public","model1","1.0");
models:import("sa.public","model2","1.0");
model1();
model2();
models:import("sa.public","model1","2.0");
models:import("sa.public","model2","2.0");
model1();
model2();
If you look at the structure of sa.repo.public you will see that it is essentially the SA_HOME directory with the subfolders models and model_releases checked in.
SA/
├─ models/
│ ├─ model1/
│ │ ├─ master.osql
│ │ ...
│ └─ model2/
│ ├─ master.osql
│ ...
└─ model_releases/
├─ model1@1.0.s.fcz
├─ model1@2.0.s.fcz
├─ model2@1.0.s.fcz
└─ model2@2.0.s.fcz
The model1@1.0.s.fcz file was generated by running the following command from the root of the repository:
sa.engine -f SA -o "models:create_release('model1','1.0'); quit;"
Try creating a GitHub action that automatically generates a model release.
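As a starting point, an automated job could call a small script like the Python sketch below. It only wraps the command shown above; the helper names release_filename, release_command, and create_release are hypothetical, the value after -f is copied unchanged from that command, and the script assumes sa.engine is on the PATH and is run from the root of the repository.

```python
import subprocess


def release_filename(model, version):
    """Name of the release file, following the model_releases listing above."""
    return f"{model}@{version}.s.fcz"


def release_command(model, version, folder="SA"):
    """Argument list equivalent to the sa.engine command shown above."""
    osql = f"models:create_release('{model}','{version}'); quit;"
    return ["sa.engine", "-f", folder, "-o", osql]


def create_release(model, version, folder="SA"):
    """Run sa.engine and return the path where the release is expected."""
    # check=True raises CalledProcessError if sa.engine exits with an error.
    subprocess.run(release_command(model, version, folder), check=True)
    return f"{folder}/model_releases/{release_filename(model, version)}"


print(release_command("model1", "1.0"))
# ['sa.engine', '-f', 'SA', '-o', "models:create_release('model1','1.0'); quit;"]
```

Passing the command as an argument list avoids shell quoting problems with the quotes inside the OSQL statement.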
Using a private GitHub repository
Using a private repository from GitHub is essentially the same thing with three extra steps.
- Create a private repository
- Create a personal access token (PAT) that can read the contents of the private repository
- Supply the PAT as the fourth argument to the models:add_github_repo function
If you can, set an environment variable with the PAT and read it with getenv:
export SA_REPO_PAT=<secret-token>
// This won't work unless you have set the SA_REPO_PAT
// environment variable before starting this studio instance
models:add_github_repo("sa.private","streamanalyze", "sa.repo.private",
getenv("SA_REPO_PAT"));
GitHub repository - final words
That's all you need to know about using GitHub as a model repository. A similar approach can be used for any other Git-based repository with a REST API if you wish.
A setup with a GitHub model repository fits into a nice workflow where you can create, test, and edit models locally and then push them to your remote for importing onto SA Engines across your organization. However, because of the request limits on the GitHub APIs, it is not suited to massive deployments.
In an upcoming section we will go through how to use Amazon Web Services S3, CloudFormation, and Lambda@Edge to create a super-scalable deployment repository.
Your private GitHub repository can be viewed as a development or staging environment for the scalable AWS version.