ApiBackend

Fetch and store data from an API endpoint.

  1. Overview
  2. base_url
  3. authentication
  4. model_casing
  5. api_casing
  6. api_to_model_map
  7. pagination_parameter_name
  8. pagination_parameter_type
  9. limit_parameter_name

Overview

The ApiBackend gives developers a way to quickly build SDKs that connect clearskies applications to arbitrary API endpoints. The backend has some built-in flexibility to make it easy to connect to most APIs, as well as behavioral hooks so that you can override small sections of the logic to accommodate APIs that don’t work in the expected way. This allows you to interact with APIs using the standard model methods, just like every other backend, and also means that you can attach such models to endpoints to quickly enable all kinds of pre-defined behaviors.

Usage

Configuring the API backend is pretty easy:

  1. Provide the base_url to the constructor, or extend it and set it in the __init__ for the new backend.
  2. Provide a clearskies.authentication.Authentication object, assuming it isn’t a public API.
  3. Match your model class name to the path of the API (or set model.destination_name() appropriately)
  4. Use the resulting model like you would any other model!

It’s important to understand how the API backend will map queries and saves to the API in question. The rules are fairly simple:

  1. The API backend only supports searching with the equals operator (e.g. models.where("column=value")).
  2. To specify routing parameters, use the {parameter_name} or :parameter_name syntax in either the url or in the destination name of your model. In order to query the model, you then must provide a value for any routing parameters, using a matching search condition: (e.g. models.where("routing_parameter_name=value"))
  3. Any search clauses that don’t correspond to routing parameters will be translated into query parameters. So, if your destination_name is https://example.com/:category_id/products and you execute the model query models.where("category_id=10").where("on_sale=1"), the backend will fetch https://example.com/10/products?on_sale=1
  4. When you specifically search on the id column for the model, the id will be appended to the end of the URL rather than as a query parameter. So, with a destination name of https://example.com/products, querying for models.find("id=10") will result in fetching https://example.com/products/10.
  5. Delete and Update operations will similarly append the id to the URL, and also set the appropriate request method (e.g. DELETE or PATCH by default).
  6. When processing the response, the backend will attempt to automatically discover the results by looking for dictionaries that contain the expected column names (as determined from the model schema and the mapping rules).
  7. The backend will check for a response header called link and parse this to find pagination information so it can iterate through records.
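
Rules 2 through 4 can be illustrated with a minimal, standalone sketch of how equality conditions might be split between routing parameters, a trailing id, and the query string. The function build_request_url is hypothetical and is not the backend's actual code. It assumes the search conditions have already been parsed into a dictionary of column names and values.

```python
import re
from urllib.parse import urlencode

# Matches {name} or :name placeholders. Requiring a letter or underscore after ":"
# keeps port numbers such as ":8080" from being treated as routing parameters.
ROUTING_PATTERN = re.compile(r"\{(\w+)\}|:([A-Za-z_]\w*)")


def build_request_url(destination: str, conditions: dict[str, str], id_column_name: str = "id") -> str:
    """Hypothetical sketch: turn equality conditions into a request URL."""
    used: set[str] = set()

    def fill_routing_parameter(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        if name not in conditions:
            raise ValueError(f"A value for the routing parameter '{name}' is required")
        used.add(name)
        return str(conditions[name])

    # Rule 2: routing parameters are substituted into the URL itself.
    url = ROUTING_PATTERN.sub(fill_routing_parameter, destination)

    # Rule 4: a search on the id column is appended to the path.
    if id_column_name in conditions and id_column_name not in used:
        url = url.rstrip("/") + "/" + str(conditions[id_column_name])
        used.add(id_column_name)

    # Rule 3: everything else becomes a query parameter.
    query = {name: value for name, value in conditions.items() if name not in used}
    return f"{url}?{urlencode(query)}" if query else url
```

For example, build_request_url("https://example.com/:category_id/products", {"category_id": "10", "on_sale": "1"}) returns https://example.com/10/products?on_sale=1, which matches rule 3.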

NOTE: The API backend doesn’t support joins or group_by clauses. This limitation, and the fact that it only supports searching with the equals operator, is not a limitation of the API backend itself. It simply reflects the behavior of most API endpoints. If you want to support an API that has more flexibility (for instance, perhaps it allows more search operations than just =), you can extend the appropriate methods, discussed below, to map a model query to an API request.
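
The sketch below illustrates the equals-only restriction with a hypothetical helper, parse_equals_condition, which is not part of clearskies. It splits a condition string into a column and a value and rejects every other comparison operator. A backend for a more flexible API would need its own handling at the point where this helper raises.

```python
import re

# Hypothetical helper, not part of clearskies: parses "column<operator>value".
# Two-character operators are listed first so "<=" is not read as "<" followed by "=".
CONDITION_PATTERN = re.compile(r"^\s*(\w+)\s*(<=|>=|!=|<|>|=)\s*(.*?)\s*$")


def parse_equals_condition(condition: str) -> tuple[str, str]:
    """Split an equality condition into (column, value), rejecting other operators."""
    match = CONDITION_PATTERN.match(condition)
    if match is None:
        raise ValueError(f"Could not parse condition {condition!r}")
    column, operator, value = match.groups()
    if operator != "=":
        # An API with richer filtering would need custom handling here.
        raise ValueError(f"Operator {operator!r} is not supported: only '=' can be sent to the API")
    return column, value
```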

Here’s an example of how to use the API Backend to integrate with the Github API:

import clearskies


class GithubPublicBackend(clearskies.backends.ApiBackend):
    def __init__(
        self,
        # This varies from endpoint to endpoint, so we want to be able to set it for each model
        pagination_parameter_name: str = "since",
    ):
        # these are fixed for all GitHub API endpoints, so there's no need to make them settable
        # from the constructor
        self.base_url = "https://api.github.com"
        self.limit_parameter_name = "per_page"
        self.pagination_parameter_name = pagination_parameter_name
        self.finalize_and_validate_configuration()


class UserRepo(clearskies.Model):
    # Corresponding API Docs: https://docs.github.com/en/rest/repos/repos?apiVersion=2022-11-28#list-repositories-for-a-user
    id_column_name = "full_name"
    backend = GithubPublicBackend(pagination_parameter_name="page")

    @classmethod
    def destination_name(cls) -> str:
        return "users/:login/repos"

    id = clearskies.columns.Integer()
    full_name = clearskies.columns.String()
    type = clearskies.columns.Select(["all", "owner", "member"])
    url = clearskies.columns.String()
    html_url = clearskies.columns.String()
    created_at = clearskies.columns.Datetime()
    updated_at = clearskies.columns.Datetime()

    # The API endpoint won't return "login" (e.g. username), so it may not seem like a column, but we need to search by it
    # because it's a URL parameter for this API endpoint.  Clearskies uses strict validation and won't let us search by
    # a column that doesn't exist in the model: therefore, we have to add the login column.
    login = clearskies.columns.String(is_searchable=True, is_readable=False)

    # The API endpoint lets us sort by `created`/`updated`.  Note that the names of the columns (based on the data returned
    # by the API endpoint) are `created_at`/`updated_at`.  As above, clearskies strictly validates data, so we need columns
    # named created/updated so that we can sort by them.  We can set some flags to (hopefully) avoid confusion
    updated = clearskies.columns.Datetime(
        is_searchable=False, is_readable=False, is_writeable=False
    )
    created = clearskies.columns.Datetime(
        is_searchable=False, is_readable=False, is_writeable=False
    )


class User(clearskies.Model):
    # Corresponding API docs: https://docs.github.com/en/rest/users/users?apiVersion=2022-11-28#list-users

    # GitHub has two columns that are both effectively id columns: id and login.
    # We use the login column for id_column_name because that is the column that gets
    # used in the API to fetch an individual record
    id_column_name = "login"
    backend = GithubPublicBackend()

    id = clearskies.columns.Integer()
    login = clearskies.columns.String()
    gravatar_id = clearskies.columns.String()
    avatar_url = clearskies.columns.String()
    html_url = clearskies.columns.String()
    repos_url = clearskies.columns.String()

    # We can hook up relationships between models just like we would if we were using an SQL-like
    # database.  The whole point of the backend system is that the model queries work regardless of
    # backend, so clearskies can issue API calls to fetch related records just like it would be able
    # to fetch children from a related database table.
    repos = clearskies.columns.HasMany(
        UserRepo,
        foreign_column_name="login",
        readable_child_columns=["id", "full_name", "html_url"],
    )


def fetch_user(users: User, user_repos: UserRepo):
    # If we execute this models query:
    some_repos = (
        user_repos.where("login=cmancone")
        .sort_by("created", "desc")
        .where("type=owner")
        .pagination(page=2)
        .limit(5)
    )
    # the API backend will fetch this url:
    # https://api.github.com/users/cmancone/repos?type=owner&sort=created&direction=desc&per_page=5&page=2
    # and we can use the results like always
    repo_names = [repo.full_name for repo in some_repos]

    # For the below case, the backend will fetch this url:
    # https://api.github.com/users/cmancone
    # In addition, the readable column names on the Callable endpoint include "repos", which references our has_many
    # column.  This means that when converting the user model to JSON, it will also grab a page of repositories for that user.
    # To do that, it will fetch this URL:
    # https://api.github.com/users/cmancone/repos
    return users.find("login=cmancone")


wsgi = clearskies.contexts.WsgiRef(
    clearskies.endpoints.Callable(
        fetch_user,
        model_class=User,
        readable_column_names=["id", "login", "html_url", "repos"],
    ),
    classes=[User, UserRepo],
)

if __name__ == "__main__":
    wsgi()

The following example demonstrates how models using this backend can be used in other clearskies endpoints, just like any other model. Note that the following example is re-using the above models and backend, I have just omitted them for the sake of brevity:

wsgi = clearskies.contexts.WsgiRef(
    clearskies.endpoints.List(
        model_class=User,
        readable_column_names=["id", "login", "html_url"],
        sortable_column_names=["id"],
        default_sort_column_name=None,
        default_limit=10,
    ),
    classes=[User],
)

if __name__ == "__main__":
    wsgi()

And if you invoke it:

$ curl 'http://localhost:8080' | jq
{
    "status": "success",
    "error": "",
    "data": [
        {
            "id": 1,
            "login": "mojombo",
            "html_url": "https://github.com/mojombo"
        },
        {
            "id": 2,
            "login": "defunkt",
            "html_url": "https://github.com/defunkt"
        },
        {
            "id": 3,
            "login": "pjhyett",
            "html_url": "https://github.com/pjhyett"
        },
        {
            "id": 4,
            "login": "wycats",
            "html_url": "https://github.com/wycats"
        },
        {
            "id": 5,
            "login": "ezmobius",
            "html_url": "https://github.com/ezmobius"
        },
        {
            "id": 6,
            "login": "ivey",
            "html_url": "https://github.com/ivey"
        },
        {
            "id": 7,
            "login": "evanphx",
            "html_url": "https://github.com/evanphx"
        },
        {
            "id": 17,
            "login": "vanpelt",
            "html_url": "https://github.com/vanpelt"
        },
        {
            "id": 18,
            "login": "wayneeseguin",
            "html_url": "https://github.com/wayneeseguin"
        },
        {
            "id": 19,
            "login": "brynary",
            "html_url": "https://github.com/brynary"
        }
    ],
    "pagination": {
        "number_results": null,
        "limit": 10,
        "next_page": {
            "since": "19"
        }
    },
    "input_errors": {}
}

In essence, we now have an endpoint that lists results but, instead of pulling its data from a database, it makes API calls. It also tracks pagination as expected, so you can use the data in pagination.next_page to fetch the next set of results, just as you would if this were backed by a database, e.g.:

$ curl 'http://localhost:8080?since=19'
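
Rule 7 earlier in this section says the backend finds pagination information in the link response header. The sketch below assumes GitHub-style headers in the RFC 8288 format. Under that assumption, a header such as `<https://api.github.com/users?per_page=10&since=19>; rel="next"` can be reduced to the same next_page dictionary shown in the output above. The helper next_page_from_link_header is hypothetical, and the backend's real parsing may differ.

```python
import re
from urllib.parse import parse_qs, urlparse

# One entry of an RFC 8288 Link header: <url>; rel="relation"
LINK_ENTRY_PATTERN = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def next_page_from_link_header(link_header: str, pagination_parameter_name: str) -> dict[str, str]:
    """Hypothetical sketch: pull the next-page parameter out of a Link header."""
    for url, relations in LINK_ENTRY_PATTERN.findall(link_header):
        # rel may hold several space-separated relation types.
        if "next" in relations.split():
            query = parse_qs(urlparse(url).query)
            if pagination_parameter_name in query:
                return {pagination_parameter_name: query[pagination_parameter_name][0]}
    # Without a usable "next" link there is no further page to request.
    return {}
```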

Mapping from Queries to API calls

The process of mapping a model query into an API request involves several methods, which can be overridden to fully control the process. This is necessary when an API behaves differently than the API backend expects. The following table outlines the methods involved and how they are used:

| Method | Description |
|---|---|
| records_url | Return the absolute URL to fetch, as well as any columns that were used to fill in routing parameters |
| records_method | Return the HTTP request method to use for the API call |
| conditions_to_request_parameters | Translate the query conditions into URL fragments, query parameters, or JSON body parameters |
| pagination_to_request_parameters | Translate the pagination data into URL fragments, query parameters, or JSON body parameters |
| sorts_to_request_parameters | Translate the sort directive(s) into URL fragments, query parameters, or JSON body parameters |
| map_records_response | Take the response from the API and return a list of dictionaries with the resulting records |

In short, the details of the query are stored in a clearskies.query.Query object, which is passed to each of these methods. They use that information to adjust the URL, add query parameters, or add parameters to the JSON body. The API backend then executes an API call with those final details and uses the map_records_response method to pull the returned records out of the response from the API endpoint.
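
Rule 6 and the map_records_response entry in the table both describe finding records inside an arbitrary response. The hypothetical find_records function below shows one way such a search could work. One part of it is an assumption: any dictionary that shares at least one key with the model's columns is treated as a record. The backend's real heuristic may be stricter.

```python
from typing import Any


def find_records(data: Any, expected_columns: set[str]) -> list[dict[str, Any]]:
    """Hypothetical sketch: locate record dictionaries in an arbitrary JSON response."""
    if isinstance(data, dict):
        # A dictionary sharing at least one key with the model's columns is treated as a record.
        if not expected_columns.isdisjoint(data):
            return [data]
        # Otherwise search inside it, e.g. {"total": 2, "items": [...]}.
        for value in data.values():
            records = find_records(value, expected_columns)
            if records:
                return records
    elif isinstance(data, list):
        records = [item for item in data if isinstance(item, dict) and not expected_columns.isdisjoint(item)]
        if records:
            return records
        # No direct matches: look one level deeper into each item.
        for item in data:
            nested = find_records(item, expected_columns)
            if nested:
                return nested
    return []
```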

base_url

Required

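The base URL of the API. When building a request, the backend's finalize_url method takes a relative URL, prepends the base URL, fills in any routing data, and also returns the names of any routing parameters it used.

For example, consider a base URL of `/my/api/{record_id}/:other_id`, with finalize_url called like so: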

```python
(url, used_routing_parameters) = api_backend.finalize_url(
    "entries",
    {
        "record_id": "1-2-3-4",
        "other_id": "a-s-d-f",
        "more_things": "qwerty",
    },
)
```

The returned url would be `/my/api/1-2-3-4/a-s-d-f/entries`, and used_routing_parameters would be ["record_id", "other_id"].
The latter is returned so you can understand what parameters were absorbed into the URL.  Often, when some piece of data
becomes a routing parameter, it needs to be ignored in the rest of the request.  `used_routing_parameters` helps with that.
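
The substitution described above can be sketched as a standalone function that reproduces the documented result for the example inputs. It is only an illustration. Two details are assumptions rather than features of the backend's real finalize_url: the module-level signature with an explicit base_url argument, and raising KeyError for a missing value.

```python
import re

# {name} or :name placeholders; a letter or underscore must follow ":" so ports are ignored.
ROUTING_PATTERN = re.compile(r"\{(\w+)\}|:([A-Za-z_]\w*)")


def finalize_url(base_url: str, url: str, data: dict[str, str]) -> tuple[str, list[str]]:
    """Hypothetical sketch: join base_url and url, then fill in routing parameters."""
    full_url = (base_url.rstrip("/") + "/" + url.lstrip("/")) if url else base_url
    used_routing_parameters: list[str] = []

    def substitute(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        if name not in data:
            raise KeyError(f"No value provided for routing parameter '{name}'")
        if name not in used_routing_parameters:
            used_routing_parameters.append(name)
        return str(data[name])

    # Keys that never appear in the URL (like "more_things") are left for the caller.
    return ROUTING_PATTERN.sub(substitute, full_url), used_routing_parameters
```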

authentication

Optional

An instance of clearskies.authentication.Authentication that handles authentication to the API.

The following example is a modification of the GitHub backend used above that shows how to set up authentication. GitHub, like many APIs, uses an API key attached to the request via the Authorization header. The SecretBearer authentication class in clearskies is designed for this common use case, and pulls the secret key out of either an environment variable or the secret manager (I use the former in this case, because it’s hard to have a self-contained example with a secret manager). Of course, any authentication method can be attached to your API backend: SecretBearer authentication is used here simply because it’s a common approach.

Note that, when used in conjunction with a secret manager, the API backend and the SecretBearer class work together to check for a new secret in the event of an authentication failure from the API endpoint (specifically, a 401 error). This allows you to automate credential rotation: create a new API key, put it in the secret manager, and then revoke the old API key. The next time an API call is made, the SecretBearer will provide the old key from its cache and the request will fail. The API backend will detect this and retry the request, but this time it will tell the SecretBearer class to refresh its cache with a fresh copy of the key from the secret manager. Therefore, as long as you put the new key in your secret manager before disabling the old key, this second request will succeed. The service will keep operating, with only a slight delay in response time caused by refreshing the cache.

import clearskies

class GithubBackend(clearskies.backends.ApiBackend):
    def __init__(
        self,
        pagination_parameter_name: str = "page",
        authentication: clearskies.authentication.Authentication | None = None,
    ):
        self.base_url = "https://api.github.com"
        self.limit_parameter_name = "per_page"
        self.pagination_parameter_name = pagination_parameter_name
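        if authentication is None:
            # Default to a bearer token pulled from the GITHUB_API_KEY environment variable.
            # GitHub expects a header of 'Authorization: Bearer API_KEY'.
            authentication = clearskies.authentication.SecretBearer(
                environment_key="GITHUB_API_KEY",
                header_prefix="Bearer ",
            )
        self.authentication = authentication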
        self.finalize_and_validate_configuration()

class Repo(clearskies.Model):
    id_column_name = "full_name"
    backend = GithubBackend()

    @classmethod
    def destination_name(cls):
        return "/user/repos"

    id = clearskies.columns.Integer()
    name = clearskies.columns.String()
    full_name = clearskies.columns.String()
    html_url = clearskies.columns.String()
    visibility = clearskies.columns.Select(["all", "public", "private"])

wsgi = clearskies.contexts.WsgiRef(
    clearskies.endpoints.List(
        model_class=Repo,
        readable_column_names=["id", "name", "full_name", "html_url"],
        sortable_column_names=["full_name"],
        default_sort_column_name="full_name",
        default_limit=10,
        where=["visibility=private"],
    ),
    classes=[Repo],
)

if __name__ == "__main__":
    wsgi()
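
The credential-rotation behavior described above amounts to "retry once, with a refreshed secret, after a 401". The sketch below models that with two hypothetical pieces. A CachedSecret class stands in for SecretBearer's cache, and a send_with_refresh function stands in for the backend's retry. Neither is clearskies API, and the send callable simply returns an HTTP status code.

```python
from typing import Callable, Optional


class CachedSecret:
    """Hypothetical stand-in for a secret that is cached until a refresh is requested."""

    def __init__(self, load_secret: Callable[[], str]):
        self._load_secret = load_secret
        self._cached: Optional[str] = None

    def get(self, refresh: bool = False) -> str:
        # Load on first use, or whenever a refresh is explicitly requested.
        if refresh or self._cached is None:
            self._cached = self._load_secret()
        return self._cached


def send_with_refresh(send: Callable[[str], int], secret: CachedSecret) -> int:
    """Send a request; on a 401, reload the secret once and retry."""
    status = send(secret.get())
    if status == 401:
        # The cached key may have been revoked: fetch a fresh copy and try again, once.
        status = send(secret.get(refresh=True))
    return status
```

In the rotation scenario, the first call sends the cached old key, receives a 401, reloads the new key from the store, and succeeds on the second attempt. Later calls reuse the cached new key without reloading it.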

model_casing

Optional

The casing used in the model (snake_case, camelCase, TitleCase)

This is used in conjunction with api_casing to tell the processing layer when you and the API use different casing standards. The API backend will then automatically convert the casing style of the API to match your model. This can be helpful when your own code follows a standard naming convention that some external API doesn’t. That way you can at least keep things consistent in your own code. In the following example, these parameters are used to convert from the snake_case native to the GitHub API into the TitleCase used in the model class:

import clearskies

class User(clearskies.Model):
    id_column_name = "Login"
    backend = clearskies.backends.ApiBackend(
        base_url="https://api.github.com",
        limit_parameter_name="per_page",
        pagination_parameter_name="since",
        model_casing="TitleCase",
        api_casing="snake_case",
    )

    Id = clearskies.columns.Integer()
    Login = clearskies.columns.String()
    GravatarId = clearskies.columns.String()
    AvatarUrl = clearskies.columns.String()
    HtmlUrl = clearskies.columns.String()
    ReposUrl = clearskies.columns.String()

wsgi = clearskies.contexts.WsgiRef(
    clearskies.endpoints.List(
        model_class=User,
        readable_column_names=["Login", "AvatarUrl", "HtmlUrl", "ReposUrl"],
        sortable_column_names=["Id"],
        default_sort_column_name=None,
        default_limit=2,
        internal_casing="TitleCase",
        external_casing="TitleCase",
    ),
    classes=[User],
)

if __name__ == "__main__":
    wsgi()

and when executed:

$ curl http://localhost:8080 | jq
{
    "Status": "Success",
    "Error": "",
    "Data": [
        {
            "Login": "mojombo",
            "AvatarUrl": "https://avatars.githubusercontent.com/u/1?v=4",
            "HtmlUrl": "https://github.com/mojombo",
            "ReposUrl": "https://api.github.com/users/mojombo/repos"
        },
        {
            "Login": "defunkt",
            "AvatarUrl": "https://avatars.githubusercontent.com/u/2?v=4",
            "HtmlUrl": "https://github.com/defunkt",
            "ReposUrl": "https://api.github.com/users/defunkt/repos"
        }
    ],
    "Pagination": {
        "NumberResults": null,
        "Limit": 2,
        "NextPage": {
            "Since": "2"
        }
    },
    "InputErrors": {}
}
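
A casing conversion of this kind can be sketched by converting through snake_case as a common intermediate form. The functions below are hypothetical and ignore edge cases such as acronyms or digits, so they only approximate whatever the backend actually does.

```python
import re

SUPPORTED_CASINGS = ("snake_case", "camelCase", "TitleCase")


def to_snake_case(name: str, casing: str) -> str:
    if casing == "snake_case":
        return name
    # Insert an underscore before every capital letter except a leading one, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def from_snake_case(name: str, casing: str) -> str:
    words = name.split("_")
    if casing == "snake_case":
        return name
    if casing == "TitleCase":
        return "".join(word.capitalize() for word in words)
    return words[0] + "".join(word.capitalize() for word in words[1:])  # camelCase


def convert_casing(name: str, from_casing: str, to_casing: str) -> str:
    """Hypothetical sketch of a casing conversion, using snake_case as the pivot."""
    for casing in (from_casing, to_casing):
        if casing not in SUPPORTED_CASINGS:
            raise ValueError(f"Unsupported casing {casing!r}")
    return from_snake_case(to_snake_case(name, from_casing), to_casing)
```

Applied to each key of a record, convert_casing("html_url", "snake_case", "TitleCase") gives HtmlUrl, which matches the column names in the model above.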

api_casing

Optional

The casing used by the API response (snake_case, camelCase, TitleCase)

See model_casing for details and usage.

api_to_model_map

Optional

A mapping from the data keys returned by the API to the data keys expected in the model

This comes into play when you want your model columns to use different names than what is returned by the API itself. Provide a dictionary where the key is the name of a piece of data from the API, and the value is the name of the column in the model. The API Backend will use this to match the API data to your model. In the example below, html_url from the API has been mapped to profile_url in the model:

import clearskies

class User(clearskies.Model):
    id_column_name = "login"
    backend = clearskies.backends.ApiBackend(
        base_url="https://api.github.com",
        limit_parameter_name="per_page",
        pagination_parameter_name="since",
        api_to_model_map={"html_url": "profile_url"},
    )

    id = clearskies.columns.Integer()
    login = clearskies.columns.String()
    profile_url = clearskies.columns.String()

wsgi = clearskies.contexts.WsgiRef(
    clearskies.endpoints.List(
        model_class=User,
        readable_column_names=["login", "profile_url"],
        sortable_column_names=["id"],
        default_sort_column_name=None,
        default_limit=2,
    ),
    classes=[User],
)

if __name__ == "__main__":
    wsgi()

And if you invoke it:

$ curl http://localhost:8080 | jq
{
    "status": "success",
    "error": "",
    "data": [
        {
            "login": "mojombo",
            "profile_url": "https://github.com/mojombo"
        },
        {
            "login": "defunkt",
            "profile_url": "https://github.com/defunkt"
        }
    ],
    "pagination": {
        "number_results": null,
        "limit": 2,
        "next_page": {
            "since": "2"
        }
    },
    "input_errors": {}
}
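
Applying api_to_model_map amounts to renaming dictionary keys. The sketch below uses hypothetical helper names. The reverse mapping, for data headed back to the API, is an assumption about how saves would need to work rather than documented behavior.

```python
from typing import Any


def api_to_model(record: dict[str, Any], api_to_model_map: dict[str, str]) -> dict[str, Any]:
    """Rename API keys to model column names; unmapped keys pass through unchanged."""
    return {api_to_model_map.get(key, key): value for key, value in record.items()}


def model_to_api(data: dict[str, Any], api_to_model_map: dict[str, str]) -> dict[str, Any]:
    """Assumed reverse direction, e.g. for data being sent back to the API."""
    model_to_api_map = {model_key: api_key for api_key, model_key in api_to_model_map.items()}
    return {model_to_api_map.get(key, key): value for key, value in data.items()}
```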

pagination_parameter_name

Optional

The name of the pagination parameter

pagination_parameter_type

Optional

The expected ‘type’ of the pagination parameter: must be either ‘int’ or ‘str’

Note: this is set as a literal string, not as a type.
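
The distinction in the note can be illustrated with a hypothetical helper that maps the literal string setting to a conversion function. In this sketch, passing the type int itself instead of the string 'int' is rejected. This mirrors the note but is not the backend's own validation code.

```python
from typing import Union

# The setting is the *string* "int" or "str", mapped here to a conversion function.
ALLOWED_PAGINATION_TYPES = {"int": int, "str": str}


def coerce_pagination_value(value: Union[int, str], pagination_parameter_type: str) -> Union[int, str]:
    """Hypothetical sketch: convert a pagination value using the literal 'int'/'str' setting."""
    if pagination_parameter_type not in ALLOWED_PAGINATION_TYPES:
        raise ValueError("pagination_parameter_type must be the string 'int' or 'str'")
    return ALLOWED_PAGINATION_TYPES[pagination_parameter_type](value)
```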

limit_parameter_name

Optional

The name of the parameter that sets the number of records per page (if empty, setting the page size will not be allowed)
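
The two parameter names above determine which query parameters carry the page size and the pagination value. The hypothetical helper below shows one way to build them. As described above, it refuses to set a page size when no limit_parameter_name is configured.

```python
from typing import Any, Optional


def pagination_query_parameters(
    limit: Optional[int],
    pagination: dict[str, Any],
    limit_parameter_name: str = "",
    pagination_parameter_name: str = "",
) -> dict[str, Any]:
    """Hypothetical sketch: turn a page size and pagination data into query parameters."""
    parameters: dict[str, Any] = {}
    if limit:
        if not limit_parameter_name:
            # No limit parameter configured, so the page size cannot be set.
            raise ValueError("This API backend does not support setting the page size")
        parameters[limit_parameter_name] = limit
    if pagination_parameter_name and pagination_parameter_name in pagination:
        parameters[pagination_parameter_name] = pagination[pagination_parameter_name]
    return parameters
```

Suppose the names are per_page and page. A limit of 5 with pagination data of {"page": 2} then produces {"per_page": 5, "page": 2}. That is the same per_page=5&page=2 pair that appears in the GitHub URL in the first example.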