SELECT SUM(completedAmount) AS total, customerId
FROM transaction
GROUP BY customerId
ORDER BY total DESC;
The Analytics feature helps merchants to quickly analyze their transactions within wallee and get insights on how their bussiness is doing. With structured access to their data they can predict how to adapt to customer needs. Reports can be customized by writing SQL queries. All data analysis is handled by the PrestoDB open source SQL query engine for big data provided by Amazon Athena.
The Analytics Schema describes the data structures of the data which is exported from the main wallee database and can be queried through the Analytics Query Execution Service.
The Analytics Query Execution Service is used to manage Analytics queries. It allows to submit queries, fetch results in JSON
or CSV
formats, monitor query execution and cancel running queries.
For the full API reference see the Analytics Query Execution Service reference documentation.
In this example a query shall be executed which lists customers ordered by the amount they spent in the merchants space. Using the Analytics Schema we come up with the following PrestoDB SQL code which calculates the sum of transaction amounts grouped by customer:
SELECT SUM(completedAmount) AS total, customerId
FROM transaction
GROUP BY customerId
ORDER BY total DESC;
A query can be submitted for execution using the Submit Query method (HTTP POST
). The request body is expected to contain an Analytics Query structure in JSON format.
To submit our example query we send the following JSON:
{
"accountId": 1,
"externalId": "0d74f422-3d7e-4f8b-a7c4-10be0769e09c",
"queryString": "SELECT SUM(completedAmount) AS total, customerId FROM transaction GROUP BY customerId ORDER BY total DESC"
}
The query shall be executed by the account with ID 1
(parameter accountId
). The actual SQL string is specified in the queryString
parameter. Notice, we don’t provide any spaces
here as we want to include all spaces belonging to the given account. Read more about the Space and Account Restriction later.
We also need to specify either a maxCacheAge
or an externalId
property. In our example we use an externalId
containing the randomly generated UUID 0d74f422-3d7e-4f8b-a7c4-10be0769e09c
. See the section Query Submission Idempotency for a more detailed explanation of these parameters.
The response for a submission request will be a Query Execution structure in JSON format. It will contain the id
of the execution (which is needed to reference this execution in subsequent requests) as well as detail information about the query and the current state of the query execution:
{
"account": 1,
"externalId": "0d74f422-3d7e-4f8b-a7c4-10be0769e09c",
"failureReason": null,
"id": 1052,
"processingEndTime": null,
"processingStartTime": "2021-06-15T06:09:07.746Z",
"queryString": "SELECT SUM(completedAmount) AS total, customerId FROM transaction GROUP BY customerId ORDER BY total DESC",
"scannedBytes": 0,
"spaces": [],
"state": "PROCESSING"
}
A submitted query is processed asynchronously. Initially the query will be in PROCESSING
state and no results will yet be available. After the query has been processed and completed the state will change to PROCESSED
and the query results can be fetched.
If the query execution fails or is canceled before completion, the state will change to FAILED
resp. CANCELED
instead and no results will be returned. In the former case the failureReason
string contains information about the cause of the failure.
The current state of a previously submitted query can be inquired using the Execution Status method (HTTP GET
) with an id
URL parameter which contains the id
value of the Query Execution response returned by the Submit Query method (see Submit the Query).
The response of the Execution Status method will again be a Query Execution. In the case of our example the request URL will look like this:
/api/analytics-query/status?id=1052
After reaching the final PROCESSED
state we will get a response like the following:
{
"account": 1,
"externalId": "0d74f422-3d7e-4f8b-a7c4-10be0769e09c",
"failureReason": null,
"id": 1052,
"processingEndTime": "2021-06-15T06:09:09.561Z",
"processingStartTime": "2021-06-15T06:09:07.746Z",
"queryString": "SELECT SUM(completedAmount) AS total, customerId FROM transaction GROUP BY customerId ORDER BY total DESC",
"scannedBytes": 311398,
"spaces": [],
"state": "PROCESSED"
}
A query execution in PROCESSING
state can be canceled using Cancel Execution method (HTTP POST
) containing a request body with the query execution ID of the Query Execution response returned by the Submit Query method (see Submit the Query). The request url will look like the following:
/api/analytics-query/cancel-execution?id=1052
Canceling a query will always return an empty response (unless there was an error). If the query execution to be canceled has already reached some final state (PROCESSED
, FAILED
or CANCELED
) the canceling attempt will be silently ignored.
Once the Query Execution is in the PROCESSED
state, using the Fetch Result service returns the result in a Query Result Batch. The HTTP GET
request let’s you specify the URL parameter id
to identify the Query Execution. Use maxRows
to limit the number of rows in the result, e.g. 1 as shown in the example below. In addition, you can specify a timeout
parameter for the maximum time in seconds to wait for the result if it is not yet available. Use 0
(the default) to return the result immediately.
/api/analytics-query-service--fetch-result?id=1052&maxRows=1
The response contains the JSON
content of the result as rows
which are described by the columns
. Examine the below example response:
{
"columns": [
{
"aliasName": "total",
"columnName": "total",
"description": null,
"precision": 38,
"referencedTable": null,
"scale": 8,
"tableName": null,
"type": "decimal"
},
{
"aliasName": "customerId",
"columnName": "customerid",
"description": null,
"precision": 2147483647,
"referencedTable": null,
"scale": 0,
"tableName": null,
"type": "varchar"
}
],
"nextToken": "QVNRNmZUY2lVQUg4dGQ1Nkltd3dFWjdIbFJ1L1RxL2ZNYUhmblhkV2ZnUlRBbTJpRlJ6WUsxSzVDblVnZEFjelF6Qm5wbU5uc05aam9aYld3NGVRdjIrbWtjYkpTdzZYcGc9PQ",
"queryExecution": {
"account": 2,
"externalId": null,
"failureReason": null,
"id": 58,
"processingEndTime": "2021-06-15T06:09:09.561Z",
"processingStartTime": "2021-06-15T06:09:07.746Z",
"queryString": "SELECT SUM(completedAmount) AS total, customerId FROM transaction GROUP BY customerId ORDER BY total DESC",
"scannedBytes": 311398,
"spaces": [],
"state": "PROCESSED"
},
"rows": [
[
"8377.88000000",
"30"
]
]
}
The result is paginated, meaning only a batch of rows is provided at once. This is configured with maxRows
which are at maximum 1000. In order to receive the next batch, the nextToken
can be used with Fetch Result to retrieve subsequent batches. The below example fetches the second row.
/api/client/analytics-query-service--fetch-result?id=1052&maxRows=1&nextToken=QVNRNmZUY2lVQUg4dGQ1Nkltd3dFWjdIbFJ1L1RxL2ZNYUhmblhkV2ZnUlRBbTJpRlJ6WUsxSzVDblVnZEFjelF6Qm5wbU5uc05aam9aYld3NGVRdjIrbWtjYkpTdzZYcGc9PQ
You can also download the query result in the CSV
format by using the service Generate Download URL:
/api/analytics-query/generate-download-url?id=1052
The response body contains an Amazon S3 URL similar to the following:
https://...s3.eu-west-1.amazonaws.com/39779692-28b7-43fb-848c-21f8e748be81.csv?X-Amz-Security-Token=[...]&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20210614T083206Z&X-Amz-SignedHeaders=host&X-Amz-Expires=599&X-Amz-Credential=[...]&X-Amz-Signature=[...]
A query must always be executed in the context of an authenticated account. The results of the query will be limited to the spaces which are associated with that account. A query may be limited to one space or multiple spaces but all spaces must belong to the executing account. Queries in spaces which do not belong to the executing account will fail with a permission error.
When submitting a query to Analytics you must specify either an External ID or a Maximal Cache Age to make the submission idempotent and prevent multiple submissions if multiple requests for the same query are received by the server. If both properties are specified in a request then the External ID will be preferred and the Maximal Cache Age ignored. If none is specified the request will fail with an error.
The value of the externalId
property is assigned by the client to a specific query to uniquely identify that query per account. If the server receives a query submission request for an account with an external ID which has already been used before for that account no new query will be submitted. Instead the query execution result of the original request with that ID will be returned. Note that the actual query in the request will be ignored in that case.
It is up to the client to generate the external IDs. Typical strategies are:
Generate a random UUID for each submission request (or some custom random ID with sufficient entropy to prevent collisions).
Keep a persistent counter which is atomically increased every time a query is submitted.
Store your queries in a database table with unique primary keys generated by the database and use these database keys for the external ID.
externalId
cannot be used to later identify a previously submitted query!.So you cannot use the external ID to request information about the status or fetch the results of a previously submitted query. Instead you will receive a query execution object as the response of a successfull submission (or resubmission) of a query. The ID of this execution object is used to identify the submitted query later on when requesting status information or fetching the result of the query.
As an alternative to an External ID you can specify a maximal cache age (in minutes) using the maxCacheAge
property (with a value of 1 or greater). When a query with a maximal cache age is submitted and an equivalent query (same query string, same executing account, same spaces) has been submitted not more than maxCacheAge
minutes before, the query will not be resubmitted but instead the query execution result of the previously submitted query will be returned.
Note that the maximal cache age will be ignored if an external ID is specified in a request!