Architecture

This section describes the high-level architecture of Stream Machine: how data is processed before it is exposed to an application consuming the data.

Overview

When sending data to or receiving data from Stream Machine, your application must identify itself using client credentials. These credentials are provided through the self-service portal or via the strm command line interface.
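For illustration, obtaining a token with those credentials might look like the sketch below, assuming an OAuth2-style client-credentials flow; the token endpoint URL is a placeholder, not a documented address:

  import requests

  # Placeholder token endpoint; the client id and secret come from the
  # self-service portal or the strm command line interface.
  resp = requests.post(
      "https://auth.example.strm/token",
      data={
          "grant_type": "client_credentials",
          "client_id": "<client-id>",
          "client_secret": "<client-secret>",
      },
      timeout=5,
  )
  resp.raise_for_status()
  access_token = resp.json()["access_token"]  # sent as a Bearer token with each request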

Every event sent to Stream Machine carries a schema reference (in an HTTP header) that defines the structure and contents of the event.

The Stream Machine Gateway will verify that the data:

  1. …is of the correct serialization format, as specified in the schema reference

  2. …is a valid serialized message

  3. …complies with the pre-defined validation rules, as specified in the event requirement specification
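For illustration, sending an event to the gateway could look like the sketch below (Python with the requests package); the endpoint URL and the header carrying the schema reference are placeholders, not the documented API:

  import json
  import requests

  # Placeholder gateway endpoint and header names, for illustration only.
  GATEWAY_URL = "https://gateway.example.strm/event"

  event = {
      "strmMeta": {"consentLevels": [1, 3, 8]},  # mandatory consent metadata
      "sessionId": "session-123",
      # ... further fields conforming to the referenced schema
  }

  response = requests.post(
      GATEWAY_URL,
      data=json.dumps(event),
      headers={
          "Authorization": "Bearer <access-token>",  # token from the client-credentials flow above
          "Content-Type": "application/json",  # serialization format
          "Schema-Ref": "nps_unified/v1",  # placeholder schema reference header
      },
      timeout=5,
  )
  response.raise_for_status()  # the gateway rejects events failing any of the three checks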

High Level Architecture
Figure 1. High Level Architecture

Upon accepting an event, the Gateway uses the Encrypter to encrypt all defined Personally Identifiable Information (PII) attributes of the event. Which fields are PII is defined in the schema definition, for example nps_unified. A schema can define multiple levels of PII attributes, but all of them are encrypted with the same encryption key, which is linked to the session. This key is rotated every 24 hours.
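Purely as an illustration of that idea (one key per session, rotated every 24 hours, used for all PII attributes), and not Stream Machine’s actual implementation, a sketch using the Python cryptography package:

  import datetime
  from cryptography.fernet import Fernet

  # Illustrative in-memory key store keyed by (session, day); real key
  # management in Stream Machine is internal and not exposed like this.
  _keys = {}

  def session_key(session_id: str) -> Fernet:
      day = datetime.date.today().isoformat()  # a new key per day: 24-hour rotation
      key = _keys.setdefault((session_id, day), Fernet.generate_key())
      return Fernet(key)

  def encrypt_pii(event: dict, pii_fields: list, session_id: str) -> dict:
      # Encrypt every PII attribute with the same session-linked key.
      f = session_key(session_id)
      out = dict(event)
      for field in pii_fields:
          if field in out:
              out[field] = f.encrypt(str(out[field]).encode()).decode()
      return out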

You can [1] create decrypted streams that contain a subset of the encrypted stream, with only the PII fields you have requested decrypted.

The algorithm is as follows:

  • we define an output stream linked to a certain encrypted stream, where for example we request to receive events that provide consent for levels 1, 3 and 8. The number of levels is essentially unlimited [2]

  • the decrypter will inspect every event, and will ignore all events that don’t have at least these 3 levels set in their mandatory consentLevels metadata.

  • the events that are accepted by the decrypter will then be partially decrypted: only the attributes with PII levels 1, 3 and 8 will be decrypted. Even if a customer has given consent for additional PII levels, fields at those levels will not be decrypted, because they were not requested for this stream (see the sketch below). This is the mechanism Stream Machine uses to adhere to article 6 of the GDPR: a data processing unit in a company receives only those events that the end-user has consented to, with only the PII data that it needs.
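A sketch of this filter-then-partially-decrypt step, assuming a decrypt helper, a session key, and a per-field mapping of PII levels; the names are illustrative, not Stream Machine’s actual internals:

  REQUESTED_LEVELS = {1, 3, 8}  # the consent levels requested for this output stream

  def process(event, pii_level_by_field, decrypt, session_key):
      # Ignore events whose consent does not cover all requested levels.
      consented = set(event["strmMeta"]["consentLevels"])
      if not REQUESTED_LEVELS.issubset(consented):
          return None

      # Decrypt only the fields whose PII level was requested; fields at
      # other consented levels remain encrypted.
      out = dict(event)
      for field, level in pii_level_by_field.items():
          if level in REQUESTED_LEVELS and field in out:
              out[field] = decrypt(out[field], session_key)
      return out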

Outputs

Stream Machine offers multiple ways to consume output.

The Batch Exporters

Currently Stream Machine has two batch exporter outputs: one for AWS S3 and one for Google Cloud Storage. These can be configured to receive batches of JSON-formatted events from encrypted streams or output streams.

The Websocket Egress endpoint

This endpoint is intended for development purposes only.

It allows developers working with Stream Machine to easily receive any of the streams they have access to. Note that there are no delivery guarantees: you may not receive all your events, and there is no mechanism to request events from a certain point in time.
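As a sketch, tailing a stream over this endpoint could look like the following, assuming the Python websockets package; the egress URL, query parameter, and auth header are placeholders:

  import asyncio
  import websockets

  # Placeholder egress URL and stream name, for illustration only.
  EGRESS_URL = "wss://egress.example.strm/ws?streamName=demo"

  async def tail_stream():
      headers = {"Authorization": "Bearer <access-token>"}
      async with websockets.connect(EGRESS_URL, extra_headers=headers) as ws:
          async for message in ws:
              print(message)  # one event per message; delivery is best-effort

  asyncio.run(tail_stream())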

Kafka Consumer

Stream Machine is built on Apache Kafka, which provides it with very good scalability, fault tolerance, and decoupling of event producers and consumers.
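Assuming streams can be consumed with standard Kafka clients, a minimal consumer sketch using the kafka-python package might look like this; the broker address, topic name, and group id are placeholders, not documented values:

  from kafka import KafkaConsumer

  # Placeholder broker and topic; the actual bootstrap servers and topic
  # naming come from your Stream Machine configuration.
  consumer = KafkaConsumer(
      "strm-demo-stream",
      bootstrap_servers="kafka.example.strm:9092",
      group_id="my-consumer-group",
      auto_offset_reset="earliest",  # unlike the websocket endpoint, Kafka can replay from an offset
      value_deserializer=lambda v: v.decode("utf-8"),
  )

  for record in consumer:
      print(record.value)  # one JSON-formatted event per record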

Keys

It is possible to move the decryption of events to the customer’s own environment. This mechanism hasn’t been defined or designed yet. One could envision both a batch and a streaming mechanism.


1. but don’t have to
2. you could abuse them for A/B testing, for instance