Architecture

This section describes the high-level architecture of Stream Machine: how data is processed before it is exposed to an application that consumes it.

Overview

When sending data to or receiving data from Stream Machine, your application must identify itself using client credentials. These credentials are provided through the self-service portal or the strm command line interface [1] upon creation of a stream.
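
As an illustration, here is a minimal sketch of exchanging client credentials for an access token. The endpoint URL, field names, and response shape are assumptions made for illustration, not Stream Machine's actual API.

    # Hypothetical sketch: exchanging client credentials for an access token.
    # The endpoint URL and payload/response fields are assumptions, not the real API.
    import requests

    AUTH_URL = "https://auth.strm.example/token"  # hypothetical endpoint

    def get_access_token(client_id: str, client_secret: str) -> str:
        response = requests.post(AUTH_URL, json={
            "clientId": client_id,        # provided when the stream was created
            "clientSecret": client_secret,
        })
        response.raise_for_status()
        return response.json()["accessToken"]  # assumed response field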

Every event sent to Stream Machine carries a schema reference (in an HTTP header) that defines the structure and contents of the event. The Stream Machine Gateway verifies that the data:

  1. …is of the correct serialization format, as specified in the schema reference

  2. …complies with the pre-defined validation rules, as specified in the schema

  3. …is a valid serialized message
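
To make these checks concrete, the sketch below sends a JSON-serialized event to the Gateway. The gateway URL and the name of the header carrying the schema reference are assumptions for illustration, not the documented API.

    # Hypothetical sketch: sending an event with a schema reference header.
    # The gateway URL and header name are assumptions, not the actual API.
    import json
    import requests

    GATEWAY_URL = "https://in.strm.example/event"  # hypothetical endpoint

    event = {
        "consentLevels": [1, 3, 8],       # mandatory consent metadata
        "referrer": "https://example.com",
    }

    response = requests.post(
        GATEWAY_URL,
        data=json.dumps(event),
        headers={
            "Authorization": "Bearer <access token>",
            "Content-Type": "application/json",
            "Schema-Ref": "nps_unified_v1",  # assumed header name for the schema reference
        },
    )
    response.raise_for_status()  # a rejected event failed one of the three checks above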

High Level Architecture
Figure 1. High Level Architecture

Upon accepting an event, the Gateway uses the Encrypter to encrypt all defined Personally Identifiable Information (PII) attributes of the event. Which fields are PII is defined in the schema, for example nps_unified_v1. A schema can mark attributes with multiple PII levels, but all of them are encrypted with the same encryption key, which is linked to the session and rotated every 24 hours.
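
To illustrate the idea (this is not Stream Machine's actual implementation), the sketch below encrypts the PII attributes of an event with a single session-linked key, using the cryptography package. The field names and the choice of Fernet are assumptions.

    # Illustrative only: encrypt PII attributes with one session-linked key.
    # Field names and the use of Fernet are assumptions, not the real Encrypter.
    from cryptography.fernet import Fernet

    session_key = Fernet.generate_key()  # in Stream Machine this key rotates every 24 hours
    encrypter = Fernet(session_key)

    # PII level per attribute, as a schema (e.g. nps_unified_v1) might define them.
    pii_levels = {"email": 1, "user_name": 3, "ip_address": 8}

    def encrypt_pii(event: dict) -> dict:
        encrypted = dict(event)
        for field in pii_levels:
            if field in encrypted:
                encrypted[field] = encrypter.encrypt(encrypted[field].encode()).decode()
        return encrypted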

You can [2] create decrypted streams (called output streams) that contain a subset of the encrypted stream, with only the PII fields you have requested decrypted.

The algorithm is as follows:

  • We define an output stream linked to a certain encrypted stream, where, for example, we request to receive events that provide consent for levels 1, 3 and 8. The number of levels is essentially unlimited [3].

  • The decrypter inspects every event and ignores all events that do not have at least these three levels set in their mandatory consentLevels metadata.

  • The events accepted by the decrypter are then partially decrypted: only the attributes with PII levels 1, 3 and 8 are decrypted. Even if a customer has given consent for additional PII levels, fields with those levels are not decrypted (see the sketch after this list). This is the mechanism Stream Machine uses to adhere to Article 6 of the GDPR: a data processing unit in a company receives only the events the customer has consented to, with only the PII data it needs.
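
A minimal sketch of this filtering and partial decryption follows. The event layout and the decrypt() helper are assumptions for illustration, not the actual Decrypter.

    # Illustrative sketch of the output-stream algorithm described above.
    # The event layout and the decrypt() helper are assumptions.
    REQUESTED_LEVELS = {1, 3, 8}  # consent levels configured on the output stream

    def process(event: dict, pii_levels: dict, decrypt):
        # 1. Ignore events that lack consent for all requested levels.
        if not REQUESTED_LEVELS.issubset(event["consentLevels"]):
            return None
        # 2. Decrypt only the attributes whose PII level was requested;
        #    fields at other consented levels stay encrypted.
        output = dict(event)
        for field, level in pii_levels.items():
            if field in output and level in REQUESTED_LEVELS:
                output[field] = decrypt(output[field])
        return output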

Outputs

Stream Machine has multiple output possibilities:

The Batch Exporters

Currently Stream Machine has two batch exporter outputs, for AWS S3 and Google Cloud Storage buckets. These can be configured to receive batches of JSON-formatted events from encrypted streams or output streams.
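
On the consuming side, reading such a batch from S3 might look like the boto3 sketch below. The bucket name, key prefix, and the newline-delimited JSON layout are assumptions.

    # Hypothetical sketch: reading a batch export from an S3 bucket with boto3.
    # Bucket name, prefix, and the one-JSON-event-per-line layout are assumptions.
    import json
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-strm-exports"      # hypothetical bucket
    PREFIX = "output-stream/2021/"  # hypothetical key prefix

    listing = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
    for obj in listing.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        for line in body.decode().splitlines():
            event = json.loads(line)
            print(event)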

The Websocket Egress endpoint

This endpoint is intended for development purposes only. It allows developers working with Stream Machine to easily receive any of the streams they have access to. Note that there is no guarantee that you will receive all your events, and there is no mechanism to request events from a certain point in time.
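
For example, a developer might tail a stream with the websockets package, roughly as below. The endpoint URL and the token-in-query-string authentication are assumptions for illustration.

    # Hypothetical sketch: consuming a stream over the websocket egress endpoint.
    # The URL and the query-string authentication are assumptions, not the documented API.
    import asyncio
    import websockets

    WS_URL = "wss://out.strm.example/ws"  # hypothetical endpoint

    async def tail_stream(token: str):
        async with websockets.connect(f"{WS_URL}?accessToken={token}") as ws:
            async for message in ws:  # best-effort delivery; no replay from a point in time
                print(message)

    # asyncio.run(tail_stream("<access token>"))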

Kafka Consumer

Stream Machine is built on Apache Kafka, which provides it with very good scalability, fault-tolerance and decoupling of event producers and consumers.

Allowing Kafka consumers to directly access Stream Machine streams is on the short-term roadmap, but it is not yet available!
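
Once direct access becomes available, consuming a stream would presumably resemble a standard Kafka consumer. Below is a generic kafka-python sketch with hypothetical broker and topic names, not a working Stream Machine integration.

    # Generic Kafka consumer sketch (kafka-python); broker address and topic
    # name are hypothetical, since direct access is not yet available.
    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "my-output-stream",                           # hypothetical topic
        bootstrap_servers="kafka.strm.example:9092",  # hypothetical broker
        group_id="my-consumer-group",
        value_deserializer=lambda raw: json.loads(raw.decode()),
    )
    for record in consumer:
        print(record.value)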

Keys

It should eventually be possible to move all decryption of events to the customer's own environment. This mechanism has not yet been defined or designed; one could envision both a batch and a streaming variant.


1. In alpha, not yet downloadable.
2. But you don't have to.
3. You could abuse them for A/B testing, for instance.