Schemas and contracts

All events sent to Stream Machine adhere to the following:

Serialization Schema

This is the blueprint of the data that is sent; it describes the shape of the data.

Event Contract

This defines the rules for the content that is sent; it consists of the verifications that should be performed on the received content.

Serialization Schemas

To guarantee the integrity of the data that is sent to Stream Machine, all events must conform to a serialization schema. These schemas are easy to add and register with Stream Machine, so they are simple to adapt to your use case.

The serialization schema defines how an event is turned into bytes and vice versa.

Currently, Stream Machine supports Apache Avro and JSON Schema. Other serialization formats may be added in the future.

Each serialization schema must include a section with Stream Machine meta information. See the strmMeta documentation for details.

Event Contracts

To guarantee that data sent to Stream Machine adheres to the rules defined by your organization, events must conform to an event contract. Contracts determine which validations are performed, which fields are encrypted, and how events are tied together (events that are tied together get the same encryption key).

An example event contract version is listed below.

A single version of a contract is linked to one, and only one, serialization schema. The inverse does not hold: one schema can be referred to by zero or more event contracts.
streammachine/clickstream/1.0.0
{
  "ref": {  (1)
    "handle": "streammachine", "name": "clickstream", "version": "1.0.0"
  },
  "schemaRef": { (2)
    "handle": "streammachine", "name": "clickstream", "version": "1.0.0"
  },
  "isPublic": true, (3)
  "keyField": "producerSessionId", (4)
  "piiFields": { (5)
    "customer/id": 0,
    "producerSessionId": 1
  },
  "validations": [ (6)
    {
      "field": "customer/id",
      "type": "regex",
      "value": "^.+$"
    },
    {
      "field": "url",
      "type": "regex",
      "value": "^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]"
    }
  ]
}
(1) The reference to the event contract. In this case it is equal to the schema reference, but this is not required.
(2) The serialization schema reference, which includes the organization name, schema name and schema version.
(3) A schema or event contract can be public, in which case all Stream Machine customers can use it, or it can belong to a specific organization, in which case organization credentials are required to use it.
(4) The name of the field in the serialization schema that is used to "tie" events together. Typically, this is the field that identifies a session of an end user (i.e. one of your users).
(5) The fields whose content in an event should be considered sensitive (i.e. personally identifiable information) and should be encrypted by Stream Machine.
(6) The validations that should be performed on the content of specific fields in an event.

When a field is part of an object inside a collection, it cannot yet be part of the piiFields. We intend to add this in a future version.

Contracts are versatile. One use case that Stream Machine foresees is a single serialization schema with many contracts, i.e. the same shape of data with different rules applied to it.
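The one-schema, many-contracts relationship can be sketched as a small index from schema reference to the contracts that point at it. This is only an illustration: the second contract (`clickstream-strict`) and the helper names `ref_string` and `contracts_by_schema` are hypothetical, and the `handle/name/version` string form is taken from the example contract title above.

```python
from collections import defaultdict


def ref_string(ref):
    """Render a reference as handle/name/version."""
    return f'{ref["handle"]}/{ref["name"]}/{ref["version"]}'


def contracts_by_schema(contracts):
    """Group contract references by the single schema each contract points to."""
    index = defaultdict(list)
    for contract in contracts:
        index[ref_string(contract["schemaRef"])].append(ref_string(contract["ref"]))
    return dict(index)


schema = {"handle": "streammachine", "name": "clickstream", "version": "1.0.0"}

contracts = [
    # The contract from the example above (only the references are shown).
    {"ref": {"handle": "streammachine", "name": "clickstream", "version": "1.0.0"},
     "schemaRef": schema},
    # Hypothetical second contract: same schema, different rules.
    {"ref": {"handle": "streammachine", "name": "clickstream-strict", "version": "1.0.0"},
     "schemaRef": schema},
]

index = contracts_by_schema(contracts)
```

Each contract contributes exactly one entry to exactly one schema key, while a schema key may collect any number of contracts.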

Validations of data received by Stream Machine currently support only regular expressions. We aim to extend the validation mechanisms based on customer use cases.

An example of a validation:

Assume that an attribute of your event, say 'user/customer_id', must consist of 9 digits and must not start with a zero. You can express this as a validation rule in the event contract, implemented with a regex.

This is the mechanism that Stream Machine provides to increase the quality of your event data: validate before acceptance, and let the data processing teams, rather than the data generating teams, define the rules. An example of a validation can be seen (and tried) in Sending and receiving an event by hand.
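As a rough local sketch of the 9-digit rule described above, the following Python applies the `validations` list of a contract to an event. It is not Stream Machine's implementation: the helper names are hypothetical, the slash in a field name is assumed to denote nesting, and the regex is assumed to have to match the whole value.

```python
import re

# Hypothetical contract fragment; only the "validations" list is used here.
contract = {
    "validations": [
        # 9 digits, not starting with a zero
        {"field": "user/customer_id", "type": "regex", "value": "^[1-9][0-9]{8}$"},
    ]
}


def get_field(event, path):
    """Look up a slash-separated path in a nested dict (assumed path convention)."""
    value = event
    for part in path.split("/"):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def failed_validations(event, contract):
    """Return the field paths whose values are missing or do not match their regex."""
    failures = []
    for rule in contract["validations"]:
        if rule["type"] != "regex":
            continue  # only regex validations are described in the text
        value = get_field(event, rule["field"])
        # Assumption: the pattern must match the entire value.
        if value is None or re.fullmatch(rule["value"], str(value)) is None:
            failures.append(rule["field"])
    return failures


accepted = failed_validations({"user": {"customer_id": "123456789"}}, contract)
rejected = failed_validations({"user": {"customer_id": "012345678"}}, contract)
```

Here `accepted` is an empty list, while `rejected` names the offending field, because the value starts with a zero.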

It is important to note the difference between keyField and keyLink, as they are related to each other, but are fundamentally different:

  1. keyField is part of the event contract and keyLink is part of the strmMeta section of the serialization schema

  2. keyField determines which field in the serialization schema is used for considering whether events belong to the same sequence (for example a session)

  3. keyLink links a single event to an encryption key

  4. The value for keyField is determined by you

  5. The value for keyLink is determined by Stream Machine

  6. The value of keyField is used when creating a keyLink

As you can see, the two have a strong relationship, but they are different.
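The relationship in the list above can be illustrated with a toy model. Everything here is an assumption made for illustration: the class `KeyLinkRegistry` is hypothetical, the keyLink is represented as a random UUID string, the keyField is assumed to be a top-level field, and key rotation is ignored. It only shows that events sharing a keyField value end up with the same keyLink.

```python
import uuid


class KeyLinkRegistry:
    """Toy stand-in for the service side: one keyLink per distinct keyField value."""

    def __init__(self):
        self._links = {}  # keyField value -> keyLink

    def key_link_for(self, event, key_field):
        # The keyField value is supplied by you, as part of the event.
        key_value = event[key_field]
        if key_value not in self._links:
            # The keyLink value is generated by the service (a UUID is assumed here).
            self._links[key_value] = str(uuid.uuid4())
        return self._links[key_value]


registry = KeyLinkRegistry()
first = registry.key_link_for({"producerSessionId": "session-1", "url": "https://example.com/a"}, "producerSessionId")
second = registry.key_link_for({"producerSessionId": "session-1", "url": "https://example.com/b"}, "producerSessionId")
other = registry.key_link_for({"producerSessionId": "session-2", "url": "https://example.com/a"}, "producerSessionId")
```

The two events from `session-1` share a keyLink, while the event from `session-2` gets a different one.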