Event logs for fractional source-of-truth systems (part 3 - log compaction)

One downside of the approach we’ve taken in this series is that the log can grow until it becomes unwieldy to store and computationally expensive to apply. Log compaction is one way to reduce that risk, though potentially at the cost of some fidelity. Since we already emit noop events in the conflict resolution portion, the simplest compaction step is to filter them out:

export const compactLog = (events: event[]): event[] => {
  // Drop noop events; every other event is kept in its original order.
  return events.filter(
    (entry) => entry[EventIndex.EventCommand] !== "noop",
  );
};

For further compaction, we can start to work out which operations supersede one another, meaning a later event completely overwrites the state set by an earlier one. For example, the following sequential log events…

["setGroup", 1, [1, 2]],
["setGroup", 1, [1]]

…can be compacted down into…

["setGroup", 1, [1]]

…because the second event completely replaces the state set by the first. Cases where the operations are not sequential are more complex, since removing an earlier event could invalidate events that sit between the superseding pair. There are several ways to identify compactable events, but analyzing your log to that degree can cause an explosion in complexity with uncertain benefit.
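The sequential case can be handled with a single forward pass. The sketch below collapses runs of consecutive events that share a command and target, keeping only the last event in each run. It makes two assumptions that are not taken from the source. First, events have the shape `[command, targetId, payload]`. Second, the caller passes a hypothetical `overwriting` set listing the commands that fully replace state. Append-style commands must not go in that set.

```typescript
// Assumed event shape for this sketch: [command, targetId, payload].
type LogEvent = [command: string, targetId: number, payload: unknown];

// Collapse runs of consecutive events that share a command and target,
// keeping only the last event of each run. Only commands in the
// hypothetical `overwriting` set are treated as fully replacing state.
export const compactAdjacent = (
  events: LogEvent[],
  overwriting: Set<string>,
): LogEvent[] => {
  const result: LogEvent[] = [];
  for (const event of events) {
    const prev = result[result.length - 1];
    const supersedes =
      prev !== undefined &&
      overwriting.has(event[0]) &&
      prev[0] === event[0] &&
      prev[1] === event[1];
    if (supersedes) {
      result[result.length - 1] = event; // the later event replaces the earlier one
    } else {
      result.push(event);
    }
  }
  return result;
};
```

Only adjacent pairs are touched. A log like `setGroup 1`, `setGroup 2`, `setGroup 1` is left as-is, which sidesteps the problem of events sitting between a superseding pair.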

An easier path is to treat our mutable structures as a dependency tree and compact only leaf nodes. In this application, that gives the following structure:

Group
|
|--- Group Limit
|--- Vehicle Group Assignment
|--- Exclusions

This means we can compact limit, vehicle assignment, and exclusion events. If the logs are still unwieldy after that, you might need to accept the additional complexity. You could:

- Compact non-leaf nodes, apply the proposed compacted log, and confirm that no additional warnings are generated (leveraging the conflict resolution logic we’ve already built)
- Take an initial pass to wipe out the leaf nodes, then ensure that compacting groups left no relevant orphaned events
- Build an explicit dependency graph from the log and use it to drive compaction
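The first option, compacting and then checking for new warnings, can be sketched generically. The `applyLog` parameter is hypothetical. It stands in for the conflict resolution code from earlier parts and is assumed to return the warnings produced by replaying a log. The proposed compaction is accepted only if replaying it yields no warning that the original log did not already produce.

```typescript
// Hypothetical signature: replays a log and reports conflict warnings.
// The conflict resolution logic from earlier parts would fill this role.
type ApplyLog<E> = (events: E[]) => { warnings: string[] };

// Accept a proposed compaction only if replaying it introduces no warning
// that replaying the original log did not already produce.
export const compactIfSafe = <E>(
  events: E[],
  compact: (events: E[]) => E[],
  applyLog: ApplyLog<E>,
): E[] => {
  const baseline = new Set(applyLog(events).warnings);
  const candidate = compact(events);
  const introducesWarning = applyLog(candidate).warnings.some(
    (warning) => !baseline.has(warning),
  );
  // Fall back to the uncompacted log rather than risk changing behavior.
  return introducesWarning ? events : candidate;
};
```

This check replays the log twice, so it trades compute for safety. That cost is one reason to wait for instrumentation before adopting it.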

Given the extra complexity of these approaches, it might be best to let instrumentation tell you whether you need one of them. If they prove unnecessary and we compact only leaf nodes, compaction is fairly straightforward: last event wins. The approach I took in the source iterates through the events backwards and generates a unique key for each event. If the key is not yet in a set, it stores the key and keeps the event; if it is, it skips the event.
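Here is a minimal sketch of that last-event-wins pass. It is not the source implementation. The event shape `[command, targetId, payload]`, the leaf command names in `LEAF_COMMANDS`, and the `eventKey` scheme are all hypothetical. Non-leaf events (group changes) are always kept, so only leaf nodes are compacted.

```typescript
// Assumed event shape for this sketch: [command, targetId, payload].
type LogEvent = [command: string, targetId: number, payload: unknown];

// Hypothetical command names for the leaf nodes of the dependency tree.
const LEAF_COMMANDS = new Set(["setGroupLimit", "setVehicleGroup", "setExclusions"]);

// Hypothetical key: two leaf events with the same key overwrite the same state.
const eventKey = ([command, targetId]: LogEvent): string => `${command}:${targetId}`;

export const compactLeaves = (events: LogEvent[]): LogEvent[] => {
  const seen = new Set<string>();
  const kept: LogEvent[] = [];
  // Walk a reversed copy so the first event seen for each key is the latest one.
  for (const event of [...events].reverse()) {
    if (!LEAF_COMMANDS.has(event[0])) {
      kept.push(event); // non-leaf events are never compacted
      continue;
    }
    const key = eventKey(event);
    if (seen.has(key)) continue; // a later event already set this state
    seen.add(key);
    kept.push(event);
  }
  return kept.reverse(); // restore chronological order
};
```

Walking backwards means each key's surviving event is found in a single pass. The set lookup keeps the whole compaction linear in the length of the log.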