For some presentations I also created an example of how the requests are being routed to the proper partition. The example shows the processing of two requests to Partition 1 (one lucky, one unlucky).
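To make the routing idea more concrete, here is a minimal sketch of how such routing could look, assuming the partition is derived from a hash of the process instance key; all type and function names here are hypothetical and not taken from the PoC:

```go
// Sketch of request routing to a partition. Names (Node, Route, partitionFor)
// are assumptions; the real PoC may pick partitions differently.
package routing

import (
	"fmt"
	"hash/fnv"
)

type Node struct {
	ID             string
	LedPartitions  map[int]bool // partitions this node currently leads
	PartitionCount int
}

// partitionFor maps a process instance key onto one of the partitions.
func (n *Node) partitionFor(processInstanceKey string) int {
	h := fnv.New32a()
	h.Write([]byte(processInstanceKey))
	return int(h.Sum32()) % n.PartitionCount
}

// Route decides whether a request can be handled locally ("lucky") or must be
// forwarded to the leader of the owning partition ("unlucky", one extra hop).
func (n *Node) Route(processInstanceKey string) string {
	p := n.partitionFor(processInstanceKey)
	if n.LedPartitions[p] {
		return fmt.Sprintf("handle locally on %s (partition %d)", n.ID, p)
	}
	return fmt.Sprintf("forward to leader of partition %d", p)
}
```

The "unlucky" request is simply the one that lands on a node which does not lead the owning partition and therefore pays an extra forwarding hop.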
For the core itself I created a new diagram where I tried to structure the components and their functions logically. For each of the components I see quite a lot of points for discussion, but I wanted to share this soon. I will try to add my points later on.
Thank you for consolidating your thoughts in diagrams.
I have a few questions about them:
I tried to look at it from the client's perspective. If our goal is to provide a complete BPM solution, we should also provide tools for process initiation, monitoring, executing user tasks and so on, so we should have internal web services to support those tasks.
I believe that the authorisation service should be outside the BPM core. In most cases RBAC is good enough for BPM; in very rare cases a client might demand ABAC, and in that case we could use a plugin to provide this kind of access control.
To achieve maximum speed, script engines should be embedded into the BPM Engine, since those scripts (condition expressions, transformation expressions on data flows, …) need direct access to the process instance context. If the script engine sits outside the BPM Engine, we would also have to solve concurrency and access control questions.
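To illustrate what "embedded" means here, below is a rough sketch under the assumption that the script engine is just an interface the BPMN engine holds in-process; all names are hypothetical and not part of the current lib-bpmn-engine API:

```go
// Hypothetical sketch: the script engine is embedded and evaluates expressions
// directly against the process instance context, no copying or RPC involved.
package engine

type ProcessInstanceContext struct {
	Key       int64
	Variables map[string]interface{} // shared, mutable instance context
}

// ScriptEngine is embedded into the BPMN engine; implementations (FEEL,
// JavaScript, ...) can be swapped, but they always run in-process.
type ScriptEngine interface {
	// EvaluateCondition resolves e.g. a sequence-flow condition like "amount > 100".
	EvaluateCondition(expression string, ctx *ProcessInstanceContext) (bool, error)
	// Transform applies a data-flow transformation and may write back into ctx.Variables.
	Transform(expression string, ctx *ProcessInstanceContext) error
}

// The engine holds the script engine directly, so condition evaluation on a
// gateway is a plain method call on shared memory.
type BpmnEngine struct {
	scripts ScriptEngine
}

func (e *BpmnEngine) takeSequenceFlow(condition string, ctx *ProcessInstanceContext) (bool, error) {
	if condition == "" {
		return true, nil // unconditional flow
	}
	return e.scripts.EvaluateCondition(condition, ctx)
}
```

Because the evaluation is a plain method call on the shared instance context, there is no serialization or remote call, and concurrency stays under the engine's own control.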
About the blue-ish diagram, I struggle to read it. I assume you mean source packages and their relations, yes? If so, I would recommend some adjustments to ensure alignment with well-known implementation patterns (e.g. interfaces separated from implementations).
Also, may I propose a legend explaining what the symbols mean?
Thoughts:
as a first approach, I would only implement auth at the APIs
Authentication is a must
Authorisation could be simple: just separating internal and external client calls (see the sketch after this list)
clients should not be able to call internal APIs
only other nodes can call internal APIs
I assume further roles for clients are not required, no?
when APIs are implemented, we need to talk about concurrency, i.e. how to handle it in the core engine … and we likely need to adjust code, since lib-bpmn-engine was previously single-threaded only. This is worth a separate diagram
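To show what the simple internal/external split could look like, here is a rough HTTP middleware sketch; the `/internal/` prefix and the cluster-token header are assumptions for illustration (mTLS between nodes would work just as well):

```go
// Rough sketch of the "auth at the APIs only" idea: one authentication check for
// every caller, plus a guard that lets only other cluster nodes reach /internal/.
package api

import (
	"net/http"
	"strings"
)

func authMiddleware(next http.Handler, clusterToken string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// 1. Authentication is a must for every call (token validation omitted).
		if r.Header.Get("Authorization") == "" {
			http.Error(w, "unauthenticated", http.StatusUnauthorized)
			return
		}
		// 2. Simple authorisation: internal APIs are reserved for other nodes.
		if strings.HasPrefix(r.URL.Path, "/internal/") &&
			r.Header.Get("X-Cluster-Token") != clusterToken {
			http.Error(w, "internal API, nodes only", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```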
Questions:
when implementing RAFT directly (as I read from the cluster diagram), why do we need ‘rqlite’? I understand that once we have implemented the RAFT protocol, each instance needs to persist on its own.
@Adam I’m reading through the implementation with rqlite … what is the thinking behind creating 3 partitions in a single process instance?
Context: I found that in cmd/main.go there are 3 partitions configured for rqlite, which are then connected to 3 bpmn-engines.
My naive thinking was that in the simplest case there is just one bpmn-engine per k8s container (single process), which connects via the rqlite-client-sdk to a rqlited.
That said, for convenience the rqlited could be “somehow” embedded.
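For reference, this is roughly what that simplest case would look like; the gorqlite calls are from memory and their exact signatures may differ between versions, and the engine bootstrap itself is omitted because those names are not settled:

```go
// One engine, one partition, one rqlited to talk to (could also be embedded later).
package main

import (
	"log"

	"github.com/rqlite/gorqlite"
)

func main() {
	conn, err := gorqlite.Open("http://rqlited:4001") // rqlited as sidecar or separate pod
	if err != nil {
		log.Fatalf("cannot reach rqlited: %v", err)
	}

	// Smoke-test the store; the bpmn-engine would wrap this connection behind
	// its persistence interface (engine construction omitted here).
	if _, err := conn.WriteOne(
		`CREATE TABLE IF NOT EXISTS process_instance (instance_key INTEGER PRIMARY KEY, state TEXT)`,
	); err != nil {
		log.Fatalf("write failed: %v", err)
	}
	log.Println("rqlited reachable, engine bootstrap would follow")
}
```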
What was your thinking, creating three partitions?
Maybe I am thinking about it at a slightly higher level of abstraction right now.
I see 3 different layers that it might be handy to have (a rough interface sketch follows the list):
BPMN Engine (which would be the evolution of lib-bpmn-engine, enhanced with the persistence interface)
The Complete BPM Engine (including BPMN, DMN and Script engine)
The layer that binds it in the cluster
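As a strawman, the three layers could be expressed as interfaces along these lines; every name here is an assumption meant only to illustrate the separation, not an agreed API:

```go
// Very rough sketch of the three layers as Go interfaces.
package layers

// Layer 1: BPMN Engine – evolution of lib-bpmn-engine with pluggable persistence.
type Persistence interface {
	SaveProcessInstance(key int64, state []byte) error
	LoadProcessInstance(key int64) ([]byte, error)
}

type BpmnEngine interface {
	CreateInstance(processID string, variables map[string]interface{}) (int64, error)
	Resume(instanceKey int64) error
}

// Layer 2: complete BPM engine – BPMN plus DMN and script evaluation.
type BpmEngine interface {
	BpmnEngine
	EvaluateDecision(decisionID string, input map[string]interface{}) (map[string]interface{}, error)
	EvaluateScript(expression string, variables map[string]interface{}) (interface{}, error)
}

// Layer 3: cluster binding – assigns partitions to engine instances and routes work.
type ClusterNode interface {
	Join(peers []string) error
	OwnedPartitions() []int
	EngineFor(partition int) BpmEngine
}
```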
Right now I am struggling to change the diagram so that it contains all of these thoughts. I think it might be beneficial to discuss this at the architectural workshop/meeting.
I agree that I placed the persistence and exporter interfaces in a bad spot; I am fixing that now.
As for your question about partitioning: right now each partition is an independent rqlite cluster. Since my example has 3 partitions, I need to start an engine for each partition and let each engine join its partition's cluster.
As Richard and I discussed the cluster structure a bit, I think we will extend rqlite so that we do not need to create a separate cluster for each partition (it will be one physical cluster but as many logical clusters as there are partitions). It still makes sense to spin up an engine instance per partition, since each one handles its own processes and does not care much about the others.
This still does not rule out single-node, single-partition use cases, where the engine could work with rqlite or with any other persistence provider implementing the “Persistence interface” (such as normal relational databases). But that use case hinders horizontal scalability; it is similar to what Camunda 7 used.
I also apologize for my delayed participation in this important discussion. I’ve given some thought to your ideas about rqlite and partitioning.
I think we might encounter challenges when it comes to properly searching data across partitions. For example, if I want to see all active process instances, I would need to query each partition leader separately to get their active instances.
Additionally, if I need to perform filtering operations on this data (e.g., searching for all instances that started at a specific time), I would have to handle these operations entirely in memory without the benefits of SQL or database functionality.
That said, I might have misunderstood your ideas. If so, I apologize for the confusion.
If I am understanding this correctly, I’m wondering whether it’s truly necessary to have partitions divided across separate clusters.
I believe it definitely makes sense to use partitions or some mechanism to distribute the load across cluster members, but I’m uncertain about implementing it using separate rqlite clusters for the reasons mentioned above.
I have been thinking a bit about this problem since our workshop.
I think we need to support proper partitioning to enable horizontal scaling. But we can have multiple mechanisms to help with the querying use cases (paging, sorting, filtering). I can see the following strategies, which could even be used in parallel and together cover all the requirements.
When a query request comes in, it is distributed to all the nodes and the results are joined into a final response. For this to be reasonably efficient we could employ limits, global indexes, cursor-based paging, etc. (see the sketch after this list).
We could have something like a global read-only mirror.
There are always exporters, so we can use those as in Zeebe.
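To make the scatter/gather option concrete, here is a small sketch of fanning a query out to every partition and merging the results with a pushed-down limit; PartitionClient and its Query method are assumed, not existing code:

```go
// Scatter/gather query sketch: fan out to every partition leader, let each
// partition filter and limit locally, then merge and cut in the coordinator.
package query

import (
	"sort"
	"sync"
)

type Instance struct {
	Key       int64
	StartedAt int64
}

type PartitionClient interface {
	// Query returns at most 'limit' active instances of this partition,
	// already filtered and sorted on the partition side.
	Query(startedAfter int64, limit int) ([]Instance, error)
}

// ActiveInstances gathers results from all partitions and merges them; with the
// per-partition limit the coordinator never holds more than partitions*limit rows.
func ActiveInstances(partitions []PartitionClient, startedAfter int64, limit int) ([]Instance, error) {
	var mu sync.Mutex
	var wg sync.WaitGroup
	var merged []Instance
	errs := make([]error, len(partitions))

	for i, p := range partitions {
		wg.Add(1)
		go func(i int, p PartitionClient) {
			defer wg.Done()
			res, err := p.Query(startedAfter, limit)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				errs[i] = err
				return
			}
			merged = append(merged, res...)
		}(i, p)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}

	// Global sort, then cut to the requested page size.
	sort.Slice(merged, func(a, b int) bool { return merged[a].StartedAt < merged[b].StartedAt })
	if len(merged) > limit {
		merged = merged[:limit]
	}
	return merged, nil
}
```

Because the limit is pushed down to each partition, the in-memory filtering concern raised above stays bounded to partitions × limit rows on the coordinator.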
When it comes to the rqlite cluster: we would probably extend rqlite so that we can have only one physical cluster which we divide logically by the number of partitions, as this would save us complexity in configuration and operation and save resources as well.
About the proposed partitioning in the above diagram, I have concerns whether this will actually help to solve horizontal scaling, i.e. provide a solution to distribute load among different nodes.
@Adam please comment, if this drawing rather matches the current implementation. And also, does this represent the foreseen goal?
To me, this architecture looks simple enough to start/continue the MVP, and I have concerns about further developing towards multiple partitions per node.
Before we develop towards custom and improved load distribution, as commonly described for database sharding, I propose to align on some measurable goals. For that, I have started a small benchmark suite, which could give us a performance baseline for what this PoC solution is capable of.
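For orientation, a benchmark in that suite could be shaped roughly like this; the endpoint and payload are assumptions, the real suite will target whatever API the PoC actually exposes:

```go
// Baseline sketch: measure process-instance creation throughput against one node.
package bench

import (
	"bytes"
	"net/http"
	"testing"
)

func BenchmarkCreateProcessInstance(b *testing.B) {
	payload := []byte(`{"processId":"demo","variables":{}}`)
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		resp, err := http.Post("http://localhost:8080/process-instances", // assumed endpoint
			"application/json", bytes.NewReader(payload))
		if err != nil {
			b.Fatal(err)
		}
		resp.Body.Close()
	}
}
```

Run with something like `go test -bench=CreateProcessInstance -benchtime=30s` against a single running node to get the baseline numbers.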