An overview of Cloud Agent Scalability
APPSeCONNECT Cloud Agent is the main component that processes data coming from different data sources. The processes defined within the application are executed in the Agent infrastructure maintained by the APPSeCONNECT cloud. The agent is designed to support parallel process execution, and data processing over the agent can be scaled both horizontally and vertically. The cloud infrastructure handles thousands of requests coming from multiple organizations and can scale itself up based on workload.
The orchestration of data transfer through the APPSeCONNECT Cloud Engine is either:
- Handled by Implementer.
- Handled by the Infrastructure.
The workflow is an execution engine that loads data from the source application, performs all the processing defined within the workflow, and then pushes the transformed data to the target application. During execution, an implementation can split the data into batches so that the batches are processed in parallel, keeping the overall processing time low. This requires a good understanding of how workflow nodes can speed up data processing; in particular, the self-looping syntax can drastically improve the performance of a data flow.
Data transfer can also be driven by the infrastructure. The inbuilt scheduler can trigger data transfer at a configured interval, and the data can be processed in parallel across multiple machines or containers. We encourage users to design workflows so that independent data sets are processed at different intervals, which improves the performance of data sync. The data triggers present in the platform take care of each execution and spawn a separate container to process each data transfer.
APPSeCONNECT provides two types of integration trigger:
- Real-time Trigger
- Scheduled Trigger
In the case of real-time triggers, the host application uses a webhook feature to configure a particular URL so that whenever data is posted to the host application, it is automatically pushed to APPSeCONNECT. We use an Enterprise Service Bus to ensure no data is lost in transit. The service bus in APPSeCONNECT follows a FIFO (First In, First Out) mechanism, so the transactional sequence is maintained.
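As a minimal sketch of this FIFO behaviour (the class and function names below are illustrative assumptions, not the platform's actual API), a webhook endpoint enqueues payloads on a durable queue instead of processing them inline, and consumers drain them in arrival order:

```python
from queue import Queue


class ServiceBus:
    """Minimal FIFO bus sketch: messages are consumed in the order posted."""

    def __init__(self):
        self._queue = Queue()

    def publish(self, message):
        # Enqueuing on a durable bus (rather than processing inline) is what
        # prevents data loss while the message is in transit.
        self._queue.put(message)

    def consume(self):
        return self._queue.get()


def webhook_handler(bus, payload):
    """Hypothetical webhook endpoint: it only enqueues the payload,
    leaving processing to downstream consumers."""
    bus.publish(payload)
```

Because the queue is strictly first-in-first-out, the consumer sees events in exactly the order the host application posted them.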
In the case of scheduled triggers, a scheduler service manages all the triggers in an organization. The scheduler invokes the triggers automatically to start execution of a process. When the APPSeCONNECT Cloud receives a request, the workflow is executed to process data from one application to another.
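The trigger bookkeeping described above can be sketched as a table of intervals and last-run times; the class name, data layout, and method names here are illustrative assumptions:

```python
class TriggerRepository:
    """Illustrative sketch: each trigger has an interval (in seconds) and a
    last-run timestamp; each tick returns the triggers that are due to fire."""

    def __init__(self):
        self._triggers = {}  # name -> (interval_seconds, last_run)

    def register(self, name, interval, now=0.0):
        self._triggers[name] = (interval, now)

    def tick(self, now):
        """Return the names of triggers whose interval has elapsed,
        and mark them as having just run."""
        due = []
        for name, (interval, last_run) in self._triggers.items():
            if now - last_run >= interval:
                due.append(name)
                self._triggers[name] = (interval, now)
        return due
```

In the real platform each due trigger would turn into a request to the API Gateway; here `tick` simply returns which processes should start.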
A client is a user or a computer that triggers an individual process. For instance, in the case of scheduled triggers, a client is simply a request coming from a job trigger. If the request comes from a webhook, the client is an application posting data through the enterprise service bus. A client can also be a real user triggering a process manually from the APPSeCONNECT portal.
An API Gateway is the entry point of any execution on the cloud infrastructure. Requests coming from clients are passed to the load balancer for processing.
APPSeCONNECT maintains a trigger repository inside the scheduler, which sends a request to the API Gateway whenever a trigger fires.
A load balancer checks the workload on the different VM scale sets and spawns containers accordingly.
For every container, APPSeCONNECT maintains a replica so that whenever a VM scale set fails, the replica is loaded automatically.
Containers are independent execution units that execute a workflow from end to end. Each container is a micro-service at its core, orchestrating and dividing individual workloads into multiple processes.
A VM scale set is a group of virtual machines that takes care of executing the containers.
A message broker implements the standard message-queue pattern, storing messages and pushing them to consumers.
A micro-service is an execution engine that handles one individual task request and provides a response.
When a client calls the cloud agent, the request first hits the API Gateway. This routine loads the organization details to authenticate the request and generates a token for processing. The load balancer then identifies the processing unit and spawns a container to execute the request. The logical containers are hosted inside the VM scale sets (collections of VMs); these services load data from the source application, perform the transformations defined in the workflow, and push the result to the target application. Each container is an independent processing module mapped to a workflow or another independent module of AEC. The VM scale sets are the virtual machines inside the cluster where these logical containers run. Each logical container always has an idle replica in another VM scale set, so that if the main scale set crashes, processing transfers automatically to the idle one.
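The authenticate-then-spawn lifecycle above can be sketched roughly as follows; the registry lookup, the token format, and the container record are all hypothetical details for illustration, not the platform's real interfaces:

```python
import uuid


def handle_request(org_id, org_registry):
    """Sketch of the gateway routine: authenticate the organization,
    issue a processing token, then hand off to a (simulated) container."""
    if org_id not in org_registry:
        raise PermissionError(f"unknown organization: {org_id}")
    token = str(uuid.uuid4())  # hypothetical token; the real format is unspecified
    # In the real infrastructure the load balancer would spawn a container
    # on a VM scale set; here we simply return a record describing it.
    return {"token": token, "org": org_id, "status": "running"}
```

The key design point mirrored here is that authentication happens once at the gateway, and everything downstream works with the issued token rather than raw credentials.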
While processing a workflow, data that needs to be stored on physical storage is pushed through a service bus, where messages are queued for processing. A number of message consumers process the individual messages and route them into different data buckets such as file, log, and audit.
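A simple sketch of that consumer fan-out, assuming (as an illustrative convention, not the platform's actual schema) that each message carries a field naming its destination bucket:

```python
def route_messages(messages, buckets=("file", "log", "audit")):
    """Route each message into its destination data bucket.

    The 'bucket' field on each message is an assumed convention; real
    consumers would also persist the message to the corresponding store."""
    stores = {b: [] for b in buckets}
    for msg in messages:
        bucket = msg.get("bucket")
        if bucket in stores:
            stores[bucket].append(msg)
    return stores
```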
The image above represents a logical container that caters to end-to-end data processing from one application to another. The container executes the engine with metadata passed to it from the API layer, and uses individual routines such as adapters, scripts, blobs, cache, and database access to fetch and supply data during execution. The processing unit individually executes the tasks defined inside a flow to process data from the application.
To achieve scalability of data processing, we provide two options:
- Horizontal Scaling
- Vertical Scaling
When we consider horizontal scaling, we divide the data into batches (say, batches of 100 records) and then process each batch in parallel. This involves parallel execution of the data in a loop so that it is processed quickly. The data is always backed by a storage unit to ensure nothing is missed in transit. A dedicated self-loop feature lets an individual get / process / push routine run in parallel.
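The batch-and-parallelize pattern can be sketched with standard-library primitives; the batch size, the function names, and the transform itself are assumptions for illustration, not the platform's self-loop implementation:

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 100  # assumed batch size; implementers choose this per workflow


def split_into_batches(records, batch_size=BATCH_SIZE):
    """Split the source data into fixed-size batches."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def transform(record):
    # Hypothetical transformation; a real workflow node would map fields here.
    return {**record, "synced": True}


def process_batch(batch):
    """Stand-in for one get / process / push pass of the self-loop."""
    return [transform(r) for r in batch]


def run_self_loop(records):
    """Process all batches in parallel, mirroring a self-looping workflow node."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(process_batch, split_into_batches(records))
    return [item for batch in results for item in batch]
```

Note that `pool.map` returns results in batch order, so the output preserves the input sequence even though the batches run concurrently.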
We can also use vertical scaling, where the data is distributed over multiple independent workflows, each of which can be processed in parallel.
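Running independent workflows concurrently can be sketched in a few lines; the workflow callables below are placeholders for real workflow executions:

```python
from concurrent.futures import ThreadPoolExecutor


def run_workflows(workflows):
    """Run independent workflows concurrently. Each workflow is a callable
    covering one self-contained slice of the data, so no coordination is
    needed between them."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda wf: wf(), workflows))
```

This only works cleanly when the workflows are truly independent, which is why the document recommends designing workflows around independent data sets.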
Data transfer in the APPSeCONNECT platform always depends on API response times, so the overall processing time will largely depend on the combined response times of the connected applications' APIs.