Practical example: Big Data Transfer
Today we would like to present an interesting solution whose initial idea was conceived in a single day but took a couple of months to implement. Sometimes such ideas arise out of necessity, and that was the case here. It concerned a requirement for a solution to integrate a critical government system from which surrounding systems needed to draw key data, including information intended for publication as well as for international information exchange.
This requirement would not be extraordinary if there were not several such systems, dozens of such objects, and millions of records in those objects. Moreover, the objects needed to be regularly replenished with new entities created every year. Another peculiarity was that even when several systems needed to draw data from the same object, their data content requirements were not the same. The list of data to be transferred had to be audited to prevent unwanted leakage of information. Add to this the requirements for security, for monitoring and control of the transmission, for unambiguity and for completeness … and the subject becomes interesting, to say the least.
Since the customer used the integration platform as the standard for all integrations, the core technology was fortunately clear from the start. Our first thought was to go the classic route: set up as many scenarios in the integration platform as there are interfaces and take their data directly from the source system, because how else would you do it, right? However, we had about 250 transfer requests on the initial list, and the list was going to grow every year.
When we calculated how complex a single implementation would be, how demanding the maintenance and further expansion would get, and what performance load allowing multiple systems to read data directly from the critical system would place on it, we went back to the very beginning.
We asked ourselves a few key questions: is it really necessary to provide data that is up-to-date to the minute, or is data current as of the previous day sufficient? Is it possible to create a unified mechanism over this data that allows the required data to be retrieved under defined conditions? Can this mechanism be supplemented with a central setup for splitting the data into multiple batches, a checking mechanism, format conversion and compression? Would creating such a single dedicated solution be easier than setting up dozens of separate integration scenarios?
Since the answer to all of these questions was YES, we devised and set up a unique, efficient mechanism at the customer's site that allows the data to be provided quickly, efficiently and, most importantly, transparently via the integration platform to the systems that request it.
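To make the idea of a central setup more concrete, here is a minimal sketch of what one entry of such a configuration could look like. The class name, fields and example values (TransferDefinition, ENTITY_REGISTER and so on) are purely illustrative assumptions, not the customer's actual configuration; the real solution kept an equivalent central, audited setup inside the customer's landscape.

```python
from dataclasses import dataclass

# Hypothetical transfer definition; the real solution kept an equivalent
# central configuration maintained and audited at the customer's site.
@dataclass
class TransferDefinition:
    interface_id: str           # which consuming system / interface this feeds
    source_object: str          # business object replicated in the warehouse
    fields: list[str]           # audited list of fields allowed to leave
    filter_condition: str       # e.g. restrict to a year or a legal status
    batch_size: int = 100_000   # split large extracts into manageable batches
    output_format: str = "csv"  # format expected by the consumer
    compress: bool = True       # gzip each batch before hand-over

# Two consumers drawing from the same object with different audited field lists.
TRANSFERS = [
    TransferDefinition(
        interface_id="PUBLISHING",
        source_object="ENTITY_REGISTER",
        fields=["entity_id", "name", "status"],
        filter_condition="status = 'ACTIVE'",
    ),
    TransferDefinition(
        interface_id="INTL_EXCHANGE",
        source_object="ENTITY_REGISTER",
        fields=["entity_id", "name", "country", "valid_from"],
        filter_condition="valid_from >= '2023-01-01'",
        output_format="xml",
    ),
]
```

A single definition like this captures what would otherwise be a whole integration scenario, which is what makes adding the yearly batch of new requests a configuration task rather than a development one.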
In the end, we decided not to provide the data directly from the source critical system but from the Business Warehouse system, where the requested entities are replicated every night. This removed the performance overhead on the core system and also addressed security. In addition, by switching the data provisioning source to the Business Warehouse, we gained a degree of unification, since different business objects are stored there in a uniform manner.
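A rough sketch of the nightly flow, assuming the entities were replicated into the warehouse the night before: read the replica, split it into batches, serialize and compress each batch, attach a checksum for the completeness check and hand the result over to the integration platform. The function names and the data access below are placeholders for illustration only, not the real implementation.

```python
import csv
import gzip
import hashlib
import io
from itertools import islice
from typing import Iterable, Iterator

def read_from_warehouse(source_object: str, condition: str) -> Iterator[dict]:
    """Placeholder: yield replicated records matching the audited condition."""
    yield from ({"entity_id": i, "name": f"Entity {i}", "status": "ACTIVE"}
                for i in range(250_000))

def batches(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Split a potentially huge record stream into fixed-size batches."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk

def to_compressed_csv(rows: list[dict]) -> bytes:
    """Serialize one batch to CSV and gzip it before hand-over."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    return gzip.compress(buffer.getvalue().encode("utf-8"))

def hand_over_to_integration_platform(interface_id: str, batch_no: int,
                                       payload: bytes, checksum: str) -> None:
    """Placeholder: the integration platform routes the batch to its consumer."""
    print(f"{interface_id} batch {batch_no}: {len(payload)} bytes, sha256={checksum[:12]}")

def run_transfer(interface_id: str, source_object: str, condition: str,
                 batch_size: int = 100_000) -> None:
    rows = read_from_warehouse(source_object, condition)
    for batch_no, batch in enumerate(batches(rows, batch_size), start=1):
        payload = to_compressed_csv(batch)
        checksum = hashlib.sha256(payload).hexdigest()  # completeness / integrity check
        hand_over_to_integration_platform(interface_id, batch_no, payload, checksum)

run_transfer("PUBLISHING", "ENTITY_REGISTER", "status = 'ACTIVE'")
```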
Today, gigabytes of data and millions of records flow through this mechanism every day, and the solution has been operating reliably for years. We can make minor changes or extensions in a matter of days. Monitoring and operation are now handled by the customer in-house.
Of course, technology has advanced in the meantime, and today one could consider other technical alternatives and improvements. Nevertheless, I think it will not be easy to beat this powerful, reliable and unique … 😊, although we can certainly replace it in the future with some of the newer cloud technologies. It will certainly be a big challenge, and we will definitely try it at the first opportunity …