16 November 2018

Microsoft, Uber, Airbnb and Netflix: Apache Flink has convinced them all.

Microsoft uses it, as does Uber; Airbnb uses it to analyse the performance of well over 40 million results per minute; and Netflix, too, trusts it to process its three trillion daily events and 12 petabytes of data. We are talking about the real-time data stream processing engine ‘Apache Flink’, developed in Berlin.

From detecting anomalies in cloud activities to ‘intelligent’ city traffic monitoring, from evaluating risks in trade to performance and error analysis for service-oriented architectures with distributed tracing technologies: as diverse as the applications of Apache Flink may be, they all have one thing in common, namely real-time data streaming. “Classic databases are built on the premise that data must first be stored before companies can access it retrospectively through business analyses and then take appropriate measures,” explains Fabian Hueske, co-founder of the start-up ‘data Artisans’, which is responsible for the further development of Apache Flink. “Stream processing is a new paradigm in data processing in which events such as financial transactions, website behaviour or data from IoT sensors are processed continuously and with very little delay. Modern companies benefit because it enables them to react to events as soon as they are generated, that is, when they are most valuable.”

Successful Stopgaps

Owing to ever-increasing volumes of data, increasingly complex processes and the transition from a product-focused business model to a customer- and service-oriented approach in which it is important to act in real time, real-time data streaming has become indispensable. By 2025 the global market for streaming analytics is expected to reach a volume of nearly 48 billion dollars, according to the ‘Global Streaming Analytics Market Report 2018’. This is equivalent to a compound annual growth rate of 35 per cent between 2017 and 2025. Systems that fill the gaps between classic database systems and Big Data analysis frameworks are in high demand. Apache Flink, having emerged as an offshoot of the TU Berlin research project ‘Stratosphere’ in 2009, is not the only such ‘stopgap’. “However, adoption of Apache Flink is growing faster than that of other open-source projects,” says Fabian Hueske with evident pleasure, and rightly so, as his technology is increasingly prevailing in a growing data-stream environment against competitors such as ‘Apache Kafka’ and ‘Apache Spark’. For the software engineer the reason is obvious: “Compared to Apache Kafka and Apache Spark, Apache Flink was designed and developed from the start as a stream-processing framework with high throughput, low latency and exact semantics,” he explains, “while other frameworks have their origin in batch processing (Spark) or in storing and distributing messages (Kafka) and were only later supplemented with the ability to process data streams.” Among other things, the open-source stream processor is characterised by the fact that it can handle streaming data on large computer clusters in a fault-tolerant and scalable way, and that it is extremely reliable. In this way user behaviour or financial transactions can be processed immediately, reliably and with a very short delay. Various interfaces also allow data analysis processes to be implemented for a wide variety of applications.

Learn & Network

The diversity of applications became apparent at the beginning of September at the ‘Flink Forward’ conference, which has taken place in Berlin since 2015. For two days experts from Microsoft, Uber, ING and Lyft, as well as other users from all areas, met to present and discuss their experiences with Apache Flink and its possible applications. About 350 developers, DevOps engineers, system and data architects and scientists from 28 nations came together for about 50 sessions on technology and application examples. In addition, the annual conference, held for the fourth time this year, saw the presentation of innovations such as ‘data Artisans Streaming Ledger’, which further broaden the spectrum of applications for Apache Flink. “With the introduction of Streaming Ledger as part of the data Artisans platform, stream-processing applications can be set up that read and update several differently stored data entries with ACID guarantees,” says Fabian Hueske, who also acts as organiser of Flink Forward. “With this, the strongest consistency guarantees, which are also offered by most (but not all) relational databases, have been made available to stream-processing applications for the first time.” Apart from these sessions, the two-day conference, which by now also has offshoots in Beijing and San Francisco, offered practical training on applying solutions for stream processing and real-time data analysis.

It goes without saying that networking played an important part, too. The heterogeneous international community is an essential success factor of the open-source platform, which received the Datanami Editors’ Choice Award for “Top 5 Open Source Projects to Watch” for the second time running. Through continuous further development, the thousands of active community members ensure that the success story of Apache Flink is far from over. There is still much to be done: “As presented at Flink Forward Berlin, the Apache Flink community will further develop the framework with respect to interoperability, support for SQL on data streams, scaling and robustness,” says Fabian Hueske. Beyond that, support for SQL will be developed further in order to enlarge the Apache Flink user group by attracting users with SQL knowledge and to reduce the implementation effort for many common applications, he says with a wink. The community will also work on simplifying the handling of different data sources, whether real-time or historic data. The aim is to make it possible to process different data sources and formats from the same system while maintaining existing guarantees and semantics.

One thing is for sure: the landscape of data stream processing is not set in stone for the near future; it keeps developing … flink forward [fast forward].