Examining the Interplay Between Big Data and Microservices – A Bibliometric Review

Due to the ever-increasing amount of data that is produced and captured in today’s world, the concept of big data has risen to prominence. However, implementing the respective applications is still a challenging task. This holds especially true since a high degree of flexibility is desirable. One potential approach is the utilization of novel decentralized technologies, such as microservices, to construct such big data analytics solutions. To obtain an overview of the current state of the corresponding research, this bibliometric review analyzes the literature using the scientific database Scopus and its provided tools for search and analytics, and subsequently discusses avenues for future research.


Introduction
The world of today is influenced by an ongoing increase in the amount of data [1], [2] that are created, captured, stored and analyzed [3]. Furthermore, the demand for the corresponding processing speed is increasing as well [4]. As a consequence of those developments, the conventional technologies and methods of data handling are increasingly insufficient, resulting in the need to develop new techniques and establish modern data analysis paradigms, as constituted by the terms big data and big data analytics (BDA) [5], [6]. Organizations that build the respective capabilities to utilize this new source of insights can thereby enhance their performance [7] through, inter alia, more accurate or faster decision making, cost reductions, an optimization of their offered portfolio of services or an improvement of customer acquisition and retention [8]-[10].
However, implementing the respective applications is still a challenging task [11], [12]. This is exacerbated by the circumstance that the business reality, including, for instance, the accessible sources, available algorithms and technologies as well as the type of questions whose answers promise benefit to the organization, might evolve over time, which also requires the BDA solution to adapt accordingly. As a result, a modular design, in contrast to a monolithic architecture, could be desirable [13]. One potential approach to achieve said modularity is the utilization of microservices as building blocks of the BDA solution as a whole [14]-[16].
Yet, while there are publications that deal with the interplay of big data and microservices, to our knowledge, there is no comprehensive overview of the development of this research as a whole, even though such studies can provide valuable insights by identifying trends as well as potential future research avenues and demands. This review aims to bridge this gap by answering the following research questions:
RQ1: What is the current situation in the research combining big data and microservices?
RQ2: Which are potential research areas or directions to facilitate the interconnected use of microservices and big data?
To address the research questions, the literature on big data and microservices is analyzed by conducting a bibliometric review on these topics using the search engine of Scopus.
The remainder of this article is structured as follows. In the background (Section 2), the topics "big data" and "microservices" are briefly outlined. Afterward, the article's methodology is explained (Section 3), followed by the presentation and discussion of the search's findings (Section 4). Finally, in Section 5, a conclusion is given, also highlighting potentially beneficial directions for future research.

Background
In this section, the concepts of big data and microservices are briefly outlined, providing a general understanding before the actual review is conducted.

Big Data
For more than a decade, the term "big data" underwent a remarkable evolution. While initially being referred to as a synonym for large amounts of data that cannot be easily handled by relational databases and technologies of that time, today it covers a variety of advanced data characteristics, technologies, paradigms and methods [17]. During this time, the concept has undergone significant changes that dramatically moved the term from a hype topic [18] to the foundation of most of the data-driven and data-intensive projects known today [12]. Hence, it is not surprising that a multitude of researchers and practitioners are harnessing big data in all its facets within their endeavors, as in prominent application areas, such as healthcare [19], [20], transportation [21] or tourism [22].
Despite that long-lasting maturation and a highly active research community [23], no distinct and universally applied definition was found that precisely describes the nature and elements of that term [17]. Notwithstanding that, according to one of the most widely used definitions, big data "consists of extensive datasets - primarily in the characteristics of volume, variety, velocity, and/or variability - that require a scalable architecture for efficient storage, manipulation, and analysis" [24]. Similar to the pure definition itself, many differences about the description of the data exist. While some of the data characteristics are observed as core characteristics, namely, volume, variety, and velocity, others are treated unequally [17], [25]. While volume refers to the size and the number of elements to be processed, the variety focuses on the structure of data, which can be either unstructured, semi-structured or structured. Furthermore, the velocity describes the speed at which the data is coming in and is being processed [6]. Lastly, the variability that was addressed in the aforementioned definition refers to changes which may occur in the dataset, regarding the other characteristics [24].
Although this may eventually give rise to the feeling that the engineering, testing, and application of related systems became easier in recent years, the opposite is often the case. As stated before, various challenges still exist that hamper the related implementation and deployment activities. Apart from the pure lack of experts and qualified staff [26], the comprehensive planning, engineering, and integration of architectures represent a cumbersome task [11]. Many practitioners and researchers noted this problematic situation and attempted to reduce the prevailing complexity through the design and development of promising solutions, such as reference architectures [27], decision support systems [28], automation approaches [29] or the application of new technologies [15]. Especially in times in which highly decentralized or loosely coupled environments are sought after more than ever, as in the case of very large business application scenarios, the use of big data in combination with such environments remains desirable. Microservice technologies, in this direction, constitute a promising approach.

Microservices
Even though there seems to be no widely recognized definition of the term "microservice", it can be described as an architectural approach to building software applications and a relatively new implementation of service-oriented architecture [30]. In essence, it entails the development of a single application by decomposing it into a suite of small services [31, p. 6], each of which runs in its own process and communicates with other applications only via lightweight mechanisms. Furthermore, these services, owing to their independent nature, usually only need a rather limited amount of central management [32], can be written in differing programming languages and can be based on different technology stacks [15], which can, for instance, be specialized on analysis, management tasks or data storage functionalities.
This setup allows them to be efficiently deployed independently of each other with continuous deployment tools and pipelines, and it is a common practice that they are based on business functions, so a service can be specialized to solve one related task. This structure has diverse practical and organizational implications. The microservice approach tends to align teams around business capabilities instead of traditionally building teams based on the technology layer. Consequently, the teams are more cross-functional and encompass the full range of skills required for development, thus, preventing a plethora of siloed architectures that each contain their own logic [33]. The approach comes with the constraint that the team implementing this concept is usually not based around strict hierarchical communication [34].
In general, componentization is considered to be a good practice in software engineering, but achieving a high degree of modularity is often seen as a difficult task [35]. As systems are broken down into services that are independently deployable, with the microservice architecture, componentization is achieved by design. One of the major advantages is the reduced effort for maintenance and modifications. For small internal changes, often only the affected service has to be redeployed. This also facilitates an evolutionary design, where the services' decomposition is used as a driving force to enable frequent and controlled changes in the system [36]. As microservices are specialized around the business logic, minor changes or feature requests can lead to implementing completely new services or variants of existing ones, which, in both cases, can be easily integrated with the existing application. Additionally, it allows new technologies to be adopted on a smaller scale first. In theory, when completely following the promoted idea, every microservice has its own storage and its own technology to manage its stored data. However, since this approach would often be highly impractical, in many cases central databases are used, provided the services still do not share data and each only accesses its respective section [31, pp. 81-84].

Methodology
This study is heavily inspired by, and largely follows, the approach used in [37], which, in turn, refers to [38] and [39]. However, the incorporation of additional filtering steps as well as the use of inclusion and exclusion criteria, as proposed in [40], amends the approach of [37] and helps to increase the overall quality. To assure even more rigor, and despite the study being a bibliometric review rather than a structured literature review, the steps proposed by Kitchenham et al. [41] for conducting a structured literature review have been mostly adopted and slightly modified where needed, as has also been practiced, for instance, in [42]. At first, the research questions that are being investigated are formulated. Afterwards, the search string is defined and used to obtain the initial set of literature. By using certain inclusion and exclusion criteria, this set is reduced to only those publications that are actually suitable for the study's scope. In a fourth step, the relevant data from the remaining papers are collected. Those are subsequently analyzed and presented. Finally, the study's results are interpreted to generate additional insights that exceed pure numerical information, identify trends, shortcomings and avenues for future research (based on the most influential publications), and help to advance the domain as a whole. Those six steps conducted in this review are outlined in Table 1. However, while the first four steps are executed one by one, the presentation of the results in the fifth step is already complemented with the corresponding interpretations to keep the context. Since the research questions have already been defined in the first section, the actual search process is described in the following.
As is common for papers that are focused on analyzing the existing literature [43], [44], the most important step is the initial search for relevant literature by defining a suitable search string that includes the relevant papers without being too broad and thereby diluting the relevance of the findings.
To find the relevant literature, at first, a search was conducted in Scopus, using the following search string:
TITLE-ABS-KEY ( "big data" ) AND TITLE-ABS-KEY ( microservice OR microservices OR "micro service" OR "micro services" OR "micro-service" OR "micro-services" )
This means that the search covered all papers whose title, abstract or keywords contain the term big data and at least one of microservice, microservices, micro service, micro services, micro-service or micro-services. The latter is necessary to avoid missing relevant publications due to possible variants in the spelling. Furthermore, since Scopus is considered the most extensive abstract and citation database for scientific literature, such as conference proceedings, scientific books and journals, and offers a variety of tools to facilitate sophisticated searches [37], [45], its results promise a comprehensive overview of the domain of interest.
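The structure of the search string, one fixed phrase combined with a set of OR-joined spelling variants, can be illustrated with a minimal Python sketch. The function name `build_scopus_query` and the quoting rule (quote every term that contains a space or a hyphen) are assumptions made for this illustration only; the study does not state how the string was composed.

```python
def build_scopus_query(topic, variants, field="TITLE-ABS-KEY"):
    """Combine one fixed phrase with OR-joined spelling variants.

    Hypothetical helper for illustration; not part of any Scopus tooling.
    """
    def fmt(term):
        # Assumption: terms with a space or hyphen are wrapped in quotes,
        # single words are left bare, mirroring the string used in the study.
        return f'"{term}"' if (" " in term or "-" in term) else term

    alternatives = " OR ".join(fmt(v) for v in variants)
    return f'{field} ( "{topic}" ) AND {field} ( {alternatives} )'


MICROSERVICE_VARIANTS = [
    "microservice", "microservices",
    "micro service", "micro services",
    "micro-service", "micro-services",
]

query = build_scopus_query("big data", MICROSERVICE_VARIANTS)
print(query)
```

Keeping the variants in a list makes it straightforward to add further spellings without retyping the whole expression.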
The search was conducted on 27.06.2021. In doing so, 243 entries matching the search term were found. Those are distributed between five types of publications, as shown in Table 2. Searching for non-unique items (i.e., identical title) revealed three occurrences, resulting in their removal. While two of those three are not complete duplicates but updates to older papers [46], [47], the deprecated versions were still excluded, leaving only the newer ones. Therefore, the set of publications considered for the actual analysis comprises 240 entries.
After getting an overview of those papers, it became apparent that the entries in the category "Conference Review" were of no benefit, since they have no scientific content of their own but merely outline conferences or their proceedings. As a result, items of this type have been excluded for the remainder of this study. Furthermore, only contributions for which the publication process has been finalized were considered. Consequently, two additional papers have been excluded. Finally, papers that were not written in English have also been excluded, eliminating two more contributions. Applying all of those criteria brings the final number of considered publications down to 204. To increase transparency and allow other researchers to understand and replicate the conducted steps, as, for instance, heavily advocated for in [48], the entire filter process is broken down in Figure 1. The study's final criteria for inclusion and exclusion are outlined in Table 3.

Table 3. Inclusion and exclusion criteria
Inclusion criteria: Contribution is listed in Scopus; Contribution is connected to big data; Contribution is connected to microservices.
Exclusion criteria: Contribution is a duplicate; An updated version of the contribution has been found; Contribution is a "Conference Review"; Contribution stage is not "final"; Contribution is not written in English.
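The exclusion criteria of Table 3 can be read as a sequence of filters over the exported records. The following Python sketch shows one possible way to apply them; the record field names (`title`, `year`, `document_type`, `publication_stage`, `language`), the sample records and the rule of keeping the newest entry among identical titles are assumptions for illustration and do not reproduce the tooling actually used in the study.

```python
from collections import Counter


def apply_criteria(records):
    """Return (kept_records, exclusion_counts) for a list of dict records.

    Records are processed newest first, so that among entries with an
    identical title the most recent one is the one that is kept.
    """
    excluded = Counter()
    kept, seen_titles = [], set()
    for rec in sorted(records, key=lambda r: r["year"], reverse=True):
        title_key = rec["title"].strip().lower()
        if title_key in seen_titles:
            excluded["duplicate or superseded version"] += 1
            continue
        seen_titles.add(title_key)
        if rec["document_type"] == "Conference Review":
            excluded["conference review"] += 1
        elif rec["publication_stage"].lower() != "final":
            excluded["not final"] += 1
        elif rec["language"] != "English":
            excluded["not English"] += 1
        else:
            kept.append(rec)
    return kept, excluded


# Purely hypothetical sample records, not taken from the study's data set.
sample = [
    {"title": "Paper A", "year": 2020, "document_type": "Conference Paper",
     "publication_stage": "Final", "language": "English"},
    {"title": "Paper A", "year": 2018, "document_type": "Conference Paper",
     "publication_stage": "Final", "language": "English"},
    {"title": "Proceedings X", "year": 2019, "document_type": "Conference Review",
     "publication_stage": "Final", "language": "English"},
    {"title": "Paper B", "year": 2021, "document_type": "Article",
     "publication_stage": "Article in Press", "language": "English"},
    {"title": "Paper C", "year": 2019, "document_type": "Article",
     "publication_stage": "Final", "language": "Chinese"},
    {"title": "Paper D", "year": 2017, "document_type": "Article",
     "publication_stage": "Final", "language": "English"},
]

kept, excluded = apply_criteria(sample)
print([r["title"] for r in kept], dict(excluded))
```

Counting the exclusions per criterion, rather than only returning the surviving records, yields the kind of step-by-step breakdown that a filter diagram such as Figure 1 reports.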

Results and Discussion
In the following, the posed research questions will be answered, providing an overview of the current situation regarding research that combines the domains of big data and microservices as well as outlining avenues for future research.

The Current Situation
In this section, the first research question "What is the current situation in the research combining big data and microservices?" is answered. For this purpose, the factors listed in Table 4 are considered by using the analytical capabilities provided by Scopus. When amalgamated, those factors give a comprehensive picture of the current situation.
Factor 1: What are the publications that focus on the topic?
Factor 2: What is their temporal distribution?
Factor 3: Which subject areas are they originating from?
Factor 4: Where are the found papers published?
Factor 5: The researchers of which countries are the main contributors?
Factor 6: How are the publications distributed amongst authors?
Factor 7: Which are the most cited publications?
At first, constituting the foundation for the following considerations, giving an overview of the available research and, therefore, taking Factor 1 into account, the results of the conducted search [14]-[16], [36], [49]-[245] are presented. Furthermore, the findings are discussed with the purpose of extracting valuable information and providing context where it is deemed necessary. As shown in Table 5, which corresponds to Factor 2, the first papers fitting the search did not appear before 2015 [36], [49]-[53]. However, when dropping the search term big data and just looking for the different variants of the term microservice, the earliest papers that are found were published in 2003 [246], [247]. The term big data can be traced back even further [5]. This shows that it has taken more than a decade until the combination of both domains has been scientifically explored. In conjunction with the oldest publication being less than six years old, this indicates that the corresponding research is still rather in its infancy.
While in the beginning there has been a constant increase in published papers, this does not hold true for the years 2020 and 2021. For the latter this can be easily explained with the search date being in June 2021 and several contributions probably not yet being registered. The former could be caused by the Covid-19 pandemic, which might have thwarted the corresponding research endeavors. And, while there is no explicit evidence for or against this hypothesis and Scopus overall lists more papers published in 2020 than in 2019, this is reversed when excluding the medical domain, which experienced a surge in publications, probably also caused by the pandemic. However, the number of papers found in the search for the year 2020 was still about 34.1 percent higher than its equivalent for the years 2015, 2016 and 2017, as the first three years with findings, combined.
The deviation between the sum of the single values and the displayed total is caused by the limitation to two digits for this presentation, while the actual calculations use a higher precision. This phenomenon also applies to some of the following tables, but will not be explicitly mentioned any further.
Table 6 shows the distribution of publications by subject area, which corresponds to Factor 3. This table is heavily dominated by "Computer Science" papers with 179 entries (87.75%), followed by a group of contributions from "Engineering" with 62 papers (30.39%), "Decision Sciences" with 51 papers (25.00%), and "Mathematics" with 45 papers (22.06%). While other domains also have a number of articles, their numbers are considerably lower. However, it is possible for one paper to belong to more than one category at the same time. As a result, in total, there are 427 attributions of subject areas to the 204 papers, implying that on average each paper is associated with slightly over two subject areas, which is reflected in the displayed percentage. While there are numerous available subject areas recorded in Scopus [248] and comparing the ones listed in Table 6 with the entire collection to highlight those that have not been associated with big data and microservices is not a promising approach, a more focused examination can potentially point out promising directions for future endeavors. Therefore, the occurrence of subject areas when searching for "big data" and for the different spellings of "microservices" (as already outlined in the explanation of the search string), respectively, without any additional constraints is depicted in Figure 2. The subjects that are inside the dark gray circle in the center are taken from Table 6. They each have at least one associated publication that is concerned with both big data and microservices.
In the light gray sections, on the left (big data) and right (microservices) sides of the figure, the areas that only occurred when searching for the respective term but not the other one are assigned to it. However, those subject areas that appeared in both separate searches but not in this paper's primary search for papers that combine both topics are in the middle gray part in between the previously described sections of the depiction. Specifically, these are "Dentistry", "Immunology and Microbiology", "Pharmacology, Toxicology and Pharmaceutics", "Economics, Econometrics and Finance" and "Arts and Humanities". With them being compatible with both concepts but not having been dealt with accordingly so far, they could be potentially auspicious fields for researchers to conduct pioneer work while still providing a strong certainty regarding the feasibility. Nevertheless, the same also applies to many of the subject areas that are mentioned in Table 6, since they are only present because of few papers or even just a single contribution, leaving a lot of room for novel studies and approaches.
As shown in Table 7, the distribution of publications by their type after the filter process was applied is still heavily dominated by "Conference Paper" with 169 findings (82.84%), followed by "Article" with 30 papers (14.71%), 4 entries for "Book Chapter" (1.96%) and 1 "Review" (0.49%). The comparatively low number of journal articles, not even having one fifth of the volume of conference papers, could be considered another indicator of the corresponding research stream still being in a relatively early stage of its development. Overall, the relevant papers originate from 142 sources, with each source containing an average of around 1.44 publications. While the "top lists" were generally intended to show the top 10, in case of additional entries with the same number of publications, those were also included.
As a consequence, Table 8, which lists the sources with the most publications, comprises more than ten entries.
When looking at the distribution of publications by country in Table 9, which provides the information related to Factor 5, it shows that a large proportion of papers comes from a small number of countries. However, for 1 of the 204 publications (0.40%), a journal article, the country was stated as unknown. Just regarding the other entries, in total, there were 248 mentions of countries for the remaining 203 publications. The percentages for all the actual countries are also based on this number. It is noteworthy that the combined percentages add up to a value of above 100%. This means that for all the contributions with known origins, on average, researchers of around 1.22 countries have collaborated.
In total, 756 authors were associated with the relevant publications. The number already incorporates the fact that Clemens Düpmeier was present with three separate entries, due to spelling (Clemens Düpmeier, Clemens Dupmeier and Clemens Duepmeier). Therefore, on average, each contribution has around 4.18 authors and each author has 1.13 papers. Of the 756 authors, 689 (91.02%) had only a single article. The complete distribution of how many authors have which number of publications is shown in Table 10. It again points to the infancy of the research direction, since there is only a low number of heavily invested researchers, while most of the authors only contributed to a maximum of two publications.
The multiple occurrences of Clemens Düpmeier were also considered and corrected for when compiling the list of the most published authors depicted in Table 11. As for the list of the sources with the most publications, the initially intended top 10 was extended due to a parity in numbers and therefore comprises 16 authors. In case of a tie in the number of contributions, they are ordered alphabetically.
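Spelling variants such as the three forms of "Düpmeier" can be detected by reducing each name to a comparison key. The Python sketch below is only an illustrative heuristic under stated assumptions: the function `name_key` is hypothetical, and folding "ae", "oe" and "ue" to the plain vowel can also merge names that are genuinely different, so any grouping it proposes would still need a manual check. It is not claimed to be the procedure used in this study.

```python
import unicodedata
from collections import defaultdict


def name_key(name):
    """Crude matching key: fold umlaut transliterations, strip accents, lowercase."""
    lowered = name.lower()
    # Treat the transliterations "ae", "oe", "ue" like the plain vowel,
    # so that "Duepmeier" and "Dupmeier" end up with the same key.
    for digraph, vowel in (("ae", "a"), ("oe", "o"), ("ue", "u")):
        lowered = lowered.replace(digraph, vowel)
    # Decompose accented characters and drop the combining marks (ü -> u).
    decomposed = unicodedata.normalize("NFKD", lowered)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


names = [
    "Clemens Düpmeier",
    "Clemens Dupmeier",
    "Clemens Duepmeier",
    "Veit Hagenmeyer",
]

groups = defaultdict(list)
for n in names:
    groups[name_key(n)].append(n)

for key, variants in groups.items():
    print(key, variants)
```

Groups with more than one member are candidates for merging before authors and their publication counts are tallied.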
The list is led by Clemens Düpmeier and Veit Hagenmeyer with 7 and 6 publications, respectively, with all 6 of the latter being authored together. They are followed by Kurt Sandkuhl, Rainer Schmidt and Alfred Zimmermann, with 5 papers that were each authored jointly by all three. Next in the list is Hamzeh Khazaei with 4 publications. The final cluster in this list, with 3 papers each, includes Hatem Khalloof and Shadi Shahoud.
In Table 12, the letters in the row and column headers are the shortest unambiguous abbreviations of the surnames of the authors in Table 11. Taking Table 10, Table 11, Table 12 and Table 13 in conjunction, Factor 6 is covered.
In the relevant selection, on average, every paper has been cited 5.03 times. However, when looking at the distribution of citations in Table 14, it becomes apparent that only a small number of publications is responsible for the majority of the citations. While 83 of the 204 contributions (40.69%) have not been cited at all, the top 9 papers in conjunction already generate 529 of the 1026 citations (51.56%). When ignoring all uncited items, for the remaining publications, the number of average citations increases to 8.48 per paper, which is still heavily influenced by a comparatively small number of outliers. With regard to the papers that were discarded due to being deprecated, their citations have been added to those of the newer versions to avoid omitting them.
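The citation figures reported above follow directly from four counts given in the text (204 papers, 83 uncited papers, 1026 citations in total, 529 citations for the top 9 papers). As a plausibility check, the following minimal Python sketch recomputes them; the function and key names are chosen for this illustration only.

```python
def citation_summary(papers, uncited, total_citations, top_citations):
    """Recompute the reported averages and shares from the raw counts."""
    cited = papers - uncited
    return {
        "mean_per_paper": total_citations / papers,
        "mean_per_cited_paper": total_citations / cited,
        "share_uncited_pct": 100 * uncited / papers,
        "share_top_pct": 100 * top_citations / total_citations,
    }


summary = citation_summary(
    papers=204, uncited=83, total_citations=1026, top_citations=529
)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
# mean_per_paper: 5.03
# mean_per_cited_paper: 8.48
# share_uncited_pct: 40.69
# share_top_pct: 51.56
```

The gap between the two means shows how strongly the 83 uncited papers pull the overall average down, while the top 9 papers alone account for more than half of all citations.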
To take Factor 7 into account, the 10 most cited publications are depicted in Table 15. While the number of citations does not necessarily always translate to the significance of a publication, it usually at least somewhat correlates. Furthermore, this is an objective measurement, whereas other approaches, as, for instance, an evaluation based on the perception of this article's authors would always bear the risk of subjectivity and bias.
When combining the presented information concerning the seven factors, a comprehensive overview of the current situation is achieved, which in turn constitutes the answer to RQ1. However, even though understanding the past and present of a research domain is important, it is probably even more essential to provide successive researchers with guidance and, therefore, facilitate the development of the research stream as a whole. While some inferences can already be drawn from the previously presented information, the next section is explicitly dedicated to the investigation of potential avenues for future research and thus the answer to RQ2.

Avenues for Future Research
When attempting to answer the second research question "Which are potential research areas or directions to facilitate the interconnected use of microservices and big data?", it is at first necessary to understand the essence of the existing research that sparked the interest of the scientific community. Subsequently, potential avenues for future research can be identified. Therefore, to get an understanding of the most popular research directions inside the domain, in the following, each of the top 10 most cited publications, shown in Table 15, will be briefly presented.
"Fog Computing: Survey of Trends, Architectures, Requirements, and Research Directions" [114] focusses on how to connect big data producers like IoT systems with cloud computing. Especially when the data has to be processed in real-time, cloud computing can be too slow for the desired use case. The proposed solution to this latency problem is to perform the computation on devices near the user. This "Fog computing paradigm" uses devices like smartphones and switches in close proximity to process the data needed by the user. These devices often use their full processing power only during peak hours and are idle otherwise. This decentralised approach comes with the drawback of limited resources available on each one of these devices, and potential failure of some. Microservices can be used to mitigate those problems by providing lightweight software and immediate deployment. Fog computing can be used in a number of fields using real-time processing, like virtual reality, healthcare, and smart homes. However, further research is still needed.
"Designing a Smart City Internet of Things Platform with Microservice Architecture" [36] presents the early stages of the "Dimmer" Smart City Project. It consists of a platform and applications to involve stakeholders in increasing the energy efficiency of a city. A microservice architecture is used to independently develop and run sensor technologies, models, services, and applications. An "IoT Data Gateway" allows convenient data requests, while the data itself is stored decentralized. The decentralized approach allows the use of different protocols for different use cases and decreases the coordination required to manage a large interdisciplinary team. This also allows easier implementation of applications for web, desktop, and mobile. The benefits of a microservice architecture are already visible to them in these early stages of development, but will come at the cost of an increased complexity of distributed systems.
"Hosting Virtual IoT Resources on Edge-Hosts with Blockchain" [73] examines methods for shifting the computational load of IoT systems to the cloud. In the suggested approach, physical devices communicate with virtual resources, which are RESTful microservices. The virtual resource manages the IoT device and does the computation. An advantage of this approach is the ability to host multiple virtual resources on the same host. Additionally, it is shown that the management of the data and microservices can be done via blockchains. This creates large amounts of data and dependencies which can be tackled by big data technologies.
"An Open IoT Framework Based on Microservices Architecture" [72] proposes a decomposition of an IoT system into microservices. Subsequently, these microservices are connected to form a platform which can be extended and integrated into other applications. The framework consists of a core service that coordinates the system and eight microservices. Additionally, their system uses plugins to support heterogeneous devices. A series of microservices is used for hierarchical pre-processing of sensor data. It is shown that the proposed framework improves scalability and maintainability of IoT systems.
"Seer: Leveraging Big Data to Navigate the Complexity of Performance Debugging in Cloud Microservices" [156] presents Seer, a diagnosis tool for detecting Quality of Service (QoS) violations in microservice applications. It focuses on detecting QoS violations proactively, thus reducing recovery times or avoiding violations completely. This is achieved using deep learning techniques to learn which patterns in tracing data indicate future QoS violations. Subsequently, Seer uses hardware monitoring to identify the cause of violations and suggests actions necessary for avoiding them. In a controlled environment, Seer has been shown to detect QoS violations accurately and improve performance predictability.
"A Microservice-based Middleware for the Digital Factory" [69] presents the implementation of a distributed middleware developed within the European MAYA project and tailored to enable scalable interoperability between enterprise applications and cyber-physical systems, with a particular focus on simulation tools. Overall, this middleware is intended to enable a digital factory. This is the first solution to rely on both Microservices and Big Data, realizing true digital synchronization while ensuring the security and confidentiality of sensitive factory data.
"Using Blockchain to Push Software-Defined IoT Components onto Edge Hosts" [64] presents the idea and evaluation of using virtual resources in combination with a permission-based blockchain for provisioning IoT services on edge hosts. A big data architecture could be a solution for the blockchain technology.
"Introducing the New Paradigm of Social Dispersed Computing: Applications, Technologies and Challenges" [118] gives a broad overview of social computing methods. Social computing applications are software where the output of a user's query is influenced by other users, for instance, in route planning. These types of applications are often implemented using microservices and involve Big Data methods to evaluate, analyze, and manage the data.
"Delivering Elastic Containerized Cloud Applications to Enable DevOps" [82] presents a method to enable an autonomous management system for multi-tier, data-intensive containerized applications based on a performance model of such systems. This makes it possible to develop microservice-based software and to test and monitor it efficiently and accurately. To analyze this large amount of data, big data analytics are applied.
"Design and Evaluation of a Scalable Smart City Software Platform with Large-scale Simulations" [155] shows the realization of a smart city platform with microservices. The opensource platform "InterSCity" provides web-based services to manage IoT services, as well as store and process data. A microservice architecture is used to work with the heterogeneous IoT services, and facilitate the future development and expansion of the platform. A focus of the project was set on scalability, as an important aspect to create a platform that can handle big data from IoT devices and thousands of user requests. A large-scale emulation is used to simulate the behavior of a city, with big data technologies managing the incurring amount of data. This emulation showed that the platform is highly scalable while still maintaining low response times.
While there are numerous additional contributions that are relevant to the domain, the ten papers presented here are the most cited ones and can therefore be considered particularly important for determining the current focus of the research stream. This applies all the more because the earliest results found in the search conducted for this study date from 2015, which means that the analysis is not distorted by extremely old and outdated publications.
To answer the question of which promising research topics could be of concern in combining the two domains (RQ2), the aforementioned relevant papers were thoroughly read and analyzed. From this, two overarching directions emerged as predominant and were therefore the main ones taken into consideration. On the one hand, the technical intersections between the papers were explored, and, on the other hand, the direction in which future research seems to move was inferred from the context.
Here, it stands out that none of the highlighted papers concentrates on the direct interplay between big data and microservices. Instead, they focus on use cases in which big data and microservices play an important but more technical role. Consequently, it is not examined how exactly the application of one influences the application of the other with regard to the specific use cases. This could be an area for future research to focus on, as it seems that not much work has been done here so far.
In general, the focus of the current relevant research is rather on specific use cases. It is also explored how big data can be used to facilitate the development of microservices; yet, the papers presented do not get specific about how they use the technology. Consequently, future research could investigate which aspects and technologies of big data make sense in this context, where the limits are, and what considerations have to be made. When looking at the inverse perspective, it is noticeable that, despite some exceptions [15], [249], there seems to be a gap regarding research on how microservices can be used for the development of big data applications.
Regarding the focus on specific use cases, it is apparent that the most cited papers of the domain have a very strong connection to IoT [36], [64], [72], [73], [114], [118]. They describe how microservices can be used to implement large-scale systems such as smart cities or smart factories. These examples each involve large amounts of data that can be evaluated and used with the help of big data technologies. Yet, in most cases, there is no concrete description of how exactly those could be of use. Furthermore, as IoT is a vast domain, most examined topics have very few overlaps in content, which makes the derivation of a coherent direction for future research difficult.
However, one topic in the IoT domain that stands out is the exploration of the boundaries of fog computing [64], [73], [114] with regard to the application of different technologies. In fog computing, edge devices are used locally to carry out computations and store data in a distributed system. This fits the IoT topic very well, since masses of raw data are collected via sensor input, often in distributed locations. The question of where and how to store and process this huge amount of data, so that efficiency for the specific application is maximized, is crucial in this context and applies to smart factories and smart cities as well [36], [69].
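To make the trade-off between local and central processing more tangible, the following minimal sketch shows one way an edge device might condense raw sensor readings into per-sensor summaries before forwarding them. It is an illustration under stated assumptions, not a method taken from the cited papers: the names `Reading` and `summarize_on_edge` as well as the sample values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Reading:
    """A single raw sensor reading (hypothetical structure)."""
    sensor_id: str
    value: float


def summarize_on_edge(readings: List[Reading]) -> Dict[str, Dict[str, float]]:
    """Reduce raw readings to one summary per sensor on the edge device.

    Only these summaries would be forwarded to a central store, so the
    volume of transferred data shrinks with the number of readings.
    """
    grouped: Dict[str, List[float]] = {}
    for reading in readings:
        grouped.setdefault(reading.sensor_id, []).append(reading.value)
    return {
        sensor_id: {"count": len(values), "mean": mean(values), "max": max(values)}
        for sensor_id, values in grouped.items()
    }


# Illustrative sample data, not taken from any of the reviewed papers.
raw = [Reading("temp-1", 20.0), Reading("temp-1", 22.0), Reading("temp-2", 30.0)]
summary = summarize_on_edge(raw)
```

Whether such aggregation is appropriate depends on the application, since summaries discard detail that a central big data analysis might still need.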
An additional research domain in which microservices and big data can intersect is that of social computing applications [118]. In this context, efficient frameworks need to be developed to deal with highly distributed systems and large amounts of data. Usually, these systems are highly complex in nature and expensive to compute, as input data influences the existing data.
Another interesting approach is detailed in [156], where deep learning, as a way of dealing with big data, is used to detect QoS violations in microservice-based architectures. This could, for instance, be extended to using big data approaches for performance and security monitoring of microservice-based architectures.
Furthermore, researchers could focus on an even more application-bound investigation of microservices and big data by analyzing domains in which both are heavily used. However, to strengthen the research stream, it might also be beneficial if more researchers focused on creating reviews of this research area and its subareas, giving others faster access to relevant information and providing a general structure instead of a plethora of independent standalone projects.

Conclusion and Directions for Future Research
Since the use of microservices in combination with big data is rather new, this research direction is still in its infancy. Nevertheless, since 2015, a considerable number of publications dealing with the topic have appeared. However, to our knowledge, there has been no overview of those contributions. This article aims to fill this gap by pointing to those works that can constitute the groundwork for future research endeavors, which can build upon them [48]. For this purpose, a bibliometric study was conducted, comprising those papers that relate to big data as well as microservices. During the process, it was identified which publications exist (Factor 1) and how they are distributed across years (Factor 2), subject areas (Factor 3) and publication outlets (Factor 4). Furthermore, the authors of those contributions were regarded (Factor 5 and Factor 6) and the most cited publications were presented (Factor 7). In conjunction, those factors also provide the answer to RQ1 by outlining the current situation of the research stream.
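The distributions behind Factors 2 to 4 amount to simple frequency counts over the retrieved records. The following minimal sketch illustrates the idea; it is not the procedure used in this study, which relied on the analytics tools provided by Scopus. The record fields and all values are hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical placeholder records; NOT data from this study or from Scopus.
records: List[Dict[str, object]] = [
    {"year": 2018, "subject_area": "Computer Science", "outlet_type": "Conference Proceeding"},
    {"year": 2019, "subject_area": "Computer Science", "outlet_type": "Journal"},
    {"year": 2019, "subject_area": "Engineering", "outlet_type": "Conference Proceeding"},
]


def distribution(items: Iterable[Dict[str, object]], field: str) -> Counter:
    """Count how many records share each value of the given field."""
    return Counter(item[field] for item in items)


per_year = distribution(records, "year")            # cf. Factor 2
per_area = distribution(records, "subject_area")    # cf. Factor 3
per_outlet = distribution(records, "outlet_type")   # cf. Factor 4

# Print the yearly distribution in chronological order.
for year, count in sorted(per_year.items()):
    print(year, count)
```

In practice, a single publication may be assigned to several subject areas, in which case each record would carry a list of areas and the counts would no longer sum to the number of publications.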
Based on the findings of those analyses, several insights regarding future work in the area can be derived. This in turn corresponds to RQ2. Firstly, amongst the most active authors, there are several clusters that have no connection to each other. If those groups of authors shared ideas and cooperated with each other, this could yield synergies that might propel the research domain as a whole. Furthermore, there are two important research aspects: on the one hand, the use of big data to enhance microservices and, on the other hand, the utilization of microservices to realize big data applications. While both have their merits, especially the latter could gain importance, since it allows fulfilling the rising need for flexibility in analytics solutions [13]. However, in either case, the research should slowly move towards more mature endeavors, resulting in a shift from the currently predominant conference publications to more journal contributions. This way, over time, the accompanying increase in quality and the appearance in more prestigious outlets might spark the interest of other researchers, which might enrich the community in the long term. Besides those aspects, it can be expected that time will also play a major role in the growth of the research stream, since there was a significant and steady increase in the yearly number of publications, which was (likely) only stopped by the Covid-19 pandemic.
Another finding with regard to the content and the development of future research directions is that the heterogeneity of the domain leads to a high number of standalone projects. While each of those deals with potentially highly relevant issues, there is no overall research agenda, although such an agenda might be beneficial. Yet, the intricacy of the topics of big data and microservices, especially when combined, makes it very hard to develop one that does justice to the underlying complexity. In general, big data and microservices are, in this context, "only" tools to create and improve products of different domains. Therefore, focusing mainly on application domains might be a promising approach for future research. In addition, future review papers that examine the relation of big data and microservices will be needed to provide prospective researchers with an easily accessible overview of the domain, lowering the barrier to entry and allowing them to build upon existing knowledge [48].
The present study is somewhat limited by the fact that only a single database was used. However, since Scopus is presumably the largest and most comprehensive one, the general findings can still be assumed to be valid. In the future, to further extend the insights into the domain, it could be beneficial to go beyond the scope of this bibliometric study and to conduct a comprehensive structured literature review [43], [44], [48] that sheds light on the state of the art and goes more in-depth regarding the respective contents of the relevant publications. While an overview of the most cited papers has already been given in the course of this work, performing such an analysis in a comprehensive manner is not the main focus of a bibliometric study. Nevertheless, conducting such a review can be seen as a next step for facilitating the research concerning the interplay between big data and microservices. As mentioned above, this might help in gaining and forming new insights, best practices, guidelines, patterns and antipatterns, but also in highlighting especially relevant research gaps, which might attract further contributors who will help to advance not only the research, but also the practical application and combination of those two highly relevant and innovative tools.