Empowering Digital Healthcare Transformation with ZIF™

Modern-Day Healthcare

The healthcare industry is one of the biggest revenue-generating sectors of the economy. In 2020, the US healthcare industry generated close to $2.5 trillion. This has been made possible by multiple revenue streams that encompass the development and commercialization of products and services that aid in maintaining and restoring health.

The modern healthcare industry has three essential sectors – services, products, and finance – which in turn branch into various interdisciplinary groups of professionals who meet the health needs of their respective customers.

For any industry to scale and reach more customers, going digital is the best solution. Stepping into the digital space brings various tools and functionalities that can improve the effectiveness and efficiency of the products and services offered in the healthcare industry.

The key component of any digital healthcare transformation is its patient-focused approach. The transformation must aid healthcare providers in streamlining operations, understanding what patients need, and in turn building loyalty, trust, and a stellar user experience.

Healthcare Transformation Trends

Innovation is the foundation for all transformation initiatives. The vision of rationalizing work, optimizing systems, improving delivery results, eliminating human error, reducing costs, and improving the overall customer experience is the lever that turns the wheel. The advent of VR, wearable medical devices, telemedicine, and 5G, together with AI-enabled systems, has significantly changed the traditional way consumers use healthcare products and services.

The industry has shifted its focus to building intelligent and scalable systems that can process complex functionalities and deliver customer experience at its finest. With the integration of AI and omnichannel platforms, organizations can better understand their customers and address service and product gaps to better capitalize on the market and achieve higher growth. Hence, transformation is the key to pushing forward in unprecedented and unpredictable times in order to achieve organizational vision and goals.

Sacrosanct System Reliability

The healthcare industry is a highly sensitive sector that requires careful attention to its customers; a mishap in service can result in a life-and-death situation. Healthcare organizations aim to learn lessons from failures and incidents to make sure they never happen again.

Maintaining and ensuring safe, efficient, and effective systems is the foundation for creating reliable systems in the Healthcare industry. Hence, innovation and transformation disrupt the existing process ecosystems and evolve them to bring in more value.

The challenge that organizations face is in implementation and value realization with respect to cost savings, productivity enhancements, and overall revenue. The prime aspect of system reliability is the level of performance over time. In healthcare, looking at defects alone does not differentiate reliability from the functional quality of the system. Reliability should be measured by failure-free operation over time, and systems should be designed and implemented with failure-free operation as the focus.

Measuring system operation over time can be depicted as a bathtub curve. Initial failures tend to arise from defects and situational factors; efficiency then improves and the curve flattens out to depict useful life, until the wear-out phase begins due to design and other situational factors.

From the bathtub curve of system operation over time, we can infer that system design is a major contributor to both initial defects and system longevity. Hence, organizations must strive to build systems that last long enough for the invested capital to be recovered, so that the additional returns can fund future modernization goals.
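As a quick numerical illustration of the bathtub shape (the rates and time scales below are made up, not drawn from any specific healthcare system), the hazard rate can be sketched as the sum of a decaying infant-mortality term, a constant useful-life term, and a growing wear-out term:

```python
import math

def bathtub_hazard(t_months, infant=0.05, useful=0.01, wearout=1e-6):
    """Illustrative hazard rate h(t): early failures decay, a constant
    random-failure floor persists, and wear-out failures grow with age."""
    early = infant * math.exp(-t_months / 10.0)   # infant-mortality term
    wear = wearout * (t_months ** 2)              # wear-out term grows with age
    return early + useful + wear

def reliability(t_months, steps=1000):
    """R(t) = exp(-integral of h(t) dt), integrated numerically."""
    dt = t_months / steps
    cumulative = sum(bathtub_hazard(i * dt) * dt for i in range(steps))
    return math.exp(-cumulative)

for months in (1, 12, 60, 120):
    print(f"R({months:>3} months) = {reliability(months):.3f}")
```

Plotting bathtub_hazard over time reproduces the familiar shape: a high early failure rate, a flat useful-life region, and a rising wear-out tail.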

At the end of the day, system reliability revolves around the following factors:

  1. Process failure prevention
  2. Identification and Mitigation of failure
  3. Process redesign for critical failure avoidance

Reliability and stability should be seriously considered whenever healthcare systems are implemented, because the industry faces quality-related challenges and healthcare organizations are not always delivering safe, reliable, evidence-based care. It is therefore important for professionals to be empowered with tools and modern-day functionalities that reduce the error and risk involved in their service delivery. The reliability of these modern-day tools must be sacrosanct to ensure stellar customer experience and patient care.

A pure focus on cost savings as a standalone goal can lead to unpredictable outcomes. It is imperative that an organization build robust, reliability-centered processes that define clear roles and accountability for its employees, in order to have a sustainable form of operation.

When all these factors come together, the value realizations for the organization as well as its customer base are immense. These systems can contribute towards better ROI, improved profitability, enhanced competitive advantage, and an evolved customer brand perception.

These enhanced systems improve customer loyalty and overall brand value.

Device Monitoring with ZIF™

Ever since the pandemic hit, healthcare organizations have concentrated on remote patient health monitoring, telemedicine, and operations to expedite vaccine deliveries. These organizations have invested heavily in systems that bring all the data required for day-to-day operations into one place for consolidated analysis and decision making.

For these consolidated systems to function effectively, each device connected to the main system needs to function at its optimal capacity. If there is a deviation in device performance and the root cause is not identified promptly, it can have adverse effects on service delivery as well as on the patient's health.

These incidents can be addressed with ZIF™'s OEM device monitoring capabilities. ZIF™ can provide a visual dashboard of all operational devices and monitor their health against thresholds set for maintenance, incident detection, and problem resolution; a simplified sketch of such threshold checks follows below. The integration can also create a consolidated view of all logs and vital data that can later be processed to give predictive information for actionable insights. The end goal ZIF™ aims to achieve here is to pivot organizations towards a proactive approach to servicing and supporting the devices in operation. This connectivity and monitoring of devices across the portfolio can bring substantial, measurable improvements in cost savings, service efficiency, and effectiveness.
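A simplified sketch of threshold-based health checks (the device names, metrics, and limits below are hypothetical and do not represent ZIF™'s actual data model or APIs):

```python
# Hypothetical device telemetry records; a real integration would ingest
# these from OEM device APIs or logs.
devices = [
    {"id": "infusion-pump-01", "temperature_c": 41.2, "error_rate": 0.002},
    {"id": "mri-chiller-03", "temperature_c": 35.5, "error_rate": 0.090},
]

# Hypothetical per-metric thresholds beyond which an incident is raised.
thresholds = {"temperature_c": 40.0, "error_rate": 0.05}

def check_device(device):
    """Return a list of (metric, value, limit) breaches for one device."""
    return [(m, device[m], limit) for m, limit in thresholds.items()
            if device.get(m, 0) > limit]

for d in devices:
    for metric, value, limit in check_device(d):
        print(f"ALERT {d['id']}: {metric}={value} exceeds threshold {limit}")
```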

Prediction & Reliability Enhancement

With healthcare systems and digital services expanding across organizations, predicting their reliability, efficiency, and effectiveness is important. In reliability prediction, the core function is to evaluate systems and predict or estimate their failure rate.

In the current scenario, organizations perform reliability and prediction analysis manually: each resource analyzes the system down to its component level and monitors its performance, a process highly susceptible to manual errors and data discrepancies. With ZIF™, integrated systems can be analyzed and modeled based on the various characteristics that contribute to their operation and failure. ZIF™ analyzes the system down to its component level to model and estimate each of the parameters that contribute to the system's reliability; a minimal sketch of the component-level idea follows.
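A minimal sketch of the component-level calculation, assuming a series system with constant, made-up failure rates per component; real reliability models are considerably richer:

```python
import math

# Hypothetical constant failure rates (failures per hour) per component.
component_failure_rates = {
    "sensor": 2e-6,
    "controller": 5e-7,
    "network_link": 1e-6,
    "storage": 8e-7,
}

def system_reliability(hours):
    """For a series system, failure rates add; R(t) = exp(-lambda_total * t)."""
    lam_total = sum(component_failure_rates.values())
    return math.exp(-lam_total * hours)

def mtbf_hours():
    """Mean time between failures for the series system."""
    return 1.0 / sum(component_failure_rates.values())

print(f"R(1 year)  = {system_reliability(24 * 365):.4f}")
print(f"MTBF (hrs) = {mtbf_hours():,.0f}")
```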

The ZIF™ Empowerment

Players in the healthcare industry must understand that digital transformation is the way forward to keep up with emerging trends and tend to growing customer needs. The challenge lies in selecting the right technology that is worth investing in and that delivers its benefits within the expected time frame.

As healthcare service empowerment leaders in the industry, GAVS is committed to aligning with our healthcare customers' goals and bringing in customized solutions that help them attain their vision. When it comes to supporting reliable systems and making them self-resilient, the Zero Incident Framework™ can bring in measurable value realization upon implementation.

ZIF™ is an AIOps platform crafted for predictive and autonomous IT operations that support day-to-day business processes. Our flagship AIOps platform empowers businesses to Discover, Monitor, Analyze, Predict, and Remediate threats and incidents faced during operations. ZIF™ is one unified platform that can transform IT operations and ensure service assurance and reliability.

ZIF™ transforms how organizations view and handle incidents, making IT war rooms more proactive when it comes to firefighting. Upon implementation, customers get end-to-end visibility of enterprise applications and infrastructure dependencies to better understand areas needing optimization and monitoring. The low-code/no-code implementation, with various avenues for integration, provides our customers a unified, real-time view of the on-premises and cloud layers of their application systems. This enables them to track performance, reduce incidents, and improve the overall MTTR for service requests and application incidents.

Zero is Truly The New Normal.

Experience and explore the power of AI-led automation that can empower and ensure system reliability and resilience.

Schedule a demo today and let us show you how ZIF™ can transform your business ecosystem.

www.zif.ai

About the Author –

Ashish Joseph

Ashish Joseph is a Lead Consultant at GAVS working for a healthcare client in the Product Management space. His areas of expertise lie in branding and outbound product management.

He runs two independent series called BizPective & The Inside World, focusing on breaking down contemporary business trends and growth strategies for independent artists, on his website www.ashishjoseph.biz.

Outside work, he is very passionate about basketball, music, and food.

Kappa (κ) Architecture – Streaming at Scale

We are in the era of stream processing-as-a-service, and for any data-driven organization, stream-based computing has become the norm. In the last three parts (https://bit.ly/2WgnILP, https://bit.ly/3a6ij2k, https://bit.ly/3gICm88), I explored Lambda Architecture and its variants. In this article, let's discover streaming in the big data world. 'Real-time analytics', 'real-time data', and 'streaming data' have become mandatory in any big data platform. The aspiration to extend data analysis (predictive, descriptive, or otherwise) to streaming event data is common across every enterprise, and there is growing interest in real-time big data architectures. Kappa (K) Architecture is one that deals with streaming. Let's see why real-time analytics matters more than ever and mandates data streaming, how a streaming architecture like Kappa works, and whether Kappa is an alternative to Lambda.

“You and I are streaming data engines.” – Jeff Hawkins

Questioning Lambda

Lambda architecture fits very well in many real-time use cases, mainly in re-computing algorithms. At the same time, Lambda Architecture has inherent development and operational complexities: all the algorithms must be implemented twice, once in the cold path (the batch layer) and again in the hot path (the real-time layer). Apart from this dual execution path, Lambda Architecture has the inevitable issue of debugging, because operating two distributed multi-node services is more complex than operating one.

Given these obvious shortcomings of Lambda Architecture, Jay Kreps, CEO of Confluent and co-creator of Apache Kafka, started the discussion on the need for a new architectural paradigm that uses fewer code resources and performs well in certain enterprise scenarios. This gave rise to Kappa (K) Architecture. The real need for Kappa Architecture isn't about efficiency at all, but rather about allowing people to develop, test, debug, and operate their systems on top of a single processing framework. In fact, Kappa is not taken as a competitor to Lambda Architecture; on the contrary, it is seen as an alternative.

What is Streaming & Streaming Architecture?

Modern business requirements necessitate a paradigm shift from the traditional approach of batch processing to real-time data streams. Data-centric organizations mandate a stream-first approach, which means acting on data at the very moment it arrives. Real-time analytics, whether on-demand or continuous, is the capability to process data right when it arrives in the system, with no waiting for a batch window. It enhances the ability to make better decisions and take meaningful action on a timely basis. By combining and analyzing data at the right place and the right time, real-time analytics generates value from disparate data.

Typically, most streaming architectures have the following three components (a minimal end-to-end sketch follows the list):

  • an aggregator that gathers event streams and batch files from a variety of data sources,
  • a broker that makes data available for consumption,
  • an analytics engine that analyzes the data, correlates values and blends streams together.
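A minimal end-to-end sketch of the aggregator-broker-consumer flow, assuming a Kafka broker at localhost:9092 and the kafka-python client (topic name and event fields are illustrative):

```python
# Minimal aggregator/broker round trip, assuming a Kafka broker on
# localhost:9092 and the kafka-python package (pip install kafka-python).
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
# The aggregator side: push an event from some data source onto a topic.
producer.send("events", {"source": "sensor-42", "value": 98.6})
producer.flush()

# The analytics side: a consumer reads events from the broker for processing.
consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    consumer_timeout_ms=5000,   # stop polling after 5 s of silence
)
for message in consumer:
    print(message.value)
```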

Kappa (K) Architecture for the Big Data era

Kappa (K) Architecture is one of the new software architecture patterns for the new data era, mainly used for processing streaming data. The architecture takes its name from the Greek letter kappa (κ) and is attributed to Jay Kreps, who introduced it.

The main idea behind the Kappa Architecture is that both real-time and batch processing can be carried out, especially for analytics, with a single technology stack. Data from IoT, streaming, and static/batch sources, or near real-time sources like change data capture, is ingested into messaging/pub-sub platforms like Apache Kafka.

An append-only immutable log store is used in the Kappa Architecture as the canonical store. The following pub/sub systems, message buses, and log databases can be used for ingestion:

  • Amazon Quantum Ledger Database (QLDB)
  • Apache Kafka
  • Apache Pulsar
  • Amazon Kinesis
  • Amazon DynamoDB Streams
  • Azure Cosmos DB Change Feed
  • Azure EventHub
  • DistributedLog
  • EventStore
  • Chronicle Queue
  • Pravega

Distributed stream processing engines like Apache Spark, Apache Flink, etc. read the data from the streaming platform, transform it into an analyzable format, and then store it in an analytics database in the serving layer (a minimal sketch follows the list). The following are some of the distributed streaming computation systems:

  • Amazon Kinesis
  • Apache Flink
  • Apache Samza
  • Apache Spark
  • Apache Storm
  • Apache Beam
  • Azure Stream Analytics
  • Hazelcast Jet
  • Kafka Streams
  • Onyx
  • Siddhi
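As a minimal sketch of the speed-layer flow with one such engine, assuming PySpark with the spark-sql-kafka connector on the classpath and a local broker (topic, schema, and sink are illustrative):

```python
# Read from Kafka, transform into an analyzable format, and write to a sink
# (console here; a real serving layer would be a database or lakehouse table).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("kappa-speed-layer").getOrCreate()

schema = StructType().add("device", StringType()).add("reading", DoubleType())

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Stream the transformed records into the serving layer.
query = (events.writeStream
         .outputMode("append")
         .format("console")        # swap for a JDBC/Delta sink in practice
         .start())
query.awaitTermination()
```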

In short, any query in the Kappa Architecture is defined by the following functional equation.

Query = K (New Data) = K (Live streaming data)

The equation means that all queries can be catered to by applying the Kappa function to the live streams of data at the speed layer. It also signifies that stream processing occurs on the speed layer in the Kappa architecture.

Pros and Cons of Kappa architecture

Pros

  • Data systems that don't need a batch layer, like online learning or real-time monitoring & alerting systems, can use Kappa Architecture.
  • If the computations and analysis done in the batch and streaming layers are identical, then Kappa is likely the best solution.
  • Re-computations or re-iterations are required only when the code changes.
  • It can be deployed with fixed memory.
  • It can be used for horizontally scalable systems.
  • Fewer resources are required, as machine learning is done on a real-time basis.

Cons

The absence of a batch layer might result in errors during data processing or while updating the database, which requires an exception manager to reprocess the data or perform reconciliation.

Finding the right architecture for a data-driven organization involves many considerations. As with most successful analytics projects that take a streaming-first approach, the key is to start small in scope with well-defined deliverables, and then iterate. The reason for considering a distributed systems architecture (Generic Lambda, Unified Lambda, or Kappa) is minimized time to value.

About the Author

Bargunan Somasundaram

Bargunan is a Big Data Engineer and a programming enthusiast. His passion is to share his knowledge by writing his experiences about them. He believes “Gaining knowledge is the first step to wisdom and sharing it is the first step to humanity.”

Customize Business Outcomes with ZIF™

Zero Incident Framework™ (ZIF) is the only AIOps platform powered by true machine learning algorithms, with the capability to self-learn and adapt to today's modern IT infrastructure.

ZIF’s goal has always been to deliver the right business outcomes for the stakeholders. Return on investment can be measured based on the outcomes the platform has delivered. Users get to choose what business outcomes are expected from the platform and the respective features are deployed in the enterprise to deliver the chosen outcome.

Single Pane of Action – Unified View across IT Enterprise

The biggest challenge IT Operations teams have been trying to tackle over the years is getting a bird's-eye view of what is happening across their IT landscape. The more complex the enterprise becomes, the harder it is for the IT Operations team to understand what is happening across it. ZIF solves this issue with ease.

The capability to ingest data from any source monitoring or ITSM tool has helped IT organizations have a real-time view of what is happening across their landscape. With ZIF's unified view, IT engineers save enormous time that would otherwise be spent traversing multiple monitoring tools.

ZIF can integrate with 100+ tools to ingest (static/dynamic) data in real time via the ZIF Universal Connector. This is a low-code component of ZIF, and dataflows within the connector can also be templatized for reuse.

AIOps based Analytics Platform

Intelligence – Reduction in MTTR – Correlation of Alerts/Events

Approximately 80% of IT engineers' time is lost in identifying the problem statement for an incident, costing enterprises billions of dollars. ZIF, with the help of Artificial Intelligence, can reduce the mean time to identify the probable root cause of an incident to seconds. The high-performance correlation engine that runs under the hood of the platform processes millions of patterns that the platform has learned from historical data, correlates the sequences happening in real time, and creates cases; a simplified illustration of time-window correlation follows. These cases are then assigned to IT engineers along with the probable root cause for them to fix the issue. This increases the productivity of the IT engineers, resulting in better revenue for organizations.
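A simplified, stand-alone illustration of time-window correlation (this is not ZIF's actual engine): alerts arriving close together are grouped into a single case, with the earliest alert treated as the probable root cause.

```python
from datetime import datetime, timedelta

# Hypothetical alert stream: (timestamp, component, message).
alerts = [
    (datetime(2021, 5, 1, 10, 0, 5), "db-01", "high latency"),
    (datetime(2021, 5, 1, 10, 0, 40), "app-07", "timeouts calling db-01"),
    (datetime(2021, 5, 1, 10, 1, 10), "web-03", "5xx spike"),
    (datetime(2021, 5, 1, 11, 30, 0), "db-01", "disk usage 85%"),
]

WINDOW = timedelta(minutes=5)   # alerts within 5 minutes are correlated

def correlate(alert_stream):
    """Group time-ordered alerts into cases using a rolling time window."""
    cases, current = [], []
    for alert in sorted(alert_stream):
        if current and alert[0] - current[-1][0] > WINDOW:
            cases.append(current)
            current = []
        current.append(alert)
    if current:
        cases.append(current)
    return cases

for i, case in enumerate(correlate(alerts), 1):
    print(f"Case {i}: probable root cause -> {case[0][2]} on {case[0][1]}")
```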

Intelligence – Predictive Analytics

AIOps platforms are incomplete without predictive analytics capability. ZIF has adopted unsupervised machine learning algorithms to perform predictive analytics on the utilization data ingested into the platform. These algorithms learn trends and understand the symptoms of an incident by analyzing the vast amounts of data the platform has consumed over time. Based on this analysis, the platform generates opportunity cards that help IT engineers take proactive measures on the forecasted incident. These opportunity cards are generated a minimum of 60 minutes in advance, giving engineers lead time to fix an issue before it strikes the landscape; a toy illustration of the forecasting idea follows.
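A toy illustration of the forecasting idea only (ZIF's actual algorithms are unsupervised and far more sophisticated): fit a simple linear trend to recent utilization samples and raise an opportunity card if the projected value 60 minutes ahead breaches a threshold.

```python
# Toy forecast: fit a least-squares trend to recent CPU samples and flag a
# breach if the value projected 60 minutes ahead crosses the threshold.
cpu_samples = [52, 55, 57, 61, 63, 66, 70, 73]   # one sample per 5 minutes
THRESHOLD = 90.0
LEAD_MINUTES = 60

def linear_forecast(samples, steps_ahead):
    """Least-squares slope/intercept over the sample index, extrapolated."""
    n = len(samples)
    xs = range(n)
    mean_x, mean_y = (n - 1) / 2, sum(samples) / n
    slope_num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    slope_den = sum((x - mean_x) ** 2 for x in xs)
    slope = slope_num / slope_den
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)

steps = LEAD_MINUTES // 5                       # samples are 5 minutes apart
forecast = linear_forecast(cpu_samples, steps)
if forecast >= THRESHOLD:
    print(f"Opportunity card: CPU forecast {forecast:.1f}% in {LEAD_MINUTES} min")
else:
    print(f"No action: CPU forecast {forecast:.1f}% in {LEAD_MINUTES} min")
```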

Visibility – Auto-Discovery of IT Assets & Applications

ZIF agentless discovery is a seamless discovery component that helps identify all the IP assets available in an enterprise. Beyond discovering the assets, the component also plots physical and logical topology maps for better consumption by IT engineers. This gives a very detailed view of every asset in the IT landscape, and the logical topology gives in-depth insights into workload metrics that can be utilized for deep analytics.

Visibility – Cloud Monitoring

In today's digital transformation journey, cloud is inevitable. To have better control over cloud-orchestrated applications, enterprises must depend on the monitoring tools provided by cloud providers. The lack of insights often leads to the unavailability of applications for end users. More than monitoring, insights that help enterprises make better-informed decisions are the need of the hour.

ZIF's cloud monitoring components can monitor any cloud instance. Data generated by the providers' native monitoring tools is ingested into ZIF for further analysis. ZIF can connect to Azure, AWS & Google Cloud to derive data-driven insights.

Optimization – Remediation – Autonomous IT Operations

ZIF does not stop at just providing insights. The platform deploys the right automation bot to remediate the incident.

ZIF has 250+ automation bots that can be deployed to cut resolution time by a minimum of 90%. Faster resolutions result in increased uptime of applications and better revenue for the enterprise. An illustrative sketch of a simple remediation bot follows the list below.

Sample ZIF bots:

  • Service Restart / VM Restart
  • Disk Space Clean-up
  • IIS Monitoring App Pool
  • Dynamic Resource Allocation
  • Process Monitoring & Remediation
  • DL & Security Group Management
  • Windows Event Log Monitoring
  • Automated phishing control based on threat score
  • Service request automation like password reset, DL mapping, etc.
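As an illustration of what a simple disk-space clean-up bot might look like (paths, thresholds, and logic are hypothetical; this is not a ZIF bot implementation):

```python
# Illustrative disk clean-up bot: if free space drops below a threshold,
# delete old files from a designated temp directory. A production bot would
# add auditing, dry-run modes, and stricter safety checks.
import os
import shutil
import time

TEMP_DIR = "/tmp/app-cache"          # hypothetical clean-up target
MIN_FREE_PERCENT = 15.0              # trigger below 15% free space
MAX_AGE_SECONDS = 7 * 24 * 3600      # delete files older than a week

def free_percent(path="/"):
    usage = shutil.disk_usage(path)
    return usage.free / usage.total * 100

def cleanup():
    now = time.time()
    removed = 0
    for name in os.listdir(TEMP_DIR):
        full = os.path.join(TEMP_DIR, name)
        if os.path.isfile(full) and now - os.path.getmtime(full) > MAX_AGE_SECONDS:
            os.remove(full)
            removed += 1
    return removed

if free_percent() < MIN_FREE_PERCENT and os.path.isdir(TEMP_DIR):
    print(f"Low disk space: removed {cleanup()} stale files from {TEMP_DIR}")
```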

For more information on ZIF, please visit www.zif.ai

About the Author –

Anoop Aravindakshan

An evangelist of Zero Incident Framework™, Anoop has been a part of the product engineering team for a long time and has recently forayed into product marketing. He has over 14 years of experience in Information Technology across various verticals, including Banking, Healthcare, Aerospace, Manufacturing, CRM, Gaming, and Mobile.

Lambda (λ), Kappa (κ) and Zeta (ζ) – The Tale of 3 AIOps Musketeers (PART-3)

“Data that sit unused are no different from data that were never collected in the first place.” – Doug Fisher

In part 1 (https://bit.ly/3hDChCH), we delved into Lambda Architecture, and in part 2 (https://bit.ly/3hDCg1B), into Generic Lambda. The Generic Lambda architecture has inherent limitations and complexity: the data is replicated in two layers, and keeping them in sync is quite challenging in an already complex distributed system. There is growing interest in a simpler alternative to the Generic Lambda that would bring just about the same benefits and handle the full problem set. The solution is the Unified Lambda (λ) Architecture.

Unified Lambda (λ) Architecture

The unified approach addresses the velocity and volume problems of Big Data as it uses a hybrid computation model. This model combines both batch data and instantaneous data transparently.

There are basically three approaches:

  1. Pure Streaming Framework
  2. Pure Batch Framework
  3. Lambdoop Framework

1. Pure streaming framework

In this approach, a pure streaming model is adopted, and a flexible framework like Apache Samza can be employed to provide a unified data processing model for both stream and batch processing using the same data flow structure.

To avoid the large turnaround times involved in Hadoop's batch processing, LinkedIn came up with the distributed stream processing framework Apache Samza. It is built on top of the distributed messaging bus Apache Kafka, so it can serve as a lightweight framework for streaming, i.e., continuous data processing. Samza has built-in integration with Apache Kafka, which is comparable to HDFS and MapReduce: in the Hadoop world, HDFS is the storage layer and MapReduce the processing layer; in a similar way, Apache Kafka ingests and stores the data in topics, which is then streamed and processed by Samza. Samza normally computes results continuously as and when the data arrives, thus delivering sub-second response times.

Although it is a distributed stream processing framework, its architecture is pluggable, i.e., it can be integrated with umpteen sources like HDFS, Azure Event Hubs, Kinesis, etc., apart from Kafka. It follows the principle of WRITE ONCE, RUN ANYWHERE, meaning the same code can run in both stream and batch mode. Apache Samza's streams are replayable, ordered partitions.

Unified API for Batch & Streaming in pure Streaming

Apache Samza offers a unified data processing model for both real-time and batch processing. Based on whether the input data is bounded or unbounded, the processing model can be identified as batch or stream. Typically, batch data sources are bounded (e.g., static files on HDFS) and streams are unbounded (e.g., a topic in Kafka). Under the bonnet, Apache Samza's stream-processing engine handles both types with high efficiency.

Another advantage of this unified API for batch and streaming in Apache Samza is that it lets developers focus on the processing logic without treating bounded and unbounded sources differently. Samza differentiates bounded and unbounded data by a special end-of-stream token. Also, only a config change is needed, with no code changes, when switching between batch and streaming, e.g., Kafka to HDFS. Let us take the example of counting PageViewEvent per mobile device OS in a 5-minute window and sending the counts to PageViewEventPerDeviceOS; a sketch of the equivalent logic follows.
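Samza jobs themselves are written in Java; the plain-Python sketch below only mirrors the logic of that example, counting page views per device OS in 5-minute windows over made-up events:

```python
# Count PageViewEvent per mobile device OS in 5-minute windows; in the real
# Samza job these counts would be emitted to the PageViewEventPerDeviceOS
# stream. Events below are fabricated for illustration.
from collections import Counter
from datetime import datetime

WINDOW_SECONDS = 5 * 60

page_view_events = [
    {"ts": datetime(2021, 5, 1, 10, 1, 0), "device_os": "Android"},
    {"ts": datetime(2021, 5, 1, 10, 2, 30), "device_os": "iOS"},
    {"ts": datetime(2021, 5, 1, 10, 4, 10), "device_os": "Android"},
    {"ts": datetime(2021, 5, 1, 10, 7, 45), "device_os": "iOS"},
]

def window_start(ts):
    """Align a timestamp to the start of its 5-minute window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS)

counts = Counter((window_start(e["ts"]), e["device_os"])
                 for e in page_view_events)

for (start, device_os), n in sorted(counts.items()):
    print(f"{start:%H:%M} window  {device_os}: {n}")
```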

2. Pure Batch framework

This is the reverse approach of pure streaming: a flexible batch framework is employed, which offers both batch processing and real-time data processing ability. Streaming is achieved by using mini-batches small enough to be close to real time, with Apache Spark/Spark Streaming or Storm's Trident. Under the hood, Spark Streaming is a sequence of micro-batch processes with sub-second latency. Trident is a high-level abstraction for doing streaming computation on top of Storm; the core data model of Trident is the "Stream", processed as a series of batches.

Apache Spark achieves the dual goal of batch as well as real-time processing through the following modes:

  • Micro-batch processing model
  • Continuous Processing model

Micro-batch processing model

Micro-batch processing is analogous to traditional batch processing in that data is usually processed as a group. The primary difference is that the batches are smaller and processed more often. In Spark Streaming, micro-batches are created based on time rather than on accumulated data size. The smaller the trigger interval for a micro-batch, the lower the latency.

Continuous Processing model

Apache Spark 2.3 introduced a low-latency Continuous Processing mode in Structured Streaming, which enables low (~1 ms) end-to-end latency with at-least-once fault-tolerance guarantees. Compare this with the default micro-batch processing engine, which can achieve exactly-once guarantees but latencies of ~100 ms at best. Without modifying the application logic, i.e., the DataFrame/Dataset operations, micro-batching or continuous streaming can be chosen at runtime. Spark Streaming also has the ability to work well with several data sources like HDFS, Flume, or Kafka.

Example of micro-batching and continuous processing triggers:
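A hedged PySpark Structured Streaming sketch of the two trigger modes (broker address, topic, and sink are assumptions; only the trigger call differs between them):

```python
# Choosing micro-batch vs. continuous processing at runtime in Structured
# Streaming (Spark 2.3+). The DataFrame logic stays the same in both modes.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("trigger-modes").getOrCreate()

stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load())

# Micro-batch mode: a new batch is triggered every minute.
micro_batch = (stream.writeStream
               .format("console")
               .trigger(processingTime="1 minute")
               .start())

# Continuous mode: records are processed as they arrive (~1 ms latency,
# at-least-once guarantees); checkpoints are written every second.
continuous = (stream.writeStream
              .format("console")
              .trigger(continuous="1 second")
              .start())

spark.streams.awaitAnyTermination()
```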

3. Lambdoop Approach

In many places, the capability of both batch and real-time processing is needed. It is cumbersome to develop a software architecture with such capabilities by tailoring suitable technologies, software layers, data sources, data storage solutions, smart algorithms, and so on to achieve a good, scalable solution. This is where frameworks like Spring XD, Summingbird, or Lambdoop come in, since they already have a combined API for batch and real-time processing.

Lambdoop

Lambdoop is a software framework based on the Lambda architecture which provides an abstraction layer to developers. This makes it easy for developers to build Big Data applications by combining real-time and batch processing approaches. Developers don't have to deal with different technologies, configurations, data formats, etc.; they can use the Lambdoop framework as the only needed API. Lambdoop also includes other interesting tools such as input/output drivers, visualization tools, cluster management tools, and widely accepted AI algorithms.

The Speed layer in Lambdoop runs on Storm and the Batch layer on Hadoop. Lambdoop (Lambda + Hadoop, with HBase, Storm, and Redis) combines batch and real-time processing by offering a single API for both processing models.

Summingbird

Summingbird, aka 'Streaming MapReduce', is a hybrid computational system where both batch and streaming computations can be run at the same time and the results merged automatically. In Summingbird, the developer can write the code/job logic once and change the backend as and when needed. The following are the modes in which a Summingbird job can be executed:

  • batch mode (using Scalding on Hadoop)
  • real-time mode (using Storm)
  • hybrid batch/real-time mode (offers attractive fault-tolerance properties)

If the model assumes streaming, one-at-a-time semantics, then the code can be run in real-time mode (e.g., Storm) or in offline/batch mode (e.g., Hadoop, Spark, etc.). It can operate in a hybrid processing mode when there is a need to transparently integrate batch and online results to efficiently generate up-to-date views over long time spans.

Conclusion

The volume of any Big Data platform is handled by building a batch processing application, which requires MapReduce or Spark development, the use of other Hadoop-related tools like Sqoop, Zookeeper, HCatalog, etc., and storage systems like HBase, MongoDB, HDFS, and Cassandra. At the same time, the velocity of any Big Data platform is handled by building a real-time streaming application, which requires stream computing development using Storm, Samza, Kafka Connect, Apache Flink, and S4, and the use of temporal datastores like in-memory data stores, the Apache Kafka messaging system, etc.

The Unified Lambda handles both the volume and velocity of any Big Data platform through its intermixed approach of a hybrid computation model, where batch and real-time data processing are combined transparently. Also, the limitations of Generic Lambda, like the dual execution mode and replicating and keeping data in sync between different layers, are avoided; in the Unified Lambda, there is only one system to learn and maintain.

About the Author:

Bargunan Somasundaram

Bargunan is a Big Data Engineer and a programming enthusiast. His passion is to share his knowledge by writing his experiences about them. He believes “Gaining knowledge is the first step to wisdom and sharing it is the first step to humanity.”

Modern IT Infrastructure

Infrastructure today has grown beyond the physical confines of the traditional data center, has spread its wings to the cloud, and is increasingly distributed, virtual, and abstract. With the cloud gaining wide acceptance, most enterprises have their workloads spread across data centers, colocations, multi-cloud, and edge locations. On-premise infrastructure is also being replaced by Hyperconverged Infrastructure (HCI) where software-defined, virtualized compute, storage, and network are in one single system, greatly simplifying IT operations. Infrastructure is also becoming increasingly elastic, scales & shrinks on demand and doesn’t have to be provisioned upfront.

Let’s look at a few interesting technologies that are steering the modern IT landscape.

Containers and Serverless

Traditional application deployment on physical servers comes with the overhead of managing the infrastructure, middleware, development tools, and everything in between. Application developers would rather have this grunt work be handled by someone else, so they could focus on just their applications. This is where containers and serverless technologies come into picture. Both are cloud-based offerings and provide different levels of abstraction, in a way that hides layers beyond the front end, from the developer. They typically deploy smaller components of monolithic applications, microservices, and functions.

A Container is like an all-in-one-box, containing the app, and all its dependencies like libraries, executables & config files. The containerized application is highly portable, will run anywhere the container runtime is installed, and behave the same regardless of the OS or hardware it is deployed on. Containers give developers great flexibility and control since they cater to specific application requirements like the OS, S/W versions. The flip side is that there is still a need for manual maintenance of the runtime environment, like security patches, software updates, etc. Secondly, the flexibility it affords translates into high operational costs, since it lacks agility in scaling.

Serverless technologies provide much greater abstraction of the OS and infrastructure. ‘Serverless’ though, does not imply that there are no servers, it just means application developers do not have to worry about the underlying OS, the server environment, or the infra that their applications will be deployed on. Serverless is event-driven and is based on the premise that the application is split into functions that get executed based on events. The developer only needs to deploy function code and define the event(s) that will trigger them! The rest of the magic is done by the cloud service provider (with the help of third parties). 

The biggest advantage of serverless is that consumers are billed only for the running time of the function instances or the number of times the function gets executed, depending on the provider. Since it has zero administrative overhead, it guarantees rapid iterative deployment and faster time to market. Since the architecture is intrinsically auto-scaling, it is a perfect fit for applications with undefinable usage patterns. The other side of the coin is that developers need to deal with a black box back-end environment, so, holistic testing, debugging of the application becomes a challenge. Vendor lock-in is a real problem since the consumer is restricted by the technology stack supported by the vendor. Since serverless best practices dictate light, isolated functions with limited scope, building complex applications can get difficult. Function as a Service (FaaS) is a subset of serverless computing.
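A minimal AWS Lambda-style function sketch in Python (the event shape and local test harness are illustrative): the developer ships just the handler and wires it to an event source; the provider invokes it per event.

```python
# Minimal serverless function: only this handler is deployed; the cloud
# provider runs it when the configured event (HTTP call, queue message,
# file upload, etc.) fires, and bills only for the execution.
import json

def handler(event, context):
    """Invoked once per triggering event."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

# Local smoke test (in the cloud, the provider calls handler() for you).
if __name__ == "__main__":
    print(handler({"name": "developer"}, None))
```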

Internet of Things (IoT)

IoT is about connecting everyday things – beyond just computing devices or smartphones – to the internet. It is possible to convert practically anything into an IoT device, with a computer chip installation & internet access, and have it communicate independently with the internet – without any human intervention. But why would we want everyday things like for instance a watch or a light bulb, to become IoT devices? It’s in a bid to bridge the chasm between the physical and digital worlds and make the environment around us more intelligent, communicative, and responsive to our needs.

IoT’s use cases are just about everywhere; in personal devices, self-driving cars, smart homes, smart workspaces, smart cities, and industries across all verticals. For instance, live data from sensors in products while in use, gives good visibility into their operations on the ground, helps remediate issues proactively & aids improvements in design/manufacturing processes.

The Industrial Internet of Things (IIoT) is the use of IoT data in business, in tandem with Big Data, AI, Analytics, Cloud, and High-speed networks, with the primary goal of finding efficient business models to improve productivity & optimize expenditure. The need for real-time response to sensor data and advanced analytics to power insights has increased the demand for 5G networks for speed, cloud technologies for storage and computing, edge computing to reduce latency, and hyper-scale data centers for rapid scaling.

With IoT devices extending an organization's infrastructure landscape, and the likelihood that IT staff may not even be aware of all the IoT devices in it, security becomes a nightmare that could open corporate networks & sensitive data to attacks. Global standards and regulations for IoT device security are in the works. Until then, it is up to the enterprise security team to safeguard against IoT-related vulnerabilities.

Hyperscaling

The ability of infrastructure to rapidly scale out on a massive level is called hyperscaling.

Unprecedented needs for high-power computing and on-demand massive scalability have given rise to a new breed of hyperscale computing architectures, where traditional elements are replaced by hyper-converged, software-defined infrastructure with a high degree of virtualization. These hyperscale environments are characterized by high-density server racks, with software designed and built specifically for scale-out environments. Since high density implies heavy power consumption, heating problems need to be handled by specialized cooling solutions like liquid cooling. Hyperscale data center operators usually look for renewable energy options to save on power & cooling.

Today, there are several hundred hyperscale data centers in the world, with the dominant players being Microsoft, Google, Apple, Amazon & Facebook.

Edge Computing

Edge computing as the name indicates means moving data processing away from distant servers or the cloud, closer to the source of data.  This is to reduce latency and network bandwidth used for back & forth communication between the data source and the server. Edge, also called the network edge refers to where the data source connects to the internet. The explosive growth of IoT and applications like self-driving cars, virtual reality, smart cities for instance, that require real-time computing and analytics are paving the way for edge computing. Most cloud providers now provide geographically distributed edge servers. As with IoT devices, data at the edge can be a ticking security time bomb necessitating appropriate security mechanisms.

The evolution of IT technologies continuously raises the bar for the IT team. IT personnel have been forced to move beyond legacy practices and mindsets & constantly up-skill themselves to be able to ride the wave. For customers pampered by sophisticated technologies, round the clock availability of systems and immersive experiences have become baseline expectations. With more & more digitalization, there is increasing reliance on IT infrastructure and hence lesser tolerance for outages. The responsibilities of maintaining a high-performing IT infrastructure with near-zero downtime fall on the shoulders of the IT operations team.

This has underscored the importance of AI in IT operations since IT needs have now surpassed human capabilities. GAVS' AI-powered platform for IT operations, ZIF, caters to the entire ITOps spectrum, right from automated discovery of the landscape and monitoring, to predictive and prescriptive analytics that proactively drive the organization towards zero incidents. For more details, please visit https://zif.ai

About the Author:

Padmapriya Sridhar

Priya is part of the Marketing team at GAVS. She is passionate about Technology, Indian Classical Arts, Travel, and Yoga. She aspires to become a Yoga Instructor someday!

Monitoring Microservices and Containers

Monitoring applications and infrastructure is a critical part of IT Operations. Among other things, monitoring provides alerts on failures, alerts on deteriorations that could potentially lead to failures, and performance data that can be analysed to gain insights. AI-led IT Ops Platforms like ZIF use such data from their monitoring component to deliver pattern recognition-based predictions and proactive remediation, leading to improved availability, system performance and hence better user experience.

The shift away from monolith applications towards microservices has posed a formidable challenge for monitoring tools. Let’s first take a quick look at what microservices are, to understand better the complications in monitoring them.

Monoliths vs Microservices

A single application (monolith) is split into a number of modular services called microservices, each of which typically caters to one capability of the application. These microservices are loosely coupled, can communicate with each other, and can be deployed independently.

Quite likely the trigger for this architecture was the need for agility. Since microservices are stand-alone modules, they can follow their own build/deploy cycles enabling rapid scaling and deployments. They usually have a small codebase which aids easy maintainability and quick recovery from issues. The modularity of these microservices gives complete autonomy over the design, implementation and technology stack used to build them.

Microservices run inside containers that provide their execution environment. Although microservices could also be run in virtual machines (VMs), containers are preferred since they are comparatively lightweight as they share the host's operating system, unlike VMs. Docker and CoreOS rkt are a couple of commonly used container solutions, while Kubernetes, Docker Swarm, and Apache Mesos are popular container orchestration platforms. The image below depicts microservices for hiring, performance appraisal, rewards & recognition, payroll, analytics, and the like linked together to deliver the HR function.

Challenges in Monitoring Microservices and Containers

Since all good things come at a cost, you are probably wondering what it is here… well, the flip side to this evolutionary architecture is increased complexity! These are some contributing factors:

Exponential increase in the number of objects: With each application replaced by multiple microservices, 360-degree visibility and observability into all the services, their interdependencies, their containers/VMs, communication channels, workflows and the like can become very elusive. When one service goes down, the environment gets flooded with notifications not just from the service that is down, but from all services dependent on it as well. Sifting through this cascade of alerts, eliminating noise and zeroing in on the crux of the problem becomes a nightmare.

Shared Responsibility: Since processes are fragmented and the responsibility for their execution, like for instance a customer ordering a product online, is shared amongst the services, basic assumptions of traditional monitoring methods are challenged. The lack of a simple linear path, the need to collate data from different services for each process, inability to map a client request to a single transaction because of the number of services involved make performance tracking that much more difficult.

Design Differences: Due to the design/implementation autonomy that microservices enjoy, they could come with huge design differences, and implemented using different technology stacks. They might be using open source or third-party software that makes it difficult to instrument their code, which in turn affects their monitoring.

Elasticity and Transience: Elastic landscapes where infrastructure scales or collapses based on demand, instances appear & disappear dynamically, have changed the game for monitoring tools. They need to be updated to handle elastic environments, be container-aware and stay in-step with the provisioning layer. A couple of interesting aspects to handle are: recognizing the difference between an instance that is down versus an instance that is no longer available; data of instances that are no longer alive continue to have value for analysis of operational efficiency or past performance.

Mobility: This is another dimension of dynamic infra where objects don’t necessarily stay in the same place, they might be moved between data centers or clouds for better load balancing, maintenance needs or outages. The monitoring layer needs to arm itself with new strategies to handle moving targets.

Resource Abstraction: Microservices deployed in containers do not have a direct relationship with their host or the underlying operating system. This abstraction is what helps seamless migration between hosts but comes at the expense of complicating monitoring.

Communication over the network: The many moving parts of distributed applications rely completely on network communication. Consequently, the increase in network traffic puts a heavy strain on network resources necessitating intensive network monitoring and a focused effort to maintain network health.

What needs to be measured

This is a high-level laundry list of what needs to be done/measured while monitoring microservices and their containers.

Auto-discovery of containers and microservices:

As we’ve seen, monitoring microservices in a containerized world is a whole new ball game. In the highly distributed, dynamic infra environment where ephemeral containers scale, shrink and move between nodes on demand, traditional monitoring methods using agents to get information will not work. The monitoring system needs to automatically discover and track the creation/destruction of containers and explore services running in them.
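A small sketch of what such auto-discovery can look like against a single Docker host, assuming the Docker SDK for Python and access to the local daemon (an orchestrator-level discovery would query the Kubernetes or Swarm APIs instead):

```python
# Discover running containers and follow create/destroy events on one host.
# Requires the Docker SDK for Python (pip install docker) and daemon access.
import docker

client = docker.from_env()

# Snapshot of currently running containers and the services they carry.
for container in client.containers.list():
    labels = container.labels or {}
    service = labels.get("com.docker.compose.service", "unknown")
    print(f"{container.short_id}  {container.name}  "
          f"image={container.image.tags}  service={service}")

# Follow the event stream to track containers being created and destroyed.
# This loop runs until interrupted; a monitoring agent would run it forever.
for event in client.events(decode=True):
    if event.get("Type") == "container" and event.get("Action") in ("start", "die"):
        print(f"{event['Action']}: {event['Actor']['Attributes'].get('name')}")
```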

Microservices:

  • Availability and performance of individual services
  • Host and infrastructure metrics
  • Microservice metrics
  • APIs and API transactions
    • Ensure API transactions are available and stable
    • Isolate problematic transactions and endpoints
  • Dependency mapping and correlation
  • Features relating to traditional APM

Containers:

  • Detailed information relating to each container
  • Health of clusters, master and slave nodes
    • Number of clusters
    • Nodes per cluster
    • Containers per cluster
  • Performance of core Docker engine
  • Performance of container instances

Things to consider while adapting to the new IT landscape

Granularity and Aggregation: With the increase in the number of objects in the system, it is important to first understand the performance target of what's being measured – for instance, if a service targets 99% uptime (yearly), polling it every minute would be overkill. Based on this, data granularity needs to be set prudently for each aspect measured, and can be aggregated where appropriate. This prevents data inundation that could overwhelm the monitoring module and drive up costs associated with data collection, storage, and management.

Monitor Containers: The USP of containers is the abstraction they provide to microservices, encapsulating and shielding them from the details of the host or operating system. While this makes microservices portable, it makes them hard to reach for monitoring. Two recommended solutions for this are to instrument the microservice code to generate stats and/or traces for all actions (can be used for distributed tracing) and secondly to get all container activity information through host operating system instrumentation.    

Track Services through the Container Orchestration Platform: While we could obtain container-level data from the host kernel, it wouldn’t give us holistic information about the service since there could be several containers that constitute a service. Container-native monitoring solutions could use metadata from the container orchestration platform by drilling into appropriate layers of the platform to obtain service-level metrics. 

Adapt to dynamic IT landscapes: As mentioned earlier, today’s IT landscape is dynamically provisioned, elastic and characterized by mobile and transient objects. Monitoring systems themselves need to be elastic and deployable across multiple locations to cater to distributed systems and leverage native monitoring solutions for private clouds.

API Monitoring: Monitoring APIs can provide a wealth of information in the black box world of containers. Tracking API calls from the different entities – microservices, container solution, container orchestration platform, provisioning system, host kernel can help extract meaningful information and make sense of the fickle environment.

Watch this space for more on Monitoring and other IT Ops topics. You can find our blog on Monitoring for Success here, which gives an overview of the Monitor component of GAVS' AIOps Platform, Zero Incident Framework™ (ZIF). You can Request a Demo or Watch how ZIF works here.

About the Author:

Sivaprakash Krishnan


Bio – Siva is a long-timer at GAVS and has been with the company for close to 15 years. He started his career as a developer and is now an architect with a strong technology background in Java, Big Data, DevOps, Cloud Computing, Containers, and Microservices. He has successfully designed & created a stable Monitoring Platform for ZIF, and designed & driven cloud assessment and migration, enterprise BRMS, and IoT-based solutions for many of our customers. He is currently focused on building ZIF 4.0, a new-gen business-oriented TechOps platform.

Padmapriya Sridhar


Bio – Priya is part of the Marketing team at GAVS. She is passionate about Technology, Indian Classical Arts, Travel and Yoga. She aspires to become a Yoga Instructor some day!

Pivotal Role of AI and Machine Learning in Industry 4.0 and Manufacturing

Industry 4.0 is a name given to the current trend of automation and data exchange in manufacturing technologies. It includes cyber-physical systems, the Internet of Things, cloud computing, and cognitive computing. Industry 4.0 is commonly referred to as the fourth industrial revolution.

Industry 4.0 is paving the path for digitization of the manufacturing sector, where artificial intelligence (AI) and machine learning-based systems are not only changing the ways we interact with information and computers but also revolutionizing them.

Compelling reasons for most companies to shift towards Industry 4.0 and automate manufacturing include:

  • Increase productivity
  • Minimize human / manual errors
  • Optimize production costs
  • Focus human efforts on non-repetitive tasks to improve efficiency

Manufacturing is now being driven by effective data management and AI that will decide its future. The more data sets computers are fed, the more they can observe trends, learn, and make decisions that benefit the manufacturing organization. This automation will help predict failures more accurately, predict workloads, and detect and anticipate problems to achieve zero incidents.

GAVS' proprietary AIOps-based TechOps platform – Zero Incident Framework™ (ZIF) – can successfully integrate AI and machine learning into the workflow, allowing manufacturers to build robust technology foundations.

To maximize the many opportunities presented by Industry 4.0, manufacturers need to build a system with the entire production process in mind as it requires collaboration across the entire supply chain cycle.

Top ways in which ZIF's expertise in AI and ML is revolutionizing the manufacturing sector:

  • Asset management, supply chain management, and inventory management are the dominant areas of artificial intelligence, machine learning, and IoT adoption in manufacturing today. Combining these emerging technologies can improve asset tracking accuracy, supply chain visibility, and inventory optimization.
  • Improve predictive maintenance through better adoption of ML techniques like analytics, machine intelligence-driven processes, and quality optimization.
  • Reduce supply chain forecasting errors and reduce lost sales through better product availability.
  • Real-time monitoring of operational loads on the production floor provides insights into production schedule performance.
  • Achieve significant reduction in test and calibration time via accurate prediction of calibration and test results using machine learning.
  • Combining ML and Overall Equipment Effectiveness (OEE), manufacturers can improve yield rates, preventive maintenance accuracy, and asset workloads. OEE is a universally used metric in manufacturing as it combines availability, performance, and quality to define production effectiveness (a worked example follows this list).
  • Improve the accuracy of detecting the costs of performance degradation across multiple manufacturing scenarios, which can reduce costs by 50% or more.
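OEE itself is just the product of three ratios; the worked example below uses made-up shift numbers to show the arithmetic:

```python
# OEE = Availability x Performance x Quality, each expressed as a ratio.
# The shift numbers below are fabricated purely to illustrate the math.
planned_minutes = 480          # planned production time for the shift
downtime_minutes = 45          # unplanned stops
ideal_cycle_time = 0.5         # minutes per unit at rated speed
total_units = 800
good_units = 776

run_time = planned_minutes - downtime_minutes
availability = run_time / planned_minutes                  # 435/480 = 0.906
performance = (ideal_cycle_time * total_units) / run_time  # 400/435 = 0.920
quality = good_units / total_units                         # 776/800 = 0.970

oee = availability * performance * quality
print(f"Availability={availability:.1%} Performance={performance:.1%} "
      f"Quality={quality:.1%} OEE={oee:.1%}")
```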

Direct benefits of Machine Learning and AI for Manufacturing

The introduction of AI and Machine Learning to Industry 4.0 represents a big change for manufacturing companies that can open new business opportunities and result in advantages like efficiency improvements, among others.

  • Cost reduction through Predictive Maintenance that leads to less maintenance activity, which means lower labor costs, reduced inventory and materials wastage.
  • Predicting Remaining Useful Life (RUL): Keeping tabs on the behavior of machines and equipment leads to creating conditions that improve performance while maintaining machine health. Predicting RUL reduces the scenarios which cause unplanned downtime.
  • Improved supply chain management through efficient inventory management and a well monitored and synchronized production flow.
  • Autonomous equipment and vehicles: Use of autonomous cranes and trucks to streamline operations as they accept containers from transport vehicles, ships, trucks etc.
  • Better Quality Control with actionable insights to constantly raise product quality.
  • Improved human-machine collaboration while improving employee safety conditions and boosting overall efficiency.
  • Consumer-focused manufacturing: Being able to respond quickly to changes in the market demand.

Touch base with GAVS AI experts here: https://www.gavstech.com/reaching-us/ and see how we can help you drive your manufacturing operation towards Industry 4.0.