Ensure Service Availability and Reliability with ZIF

To survive in the current climate, most enterprises have already embarked on their digital transformation journeys. This is leading to uncertainty in the way applications and services supporting the applications are being monitored and managed. Inadequate information is leading to downtime in service availability for end-users eventually resulting in unhappy users and revenue loss.

Zero Incident Framework™ has been architected to address the IT Ops issues of today and tomorrow.

Leveraging the power of Artificial Intelligence on telemetry data ingested in real-time, ZIF can provide insights and resolve forecasted issues – resulting in the availability of application service when end-user wants the service at the right time.

Business Value delivered to customers from ZIF

  • Minimum 40% reduction in capital expenses and a minimum 50% reduction in IT operational cost
  • Faster resolution by 60% (MTTR)
  • Service availability of 99.99%
  • ZIF bots to increase productivity by a minimum of 80%
  • Increased user experience measured by metrics (UEI) User Experience Index
AI in operations management service

ICEBERG STATE IN ITOps

Many IT operations are in an ‘ICEBERG’ state even today. Do not be surprised if your organization is also one of them. Issues and incidents that surfaces to the top are the ones that are known to the team. But the unknown issues are not uncovered.

Therefore, enterprises have started to embark on artificial intelligence to help them identify and track the unknown issues within the complex IT landscape.

OBSERVABILITY USING ZIF

ZIF, architected and developed on the premise of observability, not only helps with visibility but also enables discovering deeper insights, thus freeing up more time for more strategic initiatives. This becomes critical to the overall success of Site Reliability Engineering (SRE) in enterprises.

Externalizing the internal state of systems, services, and application to the maximum, helps in complete observability.

Monitoring Vs. Observability?

automated discovery of networked services

Pillars of Observability – Events | Metrics | Traces

Ensure SERVICE RELIABILITY

“Reliability is defined as the probability that an application, system, or service will perform its intended function adequately for a specified period or will operate in a defined environment without failure.”

ZIF has mastered the art of predicting device, application & service failure, or performance degradation. This unique proposition from ZIF gives IT engineers the edge on service reliability of all applications, systems, or services that they are responsible for. ZIF’s auto-remediation bots can resolve predicted issues to make sure the intended function performs as and when expected by users.

SERVICE AVAILABILITY

Availability is measured as the percentage of time your service or system or application is available.

A small variation in availability percentage will have to be addressed on priority. A 99.999% availability allows only 5.26 minutes of downtime a year, whereas 99% availability allows downtime of 3.65 days a year.

ZIF helps IT engineers achieve the agreed-upon availability of application or system by learning the usage of the system and application from the metrics that are collected from the environment. Collecting the right metrics helps in getting the right availability. With the help of unsupervised algorithms, patterns are learned which helps in discovering when the application or system is required the most and then predicting any potential downtime. With above 95% accuracy in prediction, ZIF can achieve 99.99% availability for application and devices which allows 52.56 minutes downtime a year.

ZIF’s goal has always been to deliver the right business outcomes for the stakeholders. Users have the privilege to choose what business outcomes are expected from the platform and the respective features are deployed in the enterprise to deliver the chosen outcome.

About the Author

Anoop Aravindakshan

An evangelist of Zero Incident FrameworkTM, Anoop has been a part of the product engineering team for long and has recently forayed into product marketing. He has over 14 years of experience in Information Technology across various verticals, which include Banking, Healthcare, Aerospace, Manufacturing, CRM, Gaming, and Mobile.

Algorithmic Alert Correlation

Today’s always-on businesses and 24×7 uptime demands have necessitated IT monitoring to go into overdrive. While constant monitoring is a good thing, the downside is that the flood of alerts generated can quickly get overwhelming. Constantly having to deal with thousands of alerts each day causes alert fatigue, and impacts the overall efficiency of the monitoring process.

Hence, chalking out an optimal strategy for alert generation & management becomes critical. Pattern-based thresholding is an important first step, since it tunes thresholds continuously, to adapt to what ‘normal’ is, for the real-time environment. Threshold accuracy eliminates false positives and prevents alerts from getting fired incorrectly. Selective alert suppression during routine IT Ops maintenance activities like backups, patches, or upgrades, is another. While there are many other strategies to keep alert numbers under control, a key process in alert management is the grouping of alerts, known as alert correlation. It groups similar alerts under one actionable incident, thereby reducing the number of alerts to be handled individually.

But, how is alert ‘similarity’ determined? One way to do this is through similarity definitions, in the context of that IT landscape. A definition, for instance, would group together alerts generated from applications on the same host, or connectivity issues from the same data center. This implies that similarity definitions depend on the physical and logical relationships in the environment – in other words – the topology map. Topology mappers detect dependencies between applications, processes, networks, infrastructure, etc., and construct an enterprise blueprint that is used for alert correlation.

But what about related alerts generated by entities that are neither physically nor logically linked? To give a hypothetical example, let’s say application A accesses a server S which is responding slowly, and so A triggers alert A1. This slow communication of A with S eats up host bandwidth, and hence affects another application B in the same host. Due to this, if a third application C from another host calls B, alert A2 is fired by C due to the delayed response from B.  Now, although we see the link between alerts A1 & A2, they are neither physically nor logically related, so how can they be correlated? In reality, such situations could imply thousands of individual alerts that cannot be combined.

Algorithmic Alert Correlation

This is one of the many challenges in IT operations that we have been trying to solve at GAVS. The correlation engine of our AIOps Platform ZIF uses algorithmic alert correlation to find a solution for this problem. We are working on two unsupervised machine learning algorithms that are fundamentally different in their approach – one based on pattern recognition and the other based on spatial clustering. Both algorithms can function with or without a topology map, and work around what is supplied and available. The pattern learning algorithm derives associations based on learnings from historic patterns of alert relationships. The spatial clustering algorithm works on the principle of similarity based on multiple features of alerts, including problem similarity derived by applying Natural Language Processing (NLP), and relationships, among several others. Tuning parameters enable customization of algorithmic behavior to meet specific demands, without requiring modifications to the core algorithms. Time is also another important dimension factored into these algorithms, since the clustering of alerts generated over an extended period of time will not give meaningful results.

Traditional alert correlation has not been able to scale up to handle the volume and complexity of alerts generated by the modern-day hybrid and dynamic IT infrastructure. We have reached a point where our ITOps needs have surpassed the limits of human capabilities, and so, supplementing our intelligence with Artificial Intelligence and Machine Learning has now become indispensable.

About the Authors –

Padmapriya Sridhar

Priya is part of the Marketing team at GAVS. She is passionate about Technology, Indian Classical Arts, Travel, and Yoga. She aspires to become a Yoga Instructor someday!

Gireesh Sreedhar KP

Gireesh is a part of the projects run in collaboration with IIT Madras for developing AI solutions and algorithms. His interest includes Data Science, Machine Learning, Financial markets, and Geo-politics. He believes that he is competing against himself to become better than who he was yesterday. He aspires to become a well-recognized subject matter expert in the field of Artificial Intelligence.

Cloud Adoption, Challenges, and Solution Through Monitoring, AI & Automation

Cloud Adoption

Cloud computing is the delivery of computing services including Servers, Database, Storage, Networking & others over the internet. Public, Private & Hybrid clouds are different ways of deploying cloud computing.  

  • In public cloud, the cloud resources are owned by 3rd party cloud service provider
  • A private cloud consists of computing resources exclusively by one business or organization
  • Hybrid provides the best of both worlds, combines on-premises infrastructure, private cloud with public cloud

Microsoft, Google, Amazon, Oracle, IBM, and others are providing cloud platform to users to host and experience practical business solution. The worldwide public cloud services market is forecast to grow 17% in 2020 to total $266.4 billion and $354.6 billion in 2022, up from $227.8 billion in 2019, per Gartner, Inc.

There are various types of Instances, workloads & options available as part of cloud ecosystem, i.e. IaaS, PaaS, SaaS, Multi-cloud, Serverless.

Challenges

When very large, large and medium enterprise decides to move their IT environment from on-premise to cloud, they try to move some/most of their on-premises into cloud and keep the rest under their control on-premise. There are various factors that impact the decision, to name a few,

  1. ROI vs Cost of Cloud Instance, Operation cost
  2. Architecture dependency of the application, i.e. whether it is monolithic or multi-tier or polyglot or hybrid cloud
  3. Requirement and need for elasticity and scalability
  4. Availability of right solution from the cloud provider
  5. Security of some key data

After crossing all, once the IT environment is cloud-enabled, the challenge comes in ensuring the monitoring of the Cloud-enabled IT environment. Here are some of the business and IT challenges

1. How to ensure the various workloads & Instances are working as expected?

While the cloud provider may give high availability & up time depending on the tier we choose, it is important that our IT team monitors the environment, as in the case of IaaS and to some extent in PaaS as well.

2. How to ensure the Instances are optimally used in terms of compute and storage?

Cloud providers give most of the metrics around the Instances, though it may not provide all metrics that we may need to make decision in all scenarios.

The disadvantage with this model is, cost, latency & not straight forward, e.g. the LOG analytics which comes in Azure involves cost for every MB/GB of data that is stored and the latency in getting the right metrics at right time, if there is latency/delay, you may not get a right result

3. How to ensure the Application or the components of a single solution that are spread across on-premise and Cloud environment is working as expected?

Some cloud providers give tools for integrating the metrics from on-premise to cloud environment to have a shared view.

The disadvantage with this model is, it is not possible to bring in all sorts of data together to get the insights straight. That is, observability is always a question. The ownership of getting the observability lies with the IT team who handles the data.

4. How to ensure the Multi-Cloud + On-Premise environment is effectively monitored & utilized to ensure the best End-user experience?

Multi-Cloud environment – With rapid growing Microservices Architecture & Container based cloud enabled model, it is quite natural that the Enterprise may choose the best from different cloud providers like Azure, AWS, Google & others.

There is little support from cloud provider on this space. In fact, some cloud providers do not even support this scenario.

5. How to get a single panel of view for troubleshooting & root cause analysis?

Especially when problem occurs in Application, Database, Middle Tier, Network & 3rd party layers that are spread across multi-cluster, multi-cloud, elastic environment, it is very important to get a Unified view of entire environment.

ZIF (Zero Incident FrameworkTM), provides a single platform for Cloud Monitoring.

ZIF has Discovery, Monitoring, Prediction & Remediate that seamlessly fits for a cloud enabled solution. ZIF provides the unified dashboard with insights across all layers of IT infrastructure that is distributed across On-premise host, Cloud Instance & Containers.

Core features & benefits of ZIF for Cloud Monitoring are,

1. Discovery & Topology

  • Discovers and provides dynamic mapping of resources across all layers.
  • Provides real-time mapping of applications and its dependent layers irrespective of whether the components live on-premise, or on cloud or containerized in cloud.
  • Dynamically built topology of all layers which helps in taking effective decisions.

2. Observability across Multi-Cloud, Hybrid-Cloud & On-Premise tiers

  • It is not just about collecting metrics; it is very important to analyze the monitored data and provide meaningful insights.
  • When the IT infrastructure is spread across multiple cloud platform like Azure, AWS, Google Cloud, and others, it is important to get a unified view of your entire environment along with the on-premise servers.
  • Health of each layers are represented in topology format, this helps to understand the impact and take necessary actions.

3. Prediction driven decision for resource optimization

  • Prediction engine analyses the metrics of cloud resources and predicts the resource usage. This helps the resource owner to make proactive action rather than being reactive.
  • Provides meaningful insights and alerts in terms of the surge in the load, the growth in number of VMs, containers, and the usage of resource across other workloads.
  • Authorize the Elasticity & Scalability through real-time metrics.

4. Container & Microservice support

  • Understand the resource utilization of your containers that are hosted in Cloud & On-Premise.
  • Know the bottlenecks around the Microservices and tune your environment for the spikes in load.
  • Provides full support for monitoring applications distributed across your local host & containers in cloud in a multi-cluster setup.

5. Root cause analysis made simple

  • Quick root cause analysis by analysing various causes captured by ZIF Monitor instead of going through layer by layer. This saves time to focus on problem-solving and arresting instead of spending effort on identifying the root cause.
  • Provides insights across your workload including the impact due to 3rd party layers as well.

6. Automation

  • Irrespective of whether the workload and instance is on-premise or on Azure or AWS or other provider, the ZIF automation module can automate the basics to complex activities

7. Ensure End User Experience

  • Helps to improve the end-user experience who gets served by the workload from cloud.
  • The ZIF tracing helps to trace each & every request of each & every user, thereby it is quite natural for ZIF to unearth the performance bottleneck across all layers, which in turn helps to address the problem and thereby improve the User Experience

Cloud and Container Platform Support

ZIF Seamlessly integrates with following Cloud & Container environments,

  • Microsoft Azure
  • AWS
  • Google Cloud
  • Grafana Cloud
  • Docker
  • Kubernetes

About the Author

Suresh Kumar Ramasamy-Picture

Suresh Kumar Ramasamy


Suresh heads the Monitor component of ZIF at GAVS. He has 20 years of experience in Native Applications, Web, Cloud, and Hybrid platforms from Engineering to Product Management. He has designed & hosted the monitoring solutions. He has been instrumental in conglomerating components to structure the Environment Performance Management suite of ZIF Monitor.

Suresh enjoys playing badminton with his children. He is passionate about gardening, especially medicinal plants.

Generative Adversarial Networks (GAN)

In my previous article (zif.ai/inverse-reinforcement-learning/), I had introduced Inverse Reinforcement Learning and explained how it differs from Reinforcement Learning. In this article, let’s explore Generative Adversarial Networks or GAN; both GAN and reinforcement learning help us understand how deep learning is trying to imitate human thinking.

With access to greater hardware power, Neural Networks have made great progress. We use them to recognize images and voice at levels comparable to humans sometimes with even better accuracy. Even with all of that we are very far from automating human tasks with machines because a tremendous amount of information is out there and to a large extent easily accessible in the digital world of bits. The tricky part is to develop models and algorithms that can analyze and understand this humongous amount of data.

GAN in a way comes close to achieving the above goal with what we call automation, we will see the use cases of GAN later in this article.

This technique is very new to the Machine Learning (ML) world. GAN is a deep learning, unsupervised machine learning technique proposed by Ian Goodfellow and few other researchers including Yoshua Bengio in 2014. One of the most prominent researcher in the deep learning area, Yann LeCun described it as “the most interesting idea in the last 10 years in Machine Learning”.

What is Generative Adversarial Network (GAN)?

A GAN is a machine learning model in which two neural networks compete to become more accurate in their predictions. GANs typically run unsupervised and use a cooperative zero-sum game framework to learn.

The logic of GANs lie in the rivalry between the two Neural Nets. It mimics the idea of rivalry between a picture forger and an art detective who repeatedly try to outwit one another. Both networks are trained on the same data set.

A generative adversarial network (GAN) has two parts:

  • The generator (the artist) learns to generate plausible data. The generated instances become negative training examples for the discriminator.
  • The discriminator (the critic) learns to distinguish the generator’s fake data from real data. The discriminator penalizes the generator for producing implausible results.

GAN can be compared with Reinforcement Learning, where the generator is receiving a reward signal from the discriminator letting it know whether the generated data is accurate or not.

Generative Adversarial Networks

During training, the generator tries to become better at generating real looking images, while the discriminator trains to be better classify those images as fake. The process reaches equilibrium at a point when the discriminator can no longer distinguish real images from fakes.

Generative Adversarial Networks

Here are the steps a GAN takes:

  • The input to the generator is random numbers which returns an image.
  • The output image of the generator is fed as input to the discriminator along with a stream of images taken from the actual dataset.
  • Both real and fake images are given to the discriminator which returns probabilities, a number between 0 and 1, 1 meaning a prediction of authenticity and 0 meaning fake.

So, you have a double feedback loop in the architecture of GAN:

  • We have a feedback loop with the discriminator having ground truth of the images from actual training dataset
  • The generator is, in turn, in a feedback loop along with the discriminator.

Most GANs today are at least loosely based on the DCGAN architecture (Radford et al., 2015). DCGAN stands for “deep, convolution GAN.” Though GANs were both deep and convolutional prior to DCGANs, the name DCGAN is useful to refer to this specific style of architecture.

Applications of GAN

Now that we know what GAN is and how it works, it is time to dive into the interesting applications of GANs that are commonly used in the industry right now.

Generative Adversarial Networks

Can you guess what’s common among all the faces in this image?

None of these people are real! These faces were generated by GANs, exciting and at the same time scary, right? We will focus about the ethical application of the GAN in the article.

GANs for Image Editing

Using GANs, appearances can be drastically changed by reconstructing the images.

GANs for Security

GANs has been able to address the concern of ‘adversarial attacks’.

These adversarial attacks use a variety of techniques to fool deep learning architectures. Existing deep learning models are made more robust to these techniques by GANs by creating more such fake examples and training the model to identify them.

Generating Data with GANs

The availability of data in certain domains is a necessity, especially in domains where training data is needed to model learning algorithms. The healthcare industry comes to mind here. GANs shine again as they can be used to generate synthetic data for supervision.

GANs for 3D Object Generation

GANs are quite popular in the gaming industry. Game designers work countless hours recreating 3D avatars and backgrounds to give them a realistic feel. And, it certainly takes a lot of effort to create 3D models by imagination. With the incredible power of GANs, wherein they can be used to automate the entire process!

GANs are one of the few successful techniques in unsupervised machine learning and it is evolving quickly and improving our ability to perform generative tasks. Since most of the successful applications of GANs have been in the domain of computer vision, generative model sure has a lot of potential, but is not without some drawbacks.

About the Author –

Naresh B

Naresh is a part of Location Zero at GAVS as an AI/ML solutions developer. His focus is on solving problems leveraging AI/ML.
He strongly believes in making success as a habit rather than considering it as a destination.
In his free time, he likes to spend time with his pet dogs and likes sketching and gardening.

Lambda (λ), Kappa (κ) and Zeta (ζ) – The Tale of 3 AIOps Musketeers (PART-3)

“Data that sit unused are no different from data that were never collected in the first place.” – Doug Fisher

In the part 1 (https://bit.ly/3hDChCH), we delved into Lambda Architecture and in part 2 (https://bit.ly/3hDCg1B) about Generic Lambda. Given the limitations of the Generic lambda architecture and its inherent complexity, the data is replicated in two layers and keeping them in-sync is quite challenging in an already complex distributed system.There is a growing interest to find the simpler alternative to the Generic Lambda, that would bring just about the same benefits and handle the full problem set. The solution is Unified Lambda (λ) Architecture.

Unified Lambda (λ) Architecture

The unified approach addresses the velocity and volume problems of Big Data as it uses a hybrid computation model. This model combines both batch data and instantaneous data transparently.

There are basically three approaches:

  1. Pure Streaming Framework
  2. Pure Batch Framework
  3. Lambdoop Framework

1. Pure streaming framework

In this approach, a pure streaming model is adopted and a flexible framework like Apache Samza can be employed to provide unified data processing model for both stream and batch processing using the same data flow structure.

Pure streaming framework

To avoid the large turn-around times involved in Hadoop’s batch processing, LinkedIn came up with a distributed stream processing framework Apache Samza. It is built on top of distributed messaging bus; Apache Kafka, so that it can be a lightweight framework for streaming platform. i.e. for continuous data processing. Samza has built-in integration with Apache Kafka, which is comparable to HDFS and MapReduce. In the Hadoop world, HDFS is the storage layer and MapReduce, the processing layer. In the similar way, Apache Kafka ingests and stores the data in topics, which is then streamed and processed by Samza. Samza normally computes results continuously as and when the data arrives, thus delivering sub-second response times.

Albeit it’s a distributed stream processing framework, its architecture is pluggable i.e. can be integrated with umpteen sources like HDFS, Azure EventHubs, Kinensis etc. apart from Kafka. It follows the principle of WRITE ONCE, RUN ANYWHERE; meaning, the same code can run in both stream and batch mode. Apache Samza’s streams are re-playable, ordered partitions.

Unified API for Batch & Streaming in pure Streaming

Apache Samza offers a unified data processing model for both real-time as well as batch processing.  Based on the input data size, bounded or unbounded the data processing model can be identified, whether batch or stream.Typically bounded (e.g. static files on HDFS) are Batch data sources and streams are unbounded (e.g. a topic in Kafka). Under the bonnet, Apache Samza’s stream-processing engine handles both types with high efficiency.

Unified API for Batch & Streaming in pure Streaming

Another advantage of this unified API for Batch and Streaming in Apache Samza, is that makes it convenient for the developers to focus on the processing logic, without treating bounded and unbounded sources differently. Samza differentiates the bounded and unboundeddata by a special token end-of-stream. Also, only config change is needed, and no code changes are required, in case of switching gears between batch and streaming, e.g. Kafka to HDFS.Let us take an example of Count PageViewEvent for each mobile Device OS in a 5-minute window and send the counts to PageViewEventPerDeviceOS

Pure Batch framework

This is the reverse approach of pure streaming where a flexible Batch framework is employed, which would offer both the batch processing and real-time data processing ability. The streaming is achieved by using mini batches which is small enough to be close to real-time, with Apache Spark/Spark Streaming or Storm’s Trident. Under the hood, Spark streaming is a sequence of micro-batch processes with the sub-second latency. Trident is a high-level abstraction for doing streaming computation on top of Storm. The core data model of Trident is the “Stream”, processed as a series of batches.

Apache Spark achieves the dual goal of Batch as well as real-time processing by the following modes.

  • Micro-batch processing model
  • Continuous Processing model

Micro-batch processing model

Micro-batch processing is analogous to the traditional batch processing in that data are usually processed as a group. The primary difference is that the batches are smaller and processed more often. In spark streaming, the micro-batches are created based on the time rather than on the accumulated data size. The smaller the time to trigger a micro-batch to process, lesser the latency.

Continuous Processing model

Apache Spark 2.3, introduced Low-latency Continuous Processing Mode in Structured Streaming whichenables low (~1 ms) end-to-end latency with at-least-once fault-tolerance guarantees. Comparing this with the default micro-batch processing engine which can achieve exactly-once guarantees but achieve latencies of ~100 ms at best. Without modifying the application logic i.e. DataFrame/Dataset operations mini-batching or continuous streaming can be chosen at runtime. Spark Streaming also has the abilityto work well with several data sources like HDFS, Flume or Kafka.

Example of Micro-batching and Continuous Batching

3. Lambdoop Approach

In many places, capability of both batch and real time processing is needed.It is cumbersome to develop a software architecture of such capabilities by tailoring suitable technologies, software layers, data sources, data storage solutions, smart algorithms and so on to achieve the good scalable solution. This is where the frameworks like Spring “XD”, Summingbird or Lambdoop comes in, since they already have a combined API for batch and real-time processing.

Lambdoop

Lambdoop is a software framework based on the Lambda architecture which provides an abstraction layer to the developers. This feature makes the developers life easy to develop any Big Data applications by combining real time and batch processing approaches. Developers don’t have to deal with different technologies, configurations, data formats etc. They can use the Lambdoop framework as the only needed API. Also, Lambdoop includes other interesting tools such as input/output drivers, visualization tools, cluster management tools and widely accepted AI algorithms.

The Speed layer in Lambdoop runs on Storm and the Batch layer on Hadoop, Lambdoop (Lambda-Hadoop, with HBase, Storm and Redis) also combines batch/real-time by offering a single API for both processing models.

Summingbird

Summingbird aka‘Streaming MapReduce’ is a hybrid computational system where both the batch/streaming computations can be run at the same time and the results can be merged automatically. In Summingbird, the developer can write the code/job logic once and change the backend as and when needed. Following are the modes in which Summingbird Job/code can be executed.

  • batch mode (using Scalding on Hadoop)
  • real-time mode (using Storm)
  • hybrid batch/real-time mode (offers attractive fault-tolerance properties)

If the model assumes streaming, one-at-a-time semantics, then the code can be run in real-time e.g. Strom or in offline/batch mode e.g. Hadoop, spark etc. It can operate in a hybrid processing mode, when there is a need to transparently integrate batch and online results to efficiently generate up-to-date views over long time spans.

Conclusion

The volume of any Big Data platform is handled by building a batch processing application which requires, MapReduce, spark development, Use of other Hadoop related tools like Sqoop, Zookeeper, HCatalog etc. and storage systems like HBase, MongoDB, HDFS, Cassandra. At the same time the velocity of any Big Data platform is handled by building a real-time streaming application which requires, stream computing development using Storm, Samza, Kafka-connect, Apache Flink andS4, and use of temporal datastores like in-memory data stores, Apache Kafka messaging system etc.

The Unified Lambda handles the both Volume and Velocity if any Big Data platform by the intermixed approach of featuring a hybrid computation model, where both batch and real-time data processing are combined transparently. Also, the limitations of Generic Lambda like Dual execution mode, Replicating and maintaining the data sync between different layers are avoided and in the Unified Lambda, there would be only one system to learn and maintain.

About the Author:

Bargunan Somasundaram

Bargunan Somasundaram

Bargunan is a Big Data Engineer and a programming enthusiast. His passion is to share his knowledge by writing his experiences about them. He believes “Gaining knowledge is the first step to wisdom and sharing it is the first step to humanity.”

Lambda (λ), Kappa (κ) and Zeta (ζ) – The tale of three musketeers (Part-2)

In my previous article https://bit.ly/2T7DO9r, we saw the brief introduction and terminologies of Lambda Architecture. Let’s jump on to its various implementation patterns in the enterprises.

Lambda data processing architecture can be implemented in three ways,

  1. GenericLambdaλArchitecture.
  2. Unified LambdaλArchitecture.
  3. Multi-Agent Lambdaλ Architecture (MALA)

Generic Lambda λ Architecture

The three layers of Generic λ

  • Batch Layer-The The master data is managed here, and the batch views are precomputed.
  • Speed Layer-This layer serves recent data only and increments the real-time views
  • Serving Layer-This layer is responsible for indexing and exposing the views so that they can be queried.

How does Generic Lambdaλ Architecture work?

The new information collected, or ingested data is sent simultaneously to both Batch and Speed/Streaming layers for processing. The batch layer, called ‘Data Lake’, handles two vital tasks.

1) Managing the master data set (Data Lake), which is an immutable append-only raw data.

2) Precomputing the batch views on business-relevant aggregations and metrics.

The computation from Batch Layer is fed into Serving Layer which indexes the batch views, for a low latency query.

In the Speed layer or Streaming layer, the views are transient in nature, since only new data is considered to compensate for the high latency of the writes.

A serving layer can be a presentation side/reporting layer aimed to handle both batch reporting as well as real-time reporting. At the presentation side, queries are answered by merging both batch and real-time views. 

Generic Lambdaλ is technology agnostic

 The data pipeline can be broken down into layers with clear delineation of roles and responsibilities and at each layer, we can choose from several technologies. For instance, in the speed layer any of Apache Storm or Apache Spark Streaming, or spring ‘XD’ (eXtreme Data)could be employed.

Speed Layer Components

The following table represents some of the Stream Processing Frameworks, that are well suited for the speed components.

Apache Storm

Apache Storm is an open-source, distributed, and advanced Big Data processing engine that processes the real-time streaming data at an unprecedented speed, way faster than Apache Hadoop. What Hadoop does for batch processing, Apache Storm does for unbounded streams of data in a reliable manner.

  • Apache Storm can process over a million jobs on a node in a fraction of a second.
  • It is integrated with Hadoop to harness higher throughputs.
  • It is easy to implement and can be integrated with any programming language.

Apache Spark Streaming

Spark Streaming was added to Apache Spark in 2013, an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Data is ingested from varied sources like Apache Kafka, Flume, Kinesis and can be processed using complex algorithms expressed with high-level functions like map, reduce, join, and window. The processed data can be pushed out to filesystems, databases, and live dashboards. Spark’s machine learning and graph processing algorithms can be applied to data streams.

Spring XD (eXtreme Data)

Spring XD (eXtreme Data) is a unified, distributed, and extensible service for data ingestion, real-time analytics, batch processing, and data export.

The Spring Data team has via Spring XD has provided support for NoSQL datastores and has also simplified the development experience with Hadoop. Spring XD is built on the fundamental blocks of Apache Hadoop. Also, it uses various pre-existing Spring technologies. For instance, Spring Data supports NoSQL/Hadoop work, Spring Batch is employed to support the workflow orchestration with job state management and retry/restart capabilities, and Spring Integration manages the event-driven data ingestion stream processing and the various Enterprise Application Integration patterns. Spring Reactor provides simplified API for developing asynchronous applications using the LMAX Disruptor.

Batch layer Components

Similarly, in Batch Layer frameworks like Apache Pig, Apache MapReduce and Apache spark can be employed. The Processing Frameworks that are commonly used in the batch layer are outlined below.

Apache Hadoop MapReduce

Hadoop MapReduce is a paradigm and software framework for writing applications that process large amounts of data on large clusters of commodity hardware in a parallel, reliable, fault-tolerant manner. MapReduce programs are written in various languages like Java, Ruby, Python, and C++ can be run in Apache Hadoop platform.

Apache Pig

Apache Pig is an abstraction over MapReduce and a tool/platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. To address the problem of programs generating series of Map and Reduce stages in MapReduce, Apache Pig creates an abstraction over them. The most noticeable property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.

Apache Spark

Apache Spark is the largest open source project in Big data processing. It is a blazing-fast cluster computing technology, designed for fast computation. It is based on Hadoop MapReduce and it extends the MapReduce model to efficiently use it for more types of computations, which includes interactive queries and stream processing. The main feature of Spark is its in-memory cluster computing that increases the processing speed of an application.

Serving Layer Components

Technologies or Merge/Low-Latency Databases like Druid, Apache HBase, Elephant DB, Apache SOLR, Elasticsearch, Azure Cosmos DB, MongoDB, VoltDBcan be employed for speed-layer output.

Limitations of GenericLambda λ architecture

  1. Write everything twice
  2. In the generic lambda architecture, the data must be written twice i.e. data is sent to both the speed layer and the batch layer as it is created. Any logic is duplicated and implemented twice. The batch layer takes a while to produce results so the speed layer does the same work so it can answer questions about in-flight events and recent activities.

  3. Two execution paths
  4. There are always two separate execution paths for streaming and batch. It’s a maintenance nightmare, where dealing with a plethora of frameworks, components, and clusters.

  5. Two programming models
  6. Typically, an undesirable effect of Generic Lambda is that the codebases tend to diverge since the code that executes in a batch world works on a large but finite data set while a real-time stream processing system works on an infinite event stream.

  7. Diverse skill sets
  8. Just to manage the platform, more developers with diverse skill sets are needed than focusing on core business problems.

Conclusion

It is evident that the Generic Lambda λ fits best for the system that has fast data i.e. high velocity of data and Data Lake i.e. system that involves complex processing of both historical (re-computational) and real-time (incremental) aggregated view with nearly unlimited memory capacity and data storage space. Use cases like login gestion (syslog’s, application logs, weblogs), which are one-way data pipelines are some of the areas where the Generic Lambda λ shines the brightest. But having known the limitations of the Generic lambda λ, there is always a constant search to address its limitations. Let’s continue to explore to find solutions in the next part.

Happy Learning!

About the Author:

Bargunan Somasundaram


Bargunan is a Big Data Engineer and a programming enthusiast. His passion is to share his knowledge by writing his experiences about them. He believes “Gaining knowledge is the first step to wisdom and sharing it is the first step to humanity.”

Inverse Reinforcement Learning

Naresh B

What is Inverse Reinforcement Learning(IRL)?

Inverse reinforcement learning is a recently developed Machine Learning framework that can solve the inverse problem of Reinforcement Learning (RL). Basically, IRL is about learning from humans. Inverse reinforcement learning is the field of learning an agent’s objectives, values, or rewards by observing its behavior.

Before getting into further details of IRL, let us recap RL.
Reinforcement learning is an area of Machine Learning (ML) that takes suitable actions to maximize rewards. The goal of reinforcement learning algorithms is to find the best possible action to take in a specific situation.

Challenges in RL

One of the hardest challenges in many reinforcement learning tasks is that it is often difficult to find a good reward function which is both learnable (i.e. rewards happen early and often enough) and correct (i.e. leads to the desired outcomes). Inverse reinforcement learning aims to deal with this problem by learning a reward function based on observations of expert behavior.

What distinguishes Inverse Reinforcement Learning from Reinforcement Learning?

In RL, our agent is provided with a reward function which, whenever it executes an action in some state, provides feedback about the agent’s performance. This reward function is used to obtain an optimal policy, one where the expected future reward (discounted by how far away it will occur) is maximal.

In IRL, the setting is (as the name suggests) inverse. We are now given some agent’s policy or a history of behavior and we try to find a reward function that explains the given behavior. Under the assumption that our agent acted optimally, i.e. always picks the best possible action for its reward function, we try to estimate a reward function that could have led to this behavior.

The biggest motivation for IRL

Maybe the biggest motivation for IRL is that it is often immensely difficult to manually specify a reward function for a task. So far, RL has been successfully applied in domains where the reward function is very clear. But in the real world, it is often not clear at all what the reward should be and there are rarely intrinsic reward signals such as a game score.

For example, consider we want to design an artificial intelligence for a self-driving car. A simple approach would be to create a reward function that captures the desired behavior of a driver, like stopping at red lights, staying off the sidewalk, avoiding pedestrians, and so on. In real life, this would require an exhaustive list of every behavior we’d want to consider, as well as a list of weights describing how important each behavior is.

Instead, in the IRL framework, the task is to take a set of human-generated driving data and extract an approximation of that human’s reward function for the task. Of course, this approximation necessarily deals with a simplified model of driving. Still, much of the information necessary for solving a problem is captured within the approximation of the true reward function. Since it quantifies how good or bad certain actions are. Once we have the right reward function, the problem is reduced to finding the right policy and can be solved with standard reinforcement learning methods.

For our self-driving car example, we’d be using human driving data to automatically learn the right feature weights for the reward. Since the task is described completely by the reward function, we do not even need to know the specifics of the human policy, so long as we have the right reward function to optimize. In the general case, algorithms that solve the IRL problem can be seen as a method for leveraging expert knowledge to convert a task description into a compact reward function.

Conclusion

The foundational methods of inverse reinforcement learning can achieve their results by leveraging information obtained from a policy executed by a human expert. However, in the long run, the goal is for machine learning systems to learn from a wide range of human data and perform tasks that are beyond the abilities of human experts.

References

About the Author

Naresh is a part of Location Zero at GAVS as an AI/ML solutions developer. His focus is on solving problems leveraging AI/ML. He strongly believes in making success as an habit rather than considering it a destination. In his free time, he likes to spend time with his pet dogs and likes sketching and gardening.

Automating IT ecosystems with ZIF Remediate

Alwinking N Rajamani

Alwinking N Rajamani


Zero Incident FrameworkTM (ZIF) is an AIOps based TechOps platform that enables proactive detection and remediation of incidents helping organizations drive towards a Zero Incident Enterprise™. ZIF comprises of 5 modules, as outlined below.

This article’s focus is on the Remediate function of ZIF. Most ITSM teams envision a future of ticketless ITSM, driven by AI and Automation.

Remediate being a key module ofZIF, has more than 500+ connectors to various ITSMtools, Monitoring, Security and Incident management tools, storage/backup tools and others.Few of the connectors are referenced below that enables quick automation building.

Key Features of Remediate

  • Truly Agent-less software.
  • 300+ readily available templates – intuitive workflow/activity-based tool for process automation from a rich repository of pre-coded activities/templates.
  • No coding or programming required to create/deploy automated workflows. Easy drag & drop to sequence activities for workflow design.
  • Workflow execution scheduling for pre-determined time or triggering from events/notifications via email or SMS alerts.
  • Can be installed on-premise or on the cloud, on physical or virtual servers
  • Self Service portal for end-users/admins/help-desk to handle tasks &remediation automatically
  • Fully automated service management life cycle from incident creation to resolution and automatic closure
  • Has integration packs for all leading ITSM tools

Key features for futuristic Automation Solutions

Although the COVID pandemic has landed us in unprecedented times, we have been able to continue supporting our customers and enabled their IT operations with ZIF Remediate.

  • Self-learning capability to deliver Predictive/Prescriptive actionable alerts.
  • Access to multiple data sources and types – events, metrics, thresholds, logs, event triggers e.g. mail or SMS.
  • Support for a wide range of automation
    • Interactive Automation – Web, SMS, and email
    • Non-interactive automation – Silent based on events/trigger points
  • Supporting a wide range of advanced heuristics.

Benefits of AIOPS driven Automation

  • Faster MTTR
  • Instant identification of threats and appropriate responses
  • Faster delivery of IT services
  • Quality services leading to Employee and Customer satisfaction
  • Fulfillment and Alignment of IT services to business performance

Interactive and Non-interactive automation

Through our automation journey so far, we have understood that the best automation empowers humans, rather than replacing them. By implementing ZIF Remediate, organizations can empower their people to focus their attention on critical thinking and value-added activities and let our platform handle mundane tasks by bringing data-driven insights for decision making.

  • Interactive Automation – Web portal, Chatbot and SMS based
  • Non-interactive automations – Event or trigger driven automation

Involved decision driven Automations

ZIF Remediate has its unique, interactive automation capabilities, where many automation tools do not allow interactive decision making. Need approvals built into an automated change management process that involves sensitive aspects of your environment? Need numerous decision points that demand expert approval or oversight? We have the solution for you. Take an example of Phishing automation, here a domain or IP is blocked based on insights derived by mimicking an SOC engineer’s actions – parsing the observables i.e. URL, suspicious links or attachments in a phish mail and have those observables validated for threat against threat response tools, virus total, and others.

Some of the key benefits realized by our customers which include one of the largest manufacturing organizations, a financial services company, a large PR firm, health care organizations, and others.

  • Reduction of MTTR by 30% across various service requests.
  • Reduction of 40% of incidents/tickets, thus enabling productivity improvements.
  • Ticket triaging process automation resulting in a reduction of time taken by 50%.
  • Reclaiming TBs of storage space every week through snapshot monitoring and approval-driven model for a large virtualized environment.
  • Eliminating manual threat analysis by Phishing Automation, leading to man-hours being redirected towards more critical work.
  • Reduction of potential P1 outages by 40% through self-healing automations.

For more detailed information on ZIF Remediate, or to request a demo please visit https://zif.ai/products/remediate/

About the Author:

Alwin leads the Product Engineering for ZIF Remediate and zIrrus. He has over 20 years of IT experience spanning across Program & Portfolio Management for large customer accounts of various business verticals.

In his free time, Alwin loves going for long drives, travelling to scenic locales, doing social work and reading & meditating the Bible.

Lambda (λ), Kappa (κ) and Zeta (ζ) – The Tale of 3 AIOps Musketeers (PART-1)

Bargunan Somasundaram


Architecture inspires people, no wonder so many famous writers, artists, politicians, and designers have such profound and fascinating observations about architecture. Whether embracing minimalism or adoring resplendence, everyone has experiences and tastes that shape the way they interact with the world. The Greek architectural beauties have captured the imagination of many. The crown jewel of their architecture is the “post and lintel” which was used for their grand, large, open-air structures that could accommodate 20,000 spectators.

Greeks are also famous for their Alphabets. When the Greek Architecture and Alphabets are merged, the state-of-the-art overarching “Big Data Processing Architecture” is produced; Lambda λ, kappa κ, and Zeta ζ.

Big Data Architectural patterns

The evolution of the technologies in Big Data in the last decade has presented a history of battles with growing data volume. An increasing number of systems are being built to handle the Volume, Velocity, Variety, Veracity, Validity, and Volatility of Big Data and help gain new insights and make better business decisions. A well-designed big data architecture must handle the 6 V’s of Big Data, save your company money, and help predict future trends.

Lambda (λ) Architecture

The Lambda Architecture λis an emerging paradigm in Big Data computing. The name lambda architecture is derived from a functional point of view of data processingi.e. all data processing is understood as the application of a function to all data.

Lambda architecture is popular for its data processing technique of handling huge amounts of data by taking advantage of both a batch layer and a speed/stream-processing layer. This specific approach attempts to balance latency, throughput, and fault-tolerance by using batch processing to provide comprehensive and accurate views of batch data, while simultaneously using real-time stream processing to provide views of online data. The outputs from both batch and speed layers can be merged before the presentation.

The efficiency of this architecture becomes evident in the form of increased throughput, reduced latency, and negligible errors, thus resulting in a linearly scalable architecture that scales out rather than scaling up.

Basic tenets of Lambda Architecture

The Lambda Architecture achieves high scalability and low latency due to the following principles,

  • Immutability of Data
  • Data Denormalization
  • Precomputed Views

Immutability of Data

The Big Data immutability is based on similar principles as the immutability in programming data structures. The goal being the same – do not change the data in-place and instead create a new one. The data can’t be altered and deleted. This rule can be defined for eternity or for a specified time period.   

Immutable data is fundamentally simpler than mutable data. The idea here is not to change the data in-place i.e. no updating or deleting of records but creating new ones. Now, this could be time-bound or for the eternity. Thus, write operations only add new data units. In CRUD parlance only CR (Create & Read) and no UD (Update & Delete).

This approach makes data handling highly scalable because it is very easy to distribute and replicate data. This immutable model makes the data aggregation kind of a logging system. With the attributes like “data creation timestamp”, the old and the most recent version can be distinguished. Apache Kafka – an append-only distributed log system is a great example of an immutable data store.

As a drawback, even more, data is generated, and answering queries becomes more difficult. For example, to find the current owner of a brand, the owner for that brand with the latest timestamp must be found.

In the mutable data model, it is no longer possible to find out that the brand Jaguar was once owned by Ford. This is different when using an immutable data model which is achieved by adding a timestamp to each data record.

Now it is possible to get both bits of information: the fact that Jaguar is now owned by Tata Motors (latest timestamp) and the fact it was formerly owned by Ford. It is also much easier to recover from errors because the old information is not deleted.

Data Denormalization

The traditional database systems are named for their storage efficiency and data integrity. It is possible due to the Normalization process like 1NF, 2NF, 3NF, BCNF, 4NF, and 5NF. Due to efficient normalization strategy, data redundancy is eliminated. The same data need not be saved in multiple places (tables) and any updates (partial or full) on the same, need not be done at multiple places (tables). But this makes the traditional databases poor at scaling their read performance since data from multiple places (tables) need to be brought together by complex and costly join operations.

For the sake of performance, Big data systems accept denormalization and duplication of data as a fact of life with the data schema such that data stored in-representation is equivalent to that after performing joins on normalized tables.

In this way, the knowledge about the schema is not necessary, and joins can be avoided, and the query results are faster. This also motivates the query-driven data modeling methodology. Albeit the data exists in multiple places after denormalization, the consistency of the data is ensured via strong consistency, timeline consistency, and eventual consistency models in the event of partial or full updates. This is often acceptable, especially when denormalized representations are used as precomputed views.

Precomputed Views

To give fast and consistent answers to queries on huge amounts of data, precomputed views are prepared both in the batch layer and in the speed layer. In the batch layer, these are constructed by applying a batch function to all the data. This leads to a transformation of the data into a more compact form suitable for answering a pre-defined set of queries. This idea is essentially the same as what is done in data warehousing.

Layers of Lambda

The Lambda Architecture solves the problem of computing arbitrary functions on arbitrary data in real-time by decomposing the problem into three layers,

  1. Batch Layer or Cold Path
  2. Speed Layer or Hot path
  3. Serving Layer

Batch layer or Cold path

The nub of the λ is the master dataset. The master dataset is the source of truth in Lambda Architecture.  The Master dataset must hold the following three properties,

  1. Data is raw.
  2. Data is immutable.
  3. Data is eternally true.

This gives the Lambda architecture ability to reconstruct the application from the master data even if the whole serving layer data set is lost. The batch layer pre-computes results using a distributed processing system that can handle very large quantities of data. The batch layer aims at perfect accuracy by being able to process all available data when generating views. 

The batch layer prefers re-computation algorithms over incremental algorithms. The problem with incremental algorithms is the failure to address the challenges faced by human mistakes. The re-computational nature of the batch layer creates simple batch views as the complexity is addressed during precomputation. Additionally, the responsibility of the batch layer is to historically process the data with high accuracy. Machine learning algorithms take time to train the model and give better results over time. Such naturally exhaustive and time-consuming tasks are processed inside the batch layer.

The problem with the batch layer is high latency. The batch jobs must be run over the entire master dataset. These jobs can process data that can be relatively old as they cannot keep up with the inflow of stream data. This is a serious limitation for real-time data processing. To overcome this limitation, the speed layer is very significant.

Frameworks and solutions such as Hadoop MapReduce, Spark core, Spark SQL, GraphX, and MLLib are the widely adopted big-data tools using batch mode. Batch schedulers include Apache Oozie, Spring Batch, and Unix crontab which, invoke the processing at a periodic interval.

Speed layer or Streaming layer or Hot path

The real-time data processing is realized in the speed layer. The speed layer achieves up-to-date query results and compensates for the high latency of the batch layer.

To create real-time views of the most recent data, this layer sacrifices throughput and decreases latency substantially. The real-time views are generated immediately after the data is received but are not as complete or precise as the batch layer.  In contrast to the re-computation approach of the batch layer, the speed layer adopts incremental computational algorithms. Since the data is not complete i.e less data so less computation. The incremental computation is more complex, but the data handled in the speed layer is vastly smaller and the views are transient.

Most operations on streams are windowed operations operating on slices of time such as moving averages for the stock process every hour, top products sold this week, fraud attempts in banking, etc. Popular choices for stream-processing tools include Apache Kafka, Apache Flume, Apache Storm, Spark Streaming, Apache Flink, Amazon Kinesis, etc.

Serving Layer

The output from both the batch and speed layers are stored in the serving layer.pre-computed batch views are indexed in this layer for faster retrieval. All the on-demand queries from the reporting or presentation layer are served by merging the batch and real-time views and outputs a result.

Query = λ (Complete data) = λ (live streaming data) * λ (Stored data)

The must-haves of the serving layer are,

  • Batch writable

The batch views for a serving layer are produced from scratch. When a new version of a view becomes available, it must be possible to completely swap out the older version with the updated view.

  • Scalable

A serving layer database must be capable of handling views of arbitrary size. As with the distributed filesystems and batch computation framework previously discussed, this requires it to be distributed across multiple machines.

  • Random reads

A serving layer database must support random reads, with indexes providing direct access to small portions of the view. This requirement is necessary to have low latency on queries.

  • Fault-tolerant

Because a serving layer database is distributed, it must be tolerant of machine failures.

This is how Lambda Architecture λ handles humongous amounts of data with low latency queries in a fault-tolerant manner. Let’s see the various implementation of lambda architecture and its applications in the next part.

To be continued…

About the Author:

Bargunan is a Big Data Engineer and a programming enthusiast. His passion is to share his knowledge by writing his experiences about them. He believes “Gaining knowledge is the first step to wisdom and sharing it is the first step to humanity.”

Prediction for Business Service Assurance

Artificial Intelligence for IT operations or AIOps has exploded over the past few years. As more and more enterprises set about their digital transformation journeys, AIOps becomes imperative to keep their businesses running smoothly. 

AIOps uses several technologies like Machine Learning and Big Data to automate the identification and resolution of common Information Technology (IT) problems. The systems, services, and applications in a large enterprise produce volumes of log and performance data. AIOps uses this data to monitor the assets and gain visibility into the behaviour and dependencies among these assets.

According to a Gartner publication, the adoption of AIOps by large enterprises would rise to 30% by 2023.

ZIF – The ideal AIOps platform of choice

Zero Incident FrameworkTM (ZIF) is an AIOps based TechOps platform that enables proactive detection and remediation of incidents helping organizations drive towards a Zero Incident Enterprise™.

ZIF comprises of 5 modules, as outlined below.

At the heart of ZIF, lies its Analyze and Predict (A&P) modules which are powered by Artificial Intelligence and Machine Learning techniques. From the business perspective, the primary goal of A&P would be 100% availability of applications and business processes.

Let us understand more about thePredict module of ZIF.

Predictive Analytics is one of the main USP of the ZIF platform. ZIF encompassesSupervised, Unsupervised and Reinforcement Learning algorithms for realization of various business use cases (as shown below).

How does the Predict Module of ZIF work?

Through its data ingestion capabilities, the ZIF platform can receive and process all types of data (both structured and unstructured) from various tools in the enterprise. The types of data can be related to alerts, events, logs, performance of devices, relations of devices, workload topologies, network topologies etc. By analyzing all these data, the platform predicts the anomalies that can occur in the environment. These anomalies get presented as ‘Opportunity Cards’ so that suitable action can be taken ahead of time to eliminate any undesired incidents from occurring. Since this is ‘Proactive’ and not ‘Reactive’, it brings about a paradigm shift to any organization’s endeavour to achieve 100% availability of their enterprise systems and platforms. Predictions are done at multiple levels – application level, business process level, device level etc.

Sub-functions of Prediction Module

How does the Predict module manifest to enterprise users of the platform?

Predict module categorizes the opportunity cards into three swim lanes.

  1. Warning swim lane – Opportunity Cards that have an “Expected Time of Impact” (ETI) beyond 60 minutes.
  2. Critical swim lane – Opportunity Cards that have an ETI within 60 minutes.
  3. Processed / Lost– Opportunity Cards that have been processed or lost without taking any action.

Few of the enterprises that realized the power of ZIF’s Prediction Module

  • A manufacturing giant in the US
  • A large non-profit mental health and social service provider in New York
  • A large mortgage loan service provider in the US
  • Two of the largest private sector banks in India

For more detailed information on GAVS’ Analyze, or to request a demo please visithttps://zif.ai/products/predict/

References:https://www.gartner.com/smarterwithgartner/how-to-get-started-with-aiops/

About the Author:

Vasudevan Gopalan

Vasu heads Engineering function for A&P. He is a Digital Transformation leader with ~20 years of IT industry experience spanning across Product Engineering, Portfolio Delivery, Large Program Management etc. Vasu has designed and delivered Open Systems, Core Banking, Web / Mobile Applications etc.

Outside of his professional role, Vasu enjoys playing badminton and focusses on fitness routines.