Analyze

Have you heard of AIOps?

Artificial intelligence for IT operations (AIOps) is an umbrella term for the application of Big Data Analytics, Machine Learning (ML) and other Artificial Intelligence (AI) technologies to automate the identification and resolution of common Information Technology (IT) problems. The systems, services and applications in a large enterprise produce immense volumes of log and performance data. AIOps uses this data to monitor the assets and gain visibility into the working behaviour and dependencies between these assets.

According to a Gartner study, the adoption of AIOps by large enterprises would rise to 30% by 2023.

ZIF – The ideal AIOps platform of choice

Zero Incident FrameworkTM (ZIF) is an AIOps based TechOps platform that enables proactive detection and remediation of incidents helping organizations drive towards a Zero Incident Enterprise™

ZIF comprises of 5 modules, as outlined below.

At the heart of ZIF, lies its Analyze and Predict (A&P) modules which are powered by Artificial Intelligence and Machine Learning techniques. From the business perspective, the primary goal of A&P would be 100% availability of applications and business processes.

Come, let us understand more about the Analyze function of ZIF.

With Analyzehaving a Big Data platform under its hood, volumes of raw monitoring data, both structured and unstructured, can be ingested and grouped to build linkages and identify failure patterns.

Data Ingestion and Correlation of Diverse Data

The module processes a wide range of data from varied data sources to break siloes while providing insights, exposing anomalies and highlighting risks across the IT landscape. It increases productivity and efficiency through actionable insights.

  • 100+ connectors for leading tools, environments and devices
  • Correlation and aggregation methods uncover patterns and relationships in the data

Noise Nullification

Eliminates duplicate incidents, false positives and any alerts that are insignificant. This also helps reduce the Mean-Time-To-Resolution and event-to-incident ratio.

  • Deep learning algorithms isolate events that have the potential to become incidents along with their potential criticality
  • Correlation and Aggregation methods group alerts and incidents that are related and needs a common remediation
  • Reinforcement learning techniques are applied to find and eliminate false positives and duplicates

Event Correlation

Data from various sources are ingested real-time into ZIF either by push or pull mechanism. As the data is ingested, labelling algorithms are run to label the data based on identifiers. The labelled data is passed through the correlation engine where unsupervised algorithms are run to mine the patterns. Sub-sequence mining algorithms help in identifying unique patterns from the data.

Unique patterns identified are clustered using clustering algorithms to form cases. Every case that is generated is marked by a unique case id. As part of the clustering process, seasonality aspects are checked from historical transactions to derive higher accuracy of correlation.

Correlation is done based on pattern recognition, helping to eliminate the need for relational CMDB from the enterprise. The accuracy of the correlation increases as patterns reoccur. Algorithms also can unlearn patterns based on the feedback that can be provided by actions taken on correlation. As these are unsupervised algorithms, the patterns are learnt with zero human intervention.

Accelerated Root Cause Analysis (RCA)

Analyze module helps in identifying the root causes of incidents even when they occur in different silos. Combination of correlation algorithms with unsupervised deep learning techniques aid in accurately nailing down the root causes of incidents/problems. Learnings from historical incidents are also applied to find root causes in real-time. The platform retraces the user journeys step-by-step to identify the exact point where an error occurs.

Customer Success Story – How ZIF’s A&P transformed IT Operations of a Manufacturing Giant

  • Seamless end-to-end monitoring – OS, DB, Applications, Networks
  • Helped achieve more than 50% noise reduction in 6 months
  • Reduced P1 incidents by ~30% through dynamic and deep monitoring
  • Achieved declining trend of MTTR and an increasing trend of Availability
  • Resulted in optimizingcommand centre/operations head count by ~50%
  • Resulted in ~80% reduction in operations TCO

For more detailed information on GAVS’ Analyze, or to request a demo please visit zif.ai/products/analyze

References: www.gartner.com/smarterwithgartner/how-to-get-started-with-aiops

ABOUT THE AUTHOR

Vasudevan Gopalan


Vasu heads Engineering function for A&P. He is a Digital Transformation leader with ~20 years of IT industry experience spanning across Product Engineering, Portfolio Delivery, Large Program Management etc. Vasu has designed and delivered Open Systems, Core Banking, Web / Mobile Applications etc.

Outside of his professional role, Vasu enjoys playing badminton and focusses on fitness routines.

READ ALSO OUR NEW UPDATES

CCPA for Healthcare

The California Consumer Privacy Act (CCPA) is a state statute intended to enhance consumer protection and data privacy rights of the residents of California, United States. It is widely considered one of the most sweeping consumer privacy laws, giving Californians the strongest data privacy rights in the U.S.

The focus of this article is CCPA as it applies to Healthcare. Let’s take a quick look at what CCPA is and then move onto its relevance for Healthcare entities. CCPA is applicable to any for-profit organization – regardless of whether it physically operates out of California – that interacts with, does business with and/or collects, processes or monetizes personal information of California residents AND meets at least one of these criteria: has annual gross revenue in excess of $25 million USD; collects or transacts with the personal information of 50,000 or more California consumers, households, or devices; earns 50% or more of its annual revenue by monetizing such data. CCPA also empowers California consumers with the rights to complete ownership; control; and security of their personal information and imposes new stringent responsibilities on businesses to enable these rights for their consumers.

Impact on Healthcare Companies

Companies directly or indirectly involved in the healthcare sector and dealing with medical information are regulated by the Confidentiality of Medical Information Act (CMIA) and the Health Insurance Portability and Accountability Act (HIPAA). CCPA does not supersede these laws & does not apply to ‘Medical Information (MI)’ as defined by CMIA, or to ‘Protected Health Information (PHI)’ as defined by HIPAA. CCPA also excludes de- identified data and information collected by federally-funded clinical trials, since such research studies are regulated by the ‘Common Rule’.

The focus of the CCPA is ‘Personal Information (PI)’ which means information that “identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household.” PI refers to data including but not limited to personal identifiers such as name, address, phone numbers, email ids, social security number; personal details relating to education, employment, family, finances; biometric information, geolocation, consumer activity like purchase history, product preferences; internet activity.

So, if CCPA only regulates personal information, are healthcare companies that are already in compliance with CMIA and HIPAA safe? Is there anything else they need to do?

Well, there is a lot that needs to be done! This only implies that such companies should continue to comply with those rules when handling Medical Information as defined by the CMIA, or Protected Health Information, as defined by HIPAA. They will still need to adhere to CCPA regulations for personal data that is outside of MI and PHI. This will include

employee personal information routinely obtained and processed by the company’s HR; those collected from websites, health apps, health devices, events; clinical studies that are not funded by the federal government; information of a CCPA-covered entity that is handled by a non-profit affiliate, to give a few examples.

There are several possibilities – some not so apparent – even in healthcare entities, for personal data collection and handling that would fall under the purview of CCPA. They need to take stock of the different avenues through which they might be obtaining/handling such data and prioritize CCPA compliance. Else, with the stringent CCPA regulations, they could quickly find themselves embroiled in class action lawsuits (which by the way, do not require proof of damage to the plaintiff) in case of data breaches, or statutory penalties of up to $7500 for each violation.

The good news is that since CCPA carves out a significant chunk of data that healthcare companies/those involved in healthcare-related functions collect and process, entities that are already complying with HIPAA and CMIA are well into the CCPA compliance journey. A peek into the kind of data CMIA & HIPAA regulate will help gauge what other data needs to be taken care of.

CMIA protects the confidentiality of Medical Information (MI) which is “individually identifiable information, in electronic or physical form, in possession of or derived from a provider of health care, health care service plan, pharmaceutical company, or contractor regarding a patient’s medical history, mental or physical condition, or treatment.”

HIPAA regulates how healthcare providers, health plans, and healthcare clearinghouses, referred to as ‘covered entities’ can use and disclose Protected Health Information (PHI), and requires these entities to enable protection of data privacy. PHI refers to individually identifiable medical information such as medical records, medical bills, lab tests, scans and the like. This also covers PHI in electronic form(ePHI). The privacy and security rule of HIPAA is also applicable to ‘business associates’ who provide services to the ‘coveredentities’ that involve the use or disclosure of PHI.

Two other types of data that are CCPA exempt are Research Data & De-Identified Data. As mentioned above, the ‘Common Rule’ applies only to federally-funded research studies, and the CCPA does not provide much clarity on exemption status for data from clinical trials that are not federally-funded.

And, although the CCPA does not apply to de-identified data, the definitions of de-identified data of HIPAA and CCPA slightly differ which makes it quite likely that de-identified data by HIPAA standards may not qualify under CCPA standards and therefore would not be exempt from CCPA regulations.

Compliance Approach

Taking measures to ensure compliance with regulations is cumbersome and labour-intensive, especially with the constantly evolving regulatory environment. Using this opportunity for a proactive, well-thought-out approach for comprehensive enterprise-wide data security and governance will be strategically wise since it will minimize the need for policy and process rehaul with each new regulation.

The most crucial step is a thorough assessment of the following:

  • Policies, procedures, workflows, entities relating to/involved in data collection, sharing and processing, in order to arrive at clear enterprise-wide data mapping; to determine what data, data activities, data policies would fall under the scope of CCPA; and to identify gaps and decide on prioritized action items for compliance.
  • Business processes, contracts, terms of agreement with affiliates, partners and third-party entities the company does business with, to understand CCPA applicability. In some cases,

HIPAA and CMIA may be applicable to only the healthcare-related business units, subjecting other business units to CCPA compliance.

  • Current data handling methods, not just its privacy & security. CCPA dictates that companies need to have mechanisms put in place to cater to CCPA consumer right to request all information relating to the personal data collected about them, right to opt-out of sale of their data, right to have their data deleted by the organization (which will extend to 3rd parties doing business with this organization as well).

Consumer Consent Management

With CCPA giving full ownership and control of personal data back to its owners, consent management mechanisms become the pivot of a successful compliance strategy. An effective mechanism will ensure proper administration and enforcement of consumer authorizations.

Considering the limitations of current market solutions for data privacy and security, GAVS has come up with its Blockchain-based Rhodium Framework (pending patent) for Customer Master Data Management and Compliance with Data Privacy Laws like CCPA.

You can get more details on CCPA in general and GAVS’ solution for true CCPA Compliance in our White Paper, Blockchain Solution for CCPA Compliance.

READ ALSO OUR NEW UPDATES

Proactive Monitoring

Is your IT environment proactively monitored?

It is important to have the right monitoring solution for an enterprise’s IT environment. More than that, it is imperative to leverage the right solution and deploy it for the appropriate requirements. In this context, the IT environment includes but is not limited to Applications, Servers, Services, End-User Devices, Network devices, APIs, Databases, etc. Towards that, let us understand the need and importance of Proactive Monitoring. This has a direct role in achieving the journey towards Zero Incident EnterpriseTM. Let us unravel the difference between reactive and proactive monitoring.

Reactive Monitoring – When a problem occurs in an IT environment, it gets notified through monitoring and the concerned team acts on it to resolve the issue.The problem could be as simple as slowness/poor performance, or as extreme as the unavailability of services like web site going down or server crashing leading to loss of business and revenue.  

Proactive Monitoring – There are two levels of proactive monitoring, 

  • Symptom-based proactive monitoring is all about identifying the signals and symptoms of an issue in advance and taking appropriate and immediate action to nip the root-cause in the bud.
  • Synthetic-based proactive monitoring is achieved through Synthetic Transactions. Performance bottlenecks or failures are identified much in advance; even before the actual user or the dependent layer encounters the situation

Symptom-based proactive monitoring is a USP of the ZIF Monitor module. For example, take the case of CPU related monitoring. It is common to monitor the CPU utilization and act based on that. But Monitor doesn’t just focus on CPU utilization, there are a lot of underlying factors which causes the CPU utilization to go high. To name a few,

  • Processor queue length 
  • Processor context switches
  • Processes that are contributing to high CPU utilization

It is important to arrest these brewing factors at the right time, i.e., in the case of Processor Queue length, continuous or sustained queue of greater than 2 threads is generally an indication of congestion at processor level.Of course, in a multiple processor environment, we need to divide the queue length by the number of processors that are servicing the workload. As a remedy, the following can be done

1) the number of threads can be limited at the application level

2) unwanted processes can be killed to help close the queued items

3) upgrading the processor will help in keeping the queue length under control, which eventually will control the CPU utilization.

Above is a sample demonstration of finding the symptom and signal and arrest them proactively. ZIF’s Monitor not only monitors these symptoms, but also suggests the remedy through the recommendation from SMEs.

Synthetic monitoring (SM) is done by simulating the transactions through the tool without depending on the end-user to do the transactions. The advantages of synthetic monitoring are, 

  • it uses automated transaction simulation technology
  • it helps to monitor the environment round-the-clock 
  • it helps to validate from across different geographic locations 
  • it provides options to choose the number of flows/transactions to be verified
  • it is proactive – identifies performance bottlenecks or failures much in advance even before the actual user or the dependent layer encounters the situation

How does Synthetic Monitoring(SM) work?

It works through 3 simple steps,

1) Record key transactions – Any number of transactions can be recorded, if required, all the functional flows can be recorded. An example of transaction in an e-commerce website could be, as simple as login and view the product catalogue, or,as elaborate as login, view product catalogue, move item to cart, check-out, make-payment and logout. For simulation purpose, dummy credit cards are used during payment gateway transactions.

2) Schedule the transactions – Whether it should run every 5 minutes or x hours or minutes.

3) Choose the location from which thesetransactions need to be triggered – The SM is available as on-premise or cloud options. Cloud SM provides the options to choose the SM engines available across globe (refer to the green dots in the figure below).

This is applicable mainly for web based applications, but can also be used for the underlying APIs as well.

SM solution has engines which run the recorded transactions against the target application. Once scheduled, the SM engine hosted either on-premise or remotely (refer to the green dots in the figure shown as sample representation), will run the recorded transactions at a predefined interval. The SM dashboard provides insights as detailed under the benefits section below.

Benefits of SM

As the SM does the synthetic transactions, it provides various insights like,

  • The latency in the transactions, i.e. the speed at which the transaction is happening. This also gives a trend analysis of how the application is performing over a period.
  • If there are any failures during the transaction, SM provides the details of the failure including the stack trace of the exception. This makes fixing the failure simpler, by avoiding the time spent in debugging.
  • In case of failure, SM provides insights into the parameter details that triggered the failure.
  • Unlike real user monitoring, there is the flexibility to test all flows or at least all critical flows without waiting for the user to trigger or experience it.
  • This not only unearths the problem at the application tier but also provides deeper insights while combining it with Application, Server, Database, Network Monitoring which are part of the ZIF Monitor suite.
  • Applications working fine under one geography may fail in a different geography due to various factors like network, connectivity, etc. SM will exactly pinpoint the availability and performance across geographies.

For more detailed information on GAVS’Monitor, or to request a demo please visit, https://zif.ai/products/monitor/

About the Author

Suresh Kumar Ramasamy


Suresh heads the Monitor component of ZIF at GAVS. He has 20 years of experience in Native Applications, Web, Cloud and Hybrid platforms from Engineering to Product Management. He has designed & hosted the monitoring solutions. He has been instrumental in conglomerating components to structure the Environment Performance Management suite of ZIF Monitor.

Suresh enjoys playing badminton with his children. He is passionate about gardening, especially medicinal plants.

READ ALSO OUR NEW UPDATES

Monitoring Microservices and Containers

Monitoring applications and infrastructure is a critical part of IT Operations. Among other things, monitoring provides alerts on failures, alerts on deteriorations that could potentially lead to failures, and performance data that can be analysed to gain insights. AI-led IT Ops Platforms like ZIF use such data from their monitoring component to deliver pattern recognition-based predictions and proactive remediation, leading to improved availability, system performance and hence better user experience.

The shift away from monolith applications towards microservices has posed a formidable challenge for monitoring tools. Let’s first take a quick look at what microservices are, to understand better the complications in monitoring them.

Monoliths vs Microservices

A single application(monolith) is split into a number of modular services called microservices, each of which typically caters to one capability of the application. These microservices are loosely coupled, can communicate with each other and can be deployed independently.

Quite likely the trigger for this architecture was the need for agility. Since microservices are stand-alone modules, they can follow their own build/deploy cycles enabling rapid scaling and deployments. They usually have a small codebase which aids easy maintainability and quick recovery from issues. The modularity of these microservices gives complete autonomy over the design, implementation and technology stack used to build them.

Microservices run inside containers that provide their execution environment. Although microservices could also be run in virtual machines(VMs), containers are preferred since they are comparatively lightweight as they share the host’s operating system, unlike VMs. Docker and CoreOS Rkt are a couple of commonly used container solutions while Kubernetes, Docker Swarm, and Apache Mesos are popular container orchestration platforms. The image below depicts microservices for hiring, performance appraisal, rewards & recognition, payroll, analytics and the like linked together to deliver the HR function.

Challenges in Monitoring Microservices and Containers

Since all good things come at a cost, you are probably wondering what it is here… well, the flip side to this evolutionary architecture is increased complexity! These are some contributing factors:

Exponential increase in the number of objects: With each application replaced by multiple microservices, 360-degree visibility and observability into all the services, their interdependencies, their containers/VMs, communication channels, workflows and the like can become very elusive. When one service goes down, the environment gets flooded with notifications not just from the service that is down, but from all services dependent on it as well. Sifting through this cascade of alerts, eliminating noise and zeroing in on the crux of the problem becomes a nightmare.

Shared Responsibility: Since processes are fragmented and the responsibility for their execution, like for instance a customer ordering a product online, is shared amongst the services, basic assumptions of traditional monitoring methods are challenged. The lack of a simple linear path, the need to collate data from different services for each process, inability to map a client request to a single transaction because of the number of services involved make performance tracking that much more difficult.

Design Differences: Due to the design/implementation autonomy that microservices enjoy, they could come with huge design differences, and implemented using different technology stacks. They might be using open source or third-party software that makes it difficult to instrument their code, which in turn affects their monitoring.

Elasticity and Transience: Elastic landscapes where infrastructure scales or collapses based on demand, instances appear & disappear dynamically, have changed the game for monitoring tools. They need to be updated to handle elastic environments, be container-aware and stay in-step with the provisioning layer. A couple of interesting aspects to handle are: recognizing the difference between an instance that is down versus an instance that is no longer available; data of instances that are no longer alive continue to have value for analysis of operational efficiency or past performance.

Mobility: This is another dimension of dynamic infra where objects don’t necessarily stay in the same place, they might be moved between data centers or clouds for better load balancing, maintenance needs or outages. The monitoring layer needs to arm itself with new strategies to handle moving targets.

Resource Abstraction: Microservices deployed in containers do not have a direct relationship with their host or the underlying operating system. This abstraction is what helps seamless migration between hosts but comes at the expense of complicating monitoring.

Communication over the network: The many moving parts of distributed applications rely completely on network communication. Consequently, the increase in network traffic puts a heavy strain on network resources necessitating intensive network monitoring and a focused effort to maintain network health.

What needs to be measured

This is a high-level laundry list of what needs to be done/measured while monitoring microservices and their containers.

Auto-discovery of containers and microservices:

As we’ve seen, monitoring microservices in a containerized world is a whole new ball game. In the highly distributed, dynamic infra environment where ephemeral containers scale, shrink and move between nodes on demand, traditional monitoring methods using agents to get information will not work. The monitoring system needs to automatically discover and track the creation/destruction of containers and explore services running in them.

Microservices:

  • Availability and performance of individual services
  • Host and infrastructure metrics
  • Microservice metrics
  • APIs and API transactions
    • Ensure API transactions are available and stable
    • Isolate problematic transactions and endpoints
  • Dependency mapping and correlation
  • Features relating to traditional APM

Containers:

  • Detailed information relating to each container
    • Health of clusters, master and slave nodes
  • Number of clusters
  • Nodes per cluster
  • Containers per cluster
    • Performance of core Docker engine
    • Performance of container instances

Things to consider while adapting to the new IT landscape

Granularity and Aggregation: With the increase in the number of objects in the system, it is important to first understand the performance target of what’s being measured – for instance, if a service targets 99% uptime(yearly), polling it every minute would be an overkill. Based on this, data granularity needs to be set prudently for each aspect measured, and can be aggregated where appropriate. This is to prevent data inundation that could overwhelm the monitoring module and drive up costs associated with data collection, storage, and management.    

Monitor Containers: The USP of containers is the abstraction they provide to microservices, encapsulating and shielding them from the details of the host or operating system. While this makes microservices portable, it makes them hard to reach for monitoring. Two recommended solutions for this are to instrument the microservice code to generate stats and/or traces for all actions (can be used for distributed tracing) and secondly to get all container activity information through host operating system instrumentation.    

Track Services through the Container Orchestration Platform: While we could obtain container-level data from the host kernel, it wouldn’t give us holistic information about the service since there could be several containers that constitute a service. Container-native monitoring solutions could use metadata from the container orchestration platform by drilling into appropriate layers of the platform to obtain service-level metrics. 

Adapt to dynamic IT landscapes: As mentioned earlier, today’s IT landscape is dynamically provisioned, elastic and characterized by mobile and transient objects. Monitoring systems themselves need to be elastic and deployable across multiple locations to cater to distributed systems and leverage native monitoring solutions for private clouds.

API Monitoring: Monitoring APIs can provide a wealth of information in the black box world of containers. Tracking API calls from the different entities – microservices, container solution, container orchestration platform, provisioning system, host kernel can help extract meaningful information and make sense of the fickle environment.

Watch this space for more on Monitoring and other IT Ops topics. You can find our blog on Monitoring for Success here, which gives an overview of the Monitorcomponent of GAVS’ AIOps Platform, Zero Incident FrameworkTM (ZIF). You can Request a Demo or Watch how ZIF works here.

About the Author:

Sivaprakash Krishnan


Bio – Siva is a long timer at Gavs and has been with the company for close to 15 years. He started his career as a developer and is now an architect with a strong technology background in Java, Big Data, DevOps, Cloud Computing, Containers and Micro Services. He has successfully designed & created a stable Monitoring Platform for ZIF, and designed & driven cloud assessment and migration, enterprise BRMS and IoT based solutions for many of our customers. He is currently focused on building ZIF 4.0, a new gen business-oriented TechOps platform.

Padmapriya Sridhar


Bio – Priya is part of the Marketing team at GAVS. She is passionate about Technology, Indian Classical Arts, Travel and Yoga. She aspires to become a Yoga Instructor some day!

Monitoring for Success

Do you know if your end users are happy?

(In the context of users of Applications (desktop, web or cloud-based), Services, Servers and components of IT environment, directly or indirectly.)

The question may sound trivial, but it has a significant impact on the success of a company. The user experience is a journey, from the time they use the application or service, till after they complete the interaction. Experience can be determined based on factors like Speed, Performance, Flawlessness, Ease of use, Security, Resolution time, among others. Hence, monitoring the ‘Wow’ & ‘Woe’ moments of the users is vital.

Monitor is a component of GAVS’ AIOps Platform, Zero Incident FrameworkTM (ZIF). One of the key objectives of the Monitor platform is to measure and improve end-user experience. This component monitors all the layers (includes but not limited to application, database, server, APIs, end-points, and network devices) in real-time that are involved in the user experience. Ultimately,this helps to drive the environment towards Zero Incidents.

This figure shows the capability of ZIF monitoring that cut across all layers starting from end-user to storage and how it is linked to other the components of the platform

Key Features of ZIF Monitor are,

  • Unified solution for all IT environment monitoring needs: The platform covers the end-to-end monitoring of an IT landscape. The key focus is to ensure all verticals of IT are brought under thorough monitoring. The deeper the monitoring, the closer an organization is to attaining a Zero Incident EnterpriseTM.
  • Agents with self-intelligence: The intelligent agents capture various health parameters about the environment. When the target environment is already running under low resource, the agent will not task it with more load. It will collect the health-related metrics and communicate through the telemetry channel efficiently and effectively. The intelligence is applied in terms of parameters to be collected, the period of collection and many more.
  • Depth of monitoring: The core strength of Monitor is it comes with a list of performance counters which are defined by SMEs across all layers of the IT environment. This is a key differentiator; the monitoring parameters can be dynamically configured for the target environment. Parameters can be added or removed on a need basis.
  • Agent & Agentless (Remote): The customers can choose from Agent & Agentless options for the solutions. The remote solution is called as Centralized Remote Monitoring Solution (CRMS). Each monitoring parameter can be remotely controlled and defined from the CRMS. Even the agents that are running in the target environment can be controlled from the server console.
  • Compliance: Plays a key role in terms of the compliance of the environment. Compliance ranges from ensuring the availability of necessary services and processes in the target environment and defines the standard of what Application, Make, Version, Provider, Size, etc. that are allowed in the target environment.
  • Auto discovery: Monitor can auto-discover the newer elements (servers, endpoints, databases, devices, etc.) that are getting added to the environment. It can automatically add those newer elements into the purview of monitoring.
  • Auto scale: Centralized Remote Monitoring Solution (CRMS) can auto-scale on its own when newer elements are added for monitoring through auto-discovery. The auto scale includes various aspects, like load on channel, load on individual polling engine, and load on each agentless solution.
  • Real time user & Synthetic Monitoring: Real-time user monitoring is to monitor the environment when the user is active. Synthetic monitoring is through simulated techniques. It doesn’t wait for the user to make a transaction or use the system. Instead, it simulates the scenario and provide insights to make decision proactively.
  • Availability & status of devices connected: Monitor also includes the monitoring of availability and control of USB and COM port devices that are connected.
  • Black box monitoring: It is not always possible to instrument the application to get insights.Hence, the Black Box technique is used. Here the application is treated as a black box and it is monitored in terms of its interaction with the Kernel & OS through performance counters.
High level overview of Monitor’s components,

  • Agents, Agentless: These are the means through which monitoring is done at the target environment, like user devices, servers, network devices, load balancers, virtualized environment, API layers, databases, replications, storage devices, etc.
  • ZIF Telemetry Channel: The performance telemetry that are collected from source to target are passed through this channel to the big data platform.
  • Telemetry Data: Refers to the performance data and other metrics collected from all over the environment.
  • Telemetry Database:This is the big data platform, in which the telemetry data from all sources are captured and stored.
  • Intelligence Engine: This parses the telemetry data in near real time and raises notifications based on rule-based threshold and as well as through dynamic threshold.
  • Dashboard&Alerting Mechanism: These are the means through which the results of monitoring are conveyed as metrics in dashboard and as well as notifications.
  • Integration with Analyze, Predict & Remediate components: Monitoring module communicates the telemetry to Analyze & Predict components of the ZIF platform for it to use the data for analysis and apply Machine Learning for prediction. Both Monitor & Predict components, communicate with Remediate platform to trigger remediation.

The Monitor component works in tandem with Analyze, Predict and Remediate components of the ZIF platform to achieve an incident free IT environment. Implementation of ZIF is the right step to driving an enterprise towards Zero Incidents. ZIF is the only platform in the industry which comes from the single product platform owner who owns the end-to-end IP of the solution with products developed from scratch.

For more detailed information on GAVS’ Monitor, or to request a demo please visit zif.ai/products/monitor/

(To be continued…)

About the Author

Suresh Kumar Ramasamy


Suresh heads the Monitor component of ZIF at GAVS. He has 20 years of experience in Native Applications, Web, Cloud and Hybrid platforms from Engineering to Product Management. He has designed & hosted the monitoring solutions. He has been instrumental in conglomerating components to structure the Environment Performance Management suite of ZIF Monitor.

Suresh enjoys playing badminton with his children. He is passionate about gardening, especially medicinal plants.

READ ALSO OUR NEW UPDATES

Cleaning up our Digital Dirt

Now, what exactly is digital dirt, in the context of enterprises? It is highly complex and ambiguous to precisely identify digital dirt, let alone address the related issues. Chandra Mouleswaran S, Head of Infra Services at GAVS Technologies says that not all the applications that run in an organization are actually required to run. The applications that exist, but not used by internal or external users or internal or external applications contribute to digital dirt. Such dormant applications get accumulated over time due to the uncertainty of their usage and lack of clarity in sunsetting them. They stay in the organization forever and waste resources, time and effort. Such hidden applications burden the system, hence they need to be discovered and removed to improve operational efficiency.

Are we prepared to clean the trash? The process of eliminating digital dirt can be cumbersome. We cannot fix what we do not find. So, the first step is to find them using a specialized application for discovery. Chandra further elaborated on the expectations from the ‘Discovery’ application. It should be able to detect all applications, the relationships of those applications with the rest of the environment and the users using those applications. It should give complete visibility into applications and infrastructure components to analyze the dependencies.

Shadow IT

Shadow IT, the use of technology outside the IT purview is becoming a tacitly approved aspect of most modern enterprises. As many as 71% of employees across organizations are using unsanctioned apps on devices of every shape and size, making it very difficult for IT departments to keep track. The evolution of shadow IT is a result of technology becoming simpler and the cloud offering easy connectivity to applications and storage. Because of this, people have begun to cherry-pick those things that would help them get things done easily.

Shadow IT may not start or evolve with bad intentions. But, when employees take things into their own hands, it is a huge security and compliance risk, if the sprawling shadow IT is not reined in. Gartner estimates that by next year (2020), one-third of successful attacks experienced by enterprises will be on their shadow IT resources.

The Discovery Tool

IT organizations should deploy a tool that gives complete visibility of the landscape, discovers all applications – be it single tenant or multi-tenant, single or multiple instance, native or virtually delivered, on-premise or on cloud and map the dependencies between them. That apart, the tool should also indicate the activities on those applications by showing the users who access them and the response times in real-time. The dependency map along with user transactions captured over time will paint a very clear picture for IT Managers and might bring to light some applications and their dependencies, that they probably never knew existed!

Discover, is a component of GAVS’ AIOps Platform,Zero Incident Framework™ (ZIF). Discover can work as a stand-alone component and also cohesively with the rest of the AIOps Platform. Discover provides Application Auto Discovery and Dependency Mapping (ADDM). It automatically discovers and maps the applications and topology of the end to end deployment, hop by hop. Some of its key features are:

  • Zero Configuration

The auto-discovery features require no additional configuration upon installation.

  • Discovers Applications

It uniquely and automatically discovers all Windows and Linux application in your environment, identifies it by name, and measures the end-to-end and hop-by-hop response time and throughput of each application. This works for applications installed on physical servers, in virtualized guest operating systems, applications automatically provisioned in private or hybrid clouds, and those running in public clouds. It also works irrespective of whether the application was custom developed or purchased.

  • Discovers Multitenant Applications

It auto-discovers multitenant applications hosted on web servers and does not limit the discovery to the logical server level.

  • Discovers Multiple Instances of Application

It auto-discovers multiple instances of the same application and presents them all as a group with the ability to drill down to the details of each instance of the application.

  • Discovers SaaS Applications

It auto-discovers any requests directed to SaaS applications such as Office 365 or Salesforce and calculates response time and throughput to these applications from the enterprise.

  • Discovers Virtually Delivered Applications or Desktops

It automatically maps the topology of the delivered applications and VDIs, hop-by-hop and end-to-end. It provides extensive support for Citrix delivered applications or desktops. This visibility extends beyond the Citrix farm into the back-end infrastructure on which the delivered applications and VDIs are supported.

  • Discovers Application Workload Topologies

The architecture auto-discovers application flow mapping topology and user response times to create the application topology and update it in near real-time — all without user configuration. This significantly reduces the resources required to configure service models and operate the product.

  • Discovers Every Tier of Every Multi-Tiered Application

It auto-discovers the different tiers of every multi-tiered application and displays the performance of each tier. Each tier is discovered and named with the transactional throughput and response times shown for each tier.

  • Discovers All Users of All Applications

It identifies each user of every application and the response time that the user experiences for each use of a given application.

  • Discovers Anomalies with Applications

The module uses a sophisticated anomaly detection algorithm to automatically assess when a response time excursion is valid, then if a response exceeds normal baseline or SLA performance expectations, deep diagnostics are triggered to analyze the event. In addition, the hop-by-hop segment latency is compared against the historical norms to identify deterministically which segment has extended latency and reduced application performance.

For more detailed information on GAVS’ Discover, or to request a demo please visit

Discover

About the Authors:

Chandra Mouleswaran S:

Chandra heads the IMS practice at GAVS. He has around 25+ years of rich experience in IT Infrastructure Management, enterprise applications design & development and incubation of new products / services in various industries. He has also created a patent for a mistake proofing application called ‘Advanced Command Interface”. He thinks ahead and his implementation of ‘disk based backup using SAN replication’ in one of his previous organizations as early as in 2005 is a proof of his visionary skills.

Sri Chaganty:

Sri is a Serial Entrepreneur with over 30 years’ experience delivering creative, client-centric, value-driven solutions for bootstrapped and venture-backed startups.

A Deep Dive into Deep Learning!

The Nobel Prize winner & French author André Gide said, “Man cannot discover new oceans unless he has the courage to lose sight of the shore”. This rings true with enterprises that made bold investments in cutting-edge AI that are now starting to reap rich benefits. Artificial Intelligence is shredding all perceived boundaries of a machine’s cognitive abilities. Deep Learning, at the very core of Artificial Intelligence, is pushing the envelope still further into unchartered territory. According to Gartner, “Deep Learning is here to stay and expands ML by allowing intermediate representations of the data”.

What is Deep Learning?

Deep Learning is a subset of Machine Learning that is based on Artificial Neural Networks (ANN). It is an attempt to mimic the phenomenal learning mechanisms of the human brain and train AI models to perform cognitive tasks like speech recognition, image classification, face recognition, natural language processing (NLP) and the like.

The tens of billions of neurons and their connections to each other form the brain’s neural network. Although Artificial Neural Networks have been around for quite a few decades now, they are now gaining momentum due to the declining price of storage and the exponential growth of processing power. This winning combination of low-cost storage and high computational prowess is bringing back Deep Learning from the woods.

Improved machine learning algorithms and the availability of staggering amounts of diverse unstructured data such as streaming and textual data, are boosting performance of Deep Learning systems. The performance of the ANN depends heavily on how much data it is trained with and it continuously adapts and evolves its learning with time as it is exposed to more & more datasets.

Simply put, the ANN consists of an Input layer, hidden computational layers, and the Output layer. If there is more than one hidden layer between the Input & Output layers, then it is called a Deep Network.

The Neural Network

The Neuron is central to the human Neural Network. Neurons have Dendrites, which are the receivers of information and the Axon which is the transmitter. The Axon is connected to the Dendrites of other neurons, through which signal transmission takes place. The signals that are passed are called Synapses.

While the neuron by itself cannot accomplish much, it creates magic when it forms connections with the other neurons to form an interconnected neural network. In artificial neural networks, the neuron is represented by a node or a unit. There are several interconnected layers of such units, categorized as input, output and hidden, as seen in the figure. 

A Deep Dive into Deep Learning!

The input layer receives the input values and passes them onto the first hidden layer in the ANN, similar to how our senses receive inputs from the environment around us & send signals to the brain. Let’s look at what happens in one node when it receives these input values from the different nodes of the input layer. The values are standardized/normalized-so that they are all within a certain range-and then weighted. Weights are crucial to a neural network since a value’s weight is indicative its impact on the outcome. An activation function is then applied to the weighted sum of values, to help determine if this transformed value needs to be passed on within the network. Some commonly used activation functions are the Threshold, Sigmoid and Rectifier functions.

This gives a very high-level idea of the generic structure and functioning of an ANN. The actual implementation would use one of several different architectures of neural networks that define how the layers are connected together, and what functions and algorithms are used to transform the input data. To give a couple of examples, a Convolutional network uses nonlinear activation functions and is highly efficient at processing nonlinear data like speech, image and video while a Recurrent network has information flowing around recursively, is much more complicated and difficult to train, but that much more powerful. Recurrent networks are closer in representation to the human neural network and are best suited for applications like sequence generation and predicting stock prices.

Deep Learning at work

Deep Learning has been adopted by almost all industry verticals at least at some level. To give some interesting examples, the automobile industry employs it in self-driving vehicles and driver-assistance services, the entertainment industry applies it to auto-addition of audio to silent movies and social media uses deep learning for curation of content feeds in user’s timelines. Alexa, Cortana, Google Assistant and Siri have now invaded our homes to provide virtual assistance!

Deep Learning has several applications in the field of Computer Vision, which is an umbrella term for what the computer “sees”, that is, interpreting digital visual content like images, photos or videos. This includes helping the computer learn & perform tasks like Image Classification, Object Detection, Image Reconstruction, to name a few. Image classification or image recognition when localized, can be used in Healthcare for instance, to locate cancerous regions in an x-ray and highlight them.

Deep Learning applied to Face Recognition has changed the face of research in this area. Several computational layers are used for feature extraction, with the complexity and abstraction of the learnt feature increasing with each layer, making it pretty robust for applications like public surveillance or public security in buildings. But there are still many challenges like the identification of facial features across styles, ages, poses, effects of surgery that need to be tackled before FR can be reliably used in areas like watch-list surveillance, forensic tasks which demand high levels of accuracy and low alarm rates. 

Similarly, there are several applications of deep learning for Natural Language Processing. Text Classification can be used for Spam filtering, Speech recognition can be used to transcribe a speech, or create captions for a movie, and Machine translation can be used for translation of speech and text from one language to another.

Closing Thoughts

As evident, the possibilities are endless and the road ahead for Deep Learn is exciting! But, despite the tremendous progress in Deep Learning, we are still very far from human-level AI. AI models can only perform local generalizations and adapt to new situations that are similar to past data, whereas human cognition is capable of quickly acclimatizing to radically novel circumstances. Nevertheless, this arduous R&D journey has nurtured a new-found respect for nature’s engineering miracle – the infinitely complex human brain!

Is Your Investment in TRUE AI?

Yes, AIOps the messiah of ITOps is here to stay! The Executive decision now is on the who and how, rather than when. With a plethora of products in the market offering varying shades of AIOps capabilities, choosing the right vendor is critical, to say the least.

Exclusively AI-based Ops?

Simply put, AIOps platforms leverage Big Data & AI technologies to enhance IT operations. Gartner defines Acquire, Aggregate, Analyze & Act as the four stages of AIOps. These four fall under the purview of Monitoring tools, AIOps Platforms & Action Platforms. However, there is no Industry-recognized mandatory feature list to be supported, for a Platform to be classified as AIOps. Due to this ambiguity in what an AIOps Platform needs to Deliver, huge investments made in rosy AIOps promises can lead to sub-optimal ROI, disillusionment or even derailed projects. Some Points to Ponder…

  • Quality in, Quality out. The value delivered from an AIOps investment is heavily dependent on what data goes into the system. How sure can we be that IT Asset or Device monitoring data provided by the Customer is not outdated, inaccurate or patchy? How sure can we be that we have full visibility of the entire IT landscape? With Shadow IT becoming a tacitly approved aspect of modern Enterprises, are we seeing all devices, applications and users? Doesn’t this imply that only an AIOps Platform providing Application Discovery & Topology Mapping, Monitoring features would be able to deliver accurate insights?
  • There is a very thin line between Also AI and Purely AI. Behind the scenes, most AIOps Platforms are reliant on CMDB or similar tools, which makes Insights like Event Correlation, Noise Reduction etc., rule-based. Where is the AI here?
  • In Gartner’s Market Guide, apart from support features for the different data types, Automated Pattern Discovery is the only other Capability taken into account for the Capabilities of AIOps Vendors matrix. With Gartner being one of the most trusted Technology Research and Advisory companies, it is natural for decision makers to zero-in on one of these listed vendors. What is not immediately evident is that there is so much more to AIOps than just this, and with so much at stake, companies need to do their homework and take informed decisions before finalizing their vendor.
  • Most AIOps vendors ingest, provide access to & store heterogenous data for analysis, and provide actionable Insights and RCA; at which point the IT team takes over. This is a huge leap forward, since it helps IT work through the data clutter and significantly reduces MTTR. But, due to the absence of comprehensive Predictive, Prescriptive & Remediation features, these are not end-to-end AIOps Platforms.
  • At the bleeding edge of the Capability Spectrum is Auto-Remediation based on Predictive & Prescriptive insights. A Comprehensive end-to-end AIOps Platform would need to provide a Virtual Engineer for Auto-Remediation. But, this is a grey area not fully catered to by AIOps vendors.  

The big question now is, if an AIOps Platform requires human intervention or multiple external tools to take care of different missing aspects, can it rightfully claim to be true end-to-end AIOps?

So, what do we do?

Time for you to sit back and relax! Introducing ZIF- One Solution for all your ITOps ills!

We have you completely covered with the full suite of tools that an IT infrastructure team would need. We deliver the entire AIOps Capability spectrum and beyond.

ZIF (Zero Incident Framework™) is an AIOps based TechOps platform that enables proactive Detection and Remediation of incidents helping organizations drive towards a Zero Incident Enterprise™.

The Key Differentiator is that ZIF is a Pure-play AI Platform powered by Unsupervised Pattern-based Machine Learning Algorithms. This is what sets us a Class Apart.

  • Rightly aligns with the Gartner AIOps strategy. ZIF is based on and goes beyond the AIOps framework
  • Huge Investments in developing various patented AI Machine Learning algorithms, Auto-Discovery modules, Agent & Agentless Application Monitoring tools, Network sniffers, Process Automation, Remediation & Orchestration capabilities to form Zero Incident Framework™
  • Powered entirely by Unsupervised Pattern-based Machine Learning Algorithms, ZIF needs no further human intervention and is completely Self-Reliant
  • Unsupervised ML empowers ZIF to learn autonomously, glean Predictive & Prescriptive Intelligence and even uncover Latent Insights
  • The 5 Modules can work together cohesively or as independent stand-alone components
  • Can be Integrated with existing Monitoring and ITSM tools, as required
  • Applies LEAN IT Principle and is on an ambitious journey towards FRICTIONLESS IT.

Realizing a Zero Incident EnterpriseTM

AIOps Demystified

IT Infrastructure has been on an incredibly fascinating journey from the days of mainframes housed in big rooms just a few decades ago, to mini computers, personal computers, client-servers, enterprise & mobile networks, virtual machines and the cloud! While mobile technologies have made computing omnipresent, the cloud coupled with technologies like virtual computing and containers has changed the traditional IT industry in unimaginable ways and has fuelled the rise of service-oriented architectures where everything is offered as a service and on-demand. Infrastructure as a Service (IaaS), Platform as a Service (PaaS), DBaaS, MBaaS, SaaS and so on.

As companies try to grapple with this technology explosion, it is very clear that the first step has to be optimization of the IT infrastructure & operations. Efficient ITOps has become the foundation not just to aid transformational business initiatives, but even for basic survival in this competitive world.

The term AIOps was first coined by Gartner based on their research on Algorithmic IT Operations. Now, it refers to the use of Artificial Intelligence(AI) for IT Operations(Ops), which is the use of Big Data Analytics and AI technologies to optimize, automate and supercharge all aspects of IT Operations.

Why AI in IT operations?

The promise behind bringing AI into the picture has been to do what humans have been doing, but better, faster and at a much larger scale. Let’s delve into the different aspects of IT operations and see how AI can make a difference.

Visibility

The first step to effectively managing the IT landscape is to get complete visibility into it. Why is that so difficult? The sheer variety and volume of applications, users and environments make it extremely challenging to get a full 360 degree view of the landscape. Most organizations use applications that are web-based, virtually delivered, vendor-built, custom-made, synchronous/asynchronous/batch processing, written using different programming languages and/or for different operating systems, SaaS, running in public/private/hybrid cloud environments, multi-tenant, multiple instances of the same applications, multi-tiered, legacy, running in silos! Adding to this complexity is the rampant issue of shadow IT, which is the use of applications outside the purview of IT, triggered by the easy availability of and access to applications and storage on the cloud. And, that’s not all! After all the applications have been discovered, they need to be mapped to the topology, their performances need to be baselined and tracked, all users in the system have to be found and their user experiences captured.

The enormity of this challenge is now evident. AI powers auto-discovery of all applications, topology mapping, baselining response times and tracking all users of all these applications. Machine Learning algorithms aid in self-learning, unlearning and auto-correction to provide a highly accurate view of the IT landscape.

Monitoring

When the IT landscape has been completely discovered, the next step is to monitor the infrastructure and application stacks. Monitoring tools provide real-time data on their availability and performance based on relevant metrics.

The problem is two-fold here. Typically, IT organizations need to rely on several monitoring tools that cater to the different environments/domains in the landscape. Since these tools work in silos, they give a very fractured view of the entire system, necessitating data correlation before it can be gainfully used for Root Cause Analysis(RCA) or actionable insights.

Pattern recognition-based learning from current and historical data helps correlate these seemingly independent events, and therefore to recognize & alert deviations, performance degradations or capacity utilization bottlenecks in real-time and consequently enable effective Root Cause Analysis(RCA) and reduce an important KPI, Mean Time to Identify(MTTI).

Secondly, there is colossal amounts of data in the form of logs, events, metrics pouring in at high velocity from all these monitoring tools, creating alert fatigue. This makes it almost impossible for the IT support team to check each event, correlate with the other events, tag and prioritize them and plan remedial action.

Inherently, machines handle volume with ease and when programmed with ML algorithms learn to sift through all the noise and zero-in on what is relevant. Noise nullification is achieved by the use of Deep Learning algorithms that isolate events that have the potential to become incidents and Reinforcement Learning algorithms that find and eliminate duplicates and false positives. These capabilities help organizations bring dramatic improvements to another critical ITOps metric, Mean Time to Resolution(MTTR).

Other areas of ITOps where AI brings a lot of value are in Advanced Analytics- Predictive & Prescriptive- and Remediation.

Advanced Analytics

Unplanned IT Outages result in huge financial losses for companies and even worse, a sharp dip in customer confidence. One of the biggest value-adds of AI for ITOps then, is in driving proactive operations that deliver superior user experiences with predictable uptime. Advanced Analytics on historical incident data identifies patterns, causes and situations in the entire stack(infrastructure, networks, services and applications) that lead to an outage. Multivariate predictive algorithms drive predictions of incident and service request volumes, spikes and lulls way in advance. AIOps tools forecast usage patterns and capacity requirements to enable planning, just-in-time procurement and staffing to optimize resource utilization. Reactive purchases after the fact, can be very disruptive & expensive.

Remediation

AI-powered remediation automates remedial workflows & service actions, saving a lot of manual effort and reducing errors, incidents and cost of operations. Use of chatbots provides round-the-clock customer support, guiding users to troubleshoot standard problems, and auto-assigns tickets to appropriate IT staff. Dynamic capacity orchestration based on predicted usage patterns and capacity needs induces elasticity and eliminates performance degradation caused by inefficient capacity planning.

Conclusion

The beauty of AIOps is that it gets better with age as the learning matures on exposure to more and more data. While AIOps is definitely a blessing for IT Ops teams, it is only meant to augment the human workforce and not to replace them entirely. And importantly, it is not a one-size-fits-all approach to AIOps. Understanding current pain points and future goals and finding an AIOps vendor with relevant offerings is the cornerstone of a successful implementation.

GAVS’ Zero Incident Framework TM (ZIF) is an AIOps-based TechOps Platform that enables organizations to trend towards a Zero Incident Enterprise TM. ZIF comes with an end-to-end suite of tools for ITOps needs. It is a pure-play AI Platform powered entirely by Unsupervised Pattern-based Machine Learning! You can learn more about ZIF or request a demo here.

READ ALSO OUR NEW UPDATES

AIOps Demystified

IT Infrastructure has been on an incredibly fascinating journey from the days of mainframes housed in big rooms just a few decades ago, to mini computers, personal computers, client-servers, enterprise & mobile networks, virtual machines and the cloud! While mobile technologies have made computing omnipresent, the cloud coupled with technologies like virtual computing and containers has changed the traditional IT industry in unimaginable ways and has fuelled the rise of service-oriented architectures where everything is offered as a service and on-demand. Infrastructure as a Service (IaaS), Platform as a Service (PaaS), DBaaS, MBaaS, SaaS and so on.

As companies try to grapple with this technology explosion, it is very clear that the first step has to be optimization of the IT infrastructure & operations. Efficient ITOps has become the foundation not just to aid transformational business initiatives, but even for basic survival in this competitive world.

The term AIOps was first coined by Gartner based on their research on Algorithmic IT Operations. Now, it refers to the use of Artificial Intelligence(AI) for IT Operations(Ops), which is the use of Big Data Analytics and AI technologies to optimize, automate and supercharge all aspects of IT Operations.

Why AI in IT operations?

The promise behind bringing AI into the picture has been to do what humans have been doing, but better, faster and at a much larger scale. Let’s delve into the different aspects of IT operations and see how AI can make a difference.

Visibility

The first step to effectively managing the IT landscape is to get complete visibility into it. Why is that so difficult? The sheer variety and volume of applications, users and environments make it extremely challenging to get a full 360 degree view of the landscape. Most organizations use applications that are web-based, virtually delivered, vendor-built, custom-made, synchronous/asynchronous/batch processing, written using different programming languages and/or for different operating systems, SaaS, running in public/private/hybrid cloud environments, multi-tenant, multiple instances of the same applications, multi-tiered, legacy, running in silos! Adding to this complexity is the rampant issue of shadow IT, which is the use of applications outside the purview of IT, triggered by the easy availability of and access to applications and storage on the cloud. And, that’s not all! After all the applications have been discovered, they need to be mapped to the topology, their performances need to be baselined and tracked, all users in the system have to be found and their user experiences captured.

The enormity of this challenge is now evident. AI powers auto-discovery of all applications, topology mapping, baselining response times and tracking all users of all these applications. Machine Learning algorithms aid in self-learning, unlearning and auto-correction to provide a highly accurate view of the IT landscape.

Monitoring

When the IT landscape has been completely discovered, the next step is to monitor the infrastructure and application stacks. Monitoring tools provide real-time data on their availability and performance based on relevant metrics.

The problem is two-fold here. Typically, IT organizations need to rely on several monitoring tools that cater to the different environments/domains in the landscape. Since these tools work in silos, they give a very fractured view of the entire system, necessitating data correlation before it can be gainfully used for Root Cause Analysis(RCA) or actionable insights.

Pattern recognition-based learning from current and historical data helps correlate these seemingly independent events, and therefore to recognize & alert deviations, performance degradations or capacity utilization bottlenecks in real-time and consequently enable effective Root Cause Analysis(RCA) and reduce an important KPI, Mean Time to Identify(MTTI).

Secondly, there is colossal amounts of data in the form of logs, events, metrics pouring in at high velocity from all these monitoring tools, creating alert fatigue. This makes it almost impossible for the IT support team to check each event, correlate with the other events, tag and prioritize them and plan remedial action.

Inherently, machines handle volume with ease and when programmed with ML algorithms learn to sift through all the noise and zero-in on what is relevant. Noise nullification is achieved by the use of Deep Learning algorithms that isolate events that have the potential to become incidents and Reinforcement Learning algorithms that find and eliminate duplicates and false positives. These capabilities help organizations bring dramatic improvements to another critical ITOps metric, Mean Time to Resolution(MTTR).

Other areas of ITOps where AI brings a lot of value are in Advanced Analytics- Predictive & Prescriptive- and Remediation.

Advanced Analytics

Unplanned IT Outages result in huge financial losses for companies and even worse, a sharp dip in customer confidence. One of the biggest value-adds of AI for ITOps then, is in driving proactive operations that deliver superior user experiences with predictable uptime. Advanced Analytics on historical incident data identifies patterns, causes and situations in the entire stack(infrastructure, networks, services and applications) that lead to an outage. Multivariate predictive algorithms drive predictions of incident and service request volumes, spikes and lulls way in advance. AIOps tools forecast usage patterns and capacity requirements to enable planning, just-in-time procurement and staffing to optimize resource utilization. Reactive purchases after the fact, can be very disruptive & expensive.

Remediation

AI-powered remediation automates remedial workflows & service actions, saving a lot of manual effort and reducing errors, incidents and cost of operations. Use of chatbots provides round-the-clock customer support, guiding users to troubleshoot standard problems, and auto-assigns tickets to appropriate IT staff. Dynamic capacity orchestration based on predicted usage patterns and capacity needs induces elasticity and eliminates performance degradation caused by inefficient capacity planning.

Conclusion

The beauty of AIOps is that it gets better with age as the learning matures on exposure to more and more data. While AIOps is definitely a blessing for IT Ops teams, it is only meant to augment the human workforce and not to replace them entirely. And importantly, it is not a one-size-fits-all approach to AIOps. Understanding current pain points and future goals and finding an AIOps vendor with relevant offerings is the cornerstone of a successful implementation.

GAVS’ Zero Incident Framework TM (ZIF) is an AIOps-based TechOps Platform that enables organizations to trend towards a Zero Incident Enterprise TM. ZIF comes with an end-to-end suite of tools for ITOps needs. It is a pure-play AI Platform powered entirely by Unsupervised Pattern-based Machine Learning! You can learn more about ZIF or request a demo here.

READ ALSO OUR NEW UPDATES