Analyze

Have you heard of AIOps?

Artificial intelligence for IT operations (AIOps) is an umbrella term for the application of Big Data Analytics, Machine Learning (ML) and other Artificial Intelligence (AI) technologies to automate the identification and resolution of common Information Technology (IT) problems. The systems, services and applications in a large enterprise produce immense volumes of log and performance data. AIOps uses this data to monitor these assets and gain visibility into their working behaviour and interdependencies.

According to a Gartner study, the adoption of AIOps by large enterprises is expected to rise to 30% by 2023.

ZIF – The ideal AIOps platform of choice

Zero Incident Framework™ (ZIF) is an AIOps-based TechOps platform that enables proactive detection and remediation of incidents, helping organizations drive towards a Zero Incident Enterprise™.

ZIF comprises five modules, as outlined below.

At the heart of ZIF lie its Analyze and Predict (A&P) modules, which are powered by Artificial Intelligence and Machine Learning techniques. From a business perspective, the primary goal of A&P is 100% availability of applications and business processes.

Come, let us understand more about the Analyze function of ZIF.

With a Big Data platform under its hood, Analyze can ingest volumes of raw monitoring data, both structured and unstructured, and group it to build linkages and identify failure patterns.

Data Ingestion and Correlation of Diverse Data

The module processes a wide range of data from varied data sources to break silos while providing insights, exposing anomalies and highlighting risks across the IT landscape. It increases productivity and efficiency through actionable insights.

  • 100+ connectors for leading tools, environments and devices
  • Correlation and aggregation methods uncover patterns and relationships in the data

Noise Nullification

Eliminates duplicate incidents, false positives and insignificant alerts. This also helps reduce the Mean Time To Resolution (MTTR) and the event-to-incident ratio. A simplified sketch of alert deduplication follows the list below.

  • Deep learning algorithms isolate events that have the potential to become incidents along with their potential criticality
  • Correlation and Aggregation methods group alerts and incidents that are related and need a common remediation
  • Reinforcement learning techniques are applied to find and eliminate false positives and duplicates
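
To make the idea concrete, here is a minimal, hypothetical sketch of one layer of noise nullification: alerts are fingerprinted on their stable fields, and repeats arriving within a suppression window are dropped. ZIF's actual implementation uses the deep learning and reinforcement learning techniques described above; this sketch only illustrates the deduplication mechanics, and all field names are invented.

```python
import hashlib
import time

SUPPRESSION_WINDOW_SECS = 300  # treat repeats within 5 minutes as duplicates
_last_seen = {}  # fingerprint -> timestamp of the last forwarded alert

def fingerprint(alert: dict) -> str:
    """Hash the fields that identify 'the same' alert (host, check, severity)."""
    key = f"{alert['host']}|{alert['check']}|{alert['severity']}"
    return hashlib.sha256(key.encode()).hexdigest()

def is_duplicate(alert: dict, now: float | None = None) -> bool:
    """Return True if an identical alert was already forwarded recently."""
    now = now or time.time()
    fp = fingerprint(alert)
    last = _last_seen.get(fp)
    _last_seen[fp] = now
    return last is not None and (now - last) < SUPPRESSION_WINDOW_SECS

# Usage: only forward alerts that are not recent duplicates
alerts = [
    {"host": "db01", "check": "cpu", "severity": "warn"},
    {"host": "db01", "check": "cpu", "severity": "warn"},  # duplicate, dropped
]
forwarded = [a for a in alerts if not is_duplicate(a)]
print(len(forwarded))  # -> 1
```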

Event Correlation

Data from various sources is ingested into ZIF in real time, through either a push or a pull mechanism. As the data is ingested, labelling algorithms are run to label the data based on identifiers. The labelled data is passed through the correlation engine, where unsupervised algorithms are run to mine patterns. Subsequence mining algorithms help in identifying unique patterns from the data.

Unique patterns identified are clustered using clustering algorithms to form cases. Every case that is generated is marked by a unique case ID. As part of the clustering process, seasonality is checked against historical transactions to improve the accuracy of correlation.

Correlation is done based on pattern recognition, eliminating the need for a relational CMDB in the enterprise. The accuracy of the correlation increases as patterns recur. The algorithms can also unlearn patterns based on feedback from actions taken on correlations. As these are unsupervised algorithms, the patterns are learnt with zero human intervention.
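
As a hedged illustration of the clustering step described above, the sketch below groups labelled events that co-occur within a short time window into cases, each marked with a unique case ID. ZIF's engine uses unsupervised subsequence mining and seasonality checks; this toy version captures only the grouping mechanics, and the labels, timestamps and window size are assumptions.

```python
import uuid
from collections import defaultdict

WINDOW_SECS = 60  # events in the same window are candidates for one case

def cluster_into_cases(events):
    """Group events into cases keyed by (label, time bucket).

    events: list of dicts with 'label' (from the labelling step) and
    'ts' (epoch seconds). Returns {case_id: [events]}.
    """
    cases = defaultdict(list)
    case_ids = {}
    for ev in sorted(events, key=lambda e: e["ts"]):
        bucket = (ev["label"], int(ev["ts"] // WINDOW_SECS))
        if bucket not in case_ids:
            case_ids[bucket] = str(uuid.uuid4())  # unique case ID per cluster
        cases[case_ids[bucket]].append(ev)
    return dict(cases)

events = [
    {"label": "disk-latency", "ts": 1000.0},
    {"label": "disk-latency", "ts": 1030.0},  # same window -> same case
    {"label": "disk-latency", "ts": 5000.0},  # later window -> new case
]
for case_id, evs in cluster_into_cases(events).items():
    print(case_id, len(evs))
```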

Accelerated Root Cause Analysis (RCA)

The Analyze module helps in identifying the root causes of incidents even when they occur in different silos. A combination of correlation algorithms and unsupervised deep learning techniques aids in accurately nailing down the root causes of incidents and problems. Learnings from historical incidents are also applied to find root causes in real time. The platform retraces user journeys step by step to identify the exact point where an error occurs. The sketch below illustrates one simple form of this reasoning.
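
One common building block of accelerated RCA is walking a service dependency graph from every alerting component towards its upstream dependencies: the deepest alerting node with no alerting dependency of its own is a likely root cause. The sketch below is a minimal, hypothetical illustration of that reasoning, not ZIF's actual algorithm; the graph and alert set are invented.

```python
# Service dependency graph: each service -> services it depends on (assumed).
DEPENDS_ON = {
    "web": ["api"],
    "api": ["db", "cache"],
    "db": [],
    "cache": [],
}

def probable_root_causes(alerting: set[str]) -> set[str]:
    """An alerting service is a probable root cause if none of the services
    it depends on (transitively) are also alerting."""
    def has_alerting_dependency(svc: str) -> bool:
        stack = list(DEPENDS_ON.get(svc, []))
        seen = set()
        while stack:
            dep = stack.pop()
            if dep in seen:
                continue
            seen.add(dep)
            if dep in alerting:
                return True
            stack.extend(DEPENDS_ON.get(dep, []))
        return False

    return {svc for svc in alerting if not has_alerting_dependency(svc)}

# 'web' and 'api' alert only because 'db' is down -> root cause is 'db'
print(probable_root_causes({"web", "api", "db"}))  # -> {'db'}
```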

Customer Success Story – How ZIF’s A&P transformed IT Operations of a Manufacturing Giant

  • Seamless end-to-end monitoring – OS, DB, Applications, Networks
  • Helped achieve more than 50% noise reduction in 6 months
  • Reduced P1 incidents by ~30% through dynamic and deep monitoring
  • Achieved declining trend of MTTR and an increasing trend of Availability
  • Resulted in optimizing command centre/operations headcount by ~50%
  • Resulted in ~80% reduction in operations TCO

For more detailed information on GAVS’ Analyze, or to request a demo, please visit zif.ai/products/analyze.

Reference: www.gartner.com/smarterwithgartner/how-to-get-started-with-aiops

ABOUT THE AUTHOR

Vasudevan Gopalan


Vasu heads the Engineering function for A&P. He is a Digital Transformation leader with ~20 years of IT industry experience spanning Product Engineering, Portfolio Delivery, Large Program Management and more. Vasu has designed and delivered Open Systems, Core Banking, and Web/Mobile Applications, among others.

Outside of his professional role, Vasu enjoys playing badminton and focusses on fitness routines.


CCPA for Healthcare

The California Consumer Privacy Act (CCPA) is a state statute intended to enhance consumer protection and data privacy rights of the residents of California, United States. It is widely considered one of the most sweeping consumer privacy laws, giving Californians the strongest data privacy rights in the U.S.

The focus of this article is CCPA as it applies to Healthcare. Let’s take a quick look at what CCPA is and then move on to its relevance for Healthcare entities. CCPA is applicable to any for-profit organization – regardless of whether it physically operates out of California – that interacts with, does business with and/or collects, processes or monetizes personal information of California residents AND meets at least one of these criteria: annual gross revenue in excess of $25 million USD; collects or transacts with the personal information of 50,000 or more California consumers, households, or devices; or earns 50% or more of its annual revenue by monetizing such data. CCPA also empowers California consumers with rights to complete ownership, control, and security of their personal information, and imposes stringent new responsibilities on businesses to enable these rights for their consumers.

Impact on Healthcare Companies

Companies directly or indirectly involved in the healthcare sector and dealing with medical information are regulated by the Confidentiality of Medical Information Act (CMIA) and the Health Insurance Portability and Accountability Act (HIPAA). CCPA does not supersede these laws & does not apply to ‘Medical Information (MI)’ as defined by CMIA, or to ‘Protected Health Information (PHI)’ as defined by HIPAA. CCPA also excludes de-identified data and information collected by federally-funded clinical trials, since such research studies are regulated by the ‘Common Rule’.

The focus of the CCPA is ‘Personal Information (PI)’ which means information that “identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household.” PI refers to data including but not limited to personal identifiers such as name, address, phone numbers, email ids, social security number; personal details relating to education, employment, family, finances; biometric information, geolocation, consumer activity like purchase history, product preferences; internet activity.

So, if CCPA only regulates personal information, are healthcare companies that are already in compliance with CMIA and HIPAA safe? Is there anything else they need to do?

Well, there is still a lot that needs to be done! Such companies should continue to comply with those rules when handling Medical Information as defined by the CMIA, or Protected Health Information as defined by HIPAA. They will additionally need to adhere to CCPA regulations for personal data that falls outside of MI and PHI. This will include employee personal information routinely obtained and processed by the company’s HR; data collected from websites, health apps, health devices and events; clinical studies that are not funded by the federal government; and information of a CCPA-covered entity that is handled by a non-profit affiliate, to give a few examples.

There are several possibilities – some not so apparent – even in healthcare entities, for personal data collection and handling that would fall under the purview of CCPA. They need to take stock of the different avenues through which they might be obtaining/handling such data and prioritize CCPA compliance. Else, with the stringent CCPA regulations, they could quickly find themselves embroiled in class action lawsuits (which by the way, do not require proof of damage to the plaintiff) in case of data breaches, or statutory penalties of up to $7500 for each violation.

The good news is that since CCPA carves out a significant chunk of data that healthcare companies/those involved in healthcare-related functions collect and process, entities that are already complying with HIPAA and CMIA are well into the CCPA compliance journey. A peek into the kind of data CMIA & HIPAA regulate will help gauge what other data needs to be taken care of.

CMIA protects the confidentiality of Medical Information (MI) which is “individually identifiable information, in electronic or physical form, in possession of or derived from a provider of health care, health care service plan, pharmaceutical company, or contractor regarding a patient’s medical history, mental or physical condition, or treatment.”

HIPAA regulates how healthcare providers, health plans, and healthcare clearinghouses, referred to as ‘covered entities’, can use and disclose Protected Health Information (PHI), and requires these entities to enable protection of data privacy. PHI refers to individually identifiable medical information such as medical records, medical bills, lab tests, scans and the like. This also covers PHI in electronic form (ePHI). The privacy and security rule of HIPAA is also applicable to ‘business associates’ who provide services to the ‘covered entities’ that involve the use or disclosure of PHI.

Two other types of data that are CCPA exempt are Research Data & De-Identified Data. As mentioned above, the ‘Common Rule’ applies only to federally-funded research studies, and the CCPA does not provide much clarity on exemption status for data from clinical trials that are not federally-funded.

And, although the CCPA does not apply to de-identified data, the definitions of de-identified data of HIPAA and CCPA slightly differ which makes it quite likely that de-identified data by HIPAA standards may not qualify under CCPA standards and therefore would not be exempt from CCPA regulations.

Compliance Approach

Taking measures to ensure compliance with regulations is cumbersome and labour-intensive, especially with the constantly evolving regulatory environment. Using this opportunity for a proactive, well-thought-out approach to comprehensive enterprise-wide data security and governance will be strategically wise, since it will minimize the need for a policy and process overhaul with each new regulation.

The most crucial step is a thorough assessment of the following:

  • Policies, procedures, workflows, entities relating to/involved in data collection, sharing and processing, in order to arrive at clear enterprise-wide data mapping; to determine what data, data activities, data policies would fall under the scope of CCPA; and to identify gaps and decide on prioritized action items for compliance.
  • Business processes, contracts, and terms of agreement with affiliates, partners and third-party entities the company does business with, to understand CCPA applicability. In some cases, HIPAA and CMIA may be applicable only to the healthcare-related business units, subjecting the other business units to CCPA compliance.

  • Current data handling methods, not just their privacy & security. CCPA dictates that companies have mechanisms in place to honour consumer rights under CCPA: the right to request all information relating to the personal data collected about them, the right to opt out of the sale of their data, and the right to have their data deleted by the organization (which extends to third parties doing business with that organization as well).

Consumer Consent Management

With CCPA giving full ownership and control of personal data back to its owners, consent management mechanisms become the pivot of a successful compliance strategy. An effective mechanism will ensure proper administration and enforcement of consumer authorizations.

Considering the limitations of current market solutions for data privacy and security, GAVS has come up with its Blockchain-based Rhodium Framework (patent pending) for Customer Master Data Management and Compliance with Data Privacy Laws like CCPA.

You can get more details on CCPA in general and GAVS’ solution for true CCPA Compliance in our White Paper, Blockchain Solution for CCPA Compliance.


Monitoring Microservices and Containers

Monitoring applications and infrastructure is a critical part of IT Operations. Among other things, monitoring provides alerts on failures, alerts on deteriorations that could potentially lead to failures, and performance data that can be analysed to gain insights. AI-led IT Ops Platforms like ZIF use such data from their monitoring component to deliver pattern recognition-based predictions and proactive remediation, leading to improved availability, system performance and hence better user experience.

The shift away from monolith applications towards microservices has posed a formidable challenge for monitoring tools. Let’s first take a quick look at what microservices are, to understand better the complications in monitoring them.

Monoliths vs Microservices

A single application (a monolith) is split into a number of modular services called microservices, each of which typically caters to one capability of the application. These microservices are loosely coupled, can communicate with each other and can be deployed independently.

Quite likely the trigger for this architecture was the need for agility. Since microservices are stand-alone modules, they can follow their own build/deploy cycles enabling rapid scaling and deployments. They usually have a small codebase which aids easy maintainability and quick recovery from issues. The modularity of these microservices gives complete autonomy over the design, implementation and technology stack used to build them.

Microservices run inside containers that provide their execution environment. Although microservices could also be run in virtual machines (VMs), containers are preferred since they are comparatively lightweight, as they share the host’s operating system, unlike VMs. Docker and CoreOS rkt are a couple of commonly used container solutions, while Kubernetes, Docker Swarm, and Apache Mesos are popular container orchestration platforms. The image below depicts microservices for hiring, performance appraisal, rewards & recognition, payroll, analytics and the like linked together to deliver the HR function.

Challenges in Monitoring Microservices and Containers

Since all good things come at a cost, you are probably wondering what it is here… well, the flip side to this evolutionary architecture is increased complexity! These are some contributing factors:

Exponential increase in the number of objects: With each application replaced by multiple microservices, 360-degree visibility and observability into all the services, their interdependencies, their containers/VMs, communication channels, workflows and the like can become very elusive. When one service goes down, the environment gets flooded with notifications not just from the service that is down, but from all services dependent on it as well. Sifting through this cascade of alerts, eliminating noise and zeroing in on the crux of the problem becomes a nightmare.

Shared Responsibility: Since processes are fragmented and the responsibility for their execution – for instance, a customer ordering a product online – is shared amongst the services, basic assumptions of traditional monitoring methods are challenged. The lack of a simple linear path, the need to collate data from different services for each process, and the inability to map a client request to a single transaction because of the number of services involved make performance tracking that much more difficult.

Design Differences: Due to the design/implementation autonomy that microservices enjoy, they could come with huge design differences and be implemented using different technology stacks. They might use open source or third-party software that makes it difficult to instrument their code, which in turn affects their monitoring.

Elasticity and Transience: Elastic landscapes, where infrastructure scales or collapses based on demand and instances appear & disappear dynamically, have changed the game for monitoring tools. They need to be updated to handle elastic environments, be container-aware and stay in step with the provisioning layer. A couple of interesting aspects to handle are recognizing the difference between an instance that is down and an instance that is no longer available, and the fact that data from instances that are no longer alive continues to have value for analysis of operational efficiency or past performance.

Mobility: This is another dimension of dynamic infra where objects don’t necessarily stay in the same place, they might be moved between data centers or clouds for better load balancing, maintenance needs or outages. The monitoring layer needs to arm itself with new strategies to handle moving targets.

Resource Abstraction: Microservices deployed in containers do not have a direct relationship with their host or the underlying operating system. This abstraction is what helps seamless migration between hosts but comes at the expense of complicating monitoring.

Communication over the network: The many moving parts of distributed applications rely completely on network communication. Consequently, the increase in network traffic puts a heavy strain on network resources necessitating intensive network monitoring and a focused effort to maintain network health.

What needs to be measured

This is a high-level laundry list of what needs to be done/measured while monitoring microservices and their containers.

Auto-discovery of containers and microservices:

As we’ve seen, monitoring microservices in a containerized world is a whole new ball game. In the highly distributed, dynamic infra environment where ephemeral containers scale, shrink and move between nodes on demand, traditional monitoring methods using agents to get information will not work. The monitoring system needs to automatically discover and track the creation/destruction of containers and explore services running in them.
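
As a minimal illustration, assuming the environment runs Docker and the docker Python SDK is installed, a discovery agent could periodically list running containers and diff them against what it already tracks. Real discovery also inspects orchestrator metadata (e.g., from Kubernetes) and maps services to containers; this sketch covers only the detection of container creation and destruction.

```python
import time

import docker  # pip install docker; talks to the local Docker daemon

def watch_containers(poll_secs: float = 5.0):
    """Poll the Docker daemon and report containers appearing/disappearing."""
    client = docker.from_env()
    known = {}  # container id -> name
    while True:
        current = {c.id: c.name for c in client.containers.list()}
        for cid, name in current.items():
            if cid not in known:
                print(f"discovered container {name} ({cid[:12]})")
        for cid, name in known.items():
            if cid not in current:
                print(f"container gone: {name} ({cid[:12]})")
        known = current
        time.sleep(poll_secs)

if __name__ == "__main__":
    watch_containers()
```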

Microservices:

  • Availability and performance of individual services
  • Host and infrastructure metrics
  • Microservice metrics
  • APIs and API transactions
    • Ensure API transactions are available and stable
    • Isolate problematic transactions and endpoints
  • Dependency mapping and correlation
  • Features relating to traditional APM

Containers:

  • Detailed information relating to each container
  • Health of clusters, master and slave nodes
  • Number of clusters
  • Nodes per cluster
  • Containers per cluster
  • Performance of the core Docker engine
  • Performance of container instances

Things to consider while adapting to the new IT landscape

Granularity and Aggregation: With the increase in the number of objects in the system, it is important to first understand the performance target of what’s being measured – for instance, if a service targets 99% uptime (yearly), polling it every minute would be overkill. Based on this, data granularity needs to be set prudently for each aspect measured, and can be aggregated where appropriate. This is to prevent data inundation that could overwhelm the monitoring module and drive up costs associated with data collection, storage, and management.
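
For instance, here is a hedged sketch of prudent granularity, assuming pandas is available: raw per-second samples are aggregated to one-minute summaries before storage, which is usually sufficient for a service with a coarse uptime target. The metric and numbers are invented.

```python
import pandas as pd

# One hour of per-second CPU samples (invented data)
raw = pd.Series(
    50.0,
    index=pd.date_range("2020-01-01", periods=3600, freq="s"),
    name="cpu_pct",
)

# Aggregate to 1-minute granularity: keep mean and max per minute
per_minute = raw.resample("1min").agg(["mean", "max"])
print(per_minute.head())  # 60 rows instead of 3600 -> 60x less data to store
```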

Monitor Containers: The USP of containers is the abstraction they provide to microservices, encapsulating and shielding them from the details of the host or operating system. While this makes microservices portable, it makes them hard to reach for monitoring. Two recommended solutions are to instrument the microservice code to generate stats and/or traces for all actions (which can be used for distributed tracing), and to get all container activity information through host operating system instrumentation.
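
A hedged example of the first approach, instrumenting the microservice code itself: a decorator that times each operation and emits one stat record carrying a trace ID, which a collector could use to stitch hops together for distributed tracing. This is a generic pattern, not ZIF's agent; all names are illustrative.

```python
import functools
import json
import time
import uuid

def traced(operation: str):
    """Decorator: time the wrapped function and emit one stat record."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            trace_id = kwargs.pop("trace_id", str(uuid.uuid4()))
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                record = {
                    "op": operation,
                    "trace_id": trace_id,  # lets a collector stitch hops together
                    "duration_ms": round((time.perf_counter() - start) * 1000, 2),
                }
                print(json.dumps(record))  # stand-in for shipping to a collector
        return inner
    return wrap

@traced("orders.create")
def create_order(item: str) -> str:
    return f"order for {item} accepted"

create_order("widget", trace_id="req-123")
```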

Track Services through the Container Orchestration Platform: While we could obtain container-level data from the host kernel, it wouldn’t give us holistic information about the service since there could be several containers that constitute a service. Container-native monitoring solutions could use metadata from the container orchestration platform by drilling into appropriate layers of the platform to obtain service-level metrics. 

Adapt to dynamic IT landscapes: As mentioned earlier, today’s IT landscape is dynamically provisioned, elastic and characterized by mobile and transient objects. Monitoring systems themselves need to be elastic and deployable across multiple locations to cater to distributed systems and leverage native monitoring solutions for private clouds.

API Monitoring: Monitoring APIs can provide a wealth of information in the black box world of containers. Tracking API calls from the different entities – microservices, container solution, container orchestration platform, provisioning system, host kernel – can help extract meaningful information and make sense of the fickle environment.
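
A simple hedged sketch of such an API probe, assuming the requests library: call an endpoint on a schedule, record availability and latency, and flag slow responses. The URL and latency budget are placeholders.

```python
import time

import requests  # pip install requests

ENDPOINT = "https://example.com/health"  # placeholder health-check URL
LATENCY_BUDGET_MS = 500

def probe() -> dict:
    """One synthetic check: availability + response time of the endpoint."""
    start = time.perf_counter()
    try:
        resp = requests.get(ENDPOINT, timeout=5)
        latency_ms = (time.perf_counter() - start) * 1000
        return {
            "up": resp.ok,
            "status": resp.status_code,
            "latency_ms": round(latency_ms, 1),
            "slow": latency_ms > LATENCY_BUDGET_MS,
        }
    except requests.RequestException as exc:
        return {"up": False, "error": str(exc)}

print(probe())
```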

Watch this space for more on Monitoring and other IT Ops topics. You can find our blog on Monitoring for Success here, which gives an overview of the Monitor component of GAVS’ AIOps Platform, Zero Incident Framework™ (ZIF). You can Request a Demo or Watch how ZIF works here.

About the Author:

Sivaprakash Krishnan


Bio – Siva is a long-timer at GAVS and has been with the company for close to 15 years. He started his career as a developer and is now an architect with a strong technology background in Java, Big Data, DevOps, Cloud Computing, Containers and Microservices. He has successfully designed & created a stable Monitoring Platform for ZIF, and has designed & driven cloud assessment and migration, enterprise BRMS, and IoT-based solutions for many of our customers. He is currently focused on building ZIF 4.0, a new-gen business-oriented TechOps platform.

Padmapriya Sridhar


Bio – Priya is part of the Marketing team at GAVS. She is passionate about Technology, Indian Classical Arts, Travel and Yoga. She aspires to become a Yoga Instructor some day!

Monitoring for Success

Do you know if your end users are happy?

(In the context of users of Applications (desktop, web or cloud-based), Services, Servers and components of IT environment, directly or indirectly.)

The question may sound trivial, but it has a significant impact on the success of a company. The user experience is a journey, from the time they use the application or service, till after they complete the interaction. Experience can be determined based on factors like Speed, Performance, Flawlessness, Ease of use, Security, Resolution time, among others. Hence, monitoring the ‘Wow’ & ‘Woe’ moments of the users is vital.

Monitor is a component of GAVS’ AIOps Platform, Zero Incident Framework™ (ZIF). One of the key objectives of the Monitor platform is to measure and improve end-user experience. This component monitors, in real time, all the layers (including but not limited to applications, databases, servers, APIs, end-points, and network devices) involved in the user experience. Ultimately, this helps to drive the environment towards Zero Incidents.

This figure shows how ZIF monitoring cuts across all layers, from the end user to storage, and how it is linked to the other components of the platform.

The key features of ZIF Monitor are:

  • Unified solution for all IT environment monitoring needs: The platform covers the end-to-end monitoring of an IT landscape. The key focus is to ensure all verticals of IT are brought under thorough monitoring. The deeper the monitoring, the closer an organization is to attaining a Zero Incident EnterpriseTM.
  • Agents with self-intelligence: The intelligent agents capture various health parameters about the environment. When the target environment is already running low on resources, the agent will not task it with more load; it collects the health-related metrics and communicates them through the telemetry channel efficiently and effectively. The intelligence is applied in terms of which parameters to collect, the period of collection and more (a toy sketch of this adaptive behaviour follows this list).
  • Depth of monitoring: The core strength of Monitor is that it comes with a list of performance counters defined by SMEs across all layers of the IT environment. This is a key differentiator; the monitoring parameters can be dynamically configured for the target environment, and parameters can be added or removed on a need basis.
  • Agent & Agentless (Remote): Customers can choose between Agent and Agentless options. The remote solution is called the Centralized Remote Monitoring Solution (CRMS). Each monitoring parameter can be remotely controlled and defined from the CRMS. Even the agents running in the target environment can be controlled from the server console.
  • Compliance: Monitor plays a key role in the compliance of the environment. Compliance ranges from ensuring the availability of necessary services and processes in the target environment to defining the standard of what Application, Make, Version, Provider, Size, etc. are allowed in the target environment.
  • Auto discovery: Monitor can auto-discover the newer elements (servers, endpoints, databases, devices, etc.) that are getting added to the environment. It can automatically add those newer elements into the purview of monitoring.
  • Auto scale: Centralized Remote Monitoring Solution (CRMS) can auto-scale on its own when newer elements are added for monitoring through auto-discovery. The auto scale includes various aspects, like load on channel, load on individual polling engine, and load on each agentless solution.
  • Real-time user & Synthetic Monitoring: Real-time user monitoring observes the environment while the user is active. Synthetic monitoring uses simulated techniques: it doesn’t wait for the user to make a transaction or use the system; instead, it simulates the scenario and provides insights for proactive decision-making.
  • Availability & status of devices connected: Monitor also includes the monitoring of availability and control of USB and COM port devices that are connected.
  • Black box monitoring: It is not always possible to instrument the application to get insights. Hence, the Black Box technique is used. Here the application is treated as a black box and monitored in terms of its interaction with the Kernel & OS through performance counters.
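
As a toy illustration of the agent self-intelligence mentioned in the list above, the sketch below widens the collection interval as the target host comes under CPU pressure, so monitoring never adds meaningful load. The thresholds and the psutil dependency are assumptions for illustration, not ZIF internals.

```python
import time

import psutil  # pip install psutil; used here to read host CPU load

BASE_INTERVAL = 15   # seconds between collections when the host is healthy
MAX_INTERVAL = 120   # back off up to this interval under pressure

def next_interval() -> float:
    """Stretch the collection interval as host CPU utilisation rises."""
    cpu = psutil.cpu_percent(interval=1)
    if cpu < 70:
        return BASE_INTERVAL
    # Scale linearly from BASE_INTERVAL at 70% CPU to MAX_INTERVAL at 100%
    frac = min((cpu - 70) / 30, 1.0)
    return BASE_INTERVAL + frac * (MAX_INTERVAL - BASE_INTERVAL)

while True:
    metrics = {
        "cpu_pct": psutil.cpu_percent(),
        "mem_pct": psutil.virtual_memory().percent,
    }
    print(metrics)               # stand-in for the telemetry channel
    time.sleep(next_interval())  # collect less often when the host is busy
```
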
A high-level overview of Monitor’s components:

  • Agents, Agentless: These are the means through which monitoring is done at the target environment, like user devices, servers, network devices, load balancers, virtualized environment, API layers, databases, replications, storage devices, etc.
  • ZIF Telemetry Channel: The performance telemetry collected from the target environment is passed through this channel to the big data platform.
  • Telemetry Data: Refers to the performance data and other metrics collected from all over the environment.
  • Telemetry Database: This is the big data platform in which the telemetry data from all sources is captured and stored.
  • Intelligence Engine: This parses the telemetry data in near real time and raises notifications based on rule-based thresholds as well as dynamic thresholds (a simple sketch of a dynamic threshold follows this list).
  • Dashboard & Alerting Mechanism: These are the means through which the results of monitoring are conveyed, as metrics on dashboards and as notifications.
  • Integration with Analyze, Predict & Remediate components: The Monitoring module communicates telemetry to the Analyze & Predict components of the ZIF platform, which use the data for analysis and apply Machine Learning for prediction. Both the Monitor & Predict components communicate with the Remediate platform to trigger remediation.
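
A minimal sketch of the dynamic-threshold idea referenced in the list above: flag a sample when it deviates from a rolling baseline by more than k standard deviations. ZIF's Intelligence Engine is more sophisticated; the window size and k here are invented parameters.

```python
from collections import deque
from statistics import mean, stdev

class DynamicThreshold:
    """Rolling-baseline detector: alert when a value is more than k sigma
    away from the mean of the last `window` samples."""

    def __init__(self, window: int = 60, k: float = 3.0):
        self.history = deque(maxlen=window)
        self.k = k

    def check(self, value: float) -> bool:
        anomalous = False
        if len(self.history) >= 10:  # need some baseline first
            mu, sigma = mean(self.history), stdev(self.history)
            anomalous = sigma > 0 and abs(value - mu) > self.k * sigma
        self.history.append(value)
        return anomalous

detector = DynamicThreshold()
for v in [100, 102, 99, 101, 100, 98, 103, 100, 101, 99, 100, 250]:
    if detector.check(v):
        print(f"anomaly: {v}")  # fires on 250
```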

The Monitor component works in tandem with the Analyze, Predict and Remediate components of the ZIF platform to achieve an incident-free IT environment. Implementing ZIF is the right step towards driving an enterprise to Zero Incidents. ZIF is the only platform in the industry that comes from a single product platform owner who owns the end-to-end IP of the solution, with products developed from scratch.

For more detailed information on GAVS’ Monitor, or to request a demo, please visit zif.ai/products/monitor/.

(To be continued…)

About the Author

Suresh Kumar Ramasamy


Suresh heads the Monitor component of ZIF at GAVS. He has 20 years of experience in Native Applications, Web, Cloud and Hybrid platforms, from Engineering to Product Management. He has designed & hosted monitoring solutions, and has been instrumental in conglomerating components to structure the Environment Performance Management suite of ZIF Monitor.

Suresh enjoys playing badminton with his children. He is passionate about gardening, especially medicinal plants.


Cleaning up our Digital Dirt

Now, what exactly is digital dirt in the context of enterprises? It is highly complex and ambiguous to precisely identify digital dirt, let alone address the related issues. Chandra Mouleswaran S, Head of Infra Services at GAVS Technologies, says that not all the applications that run in an organization are actually required to run. Applications that exist but are not used by any internal or external users or applications contribute to digital dirt. Such dormant applications accumulate over time due to uncertainty about their usage and a lack of clarity in sunsetting them. They stay in the organization forever and waste resources, time and effort. Such hidden applications burden the system, hence they need to be discovered and removed to improve operational efficiency.

Are we prepared to clean the trash? The process of eliminating digital dirt can be cumbersome. We cannot fix what we do not find. So, the first step is to find them using a specialized application for discovery. Chandra further elaborated on the expectations from the ‘Discovery’ application. It should be able to detect all applications, the relationships of those applications with the rest of the environment and the users using those applications. It should give complete visibility into applications and infrastructure components to analyze the dependencies.

Shadow IT

Shadow IT, the use of technology outside the IT purview, is becoming a tacitly approved aspect of most modern enterprises. As many as 71% of employees across organizations are using unsanctioned apps on devices of every shape and size, making it very difficult for IT departments to keep track. The evolution of shadow IT is a result of technology becoming simpler and the cloud offering easy connectivity to applications and storage. Because of this, people have begun to cherry-pick the things that help them get work done easily.

Shadow IT may not start or evolve with bad intentions. But, when employees take things into their own hands, it is a huge security and compliance risk, if the sprawling shadow IT is not reined in. Gartner estimates that by next year (2020), one-third of successful attacks experienced by enterprises will be on their shadow IT resources.

The Discovery Tool

IT organizations should deploy a tool that gives complete visibility of the landscape, discovers all applications – be they single-tenant or multi-tenant, single or multiple instance, native or virtually delivered, on-premise or on cloud – and maps the dependencies between them. That apart, the tool should also indicate the activity on those applications, showing the users who access them and the response times in real time. The dependency map, along with user transactions captured over time, will paint a very clear picture for IT Managers and might bring to light some applications, and their dependencies, that they probably never knew existed!

Discover is a component of GAVS’ AIOps Platform, Zero Incident Framework™ (ZIF). Discover can work as a stand-alone component and also cohesively with the rest of the AIOps Platform. Discover provides Application Auto Discovery and Dependency Mapping (ADDM). It automatically discovers and maps the applications and the topology of the end-to-end deployment, hop by hop. Some of its key features are:

  • Zero Configuration

The auto-discovery features require no additional configuration upon installation.

  • Discovers Applications

It uniquely and automatically discovers all Windows and Linux applications in your environment, identifies each by name, and measures the end-to-end and hop-by-hop response time and throughput of each application. This works for applications installed on physical servers, in virtualized guest operating systems, applications automatically provisioned in private or hybrid clouds, and those running in public clouds. It also works irrespective of whether the application was custom developed or purchased.

  • Discovers Multitenant Applications

It auto-discovers multitenant applications hosted on web servers and does not limit the discovery to the logical server level.

  • Discovers Multiple Instances of Application

It auto-discovers multiple instances of the same application and presents them all as a group with the ability to drill down to the details of each instance of the application.

  • Discovers SaaS Applications

It auto-discovers any requests directed to SaaS applications such as Office 365 or Salesforce and calculates response time and throughput to these applications from the enterprise.

  • Discovers Virtually Delivered Applications or Desktops

It automatically maps the topology of the delivered applications and VDIs, hop-by-hop and end-to-end. It provides extensive support for Citrix delivered applications or desktops. This visibility extends beyond the Citrix farm into the back-end infrastructure on which the delivered applications and VDIs are supported.

  • Discovers Application Workload Topologies

The architecture auto-discovers the application flow topology and user response times to create the application topology and update it in near real-time – all without user configuration. This significantly reduces the resources required to configure service models and operate the product.

  • Discovers Every Tier of Every Multi-Tiered Application

It auto-discovers the different tiers of every multi-tiered application and displays the performance of each tier. Each tier is discovered and named, with the transactional throughput and response times shown for each tier.

  • Discovers All Users of All Applications

It identifies each user of every application and the response time that the user experiences for each use of a given application.

  • Discovers Anomalies with Applications

The module uses a sophisticated anomaly detection algorithm to automatically assess when a response time excursion is valid; then, if a response exceeds normal baseline or SLA performance expectations, deep diagnostics are triggered to analyze the event. In addition, the hop-by-hop segment latency is compared against historical norms to deterministically identify which segment has extended latency and is reducing application performance. The sketch below illustrates the segment-comparison idea.
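
As a hedged illustration of the segment-comparison idea, the sketch below checks each hop's current latency against its own historical 95th percentile and reports the worst offenders. The data, segment names and percentile choice are invented for illustration.

```python
from statistics import quantiles

# Historical latency samples per segment in ms (assumed, collected over time)
history = {
    "client->web": [12, 15, 11, 14, 13, 16, 12, 15, 14, 13],
    "web->api":    [20, 22, 19, 21, 23, 20, 22, 21, 20, 19],
    "api->db":     [5, 6, 5, 7, 6, 5, 6, 5, 6, 7],
}

def p95(samples):
    """95th percentile of the historical samples."""
    return quantiles(samples, n=20)[-1]

def slow_segments(current: dict) -> list[tuple[str, float]]:
    """Segments whose current latency exceeds their historical p95,
    sorted by how far over the norm they are."""
    offenders = [
        (seg, ms - p95(history[seg]))
        for seg, ms in current.items()
        if ms > p95(history[seg])
    ]
    return sorted(offenders, key=lambda x: -x[1])

# One transaction's hop-by-hop latencies: the api->db hop is degraded
print(slow_segments({"client->web": 13, "web->api": 21, "api->db": 48}))
```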

For more detailed information on GAVS’ Discover, or to request a demo, please visit Discover.

About the Authors:

Chandra Mouleswaran S:

Chandra heads the IMS practice at GAVS. He has 25+ years of rich experience in IT Infrastructure Management, enterprise applications design & development, and incubation of new products/services across various industries. He also holds a patent for a mistake-proofing application called ‘Advanced Command Interface’. He thinks ahead, and his implementation of disk-based backup using SAN replication in one of his previous organizations, as early as 2005, is proof of his visionary skills.

Sri Chaganty:

Sri is a Serial Entrepreneur with over 30 years’ experience delivering creative, client-centric, value-driven solutions for bootstrapped and venture-backed startups.

AIOps Demystified

IT Infrastructure has been on an incredibly fascinating journey, from the days of mainframes housed in big rooms just a few decades ago, to minicomputers, personal computers, client-server architectures, enterprise & mobile networks, virtual machines and the cloud! While mobile technologies have made computing omnipresent, the cloud, coupled with technologies like virtual computing and containers, has changed the traditional IT industry in unimaginable ways and has fuelled the rise of service-oriented architectures where everything is offered as a service and on demand: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), DBaaS, MBaaS, SaaS and so on.

As companies try to grapple with this technology explosion, it is very clear that the first step has to be optimization of the IT infrastructure & operations. Efficient ITOps has become the foundation not just to aid transformational business initiatives, but even for basic survival in this competitive world.

The term AIOps was first coined by Gartner based on their research on Algorithmic IT Operations. Now, it refers to the use of Artificial Intelligence (AI) for IT Operations (Ops) – the use of Big Data Analytics and AI technologies to optimize, automate and supercharge all aspects of IT Operations.

Why AI in IT operations?

The promise behind bringing AI into the picture has been to do what humans have been doing, but better, faster and at a much larger scale. Let’s delve into the different aspects of IT operations and see how AI can make a difference.

Visibility

The first step to effectively managing the IT landscape is to get complete visibility into it. Why is that so difficult? The sheer variety and volume of applications, users and environments make it extremely challenging to get a full 360 degree view of the landscape. Most organizations use applications that are web-based, virtually delivered, vendor-built, custom-made, synchronous/asynchronous/batch processing, written using different programming languages and/or for different operating systems, SaaS, running in public/private/hybrid cloud environments, multi-tenant, multiple instances of the same applications, multi-tiered, legacy, running in silos! Adding to this complexity is the rampant issue of shadow IT, which is the use of applications outside the purview of IT, triggered by the easy availability of and access to applications and storage on the cloud. And, that’s not all! After all the applications have been discovered, they need to be mapped to the topology, their performances need to be baselined and tracked, all users in the system have to be found and their user experiences captured.

The enormity of this challenge is now evident. AI powers auto-discovery of all applications, topology mapping, baselining response times and tracking all users of all these applications. Machine Learning algorithms aid in self-learning, unlearning and auto-correction to provide a highly accurate view of the IT landscape.

Monitoring

When the IT landscape has been completely discovered, the next step is to monitor the infrastructure and application stacks. Monitoring tools provide real-time data on their availability and performance based on relevant metrics.

The problem is two-fold here. Typically, IT organizations need to rely on several monitoring tools that cater to the different environments/domains in the landscape. Since these tools work in silos, they give a very fractured view of the entire system, necessitating data correlation before it can be gainfully used for Root Cause Analysis (RCA) or actionable insights.

Pattern recognition-based learning from current and historical data helps correlate these seemingly independent events, recognize & alert on deviations, performance degradations or capacity utilization bottlenecks in real time, and consequently enable effective Root Cause Analysis (RCA) and reduce an important KPI, Mean Time to Identify (MTTI).

Secondly, there are colossal amounts of data in the form of logs, events and metrics pouring in at high velocity from all these monitoring tools, creating alert fatigue. This makes it almost impossible for the IT support team to check each event, correlate it with other events, tag and prioritize them, and plan remedial action.

Inherently, machines handle volume with ease, and when programmed with ML algorithms they learn to sift through all the noise and zero in on what is relevant. Noise nullification is achieved by the use of Deep Learning algorithms that isolate events that have the potential to become incidents, and Reinforcement Learning algorithms that find and eliminate duplicates and false positives. These capabilities help organizations bring dramatic improvements to another critical ITOps metric, Mean Time to Resolution (MTTR).

Other areas of ITOps where AI brings a lot of value are Advanced Analytics (Predictive & Prescriptive) and Remediation.

Advanced Analytics

Unplanned IT outages result in huge financial losses for companies and, even worse, a sharp dip in customer confidence. One of the biggest value-adds of AI for ITOps, then, is in driving proactive operations that deliver superior user experiences with predictable uptime. Advanced Analytics on historical incident data identifies patterns, causes and situations in the entire stack (infrastructure, networks, services and applications) that lead to an outage. Multivariate predictive algorithms drive predictions of incident and service request volumes, spikes and lulls well in advance. AIOps tools forecast usage patterns and capacity requirements to enable planning, just-in-time procurement and staffing to optimize resource utilization. Reactive purchases after the fact can be very disruptive & expensive.

Remediation

AI-powered remediation automates remedial workflows & service actions, saving a lot of manual effort and reducing errors, incidents and the cost of operations. Chatbots provide round-the-clock customer support, guide users through troubleshooting standard problems, and auto-assign tickets to appropriate IT staff. Dynamic capacity orchestration based on predicted usage patterns and capacity needs induces elasticity and eliminates performance degradation caused by inefficient capacity planning.

Conclusion

The beauty of AIOps is that it gets better with age, as the learning matures with exposure to more and more data. While AIOps is definitely a blessing for IT Ops teams, it is only meant to augment the human workforce, not to replace it entirely. And importantly, there is no one-size-fits-all approach to AIOps. Understanding current pain points and future goals, and finding an AIOps vendor with relevant offerings, is the cornerstone of a successful implementation.

GAVS’ Zero Incident Framework™ (ZIF) is an AIOps-based TechOps Platform that enables organizations to trend towards a Zero Incident Enterprise™. ZIF comes with an end-to-end suite of tools for ITOps needs. It is a pure-play AI Platform powered entirely by Unsupervised Pattern-based Machine Learning! You can learn more about ZIF or request a demo here.

What you need to know about AIOps

Emergence of AIOps

There has been gigantic growth in AIOps over the last two years. It has successfully transitioned from an emerging category to an inevitability. Companies have adopted AIOps to automate and improve IT operations by applying big data and machine learning (ML). The adoption of such technologies has also helped IT operations adapt to multi-cloud infrastructures. According to Infoholic Research, the AIOps market is expected to grow at a CAGR of 33.08% during the forecast period 2018–2024.

What is AIOps?

AIOps broadly stands for Artificial Intelligence for IT Operations. With a combination of big data and ML, an AIOps platform improves IT operations and automates tasks such as availability tracking, event correlation, performance monitoring, IT service management and automation. Most of these technologies are well-defined and mature.

AIOps data originates from log files, metrics, monitoring tools, helpdesk ticketing systems and other sources. AIOps sorts, manages and assimilates this data to provide insight into problem areas. The goal of AIOps is to analyze data and discover patterns that can predict potential incidents in the future.

Focus areas of AIOps

  • AIOps enables open data access without letting organizational silos get in the way.
  • AIOps upgrades data handling ability, which also expands the scope of data analysis.
  • It has a unique ability to stay aligned to organizational goals.
  • AIOps increases the scope of risk prediction.
  • It also reduces response time.

Impact of AI in IT operations

  • Capacity planning: AIOps can support understanding workloads and planning configuration appropriately, without leaving room for speculation.
  • Resource utilization: AIOps allows predictive scaling, where the auto-scale feature of cloud IaaS can adjust itself based on historical data (see the sketch after this list).
  • Storage: AIOps helps in storage activity through disk calibration, reconfiguration and allocation of new storage volumes.
  • Anomaly detection: It can detect anomalies and critical issues faster and more accurately than humans, reducing potential threats and system downtime.
  • Threat management: It helps to analyze breaches in both internal and external environments.
  • Root-cause analysis: AIOps is effective in root-cause analysis, through which it reduces response time and creates a remedy after locating the issue.
  • Forecasting outages: Outage prediction is essential for the growth of IT operations. In fact, the market for forecasting outages through AIOps is expected to grow from $493.7 million to $1.14 billion between 2016 and 2021, based on industry reports.
  • Future innovation: AIOps has played a key role in automating a major chunk of IT operations. It frees resources to focus on crucial things aligned to strategy and organizational goals.
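
A hedged toy example of the predictive scaling referenced above: forecast the next hour's load from recent observations of the same hour and pre-provision capacity ahead of the spike, instead of reacting after utilisation crosses a threshold. All numbers and the scaling rule are invented.

```python
from statistics import mean

# Requests/sec observed at 9am on each of the last 7 days (invented history)
history_9am = [120, 135, 128, 140, 150, 145, 155]

INSTANCE_CAPACITY = 50   # req/sec one instance handles (assumption)
HEADROOM = 1.2           # provision 20% above the forecast

def forecast_next() -> float:
    """Naive seasonal forecast: blend the recent average for this hour
    with the most recent observation."""
    recent = history_9am[-3:]
    return 0.5 * mean(recent) + 0.5 * history_9am[-1]

def instances_needed() -> int:
    demand = forecast_next() * HEADROOM
    return max(1, -(-int(demand) // INSTANCE_CAPACITY))  # ceiling division

print(f"forecast={forecast_next():.0f} req/s -> scale to {instances_needed()} instances")
```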

Problems AIOps solved

The common issues AIOps solves to enable IT operations’ adoption of digitization are as follows:

  • It has the ability to gain access to large data sets across environments while maintaining data reliability for comprehensive analysis.
  • It simplifies data analysis through automation empowered by ML.
  • Through accurate prediction mechanisms, it can avoid costly downtime and improve customer satisfaction.
  • Through the implementation of automation, manual tasks can be eliminated.
  • AIOps can improve teamwork and workflow activities between IT groups and other business units.

Peeping into the future

AIOps platforms act as a foundation stone in projecting the future endeavors of organizations. They use real-time analysis of data to provide insights that impact business decisions. Successful implementation of AIOps depends on key performance indicators (KPIs). AIOps can also deliver predictive and proactive IT operations by reducing failures and shortening detection, investigation and resolution times.


Impact of network analytics in IT ops

Role of network analytics

Looking at the pace at which our world is moving towards digitization, one has to admit that network analytics will play an important part in paving the way for how IT will operate in the future. Network analytics for an enterprise is complex; the AI and automation technologies in use help achieve intelligent and effective ways of running future IT operations.

Network analytics improves user experience in IT operations by analyzing network data. It compares and correlates data to address a problem or trend. It manages IT operations by channeling the data inputs mentioned below.

  1. Real network traffic generated by clients.
  2. Synthetic network traffic created by virtual clients.
  3. Metrics from infrastructure.
  4. System logs.
  5. Data flows.
  6. Application program interface (API) data from application servers.

Scope of network analytics

A user can face poor network performance or disruption in service due to an OS problem, a Wi-Fi or LAN issue, DHCP, a WAN problem or an application failure. Locating the actual cause of an interruption is essential for the smooth functioning of IT operations. Network analytics operates with the help of big data analytics, along with cloud computing and machine learning, to examine data and create a holistic perspective. Proactive IT operations led by predictive insights can markedly improve data accuracy (claims run as high as 90%). Network analytics can also present data in a visual format to develop an elaborate understanding. Here, network analytics plays an important role in redefining IT operations.

  • Network analytics uses proactive analytical tools such as Sisense, Azure, R Open and GoodData for a deeper understanding of issues and to locate the source of an error, which can make IT operations seamless. Sisense helps process data up to 10 times faster; Azure’s 100 modules per experiment and 10 GB of storage space are cost-effective; GoodData allows a 360-degree overview for customer insights.
  • Earlier, the task of fixing a network issue was relatively simple. Now, with the increasing usage of virtual and mobile devices and cloud computing, detecting and fixing an issue within a network has become complex. Without network analytics, IT Ops will not be able to withstand such disruption.
  • There has been huge diversification lately in the fields of hardware, operating systems, applications and services. Understanding network problems within these landscapes can be challenging. Network analytics plays an important role here by easing the task through user performance management (UPM).
  • Network analytics also minimizes issues with the access network in IT operations, from getting Wi-Fi access to authentication, obtaining IP addresses and resolving DHCP requests.
  • Network analytics tools can help reduce network traffic through alterations in facilities. They can use network event correlation to understand the impact on devices and on customers’ experience of bandwidth latency.
  • Network analytics assists a great deal in network capacity planning and deployment opportunities, improving network ROI by up to 15% as per market research.

Difference between monitoring and analytics network solution

To analyze the impact network analytics has on IT Ops, it is essential to understand the difference between monitoring and analytics solutions. Monitoring refers to collecting and interpreting data passively and sharing potentially actionable information with the network manager. It focuses on spotting problems without fixing them.

Analytics is more prescriptive: recorded historical data is understood, learnt and analyzed, yielding a pattern to be followed. Data collected from Wi-Fi, devices, applications and the WAN creates trends that impact IT operations.

Advanced analytics

Along with pinpointing the area of concern, advanced analytics tries to automate new solutions to the detected problem. Advanced network analytics helps to understand whether the issue is with a client operating system, an application, network services or Wi-Fi access. This enhances the scope of IT Ops by providing insights that improve infrastructure and take overall operations to the next level. The new generation of network analytics tools and solutions can reduce outages, upgrade systems and applications, improve customer experience and simplify IT operations.

Benefits of network analytics in IT ops

  1. Network analytics can help IT Ops analyze requirements and create a balance, so that available resources can be optimally utilized to enhance network performance and lower the cost structure of IT Ops.
  2. Network analytics helps with data-mining insights to identify revenue opportunities, enabling a data-driven and action-oriented IT operation.
  3. Network analytics can help in capacity planning, where both resources and services can be calculated in advance for apt provisioning.

Impact of network analytics in brief

Network analytics, with its analytical tooling, can predict future downtime, allowing necessary action to be taken in time. It also increases awareness of the root cause of problems, enabling faster remediation and eventual prevention, reducing MTTR by as much as 95%. This can reduce organizational disruption and operational costs while increasing customer satisfaction.


AIOps – IT Infrastructure Services for the Digital Age

The IT infrastructure services landscape is undergoing a significant shift, driven by digitalization. As focus shifts from cost efficiency to digital enablement, organizations need to re-imagine the IT infrastructure services model to deliver the necessary back-end agility, flexibility, and fluidity. Automation, analytics, and Artificial Intelligence (AI) – together comprising the “codifying elements” for driving AIOps – help drive this desired level of adaptability within IT infrastructure services. Intelligent automation, leveraging analytics and ML, embeds powerful, real-time business and user context and autonomy into IT infrastructure services. Intelligent automation has made inroads into enterprises in the last two to three years, backed by a rapid proliferation and maturation of solutions in the market.

Artificial Intelligence Operations (AIOps), Everest Group 2018 Report, IT Infrastructure

Benefits of codification of IT infrastructure services

Progressive leverage of analytics and AI, to drive an AIOps strategy, enables the introduction of a broader and more complex set of operational use cases into IT infrastructure services automation. As adoption levels scale and processes become orchestrated, the benefits potentially expand beyond cost savings to offer exponential value around user experience enrichment, services agility and availability, and operations resilience. Intelligent automation helps maximize value from IT infrastructure services by:

  1. Improving the end-user experience through contextual and personalized support
  2. Driving faster resolution of known/identified incidents leveraging existing knowledge, intelligent diagnosis, and reusable, automated workflows
  3. Avoiding potential incidents and improving business systems performance through contextual learning (i.e., based on relationships among systems), proactive health monitoring and anomaly detection, and preemptive healing

Although the benefits of intelligent automation are manifold, enterprises are yet to realize commensurate advantage from investments in infrastructure services codification. Siloed adoption, lack of well-defined change management processes, and poor governance are some of the key barriers to achieving the expected value. The design should involve an optimal level of human effort/intervention, targeted primarily at training, governing, and enhancing the system rather than executing routine, voluminous tasks. A phased adoption of automation, analytics, and AI within IT infrastructure services has the potential to offer exponential business value. However, to realize the full potential of codification, enterprises need to embrace a lean operating model, underpinned by a technology-agnostic platform. The platform should embed the codifying elements within a tightly integrated infrastructure services ecosystem with end-to-end workflow orchestration and resolution.

The market today has a wide choice of AIOps solutions, but the onus is on enterprises to select the right set of tools / technologies that align with their overall codification strategy.

Click here to read the complete whitepaper by Everest Group
