Monitoring for Success

Do you know if your end users are happy?

(Here, “end users” means users of applications (desktop, web or cloud-based), services, servers and other components of the IT environment, whether they use them directly or indirectly.)

The question may sound trivial, but it has a significant impact on the success of a company. The user experience is a journey that runs from the moment users start using the application or service until after they complete the interaction. Experience can be assessed on factors such as speed, performance, flawlessness, ease of use, security and resolution time, among others. Hence, monitoring the ‘Wow’ & ‘Woe’ moments of the users is vital.

Monitor is a component of GAVS’ AIOps Platform, Zero Incident Framework™ (ZIF). One of the key objectives of the Monitor platform is to measure and improve end-user experience. This component monitors, in real time, all the layers involved in the user experience (including but not limited to application, database, server, APIs, end-points, and network devices). Ultimately, this helps to drive the environment towards Zero Incidents.

This figure shows the capability of ZIF monitoring, which cuts across all layers from end user to storage, and how it is linked to the other components of the platform.

Key features of ZIF Monitor:

  • Unified solution for all IT environment monitoring needs: The platform covers the end-to-end monitoring of an IT landscape. The key focus is to ensure all verticals of IT are brought under thorough monitoring. The deeper the monitoring, the closer an organization is to attaining a Zero Incident Enterprise™.
  • Agents with self-intelligence: The intelligent agents capture various health parameters of the environment. When the target environment is already running low on resources, the agent will not burden it with more load. It collects the health-related metrics and communicates them efficiently through the telemetry channel. The intelligence is applied to choices such as which parameters to collect and how often to collect them.
  • Depth of monitoring: The core strength of Monitor is that it comes with a list of performance counters defined by SMEs across all layers of the IT environment. This is a key differentiator: the monitoring parameters can be dynamically configured for the target environment, and parameters can be added or removed as needed.
  • Agent & Agentless (Remote): Customers can choose between Agent and Agentless options. The remote solution is called the Centralized Remote Monitoring Solution (CRMS). Each monitoring parameter can be remotely controlled and defined from the CRMS. Even the agents running in the target environment can be controlled from the server console.
  • Compliance: Monitor plays a key role in keeping the environment compliant. This ranges from ensuring that the necessary services and processes are available in the target environment to defining the standard for what is allowed in it (application, make, version, provider, size, etc.).
  • Auto discovery: Monitor can auto-discover the newer elements (servers, endpoints, databases, devices, etc.) that are getting added to the environment. It can automatically add those newer elements into the purview of monitoring.
  • Auto scale: Centralized Remote Monitoring Solution (CRMS) can auto-scale on its own when newer elements are added for monitoring through auto-discovery. The auto scale includes various aspects, like load on channel, load on individual polling engine, and load on each agentless solution.
  • Real time user & Synthetic Monitoring: Real-time user monitoring observes the environment while the user is active. Synthetic monitoring relies on simulation: it doesn’t wait for the user to make a transaction or use the system. Instead, it simulates the scenario and provides insights so that decisions can be made proactively.
  • Availability & status of devices connected: Monitor also includes the monitoring of availability and control of USB and COM port devices that are connected.
  • Black box monitoring: It is not always possible to instrument the application to get insights. Hence, the Black Box technique is used. Here the application is treated as a black box and is monitored in terms of its interaction with the Kernel & OS through performance counters.
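The “self-intelligence” of the agents described above can be pictured with a minimal sketch. The function name `next_collection_interval`, the 70% and 90% cut-offs and the back-off factors are all illustrative assumptions, not ZIF Monitor’s actual logic; the point is only that an agent can slow its own collection when the host is already under pressure.

```python
def next_collection_interval(cpu_percent: float, base_seconds: int = 30) -> int:
    """Return how long a hypothetical agent waits before its next collection.

    The busier the host, the longer the wait, so that monitoring does not
    add load to a machine that is already short of resources.
    The 70% and 90% cut-offs are assumed values for illustration.
    """
    if not 0 <= cpu_percent <= 100:
        raise ValueError("cpu_percent must be between 0 and 100")
    if cpu_percent >= 90:
        return base_seconds * 4  # host is starved: collect rarely
    if cpu_percent >= 70:
        return base_seconds * 2  # host is busy: slow down
    return base_seconds          # host is healthy: normal cadence
```

A real agent would likely weigh memory, disk and network pressure as well; CPU alone keeps the sketch short.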

High-level overview of Monitor’s components:

  • Agents, Agentless: These are the means through which monitoring is done at the target environment, like user devices, servers, network devices, load balancers, virtualized environment, API layers, databases, replications, storage devices, etc.
  • ZIF Telemetry Channel: The performance telemetry collected at the source is passed through this channel to its target, the big data platform.
  • Telemetry Data: Refers to the performance data and other metrics collected from all over the environment.
  • Telemetry Database: This is the big data platform in which the telemetry data from all sources is captured and stored.
  • Intelligence Engine: This parses the telemetry data in near real time and raises notifications based on both rule-based thresholds and dynamic thresholds.
  • Dashboard & Alerting Mechanism: These are the means through which the results of monitoring are conveyed, as metrics in the dashboard as well as notifications.
  • Integration with Analyze, Predict & Remediate components: The Monitor module communicates the telemetry to the Analyze & Predict components of the ZIF platform, which use the data for analysis and apply Machine Learning for prediction. Both the Monitor & Predict components communicate with the Remediate platform to trigger remediation.
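To make the difference between the Intelligence Engine’s two kinds of threshold concrete, here is a small sketch. It assumes one simple model for the dynamic case (mean plus `k` standard deviations of recent history); the function names and the sample numbers are hypothetical, and the real engine may use a very different baseline.

```python
from statistics import mean, stdev


def rule_based_alert(value: float, limit: float) -> bool:
    """Static rule: alert whenever the metric crosses a fixed limit."""
    return value > limit


def dynamic_alert(history: list[float], value: float, k: float = 3.0) -> bool:
    """Dynamic rule: alert when the value is more than k standard
    deviations above the mean of recent history (an assumed baseline model)."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    return value > mean(history) + k * stdev(history)


# Hypothetical CPU readings (percent) from a normally quiet server.
cpu_history = [41.0, 39.5, 40.2, 42.1, 40.8, 39.9]

# 55% is far below a fixed 90% limit, yet unusual for this server.
static_hit = rule_based_alert(55.0, limit=90.0)
dynamic_hit = dynamic_alert(cpu_history, 55.0)
```

In this example the static rule stays silent while the dynamic rule fires, which is the kind of early signal a fixed limit would miss.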

The Monitor component works in tandem with the Analyze, Predict and Remediate components of the ZIF platform to achieve an incident-free IT environment. Implementing ZIF is the right step towards driving an enterprise to Zero Incidents. ZIF is the only platform in the industry that comes from a single product platform owner who owns the end-to-end IP of the solution, with products developed from scratch.

For more detailed information on GAVS’ Monitor, or to request a demo, please visit zif.ai/products/monitor/

(To be continued…)

About the Author

Suresh Kumar Ramasamy


Suresh heads the Monitor component of ZIF at GAVS. He has 20 years of experience in Native Applications, Web, Cloud and Hybrid platforms from Engineering to Product Management. He has designed & hosted the monitoring solutions. He has been instrumental in conglomerating components to structure the Environment Performance Management suite of ZIF Monitor.

Suresh enjoys playing badminton with his children. He is passionate about gardening, especially medicinal plants.

READ ALSO OUR NEW UPDATES

Optimizing ITOps for Digital Transformation

The key focus of Digital Transformation is removing procedural bottlenecks and bending the curve on productivity. As the Chief Insights Officer of Forbes Media says, Digital Transformation is now “essential for corporate survival”.

Emerging technologies are enabling dramatic innovations in IT infrastructure and operations. It is no longer just about hardware, software, data centers, the cloud or the service desk; it is about backing business strategies. So, here are some reasons why companies should think about redesigning their IT services to embrace digital disruption.

DevOps for Agility

As companies move away from the traditional Waterfall model of software development and adopt Agile methodologies, IT infrastructure and operations also need to become agile and malleable. Agility has become indispensable to stay competitive in this era of dynamism and constant change. What started off as a set of software development methodologies has now permeated all aspects of an organization, ITOps being one of them. Development, QA and IT teams need to come out of their silos and work in tandem for constant productive collaboration, in what is termed DevOps.

Shorter development & deployment cycles call for greater overall ITOps efficiency and, among other things, for IT environment provisioning that is on-demand and self-service. Provisioning needs to be automated and built into the CI/CD pipeline.

Downtime Mitigation

With agility being the org-wide mantra, predictable IT uptime becomes a mandate. Outages incur a very high cost and adversely affect the pace of innovation. The average cost of unplanned application downtime for Fortune 1000 companies is anywhere between $1.25 billion and $2.5 billion, says a report by DevOps.com. It goes on to say that infrastructure failure can cost the bottom line $100,000/hr and that the cost of critical application failure is $500,000 to $1 million/hr.
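To put the hourly figures quoted above into perspective, the arithmetic for a single outage can be sketched as follows. The three-hour duration is a made-up example; only the hourly rates come from the report cited above.

```python
def outage_cost(hours: float, cost_per_hour: float) -> float:
    """Direct cost of one outage: duration multiplied by the hourly rate."""
    return hours * cost_per_hour


# Hourly rates quoted in the text; the 3-hour duration is hypothetical.
infrastructure_failure = outage_cost(3, 100_000)
critical_app_low = outage_cost(3, 500_000)
critical_app_high = outage_cost(3, 1_000_000)
```

On these assumptions, a three-hour critical application failure lands between $1.5 million and $3 million, before counting any indirect impact.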

ITOps must stay ahead of the game by eliminating outdated legacy systems, tools, technologies and workflows. End-to-end automation is key. IT needs to modernize its stack by zeroing in on tools for Discovery of the complete IT landscape, Monitoring of devices, Analytics for noise reduction and event correlation, and AI-based tools for RCA, incident Prediction and Auto-Remediation. All of this intelligent automation enables a proactive response rather than a reactive response after the fact, when the damage has already been done.
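As an illustration of what “noise reduction” can mean in practice, the sketch below suppresses repeat alerts for the same host and check inside a time window. The `Alert` fields, the five-minute window and the function name are assumptions made for this example; commercial tools use far richer correlation.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    timestamp: int  # seconds since an arbitrary start
    host: str
    check: str


def reduce_noise(alerts: list[Alert], window_seconds: int = 300) -> list[Alert]:
    """Keep the first alert per (host, check) and drop repeats that arrive
    within window_seconds of the last alert that was kept."""
    last_kept: dict[tuple[str, str], int] = {}
    kept: list[Alert] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.host, alert.check)
        previous = last_kept.get(key)
        if previous is None or alert.timestamp - previous >= window_seconds:
            kept.append(alert)
            last_kept[key] = alert.timestamp
    return kept


raw_alerts = [
    Alert(0, "db1", "cpu"),
    Alert(60, "db1", "cpu"),    # repeat within the window: dropped
    Alert(120, "web1", "cpu"),  # different host: kept
    Alert(400, "db1", "cpu"),   # window has passed: kept
]
quiet_alerts = reduce_noise(raw_alerts)
```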

Moving away from the shadows

Shadow IT, the use of technology outside the IT purview, is becoming a tacitly approved aspect of most modern enterprises. It is a result of the proliferation of technology and of the cloud offering easy access to applications and storage. Users of Shadow IT systems bypass the IT approval and provisioning process and use technology that the IT department has not authorized. Huge security and compliance risks are waiting to happen if this sprawl is not reined in. To bring Shadow IT under control, the IT department must first know about it. This is where automated Discovery tools bring in a lot of value by automating the process of application discovery and topology mapping.
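Once a discovery tool has produced an inventory, spotting Shadow IT reduces to comparing what was found with what was approved. A minimal sketch, in which the application names and the function name `find_shadow_it` are hypothetical:

```python
def find_shadow_it(discovered: set[str], approved: set[str]) -> list[str]:
    """Return applications that were discovered but never approved by IT."""
    return sorted(discovered - approved)


# Hypothetical inventories for illustration only.
approved_apps = {"mail-client", "crm", "vpn"}
discovered_apps = {"mail-client", "crm", "file-share-x", "chat-y"}
unapproved = find_shadow_it(discovered_apps, approved_apps)
```

The hard part in practice is the discovery itself; the comparison is deliberately trivial here.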

Moving towards Hybrid IT

Hybrid IT means the use of an optimal, cost-effective mix of public & private clouds and on-premise systems that enable an infrastructure that is dynamic, on-demand, scalable, and composable. IT spend on datacenters is seeing a downward trend. Most organizations are thinking beyond traditional datacenters to options in the cloud. Colocation is an important consideration since it delivers better availability, energy and time savings, and scalability, and reduces the impact of network latency. Organizations are keeping on-premise only the mission-critical processes that require close monitoring & control.

Edge computing

Gartner defines edge computing as solutions that facilitate data processing at or near the source of data generation. With huge volumes of data being churned out at rapid rates, for instance by monitoring or IoT devices, it is highly inefficient to stream all this data to a centralized datacenter or cloud for processing. Organizations now understand the value in a decentralized approach to address modern digital infrastructure needs. Edge computing serves as the decentralized extension of the datacenter/cloud and addresses the need for localized computing power.

CyberSecurity

Cyber attacks are on the rise, and securing networks and protecting data are posing big challenges. With Hybrid IT, IoT, Edge computing, etc., the extension of the IT footprint beyond secure enterprise boundaries has increased the number of attack target points manifold. IT teams need to be well versed with the nuances of security set-up in different cloud vendor environments. There is a lot of ambiguity in ownership of data integrity, in the wake of data being spread across on-premise, cloud environments, shared workstations and virtual machines. With Hybrid IT deployments, a comprehensive security plan regardless of the data’s location has gained paramount importance.

Upskilling IT Teams

With blurring lines between Dev and IT, there is increasing demand for IT professionals equipped with a broad range of cross-functional skills in addition to core IT competencies. With constant emergence of new technologies, there is usually not much clarity on the exact skillsets required by the IT team in an organization. More than expertise in one specific area, IT teams need to be open to continuous learning to adapt to changing IT environments, to close the skills gap and support their organization’s Digital Transformation goals.

Out of the trenches to AIOps – the Peacekeeper

The last thing an IT team wants to hear is ‘there is an issue’, which usually has them rushing to ‘battle zones’ to try and resolve it (‘problem with the apps?’, ‘is it the network?’), desperately trying to kill the problem while it grows larger within the Enterprise. There are no credits for crumbling SLAs, and the fire-fighting sometimes continues long and hard.

IT Operations teams are often battling heavy volumes of alerts and dealing with hundreds of incident tickets that come from the environment and from the performance of its apps and infrastructure. They are constantly overwhelmed trying to manage and respond to every alert in order to avoid the threat of outages and heavy losses.

The number of components within the infrastructure keeps increasing; today a stack can have more than 10,000 metrics. That sort of complexity increases the number of points of failure, and with the speedier change cycles supported by DevOps, cloud computing and so on, there is very little time to take control or take action. Under such circumstances, AIOps is fast emerging as a powerful solution to the constant battle, with the efficiency that AI and ML can bring in. We are looking more and more to unsupervised methods to read data and make it coherent, to ‘see the unknown unknowns’, and to remediate problems or bring them into focus before they impact customers. Adopting AI in IT Operations provides increased visibility into operations through Machine Learning, a subsequent reduction in incidents and false alarms, and the advantage of predictive warnings that can help avert outages. Insights are then implemented through automation tools, saving the time and effort of the teams concerned.

With AIOps gathering and processing data, very little or almost no manual intervention is required: algorithms help automate, due diligence gets done, and rich business insights are provided. AIOps becomes the much sought-after solution to the multitudinous problems in complex IT Enterprises.

“The global AIOps Platform market is expected to generate a revenue of US$ 20,428 billion with a CAGR of 36.2% by 2025,” reports Coherent Market Insights.

Gartner recommends that AIOps is adopted in phases. Early adopters typically start by applying machine learning to monitoring, operations and infrastructure data, before progressing to using deep neural networks for service and help desk automation.

The greatest strength with AIOps is that it can find all the potential risks and outages that may happen in the environment which can’t be done or anticipated by humans, and these operations can be conducted with greater consistency and time to value.

The complexity of an IT Enterprise is huge, and that makes it an ideal scenario for ML, Data Science and Artificial Intelligence: specific machine learning algorithms can help with problems that humans cannot reduce to simple instructions and remediations. AIOps becomes the real answer to tackling critical issues while also filtering out the false positives that usually make up a large percentage of the ‘events’ reflected in monitoring tools.

Gartner predicted that by this year about 25% of enterprises globally would implement an AIOps platform. That obviously means increasing complexity and huge data volumes, but also deep insights and more intelligence within the environment. Experts say this implies that AI is going to reach all the way from the device or environment to the customer.

ChatOps

AIOps is fast paced; it is believed that in the next decade the majority of large Enterprises will take to ‘multi-system automations’ and will host digital colleagues: virtual engineers that attend to queries and tasks. IT Service desks are going to be ‘manned’ by digital colleagues, which will take care of the frequent and mundane tasks with little or no human intervention. It is predicted that this year will see the emergence of ChatOps, where enterprises introduce “AI based digital colleagues into chat-based IT Operations”, and digital colleagues will make a major impact on how IT operations function.

Establishing digital service desk bots brings speed and agility into the service. Reports say that actions which hitherto took up to 20 steps can now be accomplished with just one phrase and a couple of clarifications from the digital colleague. This saves human labor hours and lets people channel their skills into more important areas, with mundane and frequent tasks such as password resets, catalogue requests, access requests and so forth being taken care of by digital colleagues. They can be entrusted with all incoming requests, and those which they cannot process are automatically escalated to the right human engineers. Even L3 & L4 issues are expected to be resolved by digital colleagues, with workflows being created by them and approved by human engineers. AI is going to keep recommending better and deeper automations, and we are going to see the true power of human / machine collaboration.

Humans will collaborate more and more with digital colleagues: change requests get created on a simple command, with resolutions either reached within minutes or assigned to human colleagues. Algorithms are expected to integrate operations more and more. AI is also going to simplify tasks such as identifying and inviting the right people to root cause analysis sessions and holding post-resolution meetings to ensure continuous learning.

With AIOps, IT operations is going to reconstruct most tasks with AI and automation. It is reported that 38.4% of organizations take at least 30 minutes to resolve incidents, and adopting AIOps is key to bringing that down. We may be looking at a future where we have the luxury of an autonomous data center, and human resources in IT can truly spend their time on strategic decisions, business growth and innovation, and become more visible contributors to an organization’s growth.

Reference
https://www.coherentmarketinsights.com/market-insight/aiops-platform-market-2073

The future of AIOps

AIOps or Artificial Intelligence based IT operations is the buzzword that’s capturing the CXO’s interest in organizations worldwide. Why? Because data explosion is here, and the traditional tools and processes are unable to completely handle its creation, storage, analysis and management. Likewise, humans are unable to thoroughly analyze this data to obtain any meaningful insights. IT teams also face the challenging task of providing speed, security and reliability in an increasingly mobile and connected world.

Add to this the complex, manual and siloed processes that the legacy IT solutions offer to the organizations. As a result, the productivity for IT remains low due to their inability to find the exact root cause of incidents. Plus, the business leaders don’t have a 360-degree view of all their IT and business services across the organization.

AIOps is the Future for IT Operations

AIOps platforms are the foundation on which organizations will build their future endeavors. Advanced machine learning and analytics are the building blocks to enhance their IT operations through a proactive approach towards service desk, monitoring and automation. Using effective data collection methods that utilize real time analytic technologies, AIOps provides insights that inform business decisions.

Successful AIOps implementations depend on key performance indicators (KPIs) whose impact can be seen on performance variation, service degradation, revenue, customer satisfaction and brand image.

All of these impact the organization’s services, including but not limited to supply chain, online or digital. One way in which AIOps can deliver a predictive and proactive IT is by increasing the MTBF (mean time between failures) and decreasing the MTTD (mean time to detection), MTTR (mean time to resolution) and MTTI (mean time to investigate).
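These metrics are simple averages over incident records, as the sketch below shows. It assumes that MTTD and MTTR are both measured from the moment the failure starts (some teams measure MTTR from detection instead) and that MTBF is the average healthy time between one resolution and the next failure. The `Incident` fields and the sample numbers are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Incident:
    started: float   # minutes since an arbitrary origin
    detected: float
    resolved: float


def mttd(incidents: list[Incident]) -> float:
    """Mean time to detection, measured from the start of each failure."""
    return mean(i.detected - i.started for i in incidents)


def mttr(incidents: list[Incident]) -> float:
    """Mean time to resolution, measured from the start of each failure."""
    return mean(i.resolved - i.started for i in incidents)


def mtbf(incidents: list[Incident]) -> float:
    """Mean healthy time between a resolution and the next failure.
    Needs at least two incidents."""
    ordered = sorted(incidents, key=lambda i: i.started)
    return mean(nxt.started - cur.resolved for cur, nxt in zip(ordered, ordered[1:]))


history = [
    Incident(started=0, detected=10, resolved=40),
    Incident(started=500, detected=505, resolved=520),
    Incident(started=1000, detected=1020, resolved=1060),
]
```

For this sample history, MTTR is 40 minutes and MTBF is 470 minutes; a healthier environment pushes the first number down and the second up.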

The future of AIOps is already on the way in the use cases below. These only scratch the surface, with scope for many more use cases to be added in the future.

Capacity planning

Enterprise workloads are moving to the cloud with providers such as AWS, Google and Azure setting up various configurations for running them. The complexity involved increases as new configurations are added by the architects involving parameters like disk types, memory, network and storage resources.

AIOps can reduce the guesswork in aligning the correct usage of the network, storage and memory resources with the right configurations of servers and VMs through recommendations.
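A recommendation of this kind can be pictured as a search over available configurations. The sketch below simply picks the cheapest option that covers peak demand plus some headroom; the catalog entries, prices and the 20% headroom are invented for illustration, and a real AIOps tool would learn demand from telemetry rather than take it as an argument.

```python
from typing import NamedTuple, Optional


class Config(NamedTuple):
    name: str
    vcpus: int
    memory_gb: int
    hourly_cost: float


def recommend(configs: list[Config], needed_vcpus: float,
              needed_memory_gb: float, headroom: float = 0.2) -> Optional[Config]:
    """Return the cheapest config that covers demand plus headroom, or None."""
    fits = [
        c for c in configs
        if c.vcpus >= needed_vcpus * (1 + headroom)
        and c.memory_gb >= needed_memory_gb * (1 + headroom)
    ]
    return min(fits, key=lambda c: c.hourly_cost) if fits else None


# Hypothetical catalog; names and prices do not refer to any real provider.
catalog = [
    Config("small", 2, 8, 0.10),
    Config("medium", 4, 16, 0.20),
    Config("large", 8, 32, 0.40),
]
choice = recommend(catalog, needed_vcpus=3, needed_memory_gb=10)
```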

Optimal resource utilization

Enterprises are leveraging cloud elasticity to scale their applications in or out automatically. With AIOps, IT administrators can rely on predictive scaling to take cloud auto-scaling to the next level. Based on historical data, the workload monitors itself and automatically determines the resources it requires.
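The difference between reactive and predictive scaling is that the replica count is derived from a forecast rather than from the current load. A deliberately naive sketch, assuming a linear trend over the last few samples and a fixed, hypothetical capacity per replica:

```python
import math


def forecast_next(load_history: list[float], window: int = 3) -> float:
    """Naive forecast: the last value plus the average step over the window."""
    if not load_history:
        raise ValueError("load_history must not be empty")
    recent = load_history[-window:]
    if len(recent) < 2:
        return recent[-1]
    avg_step = (recent[-1] - recent[0]) / (len(recent) - 1)
    return recent[-1] + avg_step


def replicas_needed(load_history: list[float], capacity_per_replica: float,
                    minimum: int = 1) -> int:
    """Replicas required to serve the forecast load, never below the minimum."""
    predicted = forecast_next(load_history)
    return max(minimum, math.ceil(predicted / capacity_per_replica))


# Hypothetical requests-per-second samples showing steady growth.
requests_per_second = [100, 120, 150, 180, 210]
planned = replicas_needed(requests_per_second, capacity_per_replica=50)
```

Here the forecast is 240 requests per second, so five replicas are planned before the load actually arrives. Production systems would use seasonal or ML-based forecasts instead of a straight line.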

Data store management

AIOps can also be utilized to monitor the network and the storage resources that will impact the applications in the operations. When performance degradation issues are seen, the admin will get notified. By using AI for both network and storage management, mundane tasks such as reconfiguring and recalibration can be automated. Through predictive analytics, storage capacity is automatically adjusted by adding new volumes proactively.
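Proactive volume expansion depends on knowing when a store will fill up. One simple way to estimate that, shown here as an assumption rather than as what any particular product does, is to fit a least-squares line to daily usage and extrapolate from the latest reading.

```python
from typing import Optional


def days_until_full(daily_used_gb: list[float], capacity_gb: float) -> Optional[float]:
    """Estimate days until capacity is reached from a least-squares growth rate.

    Returns None when there is too little data or usage is not growing.
    """
    n = len(daily_used_gb)
    if n < 2:
        return None
    x_mean = (n - 1) / 2
    y_mean = sum(daily_used_gb) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(daily_used_gb))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    slope = numerator / denominator  # growth in GB per day
    if slope <= 0:
        return None
    return max(0.0, (capacity_gb - daily_used_gb[-1]) / slope)


# Hypothetical daily usage of a 600 GB volume.
usage = [500, 510, 520, 530, 540]
remaining_days = days_until_full(usage, capacity_gb=600)
```

With usage growing by 10 GB a day and 60 GB left, the estimate is six days, which is the window in which a new volume would need to be added.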

Anomaly detection

Anomaly detection is the most important application of AIOps. This can prevent potential outages and disruptions that can be faced by organizations. As anomalies can occur in any part of the technology stack, pinpointing them in real-time, using advanced analytics and machine learning is crucial. AIOps can accurately detect the actual source which can help IT teams in performing efficient root cause analysis almost in real-time.
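As a concrete, if simplified, example of pinpointing an anomaly in a metric stream, the sketch below uses the modified z-score based on the median absolute deviation (MAD), a standard robust outlier test. The latency values are made up, and this is only one of many techniques an AIOps platform might apply.

```python
from statistics import median


def find_anomalies(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return indexes of points whose modified z-score exceeds the threshold.

    The score is 0.6745 * |x - median| / MAD; 3.5 is the commonly used cut-off.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread, so nothing can be scored
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical response times in milliseconds with one obvious spike.
latency_ms = [120, 118, 125, 122, 119, 121, 480, 123]
spikes = find_anomalies(latency_ms)
```

Using the median rather than the mean keeps the single spike from distorting the baseline it is compared against.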

Threat detection & analysis

Along with anomaly detection, AIOps will play a critical role in enhancing the security of IT infrastructure. Security systems can use ML algorithms and AI’s self-learning capabilities to help the IT teams detect data breaches and violations. By correlating various internal sources like log files, network and event logs, with the external information on malicious IPs and domains, AI can be used to detect anomalies and risk events through analysis. Advanced machine learning algorithms can be used to identify unexpected and potentially unauthorized and malicious activity within the infrastructure.
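The correlation of internal logs with external threat information can be illustrated in a few lines. The log format, the function name and the addresses (taken from the ranges reserved for documentation) are assumptions for this sketch; a real deployment would consume a maintained threat-intelligence feed.

```python
import re
from collections import Counter

# Loose IPv4 matcher; good enough for a sketch, not a strict validator.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def flag_malicious(log_lines: list[str], bad_ips: set[str]) -> dict[str, int]:
    """Count how often each known-bad IP address appears in the logs."""
    hits: Counter = Counter()
    for line in log_lines:
        for ip in IP_PATTERN.findall(line):
            if ip in bad_ips:
                hits[ip] += 1
    return dict(hits)


# Hypothetical log lines and threat list.
logs = [
    "2024-01-01 10:00:01 login failed from 203.0.113.7",
    "2024-01-01 10:00:05 login ok from 192.0.2.10",
    "2024-01-01 10:00:09 login failed from 203.0.113.7",
]
threat_list = {"203.0.113.7", "198.51.100.23"}
suspicious = flag_malicious(logs, threat_list)
```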

Although still early in deployment, companies are taking advantage of AI and machine learning to improve tech support and manage infrastructure.  AIOps, the convergence of AI and IT ops, will change the face of infrastructure management.

AIOps – IT Infrastructure Services for the Digital Age

The IT infrastructure services landscape is undergoing a significant shift, driven by digitalization. As focus shifts from cost efficiency to digital enablement, organizations need to re-imagine the IT infrastructure services model to deliver the necessary back-end agility, flexibility, and fluidity. Automation, analytics, and Artificial Intelligence (AI), which together comprise the “codifying elements” for driving AIOps, help drive the desired level of adaptability within IT infrastructure services. Intelligent automation, leveraging analytics and ML, embeds powerful, real-time business and user context and autonomy into IT infrastructure services. Intelligent automation has made inroads in enterprises in the last two to three years, backed by a rapid proliferation and maturation of solutions in the market.

Artificial Intelligence Operations (AIOps) . Everest Group 2018 Report . IT Infrastructure

Benefits of codification of IT infrastructure services

Progressive leverage of analytics and AI, to drive an AIOps strategy, enables the introduction of a broader and more complex set of operational use cases into IT infrastructure services automation. As adoption levels scale and processes become orchestrated, the benefits potentially expand beyond cost savings to offer exponential value around user experience enrichment, services agility and availability, and operations resilience. Intelligent automation helps maximize value from IT infrastructure services by:

  1. Improving the end-user experience through contextual and personalized support
  2. Driving faster resolution of known/identified incidents leveraging existing knowledge, intelligent diagnosis, and reusable, automated workflows
  3. Avoiding potential incidents and improving business systems performance through contextual learning (i.e., based on relationships among systems), proactive health monitoring and anomaly detection, and preemptive healing

Although the benefits of intelligent automation are manifold, enterprises are yet to realize commensurate advantage from investments in infrastructure services codification. Siloed adoption, lack of well-defined change management processes, and poor governance are some of the key barriers to achieving the expected value.  The design should involve an optimal level of human effort/intervention targeted primarily at training, governing, and enhancing the system, rather than executing routine, voluminous tasks.  A phased adoption of automation, analytics, and AI within IT infrastructure services has the potential to offer exponential business value. However, to realize the full potential of codification, enterprises need to embrace a lean operating model, underpinned by a technology-agnostic platform. The platform should embed the codifying elements within a tightly integrated infrastructure services ecosystem with end-to-end workflow orchestration and resolution.

The market today has a wide choice of AIOps solutions, but the onus is on enterprises to select the right set of tools / technologies that align with their overall codification strategy.

Click here to read the complete whitepaper by Everest Group

Cost effective solutions on AIOps platforms

Digital transformation in IT operations

The global market value of AIOps is predicted to increase from $2.24 billion in 2017 to $9.90 billion by 2023, as per industry reports. IT organizations, globally, are focusing on digital transformation aggressively. Technologies like AI, Big Data and ML are compelling IT operations’ platforms to modify and adapt to multi-cloud infrastructure. With a vision to explore new arenas of opportunity, AIOps can monitor, analyze, correlate and automate, easing IT operations. The focus areas where AIOps plays a key role in enabling digital transformation include:

  1. Open data access, where data can be recorded from various authentic sources and freed from organizational silos for repeated analysis
  2. Big data, which was initially expected to increase the efficiency and decision-making capabilities of enterprises. As data volumes expanded, however, things became complex. Here AIOps improves the ability to handle huge volumes of data, thus expanding the scope of data analysis
  3. ML, which can access data from various sources and modify or create new algorithms without human intervention. AIOps enhances ML’s ability to handle enormous volumes of data while staying aligned with organizational goals
  4. Data analytics, which can solve major data-related problems in the IT domain. On top of that, AIOps can offer a competitive advantage by providing richer business context, shorter response times and the ability to predict potential risks

Scope of AI in IT Ops – is it cost effective?

  • Any organization that sets out to study time and labor management will end up spending massively on both time and money. Here, an application programming interface (API) can help a company complete its reports in no time. This ramps up the pace of report creation, opening up scope for real-time analysis of compliance. Now, that is definitely cost-effective.
  • A global recruitment firm increased its hiring ratio by about 8% through implementation of AI. It helped the firm identify and match the right skill set, along with predicting attrition per resource. This proved cost-effective since attrition cost the organization up to $25,000 per resource.
  • From the operational perspective, in a 24/7 environment, if there is an outage, it will result in a series of logged complaints, which then will become difficult for an individual to manually transcribe. This is where AI plays an important part in identifying the main issues through log analytics.
  • Technology like cognitive insight creates a data pool of a wide range of solutions to critical issues. AI bridges the gap between big data and humans through operational intelligence, accuracy and speed, thus making it cost-effective to a great extent.
  • Enterprises like Dyn and British Airways suffered Distributed Denial of Service (DDoS) attacks, after which they implemented cognitive insight to secure their operations.
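The log analytics mentioned in the outage example above can start from something as simple as grouping similar messages. In this sketch, numbers are masked so that messages differing only in IDs or durations collapse into one template; the log lines are invented, and real log analytics tools use far more capable parsers.

```python
import re
from collections import Counter


def top_issues(log_lines: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Group log messages that differ only in numbers and rank them by count."""
    templates = Counter(
        re.sub(r"\d+", "<N>", line.strip().lower()) for line in log_lines
    )
    return templates.most_common(n)


# Hypothetical complaints logged during an outage.
complaints = [
    "Timeout connecting to db-12 after 30s",
    "Timeout connecting to db-7 after 45s",
    "Disk full on node 3",
    "Timeout connecting to db-3 after 30s",
]
main_issue = top_issues(complaints, n=1)
```

Instead of reading four separate complaints, an engineer sees that one timeout pattern accounts for three of them.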

Cost effective solution of AIOps

Analyzing and managing cost is essential. Doing a cost analysis of cloud with components like IOPS, VMs, storage capacity, bandwidth and API calls can be tricky and complex. AI can help here by breaking the cost down by component, which supports a more accurate IT budget.
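Whatever intelligence is layered on top, the per-component cost analysis described above starts from a plain breakdown of usage multiplied by unit price. The component names and prices below are invented for illustration and do not reflect any provider’s pricing.

```python
def cost_breakdown(usage: dict[str, float], unit_price: dict[str, float]) -> dict[str, float]:
    """Cost per component: quantity used multiplied by its unit price."""
    return {item: round(qty * unit_price[item], 2) for item, qty in usage.items()}


# Hypothetical monthly usage and unit prices in dollars.
monthly_usage = {"vm_hours": 720, "storage_gb": 500, "bandwidth_gb": 200}
prices = {"vm_hours": 0.10, "storage_gb": 0.02, "bandwidth_gb": 0.05}

breakdown = cost_breakdown(monthly_usage, prices)
total = round(sum(breakdown.values()), 2)
```

A breakdown like this is the input on which forecasting or anomaly detection for budgets would then operate.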

  • AI and root-cause analysis
    AI is very effective in the area of root-cause analysis. It is efficient in locating an issue and creating a remediation for it, thus solving complex problems in a short span. AIOps helped a US bank automate root cause correlation and gather data on customer dissatisfaction, thus enhancing customer experience.
  • Threat detection is now a cakewalk
    Through machine learning algorithms, AIOps can learn to detect anomalies and critical issues. GAVS’ security division designed a remedial platform combining ML algorithms and AI’s self-learning capabilities to reduce risk and predict future anomalies on an IT platform, ensuring a secured environment for GAVS’ customers.
  • AIOps and its outage forecasting competences
    AIOps can forecast outages through data prediction and also increase resource utilization by identifying areas for cross-training. The market for forecasting outages through AIOps is expected to grow from $493.7 million in 2016 to $1.14 billion by 2021, as per industry reports.
  • Combining tools for an innovative future
    Automation and collaboration of tools can enhance productivity and accuracy. AIOps powered with big data and ML helps in process automation and is used more as a strategic tool than an operational one. With this merger, data can be analyzed, optimized, and transformed efficiently. At GAVS, the focus is on a “Zero Incident” platform through which GAVS can help enterprises reach a Zero Incident state using the above-mentioned collaboration of tools. This will definitely prove cost-effective and enhance the end-user experience.

Solutions built with innovation and cost-efficiency are the key

In their zeal to enter the digitized innovation arena, organizations are aggressively trying to find cost-effective and reliable solutions. Many companies still rely on age-old machines and processes that require constant monitoring and human intervention; automation of IT operations, however, is a boon, ensuring cost-efficiency across levels.