Algorithmic Alert Correlation

Today’s always-on businesses and 24×7 uptime demands have necessitated IT monitoring to go into overdrive. While constant monitoring is a good thing, the downside is that the flood of alerts generated can quickly get overwhelming. Constantly having to deal with thousands of alerts each day causes alert fatigue, and impacts the overall efficiency of the monitoring process.

Hence, chalking out an optimal strategy for alert generation & management becomes critical. Pattern-based thresholding is an important first step, since it tunes thresholds continuously, to adapt to what ‘normal’ is, for the real-time environment. Threshold accuracy eliminates false positives and prevents alerts from getting fired incorrectly. Selective alert suppression during routine IT Ops maintenance activities like backups, patches, or upgrades, is another. While there are many other strategies to keep alert numbers under control, a key process in alert management is the grouping of alerts, known as alert correlation. It groups similar alerts under one actionable incident, thereby reducing the number of alerts to be handled individually.

But, how is alert ‘similarity’ determined? One way to do this is through similarity definitions, in the context of that IT landscape. A definition, for instance, would group together alerts generated from applications on the same host, or connectivity issues from the same data center. This implies that similarity definitions depend on the physical and logical relationships in the environment – in other words – the topology map. Topology mappers detect dependencies between applications, processes, networks, infrastructure, etc., and construct an enterprise blueprint that is used for alert correlation.

But what about related alerts generated by entities that are neither physically nor logically linked? To give a hypothetical example, let’s say application A accesses a server S which is responding slowly, and so A triggers alert A1. This slow communication of A with S eats up host bandwidth, and hence affects another application B in the same host. Due to this, if a third application C from another host calls B, alert A2 is fired by C due to the delayed response from B.  Now, although we see the link between alerts A1 & A2, they are neither physically nor logically related, so how can they be correlated? In reality, such situations could imply thousands of individual alerts that cannot be combined.

Algorithmic Alert Correlation

This is one of the many challenges in IT operations that we have been trying to solve at GAVS. The correlation engine of our AIOps Platform ZIF uses algorithmic alert correlation to find a solution for this problem. We are working on two unsupervised machine learning algorithms that are fundamentally different in their approach – one based on pattern recognition and the other based on spatial clustering. Both algorithms can function with or without a topology map, and work around what is supplied and available. The pattern learning algorithm derives associations based on learnings from historic patterns of alert relationships. The spatial clustering algorithm works on the principle of similarity based on multiple features of alerts, including problem similarity derived by applying Natural Language Processing (NLP), and relationships, among several others. Tuning parameters enable customization of algorithmic behavior to meet specific demands, without requiring modifications to the core algorithms. Time is also another important dimension factored into these algorithms, since the clustering of alerts generated over an extended period of time will not give meaningful results.

Traditional alert correlation has not been able to scale up to handle the volume and complexity of alerts generated by the modern-day hybrid and dynamic IT infrastructure. We have reached a point where our ITOps needs have surpassed the limits of human capabilities, and so, supplementing our intelligence with Artificial Intelligence and Machine Learning has now become indispensable.

About the Authors –

Padmapriya Sridhar

Priya is part of the Marketing team at GAVS. She is passionate about Technology, Indian Classical Arts, Travel, and Yoga. She aspires to become a Yoga Instructor someday!

Gireesh Sreedhar KP

Gireesh is a part of the projects run in collaboration with IIT Madras for developing AI solutions and algorithms. His interest includes Data Science, Machine Learning, Financial markets, and Geo-politics. He believes that he is competing against himself to become better than who he was yesterday. He aspires to become a well-recognized subject matter expert in the field of Artificial Intelligence.

Data Migration Powered by RPA

What is RPA?

Robotic Process Automation(RPA) is the use of specialized software to automate repetitive tasks. Offloading mundane, tedious grunt work to the software robots frees up employee time to focus on more cerebral tasks with better value-add. So, organizations are looking at RPA as a digital workforce to augment their human resources. Since robots excel at rules-based, structured, high-volume tasks, they help improve business process efficiency, reduce time and operating costs due to the reliability, consistency & speed they bring to the table.

Generally, RPA is low-cost, has faster deployment cycles as compared to other solutions for streamlining business processes, and can be implemented easily. RPA can be thought of as the first step to more transformative automations. With RPA steadily gaining traction, Forrester predicts the RPA Market will reach $2.9 Billion by 2021.

Over the years, RPA has evolved from low-level automation tasks like screen scraping to more cognitive ones where the bots can recognize and process text/audio/video, self-learn and adapt to changes in their environment. Such Automation supercharged by AI is called Intelligent Process Automation.

Use Cases of RPA

Let’s look at a few areas where RPA has resulted in a significant uptick in productivity.

Service Desk – One of the biggest time-guzzlers of customer service teams is sifting through scores ofemails/phone calls/voice notes received every day. RPA can be effectively used to scour them, interpret content, classify/tag/reroute or escalate as appropriate, raise tickets in the logging system and even drive certain routine tasks like password resets to closure!

Claims Processing – This can be used across industries and result in tremendous time and cost savings.This would include interpreting information in the forms, verification of information, authentication of e-signatures & supporting documents, and first level approval/rejection based on the outcome of the verification process.

Data Transfers – RPA is an excellent fit for tasks involving data transfer, to either transfer data on paperto systems for digitization, or to transfer data between systems during data migration processes.

Fraud Detection – Can be a big value-add for banks, credit card/financial services companies as a first lineof defense, when used to monitor account or credit card activity and flag suspicious transactions.

Marketing Activities – Can be a very resourceful member of the marketing team, helping in all activities

right from lead gen, to nurturing leads through the funnel with relevant, personalized, targeted content



RPA can be used to generate reports and analytics on predefined parameters and KPIs, that can help

give insights into the health of the automated process and the effectiveness of the automation itself.

The above use cases are a sample list to highlight the breadth of their capabilities. Here are some industry-specific tasks where RPA can play a significant role.

Banks/Financial Services/Accounting Firms – Account management through its lifecycle, Cardactivation/de-activation, foreign exchange payments, general accounting, operational accounting, KYC digitization

Manufacturing, SCM –Vendor handling, Requisition to Purchase Order, Payment processing, Inventorymanagement

HR – Employee lifecycle management from On-boarding to Offboarding, Resume screening/matching

Data Migration Triggers & Challenges

A common trigger for data migration is when companies want to sunset their legacy systems or integrate them with their new-age applications. For some, there is a legal mandate to retain legacy data, as with patient records or financial information, in which case these organizations might want to move the data to a lower-cost or current platform and then decommission the old system.

This is easier said than done. The legacy systems might have their data in flat files or non-relational DBs or may not have APIs or other standards-based interfaces, making it very hard to access the data. Also, they might be based on old technology platforms that are no longer supported by the vendor. For the same reasons, finding resources with the skillset and expertise to navigate through these systems becomes a challenge.

Two other common triggers for data migrations are mergers/acquisitions which necessitate the merging of systems and data and secondly, digital transformation initiatives. When companies look to modernize their IT landscape, it becomes necessary to standardize applications and remove redundant ones across application silos. Consolidation will be required when there are multiple applications for the same use cases in the merged IT landscape.

Most times such data migrations can quickly spiral into unwieldy projects, due to the sheer number, size, and variety of the systems and data involved, demanding meticulous design and planning. The first step would be to convert all data to a common format before transition to the target system which would need detailed data mappings and data cleansing before and after conversion, making it extremely complex, resource-intensive and expensive.

RPA for Data Migration

Structured processes that can be precisely defined by rules is where RPA excels. So, if the data migration process has clear definitions for the source and target data formats, mappings, workflows, criteria for rollback/commit/exceptions, unit/integration test cases and reporting parameters, half the battle is won. At this point, the software bots can take over!

Another hurdle in humans performing such highly repetitive tasks is mental exhaustion, which can lead to slowing down, errors and inconsistency. Since RPA is unfazed by volume, complexity or monotony, it automatically translates to better process efficiency and cost benefits. Employee productivity also increases because they are not subjected to mind-numbing work and can focus on other interesting tasks on hand. Since the software bots can be configured to create logfiles/reports/dashboards in any format, level of detail & propagation type/frequency, traceability, compliance, and complete visibility into the process are additional happy outcomes!

To RPA or not to RPA?

Well, while RPA holds a lot of promise, there are some things to keep in mind

  • Important to choose the right processes/use-cases to automate, else it could lead to poor ROI
  • Quality of the automation depends heavily on diligent design and planning
  • Integration challenges with other automation tools in the landscape
  • Heightened data security and governance concerns since it will have full access to the data
  • Periodic reviews required to ensure expected RPA behavior
  • Dynamic scalability might be an issue when there are unforeseen spikes in data or usage patterns
  • Lack of flexibility to adapt to changes in underlying systems/platforms could make it unusable

But like all other transformational initiatives, the success of RPA depends on doing the homework right, taking informed decisions, choosing the right vendor(s) and product(s) that align with your Business imperatives, and above all, a whole-hearted buy-in from the business, IT & Security teams and the teams that will be impacted by the RPA.