Monitoring Microservices and Containers

Monitoring applications and infrastructure is a critical part of IT Operations. Among other things, monitoring provides alerts on failures, alerts on deteriorations that could potentially lead to failures, and performance data that can be analysed to gain insights. AI-led IT Ops Platforms like ZIF use such data from their monitoring component to deliver pattern recognition-based predictions and proactive remediation, leading to improved availability, system performance and hence better user experience.

The shift away from monolith applications towards microservices has posed a formidable challenge for monitoring tools. Let’s first take a quick look at what microservices are, to understand better the complications in monitoring them.

Monoliths vs Microservices

In a microservices architecture, a single application (monolith) is split into a number of modular services called microservices, each of which typically caters to one capability of the application. These microservices are loosely coupled, can communicate with each other and can be deployed independently.

Quite likely the trigger for this architecture was the need for agility. Since microservices are stand-alone modules, they can follow their own build/deploy cycles enabling rapid scaling and deployments. They usually have a small codebase which aids easy maintainability and quick recovery from issues. The modularity of these microservices gives complete autonomy over the design, implementation and technology stack used to build them.

Microservices run inside containers that provide their execution environment. Although microservices could also be run in virtual machines (VMs), containers are preferred since they are comparatively lightweight: unlike VMs, they share the host’s operating system. Docker and CoreOS rkt are a couple of commonly used container solutions, while Kubernetes, Docker Swarm, and Apache Mesos are popular container orchestration platforms. The image below depicts microservices for hiring, performance appraisal, rewards & recognition, payroll, analytics and the like linked together to deliver the HR function.

Challenges in Monitoring Microservices and Containers

Since all good things come at a cost, you are probably wondering what it is here… well, the flip side to this evolutionary architecture is increased complexity! These are some contributing factors:

Exponential increase in the number of objects: With each application replaced by multiple microservices, 360-degree visibility and observability into all the services, their interdependencies, their containers/VMs, communication channels, workflows and the like can become very elusive. When one service goes down, the environment gets flooded with notifications not just from the service that is down, but from all services dependent on it as well. Sifting through this cascade of alerts, eliminating noise and zeroing in on the crux of the problem becomes a nightmare.
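
To make the alert cascade concrete, here is a minimal sketch of one way to cut through the noise: keep only those alerting services that have no alerting service upstream of them. The service names and the `depends_on` map are hypothetical, and the sketch assumes the dependency graph is already known and has no cycles; it is not how any particular product does it.

```python
from typing import Dict, Set


def root_cause_alerts(depends_on: Dict[str, Set[str]], alerting: Set[str]) -> Set[str]:
    """Return the alerting services that have no alerting dependency upstream."""

    def has_alerting_upstream(service: str) -> bool:
        # Walk direct and transitive dependencies of `service`.
        seen: Set[str] = set()
        stack = list(depends_on.get(service, ()))
        while stack:
            dep = stack.pop()
            if dep in seen:
                continue
            seen.add(dep)
            if dep in alerting and dep != service:
                return True
            stack.extend(depends_on.get(dep, ()))
        return False

    return {s for s in alerting if not has_alerting_upstream(s)}


# Hypothetical HR application: each service maps to the services it calls.
depends_on = {
    "payroll": {"database"},
    "rewards": {"payroll"},
    "analytics": {"payroll", "hiring"},
    "hiring": set(),
    "database": set(),
}
alerting = {"database", "payroll", "rewards", "analytics"}
print(root_cause_alerts(depends_on, alerting))
```

In this made-up scenario four services raise alerts, but only `database` survives the filter, since every other alerting service sits downstream of it.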

Shared Responsibility: Since processes are fragmented and the responsibility for executing them (for instance, a customer ordering a product online) is shared amongst the services, basic assumptions of traditional monitoring methods are challenged. The lack of a simple linear path, the need to collate data from different services for each process, and the inability to map a client request to a single transaction because of the number of services involved make performance tracking that much more difficult.

Design Differences: Due to the design/implementation autonomy that microservices enjoy, they could come with huge design differences and be implemented using different technology stacks. They might be using open source or third-party software that is difficult to instrument, which in turn affects their monitoring.

Elasticity and Transience: Elastic landscapes, where infrastructure scales up or collapses based on demand and instances appear and disappear dynamically, have changed the game for monitoring tools. These tools need to be updated to handle elastic environments, be container-aware and stay in step with the provisioning layer. A couple of interesting aspects to handle are: recognizing the difference between an instance that is down and an instance that is no longer available because it was de-provisioned; and retaining data from instances that are no longer alive, since it continues to have value for analysis of operational efficiency or past performance.
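
The distinction between "down" and "no longer available" can be sketched in a few lines. The idea, under the assumption that the provisioning layer can tell us which instances are supposed to exist, is to compare that desired set with the heartbeats the monitoring layer has actually seen. All names here (`classify`, `desired`, `last_seen`) are hypothetical.

```python
from enum import Enum
from typing import Dict, Set


class InstanceState(Enum):
    HEALTHY = "healthy"
    DOWN = "down"        # expected to run but not reporting: worth an alert
    RETIRED = "retired"  # removed on purpose: keep its history, raise no alert


def classify(instance_id: str, desired: Set[str], last_seen: Dict[str, float],
             now: float, timeout_s: float = 60.0) -> InstanceState:
    """Combine the provisioning layer's view with observed heartbeats."""
    if instance_id not in desired:
        # The provisioning layer no longer wants this instance (scale-in, move).
        return InstanceState.RETIRED
    fresh = instance_id in last_seen and now - last_seen[instance_id] <= timeout_s
    return InstanceState.HEALTHY if fresh else InstanceState.DOWN


# Hypothetical snapshot: "web-3" was scaled in, "web-2" has stopped reporting.
desired = {"web-1", "web-2"}
last_seen = {"web-1": 995.0, "web-2": 800.0, "web-3": 700.0}
for instance in ("web-1", "web-2", "web-3"):
    print(instance, classify(instance, desired, last_seen, now=1000.0).value)
```

Note that `last_seen` still holds an entry for the retired instance, which is in line with the point that data from dead instances keeps its analytical value.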

Mobility: This is another dimension of dynamic infra, where objects don’t necessarily stay in the same place. They might be moved between data centers or clouds for better load balancing, for maintenance needs or because of outages. The monitoring layer needs to arm itself with new strategies to handle moving targets.

Resource Abstraction: Microservices deployed in containers do not have a direct relationship with their host or the underlying operating system. This abstraction is what helps seamless migration between hosts but comes at the expense of complicating monitoring.

Communication over the network: The many moving parts of distributed applications rely completely on network communication. Consequently, the increase in network traffic puts a heavy strain on network resources necessitating intensive network monitoring and a focused effort to maintain network health.

What needs to be measured

This is a high-level laundry list of what needs to be done/measured while monitoring microservices and their containers.

Auto-discovery of containers and microservices:

As we’ve seen, monitoring microservices in a containerized world is a whole new ball game. In the highly distributed, dynamic infra environment where ephemeral containers scale, shrink and move between nodes on demand, traditional monitoring methods using agents to get information will not work. The monitoring system needs to automatically discover and track the creation/destruction of containers and explore services running in them.
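
As a rough illustration of tracking container creation and destruction, the sketch below compares two successive inventory snapshots and emits discovery events. How the snapshots are obtained (container runtime, orchestrator API, and so on) is deliberately left out; the `Snapshot` shape and the container IDs are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]  # container id -> name of the service running in it


def discovery_events(previous: Snapshot, current: Snapshot) -> List[Tuple[str, str, str]]:
    """Compare two inventory snapshots and report what appeared or vanished."""
    events = []
    for container_id in sorted(current.keys() - previous.keys()):
        events.append(("created", container_id, current[container_id]))
    for container_id in sorted(previous.keys() - current.keys()):
        events.append(("destroyed", container_id, previous[container_id]))
    return events


# Hypothetical snapshots taken a few seconds apart.
before = {"c1": "payroll", "c2": "hiring"}
after = {"c2": "hiring", "c3": "payroll", "c4": "analytics"}
for event in discovery_events(before, after):
    print(event)
```

Polling snapshots is the simplest approach; subscribing to the platform's event stream, where one exists, avoids missing containers that live and die between two polls.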

Microservices:

  • Availability and performance of individual services
  • Host and infrastructure metrics
  • Microservice metrics
  • APIs and API transactions
    • Ensure API transactions are available and stable
    • Isolate problematic transactions and endpoints
  • Dependency mapping and correlation
  • Features relating to traditional APM

Containers:

  • Detailed information relating to each container
  • Health of clusters, master and slave nodes
  • Number of clusters
  • Nodes per cluster
  • Containers per cluster
  • Performance of core Docker engine
  • Performance of container instances

Things to consider while adapting to the new IT landscape

Granularity and Aggregation: With the increase in the number of objects in the system, it is important to first understand the performance target of what’s being measured. For instance, if a service targets 99% yearly uptime (which allows about 3.65 days of downtime a year), polling it every minute would be overkill. Based on this, data granularity needs to be set prudently for each aspect measured, and data can be aggregated where appropriate. This is to prevent data inundation that could overwhelm the monitoring module and drive up costs associated with data collection, storage, and management.
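
The arithmetic behind the uptime example, plus a very simple form of aggregation, can be sketched as follows. The function names are made up for illustration, and averaging fixed-size buckets is only one of many possible roll-up strategies.

```python
from typing import List, Sequence


def downtime_budget_hours(uptime_target: float, period_hours: float = 365 * 24) -> float:
    """Hours of downtime a target tolerates over the period (one year by default)."""
    return (1.0 - uptime_target) * period_hours


def downsample_mean(samples: Sequence[float], bucket_size: int) -> List[float]:
    """Aggregate fine-grained samples into one averaged value per bucket."""
    buckets = [samples[i:i + bucket_size] for i in range(0, len(samples), bucket_size)]
    return [sum(bucket) / len(bucket) for bucket in buckets]


# A 99% yearly target tolerates about 87.6 hours of downtime.
print(round(downtime_budget_hours(0.99), 1))

# Six one-minute latency samples rolled up into two three-minute averages.
print(downsample_mean([10, 20, 30, 40, 50, 60], bucket_size=3))
```

With a budget measured in days rather than seconds, a coarser polling interval and rolled-up storage lose very little useful information.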

Monitor Containers: The USP of containers is the abstraction they provide to microservices, encapsulating and shielding them from the details of the host or operating system. While this makes microservices portable, it makes them hard to reach for monitoring. Two recommended solutions are, first, to instrument the microservice code to generate stats and/or traces for all actions (these can be used for distributed tracing) and, second, to get all container activity information through host operating system instrumentation.
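
Here is a minimal sketch of the first option, code-level instrumentation. It is a hand-rolled decorator that records a span (operation name, duration, trace ID) for every call and shares one trace ID across nested calls. It works within a single process only; a real deployment would more likely use a tracing library and pass the trace ID between services in request headers. `traced`, `SPANS` and the two example functions are hypothetical.

```python
import functools
import time
import uuid
from contextvars import ContextVar

_trace_id: ContextVar = ContextVar("trace_id", default=None)
SPANS = []  # stand-in for an exporter that would ship spans to a tracing backend


def traced(operation: str):
    """Record one span per call; nested calls share the caller's trace id."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            trace_id = _trace_id.get()
            token = None
            if trace_id is None:  # outermost call starts a new trace
                trace_id = uuid.uuid4().hex
                token = _trace_id.set(trace_id)
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                SPANS.append({
                    "trace_id": trace_id,
                    "operation": operation,
                    "duration_s": time.perf_counter() - start,
                })
                if token is not None:
                    _trace_id.reset(token)
        return wrapper
    return decorator


@traced("charge_card")
def charge_card(amount: float) -> bool:
    return amount > 0


@traced("place_order")
def place_order(amount: float) -> bool:
    return charge_card(amount)


place_order(25.0)
for span in SPANS:
    print(span["operation"], span["trace_id"])
```

Both spans carry the same trace ID, which is what lets a tracing backend stitch the fragments of one client request back together.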

Track Services through the Container Orchestration Platform: While we could obtain container-level data from the host kernel, it wouldn’t give us holistic information about the service since there could be several containers that constitute a service. Container-native monitoring solutions could use metadata from the container orchestration platform by drilling into appropriate layers of the platform to obtain service-level metrics. 
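
The roll-up from container-level data to service-level metrics might look like the sketch below. It assumes two inputs that are not specified in this article: per-container CPU readings (for example, collected from the host) and a container-to-service mapping taken from the orchestration platform's metadata. Both dictionaries are invented for the example.

```python
from collections import defaultdict
from typing import Dict


def service_metrics(container_cpu: Dict[str, float],
                    container_service: Dict[str, str]) -> Dict[str, dict]:
    """Roll per-container CPU readings up to the service each container belongs to."""
    totals = defaultdict(lambda: {"containers": 0, "cpu_total": 0.0})
    for container_id, cpu in container_cpu.items():
        # Containers the orchestrator metadata does not know about are grouped together.
        service = container_service.get(container_id, "unknown")
        totals[service]["containers"] += 1
        totals[service]["cpu_total"] += cpu
    return {
        service: {**stats, "cpu_avg": stats["cpu_total"] / stats["containers"]}
        for service, stats in totals.items()
    }


# Hypothetical readings (CPU cores used) and orchestrator metadata.
container_cpu = {"c1": 0.5, "c2": 0.25, "c3": 0.75}
container_service = {"c1": "payroll", "c2": "payroll", "c3": "hiring"}
print(service_metrics(container_cpu, container_service))
```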

Adapt to dynamic IT landscapes: As mentioned earlier, today’s IT landscape is dynamically provisioned, elastic and characterized by mobile and transient objects. Monitoring systems themselves need to be elastic and deployable across multiple locations to cater to distributed systems and leverage native monitoring solutions for private clouds.

API Monitoring: Monitoring APIs can provide a wealth of information in the black box world of containers. Tracking API calls from the different entities – microservices, container solution, container orchestration platform, provisioning system, host kernel can help extract meaningful information and make sense of the fickle environment.

Watch this space for more on Monitoring and other IT Ops topics. Our blog on Monitoring for Success gives an overview of the Monitor component of GAVS’ AIOps Platform, Zero Incident Framework™ (ZIF).

About the Author:

Sivaprakash Krishnan


Bio – Siva is a long timer at Gavs and has been with the company for close to 15 years. He started his career as a developer and is now an architect with a strong technology background in Java, Big Data, DevOps, Cloud Computing, Containers and Micro Services. He has successfully designed & created a stable Monitoring Platform for ZIF, and designed & driven cloud assessment and migration, enterprise BRMS and IoT based solutions for many of our customers. He is currently focused on building ZIF 4.0, a new gen business-oriented TechOps platform.

Padmapriya Sridhar


Bio – Priya is part of the Marketing team at GAVS. She is passionate about Technology, Indian Classical Arts, Travel and Yoga. She aspires to become a Yoga Instructor some day!

Can automation manage system alerts?

System alerts and critical alerts

One of the most important roles of an IT professional is to handle incoming alerts efficiently and effectively. This helps keep the environment free of threats and reduces the chances of system outages. Not all incoming alerts are critical, though; an alert can simply pop up in a window for a user to act on, blocking the underlying webpage. Alerts can also be configured for automatic resolution, where an alert is closed automatically after a set number of days.

Can automation manage system alerts?

Gradually, many companies are incorporating automation into the management of system alerts. The age-old approach of monitoring systems for both internal and external alerts is not effective in streamlining the actual process of managing these incoming alerts. Here, IT process automation (ITPA) can take incident management to a whole new level. Automation in collaboration with monitoring tools can identify, analyze and finally prioritize incoming alerts while sending notifications so that the issue gets fixed. Such notifications can be customized depending on the recipient’s preferred mode of communication. It is also worth mentioning that automated workflows can be created to open, update and close tickets in the service desk, minimizing human intervention while resolving issues electronically.

Integration of a monitoring system with automation

Automation of system alerts happens through the following workflow. It greatly improves incident management, reducing human intervention and refining the quality of monitoring.

  1. The monitoring system detects an incident within the IT infrastructure and triggers an alert.
  2. The alert is picked up by the automation software, which then generates a trouble ticket in the service desk.
  3. The affected users are then notified via their preferred method of communication.
  4. The network admin is then notified by ITPA to address the issue and recover.
  5. The service ticket is accordingly updated through implementation of automation.
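
The five steps above can be sketched in code roughly as follows. `AlertWorkflow` and `Ticket` are hypothetical stand-ins for the automation software and the service desk, and the `notify` callback stands in for whatever communication channel is preferred; none of this reflects a specific product's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Ticket:
    ticket_id: int
    summary: str
    status: str = "open"
    history: List[str] = field(default_factory=list)


class AlertWorkflow:
    """Hypothetical glue between a monitoring system and a service desk."""

    def __init__(self, notify: Callable[[str, str], None]) -> None:
        self._notify = notify  # notify(recipient, message); the channel is up to the caller
        self._next_id = 1
        self.tickets: Dict[int, Ticket] = {}

    def handle_alert(self, summary: str, affected_users: List[str], admin: str) -> Ticket:
        # Step 1 happened upstream: the monitoring system raised the alert.
        # Step 2: generate a trouble ticket for the alert.
        ticket = Ticket(self._next_id, summary)
        self._next_id += 1
        self.tickets[ticket.ticket_id] = ticket
        ticket.history.append("ticket opened")
        # Step 3: notify the affected users.
        for user in affected_users:
            self._notify(user, f"Incident #{ticket.ticket_id}: {summary}")
        # Step 4: notify the admin who has to fix the issue.
        self._notify(admin, f"Please investigate incident #{ticket.ticket_id}: {summary}")
        # Step 5: update the ticket automatically.
        ticket.status = "assigned"
        ticket.history.append(f"assigned to {admin}")
        return ticket

    def resolve(self, ticket_id: int, note: str) -> Ticket:
        ticket = self.tickets[ticket_id]
        ticket.status = "closed"
        ticket.history.append(f"closed: {note}")
        return ticket


sent = []
workflow = AlertWorkflow(notify=lambda recipient, message: sent.append((recipient, message)))
ticket = workflow.handle_alert("Disk usage above 95% on db-01", ["alice", "bob"], admin="netadmin")
workflow.resolve(ticket.ticket_id, "old logs archived")
print(ticket.status, ticket.history)
```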

Benefits of automation to manage system alerts

Relying on a manually performed process, especially while dealing with critical information in a workflow, can be difficult. In such a scenario, automated monitoring of critical data in business systems like accounting, CRM, ERP or warehousing can improve consistency. It can also recognize significant or critical data changes immediately and trigger notifications for them. With this 360-degree visibility of critical information, decision making can happen a lot faster, which in the long run can forestall serious crises. It also improves the overall performance of the company and its customer service, and reduces financial risk due to anomalies and security threats. Hence, automation of system alerts can effectively reduce response and resolution time. It can also lessen system downtime and improve MTTR.

BPA platform’s role to manage system alerts

The business process automation (BPA) platform enables multi-recipient capabilities so that notification can be sent to employees across different verticals. This will increase their visibility on real-time information that is relevant to their organizational role. This platform also provides escalation capabilities where notification will be sent to higher management if an alert is not addressed on time.
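
Multi-recipient notification with escalation can be expressed as a small rule table. The sketch below is an assumption about how such rules might be modelled, not a description of any BPA product: each level of a hypothetical `CHAIN` lists who is notified once an unacknowledged alert has been open for a given number of minutes.

```python
from typing import List, Sequence, Tuple

EscalationChain = Sequence[Tuple[int, List[str]]]  # (minutes open, recipients)


def recipients_to_notify(minutes_open: int, acknowledged: bool,
                         chain: EscalationChain) -> List[str]:
    """Everyone who should have been notified by now for an unacknowledged alert."""
    if acknowledged:
        return []  # someone is on it, so no further escalation
    recipients: List[str] = []
    for threshold_minutes, people in chain:
        if minutes_open >= threshold_minutes:
            recipients.extend(people)
    return recipients


# Hypothetical chain: two teams immediately, management if nobody reacts in time.
CHAIN = [
    (0, ["warehouse-team", "finance-team"]),
    (30, ["ops-manager"]),
    (120, ["it-director"]),
]
print(recipients_to_notify(45, acknowledged=False, chain=CHAIN))
```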

Conclusion

For large-scale organizations, the number of alerts spotted by detection tools keeps growing with time. This has inspired IT enterprises to automate security control configurations and implement responsive security analysis tasks. Through automation of security controls and processes, a new firewall rule can be automatically created or deleted based on alerts. Once a threat is detected, an automated response is created. We can conclude that automation can manage system alerts efficiently and effectively, and a pre-built workflow often helps to jump-start the automated handling of a system alert.


AIOps Trends in 2019

Adoption of AIOps by organizations

Artificial Intelligence for IT operations (AIOps) is rapidly picking up pace alongside digital transformation. Over the years, there has been a paradigm shift in enterprise applications and IT infrastructure. With a mindset to enhance the flexibility and agility of business processes, organizations are readily adopting cloud platforms to provision their on-premise software. Implementation of technologies like AIOps and hybrid environments has helped organizations gauge their operational challenges and reduce their operational costs considerably. It helps enterprises in:

  • Resource utilization
  • Capacity planning
  • Anomaly detection
  • Threat detection
  • Storage management
  • Cognitive analysis

In fact, according to Gartner’s prediction, by 2022, 40% of medium and large-scale enterprises will adopt artificial intelligence (AI) to increase IT productivity.

AIOps Market forecast

According to Infoholic Research, the AIOps market is expected to reach approximately $14 billion by 2024, growing at a CAGR of 33.08% between 2018 and 2024. The companies that will provide AIOps solutions to enhance IT operations management in 2019 include BMC Software, IBM, GAVS Technologies, Splunk, FixStream, Loom Systems and Micro Focus. By the end of 2019, the US alone is expected to contribute over 30% of the growth in AIOps, and it will also help the global IT industry reach over $5,000 billion by the end of this year. Research conducted by Infoholic also confirmed that AIOps has been implemented by 60% of organizations to reduce alert noise and perform root cause analysis in real time.

Changes initiated by enterprises to adopt AIOps

2019 will be the year to reveal the true value of AIOps through its applications. By now, organizations have realized that context and efficient integrations with existing systems are essential to successfully implement AIOps.

1. Data storage

Since AIOps needs to operate on a large amount of data, it is essential that enterprises absorb data from reliable and disparate sources, which can then be contextualized for use in AI and ML applications. For this process to work seamlessly, data must be stored in modern data lakes so that it is free from traditional silos.

2. Technology partnership

Maintaining data accuracy is a constant struggle. To overcome such complexity, in 2019 there will be technology partnerships between companies to deal with customer demands for better application programming interfaces (APIs).

3. Automation of menial tasks

Organizations are trying to automate menial tasks to increase agility by freeing up resources. Through automation, organizations can explore a wide range of opportunities in AIOps that will increase their efficiency.

4. Streamlining of people, process and tools

Although multi-cloud solutions provide flexibility and cost-efficiency, they can be challenging to manage without proper monitoring tools. Hence, enterprises are trying to streamline their people, process and tools to create a single, silo-free overview to benefit from AIOps.

5. Use of real-time data

Enterprises are trying to ingest and use real-time data for event correlation and immediate anomaly detection since, with the current industrial pace, old data is useless to the market.
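
As a small illustration of anomaly detection on a live stream, the sketch below flags a reading that sits far from the rolling mean of recent readings. A rolling z-score is just one simple technique chosen for the example; the class name, window size and threshold are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flag a reading that lies far from the mean of the most recent readings."""

    def __init__(self, window: int = 30, threshold: float = 3.0, min_samples: int = 5) -> None:
        self.values = deque(maxlen=window)  # only the latest `window` readings are kept
        self.threshold = threshold
        self.min_samples = min_samples

    def observe(self, value: float) -> bool:
        is_anomaly = False
        if len(self.values) >= self.min_samples:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            # Skip the check while the window is perfectly flat (sigma == 0).
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                is_anomaly = True
        self.values.append(value)
        return is_anomaly


# Hypothetical response times in milliseconds arriving one by one.
detector = RollingAnomalyDetector()
for reading in [10, 11, 9, 10, 11, 10, 9, 10, 50]:
    if detector.observe(reading):
        print("anomaly:", reading)
```

Because the window only keeps recent readings, the baseline follows the data as it arrives, which is the point of working on real-time rather than stale data.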

6. Usage of self-discovery tools

Organizations are trying to introduce self-discovery tools in order to overcome the shortage of data scientists in the market and of IT personnel with the coding skills to monitor the process. The self-discovery tools can operate without human intervention.

Conclusion

Between 2018 and 2024, the global AIOps market value of real time analytics and application performance management is expected to grow at a rapid pace. It is also observed that currently only 5% of large IT firms have adopted AIOps platforms, due to lack of knowledge and assumptions about cost-effectiveness. However, this percentage is expected to reach 40% by 2022. Companies like CA Technologies, GAVS Technologies, Loom Systems and ScienceLogic have designed tools to simplify AIOps deployment, and it is anticipated that over the next three years, there will be sizable progress in the AIOps market.


Pivotal Role of AI and Machine Learning in Industry 4.0 and Manufacturing

Industry 4.0 is a name given to the current trend of automation and data exchange in manufacturing technologies. It includes cyber-physical systems, the Internet of Things, cloud computing and cognitive computing. Industry 4.0 is commonly referred to as the fourth industrial revolution.

Industry 4.0 is paving the path for digitization of the manufacturing sector, where artificial intelligence (AI) and machine-learning based systems are not only changing the ways we interact with information and computers but also revolutionizing them.

Compelling reasons for most companies to shift towards Industry 4.0 and automate manufacturing include:

  • Increase productivity
  • Minimize human / manual errors
  • Optimize production costs
  • Focus human efforts on non-repetitive tasks to improve efficiency

Manufacturing is now being driven by effective data management and AI that will decide its future. The more data sets computers are fed, the more they can observe trends, learn and make decisions that benefit the manufacturing organization. This automation will help to predict failures more accurately, predict workloads, detect and anticipate problems to achieve Zero Incidence.

GAVS’ proprietary AIOps based TechOps platform, Zero Incident Framework™ (ZIF), can successfully integrate AI and machine learning into the workflow, allowing manufacturers to build robust technology foundations.

To maximize the many opportunities presented by Industry 4.0, manufacturers need to build a system with the entire production process in mind as it requires collaboration across the entire supply chain cycle.

Top ways in which ZIF’s expertise in AI and ML is revolutionizing the manufacturing sector:

  • Asset management, supply chain management and inventory management are the dominant areas of artificial intelligence, machine learning and IoT adoption in manufacturing today. By combining these emerging technologies, manufacturers can improve asset tracking accuracy, supply chain visibility, and inventory optimization.
  • Improve predictive maintenance through better adoption of ML techniques like analytics, Machine Intelligence driven processes and quality optimization.
  • Reduce supply chain forecasting errors and lost sales, improving product availability.
  • Real-time monitoring of the operational loads on the production floor provides insights into production schedule performance.
  • Achieve significant reduction in test and calibration time via accurate prediction of calibration and test results using machine learning.
  • By combining ML and Overall Equipment Effectiveness (OEE), manufacturers can improve yield rates, preventative maintenance accuracy and the workloads handled by their assets. OEE is a universally used metric in manufacturing as it combines availability, performance, and quality to define production effectiveness.
  • Improve the accuracy of detecting the costs of performance degradation across multiple manufacturing scenarios, which can reduce costs by 50% or more.
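
For readers unfamiliar with the OEE metric mentioned in the list above, here is a small sketch of the commonly used calculation, in which OEE is the product of availability, performance and quality. The parameter names and the shift figures are invented for illustration.

```python
def oee(planned_time_min: float, run_time_min: float, ideal_cycle_time_min: float,
        total_count: int, good_count: int) -> dict:
    """Overall Equipment Effectiveness as availability x performance x quality."""
    availability = run_time_min / planned_time_min                     # share of planned time actually run
    performance = (ideal_cycle_time_min * total_count) / run_time_min  # speed relative to the ideal rate
    quality = good_count / total_count                                 # share of output that is good
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


# Hypothetical shift: 500 planned minutes, 400 minutes of actual running,
# an ideal cycle of 0.5 minutes per part, 720 parts made, 684 of them good.
result = oee(500, 400, 0.5, 720, 684)
print({name: round(value, 3) for name, value in result.items()})
```

In this made-up shift, availability is 0.8, performance 0.9 and quality 0.95, giving an OEE of roughly 0.684. Seeing the three factors side by side shows where the losses come from.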

Direct benefits of Machine Learning and AI for Manufacturing

The introduction of AI and Machine Learning to Industry 4.0 represents a big change for manufacturing companies that can open new business opportunities and result in advantages like efficiency improvements, among others.

  • Cost reduction through Predictive Maintenance that leads to less maintenance activity, which means lower labor costs, reduced inventory and materials wastage.
  • Predicting Remaining Useful Life (RUL): Keeping tabs on the behavior of machines and equipment helps create conditions that improve performance while maintaining machine health. Predicting RUL reduces the scenarios that cause unplanned downtime.
  • Improved supply chain management through efficient inventory management and a well monitored and synchronized production flow.
  • Autonomous equipment and vehicles: Use of autonomous cranes and trucks to streamline operations as they accept containers from transport vehicles, ships, trucks etc.
  • Better Quality Control with actionable insights to constantly raise product quality.
  • Improved human-machine collaboration while improving employee safety conditions and boosting overall efficiency.
  • Consumer-focused manufacturing: Being able to respond quickly to changes in the market demand.
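
To give a feel for the Remaining Useful Life idea from the list above, the sketch below fits a straight line to a degradation indicator and extrapolates to a failure threshold. This is a deliberately naive stand-in for the machine learning models the article has in mind; the vibration readings and the threshold are hypothetical.

```python
from typing import Optional, Sequence


def estimate_rul(times: Sequence[float], readings: Sequence[float],
                 failure_threshold: float) -> Optional[float]:
    """Fit a least-squares line to the readings and extrapolate to the threshold.

    Returns the time left after the last reading, or None when the readings
    show no upward trend towards the threshold.
    """
    n = len(times)
    t_mean = sum(times) / n
    r_mean = sum(readings) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (r - r_mean) for t, r in zip(times, readings))
    slope = sxy / sxx
    if slope <= 0:
        return None  # not degrading, so no failure time can be extrapolated
    intercept = r_mean - slope * t_mean
    time_of_failure = (failure_threshold - intercept) / slope
    return max(0.0, time_of_failure - times[-1])


# Hypothetical vibration readings (mm/s) taken every 10 days; failure assumed at 5.0 mm/s.
days = [0, 10, 20, 30]
vibration = [2.0, 2.5, 3.0, 3.5]
print(estimate_rul(days, vibration, failure_threshold=5.0))
```

With these invented readings the trend reaches the threshold about 30 days after the last measurement, which is the window available to schedule maintenance before an unplanned stop.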

Touch base with GAVS AI experts here: https://www.gavstech.com/reaching-us/ and see how we can help you drive your manufacturing operation towards Industry 4.0.