Modern IT Infrastructure

Infrastructure today has grown beyond the physical confines of the traditional data center, has spread its wings to the cloud, and is increasingly distributed, virtual, and abstract. With the cloud gaining wide acceptance, most enterprises have their workloads spread across data centers, colocations, multi-cloud, and edge locations. On-premise infrastructure is also being replaced by Hyperconverged Infrastructure (HCI) where software-defined, virtualized compute, storage, and network are in one single system, greatly simplifying IT operations. Infrastructure is also becoming increasingly elastic, scales & shrinks on demand and doesn’t have to be provisioned upfront.

Let’s look at a few interesting technologies that are steering the modern IT landscape.

Containers and Serverless

Traditional application deployment on physical servers comes with the overhead of managing the infrastructure, middleware, development tools, and everything in between. Application developers would rather have this grunt work be handled by someone else, so they could focus on just their applications. This is where containers and serverless technologies come into picture. Both are cloud-based offerings and provide different levels of abstraction, in a way that hides layers beyond the front end, from the developer. They typically deploy smaller components of monolithic applications, microservices, and functions.

A Container is like an all-in-one-box, containing the app, and all its dependencies like libraries, executables & config files. The containerized application is highly portable, will run anywhere the container runtime is installed, and behave the same regardless of the OS or hardware it is deployed on. Containers give developers great flexibility and control since they cater to specific application requirements like the OS, S/W versions. The flip side is that there is still a need for manual maintenance of the runtime environment, like security patches, software updates, etc. Secondly, the flexibility it affords translates into high operational costs, since it lacks agility in scaling.

Serverless technologies provide much greater abstraction of the OS and infrastructure. ‘Serverless’ though, does not imply that there are no servers, it just means application developers do not have to worry about the underlying OS, the server environment, or the infra that their applications will be deployed on. Serverless is event-driven and is based on the premise that the application is split into functions that get executed based on events. The developer only needs to deploy function code and define the event(s) that will trigger them! The rest of the magic is done by the cloud service provider (with the help of third parties). 

The biggest advantage of serverless is that consumers are billed only for the running time of the function instances or the number of times the function gets executed, depending on the provider. Since it has zero administrative overhead, it guarantees rapid iterative deployment and faster time to market. Since the architecture is intrinsically auto-scaling, it is a perfect fit for applications with undefinable usage patterns. The other side of the coin is that developers need to deal with a black box back-end environment, so, holistic testing, debugging of the application becomes a challenge. Vendor lock-in is a real problem since the consumer is restricted by the technology stack supported by the vendor. Since serverless best practices dictate light, isolated functions with limited scope, building complex applications can get difficult. Function as a Service (FaaS) is a subset of serverless computing.

Internet of Things (IoT)

IoT is about connecting everyday things – beyond just computing devices or smartphones – to the internet. It is possible to convert practically anything into an IoT device, with a computer chip installation & internet access, and have it communicate independently with the internet – without any human intervention. But why would we want everyday things like for instance a watch or a light bulb, to become IoT devices? It’s in a bid to bridge the chasm between the physical and digital worlds and make the environment around us more intelligent, communicative, and responsive to our needs.

IoT’s use cases are just about everywhere; in personal devices, self-driving cars, smart homes, smart workspaces, smart cities, and industries across all verticals. For instance, live data from sensors in products while in use, gives good visibility into their operations on the ground, helps remediate issues proactively & aids improvements in design/manufacturing processes.

The Industrial Internet of Things (IIoT) is the use of IoT data in business, in tandem with Big Data, AI, Analytics, Cloud, and High-speed networks, with the primary goal of finding efficient business models to improve productivity & optimize expenditure. The need for real-time response to sensor data and advanced analytics to power insights has increased the demand for 5G networks for speed, cloud technologies for storage and computing, edge computing to reduce latency, and hyper-scale data centers for rapid scaling.

With IoT devices extending an organization’s infrastructure landscape, and the likelihood that IT staff may not even be aware of all the IoT devices in it is a security nightmare that could open corporate networks & sensitive data for attacks. Global standards and regulations for IoT device security are in the works. Until then, it is up to the enterprise security team to safeguard against IoT-related vulnerabilities.


The ability of infrastructure to rapidly scale out on a massive level is called hyperscaling.

Unprecedented needs for high-power computing and on-demand massive scalability has given rise to a new breed of hyperscale computing architectures, where traditional elements are replaced by hyper-converged, software-defined infrastructure with a high degree of virtualization. These hyperscale environments are characterized by high-density server racks, with software designed and specifically built for scale-out environments. Since high-density implies heavy power consumption, heating problems need to be handled by specialized cooling solutions like liquid cooling. Hyperscale data centre operators usually look for renewable energy options to save on power & cooling.

Today, there are several hundred hyperscale data centers in the world, with the dominant players being Microsoft, Google, Apple, Amazon & Facebook.

Edge Computing

Edge computing as the name indicates means moving data processing away from distant servers or the cloud, closer to the source of data.  This is to reduce latency and network bandwidth used for back & forth communication between the data source and the server. Edge, also called the network edge refers to where the data source connects to the internet. The explosive growth of IoT and applications like self-driving cars, virtual reality, smart cities for instance, that require real-time computing and analytics are paving the way for edge computing. Most cloud providers now provide geographically distributed edge servers. As with IoT devices, data at the edge can be a ticking security time bomb necessitating appropriate security mechanisms.

The evolution of IT technologies continuously raises the bar for the IT team. IT personnel have been forced to move beyond legacy practices and mindsets & constantly up-skill themselves to be able to ride the wave. For customers pampered by sophisticated technologies, round the clock availability of systems and immersive experiences have become baseline expectations. With more & more digitalization, there is increasing reliance on IT infrastructure and hence lesser tolerance for outages. The responsibilities of maintaining a high-performing IT infrastructure with near-zero downtime fall on the shoulders of the IT operations team.

This has underscored the importance of AI in IT operations since IT needs have now surpassed human capabilities. Gavs’ AI-powered Platform for IT operations, ZIF, caters to the entire ITOps spectrum, right from automated discovery of the landscape, monitoring, to predictive and prescriptive analytics that proactively drive the organization towards zero incidents. For more details, please visit

About the Author:

Padmapriya Sridhar

Priya is part of the Marketing team at GAVS. She is passionate about Technology, Indian Classical Arts, Travel, and Yoga. She aspires to become a Yoga Instructor someday!

Monitoring Microservices and Containers

Monitoring applications and infrastructure is a critical part of IT Operations. Among other things, monitoring provides alerts on failures, alerts on deteriorations that could potentially lead to failures, and performance data that can be analysed to gain insights. AI-led IT Ops Platforms like ZIF use such data from their monitoring component to deliver pattern recognition-based predictions and proactive remediation, leading to improved availability, system performance and hence better user experience.

The shift away from monolith applications towards microservices has posed a formidable challenge for monitoring tools. Let’s first take a quick look at what microservices are, to understand better the complications in monitoring them.

Monoliths vs Microservices

A single application(monolith) is split into a number of modular services called microservices, each of which typically caters to one capability of the application. These microservices are loosely coupled, can communicate with each other and can be deployed independently.

Quite likely the trigger for this architecture was the need for agility. Since microservices are stand-alone modules, they can follow their own build/deploy cycles enabling rapid scaling and deployments. They usually have a small codebase which aids easy maintainability and quick recovery from issues. The modularity of these microservices gives complete autonomy over the design, implementation and technology stack used to build them.

Microservices run inside containers that provide their execution environment. Although microservices could also be run in virtual machines(VMs), containers are preferred since they are comparatively lightweight as they share the host’s operating system, unlike VMs. Docker and CoreOS Rkt are a couple of commonly used container solutions while Kubernetes, Docker Swarm, and Apache Mesos are popular container orchestration platforms. The image below depicts microservices for hiring, performance appraisal, rewards & recognition, payroll, analytics and the like linked together to deliver the HR function.

Challenges in Monitoring Microservices and Containers

Since all good things come at a cost, you are probably wondering what it is here… well, the flip side to this evolutionary architecture is increased complexity! These are some contributing factors:

Exponential increase in the number of objects: With each application replaced by multiple microservices, 360-degree visibility and observability into all the services, their interdependencies, their containers/VMs, communication channels, workflows and the like can become very elusive. When one service goes down, the environment gets flooded with notifications not just from the service that is down, but from all services dependent on it as well. Sifting through this cascade of alerts, eliminating noise and zeroing in on the crux of the problem becomes a nightmare.

Shared Responsibility: Since processes are fragmented and the responsibility for their execution, like for instance a customer ordering a product online, is shared amongst the services, basic assumptions of traditional monitoring methods are challenged. The lack of a simple linear path, the need to collate data from different services for each process, inability to map a client request to a single transaction because of the number of services involved make performance tracking that much more difficult.

Design Differences: Due to the design/implementation autonomy that microservices enjoy, they could come with huge design differences, and implemented using different technology stacks. They might be using open source or third-party software that makes it difficult to instrument their code, which in turn affects their monitoring.

Elasticity and Transience: Elastic landscapes where infrastructure scales or collapses based on demand, instances appear & disappear dynamically, have changed the game for monitoring tools. They need to be updated to handle elastic environments, be container-aware and stay in-step with the provisioning layer. A couple of interesting aspects to handle are: recognizing the difference between an instance that is down versus an instance that is no longer available; data of instances that are no longer alive continue to have value for analysis of operational efficiency or past performance.

Mobility: This is another dimension of dynamic infra where objects don’t necessarily stay in the same place, they might be moved between data centers or clouds for better load balancing, maintenance needs or outages. The monitoring layer needs to arm itself with new strategies to handle moving targets.

Resource Abstraction: Microservices deployed in containers do not have a direct relationship with their host or the underlying operating system. This abstraction is what helps seamless migration between hosts but comes at the expense of complicating monitoring.

Communication over the network: The many moving parts of distributed applications rely completely on network communication. Consequently, the increase in network traffic puts a heavy strain on network resources necessitating intensive network monitoring and a focused effort to maintain network health.

What needs to be measured

This is a high-level laundry list of what needs to be done/measured while monitoring microservices and their containers.

Auto-discovery of containers and microservices:

As we’ve seen, monitoring microservices in a containerized world is a whole new ball game. In the highly distributed, dynamic infra environment where ephemeral containers scale, shrink and move between nodes on demand, traditional monitoring methods using agents to get information will not work. The monitoring system needs to automatically discover and track the creation/destruction of containers and explore services running in them.


  • Availability and performance of individual services
  • Host and infrastructure metrics
  • Microservice metrics
  • APIs and API transactions
    • Ensure API transactions are available and stable
    • Isolate problematic transactions and endpoints
  • Dependency mapping and correlation
  • Features relating to traditional APM


  • Detailed information relating to each container
    • Health of clusters, master and slave nodes
  • Number of clusters
  • Nodes per cluster
  • Containers per cluster
    • Performance of core Docker engine
    • Performance of container instances

Things to consider while adapting to the new IT landscape

Granularity and Aggregation: With the increase in the number of objects in the system, it is important to first understand the performance target of what’s being measured – for instance, if a service targets 99% uptime(yearly), polling it every minute would be an overkill. Based on this, data granularity needs to be set prudently for each aspect measured, and can be aggregated where appropriate. This is to prevent data inundation that could overwhelm the monitoring module and drive up costs associated with data collection, storage, and management.    

Monitor Containers: The USP of containers is the abstraction they provide to microservices, encapsulating and shielding them from the details of the host or operating system. While this makes microservices portable, it makes them hard to reach for monitoring. Two recommended solutions for this are to instrument the microservice code to generate stats and/or traces for all actions (can be used for distributed tracing) and secondly to get all container activity information through host operating system instrumentation.    

Track Services through the Container Orchestration Platform: While we could obtain container-level data from the host kernel, it wouldn’t give us holistic information about the service since there could be several containers that constitute a service. Container-native monitoring solutions could use metadata from the container orchestration platform by drilling into appropriate layers of the platform to obtain service-level metrics. 

Adapt to dynamic IT landscapes: As mentioned earlier, today’s IT landscape is dynamically provisioned, elastic and characterized by mobile and transient objects. Monitoring systems themselves need to be elastic and deployable across multiple locations to cater to distributed systems and leverage native monitoring solutions for private clouds.

API Monitoring: Monitoring APIs can provide a wealth of information in the black box world of containers. Tracking API calls from the different entities – microservices, container solution, container orchestration platform, provisioning system, host kernel can help extract meaningful information and make sense of the fickle environment.

Watch this space for more on Monitoring and other IT Ops topics. You can find our blog on Monitoring for Success here, which gives an overview of the Monitorcomponent of GAVS’ AIOps Platform, Zero Incident FrameworkTM (ZIF). You can Request a Demo or Watch how ZIF works here.

About the Author:

Sivaprakash Krishnan

Bio – Siva is a long timer at Gavs and has been with the company for close to 15 years. He started his career as a developer and is now an architect with a strong technology background in Java, Big Data, DevOps, Cloud Computing, Containers and Micro Services. He has successfully designed & created a stable Monitoring Platform for ZIF, and designed & driven cloud assessment and migration, enterprise BRMS and IoT based solutions for many of our customers. He is currently focused on building ZIF 4.0, a new gen business-oriented TechOps platform.

Padmapriya Sridhar

Bio – Priya is part of the Marketing team at GAVS. She is passionate about Technology, Indian Classical Arts, Travel and Yoga. She aspires to become a Yoga Instructor some day!

Pivotal Role of AI and Machine Learning in Industry 4.0 and Manufacturing

Industry 4.0 is a name given to the current trend of automation and data exchange in manufacturing technologies. It includes cyber-physical systems, the Internet of things, cloud computing and cognitive computing.Industry 4.0 is commonly referred to as the fourthindustrial revolution.

Industry 4.0 is the paving the path for digitization of the manufacturing sector, where artificial intelligence (AI) and machine-learning based systems are not only changing the ways we interact with information and computers but also revolutionizing it.

Compelling reasons for most companies to shift towards Industry 4.0 and automate manufacturing include;

  • Increase productivity
  • Minimize human / manual errors
  • Optimize production costs
  • Focus human efforts on non-repetitive tasks to improve efficiency

Manufacturing is now being driven by effective data management and AI that will decide its future. The more data sets computers are fed, the more they can observe trends, learn and make decisions that benefit the manufacturing organization. This automation will help to predict failures more accurately, predict workloads, detect and anticipate problems to achieve Zero Incidence.

GAVS’ proprietary AIOps based TechOps platform – Zero Incident Framework TM (ZIF) can successfully integrate AI and machine learning into the workflow allowing manufacturers to build robust technology foundations.

To maximize the many opportunities presented by Industry 4.0, manufacturers need to build a system with the entire production process in mind as it requires collaboration across the entire supply chain cycle.

Top ways in which ZIF’s expertise in AI and ML are revolutionizing manufacturing sector:

  • Asset management, supply chain management and inventory management are the dominant areas of artificial intelligence, machine learning and IoT adoption in manufacturing today. Combining these emerging technologies, they can improve asset tracking accuracy, supply chain visibility, and inventory optimization.
  • Improve predictive maintenance through better adoption of ML techniques like analytics, Machine Intelligence driven processes and quality optimization.
  • Reduce supply chain forecasting errors and reduce lost sales to increase better product availability.
  • Real time monitoring of the operational loads on the production floor helps in providing insights into the production schedule performances.
  • Achieve significant reduction in test and calibration time via accurate prediction of calibration and test results using machine learning.
  • Combining ML and Overall Equipment Effectiveness (OEE), manufacturers can improve yield rates, preventative maintenance accuracy and workloads by the assets. OEE is a universally used metric in manufacturing as it combines availability, performance, and quality, defining production effectiveness.
  • Improving the accuracy of detecting costs of performance degradation across multiple manufacturing scenarios that reduces costs by 50% or more.

Direct benefits of Machine Learning and AI for Manufacturing

The introduction of AI and Machine Learning to industry 4.0 represents a big change for manufacturing companies that can open new business opportunities and result in advantages like efficiency improvements among others.

  • Cost reduction through Predictive Maintenance that leads to less maintenance activity, which means lower labor costs, reduced inventory and materials wastage.
  • Predicting Remaining Useful Life (RUL): Keeping tabs on the behavior of machines and equipment leads to creating conditions that improve performance while maintaining machine health. By predicting RUL, it reduces the scenarios which causes unplanned downtime.
  • Improved supply chain management through efficient inventory management and a well monitored and synchronized production flow.
  • Autonomous equipment and vehicles: Use of autonomous cranes and trucks to streamline operations as they accept containers from transport vehicles, ships, trucks etc.
  • Better Quality Control with actionable insights to constantly raise product quality.
  • Improved human-machine collaboration while improving employee safety conditions and boosting overall efficiency.
  • Consumer-focused manufacturing: Being able to respond quickly to changes in the market demand.

Touch base with GAVS AI experts here: and see how we can help you drive your manufacturing operation towards Industry 4.0.