Lambda (λ), Kappa (κ) and Zeta (ζ) – The Tale of 3 AIOps Musketeers (PART-3)

“Data that sit unused are no different from data that were never collected in the first place.” – Doug Fisher

In the part 1 (https://bit.ly/3hDChCH), we delved into Lambda Architecture and in part 2 (https://bit.ly/3hDCg1B) about Generic Lambda. Given the limitations of the Generic lambda architecture and its inherent complexity, the data is replicated in two layers and keeping them in-sync is quite challenging in an already complex distributed system.There is a growing interest to find the simpler alternative to the Generic Lambda, that would bring just about the same benefits and handle the full problem set. The solution is Unified Lambda (λ) Architecture.

Unified Lambda (λ) Architecture

The unified approach addresses the velocity and volume problems of Big Data as it uses a hybrid computation model. This model combines both batch data and instantaneous data transparently.

There are basically three approaches:

  1. Pure Streaming Framework
  2. Pure Batch Framework
  3. Lambdoop Framework

1. Pure streaming framework

In this approach, a pure streaming model is adopted and a flexible framework like Apache Samza can be employed to provide unified data processing model for both stream and batch processing using the same data flow structure.

Pure streaming framework

To avoid the large turn-around times involved in Hadoop’s batch processing, LinkedIn came up with a distributed stream processing framework Apache Samza. It is built on top of distributed messaging bus; Apache Kafka, so that it can be a lightweight framework for streaming platform. i.e. for continuous data processing. Samza has built-in integration with Apache Kafka, which is comparable to HDFS and MapReduce. In the Hadoop world, HDFS is the storage layer and MapReduce, the processing layer. In the similar way, Apache Kafka ingests and stores the data in topics, which is then streamed and processed by Samza. Samza normally computes results continuously as and when the data arrives, thus delivering sub-second response times.

Albeit it’s a distributed stream processing framework, its architecture is pluggable i.e. can be integrated with umpteen sources like HDFS, Azure EventHubs, Kinensis etc. apart from Kafka. It follows the principle of WRITE ONCE, RUN ANYWHERE; meaning, the same code can run in both stream and batch mode. Apache Samza’s streams are re-playable, ordered partitions.

Unified API for Batch & Streaming in pure Streaming

Apache Samza offers a unified data processing model for both real-time as well as batch processing.  Based on the input data size, bounded or unbounded the data processing model can be identified, whether batch or stream.Typically bounded (e.g. static files on HDFS) are Batch data sources and streams are unbounded (e.g. a topic in Kafka). Under the bonnet, Apache Samza’s stream-processing engine handles both types with high efficiency.

Unified API for Batch & Streaming in pure Streaming

Another advantage of this unified API for Batch and Streaming in Apache Samza, is that makes it convenient for the developers to focus on the processing logic, without treating bounded and unbounded sources differently. Samza differentiates the bounded and unboundeddata by a special token end-of-stream. Also, only config change is needed, and no code changes are required, in case of switching gears between batch and streaming, e.g. Kafka to HDFS.Let us take an example of Count PageViewEvent for each mobile Device OS in a 5-minute window and send the counts to PageViewEventPerDeviceOS

Pure Batch framework

This is the reverse approach of pure streaming where a flexible Batch framework is employed, which would offer both the batch processing and real-time data processing ability. The streaming is achieved by using mini batches which is small enough to be close to real-time, with Apache Spark/Spark Streaming or Storm’s Trident. Under the hood, Spark streaming is a sequence of micro-batch processes with the sub-second latency. Trident is a high-level abstraction for doing streaming computation on top of Storm. The core data model of Trident is the “Stream”, processed as a series of batches.

Apache Spark achieves the dual goal of Batch as well as real-time processing by the following modes.

  • Micro-batch processing model
  • Continuous Processing model

Micro-batch processing model

Micro-batch processing is analogous to the traditional batch processing in that data are usually processed as a group. The primary difference is that the batches are smaller and processed more often. In spark streaming, the micro-batches are created based on the time rather than on the accumulated data size. The smaller the time to trigger a micro-batch to process, lesser the latency.

Continuous Processing model

Apache Spark 2.3, introduced Low-latency Continuous Processing Mode in Structured Streaming whichenables low (~1 ms) end-to-end latency with at-least-once fault-tolerance guarantees. Comparing this with the default micro-batch processing engine which can achieve exactly-once guarantees but achieve latencies of ~100 ms at best. Without modifying the application logic i.e. DataFrame/Dataset operations mini-batching or continuous streaming can be chosen at runtime. Spark Streaming also has the abilityto work well with several data sources like HDFS, Flume or Kafka.

Example of Micro-batching and Continuous Batching

3. Lambdoop Approach

In many places, capability of both batch and real time processing is needed.It is cumbersome to develop a software architecture of such capabilities by tailoring suitable technologies, software layers, data sources, data storage solutions, smart algorithms and so on to achieve the good scalable solution. This is where the frameworks like Spring “XD”, Summingbird or Lambdoop comes in, since they already have a combined API for batch and real-time processing.

Lambdoop

Lambdoop is a software framework based on the Lambda architecture which provides an abstraction layer to the developers. This feature makes the developers life easy to develop any Big Data applications by combining real time and batch processing approaches. Developers don’t have to deal with different technologies, configurations, data formats etc. They can use the Lambdoop framework as the only needed API. Also, Lambdoop includes other interesting tools such as input/output drivers, visualization tools, cluster management tools and widely accepted AI algorithms.

The Speed layer in Lambdoop runs on Storm and the Batch layer on Hadoop, Lambdoop (Lambda-Hadoop, with HBase, Storm and Redis) also combines batch/real-time by offering a single API for both processing models.

Summingbird

Summingbird aka‘Streaming MapReduce’ is a hybrid computational system where both the batch/streaming computations can be run at the same time and the results can be merged automatically. In Summingbird, the developer can write the code/job logic once and change the backend as and when needed. Following are the modes in which Summingbird Job/code can be executed.

  • batch mode (using Scalding on Hadoop)
  • real-time mode (using Storm)
  • hybrid batch/real-time mode (offers attractive fault-tolerance properties)

If the model assumes streaming, one-at-a-time semantics, then the code can be run in real-time e.g. Strom or in offline/batch mode e.g. Hadoop, spark etc. It can operate in a hybrid processing mode, when there is a need to transparently integrate batch and online results to efficiently generate up-to-date views over long time spans.

Conclusion

The volume of any Big Data platform is handled by building a batch processing application which requires, MapReduce, spark development, Use of other Hadoop related tools like Sqoop, Zookeeper, HCatalog etc. and storage systems like HBase, MongoDB, HDFS, Cassandra. At the same time the velocity of any Big Data platform is handled by building a real-time streaming application which requires, stream computing development using Storm, Samza, Kafka-connect, Apache Flink andS4, and use of temporal datastores like in-memory data stores, Apache Kafka messaging system etc.

The Unified Lambda handles the both Volume and Velocity if any Big Data platform by the intermixed approach of featuring a hybrid computation model, where both batch and real-time data processing are combined transparently. Also, the limitations of Generic Lambda like Dual execution mode, Replicating and maintaining the data sync between different layers are avoided and in the Unified Lambda, there would be only one system to learn and maintain.

About the Author:

Bargunan Somasundaram

Bargunan Somasundaram

Bargunan is a Big Data Engineer and a programming enthusiast. His passion is to share his knowledge by writing his experiences about them. He believes “Gaining knowledge is the first step to wisdom and sharing it is the first step to humanity.”

Growing Importance of Business Service Reliability

Business services are a set of business activities delivered to an outside party, such as a customer or a partner. Successful delivery of business services often depends on one or more IT services. For example, an IT business service that would support “order to cash”, as an example could be “supply chain service”. The supply chain service could be delivered by an application such as SAP, with the customer of that service being an employee in finance/accounting using the application to perform customer-facing services such as accounts receivable, or the collection of cash from an outside party. A business service is not simply the application that the end-user sees – it is the entire chain that supports the delivery of the service, including physical and virtualized servers, databases, middleware, storage, and networks. A failure in any of these can affect the service – and so it is crucial that IT organizations have an integrated, accurate, and up-to-date view of these components and of how they work together to provide the service.

The technologies for Social Networking, Mobile Applications, Analytics, Cloud (SMAC), and Artificial Intelligence (AI) are redefining the business and the services that businesses provide. Their widespread usage is changing the business landscape, increasing reliability and availability to levels that were unimaginable even a few years ago.

Availability versus Reliability

At first glance, it might seem that if a service has a high availability then it should also have high reliability. However, this is not necessarily the case. Availability and Reliability have different meanings, serve different purposes, and require different strategies to maintain desired standards of service levels. Reliability is the measure of how long a business service performs its intended function, whereas availability is the measure of the percentage of time a business service is operable. For example, a business service may be available 90% of the time, but reliable only 75% of the time from a performance standpoint. Service reliability can be seen as:

  • Probability of success
  • Durability
  • Dependability
  • Quality over time
  • Availability to perform a function

Merely having a service available isn’t sufficient. When a business service is available, it should actually serve the intended purpose under varying and unexpected conditions. One way to measure this performance is to evaluate the reliability of the service that is available to consume. The performance of a business service is now rated not by its availability, but by how consistently reliable it is. Take the example of mobile services – 4 bars of signal strength on your smartphone does not guarantee that the quality of the call you received or going to make. Organizations need to measure how well the service fulfills the necessary business performance needs.

Recognizing the importance of reliability, Google initiated Site Reliability Engineering (SRE) practices with a mission to protect, provide for, and progress the software and systems behind all of Google’s public services — Google Search, Ads, Gmail, Android, YouTube, and App Engine, to name just a few — with an ever-watchful eye on their availability, latency, performance, and capacity.

Zero Incident FrameworkTM (ZIF)

GAVS Technologies developed an AIOps based TechOps platform – Zero Incident FrameworkTM (ZIF) that enables proactive detection and remediation of incidents. The ZIF Platform is, available in two versions for our customers to evaluate and experience the power of AI-driven Business Service Reliability: 

ZIF Business Xpress: ZIF Business Xpress has been engineered for enterprises to evaluate AIOps before adoption. 10 to 40 devices can be connected to ZIFBusiness Xpress, to experiment with the value proposition. 

ZIF Business: Targeted for enterprise-wide adoption.

For more details, please visit https://zif.ai

About the Author:

Sri Chaganty


Sri is a Serial Entrepreneur with over 30 years’ experience delivering creative, client-centric, value-driven solutions for bootstrapped, and venture-backed startups.

Automating IT ecosystems with ZIF Remediate

Alwinking N Rajamani

Alwinking N Rajamani


Zero Incident FrameworkTM (ZIF) is an AIOps based TechOps platform that enables proactive detection and remediation of incidents helping organizations drive towards a Zero Incident Enterprise™. ZIF comprises of 5 modules, as outlined below.

This article’s focus is on the Remediate function of ZIF. Most ITSM teams envision a future of ticketless ITSM, driven by AI and Automation.

Remediate being a key module ofZIF, has more than 500+ connectors to various ITSMtools, Monitoring, Security and Incident management tools, storage/backup tools and others.Few of the connectors are referenced below that enables quick automation building.

Key Features of Remediate

  • Truly Agent-less software.
  • 300+ readily available templates – intuitive workflow/activity-based tool for process automation from a rich repository of pre-coded activities/templates.
  • No coding or programming required to create/deploy automated workflows. Easy drag & drop to sequence activities for workflow design.
  • Workflow execution scheduling for pre-determined time or triggering from events/notifications via email or SMS alerts.
  • Can be installed on-premise or on the cloud, on physical or virtual servers
  • Self Service portal for end-users/admins/help-desk to handle tasks &remediation automatically
  • Fully automated service management life cycle from incident creation to resolution and automatic closure
  • Has integration packs for all leading ITSM tools

Key features for futuristic Automation Solutions

Although the COVID pandemic has landed us in unprecedented times, we have been able to continue supporting our customers and enabled their IT operations with ZIF Remediate.

  • Self-learning capability to deliver Predictive/Prescriptive actionable alerts.
  • Access to multiple data sources and types – events, metrics, thresholds, logs, event triggers e.g. mail or SMS.
  • Support for a wide range of automation
    • Interactive Automation – Web, SMS, and email
    • Non-interactive automation – Silent based on events/trigger points
  • Supporting a wide range of advanced heuristics.

Benefits of AIOPS driven Automation

  • Faster MTTR
  • Instant identification of threats and appropriate responses
  • Faster delivery of IT services
  • Quality services leading to Employee and Customer satisfaction
  • Fulfillment and Alignment of IT services to business performance

Interactive and Non-interactive automation

Through our automation journey so far, we have understood that the best automation empowers humans, rather than replacing them. By implementing ZIF Remediate, organizations can empower their people to focus their attention on critical thinking and value-added activities and let our platform handle mundane tasks by bringing data-driven insights for decision making.

  • Interactive Automation – Web portal, Chatbot and SMS based
  • Non-interactive automations – Event or trigger driven automation

Involved decision driven Automations

ZIF Remediate has its unique, interactive automation capabilities, where many automation tools do not allow interactive decision making. Need approvals built into an automated change management process that involves sensitive aspects of your environment? Need numerous decision points that demand expert approval or oversight? We have the solution for you. Take an example of Phishing automation, here a domain or IP is blocked based on insights derived by mimicking an SOC engineer’s actions – parsing the observables i.e. URL, suspicious links or attachments in a phish mail and have those observables validated for threat against threat response tools, virus total, and others.

Some of the key benefits realized by our customers which include one of the largest manufacturing organizations, a financial services company, a large PR firm, health care organizations, and others.

  • Reduction of MTTR by 30% across various service requests.
  • Reduction of 40% of incidents/tickets, thus enabling productivity improvements.
  • Ticket triaging process automation resulting in a reduction of time taken by 50%.
  • Reclaiming TBs of storage space every week through snapshot monitoring and approval-driven model for a large virtualized environment.
  • Eliminating manual threat analysis by Phishing Automation, leading to man-hours being redirected towards more critical work.
  • Reduction of potential P1 outages by 40% through self-healing automations.

For more detailed information on ZIF Remediate, or to request a demo please visit https://zif.ai/products/remediate/

About the Author:

Alwin leads the Product Engineering for ZIF Remediate and zIrrus. He has over 20 years of IT experience spanning across Program & Portfolio Management for large customer accounts of various business verticals.

In his free time, Alwin loves going for long drives, travelling to scenic locales, doing social work and reading & meditating the Bible.

Machine Learning: Building Clustering Algorithms

Gireesh Sreedhar KP


Clustering is a widely-used Machine Learning (ML) technique. Clustering is an Unsupervised ML algorithm that is built to learn patterns from input data without any training, besides being able of processing data with high dimensions. This makes clustering the method of choice to solve a wide range and variety of ML problems.

Since clustering is widely used, for Data Scientists and ML Engineer’s it is critical to understand how to practically build clustering algorithms even though many of us have a high-level understanding of clustering. Let us understand the approach to build a clustering algorithm from scratch.

What is Clustering and how does it work?

Clustering is finding groups of objects (data) such that objects in the same group will be similar (related) to one another and different from (unrelated to) objects in other groups.

Clustering works on the concept of Similarity/Dissimilarity between data points. The higher similarity between data points, the more likely these data points will belong to the same cluster and higher the dissimilarity between data points, the more likely these data points will be kept out of the same cluster.

Similarity is the numerical measure of how alike two data objects are. Similarity will be higher when objects are more alike. Dissimilarity is the numerical measure of how different two data objects. Dissimilarity is lower when objects are more alike.

We create a ‘Dissimilarity Matrix’ (also called Distance Matrix) as an input to a clustering algorithm, where the dissimilarity matrix gives algorithm the notion of dissimilarity between objects. We build a dissimilarity matrix for each attribute of data considered for clustering and then combine the dissimilarity matrix for each data attribute to form an overall dissimilarity matrix. The dissimilarity matrix is an NxN square matrix where N is the number of data points considered for clustering and each element of the NxN square matrix gives dissimilarity between two objects.

Building Clustering Algorithm

Building a clustering algorithm involve the following:

  • Selection of most suited clustering techniques and algorithms to solve the problem. This step needs close collaboration among SMEs, business users, data scientists, and ML engineers. Based on inputs and data study, a possible list of algorithms (one or more) is selected for modeling and development along with tuning parameters are decided (to give algorithm more flexibility for tuning and learning from SME).
  • The selection of data attributes for the formulation of the dissimilarity matrix and methodology for the formation of the dissimilarity matrix (discussed later).
  • Building algorithms and doing the Design of experiments to select the best-suited algorithm and algorithm parameters for implementation.
  • Implementation of algorithm and fine-tuning of parameters as required.

Building a Dissimilarity matrix:

There are different approaches to build a dissimilarity matrix, here we consider building a dissimilarity matrix containing the distance (called Distance Matrix) between data objects (another alternative approach is to feed in coordinate points and let the algorithm compute distance). Let us consider a group of N data objects to be clustered based on three data attributes of each data object. The steps for building a Distance matrix are:

Build a Distance matrix for individual data attributes. Here we build three individual distance matrices (one for each attribute) containing distance between data objects calculated for each attribute. The data is always scaled between [0,1] using one of the standard normalization methods such as Min-Max Scalar. Here is how the distance matrix for an attribute looks like.

Properties of Distance Matrix:

  1. Distance Matrix is NxN square matrix (N – number of objects in clustering space)
  2. Matrix is symmetric with diagonal as zero (zero diagonal as distance of an object from itself is zero)
  3. For categorical data, distance between two points = 0, if both are same; =1 otherwise
  4. For numeric/ordered data, distance between two points = difference between scaled attribute values of two points.

Build Complete Distance matrix. Here we build a complete distance matrix combining distance matrix of individual attributes forming the input for clustering algorithm.

Complete distance matrix = (element-wise sum of individual attribute level matrix)/3;

Generalized Complete distance matrix = (element-wise sum of individual attribute level matrix)/M, where M is the number of attribute level matrix formed.

Considerations for the selection of clustering algorithms:

Before the selection of a clustering algorithm, the following considerations need to be evaluated to identify the right clustering algorithms for the given problem.

  • Partition criteria: Single Level vs hierarchical portioning
  • Separation of clusters: Exclusive (one data point belongs to only one class) vs non-exclusive (one data point can belong to more than one class)
  • Similarity measures: Distance-based vs Connectivity-based
  • Clustering space: Full space (used when low dimension data is processed) vs Subspace (used when high dimension data is processed, where only subspace can be processed and interesting clustering can be formed)
  • Attributes processing: Ability to deal with different types of attributes: Numerical, Categorical, Text, Media, a combination of data types in inputs
  • Discovery of clusters: Ability to form a predefined number of clusters or an arbitrary number of clusters
  • Ability to deal with noise in data
  • Scalability to deal with huge volumes of data, high dimensionality, incremental, or streaming data.
  • Ability to deal with constraints on user preference and domain requirements.

Application of Clustering

There are broadly two applications of clustering.

As an ML tool to get insight into data. Like building Recommendation Systems or Customer segmentation by clustering like-minded users or similar products, Social network analysis, Biological data analysis like Gene/Protein sequence analysis, etc.

As a pre-processing or intermediate step for other classes of algorithms. Like some Pattern-mining algorithms use clustering to group patterns mined and select most representative patterns instead of selecting entire patterns mined.

Conclusion

Building ML algorithm is teamwork with a team consisting of SMEs, users, data scientists, and ML engineers, each playing their part for success. The article gives steps to build a clustering algorithm, this can be used as reference material while attempting to build your algorithm.

About the Author:

Gireesh is a part of the projects run in collaboration with IIT Madras for developing AI solutions and algorithms. His interest includes Data Science, Machine Learning, Financial markets, and Geo-politics. He believes that he is competing against himself to become better than who he was yesterday. He aspires to become a well-recognized subject matter expert in the field of Artificial Intelligence.

Modern IT Infrastructure

Infrastructure today has grown beyond the physical confines of the traditional data center, has spread its wings to the cloud, and is increasingly distributed, virtual, and abstract. With the cloud gaining wide acceptance, most enterprises have their workloads spread across data centers, colocations, multi-cloud, and edge locations. On-premise infrastructure is also being replaced by Hyperconverged Infrastructure (HCI) where software-defined, virtualized compute, storage, and network are in one single system, greatly simplifying IT operations. Infrastructure is also becoming increasingly elastic, scales & shrinks on demand and doesn’t have to be provisioned upfront.

Let’s look at a few interesting technologies that are steering the modern IT landscape.

Containers and Serverless

Traditional application deployment on physical servers comes with the overhead of managing the infrastructure, middleware, development tools, and everything in between. Application developers would rather have this grunt work be handled by someone else, so they could focus on just their applications. This is where containers and serverless technologies come into picture. Both are cloud-based offerings and provide different levels of abstraction, in a way that hides layers beyond the front end, from the developer. They typically deploy smaller components of monolithic applications, microservices, and functions.

A Container is like an all-in-one-box, containing the app, and all its dependencies like libraries, executables & config files. The containerized application is highly portable, will run anywhere the container runtime is installed, and behave the same regardless of the OS or hardware it is deployed on. Containers give developers great flexibility and control since they cater to specific application requirements like the OS, S/W versions. The flip side is that there is still a need for manual maintenance of the runtime environment, like security patches, software updates, etc. Secondly, the flexibility it affords translates into high operational costs, since it lacks agility in scaling.

Serverless technologies provide much greater abstraction of the OS and infrastructure. ‘Serverless’ though, does not imply that there are no servers, it just means application developers do not have to worry about the underlying OS, the server environment, or the infra that their applications will be deployed on. Serverless is event-driven and is based on the premise that the application is split into functions that get executed based on events. The developer only needs to deploy function code and define the event(s) that will trigger them! The rest of the magic is done by the cloud service provider (with the help of third parties). 

The biggest advantage of serverless is that consumers are billed only for the running time of the function instances or the number of times the function gets executed, depending on the provider. Since it has zero administrative overhead, it guarantees rapid iterative deployment and faster time to market. Since the architecture is intrinsically auto-scaling, it is a perfect fit for applications with undefinable usage patterns. The other side of the coin is that developers need to deal with a black box back-end environment, so, holistic testing, debugging of the application becomes a challenge. Vendor lock-in is a real problem since the consumer is restricted by the technology stack supported by the vendor. Since serverless best practices dictate light, isolated functions with limited scope, building complex applications can get difficult. Function as a Service (FaaS) is a subset of serverless computing.

Internet of Things (IoT)

IoT is about connecting everyday things – beyond just computing devices or smartphones – to the internet. It is possible to convert practically anything into an IoT device, with a computer chip installation & internet access, and have it communicate independently with the internet – without any human intervention. But why would we want everyday things like for instance a watch or a light bulb, to become IoT devices? It’s in a bid to bridge the chasm between the physical and digital worlds and make the environment around us more intelligent, communicative, and responsive to our needs.

IoT’s use cases are just about everywhere; in personal devices, self-driving cars, smart homes, smart workspaces, smart cities, and industries across all verticals. For instance, live data from sensors in products while in use, gives good visibility into their operations on the ground, helps remediate issues proactively & aids improvements in design/manufacturing processes.

The Industrial Internet of Things (IIoT) is the use of IoT data in business, in tandem with Big Data, AI, Analytics, Cloud, and High-speed networks, with the primary goal of finding efficient business models to improve productivity & optimize expenditure. The need for real-time response to sensor data and advanced analytics to power insights has increased the demand for 5G networks for speed, cloud technologies for storage and computing, edge computing to reduce latency, and hyper-scale data centers for rapid scaling.

With IoT devices extending an organization’s infrastructure landscape, and the likelihood that IT staff may not even be aware of all the IoT devices in it is a security nightmare that could open corporate networks & sensitive data for attacks. Global standards and regulations for IoT device security are in the works. Until then, it is up to the enterprise security team to safeguard against IoT-related vulnerabilities.

Hyperscaling

The ability of infrastructure to rapidly scale out on a massive level is called hyperscaling.

Unprecedented needs for high-power computing and on-demand massive scalability has given rise to a new breed of hyperscale computing architectures, where traditional elements are replaced by hyper-converged, software-defined infrastructure with a high degree of virtualization. These hyperscale environments are characterized by high-density server racks, with software designed and specifically built for scale-out environments. Since high-density implies heavy power consumption, heating problems need to be handled by specialized cooling solutions like liquid cooling. Hyperscale data centre operators usually look for renewable energy options to save on power & cooling.

Today, there are several hundred hyperscale data centers in the world, with the dominant players being Microsoft, Google, Apple, Amazon & Facebook.

Edge Computing

Edge computing as the name indicates means moving data processing away from distant servers or the cloud, closer to the source of data.  This is to reduce latency and network bandwidth used for back & forth communication between the data source and the server. Edge, also called the network edge refers to where the data source connects to the internet. The explosive growth of IoT and applications like self-driving cars, virtual reality, smart cities for instance, that require real-time computing and analytics are paving the way for edge computing. Most cloud providers now provide geographically distributed edge servers. As with IoT devices, data at the edge can be a ticking security time bomb necessitating appropriate security mechanisms.

The evolution of IT technologies continuously raises the bar for the IT team. IT personnel have been forced to move beyond legacy practices and mindsets & constantly up-skill themselves to be able to ride the wave. For customers pampered by sophisticated technologies, round the clock availability of systems and immersive experiences have become baseline expectations. With more & more digitalization, there is increasing reliance on IT infrastructure and hence lesser tolerance for outages. The responsibilities of maintaining a high-performing IT infrastructure with near-zero downtime fall on the shoulders of the IT operations team.

This has underscored the importance of AI in IT operations since IT needs have now surpassed human capabilities. Gavs’ AI-powered Platform for IT operations, ZIF, caters to the entire ITOps spectrum, right from automated discovery of the landscape, monitoring, to predictive and prescriptive analytics that proactively drive the organization towards zero incidents. For more details, please visit https://zif.ai

About the Author:

Padmapriya Sridhar

Priya is part of the Marketing team at GAVS. She is passionate about Technology, Indian Classical Arts, Travel, and Yoga. She aspires to become a Yoga Instructor someday!

GAVS’ commitment during COVID-19

MARCH 23. 2020

Dear Client leaders & Partners,

I do hope all of you, your family and colleagues are keeping good health, as we are wading through this existential crisis of COVID 19.

This is the time for shared vulnerabilities and in all humility, we want to thank you for your business and continued trust. For us, the well being of our employees and the continuity of clients’ operations are our key focus. 

I am especially inspired by my GAVS colleagues who are supporting some of the healthcare providers in NYC. The GAVS leaders truly believe that they are integral members of these  institutions and it is incumbent upon them to support our Healthcare clients during these trying times.

We would like to confirm that 100% of our client operations are continuing without any interruptions and 100% of our offshore employees are successfully executing their responsibilities remotely using GAVS ZDesk, Skype, collaborating through online Azure ALM Agile Portal. GAVS ZIF customers are 100% supported 24X7 through ROTA schedule & fall back mechanism as a backup.

Most of GAVS Customer Success Managers, Client Representative Leaders, and Corporate Leaders have reached out to you with GAVS Business Continuity Plan and the approach that we have adopted to address the present crisis. We have put communication, governance, and rigor in place for client support and monitoring.  

GAVS is also reaching out to communities and hospitals as a part of our Corporate Social Responsibility.  

We have got some approvals from the local Chennai police authorities in Chennai to support the movement of our leaders from and to the GAVS facility and we have, through US India Strategic Partnership Forum applied for GAVS to be considered an Essential Service Provider in India.  

I have always maintained that GAVS is an IT Service concierge to all of our clients and we individually as leaders and members of GAVS are committed to our clients. We shall also ensure that our employees are safe. 

Thank you, 

Sumit Ganguli
GAVS Technologies


Heroes of GAVS | BronxCare

gavs

“Every day we witness these heroic acts: one example out of many this week was our own Kishore going into our ICU to move a computer without full PPE (we have a PPE shortage). The GAVS technicians who come into our hospital every day are, like our doctors and healthcare workers,  the true heroes of our time.” – Ivan Durbak, CIO, BronxCare

“I am especially inspired by my GAVS colleagues who are supporting some of the healthcare providers in NYC. The GAVS leaders truly believe that they are integral members of these institutions and it is incumbent upon them to support our Healthcare clients during these trying times. We thank the Doctors, Nurses and Medical Professionals of Bronx Care and we are privileged to be associated with them. We would like to confirm that 100% of our client operations are continuing without any interruptions and 100% of our offshore employees are successfully executing their responsibilities remotely using GAVS ZDesk, and other tools.” – Sumit Ganguli, CEO

The Hands that rock the cradle, also crack the code

It was an unguarded moment for my church-going, straight-laced handyman & landscaper, “ I am not sure if I am ready to trust a woman leader”, and finally the loss of first woman Presidential candidate in the US, that led me to ruminate about Women and Leadership and indulge in my most “ time suck” activities, google and peruse through Wikipedia.

I had known about this, but I was fascinated to reconfirm that the first programmer in the world was a woman, and daughter of the famed poet, Lord Byron, no less. The first Programmer in the World, Augusta Ada King-Noel, Countess of Lovelace nee Byron; was born in 1815 and was the only legitimate child of the poet laureate, Lord Byron and his wife Annabella. A month after Ada was born, Byron separated from his wife and forever left England. Ada’s mother remained bitter towards Lord Byron and promoted Ada’s interest in mathematics and logic in an effort to prevent her from developing what she saw as the insanity seen in her father.

Ada grew up being trained and tutored by famous mathematicians and scientists. She established a relationship with various scientists and authors, like Charles Dickens, etc..   Ada described her approach as “poetical science”[6] and herself as an “Analyst & Metaphysician”.

As a teenager, Ada’s prodigious mathematical talents, led her to have British mathematician Charles Babbage, as her mentor. By then Babbage had become very famous and had come to be known as ‘the father of computers’. Babbage was reputed to have developed the Analytical Engine. Between 1842 and 1843, Ada translated an article on the Analytical Engine, which she supplemented with an elaborate set of notes, simply called Notes. These notes contain what many consider to be the first computer program—that is, an algorithm designed to be carried out by a machine. As a result, she is often regarded as the first computer programmer. Ada died at a very young age of 36.

As an ode to her, the mathematical program used in the Defense Industry has been named Ada. And to celebrate our first Programmer, the second Tuesday of October has been named Ada Lovelace Day. ALD celebrates the achievement of women in Science, Technology and Engineering and Math (STEM). It aims to increase the profile of women in STEM and, in doing so, create new role models who will encourage more girls into STEM careers and support women already working in STEM.

Most of us applauded Benedict Cumberbatch’s turn as Alan Turing in the movie,  Imitation Game. We got to know about the contribution, that Alan Turning and his code breaking team at the Bletchley Park, played in singularly cracking the German Enigma code and how the code helped them to proactively know when the Germans were about to attack the Allied sites and in the process could conduct preemptive strikes. In the movie, Kiera Knightly played the role of Joan Clark Joan was an English code-breaker at the British Intelligence wing, MI5, at Bletchley Park during the World War II. She was appointed a Member of the Order of the British Empire (MBE) in 1947, because of the important part she essayed in decoding the famed German Enigma code along with Alan Turing and the team.

Joan Clark attended Cambridge University with a scholarship and there she gained a double first degree in mathematics. But the irony of it all was that she was denied a full degree, as till 1948, Cambridge only awarded degrees to men. The head of the Code-breakers group, Hugh Alexander,  described her as “one of the best in the section”, yet while promoting Joan Clark, they had initially given her a job title of a typist, as women were not allowed to be a Crypto Analyst. Clarke became deputy head of British Intelligence unit, Hut 8 in 1944.  She was paid less than the men and in the later years she believed that she was prevented from progressing further because of her gender.

In World War II the  US Army was tasked with a Herculean job to calculate the trajectories of ballistic missiles. The problem was that each equation took 30 hours to complete, and the Army needed thousands of them. So the Army, started to recruit every mathematician they could find. They placed ads in newspapers;  first in Philadelphia, then in New York City, then in far out west in places like Missouri, seeking women “computers” who could hand-compute the equations using mechanical desktop calculators. The selected applicants would be stationed at the  University of Pennsylvania in Philly. At the height of this program, the US Army employed more than 100 women calculators. One of the last women to join the team was a farm girl named Jean Jennings. To support the project, the US Army-funded an experimental project to automate the trajectory calculations. Engineers John Presper Eckert and John W. Mauchly, who are often termed as the Inventors of Mainframe computers, began designing the Electronic Numerical Integrator and Computer, or ENIAC as it was called.  That experimenting paid off: The 80-foot long, 8-foot tall, black metal behemoth, which contained hundreds of wires, 18,000 vacuum tubes, 40 8-foot cables, and 3000 switches, would become the first all-electric computer called ENIAC.

When the ENIAC was nearing completion in the spring of 1945, the US Army randomly selected six women, computer programmers,  out of the 100 or so workers and tasked them with programming the ENIAC. The engineers handed the women the logistical diagrams of ENIAC’s 40 panels and the women learned from there. They had no programming languages or compilers. Their job was to program ENIAC to perform the firing table equations they knew so well.

The six women—Francis “Betty” Snyder Holberton, Betty “Jean” Jennings Bartik, Kathleen McNulty Mauchly Antonelli, Marlyn Wescoff Meltzer, Ruth Lichterman Teitelbaum, and Frances Bilas Spence—had no documentation and no schematics to work with.

There was no language, no operating system, the women had to figure out what the computer was, how to interface with it, and then break down a complicated mathematical problem into very small steps that the ENIAC could then perform.  They physically hand-wired the machine,  using switches, cables, and digit trays to route data and program pulses. This might have been a very complicated and arduous task. The ballistic calculations went from taking 30 hours to complete by hand to taking mere seconds to complete on the ENIAC.

Unfortunately, ENIAC was not completed in time, hence could not be used during World War II. But 6 months after the end of the war, on February 14, 1946 The ENIAC was announced as a modern marvel in the US. There was praise and publicity for the Moore School of Electrical Engineering at the University of Pennsylvania, Eckert and Mauchly were heralded as geniuses. However, none of the key programmers, all the women were not introduced in the event. Some of the women appeared in photographs later, but everyone assumed they were just models, perfunctorily placed to embellish the photograph.

After the war, the government ran a campaign asking women to leave their jobs at the factories and the farms so returning soldiers could have their old jobs back. Most women did, leaving careers in the 1940s and 1950s and perforce were required to become homemakers. Unfortunately, none of the returning soldiers knew how to program the ENIAC.

All of these women programmers had gone to college at a time when most men in this country didn’t even go to college. So the Army strongly encouraged them to stay, and for the most part, they did, becoming the first professional programmers, the first teachers of modern programming, and the inventors of tools that paved the way for modern software.

The Army opened the ENIAC up to perform other types of non-military calculations after the war and Betty Holberton and Jean Jennings converted it to a stored-program machine. Betty went on to invent the first sort routine and help design the first commercial computers, the UNIVAC and the BINAC, alongside Jean. These were the first mainframe computers in the world.

Today the Indian IT  industry is at $ 160 B and is at 7.7 %age of the Indian GDP and employs approximately 2.5 Million direct employees and a very high percentage of them are women. Ginni Rommeti, Meg Whitman are the CEOs of IBM and HP while Sheryl Sandberg is the COO of Facebook. They along with Padmasree Warrior, ex CTO of CISCO have been able to crack the glass ceiling.    India boasts of Senior Leadership in leading IT companies like Facebook, IBM, CapGemini, HP, Intel  etc.. who happen to be women. At our company, GAVS, we are making an effort to put in policies, practices, culture that attract, retain, and nurture women leaders in IT. The IT industry can definitely be a major change agent in terms of employing a large segment of women in India and can be a transformative force for new vibrant India. We must be having our Indian Ada, Joan, Jean and Betty and they are working at ISRO, at Bangalore and Sriharikota, at the Nuclear Plants at Tarapur.

ABOUT THE AUTHOR

Sumit Ganguli

Sumit Ganguli

Understanding Reinforcement Learning in five minutes

Reinforcement learning (RL) is an area of Machine Learning (ML) that takes suitable actions to maximize rewards situations. The goal of reinforcement learning algorithms is to find the best possible action to take in a specific situation. Just like the human brain, it is rewarded for good choices and penalized for bad choices and learns from each choice. RL tries to mimic the way that humans learn new things, not from a teacher but via interaction with the environment. At the end, the RL learns to achieve a goal in an uncertain, potentially complex environment.

Understanding Reinforcement Learning

How does one learn cycling? How does a baby learn to walk? How do we become better at doing something with more practice? Let us explore learning to cycle to illustrate the idea behind RL.

Did somebody tell you how to cycle or gave you steps to follow? Or did you learn it by spending hours watching videos of people cycling? All these will surely give you an idea about cycling; but will it be enough to actually get you cycling? The answer is no. You learn to cycle only by cycling (action). Through trials and errors (practice), and going through all the positive experiences (positive reward) and negative experiences (negative rewards or punishments), before getting your balance and control right (maximum reward or best outcome). This analogy of how our brain learns cycling applies to reinforcement learning. Through trials, errors, and rewards, it finds the best course of action.

Components of Reinforcement Learning

The major components of RL are as detailed below:

  • Agent: Agent is the part of RL which takes actions, receives rewards for actions and gets a new environment state as a result of the action taken. In the cycling analogy, the agent is a human brain that decides what action to take and gets rewarded (falling is negative and riding is positive).
  • Environment: The environment represents the outside world (only relevant part of the world which the agent needs to know about to take actions) that interacts with agents. In the cycling analogy, the environment is the cycling track and the objects as seen by the rider.
  • State: State is the condition or position in which the agent is currently exhibiting or residing. In the cycling analogy, it will be the speed of cycle, tilting of the handle, tilting of the cycle, etc.
  • Action: What the agent does while interacting with the environment is referred to as action. In the cycling analogy, it will be to peddle harder (if the decision is to increase speed), apply brakes (if the decision is to reduce speed), tilt handle, tilt body, etc.
  • Rewards: Reward is an indicator to the agent on how good or bad the action taken was. In the cycling analogy, it can be +1 for not falling, -10 for hitting obstacles and -100 for falling, the reward for outcomes (+1, -10, -100) are defined while building the RL agent. Since the agent wants to maximize rewards, it avoids hitting and always tries to avoid falling.

Characteristics of Reinforcement Learning

Instead of simply scanning the datasets to find a mathematical equation that can reproduce historical outcomes like other Machine Learning techniques, reinforcement learning is focused on discovering the optimal actions that will lead to the desired outcome.

There are no supervisors to guide the model on how well it is doing. The RL agent gets a scalar reward and tries to figure out how good the action was.

Feedback is delayed. The agent gets an instant reward for action, however, the long-term effect of an action is known only later. Just like a move in chess may seem good at the time it is made, but may turn out to be a bad long term move as the game progress.

Time matters (sequential). People who are familiar with supervised and unsupervised learning will know that the sequence in which data is used for training does not matter for the outcome. However, for RL, since action and reward at current state influence future state and action, the time and sequence of data matters.

Action affects subsequent data RL agent receives.

Why Reinforcement Learning

The type of problems that reinforcement learning solves are simply beyond human capabilities. They are even beyond the solving capabilities of ML techniques. Besides, RL eliminates the need for data to learn, as the agent learns by interacting with the environment. This is a great advantage to solve problems where data availability or data collection is an issue.

Reinforcement Learning applications

RL is the darling of ML researchers now. It is advancing with incredible pace, to solve business and industrial problems and garnering a lot of attention due to its potential. Going forward, RL will be core to organizations’ AI strategies.

Reinforcement Learning at GAVS

Reinforcement Learning is core to GAVS’ AI strategy and is being actively pursued to power the IP led AIOps platform – Zero Incident FrameworkTM (ZIF). We had our first success on RL; developing an RL agent for automated log rotation in servers.

References:

Reinforcement Learning: An Introduction second edition by Richard S. Sutton and Andrew G. Barto

https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf

About the Author:

Gireesh Sreedhar KP

Gireesh is a part of the projects run in collaboration with IIT Madras for developing AI solutions and algorithms. His interest includes Data Science, Machine Learning, Financial markets, and Geo-politics. He believes that he is competing against himself to become better than who he was yesterday. He aspires to become a well-recognized subject matter expert in the field of Artificial Intelligence.

Disaster Recovery for Modern Digital IT

A Disaster Recovery strategy includes policies, tools and processes for recovery of data and restoration of systems in the event of a disruption. The cause of disruption could be natural, like earthquakes/floods, or man-made like power outages, hardware failures, terror attacks or cybercrimes. The aim of Disaster Recovery(DR) is to enable rapid recovery from the disaster to minimize data loss, extent of damage, and disruption to business. DR is often confused with Business Continuity Planning(BCP). While BCP ensures restoration of the entire business, DR is a subset of that, with focus on IT infrastructure, applications and data.

IT disasters come at the cost of lost revenue, tarnished brand image, lowered customer confidence and even legal issues relating to data privacy and compliance. The impact can be so debilitating that some companies never fully recover from it. With the average cost of IT downtime running to thousands of dollars per minute, it goes without saying that an enterprise-grade disaster recovery strategy is a must-have.

Why do companies neglect this need?

Inspite of the obvious consequences of a disaster, many organizations shy away from investing in a DR strategy due to the associated expenditure. Without a clear ROI in sight, these organizations decide to risk the vulnerability to catastrophic disruptions. They instead make do with just data backup plans or secure only some of the most critical elements of their IT landscape.

Why is Disaster Recovery different today?

The ripple effects of modern digital infrastructure have forced an evolution in DR strategies. Traditional Disaster Recovery methods are being overhauled to cater to the new hybrid IT infrastructure environment. Some influencing factors:

  • The modern IT Landscape

o Infrastructure – Today’s IT environment is distributed between on-premise, colocation facilities, public/private cloud, as-a-service offerings and edge locations. Traditional data centres are losing their prominence and are having to share their monopoly with these modern technologies. This trend has significant advantages such as reduced CapEx in establishing data centers, reduced latency because of data being closer to the user, and high dynamic scalability.

o Data – Adding to the complexity of modern digital infrastructure is the exponential growth in data from varied sources and of disparate types like big data, mobile data, streaming content, data from cloud, social media, edge locations, IoT, to name a few.

  • Applications – The need for agility has triggered the shift away from monolith applications towards microservices that typically use containers to provide their execution environment. Containers are ephemeral and so scale, shrink, disappear or move between nodes based on demand.
  • While innovation in IT helps digital transformation in unimaginable ways, it also makes it that much harder for IT teams to formulate a disaster recovery strategy for today’s IT landscape that is distributed, mobile, elastic and transient.
  • Cybercrimes are becoming increasingly prevalent and are a big threat to organizations. Moderntechnologies fuel increasing sophistication in malware and ransomware. As their complexity increases, they are becoming harder to even detect while they lie low and do their harm quietly inside the environment. By the time they are detected, the damage is done and it’s too late. DR strategies are also constantly challenged by the lucrative underworld of ransomware.

Solution Strategies for Disaster Recovery

  • On-Premise DR: This is the traditional option that translates toheavy upfront investments towardsthe facility, securing the facility, infrastructure including the network connectivity/firewalls/load balancers, resources to scale as needed, manpower, test drills, ongoing management and maintenance, software licensing costs, periodic upgrades for ongoing compatibility with the production environment and much more.

A comprehensive DR strategy involves piecing together several pieces of a complex puzzle. Due to the staggering costs and time involved in provisioning and managing infra for the duplicate storage and compute, companies are asking themselves if it is really worth the investment, and are starting to explore more OpEx based solutions. And, they are discovering that the cloud may be the answer to this challenge of evolving infra, offering cost-effective top-notch resiliency.

  • Cloud-based DR: The easy availability of public cloud infrastructure & services, with affordablemonthly subscription plans and pay per use rates, has caused an organic switch to the cloud for storage, infra and as a Service(aaS) needs. To complement this, replication techniques have also evolved to enable cloud replication. With backup on the cloud, the recovery environment needs to be paid for only when used in the event of a disaster!

Since maintaining the DR site is the vendor’s responsibility, it reduces the complexity in managing the DR site and the associated operating expenses as well. Most DR requirements are intrinsically built into cloud solutions: redundancy, advanced networks, bandwidth, scalability, security & compliance. These can be availed on demand, as necessitated by the environment and recovery objectives. These features have made it feasible for even small businesses to acquire DR capabilities.

Disaster Recovery-as-a-Service(DRaaS) which is fast gaining popularity, is a DR offering on the cloud, where the vendor manages the replication, failover and failback mechanisms as needed for recovery, based on a SLA driven service contract .

On the flip side, as cloud adoption becomes more and more prevalent, there are also signs of a reverse drain back to on-premise! Over time, customers are noticing that they are bombarded by hefty cloud usage bills, way more than what they had bargained for. There is a steep learning curve

in assimilating the nuances of new cloud technologies and the innumerable options they offer. It is critical for organizations to clearly evaluate their needs, narrow down on reliable vendors with mature offerings, understand their feature set and billing nitty-gritties and finalize the best fit for their recovery goals. So, it is Cloud, but with Caution!

  • Integrating DR with the Application: Frank Jablonski, VP of Global Marketing, SIOS Technology Corppredicts that applications will soon have Disaster Recovery architected into their core, as a value-add. Cloud-native implementations will leverage the resiliency features of the cloud to deliver this value.

The Proactive Approach

Needless to say, investing in a proactive approach for disaster prevention will help mitigate the chances for a disaster in the first place. One sure-fire way to optimize IT infrastructure performance, prevent certain types of disasters and enhance business services continuity is to use AI augmented ITOps platforms to manage the IT environment. GAVS’ AIOps platform, Zero Incident FrameworkTM(ZIF) has modules powered by Advanced Machine Learning to Discover, Monitor, Analyze, Predict, and Remediate, helping organizations drive towards a Zero Incident EnterpriseTM. For more information, please visit the ZIF website.

READ ALSO OUR NEW UPDATES

CCPA for Healthcare

The California Consumer Privacy Act (CCPA) is a state statute intended to enhance consumer protection and data privacy rights of the residents of California, United States. It is widely considered one of the most sweeping consumer privacy laws, giving Californians the strongest data privacy rights in the U.S.

The focus of this article is CCPA as it applies to Healthcare. Let’s take a quick look at what CCPA is and then move onto its relevance for Healthcare entities. CCPA is applicable to any for-profit organization – regardless of whether it physically operates out of California – that interacts with, does business with and/or collects, processes or monetizes personal information of California residents AND meets at least one of these criteria: has annual gross revenue in excess of $25 million USD; collects or transacts with the personal information of 50,000 or more California consumers, households, or devices; earns 50% or more of its annual revenue by monetizing such data. CCPA also empowers California consumers with the rights to complete ownership; control; and security of their personal information and imposes new stringent responsibilities on businesses to enable these rights for their consumers.

Impact on Healthcare Companies

Companies directly or indirectly involved in the healthcare sector and dealing with medical information are regulated by the Confidentiality of Medical Information Act (CMIA) and the Health Insurance Portability and Accountability Act (HIPAA). CCPA does not supersede these laws & does not apply to ‘Medical Information (MI)’ as defined by CMIA, or to ‘Protected Health Information (PHI)’ as defined by HIPAA. CCPA also excludes de- identified data and information collected by federally-funded clinical trials, since such research studies are regulated by the ‘Common Rule’.

The focus of the CCPA is ‘Personal Information (PI)’ which means information that “identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household.” PI refers to data including but not limited to personal identifiers such as name, address, phone numbers, email ids, social security number; personal details relating to education, employment, family, finances; biometric information, geolocation, consumer activity like purchase history, product preferences; internet activity.

So, if CCPA only regulates personal information, are healthcare companies that are already in compliance with CMIA and HIPAA safe? Is there anything else they need to do?

Well, there is a lot that needs to be done! This only implies that such companies should continue to comply with those rules when handling Medical Information as defined by the CMIA, or Protected Health Information, as defined by HIPAA. They will still need to adhere to CCPA regulations for personal data that is outside of MI and PHI. This will include

employee personal information routinely obtained and processed by the company’s HR; those collected from websites, health apps, health devices, events; clinical studies that are not funded by the federal government; information of a CCPA-covered entity that is handled by a non-profit affiliate, to give a few examples.

There are several possibilities – some not so apparent – even in healthcare entities, for personal data collection and handling that would fall under the purview of CCPA. They need to take stock of the different avenues through which they might be obtaining/handling such data and prioritize CCPA compliance. Else, with the stringent CCPA regulations, they could quickly find themselves embroiled in class action lawsuits (which by the way, do not require proof of damage to the plaintiff) in case of data breaches, or statutory penalties of up to $7500 for each violation.

The good news is that since CCPA carves out a significant chunk of data that healthcare companies/those involved in healthcare-related functions collect and process, entities that are already complying with HIPAA and CMIA are well into the CCPA compliance journey. A peek into the kind of data CMIA & HIPAA regulate will help gauge what other data needs to be taken care of.

CMIA protects the confidentiality of Medical Information (MI) which is “individually identifiable information, in electronic or physical form, in possession of or derived from a provider of health care, health care service plan, pharmaceutical company, or contractor regarding a patient’s medical history, mental or physical condition, or treatment.”

HIPAA regulates how healthcare providers, health plans, and healthcare clearinghouses, referred to as ‘covered entities’ can use and disclose Protected Health Information (PHI), and requires these entities to enable protection of data privacy. PHI refers to individually identifiable medical information such as medical records, medical bills, lab tests, scans and the like. This also covers PHI in electronic form(ePHI). The privacy and security rule of HIPAA is also applicable to ‘business associates’ who provide services to the ‘coveredentities’ that involve the use or disclosure of PHI.

Two other types of data that are CCPA exempt are Research Data & De-Identified Data. As mentioned above, the ‘Common Rule’ applies only to federally-funded research studies, and the CCPA does not provide much clarity on exemption status for data from clinical trials that are not federally-funded.

And, although the CCPA does not apply to de-identified data, the definitions of de-identified data of HIPAA and CCPA slightly differ which makes it quite likely that de-identified data by HIPAA standards may not qualify under CCPA standards and therefore would not be exempt from CCPA regulations.

Compliance Approach

Taking measures to ensure compliance with regulations is cumbersome and labour-intensive, especially with the constantly evolving regulatory environment. Using this opportunity for a proactive, well-thought-out approach for comprehensive enterprise-wide data security and governance will be strategically wise since it will minimize the need for policy and process rehaul with each new regulation.

The most crucial step is a thorough assessment of the following:

  • Policies, procedures, workflows, entities relating to/involved in data collection, sharing and processing, in order to arrive at clear enterprise-wide data mapping; to determine what data, data activities, data policies would fall under the scope of CCPA; and to identify gaps and decide on prioritized action items for compliance.
  • Business processes, contracts, terms of agreement with affiliates, partners and third-party entities the company does business with, to understand CCPA applicability. In some cases,

HIPAA and CMIA may be applicable to only the healthcare-related business units, subjecting other business units to CCPA compliance.

  • Current data handling methods, not just its privacy & security. CCPA dictates that companies need to have mechanisms put in place to cater to CCPA consumer right to request all information relating to the personal data collected about them, right to opt-out of sale of their data, right to have their data deleted by the organization (which will extend to 3rd parties doing business with this organization as well).

Consumer Consent Management

With CCPA giving full ownership and control of personal data back to its owners, consent management mechanisms become the pivot of a successful compliance strategy. An effective mechanism will ensure proper administration and enforcement of consumer authorizations.

Considering the limitations of current market solutions for data privacy and security, GAVS has come up with its Blockchain-based Rhodium Framework (pending patent) for Customer Master Data Management and Compliance with Data Privacy Laws like CCPA.

You can get more details on CCPA in general and GAVS’ solution for true CCPA Compliance in our White Paper, Blockchain Solution for CCPA Compliance.

READ ALSO OUR NEW UPDATES