Today the role of the Database Administrator (DBA) has evolved beyond the traditional responsibilities of maintaining database systems, ensuring performance, and managing backups. DBAs are now critical players in implementing and maintaining data governance frameworks. Data governance is essential for organizations to manage their data assets effectively, ensuring data quality, security, and compliance with regulations. In this article, we will explore the concept of data governance, its importance, and the pivotal role DBAs play in this crucial area.
Understanding Data Governance
Data governance refers to the collection of practices, policies, and standards used to manage and control data within an organization. It encompasses the processes and responsibilities that ensure data is accurate, consistent, secure, and used appropriately. Effective data governance enables organizations to derive maximum value from their data while minimizing risks related to data breaches, non-compliance, and poor data quality.
The core components of data governance include:
- Data Quality Management: Ensuring data is accurate, complete, and reliable
- Data Security: Protecting data from unauthorized access and breaches
- Data Compliance: Adhering to relevant laws and regulations
- Data Stewardship: Assigning responsibility for data management tasks
- Data Architecture: Structuring data in a way that supports business needs
- Data Policies and Procedures: Establishing guidelines for data management and usage
Data governance is crucial for several reasons, and one of the most important is regulatory compliance: ensuring that your organization complies with governmental and industry regulations.
Regulatory compliance means that an organization is aware of the relevant laws, policies, and regulations and takes steps to comply with them. Regulations dictate the type of data we must protect, the type of controls we should use, and the penalties for non-compliance. Such penalties may apply to individuals who willfully violate privacy law. According to the U.S. Department of Justice, “any officer or employee of an agency, …who willfully discloses the material (PII) in any manner to any person or agency not entitled to receive it, shall be guilty of a misdemeanor and fined not more than $5,000.”
There are many types of regulations that impact your data and the things you need to do to protect that data.
- There are Corporate Governance regulations that dictate the way companies are directed and controlled. Examples of these include State Corporate codes, SEC Securities Exchange Act, Sarbanes-Oxley (SOX), GLBA, FACTA, and Basel II.
- Then there are Data Privacy & Protection regulations that dictate how data must be secured and protected from access by anyone other than authorized entities. Some examples of these include HIPAA, PCI-DSS, GDPR, as well as country and US State regulations such as CA SB 1386 and CCPA (California Consumer Privacy Act).
- Finally, there are Data Retention & Request regulations specifying the length of time that data must be maintained. The longer the data must be kept, the more controls need to be in place to ensure that the data is protected. Examples of these regulations include FRCP, HIPAA, GDPR, CCPA, and others.
Of course, data governance implies much more than just being an aid to complying with industry and governmental regulations. Other aspects of data governance include improving data quality, mitigating risk, streamlining operational efficiency, and delivering clear data management policies and processes to enable businesses to operate effectively.
The DBA and Regulatory Compliance
Traditionally, DBAs are tasked with managing database systems and the applications that access them. This typically involves understanding database structures, internals, and access points, but not necessarily an in-depth knowledge of the data itself. Nevertheless, data governance tasks can significantly impact database administration duties.
Whenever data management policies and procedures change, the DBA team is inevitably involved in helping to implement them and in assisting users as they access and modify data and the programs that access that data. Furthermore, when additional products are required, as they so often are, DBAs will be involved in various capacities. These responsibilities include evaluating vendors and products, analyzing the impact of the product on existing procedures, installing and documenting the product, and integrating it into the environment.
The DBA will also need to provide guidance on the practicality of new policies, procedures, and products, as well as on their impact on existing workload. Let’s focus on how several of the data governance requirements for regulatory compliance impact the DBA.
Although there are many areas that need to be addressed to comply with regulations, we’ll look at three important ones, starting with the types of data that are protected by regulations. Then we’ll look at two of the important tasks that may be required for your protected data: anonymization and defensible deletion.
Types of Data That Need to be Protected
There are four broad categories of data that require protection:
- PII – personally identifiable information
- PHI – protected health information
- PCI – payment card information
- IP – intellectual property
The US Department of Homeland Security defines personally identifiable information (or PII) as any information that permits the identity of an individual to be directly or indirectly inferred, including any information that is linked or linkable to that individual. This covers a broad swath of data that is collected by most organizations.
HIPAA (Health Insurance Portability and Accountability Act) is a US federal law that protects health information and ensures healthcare access, portability, and renewability. The Privacy Rule of HIPAA defines PHI or protected health information as all forms of individuals’ protected health information, whether electronic, written, or oral. It also defines the protections required for PHI data for all organizations, not just healthcare providers.
The Payment Card Industry Data Security Standard (PCI DSS) is an industry security standard administered by the Payment Card Industry Security Standards Council. It protects payment card information (PCI) and extends data protection requirements with specific rules and practices for protecting not just personal information, but also card numbers and related card information.
Finally, your organization may have IP, or intellectual property, that it wants to protect from competitive exposure. Such information may need to be protected based solely on your desire to keep it private or, depending on the industry and type of information, it may also be subject to regulatory compliance requirements.
The identification of all the PII in your organization can be a difficult task. It requires identifying all the data elements in all your applications and databases that qualify. The degree of difficulty will depend on the level of documentation that exists, whether you have data models and a data catalog at your disposal, and, indeed, your existing practice of data governance.
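Where a data catalog is thin or missing, a first pass at identification is often a scan of column names for terms that tend to indicate PII. The following is a minimal sketch of that idea, not a complete discovery tool. The schema, the table and column names, and the patterns are all hypothetical, and name matching only produces candidates for human review; it will miss PII stored in free-text fields or poorly named columns.

```python
import re

# Hypothetical name patterns that often indicate PII; a real list would be broader
# and would be reviewed with data stewards and the legal team.
PII_NAME_PATTERNS = {
    "name": re.compile(r"(first|last|full)_?name", re.IGNORECASE),
    "email": re.compile(r"e_?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
    "national_id": re.compile(r"ssn|social_?sec", re.IGNORECASE),
    "birth_date": re.compile(r"birth|dob", re.IGNORECASE),
}


def find_pii_candidates(schema):
    """Return (table, column, category) tuples for columns whose names look like PII."""
    hits = []
    for table, columns in schema.items():
        for column in columns:
            for category, pattern in PII_NAME_PATTERNS.items():
                if pattern.search(column):
                    hits.append((table, column, category))
                    break  # one category per column is enough for a review list
    return hits


# Hypothetical schema: table name -> list of column names.
example_schema = {
    "customer": ["cust_id", "first_name", "last_name", "email_addr", "signup_ts"],
    "orders": ["order_id", "cust_id", "total_amount"],
}

candidates = find_pii_candidates(example_schema)
for table, column, category in candidates:
    print(f"{table}.{column}: possible {category}")
```

In practice the schema dictionary would be built from the database catalog rather than typed in by hand, and the resulting list would feed the data governance team's review rather than replace it.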
But identification is just the first part of the problem. Once identified, you then need to take steps to protect the PII, and I’ll talk about two practices that you will need to establish: data anonymization and defensible deletion of data.
Data Anonymization
Data anonymization, sometimes called data masking, is an example of a method you can use to protect PII. But what problem does data anonymization solve?
Many organizations rely on copying production data to a test environment for developing and testing applications. But simply copying data from prod to test exposes the PII. Some of the data is sensitive and should not be accessible by application developers. You don’t want to expose data such as salary information or phone numbers of co-workers and customers to all your developers, or worse yet, expose customer credit card details to everybody!
But you can’t just create a bunch of gibberish either. You need to ensure that referential integrity is maintained in testing, even as data values change. You need useful data for test cases. And you may also need to ensure consistent data conversions.
These are not inconsequential requirements.
The solution is to deploy data anonymization, which is the process of protecting sensitive information in non-production databases from inappropriate visibility. When you anonymize data, valid production data is replaced with consistent, usable, referentially intact, but not accurate data. After anonymization, the test data is usable just like production; but the information content is secure.
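To make the consistency requirement concrete, here is a minimal sketch of deterministic masking using a keyed one-way hash (HMAC-SHA-256 from the Python standard library). It is one possible technique, not how any particular product works. The key, the tables, and the column names are hypothetical. Because the same input always yields the same pseudonym, a masked key in a child table still matches the masked key in its parent table.

```python
import hashlib
import hmac

# Assumption: a secret key that is managed outside the test environment.
# The value below is only a placeholder for illustration.
MASKING_KEY = b"replace-with-a-managed-secret"


def mask_value(value, prefix):
    """Map a sensitive value to a stable pseudonym using a keyed one-way hash."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:12]}"


# Hypothetical production rows: orders refer to customers by cust_id.
customers = [
    {"cust_id": "C1001", "name": "Ann Lee"},
    {"cust_id": "C1002", "name": "Bo Kim"},
]
orders = [
    {"order_id": 1, "cust_id": "C1001"},
    {"order_id": 2, "cust_id": "C1002"},
    {"order_id": 3, "cust_id": "C1001"},
]

masked_customers = [
    {"cust_id": mask_value(c["cust_id"], "CUST"), "name": mask_value(c["name"], "NAME")}
    for c in customers
]
masked_orders = [
    {"order_id": o["order_id"], "cust_id": mask_value(o["cust_id"], "CUST")}
    for o in orders
]

# Every masked order still points at a masked customer.
masked_ids = {c["cust_id"] for c in masked_customers}
orphans = [o for o in masked_orders if o["cust_id"] not in masked_ids]
print("orphaned orders after masking:", len(orphans))
```

The pseudonyms produced here are opaque strings, which is enough to demonstrate referential integrity but not realistic test data. A fuller solution would substitute plausible names, addresses, and numbers, and would keep the key away from anyone with access to the masked copy.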
As you consider a data anonymization solution you need to keep the following desirable qualities in mind. By assuring that your solution has these qualities you can safely mask data for compliance with data privacy regulations. What are these qualities?
The first is permanence: once the data is masked, or anonymized, it stays masked. The second is irreversibility: there should be no procedure that converts the masked data back to the original, unless a reversible mask is specifically what is needed. The final quality is non-inference: it must not be possible to infer or deduce the original, unmasked value from the masked value.
The bottom line is that if you are copying production data to test, there are privacy issues you need to address. A data anonymization process that understands what data is sensitive PII and masks it accordingly is needed to comply with the data privacy regulations we discussed earlier.
If sensitive data is statically masked, then it matters far less if hackers get access to it, because the values of the data elements will not be correct anyway!
Defensible Deletion
Another tactic you may need for governing data appropriately is a solution to delete data. At first glance, this sounds simple, but it becomes more complex once you dive into the details. What is required is a strategy and methodology for defensibly deleting data that is no longer required by your organization.
Defensible deletion is the practice of systematically and methodically disposing of electronically stored information that is no longer needed for legal, regulatory, or business purposes. It is part of an overall information governance strategy.
Defensible deletion can reduce the storage costs and legal risks associated with the retention of electronically stored information.
Every piece of data follows a standard lifecycle, wherein the data gets created at some point, usually by means of a transaction. For a period of time after creation, the data enters its first state: it is operational, that is, the data is needed to complete ongoing business transactions. The operational state is followed by the reference state. This is the time during which the data is still needed for reporting and query purposes, but it is not necessarily driving business transactions. After a time period, the data moves into an area where it is no longer needed for completing business transactions and the chance of it being needed for querying and reporting is small to none. However, the data still needs to be retained for regulatory compliance and other legal purposes, particularly if it pertains to a financial transaction. This is the archive state.
Finally, after a designated period of time in the archive, the data is no longer needed and can be discarded. This needs to be emphasized much more strongly: the data must be discarded. In most cases, the only reason older data is being kept at all is to comply with regulations, many of which help to enable lawsuits. When there is no legal requirement to maintain such data, it is only right and proper for organizations to demand that it be destroyed – why enable anyone to sue you if it is not a legal requirement to do so?
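The lifecycle described above (operational, reference, archive, discard) can be sketched as a simple classification by record age. This is an illustration only. The thresholds are hypothetical, and in practice retention periods differ by data type and are set by legal, regulatory, and business policy rather than by a single age rule.

```python
from datetime import date

# Hypothetical thresholds in days; real values come from retention policy.
OPERATIONAL_DAYS = 90
REFERENCE_DAYS = 365
ARCHIVE_DAYS = 7 * 365


def lifecycle_state(created, today):
    """Classify a record as operational, reference, archive, or discard by age."""
    age_days = (today - created).days
    if age_days < OPERATIONAL_DAYS:
        return "operational"
    if age_days < REFERENCE_DAYS:
        return "reference"
    if age_days < ARCHIVE_DAYS:
        return "archive"
    return "discard"


as_of = date(2024, 1, 31)
for created in (date(2024, 1, 1), date(2023, 6, 1), date(2023, 1, 1), date(2010, 1, 1)):
    print(created.isoformat(), "->", lifecycle_state(created, as_of))
```

A job built on a rule like this would move archive-state rows out of the operational database and flag discard-state rows for deletion, with any legal hold overriding the age-based result.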
Furthermore, many regulations and laws require data to be deleted at some point. GDPR, for example, famously has the “right to be forgotten.” When a customer or client protected by GDPR requests to be forgotten, you must be able to delete all of their data from your systems. And the “Right to deletion” is being enacted in many additional regulations and laws, such as California’s CPRA.
Deleting a single record is a piece of cake. But what about being able to delete all interconnected pieces of data no matter where they reside? This can be a thorny issue without a solution that understands the interrelated intricacies of data storage and retention.
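Within a single relational database, part of this problem can be handled declaratively. The sketch below uses SQLite from the Python standard library with an `ON DELETE CASCADE` foreign key, so that deleting a customer also removes the dependent orders, and it records the deletion so the action can be shown to have happened. The tables, columns, and the `deletion_log` design are hypothetical, and this covers only one database; copies in backups, extracts, and other systems need their own handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id INTEGER NOT NULL REFERENCES customer(cust_id) ON DELETE CASCADE
    );
    CREATE TABLE deletion_log (cust_id INTEGER, deleted_on TEXT, reason TEXT);
    INSERT INTO customer VALUES (1, 'Ann Lee'), (2, 'Bo Kim');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")


def forget_customer(conn, cust_id, reason):
    """Delete a customer and dependent rows, and record that the deletion happened."""
    with conn:  # one transaction: the delete and the log entry succeed or fail together
        conn.execute("DELETE FROM customer WHERE cust_id = ?", (cust_id,))
        conn.execute(
            "INSERT INTO deletion_log VALUES (?, date('now'), ?)",
            (cust_id, reason),
        )


forget_customer(conn, 1, "erasure request")

remaining_customers = conn.execute("SELECT cust_id FROM customer").fetchall()
remaining_orders = conn.execute("SELECT order_id FROM orders ORDER BY order_id").fetchall()
logged = conn.execute("SELECT cust_id, reason FROM deletion_log").fetchall()
print(remaining_customers, remaining_orders, logged)
```

The log here keeps only a surrogate key and a reason, on the assumption that the key alone does not identify the person once the customer row is gone. Whether that assumption holds is a question for the legal and governance teams.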
There are products on the market that can aid data governance teams in data anonymization and defensible deletion, such as deepeo™ and Arvitam™ from Infotel Corp, based in Tampa, Florida. Be sure to investigate such solutions as you implement your data governance policies and procedures.
The Bottom Line
Although understanding the regulations and their requirements is a job for businesspeople and subject matter experts, DBAs need to get involved in implementing the software and processes that fulfill those requirements. Anything that impacts the data in the database is the purview of the DBA, and anonymizing and deleting data surely impacts the data! As such, DBAs need to participate in the data governance work of their organization with businesspeople, auditors, and the company’s legal team to ensure that data is being treated properly, effectively, and efficiently.
About Craig S. Mullins
Craig S. Mullins is President & Principal Consultant of Mullins Consulting, Inc., an IBM Gold Consultant, and an IBM Champion for Data and AI. He possesses over three decades of experience in all aspects of database systems development, including database administration, performance management, and data modeling. Visit https://mullinsconsulting.com.