Organizations are now faced with an exponentially expanding volume of big data, and while the effective management of that data is essential to business growth and regulatory compliance, achieving that goal is no easy task.
Absorbing, maintaining, organizing and analyzing data assets from an abundance of sources, legacy departments with incompatible software and systems, and various technologies that have previously tried to organize that data can leave the CIO or CDO tasked with an organization’s metadata management pulling their hair out.
Without an effective metadata management system that organizes and streamlines the information about that data, organizations are in danger of being left with what is essentially a big data swamp that is not understood, integrated or accessible by the personnel or departments that need it to drive business decisions and meet compliance standards.
Even if they do understand what data they have, and where it is, they can struggle to find an effective and efficient way to bring it all together in a working infrastructure.
The data is there, but if it comes from different places and means different things to different people, departments or organizations, it is more or less useless. Therefore, standard concepts that define and categorize the data itself are needed. Here’s an outline of the different types of data at an organization’s disposal.
The 4 Types of Data:
- Acquired Data: Acquired datasets contain data that the company does not directly collect, generate, or create internally, but obtains from external sources such as commercial data vendors and aggregators, agencies, industry groups, etc. This data may be stored within the organization, or accessed when needed from the provider’s technology platform through a web service.
- Collected Data: Information collected from entities external to the organization, where the organization retains control over the use and ownership of the data. Collection types include (but are not limited to) surveys, monitoring collections and other ad hoc collections required in response to specific events or needs.
- Derived Data: Information derived or calculated from other datasets and stored for use for a variety of purposes including (but not limited to) statistical analysis, research, supervision or policy. It includes data that has been obtained as a result of ad hoc or one-time research or analysis, as well as data resulting from economic or financial models run in a continuous fashion.
- Operational Data: Data generated by the organization’s systems in the course of normal business activity and functions, which would include data garnered from customer interaction, such as on an application form, as well as information generated internally regarding how and where that data is integrated and shared within the organization.
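As a rough illustration of how these four categories might be recorded against each dataset, here is a minimal Python sketch. The `DatasetRecord` class, its field names and the example dataset names are all hypothetical and used only for illustration; a real metadata catalog would carry far more detail.

```python
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    """The four data types described above."""
    ACQUIRED = "acquired"        # obtained from external vendors, agencies, etc.
    COLLECTED = "collected"      # gathered from external entities; organization keeps ownership
    DERIVED = "derived"          # calculated from other datasets
    OPERATIONAL = "operational"  # generated in the course of normal business activity


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalog entry; the field names are illustrative only."""
    name: str
    data_type: DataType
    source: str


def group_by_type(records):
    """Return a dict mapping each DataType to the dataset names tagged with it."""
    grouped = {data_type: [] for data_type in DataType}
    for record in records:
        grouped[record.data_type].append(record.name)
    return grouped


# Made-up example entries, one per data type.
catalog = [
    DatasetRecord("vendor_price_feed", DataType.ACQUIRED, "commercial data vendor"),
    DatasetRecord("annual_customer_survey", DataType.COLLECTED, "survey"),
    DatasetRecord("risk_model_output", DataType.DERIVED, "financial model"),
    DatasetRecord("loan_application_forms", DataType.OPERATIONAL, "customer interaction"),
]

grouped = group_by_type(catalog)
```

Tagging every dataset with exactly one of the four types makes it straightforward to ask questions such as "which datasets do we obtain from outside providers?" without inspecting each system individually.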
Recognizing these various data types is the first step. The second step is to try not to panic at the complexity of the problem presented. At a glance, it is easy to see how disparate and disjointed these data assets can be – thousands of data attributes delivered by countless internal and external sources and stored in any number of unconnected databases. What is the term used to reference the data, how many people are using how many different terms, and how do they relate to each other? It is clear that this can present a problem, and the need for model characteristics to be captured with standardized terminology is even clearer.
Align the 4 Data Types Your Company Holds
The third step is to find the best way to align this data using a metadata management system with a standards-based ontology at its core. This means combining all of the information in one place and translating it into a common language, using the Web Ontology Language (OWL) to define a categorization system or taxonomy. Housed within a web-based software platform that facilitates data production and maps the data to ensure quality and transparency, this common language will make the whole process of data management more efficient.
Once this is in place, all data can be categorized using the same common language, which strengthens data governance and makes compliance demands easier to meet.
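The idea of translating many local terms into one common language can be sketched in a few lines of Python. This is only a simplified stand-in: in a real system the shared concepts would come from an ontology (for example, classes defined in OWL), whereas here they are plain strings. Every term, concept name and function name below is a hypothetical example.

```python
# Hypothetical local terms used by different departments, each mapped to a
# single shared concept. The concept names are assumptions for illustration.
LOCAL_TO_SHARED = {
    "cust_id": "CustomerIdentifier",
    "client number": "CustomerIdentifier",
    "acct_holder_ref": "CustomerIdentifier",
    "loan amt": "LoanAmount",
    "loan_amount": "LoanAmount",
}


def normalize(term):
    """Lowercase a term and collapse runs of whitespace to single spaces."""
    return " ".join(term.lower().split())


def to_shared_concept(term):
    """Return the shared concept for a local term, or None if it is unmapped."""
    return LOCAL_TO_SHARED.get(normalize(term))


def unmapped_terms(terms):
    """Return the sorted, normalized terms that have no shared concept yet."""
    return sorted({normalize(t) for t in terms if to_shared_concept(t) is None})
```

For example, `to_shared_concept("Client  Number")` and `to_shared_concept("cust_id")` both resolve to the same shared concept, while `unmapped_terms` highlights terms that still need to be placed in the common vocabulary.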