Big Data Governance: How to Govern Data Outside of Databases

By April Reeve June 4, 2013

In last week's post, I discussed why you would choose to govern data outside of the database. Today I will discuss how you do it. Most conversations about Big Data Governance focus on why it is necessary but rarely on how it is done. The activities performed to set up a Data Governance program are the same whether the data is structured or unstructured: defining an organizational structure, creating a charter, defining common business terms, and identifying the Data Stewards in the organization.

In fact, when you hear Data Governance described by a document-oriented organization, there is almost no difference between document governance and the governance of data found in databases.

Data Steward Responsibilities by Data Store

Data Governance is implemented by Data Stewards who are responsible for one or more data stores against which they perform the following initial and on-going activities:

  • Assign data steward responsibilities for domains and data stores
  • Profile and perform quality assessment of data
  • Ensure retention policies are established, creation and change procedures are documented
  • Identify and log issues with data in data stores
  • Assign analysts to determine source of problems and recommend solutions
  • Implement recommended process improvements, data cleanup, system changes
  • Document business rules and set up data quality monitoring
  • Monitor data and data quality
  • Respond to requests for information and reported issues
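
Several of the activities above (profiling, documenting business rules, monitoring, and logging issues) can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of any particular stewardship tool: the `Issue` and `IssueLog` classes, the `profile` and `monitor` functions, the `customer_master` data store name, and the sample rules are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    data_store: str
    rule: str
    failing_rows: int


@dataclass
class IssueLog:
    issues: list = field(default_factory=list)

    def log(self, issue):
        self.issues.append(issue)


def profile(records, column):
    """Return basic profile statistics for one column of a list of dict rows."""
    values = [row.get(column) for row in records]
    missing = sum(1 for v in values if v in (None, ""))
    distinct = len({v for v in values if v not in (None, "")})
    return {"rows": len(values), "missing": missing, "distinct": distinct}


def monitor(data_store, records, rules, issue_log):
    """Apply each documented business rule and log any rule with failing rows."""
    for name, predicate in rules.items():
        failing = sum(1 for row in records if not predicate(row))
        if failing:
            issue_log.log(Issue(data_store, name, failing))


# Hypothetical sample data and business rules, for illustration only.
customers = [
    {"id": 1, "email": "a@example.com", "country": "US"},
    {"id": 2, "email": "", "country": "US"},
    {"id": 3, "email": "c@example.com", "country": None},
]
rules = {
    "email is present": lambda row: bool(row.get("email")),
    "id is positive": lambda row: row.get("id", 0) > 0,
}
log = IssueLog()
monitor("customer_master", customers, rules, log)
print(profile(customers, "email"))
print(log.issues)
```

Keeping the business rules as named entries separate from the monitoring loop mirrors the split in the list above: the steward documents the rule, and the monitoring simply applies whatever has been documented.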

Organizing Data Stewards in the Organization

Since Data Stewardship is usually organized by data store, and structured and unstructured data are kept in separate data stores, Data Stewards are often divided by data type in addition to the other breakdowns of responsibility. For example, the Data Steward focused on the governance of documents in a business area may be different from the Data Steward responsible for customer master data. Data Stewards are usually organized across various dimensions:

  • Business / technical
  • Producers / Consumers
  • Line of Business / Department
  • Function / Application (Data Store)
  • Data Domain?
  • Data Type?

Data Stewardship Tools

The tools used for Data Stewardship include:

  • Business Glossary
  • Issue Tracking
  • Data Profiling
  • Metadata Repository
  • Data Governance Review and Approval Workflow

The tool sets for unstructured data stewardship may be significantly different from those focused on the management of data in databases. The tools for unstructured data management will also include:

  • Ontology and hierarchy management
  • Content management
  • Scanning and OCR
  • Email
  • Search

NoSQL Data Stewardship

Most NoSQL databases (non-relational databases) can be governed in the same way as relational databases, although the profiling tools used on relational databases will usually have to be replaced by utilities specific to the database in question. Text search tools work on document databases and Hadoop data structures.
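
As a rough illustration of what profiling can look like when there is no fixed schema, the sketch below summarizes which fields appear in a set of documents and what value types they hold. It assumes the documents have already been read from the store into plain Python dictionaries; the `profile_documents` function and the sample documents are hypothetical and do not correspond to any specific database utility.

```python
from collections import Counter, defaultdict


def profile_documents(documents):
    """Summarize which fields appear in a collection and with which value types."""
    presence = Counter()
    types = defaultdict(Counter)
    for doc in documents:
        for key, value in doc.items():
            presence[key] += 1
            types[key][type(value).__name__] += 1
    total = len(documents)
    return {
        key: {
            "fill_rate": presence[key] / total,
            "types": dict(types[key]),
        }
        for key in presence
    }


# Hypothetical documents: "zip" is missing in one and has mixed types in the others.
docs = [
    {"_id": "1", "name": "Acme", "zip": "02134"},
    {"_id": "2", "name": "Globex", "zip": 10001},
    {"_id": "3", "name": "Initech"},
]
report = profile_documents(docs)
for field_name, stats in report.items():
    print(field_name, stats)
```

A field that is only partly filled or that carries more than one type is the kind of finding a Data Steward would log as an issue and assign for analysis.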

Managing Data In Motion

Traditional Data Governance programs may not include managing the non-persistent data passing through the organization, or the “data in motion”. Even Data Governance programs that focus on the data in databases, and certainly Big Data Governance programs, should establish responsibility for the rules that govern the movement and transformation of data in the organization. These rules are not just technical “code”; they are business decisions about how critical data in the organization is transformed and calculated. The items to govern include:

  • Transformation rules (into and out of data warehouses and marts, MDM hubs)
  • Canonical models
  • Message layouts
  • External data sources
  • Matching and merging rules
  • Data streams
  • Key data extracts and calculations
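
One way to picture transformation rules as governed business decisions rather than buried code is a small catalog in which every rule carries a business definition and an accountable steward. The following is a minimal sketch under that assumption; the `TransformationRule` class, the rule names, the feed and mart names, and the steward identifiers are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TransformationRule:
    name: str
    source: str
    target: str
    business_definition: str
    steward: Optional[str]  # None means nobody is accountable yet
    apply: Callable[[dict], dict]


def net_revenue(row):
    """Hypothetical calculation: net revenue is gross revenue minus returns."""
    return {**row, "net_revenue": row["gross_revenue"] - row["returns"]}


def normalize_country(row):
    """Hypothetical standardization: country codes are trimmed and upper-cased."""
    return {**row, "country": row["country"].strip().upper()}


catalog = [
    TransformationRule(
        name="net_revenue",
        source="orders_feed",
        target="sales_mart",
        business_definition="Gross revenue minus returns",
        steward="finance_steward",
        apply=net_revenue,
    ),
    TransformationRule(
        name="normalize_country",
        source="orders_feed",
        target="sales_mart",
        business_definition="Upper-case country code with no padding",
        steward=None,
        apply=normalize_country,
    ),
]


def ungoverned(rules):
    """Return the names of rules that have no accountable steward."""
    return [rule.name for rule in rules if rule.steward is None]


def run(rules, row):
    """Apply every cataloged rule to one record, in catalog order."""
    for rule in rules:
        row = rule.apply(row)
    return row


result = run(catalog, {"gross_revenue": 100, "returns": 15, "country": " us "})
print(result)
print(ungoverned(catalog))
```

The `ungoverned` check is the governance part of the sketch: a rule that moves or calculates critical data but has no steward is exactly the gap a Data Governance program should close.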

See my new book on Data Integration for more on “Managing Data In Motion.”

Data Governance Maturity

For organizations that are focused on the creation or management of unstructured data or documents, such as mortgage companies, publishers, media companies, and pharmaceutical companies that file drug submissions, the governance of unstructured data is crucial to their franchise, and their Data Governance of this data is very mature. Most other organizations probably have some policies and tools for the governance of email and documents, but a full Big Data Governance program is usually limited to organizations with very mature Data Governance capabilities.

About April Reeve

With 25 years of experience as an enterprise architect and program manager, April fully deserves her Twitter handle: @Datagrrl.

She knows data extremely well, having spent more than a decade in the financial services industry where she managed implementations of very large application systems.

April is a Data Management Specialist as part of EMC Global Services, with expertise in Data Governance, Master Data Management, Business Intelligence, Data Warehousing Conversion, Data Integration and Data Quality. All of these add up to one simple statement: April is very good at helping large companies organize their data and capture value from it. April works for EMC Consulting as a Business Consultant in the Enterprise Information Management practice.
