Metadata

The metadata catalog helps you understand, organize, and discover all your data assets. Know what data you have, where it is, and how to use it.

What is Metadata?

Metadata is information about data:

  • What it is (table name, description)
  • Where it lives (database, collection)
  • Who owns it (team, person)
  • How to use it (schema, examples)
  • When it updates (frequency, last update)
  • Why it exists (business purpose)

Browsing the Catalog

Search and Discover

  1. Click Metadata in the sidebar
  2. Search by name, owner, or tag
  3. Browse categories
  4. Click any asset for details
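
The search step above can be pictured as a simple filter over catalog entries. This is a minimal sketch, assuming assets are plain records with a name, an owner, and a set of tags; the `Asset` class and `search_assets` function are hypothetical and not part of the catalog's actual interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical catalog entry; field names are illustrative."""
    name: str
    owner: str
    tags: set[str] = field(default_factory=set)


def search_assets(assets, term):
    """Return assets whose name, owner, or any tag contains the term (case-insensitive)."""
    needle = term.lower()
    return [
        a for a in assets
        if needle in a.name.lower()
        or needle in a.owner.lower()
        or any(needle in t.lower() for t in a.tags)
    ]


# Illustrative data only.
catalog = [
    Asset("orders", "Sales", {"Daily", "Verified"}),
    Asset("customers", "Sales", {"Verified"}),
    Asset("revenue_daily", "Finance", {"Daily"}),
]

print([a.name for a in search_assets(catalog, "daily")])
# ['orders', 'revenue_daily']
```

A substring match is the simplest choice here; a real catalog search may rank results or match on descriptions as well.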

Asset Information

View key details about any data asset:

Basic Info

  • Name and description
  • Type (table, collection, view)
  • Owner and steward
  • Created and updated dates

Schema

  • Column/field names and types
  • Required vs optional
  • Sample values
  • Data quality metrics

Lineage

  • Where data comes from
  • How it's transformed
  • Where it goes
  • Dependencies

Usage

  • Who uses it
  • Which reports/dashboards use it
  • Query frequency
  • Last access date

Organizing Data Assets

Descriptions

Add meaningful descriptions:

Asset: orders
Description: "All customer orders from our e-commerce platform. 
Updated daily at 2 AM UTC. Contains order details, amounts, 
and customer IDs. Used by Finance and Sales teams."

Tags and Categories

Tag assets for easy discovery:

  • Business domain: Sales, Finance, Operations
  • Data quality: Verified, Raw, Experimental
  • Access level: Public, Restricted, Confidential, Sensitive
  • Update frequency: Real-time, Daily, Weekly

Ownership

Assign ownership:

  • Owner - Primary contact
  • Steward - Responsible for data quality
  • Domain expert - Knows the business context

Schema Management

Column Documentation

Document each field:

Column: customer_id
Type: Integer (64-bit)
Nullable: No
Format: Unique identifier
Example: 12345
Business meaning: Uniquely identifies a customer
Used in: Orders, Payments, Returns

Data Types and Constraints

  • String: Max 255 characters
  • Integer (32-bit): Range -2,147,483,648 to 2,147,483,647
  • Timestamp: ISO 8601 format
  • Boolean: true/false values
  • Array: List of values
  • Object: Nested structure
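
The constraints above can be expressed as a small validator. This is a sketch under stated assumptions: the `is_valid` function and its type-name strings are hypothetical, and `datetime.fromisoformat` accepts only a subset of ISO 8601 on Python versions before 3.11 (for example, it rejects a trailing `Z`).

```python
from datetime import datetime

INT32_MIN, INT32_MAX = -2_147_483_648, 2_147_483_647


def is_valid(value, type_name):
    """Check a value against the constraints listed above (type names are illustrative)."""
    if type_name == "String":
        return isinstance(value, str) and len(value) <= 255
    if type_name == "Integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return (
            isinstance(value, int)
            and not isinstance(value, bool)
            and INT32_MIN <= value <= INT32_MAX
        )
    if type_name == "Timestamp":
        if not isinstance(value, str):
            return False
        try:
            datetime.fromisoformat(value)
            return True
        except ValueError:
            return False
    if type_name == "Boolean":
        return isinstance(value, bool)
    if type_name == "Array":
        return isinstance(value, list)
    if type_name == "Object":
        return isinstance(value, dict)
    raise ValueError(f"unknown type: {type_name}")


print(is_valid(12345, "Integer"))                    # True
print(is_valid("2024-06-30T05:00:00", "Timestamp"))  # True
print(is_valid("x" * 256, "String"))                 # False
```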

Sample Data

View examples of actual data:

customer_id | name              | email
------------|-------------------|------------------
12345       | Alice Smith       | alice@example.com
12346       | Bob Johnson       | bob@example.com
12347       | Carol White       | carol@example.com

Data Quality Metrics

Built-in Metrics

Track data quality automatically:

  • Completeness - % of non-null values (target: >99%)
  • Uniqueness - % of unique values
  • Timeliness - How current the data is
  • Accuracy - Matches expected ranges and formats
  • Consistency - Matches across systems
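
To make the first two metrics concrete, here is a minimal sketch of how completeness and uniqueness could be computed for one column. The function names are hypothetical, and uniqueness is taken here to mean distinct values as a share of non-null values, which is one reasonable reading of the definition above.

```python
def completeness(values):
    """Percentage of values that are not None."""
    if not values:
        return 0.0
    non_null = sum(v is not None for v in values)
    return 100.0 * non_null / len(values)


def uniqueness(values):
    """Percentage of non-null values that are distinct."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0
    return 100.0 * len(set(non_null)) / len(non_null)


# Illustrative column with one null and one duplicate.
emails = ["alice@example.com", "bob@example.com", None, "bob@example.com"]

print(completeness(emails))  # 75.0
print(round(uniqueness(emails), 1))  # 66.7
```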

Quality Rules

Define what "good" looks like:

Rule: customer_email must be unique
Rule: order_date cannot be in the future
Rule: customer_id must reference valid customer
Rule: price must be > 0
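
Rules like these translate naturally into small checks. The sketch below assumes records are plain dictionaries using the column names from the rules; `duplicate_emails` and `order_violations` are hypothetical helpers, not the catalog's rule engine.

```python
from collections import Counter
from datetime import date


def duplicate_emails(customers):
    """Return emails that appear more than once (violates the uniqueness rule)."""
    counts = Counter(c["customer_email"] for c in customers)
    return sorted(email for email, n in counts.items() if n > 1)


def order_violations(order, valid_customer_ids, today):
    """Return a description of each row-level rule that the order breaks."""
    violations = []
    if order["order_date"] > today:
        violations.append("order_date is in the future")
    if order["customer_id"] not in valid_customer_ids:
        violations.append("customer_id does not reference a valid customer")
    if order["price"] <= 0:
        violations.append("price is not > 0")
    return violations


# Illustrative record that breaks all three row-level rules.
bad_order = {"order_date": date(2024, 7, 1), "customer_id": 99999, "price": 0}
print(order_violations(bad_order, {12345, 12346}, today=date(2024, 6, 30)))
```

Passing `today` as an argument rather than calling `date.today()` inside the function keeps the check deterministic and easy to test.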

Quality Reports

See quality trends:

Last 30 days:
Completeness: 99.8% → 99.9% ↑
Uniqueness: 99.99% (steady)
Accuracy: 99.5% → 99.7% ↑
Alerts: None
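
A report like this one can be derived by comparing each metric at the start and end of the window. The following is a rough sketch; the `trend` function and its `tolerance` threshold are assumptions made for illustration.

```python
def trend(previous, current, tolerance=0.005):
    """Describe how a metric moved; tolerance is an assumed noise threshold."""
    delta = current - previous
    if abs(delta) <= tolerance:
        return "steady"
    return "up" if delta > 0 else "down"


# (value 30 days ago, value today), taken from the report above.
report = {
    "Completeness": (99.8, 99.9),
    "Uniqueness": (99.99, 99.99),
    "Accuracy": (99.5, 99.7),
}

# Raise an alert for any metric that moved down.
alerts = [name for name, (prev, cur) in report.items() if trend(prev, cur) == "down"]
print(alerts or "None")  # None
```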

Data Governance

Access Control

Define who can use each asset:

  • Public - Anyone in the organization
  • Restricted - Specific teams only
  • Confidential - Limited access
  • Sensitive - Requires approval

Retention Policy

Define how long to keep data:

Historical orders: Keep 7 years (compliance)
Customer emails: Keep 2 years (GDPR)
Session logs: Keep 90 days (storage optimization)
Error logs: Keep 30 days (debugging)
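
A retention policy like this can be checked with simple date arithmetic. This sketch uses hypothetical asset keys and approximates a year as 365 days, which ignores leap days; a production policy would likely need calendar-aware logic.

```python
from datetime import date, timedelta

# Retention periods from the policy above, in days.
# Assumption: one year is approximated as 365 days.
RETENTION_DAYS = {
    "historical_orders": 7 * 365,
    "customer_emails": 2 * 365,
    "session_logs": 90,
    "error_logs": 30,
}


def is_expired(asset, created, today):
    """True if a record is older than the retention period for its asset."""
    return today - created > timedelta(days=RETENTION_DAYS[asset])


print(is_expired("session_logs", date(2024, 1, 1), date(2024, 3, 31)))  # False (exactly 90 days)
print(is_expired("session_logs", date(2024, 1, 1), date(2024, 4, 1)))   # True (91 days)
```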

Classification

Classify data by sensitivity:

PII (Personally Identifiable Information):
- Names, emails, phone numbers, addresses
- Requires: Encryption, limited access, audit logging

Financial:
- Bank accounts, credit cards, invoices
- Requires: Encryption, approval workflows, compliance

Public:
- Product names, descriptions, prices
- Available to all
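
One way to act on these classifications is to compare the controls an asset already has against what its class requires. The control names and the `missing_controls` helper below are hypothetical labels chosen to mirror the lists above.

```python
# Required controls per classification, mirroring the lists above.
REQUIRED_CONTROLS = {
    "PII": {"encryption", "limited_access", "audit_logging"},
    "Financial": {"encryption", "approval_workflows", "compliance"},
    "Public": set(),
}


def missing_controls(classification, applied):
    """Return the required controls that have not been applied yet."""
    return REQUIRED_CONTROLS[classification] - set(applied)


print(sorted(missing_controls("PII", ["encryption"])))
# ['audit_logging', 'limited_access']
```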

Data Contracts

Define Expectations

Specify what consuming teams can expect:

Data Asset: revenue_daily
Provider: Finance team
Update frequency: Daily at 5 AM UTC
Schema: revenue (decimal), date (timestamp), region (string)
Quality SLA: 99.5% completeness
Support: finance-data@company.com
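
A contract such as this one can also be captured in code so consumers can validate what they receive. The sketch below is illustrative: the `DataContract` class, the mapping of schema types to Python types, and `schema_violations` are all assumptions, not a format the catalog defines.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass(frozen=True)
class DataContract:
    """Hypothetical in-code form of the contract above."""
    asset: str
    provider: str
    schema: dict             # column name -> expected Python type
    min_completeness: float  # quality SLA, in percent


CONTRACT = DataContract(
    asset="revenue_daily",
    provider="Finance team",
    schema={"revenue": Decimal, "date": datetime, "region": str},
    min_completeness=99.5,
)


def schema_violations(row, contract):
    """List columns that are missing from the row or have an unexpected type."""
    problems = []
    for column, expected in contract.schema.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected):
            problems.append(f"{column}: expected {expected.__name__}")
    return problems


# Illustrative row: revenue is a float, and region is absent.
row = {"revenue": 10.5, "date": datetime(2024, 6, 30, 5, 0)}
print(schema_violations(row, CONTRACT))
# ['revenue: expected Decimal', 'missing column: region']
```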

Breaking Changes

Plan for data changes:

⚠️ Breaking Change Alert
Field: customer_phone
Action: Removing this field on 2024-06-30
Reason: Moving to a separate phone_numbers table
Migration: See customer_phone_migration.md
Impact: 3 dashboards, 2 reports, 1 API

Common Workflows

Finding Data

Need customer data?
↓
Search "customer" in Metadata
↓
See all customer-related assets
↓
Filter by quality rating
↓
Choose the best one
↓
Click to see schema
↓
Use in your query

Documenting a New Dataset

Created new data?
↓
Click "Add to Catalog"
↓
Fill in name, description, owner
↓
Add schema
↓
Set tags and classification
↓
Confirm to add it to the catalog
↓
Team can discover and use it

Data Quality Investigation

Dashboard shows strange numbers?
↓
Click Metadata
↓
Find source data asset
↓
Check quality metrics
↓
See if metrics degraded
↓
Find owner
↓
Contact the owner to fix it

Advanced Features

Data Profiling

Automatic analysis showing:

Column: age
Type: Integer
Values: 1,234,567
Unique: 82 different values
Min: 18
Max: 99
Average: 42.3
Nulls: 0 (0%)
Distribution: Mostly 30-50 age group
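
The figures in a profile like this come from straightforward aggregates over the column. Here is a minimal sketch using only the standard library; the `profile` function and its output keys are hypothetical, and the sample values are made up for illustration.

```python
from statistics import mean


def profile(values):
    """Compute basic profile statistics for one numeric column."""
    non_null = [v for v in values if v is not None]
    nulls = len(values) - len(non_null)
    return {
        "values": len(values),
        "unique": len(set(non_null)),
        "min": min(non_null, default=None),
        "max": max(non_null, default=None),
        "average": round(mean(non_null), 1) if non_null else None,
        "nulls": nulls,
        "null_pct": round(100.0 * nulls / len(values), 1) if values else 0.0,
    }


ages = [18, 34, 42, 42, 51, 99]
print(profile(ages))
```

Note that for an integer column, the number of unique values can never exceed `max - min + 1`, which is a quick sanity check on any profile.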

Relationship Mapping

Show how data connects:

customers
  ↓ foreign key (customer_id)
orders
  ↓ foreign key (order_id)
order_items
  ↓ foreign key (product_id)
products
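
A relationship map like this is effectively a graph, so finding how to join two tables becomes a path search. The sketch below assumes the foreign keys shown above and uses a breadth-first search; `FOREIGN_KEYS` and `join_path` are hypothetical names.

```python
from collections import deque

# Foreign-key links from the diagram above: (table, table) -> joining column.
FOREIGN_KEYS = {
    ("customers", "orders"): "customer_id",
    ("orders", "order_items"): "order_id",
    ("order_items", "products"): "product_id",
}


def join_path(start, goal):
    """Return the shortest chain of (from_table, to_table, key) joins, or None."""
    # Joins can be followed in either direction, so build an undirected graph.
    neighbors = {}
    for (a, b), key in FOREIGN_KEYS.items():
        neighbors.setdefault(a, []).append((b, key))
        neighbors.setdefault(b, []).append((a, key))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for nxt, key in neighbors.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(table, nxt, key)]))
    return None


for step in join_path("customers", "products"):
    print(step)
```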

Impact Analysis

See what breaks if you change data:

Planning to remove "legacy_id" column?
Impact: 
- 5 reports depend on it
- 2 dashboards use it
- 1 integration feeds it
Recommendation: Keep 6 more months before removing
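
Impact analysis amounts to walking the lineage graph downstream from the thing you plan to change. This is a small sketch with made-up lineage edges; the asset names, the `kind:name` labeling convention, and both functions are hypothetical.

```python
from collections import Counter

# Hypothetical lineage edges: each key feeds the assets in its list.
DOWNSTREAM = {
    "customers.legacy_id": ["view:customer_summary", "integration:crm_sync"],
    "view:customer_summary": ["report:churn", "dashboard:overview"],
}


def impacted(node):
    """Collect every asset reachable downstream of the given node."""
    found, stack = set(), [node]
    while stack:
        for child in DOWNSTREAM.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def summarize(assets):
    """Count impacted assets by kind (the text before the colon)."""
    return Counter(a.split(":", 1)[0] for a in assets)


affected = impacted("customers.legacy_id")
print(sorted(affected))
print(dict(summarize(affected)))
```

Following edges transitively matters: the report and dashboard here depend on the column only indirectly, through the view.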

Best Practices

Keep Descriptions Updated

  • Update when schema changes
  • Add examples
  • Document business rules

Use Consistent Naming

  • snake_case for columns
  • Meaningful names
  • Avoid abbreviations

Tag Strategically

  • Use consistent tag names
  • Don't over-tag
  • Combine tags for filtering

Set Quality Rules

  • Define data quality expectations
  • Monitor compliance
  • Alert on degradation

Troubleshooting

Can't find data I need

  1. Try different search terms
  2. Browse by category
  3. Ask the data owner
  4. Check Lineage for dependencies

Data quality degraded

  1. Check what changed
  2. Review transformations
  3. Look at source data
  4. Contact the data owner

Schema mismatch

  1. Compare with documentation
  2. Check for recent changes
  3. Verify integration is updated