Explanation

Database Design Patterns Every Developer Should Know

Adela
Adela9 min read
Database Design Patterns Every Developer Should Know

Database design patterns are reusable structural approaches to organizing tables, relationships, and indexes in a relational database. Picking the wrong pattern early on is expensive. The pattern you choose defines your query performance ceiling, your data integrity guarantees, and how painful your next schema migration will be.

This guide covers six patterns that show up in most production systems, the trade-offs behind each, and the anti-patterns that cause real-world outages.

What are database design patterns?

A database design pattern is a repeatable solution to a common data modeling problem. It is a structural decision about how tables relate to each other, how data is split or combined, and where you put the boundaries between entities. Some patterns optimize for write safety (normalization), others for read speed (star schema), others for flexibility (EAV, JSONB columns), and others for organizational isolation (multi-tenancy). Most production databases use several patterns together.

Normalization (1NF through 3NF)

Normalization removes redundant data by splitting it across related tables. The goal is to make sure each fact is stored exactly once, so updates only happen in one place.

First Normal Form (1NF): Every column holds a single value. No comma-separated lists, no arrays stuffed into a VARCHAR. Each row is uniquely identifiable by a primary key.

Second Normal Form (2NF): Every non-key column depends on the entire primary key, not just part of it. This matters when you have composite keys. If an order_items table is keyed on (order_id, product_id), the product_name depends only on product_id and should live in a separate products table.

Third Normal Form (3NF): Non-key columns depend only on the primary key, not on other non-key columns. If a customers table has city and state, and city determines state, then state should move to a cities table. This eliminates transitive dependencies.

When to denormalize

3NF is the default target for transactional systems. But read-heavy workloads (dashboards, reports, search) often need denormalized data to avoid expensive multi-table joins. Denormalization means deliberately adding redundancy, like storing total_orders_count directly on a customer row instead of computing COUNT(*) from the orders table on every request.

The rule: start normalized, denormalize when you have measured evidence that a specific query is too slow.

Star schema and snowflake schema

These patterns are for analytical workloads, not transactional ones.

Star schema puts a central fact table (containing measures like revenue, quantity, or duration) surrounded by denormalized dimension tables (date, product, customer, region). Each query is one join from fact to dimension. It is simple and fast.

Snowflake schema normalizes the dimension tables into sub-dimensions. A location dimension splits into city → state → country tables. This saves storage and improves data integrity at the cost of more joins.

Star schemaSnowflake schema
Dimension tablesDenormalizedNormalized into sub-tables
Query joinsFewerMore
StorageHigher (redundant data)Lower
Query speedFasterSlower
Best forDashboards, ad-hoc BIComplex hierarchies, storage-sensitive environments

Modern columnar warehouses (Snowflake, BigQuery, Redshift) have optimized away much of the performance penalty of snowflake schemas, so the choice is less about speed and more about how your team thinks about the data. Star schema remains the default for most analytics use cases because it is easier to understand and query.

Entity-Attribute-Value (EAV)

EAV stores data in three columns: entity_id, attribute_name, and value. Instead of a wide table with one column per attribute, each attribute becomes a row.

-- EAV table
CREATE TABLE product_attributes (
    entity_id   INT REFERENCES products(id),
    attribute   VARCHAR(100),
    value       VARCHAR(255)
);

-- A T-shirt has size and color; a laptop has cpu and ram_gb.
-- Both go in the same table.
INSERT INTO product_attributes VALUES (1, 'size', 'XL');
INSERT INTO product_attributes VALUES (1, 'color', 'blue');
INSERT INTO product_attributes VALUES (2, 'cpu', 'M3 Pro');
INSERT INTO product_attributes VALUES (2, 'ram_gb', '36');

When it looks attractive: Product catalogs with hundreds of optional attributes. Clinical records with thousands of possible fields per patient.

Why it usually fails: Every query that retrieves multiple attributes requires self-joins or pivot operations. Filtering is slow because value is always VARCHAR (no type checking, no numeric indexing). Magento 1.x used EAV for product data and was notorious for poor query performance at scale. PostgreSQL benchmarks show that JSONB is over 50,000x faster than EAV for unindexed queries, while producing a database 3x smaller.

The modern alternative: Use typed columns for core fields and a JSONB column for the variable remainder:

CREATE TABLE products (
    id          SERIAL PRIMARY KEY,
    name        VARCHAR(200) NOT NULL,
    price       DECIMAL(10,2) NOT NULL,
    category_id INT REFERENCES categories(id),
    attributes  JSONB DEFAULT '{}'
);

Rule of thumb: if you write WHERE attributes->>'status' = 'active' in most queries, status should be a proper column, not buried in JSON.

Polymorphic associations

A polymorphic association uses a single foreign key column plus a "type" column to reference rows in multiple different tables.

CREATE TABLE comments (
    id              SERIAL PRIMARY KEY,
    body            TEXT,
    commentable_id  INT,          -- could point to posts, photos, or videos
    commentable_type VARCHAR(50)  -- 'Post', 'Photo', or 'Video'
);

This pattern is common in Rails and Django ORMs. The problem: the database cannot enforce a foreign key constraint because commentable_id points to different tables depending on commentable_type. That means no CASCADE deletes, no referential integrity checks, and orphaned records that accumulate silently.

GitLab's engineering documentation explicitly recommends against polymorphic associations. The alternative: separate association tables per type.

CREATE TABLE post_comments (
    id      SERIAL PRIMARY KEY,
    body    TEXT,
    post_id INT REFERENCES posts(id) ON DELETE CASCADE
);

CREATE TABLE photo_comments (
    id       SERIAL PRIMARY KEY,
    body     TEXT,
    photo_id INT REFERENCES photos(id) ON DELETE CASCADE
);

More tables, but the database enforces every relationship. This is almost always the right trade-off for systems that need long-term data integrity.

Multi-tenant patterns

Multi-tenancy decides how you isolate customer data when multiple organizations share the same application. There are three standard approaches, each with a different cost-isolation trade-off.

Shared schema (pool model): All tenants share the same tables. A tenant_id column on every table separates the data. Cheapest and simplest, but a missing WHERE tenant_id = ? filter can leak data across tenants.

Schema-per-tenant: Each tenant gets a separate schema within the same database instance. Supported in PostgreSQL and SQL Server, but not in MySQL. Bytebase's analysis found that it introduces complexity comparable to database-per-tenant without proportional isolation benefits.

Database-per-tenant (silo model): Each tenant gets a fully separate database. Maximum isolation, but the highest operational cost. Managing schema migrations across hundreds of databases requires dedicated tooling.

Shared schemaSchema-per-tenantDatabase-per-tenant
CostLowestMediumHighest
Data isolationWeakestMediumStrongest
Per-tenant customizationNoneSomeFull
Migration complexityOne migrationPer-schemaPer-database
Best forStartups, MVPsMid-tier SaaSEnterprise, regulated industries

Start with shared schema unless you have a compliance requirement (HIPAA, GDPR data residency) that demands physical isolation. PostgreSQL's row-level security can add a database-enforced isolation layer on top of application-level tenant_id filtering.

Anti-patterns to avoid

These are the design decisions that cause production incidents.

The God Table. One table stores everything: users, orders, products, logs. It starts with 50 columns and grows to 200. Splitting into focused tables typically reduces query times by 80% or more.

Missing primary keys. Always use a surrogate primary key (auto-increment integer or UUID). Natural keys like name + date of birth lead to collisions and mixed-up records.

Over-normalization. Splitting data across too many tables where simple queries require 10+ joins. A fintech startup decomposed currency codes, transaction types, and timestamp components into separate lookup tables, and the system collapsed under peak load.

Storing money as FLOAT. Floating-point arithmetic causes rounding errors. 0.1 + 0.2 = 0.30000000000000004 in most languages. Use DECIMAL(10,2) for monetary values.

No indexes on filtered columns. Index the columns that appear in your WHERE clauses and JOIN conditions, and use EXPLAIN to validate. Too many indexes slow writes; too few cause full table scans.

Wrong data types. Dates stored as VARCHAR prevent range queries. Enums stored as freeform strings allow typos. Use DATE/TIMESTAMP for times, DECIMAL for money, and ENUM or lookup tables for fixed value sets.

Getting the pattern right at the schema design stage is cheaper than fixing it in production.

How Bytebase helps enforce design patterns

Bytebase is a database DevSecOps platform that catches pattern violations before they reach production. Its SQL review policy can enforce rules like:

These rules run automatically on every schema change before it is applied, which means pattern violations get caught in review, not in a 3 AM incident.

Further readings

Contact Us