Explanation

Postgres Schema Migration without Downtime Best Practice

Tianzhou
Tianzhou8 min read
Postgres Schema Migration without Downtime Best Practice

Understanding Postgres Locks

Postgres uses a sophisticated locking system with multiple lock modes. DDL operations often acquire locks that conflict with routine DML operations like INSERT, UPDATE, and DELETE. This means a seemingly harmless schema change can bring your entire application to a halt if not executed carefully.

Let's understand what locks DML operations acquire:

DMLLock ModeDescription
SELECTACCESS SHAREOnly conflicts with ACCESS EXCLUSIVE
INSERT/UPDATE/DELETEROW EXCLUSIVEConflicts with SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE

Now we can see how the two most common schema migration operations conflict with DML:

DDL OperationLock ModeSELECTINSERT/UPDATE/DELETE
ALTER TABLE (most cases)ACCESS EXCLUSIVE❌ BLOCKS❌ BLOCKS
CREATE INDEXSHARE✅ Allows❌ BLOCKS

Many ALTER TABLE forms acquire an ACCESS EXCLUSIVE lock, which is the most restrictive lock in Postgres. This completely blocks your application - no reads, no writes, nothing can happen on that table until the operation completes.

CREATE INDEX uses a SHARE lock, which allows SELECT statements but blocks all write operations (INSERT, UPDATE, DELETE). For a large table, index creation can take hours, effectively making your application read-only.

Both scenarios cause downtime. In the following sections, we'll explore alternatives that avoid these blocking behaviors.

ALTER TABLE without Downtime

While most ALTER TABLE operations acquire ACCESS EXCLUSIVE locks, Postgres provides several techniques to avoid downtime for common schema changes.

Adding Columns with DEFAULT Values

Since Postgres 11, adding a column with a DEFAULT value no longer requires a table rewrite in many cases.

-- ✅ Fast operation (Postgres 11+)
-- No table rewrite, minimal lock time
ALTER TABLE users ADD COLUMN status text DEFAULT 'active';

-- ❌ Slow operation (requires table rewrite)
-- Only use if you need a non-constant default
ALTER TABLE users ADD COLUMN created_at timestamp DEFAULT now();

How it works: Postgres stores the default value in the system catalog instead of rewriting every row. When you read a row that doesn't have the new column, Postgres automatically returns the default value.

Adding Constraints in Two Phases

For constraints that need validation against existing data, use the NOT VALID approach:

Check Constraints

-- Step 1: Add constraint without validating existing data
-- Fast - only blocks briefly to update catalog
ALTER TABLE users ADD CONSTRAINT users_age_positive
  CHECK (age > 0) NOT VALID;

-- Step 2: Validate existing data (can take time but doesn't block writes)
-- Uses SHARE UPDATE EXCLUSIVE lock - allows reads/writes
ALTER TABLE users VALIDATE CONSTRAINT users_age_positive;

Foreign Key Constraints

-- Step 1: Add foreign key without validation
-- Fast - only updates catalog, doesn't validate existing data
ALTER TABLE orders ADD CONSTRAINT orders_user_id_fk
  FOREIGN KEY (user_id) REFERENCES users(id) NOT VALID;

-- Step 2: Validate existing relationships
-- Can take time but allows concurrent operations
ALTER TABLE orders VALIDATE CONSTRAINT orders_user_id_fk;

Setting Columns to NOT NULL

The safest approach for SET NOT NULL is a four-step process:

-- Step 1: Add a check constraint (fast with NOT VALID)
ALTER TABLE users ADD CONSTRAINT users_email_not_null
  CHECK (email IS NOT NULL) NOT VALID;

-- Step 2: Validate the constraint (allows concurrent operations)
ALTER TABLE users VALIDATE CONSTRAINT users_email_not_null;

-- Step 3: Set NOT NULL (fast since constraint guarantees no nulls)
ALTER TABLE users ALTER COLUMN email SET NOT NULL;

-- Step 4: Drop the redundant check constraint
ALTER TABLE users DROP CONSTRAINT users_email_not_null;

Why this works: The final SET NOT NULL is fast because Postgres can see from the validated constraint that no null values exist.

Postgres 18 Update: The upcoming Postgres 18 introduces a simplified form that allows NOT NULL constraints to use the NOT VALID attribute directly:

-- Postgres 18+ - Simplified approach
ALTER TABLE users ALTER COLUMN email SET NOT NULL NOT VALID;
ALTER TABLE users VALIDATE CONSTRAINT users_email_not_null;

CREATE INDEX without Downtime

Instead of the blocking CREATE INDEX, use CREATE INDEX CONCURRENTLY to build indexes without interfering with application traffic:

-- ❌ Blocks all writes during index creation
CREATE INDEX idx_users_email ON users(email);

-- ✅ Allows writes to continue during index creation
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

CREATE INDEX CONCURRENTLY uses a SHARE UPDATE EXCLUSIVE lock instead of a SHARE lock, allowing INSERT, UPDATE, and DELETE operations to continue normally while the index is being built.

Trade-offs: While concurrent index creation avoids downtime, it takes longer to complete and has some limitations (cannot be used inside transactions, requires more careful error handling).

For a comprehensive guide on concurrent index creation, see our detailed article on CREATE INDEX CONCURRENTLY.

Use lock_timeout

Regardless of the lock level, all DDL operations should use lock_timeout to prevent creating lock queues that can freeze your application.

The Lock Queue Problem

When an ACCESS EXCLUSIVE operation waits for existing queries to complete, it blocks all subsequent operations from starting. Your application can appear completely frozen while waiting for one long query to finish.

-- Session 1: Long-running query (holds ACCESS SHARE)
SELECT COUNT(*) FROM large_table; -- Takes 5 minutes

-- Session 2: DDL waiting for ACCESS EXCLUSIVE
ALTER TABLE large_table ALTER COLUMN description TYPE text; -- Waits behind the SELECT

-- Session 3: New application query
SELECT * FROM large_table WHERE id = 123; -- BLOCKED by waiting DDL!

Set lock_timeout

The lock_timeout parameter can be set at multiple levels:

-- Option 1: Session level (temporary)
SET lock_timeout = '5s';

-- Option 2: Role level (recommended - persistent across sessions)
CREATE ROLE ddl_user WITH LOGIN PASSWORD 'secure_password';
ALTER ROLE ddl_user SET lock_timeout = '5s';

-- Option 3: Database level (applies to all connections)
ALTER DATABASE mydb SET lock_timeout = '5s';

Setting lock_timeout at the Postgres role level is recommended because:

  1. The setting applies to all sessions for that role
  2. No need to remember to set it in each DDL session
-- Use the dedicated DDL user for schema migrations
-- Connect as ddl_user and run:
ALTER TABLE users ALTER COLUMN description TYPE text;

If the operation can't acquire the lock within 5 seconds, it fails with an error instead of blocking the application indefinitely.

Retry Logic

Combine lock_timeout with retry logic to handle busy periods:

-- Example DDL script with retry logic
DO $$
DECLARE
    max_attempts INTEGER := 10;
    attempt INTEGER := 1;
    success BOOLEAN := FALSE;
BEGIN
    WHILE attempt <= max_attempts AND NOT success LOOP
        BEGIN
            SET lock_timeout = '2s';
            ALTER TABLE users ALTER COLUMN description TYPE text;
            success := TRUE;
            RAISE NOTICE 'DDL succeeded on attempt %', attempt;
        EXCEPTION
            WHEN lock_not_available THEN
                RAISE NOTICE 'Attempt % failed, retrying in 30 seconds...', attempt;
                PERFORM pg_sleep(30);
                attempt := attempt + 1;
        END;
    END LOOP;

    IF NOT success THEN
        RAISE EXCEPTION 'DDL failed after % attempts', max_attempts;
    END IF;
END $$;

Bytebase automatically handles retry logic during schema migrations.

Conclusion

Achieving zero-downtime Postgres schema migrations requires understanding locks and applying the right techniques:

  1. Understand the impact: Know which DDL operations block your application
  2. Use modern Postgres features: Leverage CONCURRENTLY, NOT VALID, and other optimizations
  3. Always set lock_timeout: Prevent lock queues that can freeze your application
  4. Plan for retries: Handle busy periods gracefully with retry logic

By following these practices, you can deploy schema changes confidently without impacting your users.

References

  1. Postgres locking mode
  2. Postgres 11: ADD COLUMN with DEFAULT optimization
  3. Postgres 18: NOT NULL constraints with NOT VALID
  4. How to Use Postgres CREATE INDEX CONCURRENTLY