Explanation

How to Use Postgres CREATE INDEX CONCURRENTLY

Tianzhou
Tianzhou7 min read
How to Use Postgres CREATE INDEX CONCURRENTLY

The Problem with Regular CREATE INDEX

When you run a standard CREATE INDEX command in PostgreSQL, it acquires a SHARE (ShareLock) on the table, which has severe implications for concurrent operations.

-- This will block all writes to the users table
CREATE INDEX idx_users_email ON users(email);

Technical Lock Details

The SHARE lock acquired by CREATE INDEX conflicts with several other lock modes:

  • RowExclusiveLock (used by INSERT, UPDATE, DELETE)
  • ShareUpdateExclusiveLock (used by VACUUM, ANALYZE, REINDEX CONCURRENTLY)
  • ShareRowExclusiveLock (used by CREATE TRIGGER and some ALTER TABLE variants)
  • ExclusiveLock (used by REFRESH MATERIALIZED VIEW CONCURRENTLY)
  • AccessExclusiveLock (used by DROP TABLE, TRUNCATE, REINDEX, CLUSTER)

This lock compatibility matrix explains why regular index creation is so disruptive. Per the PostgreSQL locking documentation, a SHARE lock allows concurrent SELECT operations but blocks all write operations.

CREATE INDEX CONCURRENTLY: The Solution

PostgreSQL's CREATE INDEX CONCURRENTLY allows index building without blocking writes by using a SHARE UPDATE EXCLUSIVE (ShareUpdateExclusiveLock) instead of a SHARE lock:

-- This allows writes to continue during index creation
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

Lock Mode Comparison

The key difference lies in the lock mode used:

OperationLock Mode❌ Conflicts With✅ Allows
CREATE INDEXShareLockINSERT, UPDATE, DELETE, VACUUM, other DDLSELECT only
CREATE INDEX CONCURRENTLYShareUpdateExclusiveLockOther DDL operations, multiple concurrent index buildsSELECT, INSERT, UPDATE, DELETE

lock-mode

The ShareUpdateExclusiveLock is specifically designed to allow concurrent data modifications while preventing conflicting DDL operations.

How It Works

CREATE INDEX CONCURRENTLY uses a multi-phase approach:

  1. Initial Catalog Entry: Creates index metadata with indisvalid = false
  2. First Table Scan: Builds initial index structure while allowing writes
  3. Second Table Scan: Catches up with changes that occurred during first scan
  4. Validation: Marks index as valid (indisvalid = true)

During this process:

  • Writes continue normally - INSERT, UPDATE, DELETE work without interruption
  • Takes longer than regular indexing - Typically 2-3x slower due to multiple scans
  • Uses more resources - Higher CPU and I/O load from tracking concurrent changes

Key Limitations

While powerful, CREATE INDEX CONCURRENTLY has important restrictions:

Cannot Run Inside Transactions

BEGIN;
CREATE INDEX CONCURRENTLY idx_users_email ON users(email); -- ERROR!
COMMIT;
-- ERROR: CREATE INDEX CONCURRENTLY cannot run inside a transaction block

This limitation exists because the operation needs to commit multiple internal transactions during its phases.

Other Limitations

  • Only one concurrent index per table - Multiple concurrent index builds on the same table will serialize
  • Failure leaves invalid index - Must be manually dropped if creation fails
  • Foreign keys reference check - Creating unique index concurrently may fail if duplicate values are inserted during creation

Tracking Index Creation Progress

Using pg_stat_progress_create_index

We can use pg_stat_progress_create_index to track the index creation:

-- Monitor index creation progress
SELECT
    pid,
    datname,
    relid::regclass AS table_name,
    index_relid::regclass AS index_name,
    phase,
    lockers_total,
    lockers_done,
    current_locker_pid,
    blocks_total,
    blocks_done,
    tuples_total,
    tuples_done,
    partitions_total,
    partitions_done
FROM pg_stat_progress_create_index;

The phase column shows the current operation stage:

  • initializing: Starting up
  • waiting for writers before build: Waiting for concurrent writes to finish
  • building index: Main index creation phase
  • waiting for writers before validation: Preparing for validation
  • index validation: scanning index: Validating index entries
  • index validation: scanning table: Final validation
  • waiting for old snapshots: Waiting for transactions to complete
  • waiting for readers before marking dead: Final cleanup

Quick Validity Check with pg_index

The fastest way to check if an index is ready for use:

-- Check if index is valid and ready for use
SELECT
    schemaname,
    tablename,
    indexname,
    indexdef
FROM pg_indexes i
JOIN pg_class c ON c.relname = i.indexname
JOIN pg_index idx ON idx.indexrelid = c.oid
WHERE NOT idx.indisvalid
  AND schemaname NOT IN ('pg_catalog', 'information_schema');

The indisvalid column:

  • true: Index is complete and being used by the query planner
  • false: Index is either being built or failed during concurrent creation

Postgres doesn't have built-in INVISBLE INDEX clause. You can achieve the behavior by setting indisvalid to false.

Best Practices

Automatic Review

To prevent developers from running CREATE INDEX and accidentally locking the database, implement automatic SQL linting during the review process.

Bytebase SQL Review provides automated enforcement of the CREATE INDEX CONCURRENTLY rule and can be integrated with CI/CD pipelines to catch violations before deployment.

Always Verify Index Validity After Creation

-- After CREATE INDEX CONCURRENTLY completes
SELECT indisvalid
FROM pg_index
WHERE indexrelid = 'idx_users_email'::regclass;

-- If false, the index creation failed and needs cleanup
DROP INDEX IF EXISTS idx_users_email;

Clean Up Failed Indexes

-- Find and drop all invalid indexes
DO $$
DECLARE
    r RECORD;
BEGIN
    FOR r IN
        SELECT schemaname, indexname
        FROM pg_indexes i
        JOIN pg_class c ON c.relname = i.indexname
        JOIN pg_index idx ON idx.indexrelid = c.oid
        WHERE NOT idx.indisvalid
          AND schemaname NOT IN ('pg_catalog', 'information_schema')
    LOOP
        EXECUTE format('DROP INDEX %I.%I', r.schemaname, r.indexname);
        RAISE NOTICE 'Dropped invalid index %.%', r.schemaname, r.indexname;
    END LOOP;
END $$;

Performance Consideration

AspectCREATE INDEXCREATE INDEX CONCURRENTLY
Lock LevelSHARE (blocks writes)SHARE UPDATE EXCLUSIVE (allows writes)
DurationBaseline (1x)2-3x longer
CPU UsageHigh burstSustained moderate
I/O ImpactSingle intensive scanMultiple moderate scans
Memory Usagemaintenance_work_memSimilar, held longer
Transaction LogMinimalHigher due to concurrent changes

Even though CREATE INDEX CONCURRENTLY doesn't block writes, it still impacts performance:

  • Schedule during low-traffic periods when possible
  • Monitor CPU and I/O metrics during creation
  • Consider increasing maintenance_work_mem temporarily for faster indexing

References

  1. Postgres locking mode
  2. Index creation progress table
  3. Source code for CREATE INDEX CONCURRENTLY