How to Migrate database from MySQL to PostgreSQL
When you should consider migrating from MySQL to PostgreSQL
Advanced feature requirements
- Complex Data Types: PostgreSQL provides robust support for JSON, arrays, hstore, and custom types, making it ideal for applications with complex data structures.
- Geospatial Support: PostgreSQL with PostGIS offers superior geospatial capabilities compared to MySQL's spatial extensions.
Scalability needs
- Table Partitioning: PostgreSQL's declarative partitioning is more flexible and powerful than MySQL's partitioning system.
- Parallel Query Execution: PostgreSQL can utilize multiple CPU cores for single queries, improving performance for complex analytical workloads.
- Advanced Indexing: PostgreSQL supports more index types (B-tree, Hash, GiST, SP-GiST, GIN, and BRIN) and offers partial and expression indexes.
Licensing concerns
PostgreSQL offers freedoms that MySQL's GPL version doesn't:
-
Permissive License: PostgreSQL uses a PostgreSQL License (similar to MIT/BSD), which:
- Allows unrestricted use in proprietary applications
- Doesn't require source code disclosure
- Permits creating closed-source derivatives
-
Unrestricted Embedding: You can embed PostgreSQL in commercial products without licensing fees or source code obligations.
-
Fork Freedom: You can create proprietary forks of PostgreSQL without license obligations.
-
No Corporate Control: PostgreSQL is developed by a community organization rather than a single company, reducing concerns about commercial interests affecting the license.
When you should think twice
- If your application handles a high volume of write operations, PostgreSQL may perform less efficiently. Check Uber's switch from PostgreSQL to MySQL
- All-in-one vs best of breed. While PostgreSQL is like an all-in-one database thanks to its extensible architecture, it may be better to let your relational database handle transactional processing and use more specialized systems for analytical processing, full-text search, etc. If you are considering migrating databases, yours has likely reached a certain scale and encountered bottlenecks. The all-in-one approach is more desirable when you’re just getting started.
MySQL and PostgreSQL schema differences
Data types
While many data types are similar, important differences exist:
MySQL Type | PostgreSQL Equivalent | Notes |
---|---|---|
INT | INTEGER | Similar functionality |
BIGINT | BIGINT | Similar functionality |
FLOAT | REAL | PostgreSQL's REAL is equivalent to MySQL's FLOAT |
DOUBLE | DOUBLE PRECISION | Similar functionality |
DECIMAL | NUMERIC | Similar functionality |
DATETIME | TIMESTAMP | PostgreSQL's TIMESTAMP has no automatic initialization |
TIMESTAMP | TIMESTAMP WITH TIME ZONE | PostgreSQL handles time zones more explicitly |
ENUM | ENUM or CHECK constraint | PostgreSQL's ENUM is a custom type, not a string constraint |
SET | Array or JSONB | No direct equivalent; arrays or JSONB can replace functionality |
TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT | TEXT | PostgreSQL has a single TEXT type with no practical size limit |
VARCHAR | VARCHAR | PostgreSQL's VARCHAR has no performance penalty for full length |
BLOB | BYTEA | Different functions for manipulation |
Constraints and keys
PostgreSQL handles constraints differently:
- Primary Keys: Both systems support primary keys, but PostgreSQL automatically creates an index for each primary key.
- Foreign Keys: PostgreSQL enforces foreign key constraints more strictly and offers more deferral options.
- CHECK Constraints: PostgreSQL fully enforces CHECK constraints, while MySQL historically stored but ignored them (this has improved in recent MySQL versions).
- Unique Constraints: Both support unique constraints, but PostgreSQL distinguishes between unique constraints and unique indexes.
Sequences and auto-increment
- MySQL uses
AUTO_INCREMENT
for generating sequential values. - PostgreSQL uses sequences, typically with
SERIAL
orIDENTITY
columns. - Migration requires converting
AUTO_INCREMENT
to PostgreSQL sequences or identity columns.
Default values
- PostgreSQL supports more complex default values, including functions.
- MySQL's
CURRENT_TIMESTAMP
default forDATETIME
columns becomesCURRENT_TIMESTAMP
in PostgreSQL. - PostgreSQL allows defaults on TEXT columns, which some MySQL versions restricted.
Schema naming and case sensitivity
- PostgreSQL is case-sensitive for identifiers unless quoted, while MySQL's case sensitivity depends on the operating system and configuration.
- PostgreSQL automatically converts unquoted identifiers to lowercase, which can cause issues during migration.
- PostgreSQL uses schemas (similar to namespaces) more extensively than MySQL's databases.
Stored procedures and functions
- PostgreSQL uses PL/pgSQL as its primary procedural language, while MySQL uses its own syntax.
- PostgreSQL supports multiple procedural languages (PL/pgSQL, PL/Python, PL/Perl, etc.).
Views and materialized views
- Both support views, but PostgreSQL also offers materialized views that store data physically.
- PostgreSQL's view updating capabilities are more advanced.
- PostgreSQL allows indexing of materialized views.
Data migration strategies
Strategy | Pros | Cons |
---|---|---|
Direct export/import | Simple to implement for small databases | Requires downtime Challenging for large databases Manual conversion may be needed |
ETL process | Highly customizable Can handle complex transformations Can be parallelized | Requires more development effort Potentially complex to set up |
Replication-based migration | Minimal downtime Continuous validation possible Phased migration | More complex setup Requires monitoring Potential replication lag |
Cloud migration services | Managed service Often includes schema conversion Typically supports continuous replication | Vendor lock-in Potential costs May require cloud-to-cloud or on-premises-to-cloud networking |
Direct export/import
The simplest approach involves exporting data from MySQL and importing it into PostgreSQL:
-
Export MySQL data using mysqldump:
mysqldump --compatible=postgresql --default-character-set=utf8 \ --no-create-info --complete-insert --extended-insert --single-transaction \ --skip-triggers --routines=0 --skip-tz-utc \ database_name > mysql_data.sql
-
Convert the SQL to PostgreSQL format using tools like pgloader or custom scripts.
-
Import into PostgreSQL using psql:
psql -d database_name -f converted_data.sql
ETL process
For more complex migrations, an Extract-Transform-Load (ETL) process offers greater control:
- Extract data from MySQL into an intermediate format (CSV, JSON, etc.).
- Transform the data to match PostgreSQL's requirements (data types, constraints, etc.).
- Load the transformed data into PostgreSQL.
Replication-based migration
For minimal downtime, consider a replication-based approach:
- Set up initial data copy using tools like pgloader or AWS DMS.
- Establish ongoing replication from MySQL to PostgreSQL using tools likes Debezium.
- Validate data consistency between the systems.
- Cut over to PostgreSQL when ready.
Cloud migration services
Cloud providers offer specialized services for database migration:
- AWS Database Migration Service
- Google Cloud Database Migration Service
- Azure Database Migration Service
Handling large datasets
For very large databases, consider these additional strategies:
- Partitioned Migration: Migrate data in chunks based on logical partitions.
- Parallel Processing: Use multiple threads or processes for data extraction and loading.
- Incremental Migration: Migrate historical data first, then recent data during cutover.
- Data Validation: Implement checksums or row counts to verify migration completeness.
Application code changes
SQL syntax differences
- String Concatenation: MySQL uses
CONCAT()
function, PostgreSQL uses||
operator. - Date Functions: Functions like
DATE_ADD()
in MySQL becomedate + interval
in PostgreSQL. - LIMIT/OFFSET: MySQL uses
LIMIT x,y
while PostgreSQL usesLIMIT y OFFSET x
. - Group By Handling: PostgreSQL requires all non-aggregated columns in the SELECT list to appear in the GROUP BY clause.
- Boolean Values: MySQL uses 0/1, PostgreSQL uses true/false.
- REPLACE INTO: PostgreSQL doesn't support this MySQL shorthand; use DELETE + INSERT or upsert with ON CONFLICT.
Connection string
- Connection Libraries: Some libraries are database-specific and need replacement.
- Connection Strings: Format differs between MySQL and PostgreSQL.
- Connection Pooling: Configuration parameters differ between systems.
Example MySQL connection string:
mysql://user:password@host:3306/database
Equivalent PostgreSQL connection string:
postgresql://user:password@host:5432/database
ORM configurations
- Dialect Configuration: Change the database dialect to PostgreSQL.
- Type Mappings: Update custom type mappings to match PostgreSQL types.
- Query Generation: Some ORMs generate different SQL for different databases.
Example changes for popular ORMs:
Hibernate (Java):
// MySQL
properties.setProperty("hibernate.dialect", "org.hibernate.dialect.MySQLDialect");
// PostgreSQL
properties.setProperty("hibernate.dialect", "org.hibernate.dialect.PostgreSQLDialect");
Sequelize (Node.js):
// MySQL
const sequelize = new Sequelize('database', 'username', 'password', {
dialect: 'mysql',
});
// PostgreSQL
const sequelize = new Sequelize('database', 'username', 'password', {
dialect: 'postgres',
});
Transaction management
- Default Isolation Level: PostgreSQL uses
Read Committed
by default, while MySQL traditionally usedRepeatable Read
. - Locking Behavior: PostgreSQL's approach to row locking differs from MySQL's. MySQL uses next-key locking in InnoDB to prevent phantom reads. PostgreSQL uses multi-version concurrency control (MVCC) without next-key locking. This can lead to different behavior in highly concurrent applications.
- Serialization Failures: PostgreSQL may throw serialization failures that MySQL wouldn't.
Error handling
- Error Codes: Different numeric codes for similar errors.
- Constraint Violations: Different formats for constraint violation messages.
- Connection Errors: Different error handling for connection issues.
Database-specific features
If your application uses MySQL-specific features, alternatives must be implemented:
- Full-Text Search: Replace MySQL's full-text search with PostgreSQL's text search capabilities.
- Stored Procedures: Rewrite in PL/pgSQL syntax.
- User-Defined Functions: Convert to PostgreSQL's function syntax.
- Triggers: Update to PostgreSQL's trigger syntax.
Other notable compatibility issues
Beyond schema and code changes, several other compatibility issues require attention:
Case sensitivity
- MySQL is typically case-insensitive for table and column names on Windows, but case-sensitive on Unix/Linux.
- PostgreSQL is always case-sensitive unless identifiers are quoted, and converts unquoted identifiers to lowercase.
- This can cause unexpected behavior if your application relies on case-insensitive identifiers.
NULL handling
NULL value handling differs between the systems:
- In MySQL, NULL = NULL returns NULL, while in PostgreSQL, NULL = NULL returns false.
- MySQL treats empty strings as NULL in some contexts, while PostgreSQL distinguishes between empty strings and NULL.
- These differences can affect query results and application logic.
Character sets and collations
- MySQL uses character sets and collations at the server, database, table, and column levels.
- PostgreSQL uses encoding at the database level and collations at the column level.
- Default character sets differ: MySQL often defaults to
latin1
(though the sane setup should beutf8mb4
), while PostgreSQL typically usesUTF-8
.
Cutover process
The final phase of migration is the cutover—transitioning production traffic from MySQL to PostgreSQL. Here's a structured approach:
Cutover strategies
Strategy | Description | Steps | Pros | Cons |
---|---|---|---|---|
Big Bang Cutover | Switch all traffic at once | 1. Stop all write traffic to MySQL 2. Perform final data synchronization 3. Verify data consistency 4. Update application configuration 5. Restart application services 6. Resume traffic | Simpler to implement; no need to maintain both systems simultaneously | Higher risk; longer downtime; all-or-nothing approach |
Phased Cutover | Transition traffic gradually | 1. Identify components for independent migration 2. Migrate one component at a time 3. Maintain data synchronization during transition 4. Monitor each component before proceeding 5. Complete when all components are transitioned | Lower risk; issues affect only part of the system; easier rollback | More complex; requires maintaining both systems; potential data consistency challenges |
Read/Write Split | Separate read and write operations | 1. Direct reads to PostgreSQL, writes to MySQL 2. Maintain real-time replication 3. Migrate write operations when confident 4. Decommission MySQL after transition | Gradual transition; reduced risk for read-heavy applications; easier performance validation | Requires robust replication; potential replication lag; complex application changes |
Zero-downtime approaches
For systems that cannot tolerate downtime:
-
Dual-Write Pattern:
- Modify application to write to both MySQL and PostgreSQL
- Read from MySQL initially
- Gradually shift reads to PostgreSQL
- Once confident, stop writing to MySQL
-
Change Data Capture (CDC):
- Use tools like Debezium to capture changes from MySQL
- Apply changes to PostgreSQL in real-time
- Switch application connection to PostgreSQL when ready
-
Proxy-Based Approach:
- Implement a database proxy (like ProxySQL or PgBouncer)
- Configure the proxy to route traffic appropriately during migration
- Switch routing rules to complete migration
Common tools
- pgloader. A powerful and flexible PostgreSQL migration tool that excels at rapidly loading data into PostgreSQL databases.
- Ora2Pg. While primarily designed for Oracle to PostgreSQL migrations, Ora2Pg can also be used to migrate from MySQL to PostgreSQL.
- pg_dump and pg_restore. These core PostgreSQL utilities are often used in conjunction with other tools.
- Cloud migration services like AWS Database Migration Service (DMS).