
Elon Musk's comment about SSN database de-duplication has sparked significant debate within the developer community. Many software engineers have dismissed his remark. Let's see how OpenAI and DeepSeek designs this SSN database schema.
What is SSN?
For those of you not living in the US, SSN is the Social Security Number. It is a 9-digit number that is used to identify an individual in the US, primarily used for:
-
Income Identification: Employers use SSNs to report employees' wages and salaries to the government, ensuring earnings are properly recorded.
-
Tax Collection: The IRS requires SSNs for tax filings, including income tax returns, to track individual tax obligations and withholdings.
-
Social Welfare Distribution: Government agencies use SSNs to determine eligibility and distribute benefits such as Social Security, Medicare, Medicaid, and unemployment assistance.
-
Credit Score: Lenders, banks, and credit bureaus use SSNs to track an individual's credit history, loan applications, and debt repayment behavior, forming the basis of credit scores.
In US, the SSN is a highly sensitive piece of information. True love isn't defined by wedding vows—it’s when you trust your partner enough to share your SSN.
First Iteration
Prompt:
You are a relational database schema designer. You are given the task to design
the schema for the United States' Social Security Number (SSN) database.
The database needs to track basic personal information and payment profile.
Minimal design, no whistles and bells. Generate Postgres SQL dialect.
ChatGPT 4o
CREATE TABLE person (
-- SSN is always 9 digits (without dashes)
ssn CHAR(9) PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
date_of_birth DATE NOT NULL,
-- M: Male, F: Female, O: Other
gender CHAR(1) CHECK (gender IN ('M', 'F', 'O')),
address TEXT NOT NULL
);
CREATE TABLE payment_profile (
ssn CHAR(9) PRIMARY KEY REFERENCES person(ssn) ON DELETE CASCADE,
-- Basic validation for routing number
bank_routing_number CHAR(9) NOT NULL CHECK (bank_routing_number ~ '^[0-9]{9}$'),
bank_account_number VARCHAR(20) NOT NULL,
payment_method VARCHAR(10) CHECK (payment_method IN ('ACH', 'Check')) NOT NULL,
last_payment_date DATE
);
CREATE INDEX idx_person_last_name ON person(last_name);
CREATE INDEX idx_payment_method ON payment_profile(payment_method);
ChartGPT also explains the design choices:
person.ssn
is the primary key, ensuring uniqueness.person.date_of_birth
ensures valid date entry.person.gender
is a single-character field with constraints (M, F, O
).payment_profile
referencesperson.ssn
(ON DELETE CASCADE
ensures the payment profile is removed when a person is deleted).payment_profile.bank_routing_number
has a regex constraint to ensure it contains exactly 9 digits.payment_profile.payment_method
allows only ACH or Check.- Indexes on
last_name
andpayment_method
optimize common lookups.
A pleasant surprise is the model designs gender
column as a non-binary option and use O
to represent other gender.
DeepSeek R1
-- Table to store personal information
CREATE TABLE person (
ssn CHAR(9) PRIMARY KEY, -- Social Security Number (9 digits, no hyphens)
first_name VARCHAR(50) NOT NULL, -- First name
last_name VARCHAR(50) NOT NULL, -- Last name
date_of_birth DATE NOT NULL, -- Date of birth
gender CHAR(1) NOT NULL, -- Gender (M/F/O for Male, Female, Other)
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -- Record creation timestamp
);
-- Table to store payment profiles
CREATE TABLE payment_profile (
id SERIAL PRIMARY KEY, -- Unique payment profile ID
ssn CHAR(9) NOT NULL REFERENCES person(ssn) ON DELETE CASCADE, -- Foreign key to person
total_contributions DECIMAL(12, 2) DEFAULT 0.00, -- Total contributions made
last_payment_date DATE, -- Date of the last payment
last_payment_amount DECIMAL(10, 2), -- Amount of the last payment
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -- Record creation timestamp
);
DeepSeek designs a similar schema as ChatGPT. DeepSeek doesn't explain the design choices and just comment the columns.
One shortcoming is DeepSeek doesn't use column checks to ensure the validity of the data.
Second Iteration - Account for Name Change
Prompt:
A person can change the name (e.g. after marriage)
ChatGPT 4o
CREATE TABLE person (
ssn CHAR(9) PRIMARY KEY, -- Immutable identifier
date_of_birth DATE NOT NULL,
gender CHAR(1) CHECK (gender IN ('M', 'F', 'O')), -- M: Male, F: Female, O: Other
address TEXT NOT NULL
);
CREATE TABLE person_name (
id SERIAL PRIMARY KEY,
ssn CHAR(9) NOT NULL REFERENCES person(ssn) ON DELETE CASCADE,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
is_current BOOLEAN NOT NULL DEFAULT TRUE, -- Tracks the current name
changed_at TIMESTAMP DEFAULT NOW() -- Records when the name was changed
);
CREATE TABLE payment_profile (
ssn CHAR(9) PRIMARY KEY REFERENCES person(ssn) ON DELETE CASCADE,
bank_routing_number CHAR(9) NOT NULL CHECK (bank_routing_number ~ '^[0-9]{9}$'), -- 9-digit validation
bank_account_number VARCHAR(20) NOT NULL,
payment_method VARCHAR(10) CHECK (payment_method IN ('ACH', 'Check')) NOT NULL,
last_payment_date DATE
);
-- Ensure only one current name per person
CREATE UNIQUE INDEX idx_person_name_unique ON person_name(ssn) WHERE is_current = TRUE;
CREATE INDEX idx_person_name_last_name ON person_name(last_name);
CREATE INDEX idx_payment_method ON payment_profile(payment_method);
Changes & justifications:
-
Separated names into person_name:
- Allows multiple historical names per person.
- Tracks current vs. past names using
is_current
(enforced with a unique index). - Uses
changed_at
to record when a name was changed.
-
Kept SSN immutable:
person
table stores attributes that do not change (SSN, DOB, gender, address).- Name history is handled separately.
-
Indexing Enhancements:
idx_person_name_unique
ensures only oneis_current = TRUE
record per SSN.idx_person_name_last_name
allows searching by last name efficiently.
DeepSeek R1
-- Table to store personal information
CREATE TABLE person (
ssn CHAR(9) PRIMARY KEY, -- Social Security Number (9 digits, no hyphens)
first_name VARCHAR(50) NOT NULL, -- Current first name
last_name VARCHAR(50) NOT NULL, -- Current last name
date_of_birth DATE NOT NULL, -- Date of birth
gender CHAR(1) NOT NULL, -- Gender (M/F/O for Male, Female, Other)
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -- Record creation timestamp
);
-- Table to store historical name changes
CREATE TABLE name_history (
id SERIAL PRIMARY KEY, -- Unique identifier for each name change record
ssn CHAR(9) NOT NULL REFERENCES person(ssn) ON DELETE CASCADE, -- Foreign key to person
first_name VARCHAR(50) NOT NULL, -- First name at the time of the change
last_name VARCHAR(50) NOT NULL, -- Last name at the time of the change
change_date DATE NOT NULL, -- Date of the name change
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -- Record creation timestamp
);
-- Table to store payment profiles
CREATE TABLE payment_profile (
id SERIAL PRIMARY KEY, -- Unique payment profile ID
ssn CHAR(9) NOT NULL REFERENCES person(ssn) ON DELETE CASCADE, -- Foreign key to person
total_contributions DECIMAL(12, 2) DEFAULT 0.00, -- Total contributions made
last_payment_date DATE, -- Date of the last payment
last_payment_amount DECIMAL(10, 2), -- Amount of the last payment
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -- Record creation timestamp
);
This time DeepSeek also explains the design benefits:
-
Current Name: The
person
table always reflects the most up-to-date name. -
Historical Tracking: The
name_history
table tracks all name changes over time. -
Data Integrity: Foreign keys and
ON DELETE CASCADE
ensure data consistency.
DeepSeek makes a different design choice here as the person
table still keeps the first_name
and last_name
columns.
They are just repurposed to store the current name.
Third Iteration - Add Surrogate Key
Prompt:
Using SSN as the primary key could expose security risk as SSN is sensitive.
ChatGPT 4o
CREATE TABLE person (
person_id SERIAL PRIMARY KEY, -- Surrogate key for security
ssn CHAR(9) UNIQUE NOT NULL CHECK (ssn ~ '^[0-9]{9}$'), -- Enforced uniqueness and validation
date_of_birth DATE NOT NULL,
gender CHAR(1) CHECK (gender IN ('M', 'F', 'O')), -- M: Male, F: Female, O: Other
address TEXT NOT NULL
);
CREATE TABLE person_name (
id SERIAL PRIMARY KEY,
person_id INT NOT NULL REFERENCES person(person_id) ON DELETE CASCADE,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
is_current BOOLEAN NOT NULL DEFAULT TRUE, -- Tracks the current name
changed_at TIMESTAMP DEFAULT NOW() -- Records when the name was changed
);
CREATE TABLE payment_profile (
person_id INT PRIMARY KEY REFERENCES person(person_id) ON DELETE CASCADE,
bank_routing_number CHAR(9) NOT NULL CHECK (bank_routing_number ~ '^[0-9]{9}$'), -- 9-digit validation
bank_account_number VARCHAR(20) NOT NULL,
payment_method VARCHAR(10) CHECK (payment_method IN ('ACH', 'Check')) NOT NULL,
last_payment_date DATE
);
-- Ensure only one current name per person
CREATE UNIQUE INDEX idx_person_name_unique ON person_name(person_id) WHERE is_current = TRUE;
CREATE INDEX idx_person_name_last_name ON person_name(last_name);
CREATE INDEX idx_payment_method ON payment_profile(payment_method);
Improvements for Security & Best Practices:
-
Surrogate Primary Key (person_id):
- Replaces
ssn
as the primary key, reducing exposure risk. - Makes foreign key relationships reference
person_id
instead ofssn
.
- Replaces
-
SSN remains unique & validated:
UNIQUE NOT NULL
constraint ensures no duplicates.- Regular expression constraint (
CHECK (ssn ~ '^[0-9]{9}$')
) ensures valid format.
-
Foreign Keys use person_id:
person_name
andpayment_profile
referenceperson_id
instead ofssn
.- Avoids exposing SSN in multiple places.
DeepSeek R1
-- Table to store personal information
CREATE TABLE person (
id SERIAL PRIMARY KEY, -- Surrogate key (auto-incrementing integer)
ssn CHAR(9) NOT NULL UNIQUE, -- Social Security Number (9 digits, no hyphens)
first_name VARCHAR(50) NOT NULL, -- Current first name
last_name VARCHAR(50) NOT NULL, -- Current last name
date_of_birth DATE NOT NULL, -- Date of birth
gender CHAR(1) NOT NULL, -- Gender (M/F/O for Male, Female, Other)
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -- Record creation timestamp
);
-- Table to store historical name changes
CREATE TABLE name_history (
id SERIAL PRIMARY KEY, -- Unique identifier for each name change record
person_id INT NOT NULL REFERENCES person(id) ON DELETE CASCADE, -- Foreign key to person
first_name VARCHAR(50) NOT NULL, -- First name at the time of the change
last_name VARCHAR(50) NOT NULL, -- Last name at the time of the change
change_date DATE NOT NULL, -- Date of the name change
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -- Record creation timestamp
);
-- Table to store payment profiles
CREATE TABLE payment_profile (
id SERIAL PRIMARY KEY, -- Unique payment profile ID
person_id INT NOT NULL REFERENCES person(id) ON DELETE CASCADE, -- Foreign key to person
total_contributions DECIMAL(12, 2) DEFAULT 0.00, -- Total contributions made
last_payment_date DATE, -- Date of the last payment
last_payment_amount DECIMAL(10, 2), -- Amount of the last payment
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -- Record creation timestamp
);
Key Changes for Security:
-
Surrogate Key (
id
) inperson
table:- The
person
table now uses an auto-incrementing integer (id
) as the primary key. - The
ssn
column is still stored but is no longer the primary key. It is marked asUNIQUE
to ensure no two people share the same SSN.
- The
-
Foreign Key References:
- The
name_history
andpayment_profile
tables now reference theperson
table using theperson_id
(surrogate key) instead of thessn
.
- The
-
SSN Storage:
- The
ssn
column is stored securely in theperson
table. In a real-world application, you should consider encrypting this column or using other security measures (e.g., hashing or tokenization) to protect sensitive data.
- The
Summary
Both ChatGPT 4o and DeepSeek can perform the SSN schema design task well:
- They both come up with a sane schema design upfront.
- They are able to alter the design based on the followup prompts. In the above example, both are able to incorporate the name change and introduce surrogate key to address the security concern.
ChatGPT 4o's highlight:
- Design
gender
column as a non-binary option. - Enforce the data validity using column checks.
- Include indexes on the columns that are frequently queried.
- Explain the design choices and justify the design decisions.
DeepSeek R1's highlight:
- I personally like its refined
person
table design as it can return the current name, which will make the query more convenient. - The explanation is terse and to the point.
Final Thoughts
As shown above, schema design involves various trade-offs. Determining whether a unique SSN is a good design choice requires considering the surrounding context. Even state-of-the-art models still require human guidance to generate high-quality schema designs that accommodate real-world use cases (e.g., name changes) and address security concerns (e.g., SSN exposure).