SQL String Functions Cheat Sheet: CONCAT, TRIM, SUBSTRING, and More
String manipulation in SQL is tested more than most candidates expect. This cheat sheet covers CONCAT, TRIM, UPPER, LOWER, SUBSTRING, REPLACE, LIKE, and the string patterns that keep appearing in data analyst interviews.
String functions are tested more than most people expect in SQL interviews — particularly in data cleaning and data quality questions. This cheat sheet covers every string function you are likely to need, with real examples for each.
1. UPPER and LOWER — changing case
sqlSELECT UPPER(name) AS name_upper, -- "john smith" → "JOHN SMITH" LOWER(email) AS email_lower -- "[email protected]" → "[email protected]" FROM customers;
Use case in interviews: standardising data before comparison.
sql-- Case-insensitive search WHERE LOWER(email) = LOWER('[email protected]') -- Or use ILIKE in PostgreSQL (case-insensitive LIKE) WHERE email ILIKE '%gmail.com'
2. LENGTH — counting characters
sqlSELECT name, LENGTH(name) AS name_length FROM customers WHERE LENGTH(name) > 50; -- find suspiciously long names
MySQL uses
LENGTH()for bytes andCHAR_LENGTH()for characters. For multi-byte characters (e.g. accented letters), useCHAR_LENGTH().
3. TRIM, LTRIM, RTRIM — removing whitespace
sql-- Remove spaces from both ends SELECT TRIM(' hello world ') AS cleaned; -- "hello world" -- Remove only leading spaces SELECT LTRIM(' hello') AS cleaned; -- "hello" -- Remove only trailing spaces SELECT RTRIM('hello ') AS cleaned; -- "hello" -- Remove a specific character (PostgreSQL) SELECT TRIM(BOTH '.' FROM '...hello...') AS cleaned; -- "hello"
Always TRIM before comparing string columns that come from user input or CSV imports.
4. CONCAT — joining strings together
sql-- Basic concat SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM customers; -- PostgreSQL also supports || operator SELECT first_name || ' ' || last_name AS full_name FROM customers; -- CONCAT_WS — concat with separator (skips NULLs) SELECT CONCAT_WS(', ', city, state, country) AS location FROM addresses; -- "London, UK" (skips NULL state cleanly)
CONCAT_WS is especially useful because it automatically skips NULL values, unlike plain CONCAT which returns NULL if any argument is NULL.
5. SUBSTRING — extracting part of a string
sql-- SUBSTRING(string, start_position, length) SELECT SUBSTRING('Hello World', 1, 5) AS result; -- "Hello" SELECT SUBSTRING('Hello World', 7) AS result; -- "World" (to end) -- Extract domain from email SELECT email, SUBSTRING(email, POSITION('@' IN email) + 1) AS domain FROM customers; -- "[email protected]" → "gmail.com"
6. POSITION / CHARINDEX — finding a substring
sql-- PostgreSQL / MySQL: POSITION(substring IN string) SELECT POSITION('@' IN email) AS at_position FROM customers; -- "[email protected]" → 5 -- SQL Server: CHARINDEX(substring, string) SELECT CHARINDEX('@', email) AS at_position FROM customers;
Used to find where a character appears so you can split or extract around it.
7. LEFT and RIGHT — extracting from edges
sql-- First N characters SELECT LEFT(phone, 3) AS area_code FROM customers; -- Last N characters SELECT RIGHT(product_code, 4) AS suffix FROM products; -- Extract country code (first 2 chars of phone) SELECT phone, LEFT(phone, 2) AS country_code FROM customers WHERE LEFT(phone, 2) IN ('+1', '44', '91');
8. REPLACE — substituting characters
sql-- Remove dashes from phone numbers SELECT REPLACE(phone, '-', '') AS clean_phone FROM customers; -- "123-456-7890" → "1234567890" -- Mask sensitive data SELECT REPLACE(email, SUBSTRING(email, 1, POSITION('@' IN email) - 1), '****') AS masked FROM customers;
9. LIKE and ILIKE — pattern matching
sql-- % matches any number of characters WHERE email LIKE '%@gmail.com' -- ends with @gmail.com WHERE name LIKE 'J%' -- starts with J WHERE code LIKE '%_2024%' -- contains _2024 -- _ matches exactly one character WHERE phone LIKE '___-___-____' -- matches 3-3-4 format -- ILIKE — case insensitive (PostgreSQL only) WHERE email ILIKE '%GMAIL%'
Performance tip:
LIKE '%something'with a leading wildcard cannot use an index.LIKE 'something%'(no leading wildcard) can. This matters on large tables.
10. SPLIT_PART — splitting strings by delimiter
sql-- PostgreSQL: SPLIT_PART(string, delimiter, part_number) SELECT SPLIT_PART('[email protected]', '@', 1) AS username; -- "john.smith" SELECT SPLIT_PART('[email protected]', '@', 2) AS domain; -- "gmail.com" SELECT SPLIT_PART('New York, NY, USA', ', ', 2) AS state; -- "NY"
11. REGEXP_LIKE / ~ — regular expression matching
sql-- PostgreSQL: ~ operator WHERE email ~ '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' -- MySQL WHERE email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' -- BigQuery WHERE REGEXP_CONTAINS(email, r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
12. Quick reference table
| Function | PostgreSQL | MySQL | BigQuery |
|---|---|---|---|
| Concatenate | || or CONCAT | CONCAT | CONCAT |
| String length | LENGTH | CHAR_LENGTH | LENGTH |
| Substring | SUBSTRING(s, pos, len) | SUBSTRING(s, pos, len) | SUBSTR(s, pos, len) |
| Find position | POSITION(x IN s) | LOCATE(x, s) | STRPOS(s, x) |
| Split by delimiter | SPLIT_PART(s, d, n) | SUBSTRING_INDEX(s, d, n) | SPLIT(s, d)[OFFSET(n)] |
| Regex match | s ~ pattern | s REGEXP pattern | REGEXP_CONTAINS(s, pattern) |
13. Common interview pattern: email validation
sql-- Find users with invalid internal emails SELECT user_id, email FROM users WHERE email NOT LIKE '%@thequerylab.com' OR email LIKE '% %' -- contains space OR LENGTH(email) < 6; -- too short
14. Practice string function problems on TheQueryLab
String functions are heavily tested in data quality and data cleaning interview questions. On TheQueryLab, problems tagged with data cleaning and string operations will help you build fluency.
Start with the SQL Interview Preparation Kit — problems like finding valid emails, extracting domains, and standardising messy data all require the functions covered in this guide.
