有些时候,我们可能想要比较一下两个数据表,以找到其中不同的数据。比如,在进行数据移植的时候,或是在合并数据的时候,或是在比对验证数据的时候。当然比较两个表,需要这两个表结构是一样的。
我们先假设一下有如下表结构:
CREATE TABLE jajal ( user_id integer NOT NULL, first_name character varying(255), last_name character varying(255), grade character(1), CONSTRAINT jajal_pkey PRIMARY KEY (user_id) )
然后,我们有两张表——jajal和jajal_copy,其内容如下:
目录
user_id
first_name
last_name
grade
1
Some
Dude
A
2
Other
Guy
B
3
You are
Welcome
B
4
What
Other
A
5
INeed
You
C
6
Mixed
Nuts
Z
7
Kirk
Land
B
8
Bit
Shooter
A
9
Sun
Microsystem
C
10
Extra
Fancy
B
user_id
first_name
last_name
grade
1
Some
Dude
A
2
Other
Guy
B
3
You are
Welcome
B
4
What
Other
A
5
INeed
You
C
6
Mixed
Nuts
C
7
Kirk
Land
B
8
Bit
Shooter
A
9
Sun
Microsystem
C
10
Extra
Fancy
B
要比较这两张表的数据,找出不一样的数据行。我们可以使用outer join 技术。我给outer join做了一个链接,是Wikipedia的,如果你对这个技术不是很清楚,还请你行看看其技术细节。
下面是具体的SQL语句:
SELECT * FROM jajal j FULL OUTER JOIN jajal_copy jc ON jc.first_name = j.first_name AND jc.last_name = j.last_name AND jc.grade = j.grade AND jc.user_id = j.user_id WHERE j.user_id IS NULL OR jc.user_id IS NULL
运行结果如下:
user_id
first_name
last_name
grade
user_id
first_name
last_name
grade
[NULL]
[NULL]
[NULL]
[NULL]
6
Mixed
Nuts
C
6
Mixed
Nuts
Z
[NULL]
[NULL]
[NULL]
[NULL]
关于natural join,你可以看看Wikipedia是怎么说的。
SELECT * FROM jajal j NATURAL FULL OUTER JOIN jajal_copy jc WHERE j.user_id IS NULL OR jc.user_id IS NULL
运行结果如下:
user_id
first_name
last_name
grade
6
Mixed
Nuts
C
6
Mixed
Nuts
Z
MySQL 并不支持 FULL OUTER JOIN,但是我们可以使用LEFT JOIN 和 RIGHT JOIN 来实现这一功能。如下所示。
SELECT * FROM jajal j LEFT JOIN jajal_copy jc ON jc.first_name = j.first_name AND jc.last_name = j.last_name AND jc.grade = j.grade AND jc.user_id = j.user_id WHERE jc.user_id IS NULL UNION ALL SELECT * FROM jajal j RIGHT JOIN jajal_copy jc ON jc.first_name = j.first_name AND jc.last_name = j.last_name AND jc.grade = j.grade AND jc.user_id = j.user_id WHERE j.user_id IS NULL
或者你更喜欢NATURAL JOIN 版本
SELECT * FROM jajal j NATURAL LEFT JOIN jajal_copy jc WHERE jc.user_id IS NULL UNION ALL SELECT * FROM jajal j NATURAL RIGHT JOIN jajal_copy jc WHERE j.user_id IS NULL
当然,如果你需要一个MySQL的存储过程的话,下面是一个示例:
DELIMITER $$
CREATE PROCEDURE `db_schema`.`tablediff` (schema_name VARCHAR(64), table1 VARCHAR(64), table2 VARCHAR(64)) BEGIN DECLARE done INT DEFAULT 0; DECLARE sql_statement TEXT DEFAULT ''; DECLARE sql_statement_where TEXT DEFAULT ''; DECLARE sql_statement_pk TEXT DEFAULT ''; DECLARE col_name VARCHAR(64); DECLARE col_name_cur CURSOR FOR SELECT COLUMN_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = schema_name AND TABLE_NAME = table1 ; DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN col_name_cur; traverse_columns: LOOP FETCH col_name_cur INTO col_name;
IF done THEN CLOSE col_name_cur; LEAVE traverse_columns; END IF;
SET sql_statement_where = CONCAT(sql_statement_where, ' AND a.', col_name, ' = b.', col_name); SET sql_statement_pk = CONCAT(sql_statement_pk, 'AND b.', col_name, ' IS NULL'); END LOOP;
SELECT COLUMN_NAME INTO col_name FROM information_schema.KEY_COLUMN_USAGE WHERE CONSTRAINT_SCHEMA = schema_name AND CONSTRAINT_NAME = 'PRIMARY' AND TABLE_NAME = table1 LIMIT 1 ; IF col_name IS NOT NULL THEN SET sql_statement_pk = CONCAT('AND b.', col_name, ' IS NULL'); END IF;
SET sql_statement = CONCAT('SELECT * FROM ', schema_name, '.', table1, ' a LEFT JOIN ', schema_name, '.', table2, ' b ON TRUE'); SET sql_statement = CONCAT(sql_statement, sql_statement_where, ' WHERE TRUE ', sql_statement_pk); SET sql_statement = CONCAT(sql_statement, ' UNION ALL SELECT * FROM ', schema_name, '.', table1, ' b RIGHT JOIN ', schema_name, '.', table2, ' a ON TRUE'); SET sql_statement = CONCAT(sql_statement, sql_statement_where, ' WHERE TRUE ', sql_statement_pk);
SET @s = sql_statement; PREPARE stmt1 FROM @s; EXECUTE stmt1; DEALLOCATE PREPARE stmt1;
END$$ DELIMITER ;
下面是PostgreSQL的一个存储过程:
CREATE OR REPLACE FUNCTION tablediff ( IN schema_name VARCHAR(64), IN table1 VARCHAR(64), IN table2 VARCHAR(64) ) RETURNS BIGINT AS $BODY$ DECLARE the_result BIGINT DEFAULT 0; sql_statement TEXT DEFAULT ''; sql_statement_where TEXT DEFAULT ''; sql_statement_pk TEXT DEFAULT ''; col_name VARCHAR(64); col_name_cur CURSOR FOR SELECT column_name FROM information_schema.columns WHERE table_catalog = schema_name AND table_schema = 'public' AND table_name = table1 ; BEGIN OPEN col_name_cur;
LOOP FETCH col_name_cur INTO col_name; IF NOT FOUND THEN EXIT; END IF;
sql_statement_where := sql_statement_where || ' AND a.' || col_name || ' = b.' || col_name; END LOOP;
SELECT column_name INTO col_name FROM information_schema.table_constraints tc JOIN information_schema.constraint_column_usage ccu ON ccu.constraint_name = tc.constraint_name WHERE tc.table_catalog = schema_name AND tc.table_schema = 'public' AND tc.table_name = table1 LIMIT 1 ;
IF col_name IS NOT NULL THEN sql_statement_pk := ' a.' || col_name || ' IS NULL'; sql_statement_pk := sql_statement_pk || ' OR b.' || col_name || ' IS NULL'; END IF;
sql_statement := 'SELECT COUNT(*) FROM ' || schema_name || '.public.' || table1 || ' a FULL OUTER JOIN ' || schema_name || '.public.' || table2 || ' b ON TRUE'; sql_statement := sql_statement || sql_statement_where || ' WHERE ' || sql_statement_pk;
EXECUTE sql_statement INTO the_result;
RETURN the_result; END;$BODY$ LANGUAGE 'plpgsql' STABLE;
文章:来源