title: 617.如何比较两个数据表 outline: deep

有些时候,我们可能想要比较一下两个数据表,以找到其中不同的数据。比如,在进行数据移植的时候,或是在合并数据的时候,或是在比对验证数据的时候。当然比较两个表,需要这两个表结构是一样的。

我们先假设一下有如下表结构:

CREATE TABLE jajal (   user_id integer NOT NULL,   first_name character varying(255),   last_name character varying(255),   grade character(1),   CONSTRAINT jajal_pkey PRIMARY KEY (user_id) )

然后,我们有两张表——jajal和jajal_copy,其内容如下:

目录

jajal

user_id

first_name

last_name

grade

1

Some

Dude

A

2

Other

Guy

B

3

You are

Welcome

B

4

What

Other

A

5

INeed

You

C

6

Mixed

Nuts

Z

7

Kirk

Land

B

8

Bit

Shooter

A

9

Sun

Microsystem

C

10

Extra

Fancy

B

jajal_copy

user_id

first_name

last_name

grade

1

Some

Dude

A

2

Other

Guy

B

3

You are

Welcome

B

4

What

Other

A

5

INeed

You

C

6

Mixed

Nuts

C

7

Kirk

Land

B

8

Bit

Shooter

A

9

Sun

Microsystem

C

10

Extra

Fancy

B

要比较这两张表的数据,找出不一样的数据行。我们可以使用outer join 技术。我给outer join做了一个链接,是Wikipedia的,如果你对这个技术不是很清楚,还请你行看看其技术细节。

下面是具体的SQL语句:

使用FULL OUTER JOIN

SELECT   * FROM   jajal j   FULL OUTER JOIN jajal_copy jc ON jc.first_name = j.first_name    AND jc.last_name = j.last_name    AND jc.grade = j.grade    AND jc.user_id = j.user_id WHERE   j.user_id IS NULL   OR jc.user_id IS NULL

运行结果如下:

user_id

first_name

last_name

grade

user_id

first_name

last_name

grade

[NULL]

[NULL]

[NULL]

[NULL]

6

Mixed

Nuts

C

6

Mixed

Nuts

Z

[NULL]

[NULL]

[NULL]

[NULL]

使用NATURAL FULL OUTER JOIN

关于natural join,你可以看看Wikipedia是怎么说的。

SELECT        * FROM        jajal j        NATURAL FULL OUTER JOIN jajal_copy jc WHERE        j.user_id IS NULL        OR jc.user_id IS NULL

运行结果如下:

user_id

first_name

last_name

grade

6

Mixed

Nuts

C

6

Mixed

Nuts

Z

MySQL SQL 代码

MySQL 并不支持 FULL OUTER JOIN,但是我们可以使用LEFT JOIN 和 RIGHT JOIN 来实现这一功能。如下所示。

SELECT * FROM jajal j LEFT JOIN jajal_copy jc ON jc.first_name = j.first_name AND jc.last_name = j.last_name AND jc.grade = j.grade AND jc.user_id = j.user_id WHERE jc.user_id IS NULL UNION ALL SELECT * FROM jajal j RIGHT JOIN jajal_copy jc ON jc.first_name = j.first_name AND jc.last_name = j.last_name AND jc.grade = j.grade AND jc.user_id = j.user_id WHERE j.user_id IS NULL

或者你更喜欢NATURAL JOIN 版本

SELECT * FROM jajal j NATURAL LEFT JOIN jajal_copy jc WHERE jc.user_id IS NULL UNION ALL SELECT * FROM jajal j NATURAL RIGHT JOIN jajal_copy jc WHERE j.user_id IS NULL

当然,如果你需要一个MySQL的存储过程的话,下面是一个示例:

DELIMITER $$

CREATE PROCEDURE `db_schema`.`tablediff`     (schema_name VARCHAR(64), table1 VARCHAR(64), table2 VARCHAR(64)) BEGIN     DECLARE done INT DEFAULT 0;     DECLARE sql_statement TEXT DEFAULT '';     DECLARE sql_statement_where TEXT DEFAULT '';     DECLARE sql_statement_pk TEXT DEFAULT '';     DECLARE col_name VARCHAR(64);     DECLARE col_name_cur CURSOR FOR         SELECT             COLUMN_NAME         FROM             information_schema.COLUMNS         WHERE             TABLE_SCHEMA = schema_name             AND TABLE_NAME = table1     ;     DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    OPEN col_name_cur;     traverse_columns: LOOP         FETCH col_name_cur INTO col_name;

        IF done THEN             CLOSE col_name_cur;             LEAVE traverse_columns;         END IF;

        SET sql_statement_where = CONCAT(sql_statement_where,             ' AND a.', col_name, ' = b.', col_name);         SET sql_statement_pk = CONCAT(sql_statement_pk,             'AND b.', col_name, ' IS NULL');     END LOOP;

    SELECT         COLUMN_NAME INTO col_name     FROM         information_schema.KEY_COLUMN_USAGE     WHERE         CONSTRAINT_SCHEMA = schema_name         AND CONSTRAINT_NAME = 'PRIMARY'         AND TABLE_NAME = table1     LIMIT 1     ;     IF col_name IS NOT NULL THEN         SET sql_statement_pk = CONCAT('AND b.', col_name, ' IS NULL');     END IF;

    SET sql_statement = CONCAT('SELECT * FROM ', schema_name, '.', table1, ' a LEFT JOIN ', schema_name, '.', table2, ' b ON TRUE');     SET sql_statement = CONCAT(sql_statement, sql_statement_where, ' WHERE TRUE ', sql_statement_pk);     SET sql_statement = CONCAT(sql_statement, ' UNION ALL SELECT * FROM ', schema_name, '.', table1, ' b RIGHT JOIN ', schema_name, '.', table2, ' a ON TRUE');     SET sql_statement = CONCAT(sql_statement, sql_statement_where, ' WHERE TRUE ', sql_statement_pk);

    SET @s = sql_statement;     PREPARE stmt1 FROM @s;     EXECUTE stmt1;     DEALLOCATE PREPARE stmt1;

END$$ DELIMITER ;

PostgreSQL 下的SQL语句

下面是PostgreSQL的一个存储过程:

CREATE OR REPLACE FUNCTION tablediff (     IN schema_name VARCHAR(64),     IN table1 VARCHAR(64),     IN table2 VARCHAR(64) ) RETURNS BIGINT AS $BODY$ DECLARE     the_result BIGINT DEFAULT 0;     sql_statement TEXT DEFAULT '';     sql_statement_where TEXT DEFAULT '';     sql_statement_pk TEXT DEFAULT '';     col_name VARCHAR(64);     col_name_cur CURSOR FOR         SELECT             column_name         FROM             information_schema.columns         WHERE             table_catalog = schema_name             AND table_schema = 'public'             AND table_name = table1     ; BEGIN     OPEN col_name_cur;

    LOOP         FETCH col_name_cur INTO col_name;         IF NOT FOUND THEN             EXIT;         END IF;

        sql_statement_where := sql_statement_where || ' AND a.' || col_name || ' = b.' || col_name;     END LOOP;

    SELECT         column_name INTO col_name     FROM         information_schema.table_constraints tc         JOIN information_schema.constraint_column_usage ccu ON             ccu.constraint_name = tc.constraint_name     WHERE         tc.table_catalog = schema_name         AND tc.table_schema = 'public'         AND tc.table_name = table1     LIMIT 1     ;

    IF col_name IS NOT NULL THEN         sql_statement_pk := ' a.' || col_name || ' IS NULL';         sql_statement_pk := sql_statement_pk || ' OR b.' || col_name || ' IS NULL';     END IF;

    sql_statement := 'SELECT COUNT(*) FROM ' || schema_name || '.public.' || table1 || ' a FULL OUTER JOIN ' || schema_name || '.public.' || table2 || ' b ON TRUE';     sql_statement := sql_statement || sql_statement_where || ' WHERE ' || sql_statement_pk;

    EXECUTE sql_statement INTO the_result;

    RETURN the_result; END;$BODY$ LANGUAGE 'plpgsql' STABLE;

文章:来源