SQL中in和exists的区别以及执行效率分析。

发表时间：2011-09-07 22:29 | 分类：其他知识 | 浏览：1,577 次

SQL中in和exists的区别

SQL中in可以分为三类：

1、形如select * from t1 where f1 in ('a','b')，应该和以下两种比较效率

select * from t1 where f1='a' or f1='b'

或者 select * from t1 where f1 ='a' union all select * from t1 f1='b'

你可能指的不是这一类，这里不做讨论。

2、形如select * from t1 where f1 in (select f1 from t2 where t2.fx='x')，

其中子查询的where里的条件不受外层查询的影响，这类查询一般情况下，自动优化会转成exist语句，也就是效率和exist一样。

3、形如select * from t1 where f1 in (select f1 from t2 where t2.fx=t1.fx)，

其中子查询的where里的条件受外层查询的影响，这类查询的效率要看相关条件涉及的字段的索引情况和数据量多少，一般认为效率不如exists。除了第一类in语句都是可以转化成exists 语句的SQL，一般编程习惯应该是用exists而不用in，而很少去考虑in和exists的执行效率.

in、exists和join连接选择：

A，B两个表，

1、当只显示一个表的数据如A，关系条件只一个如ID时，使用IN更快：

select * from A where id in (select id from B)

2、当只显示一个表的数据如A，关系条件不只一个如ID，col1时，使用IN就不方便了，可以使用EXISTS：

select * from A

where exists (select 1 from B where id = A.id and col1 = A.col1)

3、当只显示两个表的数据时，使用IN，EXISTS都不合适，要使用连接：

select * from A left join B on id = A.id

所以使用何种方式，要根据要求来定。

根据公司老大的说法，在使用in和exists的时候，只要（）里头没有子查询执行效率还是很快的；而当（）里头又含有子查询时，建议能使用join连接的一定要使用join连接。

in和exists的执行效率分析：

章郎虫看到这个的时候头都大了，不得不承认写原文的作者是个高手。哈哈...接下来童鞋们就挑重点看吧。

这是一般情况下做的测试：

这是偶的测试结果:
set statistics io on
select * from sysobjects where exists (select 1 from syscolumns where id=syscolumns.id)
select * from sysobjects where id in (select id from syscolumns )
set statistics io off
(47 行受影响)
表'syscolpars'。扫描计数 1，逻辑读取 3 次，物理读取 0 次，预读 2 次，lob 逻辑读取 0 次，lob 物理读取 0 次，lob 预读 0 次。
表'sysschobjs'。扫描计数 1，逻辑读取 3 次，物理读取 0 次，预读 0 次，lob 逻辑读取 0 次，lob 物理读取 0 次，lob 预读 0 次。
(1 行受影响)
(44 行受影响)
表'syscolpars'。扫描计数 47，逻辑读取 97 次，物理读取 0 次，预读 0 次，lob 逻辑读取 0 次，lob 物理读取 0 次，lob 预读 0 次。
表'sysschobjs'。扫描计数 1，逻辑读取 3 次，物理读取 0 次，预读 0 次，lob 逻辑读取 0 次，lob 物理读取 0 次，lob 预读 0 次。
(1 行受影响)
set statistics io on
select * from syscolumns where exists (select 1 from sysobjects where id=syscolumns.id)
select * from syscolumns where id in (select id from sysobjects )
set statistics io off
(419 行受影响)
表'syscolpars'。扫描计数 1，逻辑读取 10 次，物理读取 0 次，预读 15 次，lob 逻辑读取 0 次，lob 物理读取 0 次，lob 预读 0 次。
表'sysschobjs'。扫描计数 1，逻辑读取 3 次，物理读取 0 次，预读 0 次，lob 逻辑读取 0 次，lob 物理读取 0 次，lob 预读 0 次。
(1 行受影响)
(419 行受影响)
表'syscolpars'。扫描计数 1，逻辑读取 10 次，物理读取 0 次，预读 0 次，lob 逻辑读取 0 次，lob 物理读取 0 次，lob 预读 0 次。
表'sysschobjs'。扫描计数 1，逻辑读取 3 次，物理读取 0 次，预读 0 次，lob 逻辑读取 0 次，lob 物理读取 0 次，lob 预读 0 次。
(1 行受影响)
测试结果(总体来讲exists比in的效率高):
效率:条件因素的索引是非常关键的
把syscolumns 作为条件:syscolumns 数据大于sysobjects
用in
扫描计数 47，逻辑读取 97 次，
用exists
扫描计数 1，逻辑读取 3 次
把sysobjects作为条件:sysobjects的数据少于syscolumns
exists比in多预读 15 次

对此我记得还做过如下测试：
表
test
结构
id int identity(1,1), --id主键\自增
sort int, --类别,每一千条数据为一个类别
sid int --分类id
插入600w条数据
如果要查询每个类别的最大sid 的话
select * from test a
where not exists(select 1 from test where sort = a.sort and sid > a.sid)
比
select * from test a
where sid in (select max(sid) from test where sort = a.sort)
的执行效率要高三倍以上。具体的执行时间忘记了。但是结果我记得很清楚。在此之前我一直推崇第二种写法，后来就改第一种了。

in和exists的sql执行效率分析,再简单举一个例子：
declare @t table(id int identity(1,1), v varchar(10))
insert @t select'a'
union all select'b'
union all select'c'
union all select'd'
union all select'e'
union all select'b'
union all select'c'
--a语句in的sql写法
select * from @t where v in (select v from @t group by v having count(*)>1)
--b语句exists的sql写法
select * from @t a where exists(select 1 from @t where id!=a.id and v=a.v)
两条语句功能都是找到表变量@t中,v含有重复值的记录.

第一条sql语句使用in,但子查询中与外部没有连系.

第二条sql语句使用exists,但子查询中与外部有连系.

大家看SQL查询计划,很清楚了.

selec v from @t group by v having count(*)> 1

这条Sql语句,它的执行不依赖于主查询主句(我也不知道怎么来描述in外面的和里面的,暂且这么叫吧,大家明白就行)

那么,SQL在查询时就会优化,即将它的结果集缓存起来

即缓存了

---

后续的操作,主查询在每处理一步时,相当于在处理 where v in('b','c') 当然,语句不会这么转化, 只是为了说明意思,也即主查询每处理一行(记为currentROW时,子查询不会再扫描表, 只会与缓存的结果进行匹配

而

select 1 from @t where id!=a.id and v=a.v

这一句,它的执行结果依赖于主查询中的每一行.

当处理主查询第一行时即 currentROW(id=1)时, 子查询再次被执行 select 1 from @t where id!=1 and v='a' 扫描全表,从第一行记 currentSubROW(id=1) 开始扫描,id相同,过滤,子查询行下移,currentSubROW(id=2)继续,id不同,但v值不匹配,子查询行继续下移...直到currentSubROW(id=7)没找到匹配的, 子查询处理结束,第一行currentROW(id=1)被过滤,主查询记录行下移

处理第二行时,currentROW(id=2), 子查询 select 1 from @t where id!=2 and v='b' ,第一行currentSubROW(id=1)v值不匹配,子查询下移,第二行,id相同过滤,第三行,...到第六行,id不同,v值匹配, 找到匹配结果,即返回,不再往下处理记录. 主查询下移.

处理第三行时，以此类推...

sql优化中，使用in和exist？主要是看你的筛选条件是在主查询上还是在子查询上。

通过分析，相信大家已经对in和exists的区别、in和exists的SQL执行效率有较清晰的了解。

原文地址：http://www.cnblogs.com/diction/archive/2008/01/18/1043844.html

本文标签：SQL

本文链接：https://www.sijitao.net/532.html

欢迎您在本博客中留下评论，如需转载原创文章请注明出处，谢谢！

下一篇：在GAE平台上搭建免费独立博客
上一篇：Mysql:You can’t specify target table ‘table_name’ for update in FROM clause

日历
2024年四月

一二三四五六日

« 十

1 2 3 4 5 6 7

8 9 10 11 12 13 14

15 16 17 18 19 20 21

22 23 24 25 26 27 28

29 30
标签
360 apache CentOS chrome Fail2ban find Firefox GAE Gmail Google htaccess Life linux MongoDB MSN Mysql nagios Nginx PHP Postfix PostgresQL Python QQ Redis SEO Shell SQL ssl tomcat ubuntu virtualbox VPS windows Wordpress XML Zabbix 主机代理发牢骚域名小百科搜索热门百度邮箱

2024年四月
一	二	三	四	五	六	日
« 十
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

SQL中in和exists的区别以及执行效率分析。

日历

标签

最新发表