
Using Collation Features to Compute Chinese Character Stroke Counts and Get Pinyin Initials

Date: 2021-07-31  Author: daque

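SQL Server collations are not used very often, and many beginners are probably unfamiliar with them. There is one error, though, that most people run into sooner or later: when a query joins tables across databases whose default character sets differ, SQL Server returns this error:

    "Cannot resolve collation conflict for equal to operation."

1. Error analysis:

This error is caused by inconsistent collations. A small experiment reproduces it:

    create table #t1(name varchar(20) collate albanian_ci_ai_ws, value int)
    create table #t2(name varchar(20) collate chinese_prc_ci_ai_ws, value int)

With the tables created, run a join query:

    select * from #t1 a inner join #t2 b on a.name=b.name

The error then appears:

    Server: Msg 446, Level 16, State 9, Line 1
    Cannot resolve collation conflict for equal to operation.

The simplest way to get rid of the error is to specify the collation in the join condition. Written like this, the error no longer appears:

    select * from #t1 a inner join #t2 b on a.name=b.name collate chinese_prc_ci_ai_ws

2. Introduction to collations:

What is a collation? Microsoft describes it this way: "In Microsoft SQL Server 2000, the physical storage of character strings is controlled by collations. A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared."

Running the following statement in Query Analyzer lists every collation SQL Server supports:

    select * from ::fn_helpcollations()

A collation name has two parts. The first part names the character set the collation supports. For example, in chinese_prc_cs_ai_ws the prefix chinese_prc_ refers to the Unicode collation for the Simplified Chinese used in mainland China. The second part is a set of suffixes:

    _bin      binary sort
    _ci(cs)   case sensitivity: ci is insensitive, cs is sensitive
    _ai(as)   accent sensitivity: ai is insensitive, as is sensitive
    _ks       kana-type sensitivity: ks is sensitive, omitting it means insensitive
    _ws       width sensitivity: ws is sensitive, omitting it means insensitive

Case-sensitive: choose this option if comparisons should treat uppercase and lowercase letters as unequal.
Accent-sensitive: choose this option if comparisons should treat accented and unaccented letters as unequal. With this option, letters with different accents are also treated as unequal.
Kana-sensitive: choose this option if comparisons should treat katakana and hiragana Japanese syllables as unequal.
Width-sensitive: choose this option if comparisons should treat half-width and full-width characters as unequal.

3. Using collations:

SQL Server provides a large number of Windows collations and SQL Server-specific collations, but developers often overlook them. In practice they are very useful.

Example 1: sort the contents of the name column by pinyin:

    create table #t(id int,name varchar(20))
    insert #t select 1,'中'
    union all select 2,'国'
    union all select 3,'人'
    union all select 4,'阿'

    select * from #t order by name collate chinese_prc_cs_as_ks_ws
    drop table #t

    /* Result:
    id          name
    ----------- --------------------
    4           阿
    2           国
    3           人
    1           中
    */

Example 2: sort the contents of the name column by surname stroke count:

    create table #t(id int,name varchar(20))
    insert #t select 1,'三'
    union all select 2,'乙'
    union all select 3,'二'
    union all select 4,'一'
    union all select 5,'十'

    select * from #t order by name collate chinese_prc_stroke_cs_as_ks_ws
    drop table #t

    /* Result:
    id          name
    ----------- --------------------
    4           一
    2           乙
    3           二
    5           十
    1           三
    */

4. Extending the use of collations in practice:

SQL Server's Chinese collations can sort by pinyin, by stroke count, and so on. How can that ability be used to solve some of the harder problems with Chinese characters? Here is an example.

Using collation properties to compute Chinese character stroke counts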
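To compute stroke counts, some preparation is needed first. Windows handles multilingual Chinese characters through Unicode, which at the time of writing included 20,902 Chinese characters. The Unicode values of the simplified GBK characters start at 19968 (U+4E00).

First, we get all the Chinese characters with SQL Server itself. No dictionary is needed; a simple SQL statement generates the code points:

    select top 20902 code=identity(int,19968,1) into #t from syscolumns a,syscolumns b

The following statement then returns every Chinese character, ordered by Unicode value:

    select code,nchar(code) as cnword from #t

Next, a select statement sorts them by stroke count:

    select code,nchar(code) as cnword from #t
    order by nchar(code) collate chinese_prc_stroke_cs_as_ks_ws,code

Result:

    code        cnword
    ----------- ------
    19968       一
    20008       丨
    20022       丶
    20031       丿
    20032       乀
    20033       乁
    20057       乙
    20058       乚
    20059       乛
    20101       亅
    19969       丁
    ..........

The result shows clearly that the one-stroke characters run from code 19968 to 20101 in ascending order, but at "丁", the first two-stroke character, the code is 19969: the sequence breaks and starts over. With this result, a SQL statement can easily find the first or the last character of each stroke-count group. The statements below get the last one:

    create table #t1(id int identity,code int,cnword nvarchar(2))

    insert #t1(code,cnword)
    select code,nchar(code) as cnword from #t
    order by nchar(code) collate chinese_prc_stroke_cs_as_ks_ws,code

    -- a row is a group boundary when the next row's code is not larger
    select a.cnword
    from #t1 a left join #t1 b on a.id=b.id-1 and a.code<b.code
    where b.code is null
    order by a.id

This yields 36 characters, each of which is the last character of one stroke count after sorting with the chinese_prc_stroke_cs_as_ks_ws collation:

    亅阝马风龙齐龟齿鸩龀龛龂龆龈龊龍龠龎龐龑龡龢龝齹龣龥齈龞麷鸞麣龖龗齾齉龘

So "亅" is the last of all one-stroke characters after sorting, "阝" is the last of all two-stroke characters, and so on.

It also turns out that after the 33rd character, "龗" (33 strokes), the stroke counts become irregular and no longer match the position. That does not matter: only four characters have more strokes than "龗", so we add them by hand: 齾 with 35 strokes, 齉 with 36, 靐 with 39, and 龘 with 64.

Build the stroke table (tab_hzbh):

    create table tab_hzbh(id int identity,cnword nchar(1))

    -- insert the first 33 characters
    insert tab_hzbh
    select top 33 a.cnword
    from #t1 a left join #t1 b on a.id=b.id-1 and a.code<b.code
    where b.code is null
    order by a.id

    -- then add the last four characters
    set identity_insert tab_hzbh on
    go
    insert tab_hzbh(id,cnword)
    select 35,N'齾'
    union all select 36,N'齉'
    union all select 39,N'靐'
    union all select 64,N'龘'
    go
    set identity_insert tab_hzbh off
    go

At this point we can get results. For example, to get the stroke count of "国":

    declare @a nchar(1)
    set @a=N'国'
    select top 1 id from tab_hzbh
    where cnword>=@a collate chinese_prc_stroke_cs_as_ks_ws
    order by id

    id
    -----------
    8

(Result: "国" has 8 strokes.)

All of the preparation above serves only to write the function below. The function does away with the temporary and permanent tables built so far. To keep it general and easy to move between databases, the contents of tab_hzbh are written inline, and the function computes the total stroke count of a string of Chinese characters supplied by the user:

    create function fun_getbh(@str nvarchar(4000))
    returns int
    as
    begin
        declare @word nchar(1),@n int,@i int
        set @i=1
        set @n=0
        while substring(@str,@i,1)<>'' or @i<=len(@str)
        begin
            set @word=substring(@str,@i,1)
            -- non-Chinese characters count as 0 strokes
            set @n=@n+(case when unicode(@word) between 19968 and 19968+20901
                then (select top 1 id from (
                    select 1 as id,N'亅' as word
                    union all select 2,N'阝' union all select 3,N'马' union all select 4,N'风'
                    union all select 5,N'龙' union all select 6,N'齐' union all select 7,N'龟'
                    union all select 8,N'齿' union all select 9,N'鸩' union all select 10,N'龀'
                    union all select 11,N'龛' union all select 12,N'龂' union all select 13,N'龆'
                    union all select 14,N'龈' union all select 15,N'龊' union all select 16,N'龍'
                    union all select 17,N'龠' union all select 18,N'龎' union all select 19,N'龐'
                    union all select 20,N'龑' union all select 21,N'龡' union all select 22,N'龢'
                    union all select 23,N'龝' union all select 24,N'齹' union all select 25,N'龣'
                    union all select 26,N'龥' union all select 27,N'齈' union all select 28,N'龞'
                    union all select 29,N'麷' union all select 30,N'鸞' union all select 31,N'麣'
                    union all select 32,N'龖' union all select 33,N'龗' union all select 35,N'齾'
                    union all select 36,N'齉' union all select 39,N'靐' union all select 64,N'龘'
                ) t
                where word>=@word collate chinese_prc_stroke_cs_as_ks_ws
                order by id asc)
                else 0 end)
            set @i=@i+1
        end
        return @n
    end

    -- sample call:
    select dbo.fun_getbh(N'中华人民共和国'),dbo.fun_getbh(N'中華人民共和國')

Result: the stroke totals are 39 and 46 respectively, so both simplified and traditional characters work.

Of course, you can also move the characters and stroke counts from the "union all" list into a permanent table, create a clustered index on the character column, and set the column collation to chinese_prc_stroke_cs_as_ks_ws. That is faster. If you use a BIG5 operating system you have to generate the characters separately, but the method is the same. One thing to remember: these characters were selected by SQL statements, not typed in by hand, and certainly not looked up in a dictionary. The Xinhua Dictionary is, after all, not the same as the Unicode character set, so results taken from a dictionary would be inaccurate.

Using collation properties to get pinyin initials

With the same method used for the stroke totals, we can also write a function that returns the pinyin initial of each Chinese character:

    create function fun_getpy(@str nvarchar(4000))
    returns nvarchar(4000)
    as
    begin
        declare @word nchar(1),@py nvarchar(4000),@i int
        set @py=''
        set @i=1
        while (substring(@str,@i,1)<>'' or @i<=len(@str))
        begin
            set @word=substring(@str,@i,1)
            -- non-Chinese characters are returned unchanged
            set @py=@py+(case when unicode(@word) between 19968 and 19968+20901
                then (select top 1 py from (
                    select 'a' as py,N'驁' as word
                    union all select 'b',N'簿' union all select 'c',N'錯' union all select 'd',N'鵽'
                    union all select 'e',N'樲' union all select 'f',N'鰒' union all select 'g',N'腂'
                    union all select 'h',N'夻' union all select 'j',N'攈' union all select 'k',N'穒'
                    union all select 'l',N'鱳' union all select 'm',N'旀' union all select 'n',N'桛'
                    union all select 'o',N'漚' union all select 'p',N'曝' union all select 'q',N'囕'
                    union all select 'r',N'鶸' union all select 's',N'蜶' union all select 't',N'籜'
                    union all select 'w',N'鶩' union all select 'x',N'鑂' union all select 'y',N'韻'
                    union all select 'z',N'咗'
                ) t
                where word>=@word collate chinese_prc_cs_as_ks_ws
                order by py asc)
                else @word end)
            set @i=@i+1
        end
        return @py
    end

    -- sample call:
    select dbo.fun_getpy(N'中华人民共和国'),dbo.fun_getpy(N'中華人民共和國')

Both return: zhrmghg

If you are interested, the same method can be extended into a function that returns the full pinyin of each character, and even the tone, although full pinyin has far more categories. For full pinyin a lookup table is the better choice: searching twenty-odd thousand characters is fast, and a lookup table can make full use of the table's index.

Collations have many other clever uses that cannot be covered in detail here for reasons of space. Everyone is welcome to join the discussion.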
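The permanent-table variant suggested for the stroke lookup (a fixed table, a clustered index on the character column, and the stroke collation set on that column) might look roughly like the sketch below. The table name tab_hzbh_fixed, the column names strokes and cnword, and the index name are assumptions made for illustration only; the rows are the same boundary characters used inside fun_getbh.

```sql
-- Hypothetical permanent lookup table (all names here are illustrative assumptions).
-- The stroke collation is declared on the column itself.
create table tab_hzbh_fixed(
    strokes int not null,
    cnword nchar(1) collate chinese_prc_stroke_cs_as_ks_ws not null
)
go

-- Clustered index on the character column, as the text suggests.
create clustered index ix_tab_hzbh_fixed_cnword on tab_hzbh_fixed(cnword)
go

-- Same boundary characters as the inline list in fun_getbh.
insert tab_hzbh_fixed(strokes,cnword)
select 1,N'亅' union all select 2,N'阝' union all select 3,N'马' union all select 4,N'风'
union all select 5,N'龙' union all select 6,N'齐' union all select 7,N'龟' union all select 8,N'齿'
union all select 9,N'鸩' union all select 10,N'龀' union all select 11,N'龛' union all select 12,N'龂'
union all select 13,N'龆' union all select 14,N'龈' union all select 15,N'龊' union all select 16,N'龍'
union all select 17,N'龠' union all select 18,N'龎' union all select 19,N'龐' union all select 20,N'龑'
union all select 21,N'龡' union all select 22,N'龢' union all select 23,N'龝' union all select 24,N'齹'
union all select 25,N'龣' union all select 26,N'龥' union all select 27,N'齈' union all select 28,N'龞'
union all select 29,N'麷' union all select 30,N'鸞' union all select 31,N'麣' union all select 32,N'龖'
union all select 33,N'龗' union all select 35,N'齾' union all select 36,N'齉' union all select 39,N'靐'
union all select 64,N'龘'
go

-- Look up one character: the first boundary at or after it gives the stroke count.
declare @word nchar(1)
set @word=N'国'
select top 1 strokes
from tab_hzbh_fixed
where cnword>=@word
order by strokes
```

Because the column carries the stroke collation, the comparison against the variable uses it without an explicit collate clause, and the range condition on cnword can be served by the clustered index.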
