Hive Common Functions Reference Manual


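Learn big data at 小牛学堂 (Xiaoniu Xuetang). Course video address: http:/

CLI commands

Show the functions available in the current session:
    SHOW FUNCTIONS;
Show a function's description:
    DESC FUNCTION concat;
Show a function's extended description:
    DESC FUNCTION EXTENDED concat;

Function categories

Relational operations, mathematical operations, logical operations, numeric computation, type conversion, date functions, conditional functions, string functions, statistical functions.

Aggregate functions
Aggregate functions operate at the granularity of multiple records.
    sum()       sum
    count()     number of rows
    avg()       average
    distinct    distinct values, for example count(distinct col)
    min         minimum
    max         maximum

Collection functions
Complex types: constructing complex types, accessing complex types, length of complex types.

Special functions
Used for partitioned sorting, dynamic GROUP BY, Top N, cumulative calculation, and hierarchical queries.
    Windowing functions: lead, lag, FIRST_VALUE, LAST_VALUE
    Analytics functions: RANK, ROW_NUMBER, DENSE_RANK, CUME_DIST, PERCENT_RANK, NTILE

Mixed functions
    java_method(class, method [, arg1 [, arg2 ...]])
    reflect(class, method [, arg1 [, arg2 ...]])
    hash(a1 [, a2 ...])

UDTF
    lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
    fromClause: FROM baseTable (lateralView)*

LATERAL VIEW is used together with UDTFs such as explode, usually applied to the array returned by split. It turns one row into multiple rows, and the resulting rows can then be aggregated. LATERAL VIEW first calls the UDTF for every row of the base table; the UDTF splits that row into one or more rows; LATERAL VIEW then combines the results into a virtual table that can be given a table alias.

Common function demo

    create table employee(id string, money double, type string)
    row format delimited
    fields terminated by '\t'
    lines terminated by '\n'
    stored as textfile;
    load data local inpath '/liguodong/hive/data' into table employee;
    select * from employee;

Logical operators
Precedence, from highest to lowest: NOT, AND, OR.
    select id,money from employee where (id=1001 or id=1002) and money=100;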
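The LATERAL VIEW description above says that the exploded rows can be aggregated afterwards. A minimal sketch of that pattern on the employee table follows. It assumes, purely for illustration, that the type column holds a comma-separated list such as 'a,b,c'; the aliases t, tag, employees and total_money are hypothetical names.

```sql
-- Sketch only: assumes employee.type is a comma-separated list (an assumption, not stated in the source).
-- explode() emits one row per array element; LATERAL VIEW joins each element back to its source row,
-- so the ordinary GROUP BY below aggregates over the exploded rows.
SELECT tag,
       count(*)   AS employees,
       sum(money) AS total_money
FROM employee
LATERAL VIEW explode(split(type, ',')) t AS tag
GROUP BY tag;
```

Because each source row is repeated once per element, a row with three elements contributes its money value to three groups; whether that is the intended result depends on the data.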

Type conversion
    select cast(1.5 as int);

Conditional functions
    if(con, value_if_true, value_if_false)
    hive (default)> select if(2>1,'YES','NO');
    YES

    case when con then ... when con then ... else ... end    (all branch results must have the same type)
    select case when id=1001 then 'v1001' when id=1002 then 'v1002' else 'v1003' end from employee;

get_json_object
JSON parsing function, used to process JSON; the input must be valid JSON.
    select get_json_object('{"name":"jack","age":"20"}','$.name');

URL parsing function
    parse_url(string urlString, string partToExtract [, string keyToExtract])
    parse_url(http:/

String concatenation function: concat
Syntax: concat(string A, string B...)
Return type: string
Description: returns the concatenation of the input strings; any number of input strings is supported.
Example:
    hive> select concat('abc','def','gh') from lxw_dual;
    abcdefgh

String concatenation with a separator: concat_ws
Syntax: concat_ws(string SEP, string A, string B...)
        concat_ws(string SEP, array<string>)
Return type: string
Description: returns the concatenation of the input strings, with SEP placed between them.
Example:
    hive> select concat_ws(',','abc','def','gh') from lxw_dual;
    abc,def,gh

collect_set and collect_list
    collect_set(id)     lists the distinct values of the column (deduplicated); returns an array
    collect_list(id)    lists all values of the column without deduplication; returns an array
    select collect_set(id) from taborder;
    select sum(num),count(*) from taborder;

Window functions

first_value: value of the first row in the window
    first_value(money) over (partition by id order by money)
    select ch,num,first_value(num) over (partition by ch order by num) from taborder;

Window frame: rows between 1 preceding and 1 following covers the current row plus the row before it and the row after it.
    hive (liguodong)> select ch,num,first_value(num) over (partition by ch order by num ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) from taborder;

last_value: value of the last row in the window
    hive (liguodong)> select ch,num,last_value(num) over (partition by ch) from taborder;

lead: takes the value from the second row after the current row
    lead(money,2) over (order by money)

lag: takes the value from the second row before the current row
    lag(money,2) over (order by money)

    select ch, num, lead(num,2) over (order by num) from taborder;
    select ch, num, lag(num,2) over (order by num) from taborder;
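The window frame matters most for last_value. When an OVER clause has an ORDER BY but no explicit frame, Hive defaults to a frame that ends at the current row, so last_value returns the current row's own value (or that of its last peer) rather than the last value of the partition. The sketch below contrasts the two on the taborder table used above; the column aliases are hypothetical.

```sql
-- Sketch only: compares the default frame with an explicit whole-partition frame.
-- last_default_frame: frame ends at the current row, so the result equals num.
-- last_in_partition:  frame spans the whole partition, so the result is the largest num per ch.
SELECT ch,
       num,
       last_value(num) OVER (PARTITION BY ch ORDER BY num) AS last_default_frame,
       last_value(num) OVER (PARTITION BY ch ORDER BY num
                             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_in_partition
FROM taborder;
```

The same explicit frame can be added to first_value for symmetry, although first_value is unaffected by where the default frame ends.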

Ranking
    rank() over(partition by id order by money)
    select ch, num, rank() over(partition by ch order by num) as rank from taborder;
    select ch, num, dense_rank() over(partition by ch order by num) as dense_rank from taborder;

cume_dist: (largest row number among rows with the same value) / (number of rows)
    cume_dist() over (partition by id order by money)

percent_rank: (smallest row number among rows with the same value - 1) / (number of rows - 1); the first row is always 0
    percent_rank() over (partition by id order by money)

    select ch,num,cume_dist() over (partition by ch order by num) as cume_dist,percent_rank() over (partition by ch order by num) as percent_rank from taborder;

Bucketing: ntile
    ntile(2) over (order by money desc)    splits the rows into two buckets
    select ch,num,ntile(2) over (order by num desc) from taborder;

Mixed functions
    select id,java_method("java.lang.Math","sqrt",cast(id as double)) as sqrt from hiveTest;

UDTF
    select id,adid from employee lateral view explode(split(type,'B')) tt as adid;

explode turns one column value into multiple rows (the delimiter argument of split is missing in the source for the following example):
    hive (liguodong)> select id,adid from hiveDemo lateral view explode(split(str,)) tt as adid;

Regular expressions
Functions that use regular expressions:
    regexp_replace(string subject A, string B, string C)
    regexp_extract(string subject, string pattern, int index)

    hive> select regexp_replace('foobar', 'oo|ar', '') from lxw_dual;
    hive> select regexp_extract('979|7.10.80|8684', '.*\\|(.*)',1) from hiveDemo limit 1;
    hive> select regexp_extract('979|7.10.80|8684', '(.*?)\\|(.*)',1) from hiveDemo limit 1;
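The two pipe-delimited patterns above differ only in greedy versus lazy matching. The sketch below runs them against string literals, assuming the intent is to pull a single field out of the string with regexp_extract and capture group 1. The column aliases are hypothetical, and a FROM-less SELECT needs Hive 0.13 or later.

```sql
-- Sketch only. In a Hive string literal '\\|' reaches the regex engine as \| (a literal pipe).
-- greedy_last: .* consumes up to the LAST pipe, so group 1 is '8684'.
-- lazy_first:  .*? stops at the FIRST pipe, so group 1 is '979'.
-- stripped:    every match of oo or ar is removed, leaving 'fb'.
SELECT regexp_extract('979|7.10.80|8684', '.*\\|(.*)', 1)    AS greedy_last,
       regexp_extract('979|7.10.80|8684', '(.*?)\\|(.*)', 1) AS lazy_first,
       regexp_replace('foobar', 'oo|ar', '')                 AS stripped;
```

An unescaped | would be read as regex alternation instead of a literal pipe, which is why the escape is needed in both patterns.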
