Hive内嵌表生成函数UDTF:explode,posexplode,json_tuple,parse_url_tuple,stack

   日期:2020-07-07     浏览:138    评论:0    
核心提示:0.Hive内嵌表生成函数Built-in Table-Generating Functions (UDTF)普通的用户定义函数,如concat(),接受单个输入行并输出单个输出行。相反,表生成函数将单个输入行转换为多个输出行 Row-set columns types Name(Signature) Description T explode(ARRAY a) E

0.Hive内嵌表生成函数

Built-in Table-Generating Functions (UDTF)

普通的用户定义函数,如concat(),接受单个输入行并输出单个输出行。相反,表生成函数将单个输入行转换为多个输出行

Row-set columns types

Name(Signature)

Description

T

explode(ARRAY<T> a)

Explodes an array to multiple rows. Returns a row-set with a single column (col), one row for each element from the array.

将数组分解为多行。返回一个单列(col)的行集,数组中的每个元素对应一行。

Tkey,Tvalue

explode(MAP<Tkey,Tvalue> m)

Explodes a map to multiple rows. Returns a row-set with a two columns (key,value) , one row for each key-value pair from the input map. (As of Hive 0.8.0.).

int,T posexplode(ARRAY<T> a)

Explodes an array to multiple rows with additional positional column of int type (position of items in the original array, starting with 0). Returns a row-set with two columns (pos,val), one row for each element from the array.

数组分解为多行,其中包含int类型的附加位置列(原始数组中的位置,从0开始),意思行转列以后,还多一列数组元素对应下标的列

T1,...,Tn

inline(ARRAY<STRUCT<f1:T1,...,fn:Tn>> a)

Explodes an array of structs to multiple rows. Returns a row-set with N columns (N = number of top level elements in the struct), one row per struct from the array. (As of Hive 0.10.)

将一个结构数组分解为多行。返回一个包含N列的行集(N =结构中顶级元素的数量),数组中的每个结构中有一行。

T1,...,Tn/r stack(int r,T1 V1,...,Tn/r Vn)

Breaks up n values V1,...,Vn into rows. Each row will have n/r columns. must be constant.

分解n值V1,…,将Vn分成r行。每一行都有n/r列。r必须是常数

string1,...,stringn

json_tuple(string jsonStr,string k1,...,string kn)

Takes JSON string and a set of n keys, and returns a tuple of n values. This is a more efficient version of the get_json_object UDF because it can get multiple keys with just one call.

string 1,...,stringn

parse_url_tuple(string urlStr,string p1,...,string pn)

Takes URL string and a set of n URL parts, and returns a tuple of n values. This is similar to the parse_url() UDF but can extract multiple parts at once out of a URL. Valid part names are: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO, QUERY:<KEY>.

1.案例演示

1.1 explode,一般和lateral view explode一起使用

    一般用来实现表的行转列

select explode(array('A','B','C'));
select explode(array('A','B','C')) as col;
select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf;
select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf as col;

select explode(map('A',10,'B',20,'C',30));
select explode(map('A',10,'B',20,'C',30)) as (key,value);
select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf;
select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf as key,value;

1.2. posexplode的使用


select posexplode(array('A','B','C'));
select posexplode(array('A','B','C')) as (pos,val);
select tf.* from (select 0) t lateral view posexplode(array('A','B','C')) tf;
select tf.* from (select 0) t lateral view posexplode(array('A','B','C')) tf as pos,val;

1.3 inline的使用

select inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02')));
select inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) as (col1,col2,col3);
select tf.* from (select 0) t lateral view inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) tf;
select tf.* from (select 0) t lateral view inline(array(struct('A',10,date '2015-01-01'),struct('B',20,date '2016-02-02'))) tf as col1,col2,col3;

1.4 stack的使用

select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01');
select stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') as (col0,col1,col2);
select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf;
select tf.* from (select 0) t lateral view stack(2,'A',10,date '2015-01-01','B',20,date '2016-01-01') tf as col0,col1,col2;

1.5 json_turple 的使用

       在Hive 0.7中引入了一个新的json_tuple() ,是UDTF。它的参数是一个JSON字符串和一组key键,并使用一个函数返回一个元组值。这比调用GET_JSON_OBJECT从一个JSON字符串中检索多个键要高效得多。相当于加强版的get_JSON_OBJECT。它可以一次返回多个object,以元组的形式存储。


select a.timestamp, b.*
from log a lateral view json_tuple(a.appevent, 'eventid', 'eventname') b as f1, f2;

1.6 parse_url_tuple的使用

         parse_url_tuple() 是UDTF函数,类似于UDF中的parse_url(),但是可以同时提取给定URL的多个部分,以元组的形式返回数据。例如parse_url_tuple('http://facebook.com/path1/p.php?(k1=v1&k2=v2#Ref1', 'QUERY:k1', 'QUERY:k2')返回一个值为'v1','v2'的元组。这比多次调用parse_url()更有效。所有输入参数和输出列类型都是string。

 

 
打赏
 本文转载自:网络 
所有权利归属于原作者,如文章来源标示错误或侵犯了您的权利请联系微信13520258486
更多>最近资讯中心
更多>最新资讯中心
0相关评论

推荐图文
推荐资讯中心
点击排行
最新信息
新手指南
采购商服务
供应商服务
交易安全
关注我们
手机网站:
新浪微博:
微信关注:

13520258486

周一至周五 9:00-18:00
(其他时间联系在线客服)

24小时在线客服