with temp as(select 1 as id ,'a,b,c' as name
union
select 2 as id ,'d,e,f' as name)
The test data is as follows:

select id,name,s_name
from temp
lateral view explode(split(name,',')) t as s_name
The result is as follows:
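As a rough illustration of the rows this query produces, the following Python sketch models `lateral view explode(split(name, ','))` on the two test rows. It is not Hive code: the helper name `lateral_view_explode` is made up for this sketch, and Hive itself does not guarantee the output row order.

```python
# Plain-Python model of: lateral view explode(split(name, ','))
# lateral_view_explode is a hypothetical helper, not a Hive API.

def lateral_view_explode(rows):
    """Yield (id, name, s_name): one output row per comma-separated element."""
    for row_id, name in rows:
        for s_name in name.split(","):
            # The original columns are repeated next to each split element.
            yield (row_id, name, s_name)

# Same test data as the temp CTE above.
temp = [(1, "a,b,c"), (2, "d,e,f")]

exploded = list(lateral_view_explode(temp))
for row in exploded:
    print(row)
```

Each source row turns into three output rows, one per element of name, with id and name carried along unchanged.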

with temp as(select '1,2,3' as id ,'a,b,c' as name
union
select '4,5,6' as id ,'d,e,f' as name)
The test data is as follows:

select id,name,s_name_index,s_name
from temp
lateral view posexplode(split(name,',')) t as s_name_index,s_name
The result is as follows:
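To make the extra index column concrete, here is a small Python sketch that models `posexplode(split(name, ','))`. The helper name `lateral_view_posexplode` is an assumption of this sketch, not a Hive function. The positions start at 0, as they do with posexplode.

```python
# Plain-Python model of: lateral view posexplode(split(name, ','))
# lateral_view_posexplode is a hypothetical helper, not a Hive API.

def lateral_view_posexplode(rows):
    """Yield (id, name, s_name_index, s_name) with a 0-based position."""
    for row_id, name in rows:
        for pos, s_name in enumerate(name.split(",")):
            yield (row_id, name, pos, s_name)

# Same test data as the temp CTE above (id is now a comma-separated string).
temp = [("1,2,3", "a,b,c"), ("4,5,6", "d,e,f")]

pos_rows = list(lateral_view_posexplode(temp))
for row in pos_rows:
    print(row)
```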

select id,name,s_id_index,s_id,s_name_index,s_name
from temp
lateral view posexplode(split(id,',')) t1 as s_id_index,s_id
lateral view posexplode(split(name,',')) t2 as s_name_index,s_name
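The result is as follows. Note that the two lateral views produce a Cartesian product: within each source row, every element of id is paired with every element of name.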

select id,name,s_id,s_name
from temp
lateral view posexplode(split(id,',')) t1 as s_id_index,s_id
lateral view posexplode(split(name,',')) t2 as s_name_index,s_name
where s_id_index = s_name_index
The result is as follows:
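The effect of the index filter can be checked with a short Python sketch. It models two posexplode lateral views as a per-row Cartesian product and then keeps only the pairs whose positions match. The helper `posexplode` below is a stand-in written for this sketch, not Hive code.

```python
from itertools import product

def posexplode(value):
    """Hypothetical stand-in for posexplode(split(value, ',')): (pos, element) pairs."""
    return list(enumerate(value.split(",")))

temp = [("1,2,3", "a,b,c"), ("4,5,6", "d,e,f")]

# Two lateral views on the same row: every id element meets every name element.
cross = []
for id_, name in temp:
    for (i, s_id), (j, s_name) in product(posexplode(id_), posexplode(name)):
        cross.append((id_, name, i, s_id, j, s_name))

# where s_id_index = s_name_index keeps only the position-aligned pairs.
aligned = [(id_, name, s_id, s_name)
           for id_, name, i, s_id, j, s_name in cross
           if i == j]

print(len(cross), len(aligned))
for row in aligned:
    print(row)
```

Without the filter each source row yields 3 x 3 = 9 rows. With it only the 3 aligned pairs per row remain, for example 1 with a, 2 with b, and 3 with c.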

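- Usage:
find_in_set(str,strlist): strlist must be a comma-separated string. The function returns the 1-based position of str within strlist, or 0 if str is not found.
It is mainly useful when you want to filter for rows that contain a particular code.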
select find_in_set('a','a,b,c')
The result is as follows:

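Note: in Hive, the condition has to be written as find_in_set('a','a,b,c') > 0. In MySQL the > 0 is not needed, because a nonzero integer is already treated as true.
In addition, both str and strlist can be table columns: str is a column of single values without commas, and strlist is a column that may hold comma-separated lists. Every row whose strlist column contains the value from the str column is then selected.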
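The return values described above can be illustrated with a minimal Python model of find_in_set. This is a sketch of the documented behaviour rather than Hive's implementation, and it does not try to model edge cases such as empty strings.

```python
def find_in_set(s, strlist):
    """Model of find_in_set: 1-based position of s in a comma-separated list, else 0."""
    if s is None or strlist is None:
        return None          # NULL in, NULL out
    if "," in s:
        return 0             # a search value containing a comma never matches
    items = strlist.split(",")
    return items.index(s) + 1 if s in items else 0

print(find_in_set("a", "a,b,c"))   # position of the first element
print(find_in_set("c", "a,b,c"))   # position of the last element
print(find_in_set("z", "a,b,c"))   # not present

# Filtering rows the way "where find_in_set(...) > 0" would.
rows = [(1, "a,b,c,d"), (2, "d,e,f"), (3, "g,h,k")]
kept = [r for r in rows
        if find_in_set("a", r[1]) > 0 or find_in_set("d", r[1]) > 0]
print(kept)
```

The match is on whole elements, so searching for "a" does not match a list such as "ab,c"; this is the main difference from a substring test like `like '%a%'`.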
The test data is as follows:
with temp as(select 1 as id ,'a,b,c,d' as name
union
select 2 as id ,'d,e,f' as name
union
select 3 as id ,'g,h,k' as name)
select *
from temp
where (find_in_set('a',name) >0 or find_in_set('d',name)>0)
The result is as follows:

with temp as(select 1 as id ,'a,b,c,d' as name
union
select 2 as id ,'d,e,f' as name
union
select 3 as id ,'g,h,k' as name),
temp2 as (select 'd' as dict
union
select 'e' as dict)
select temp.*,temp2.*
from temp left join temp2 on find_in_set(temp2.dict,temp.name)>0
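The result is as follows: the row with id 2 contains both d and e, so it expands into two rows; the row with id 1 contains only d, so it yields a single row; the row with id 3 contains neither d nor e, so its dict field is NULL after the left join.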
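The join behaviour just described can be reproduced with a small Python sketch. It models a left join whose condition is find_in_set(dict, name) > 0; the `find_in_set` function here is a simplified model written for this sketch, not Hive's implementation.

```python
def find_in_set(s, strlist):
    """Simplified model: 1-based position of s in a comma-separated list, else 0."""
    items = strlist.split(",")
    return items.index(s) + 1 if s in items else 0

temp = [(1, "a,b,c,d"), (2, "d,e,f"), (3, "g,h,k")]
temp2 = ["d", "e"]   # the dict column

joined = []
for row_id, name in temp:
    matches = [d for d in temp2 if find_in_set(d, name) > 0]
    if matches:
        # One output row per matching dict value.
        joined.extend((row_id, name, d) for d in matches)
    else:
        # Left join: keep the row, with NULL (None) on the right side.
        joined.append((row_id, name, None))

for row in joined:
    print(row)
```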

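PS: explode and posexplode split every comma-separated element of a field into its own row, whereas find_in_set only expands the rows that satisfy the condition you specify. Choose whichever fits your requirement.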
To merge the split rows back into a single row, MySQL can use group_concat, and Hive can use collect_list together with concat_ws.
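As a rough sketch of what that merge does, the Python below groups exploded rows by id (the role of collect_list with a group by) and then joins each group with a comma (the role of concat_ws). The sample rows are the exploded form of the first test table. Note that Hive does not guarantee the order in which collect_list gathers elements, whereas this sketch simply keeps input order.

```python
from collections import defaultdict

# Exploded rows: (id, s_name)
exploded = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e"), (2, "f")]

# Step 1: gather the values of each group into a list (models collect_list + group by id).
groups = defaultdict(list)
for row_id, s_name in exploded:
    groups[row_id].append(s_name)

# Step 2: join each list with a separator (models concat_ws(',', ...)).
merged = {row_id: ",".join(names) for row_id, names in groups.items()}

print(merged)
```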