In cases like this, I usually create a Hive table over the TSV file and then analyze the data with HQL queries.
First, create the table:
CREATE EXTERNAL TABLE `database_name.table_name` (
  id bigint COMMENT 'name each column exactly as in the TSV file header',
  class string,
  type string,
  access string,
  category string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = "\t",
  "quoteChar" = "'",
  "escapeChar" = "\\"
)
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'file_location(s3, hdfs, etc)'
TBLPROPERTIES (
  'serialization.null.format' = '',
  'skip.header.line.count' = '1'  -- skip the TSV header row
);
Note that OpenCSVSerde reads every column as a string regardless of the declared type, so cast columns like id in your queries if you need numeric comparisons.
Second, you can now analyze the data with queries against this table, like this:
SELECT * FROM table_name WHERE column_name = '' LIMIT 100;
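Beyond inspecting raw rows, aggregate queries work the same way. A minimal sketch, reusing the column names from the table above (the filter value 'some_type' is a hypothetical placeholder):

SELECT category, COUNT(*) AS cnt
FROM database_name.table_name
WHERE type = 'some_type'   -- hypothetical filter value
GROUP BY category
ORDER BY cnt DESC
LIMIT 20;

This returns the most frequent categories first, which is a quick way to get a feel for the distribution of a TSV dump.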