hive - standard process after ingesting data into Hadoop


I am importing data from Oracle into Hadoop, and I want to keep the data in Hive.

  1. What steps should be followed after ingesting data into Hadoop?

  2. How do I perform data cleaning or error checks on the ingested data?

1. What steps should be followed after ingesting data into Hadoop?

You don't need a two-step process (importing the data into Hadoop and then transferring it into Hive).

As per the docs, you just need to add --hive-import to your import command.
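A minimal sketch of such a command, assuming a hypothetical Oracle host, user, and table name (adjust all of these to your environment):

```shell
# Hypothetical example: connection string, username, password file,
# and table name are placeholders, not values from the question.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott \
  --password-file /user/scott/.oracle_password \
  --table EMPLOYEES \
  --hive-import
```

This imports the table into HDFS and then creates and loads the corresponding Hive table in one run.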

Change the Hive table name

The table name used in Hive is, by default, the same as that of the source table. You can control the output table name with the --hive-table option.

Overwrite the Hive table

If the Hive table already exists, you can specify the --hive-overwrite option to indicate that the existing table in Hive must be replaced.
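Both options can be combined with the import above; this sketch (again with placeholder connection details and a hypothetical target table name) renames the Hive table and replaces it on re-runs:

```shell
# Hypothetical example: write into a differently named Hive table and
# overwrite it if it already exists. All names are placeholders.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott \
  --password-file /user/scott/.oracle_password \
  --table EMPLOYEES \
  --hive-import \
  --hive-table employees_raw \
  --hive-overwrite
```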

@Sachin mentioned the handling of null values in the data. You can check the docs for more details.
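For reference, Sqoop exposes --null-string and --null-non-string for this; a sketch (placeholder connection details as before) that maps database NULLs to the \N representation Hive recognizes:

```shell
# Hypothetical example: store NULLs as \N (Hive's default null
# representation) so Hive queries see them as NULL, not as a string.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott \
  --password-file /user/scott/.oracle_password \
  --table EMPLOYEES \
  --hive-import \
  --null-string '\\N' \
  --null-non-string '\\N'
```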

2. How do I perform data cleaning or error checks on the ingested data?

I assume by "data cleaning" you mean cleaning the data in Hadoop.

After your data is imported into HDFS (or this step is omitted), Sqoop will generate a Hive script containing a CREATE TABLE operation defining your columns using Hive's types, and a LOAD DATA INPATH statement to move the data files into Hive's warehouse directory.

The data is moved into Hive, so no data remains in the temporary HDFS location.
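You can confirm this from the command line; the paths below are hypothetical (the temporary import directory is typically named after the table under your HDFS home, and the warehouse location depends on your Hive configuration):

```shell
# Hypothetical paths: after --hive-import completes, the temporary
# import directory should no longer hold the data files...
hdfs dfs -ls /user/scott/EMPLOYEES

# ...because they have been moved under Hive's warehouse directory.
hdfs dfs -ls /user/hive/warehouse/employees
```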

Some of the common issues are mentioned in the troubleshooting docs. You can check them based on the errors you get.

