AWS Essentials for Hadoop Developers

MapReduce (with HDFS Path)

hadoop jar WordCount.jar WordCount /analytics/aws/input/result.csv /analytics/aws/output/1
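
Assuming WordCount.jar is the classic WordCount from the Apache Hadoop MapReduce tutorial (the jar's source is not shown here), its output is one word<TAB>count line per distinct whitespace-separated token, sorted by word. Below is a minimal sketch for sanity-checking a run against a small local sample with standard Unix tools. The function name local_wordcount is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: approximates the classic Hadoop WordCount on stdin.
# ASSUMPTION: tokens are split on whitespace (like Java's StringTokenizer)
# and the output is "word<TAB>count", one line per distinct word.
# awk does the map + reduce (tally every token); sort then orders the keys
# byte by byte in the C locale, the same order Hadoop uses for Text keys.
local_wordcount() (
  awk '{ for (i = 1; i <= NF; i++) counts[$i]++ }
       END { for (w in counts) printf "%s\t%d\n", w, counts[w] }' |
    LC_ALL=C sort
)
```

To compare with the cluster's result, hdfs dfs -cat /analytics/aws/output/1/part-r-* prints the job output (part-file names can vary by job). Since result.csv is comma-separated, under this assumption commas stay attached to the tokens.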


MapReduce (with S3 Path)

hadoop jar WordCount.jar WordCount s3://emr-analytics-dev/input/result.csv s3://emr-analytics-dev/output/2
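
The two runs write to different directories (output/1, output/2). This is presumably because Hadoop's FileOutputFormat refuses to start a job whose output directory already exists. Here is a small sketch of a hypothetical helper, fresh_output_dir, that derives a new directory for each run from a UTC timestamp. The same command can then be rerun against either HDFS or S3.

```shell
#!/bin/sh
# Hypothetical helper: print BASE/run-YYYYmmdd-HHMMSS (UTC) for a new job run.
# It only builds a string, so it works for HDFS paths and s3:// URIs alike.
# Two runs started within the same second would still collide.
fresh_output_dir() (
  base=${1%/}   # drop one trailing slash so we never emit "//run-"
  printf '%s/run-%s\n' "$base" "$(date -u +%Y%m%d-%H%M%S)"
)
```

Usage: hadoop jar WordCount.jar WordCount s3://emr-analytics-dev/input/result.csv "$(fresh_output_dir s3://emr-analytics-dev/output)"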


AWS S3 Cp

Usage: Copy files from EBS (mounted on EMR) to S3

aws s3 cp /mnt1/analytics/aruba/aruba_2016_clean/aruba_2016_full.csv s3://emr-analytics-dev/hdfs/analytics/aruba/
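
The trailing slash on the destination matters. When the destination ends in /, aws s3 cp treats it as a prefix and appends the local file name. Without the slash, the destination is used as the full object key. A small sketch, with the hypothetical name s3_cp_target, that prints where a single-file upload would land:

```shell
#!/bin/sh
# Hypothetical helper: print the S3 URI that "aws s3 cp LOCAL_FILE DEST"
# writes to for a single file (not --recursive).
s3_cp_target() (
  src=$1
  dest=$2
  case $dest in
    */) printf '%s%s\n' "$dest" "${src##*/}" ;;  # prefix: append file name
    *)  printf '%s\n' "$dest" ;;                 # full object key as given
  esac
)
```

For the command above, the file lands at s3://emr-analytics-dev/hdfs/analytics/aruba/aruba_2016_full.csv. Dropping the trailing slash would instead create an object literally named aruba.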


S3DistCp

Usage: Copy files from (a) HDFS to S3; (b) S3 to HDFS; (c) S3 to S3

s3-dist-cp --src=hdfs:///nag/sample.xml --dest=s3://emr-analytics-dev/conf/
s3-dist-cp --src=s3://emr-analytics-dev/jars/ --dest=hdfs:///nag/
s3-dist-cp --src=s3://emr-analytics-dev/jars/ --dest=/analytics/aws/input/
s3-dist-cp --src=hdfs:///analytics/aws/input/result.csv --dest=s3://emr-analytics-dev/conf/
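
The second and third commands both copy into HDFS. A path with no scheme, such as /analytics/aws/input/, is resolved against the cluster's default file system (fs.defaultFS), which on a standard EMR cluster is HDFS. Likewise, hdfs:/// with an empty authority refers to the default NameNode. Below is a sketch of that resolution, assuming fs.defaultFS is HDFS. The name qualify_path is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper mirroring how Hadoop qualifies a --src/--dest path.
# ASSUMPTION: fs.defaultFS is the cluster's HDFS, so a scheme-less absolute
# path means HDFS; "hdfs:///" (empty authority) also means the default NameNode.
qualify_path() (
  case $1 in
    *://*) printf '%s\n' "$1" ;;           # s3://..., hdfs:///...: kept as given
    /*)    printf 'hdfs://%s\n' "$1" ;;    # /analytics/... -> hdfs:///analytics/...
    *)     echo "relative paths are not handled in this sketch: $1" >&2
           exit 1 ;;
  esac
)
```

Relative paths, which Hadoop resolves against the user's HDFS home directory, are deliberately left out of this sketch.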


WGet

Usage: Copy files from S3 to the local file system of the EMR node

wget http://emr-analytics-dev.s3.amazonaws.com/jars/WordCount.jar

Action required: the object must be publicly readable first (S3 Folder -> Actions -> Make Public).
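
The wget URL is the virtual-hosted form of the S3 URI s3://emr-analytics-dev/jars/WordCount.jar. Below is a minimal sketch of that mapping. The function name s3_http_url is hypothetical. The sketch assumes a DNS-compatible bucket name and a key that needs no URL-encoding.

```shell
#!/bin/sh
# Hypothetical helper: s3://bucket/key -> http://bucket.s3.amazonaws.com/key
# ASSUMPTIONS: bucket name is DNS-compatible, key needs no URL-encoding
# (no spaces or special characters), and the object is publicly readable.
s3_http_url() (
  case $1 in
    s3://?*/?*) ;;
    *) echo "expected s3://bucket/key, got: $1" >&2; exit 1 ;;
  esac
  rest=${1#s3://}     # emr-analytics-dev/jars/WordCount.jar
  bucket=${rest%%/*}  # emr-analytics-dev
  key=${rest#*/}      # jars/WordCount.jar
  printf 'http://%s.s3.amazonaws.com/%s\n' "$bucket" "$key"
)
```

Running wget "$(s3_http_url s3://emr-analytics-dev/jars/WordCount.jar)" reproduces the command above. If the object should stay private, aws s3 cp s3://emr-analytics-dev/jars/WordCount.jar . downloads it using the cluster's own credentials instead.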


S3Put

Usage: Copy files from the local file system of the EMR node (e.g. /home/hadoop) to S3

s3put -a <Access Key Id> -s <Secret Access Key> -b emr-analytics-dev --region ap-southeast-1 /home/hadoop/WordCountTest.jar
s3put -b emr-analytics-dev --region ap-southeast-1 /home/hadoop/WordCountTest.jar
s3put -b emr-analytics-dev -p /home/hadoop -k jars --region ap-southeast-1 /home/hadoop/WordCountTest.jar


Hive External Table with S3

CREATE EXTERNAL TABLE aruba_open_word_cloud_v5_s3(
product string,
category string,
sub_category string,
calendar_year string,
calendar_quarter string,
csat string,
sentiment string,
sentiment_outlier string,
word string,
count int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://emr-analytics-dev/hdfs/analytics/aruba/word_cloud_v5_output';
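
Hive does not validate the files under LOCATION. With this delimited text format, a row with too few tab-separated fields gets NULL for the missing columns, and a non-numeric value in count reads back as NULL. Below is a minimal pre-flight sketch that flags such lines before they reach the table. It assumes the 10-column layout of the DDL above, and the function name bad_rows is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report lines that do not match the 10-column,
# tab-delimited layout of aruba_open_word_cloud_v5_s3 (last column: int).
bad_rows() (
  awk -F '\t' -v want=10 '
    NF != want || $NF !~ /^-?[0-9]+$/ {
      printf "line %d: %d fields, last field \"%s\"\n", NR, NF, $NF
    }'
)
```

Usage (the part-file name is a placeholder): aws s3 cp s3://emr-analytics-dev/hdfs/analytics/aruba/word_cloud_v5_output/<part file> - | bad_rows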


Location of HDFS Site Configuration (hdfs-site.xml) on Amazon EMR

/usr/lib/hadoop/etc/hadoop

