Monday 29 September 2014

Can we change the default key-value input separator in Hadoop MapReduce?


Yes, we can change it by setting the "key.value.separator.in.input.line" property in the driver class.

There may be cases where each input line has to be split into a key and a value at a specific delimiter.


eg:
one	first line
two	second line
To read a file like this, we use KeyValueTextInputFormat.class, which splits each line at the first TAB character by default.

So when each record is printed in map(), the key is "one" and the value is "first line".


What if we need a delimiter other than TAB?


eg:
one,first line
two,second line

Here too, we need the key to be "one" and the value to be "first line".

This is possible by setting one extra configuration property alongside KeyValueTextInputFormat to change the default separator.
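
To make the splitting rule concrete, here is a minimal plain-Java sketch of what happens to each line: it is cut at the first occurrence of the separator, and a line with no separator becomes the key with an empty value. The class KeyValueSplitSketch and its split() helper are hypothetical names used only for illustration. They are not part of the Hadoop API, and the sketch works on Strings rather than on the bytes Hadoop reads.

```java
import java.util.Arrays;

// Illustration only: mimics splitting a line into key and value at the
// first separator character. Hypothetical helper, not Hadoop code.
class KeyValueSplitSketch {

    // Returns a two-element array {key, value}.
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            // No separator found: the whole line is the key, the value is empty.
            return new String[] { line, "" };
        }
        // Everything before the first separator is the key, the rest is the value.
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        // Default case: TAB separator.
        System.out.println(Arrays.toString(split("one\tfirst line", '\t')));
        // Custom case: comma separator.
        System.out.println(Arrays.toString(split("one,first line", ',')));
        // Only the first comma splits, so the value keeps the later ones.
        System.out.println(Arrays.toString(split("a,b,c", ',')));
    }
}
```

Note that only the first separator counts, so a value such as "b,c" keeps its own commas.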

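// New API (org.apache.hadoop.mapreduce)
Configuration conf = new Configuration();
// Set the separator before creating the Job, because the Job copies the Configuration.
// Hadoop 2.x treats this name as a deprecated alias of
// "mapreduce.input.keyvaluelinerecordreader.key.value.separator".
conf.set("key.value.separator.in.input.line", ",");
Job job = new Job(conf); // deprecated in Hadoop 2.x, where Job.getInstance(conf) is preferred
job.setInputFormatClass(KeyValueTextInputFormat.class);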
