[Rdo-list] How to set up EDP Pig jobs on Sahara Vanilla 2.7.1 Cluster (RDO Liberty)?
Boris Derzhavets
bderzhavets at hotmail.com
Fri Feb 12 15:50:27 UTC 2016
________________________________________
From: Luigi Toscano <ltoscano at redhat.com>
Sent: Friday, February 12, 2016 8:58 AM
To: rdo-list at redhat.com
Cc: Boris Derzhavets; egafford at redhat.com
Subject: Re: [Rdo-list] How to set up EDP Pig jobs on Sahara Vanilla 2.7.1 Cluster (RDO Liberty)?
On Thursday 11 of February 2016 20:37:33 Boris Derzhavets wrote:
> After several failures in the Sahara UI environment, I just attempted to set up
> Pig on the Master Node (VM) following
> https://pig.apache.org/docs/r0.13.0/start.html#build ( current release
> 0.15 )
>
> Downloaded Pig on the Master VM and configured it, so that I could run a
> simple pig script like this one (filtering "ERROR" lines from sahara-engine.log) :-
You should not need to install Pig from apache.org; it's already bundled on the
image.
[BD]
We may check :-
1. I will drop all VMs in the Cluster
2. I will recreate a new Vanilla 2.7.1 cluster
3. Log into the master and worker VMs and run (a slightly fuller check is sketched below) :-
ubuntu$ sudo su - hadoop
hadoop$ which pig
If the last command returns nothing, would that be a fair test?
Is that a correct way to verify it?
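For what it's worth, a slightly fuller check could look like this (assuming pig is already on the hadoop user's PATH on the image; the exact install location may vary per image):
hadoop$ which pig        # should print the path of the bundled pig launcher
hadoop$ pig -version     # should report the Pig release baked into the image
If both commands answer, Pig is there and nothing needs to be downloaded from apache.org.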
> [...]
>
> Either I have a wrong understanding of how to set up Pig on a Hadoop Cluster, or I am
> making mistakes in the Sahara GUI environment. One more thing confusing me: Pig should
> be installed during Sahara's Hadoop Cluster generation, yet if it is not a part
> of the Hadoop Cluster ( no matter which plugin has been used ), how does Sahara
> expect Pig Jobs to be set up if there is no Pig at all on the cluster's VMs? I am
> missing something here.
It should be. Is there anything that makes you think that Pig is not included?
Did you add oozie to the cluster templates?
[BD]
Yes, I did, on the Master's template (Vanilla 2.7.1). Only datanode and nodemanager
were removed from the Master template and added to the Worker template.
Master template :-
Everything except
- datanode
- nodemanager
Worker template:-
Only
+ datanode
+ nodemanager
Cluster Template = 1*Master + 3*Workers (see the CLI sketch below)
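For the record, roughly how the same layout could be defined from the CLI instead of the dashboard. This is only a sketch assuming the python-saharaclient shell and its --json input; the template names, flavor id and file names are made up for illustration:
$ cat > master.json <<'EOF'
{ "name": "vanilla-271-master",
  "plugin_name": "vanilla",
  "hadoop_version": "2.7.1",
  "flavor_id": "2",
  "node_processes": ["namenode", "secondarynamenode",
                     "resourcemanager", "historyserver", "oozie"] }
EOF
$ sahara node-group-template-create --json master.json
# worker.json is analogous, with "node_processes": ["datanode", "nodemanager"]
$ sahara node-group-template-create --json worker.json
# the cluster template then combines 1 x master node group + 3 x worker node groups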
I started without any Pig setup on the Master, and each time the job eventually got killed
and I could watch a Java exception as above on one of the Worker Nodes while attempting
to run prg1.pig as the job binary :-
messages = LOAD '$INPUT';
out = FILTER messages BY $0 MATCHES '^+.*ERROR+.*';
STORE out INTO '$OUTPUT' USING PigStorage();
with
1) input defined as swift://demo.sahara/input.txt ( previously uploaded to the public swift
container "demo" )
2) output defined as swift://demo/sahara/output
Just following Mirantis's video for Juno :-
https://www.mirantis.com/blog/sahara-updates-in-openstack-juno-release/
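For completeness, the input object had been placed into Swift with the swift CLI along these lines (the read-ACL line is just one way to make the container public):
$ swift post demo                      # create the "demo" container
$ swift post -r '.r:*' demo            # make it publicly readable
$ swift upload demo input.txt          # upload the input object
# EDP expects swift paths of the form swift://<container>.sahara/<object>,
# e.g. swift://demo.sahara/input.txt for the input above.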
I installed Pig on the Master to make sure that, in the Hadoop environment, I am able to
succeed with
hadoop$ pig -x mapreduce prg1.pig    ( i.e. running the pig script within the Hadoop Cluster, as the hadoop user )
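Since the script uses the $INPUT/$OUTPUT placeholders that Sahara EDP normally substitutes at job launch, a manual run needs them passed explicitly, e.g. with Pig's -param option (the HDFS paths below are only placeholders for illustration):
hadoop$ pig -x mapreduce \
        -param INPUT=hdfs:///user/hadoop/input.txt \
        -param OUTPUT=hdfs:///user/hadoop/pig-out \
        prg1.pig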
There is one strange thing: when I suspend all VMs in the Cluster and then resume all 4 VMs,
it takes 10-15 min before I am able to manually run :-
hadoop$ pig -x mapreduce prg1.pig
It looks like some Java services wake up very slowly, even though the Master VM is resumed and available for
logging into.
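A quick way to see what is actually up after a resume (my guess is that the NameNode sitting in safe mode until the DataNodes report back explains most of the wait):
hadoop$ jps                          # list the running Hadoop/YARN Java daemons
hadoop$ hdfs dfsadmin -safemode get  # NameNode stays read-only while in safe mode
hadoop$ yarn node -list              # check that the NodeManagers have re-registered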
Thank you.
Boris
Regards
--
Luigi