[Rdo-list] How to set up EDP Pig jobs on Sahara Vanilla 2.7.1 Cluster (RDO Liberty)?
Boris Derzhavets
bderzhavets at hotmail.com
Fri Feb 12 15:50:27 UTC 2016
________________________________________
From: Luigi Toscano <ltoscano at redhat.com>
Sent: Friday, February 12, 2016 8:58 AM
To: rdo-list at redhat.com
Cc: Boris Derzhavets; egafford at redhat.com
Subject: Re: [Rdo-list] How to set up EDP Pig jobs on Sahara Vanilla 2.7.1 Cluster (RDO Liberty)?
On Thursday 11 of February 2016 20:37:33 Boris Derzhavets wrote:
> After several failures in the Sahara UI environment, I just attempted to set up
> Pig on the Master Node (VM) following
> https://pig.apache.org/docs/r0.13.0/start.html#build ( current release
> 0.15 )
>
> Downloaded Pig on the Master VM and configured it, so that I could run a
> simple pig script like this one (filtering "ERROR" lines from sahara-engine.log) :-
You should not need to install Pig from apache.org; it's already bundled on the
image.
[BD]
We may check :-
1. I will drop all VMs in the Cluster
2. I will recreate a new Vanilla 2.7.1 cluster
3. Log into the master and worker VMs and run (a slightly fuller check is sketched below) :-
ubuntu$ sudo su - hadoop
hadoop$ which pig
If the last command returns nothing, would that be a fair test?
Is that a correct way to verify it?
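For what it's worth, a slightly fuller check could look like this (assuming pig is already on the hadoop user's PATH on the image; the exact install location may vary per image):
hadoop$ which pig        # should print the path of the bundled pig launcher
hadoop$ pig -version     # should report the Pig release baked into the image
If both commands answer, Pig is there and nothing needs to be downloaded from apache.org.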
> [...]
>
> Either I have a wrong understanding of how to set up Pig on a Hadoop Cluster, or I am
> making mistakes in the Sahara GUI environment. One more thing confusing me: Pig should
> be installed during Sahara's Hadoop Cluster generation, yet if it is not a part
> of the Hadoop Cluster ( no matter which plugin has been used ), how does Sahara
> expect Pig Jobs to be set up if there is no Pig at all on the cluster's VMs? I am
> missing something here.
It should be. Is there anything that makes you think that Pig is not included?
Did you add oozie to the cluster templates?
[BD]
Yes, I did, on the Master's template (Vanilla 2.7.1). Only datanode and nodemanager
were removed from the Master template and added to the Worker template.
Master template :-
Everything except
- datanode
- nodemanager
Worker template:-
Only
+ datanode
+ nodemanager
Cluster Template = 1*Master + 3*Workers (see the CLI sketch below)
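For the record, roughly how the same layout could be defined from the CLI instead of the dashboard. This is only a sketch assuming the python-saharaclient shell and its --json input; the template names, flavor id and file names are made up for illustration:
$ cat > master.json <<'EOF'
{ "name": "vanilla-271-master",
  "plugin_name": "vanilla",
  "hadoop_version": "2.7.1",
  "flavor_id": "2",
  "node_processes": ["namenode", "secondarynamenode",
                     "resourcemanager", "historyserver", "oozie"] }
EOF
$ sahara node-group-template-create --json master.json
# worker.json is analogous, with "node_processes": ["datanode", "nodemanager"]
$ sahara node-group-template-create --json worker.json
# the cluster template then combines 1 x master node group + 3 x worker node groups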
I started without any Pig setup on the Master, and each time the job eventually got killed
and I could watch a Java exception as above on one of the Worker Nodes while attempting
to run prg1.pig as the job binary :-
messages = LOAD '$INPUT';
out = FILTER messages BY $0 MATCHES '^+.*ERROR+.*';
STORE out INTO '$OUTPUT' USING PigStorage();
with
1) input defined as swift://demo.sahara/input.txt ( previously uploaded to the public swift
container "demo" )
2) output defined as swift://demo/sahara/output
Just following Mirantis's video for Juno :-
https://www.mirantis.com/blog/sahara-updates-in-openstack-juno-release/
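For completeness, the input object had been placed into Swift with the swift CLI along these lines (the read-ACL line is just one way to make the container public):
$ swift post demo                      # create the "demo" container
$ swift post -r '.r:*' demo            # make it publicly readable
$ swift upload demo input.txt          # upload the input object
# EDP expects swift paths of the form swift://<container>.sahara/<object>,
# e.g. swift://demo.sahara/input.txt for the input above.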
I installed Pig on the Master to make sure that, in the Hadoop environment, I am able to
succeed with
hadoop$ pig -x mapreduce prg1.pig    ( i.e. running the pig script within the Hadoop Cluster, as the hadoop user )
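Since the script uses the $INPUT/$OUTPUT placeholders that Sahara EDP normally substitutes at job launch, a manual run needs them passed explicitly, e.g. with Pig's -param option (the HDFS paths below are only placeholders for illustration):
hadoop$ pig -x mapreduce \
        -param INPUT=hdfs:///user/hadoop/input.txt \
        -param OUTPUT=hdfs:///user/hadoop/pig-out \
        prg1.pig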
There is one strange thing: when I suspend all VMs in the Cluster and then resume all 4 VMs,
it takes 10-15 min before I am able to manually run :-
hadoop$ pig -x mapreduce prg1.pig
It looks like some Java services wake up very slowly, even though the Master VM is resumed and available for
logging into.
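A quick way to see what is actually up after a resume (my guess is that the NameNode sitting in safe mode until the DataNodes report back explains most of the wait):
hadoop$ jps                          # list the running Hadoop/YARN Java daemons
hadoop$ hdfs dfsadmin -safemode get  # NameNode stays read-only while in safe mode
hadoop$ yarn node -list              # check that the NodeManagers have re-registered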
Thank you.
Boris
Regards
--
Luigi