________________________________________
From: rdo-list-bounces(a)redhat.com <rdo-list-bounces(a)redhat.com> on behalf of Boris
Derzhavets <bderzhavets(a)hotmail.com>
Sent: Friday, February 12, 2016 11:27 AM
To: Luigi Toscano; rdo-list(a)redhat.com
Cc: egafford(a)redhat.com
Subject: Re: [Rdo-list] How to set up EDP Pig jobs on Sahara Vanilla 2.7.1 Cluster (RDO Liberty)?
________________________________________
From: rdo-list-bounces(a)redhat.com <rdo-list-bounces(a)redhat.com> on behalf of Boris
Derzhavets <bderzhavets(a)hotmail.com>
Sent: Friday, February 12, 2016 10:50 AM
To: Luigi Toscano; rdo-list(a)redhat.com
Cc: egafford(a)redhat.com
Subject: Re: [Rdo-list] How to set up EDP Pig jobs on Sahara Vanilla 2.7.1 Cluster (RDO Liberty)?
________________________________________
From: Luigi Toscano <ltoscano(a)redhat.com>
Sent: Friday, February 12, 2016 8:58 AM
To: rdo-list(a)redhat.com
Cc: Boris Derzhavets; egafford(a)redhat.com
Subject: Re: [Rdo-list] How to set up EDP Pig jobs on Sahara Vanilla 2.7.1 Cluster (RDO Liberty)?
On Thursday 11 of February 2016 20:37:33 Boris Derzhavets wrote:
After several failures in the Sahara UI environment, I attempted to
set up Pig on the Master Node (VM) following
https://pig.apache.org/docs/r0.13.0/start.html#build (current release
0.15). I downloaded Pig on the Master VM and configured it so that I could run
a simple Pig script (filtering "ERROR" lines from sahara-engine.log):
You should not install Pig from apache.org; it is already bundled in the
image.
[BD]
We can check:
1. I will drop all VMs in the cluster.
2. I will recreate a new Vanilla 2.7.1 cluster.
3. Log into the master and worker VMs and run:
ubuntu$ sudo su - hadoop
hadoop$ which pig
If the last command returns nothing, would that be a fair test?
Is that the correct way to verify?
[BD]
Yes, I can see the bundled Pig on the Master:
./opt/hive/hcatalog/share/webhcat/svr/lib/pig-0.10.1.jar
./opt/hive/hcatalog/share/hcatalog/hcatalog-pig-adapter-0.11.0.jar
./opt/oozie/lib/oozie-sharelib-pig-4.2.0.jar
./opt/oozie/oozie-server/webapps/oozie/WEB-INF/lib/oozie-sharelib-pig-4.2.0.jar
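The check above can be wrapped into a small shell helper so it is easy to repeat on other nodes. This is only a sketch; the /opt root and the jar naming pattern are assumptions based on the listing above:

```shell
# check_pig: report whether pig is on PATH and list bundled Pig jars under a
# given root (defaults to /opt, which is where the Vanilla image keeps them).
check_pig() {
  root="${1:-/opt}"
  if command -v pig >/dev/null 2>&1; then
    echo "pig on PATH: $(command -v pig)"
  else
    echo "pig not on PATH"
  fi
  # List any bundled Pig jars (oozie sharelib, hcatalog adapter, etc.)
  find "$root" -name '*pig*.jar' 2>/dev/null
}

# On the master VM, as the hadoop user:
check_pig /opt
```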
[BD]
The cluster based on the Vanilla 2.7.1 plugin has been recreated.
hadoop@demo-cluster271-hadoop-master-node-0:~$ jps
4020 Bootstrap
2540 ResourceManager
4452 Jps
2285 NameNode
4242 RunJar
2977 JobHistoryServer
2392 SecondaryNameNode
ubuntu@demo-cluster271-hadoop-work-node-1:~$ sudo su - hadoop
hadoop@demo-cluster271-hadoop-work-node-1:~$ jps
2250 NodeManager
2193 DataNode
4159 Jps
hadoop-master-node - Template 1
vanilla 2.7.1
namenode
secondarynamenode
resourcemanager
historyserver
oozie
hiveserver
hadoop-work-node - Template 2
vanilla 2.7.1
datanode
nodemanager
Cluster Template (XFS and Swift support enabled)
demo-template271
vanilla 2.7.1
hadoop-work-node: 3
hadoop-master-node: 1
Attempted to run the same job binary (prg1.pig) with the
same input and output.
Logs generated on work-node-1:
http://bderzhavets.blogspot.com/2016/02/snaphots-on-mastenode-workernode-...
http://textuploader.com/52dzu
[...]
Either I misunderstand how to set up Pig on a Hadoop cluster, or I am making
mistakes in the Sahara GUI environment. One more thing confuses me: if Pig is
supposed to be installed during Sahara's Hadoop cluster generation but is not
part of the Hadoop cluster itself (no matter which plugin is used), how does
Sahara expect Pig jobs to be set up when there is no Pig on the cluster's VMs?
I am missing something here.
It should be there. Is there anything that led you to think that Pig is not included?
Did you add oozie to the cluster templates?
[BD]
Yes, I did, on the Master's template (Vanilla 2.7.1). Only datanode and nodemanager
were removed from the Master template and added to the Worker template.
Master template:
everything except
- datanode
- nodemanager
Worker template:
only
+ datanode
+ nodemanager
Cluster template = 1*Master + 3*Workers
I started without any Pig setup on the Master, and each time the job eventually got
killed; I could see the Java exception above on one of the worker nodes attempting
to run prg1.pig as a job binary:
messages = LOAD '$INPUT';
out = FILTER messages BY $0 MATCHES '.*ERROR.*';
STORE out INTO '$OUTPUT' USING PigStorage();
with
1) input defined as swift://demo.sahara/input.txt (previously uploaded to the public Swift
container "demo")
2) output defined as swift://demo.sahara/output
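Before debugging the cluster side, the filter itself can be sanity-checked locally; here grep stands in for the Pig job (a rough equivalent of the FILTER above, not part of the EDP flow, and the sample log lines are made up):

```shell
# Build a tiny sample input and apply the same ERROR filter locally.
printf 'INFO  engine started\nERROR connection refused\nDEBUG retrying\n' > /tmp/input.txt
grep 'ERROR' /tmp/input.txt > /tmp/output.txt
cat /tmp/output.txt
```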
I was just following Mirantis's video for Juno:
https://www.mirantis.com/blog/sahara-updates-in-openstack-juno-release/
I installed Pig on the Master to make sure that, in the Hadoop environment, I am able to
succeed with:
hadoop$ pig -x mapreduce prg1.pig
(i.e. running the Pig script within the Hadoop cluster as the hadoop user).
There is one strange thing: when I suspend all VMs in the cluster and then resume all of
them (4), it takes 10-15 minutes before I am able to manually run:
hadoop$ pig -x mapreduce prg1.pig
It seems some Java services wake up very slowly, even though the Master VM is resumed and
available for login.
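Rather than waiting a fixed 10-15 minutes, one could poll jps until the expected services reappear. A minimal sketch (the service name, polling command, and timeout are assumptions, not anything Sahara provides):

```shell
# wait_for_service NAME CHECK_CMD [TRIES]: poll CHECK_CMD until its output
# mentions NAME, sleeping 1 second between attempts.
wait_for_service() {
  name="$1"; check_cmd="$2"; tries="${3:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if $check_cmd | grep -q "$name"; then
      echo "$name is up"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "$name did not come up"
  return 1
}

# On the resumed master VM, as the hadoop user, one would run e.g.:
# wait_for_service ResourceManager jps 90
```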
Thank you.
Boris
Regards
--
Luigi
_______________________________________________
Rdo-list mailing list
Rdo-list(a)redhat.com
https://www.redhat.com/mailman/listinfo/rdo-list
To unsubscribe: rdo-list-unsubscribe(a)redhat.com