[Rdo-list] How to set up EDP Pig jobs on Sahara Vanilla 2.7.1 Cluster (RDO Liberty)?
Boris Derzhavets
bderzhavets at hotmail.com
Sun Feb 14 07:31:20 UTC 2016
Fresh install on a new box, same task. The Oozie Launcher starts; the complete log is attached.
However, log4j raises a java.io.FileNotFoundException on the second worker node:
the path handed to it is a directory, while setFile(null,true) expects a file.
$ cat log4j_ERROR.txt
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /mnt/yarn/logs/userlogs/application_1455432090546_0001/container_1455432090546_0001_01_000001 (Is a directory)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
at java.io.FileOutputStream.<init>(FileOutputStream.java:142)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
at org.apache.hadoop.yarn.ContainerLogAppender.activateOptions(ContainerLogAppender.java:55)
at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)
at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)
at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
at org.apache.log4j.Logger.getLogger(Logger.java:104)
at org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:262)
at org.apache.commons.logging.impl.Log4JLogger.<init>(Log4JLogger.java:108)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1025)
at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:844)
at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:541)
at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:292)
at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:269)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:657)
at org.apache.hadoop.service.AbstractService.<clinit>(AbstractService.java:43)
log4j:ERROR Could not find value for key log4j.appender.CLA
log4j:ERROR Could not instantiate appender named "CLA".
log4j:ERROR Could not find value for key log4j.appender.CLA
log4j:ERROR Could not instantiate appender named "CLA".
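For reference: the "Could not find value for key log4j.appender.CLA" lines suggest the effective log4j.properties inside the launcher container never defines the CLA (ContainerLogAppender) appender at all, and the FileNotFoundException shows the appender being handed the container log directory with no file name. A working definition typically follows the shape of Hadoop's stock container-log4j.properties; the fragment below is a sketch of that shape, not verified against this exact 2.7.1 image, so the property names should be checked against /opt/hadoop-2.7.1 before relying on them:

```properties
# Sketch of a container log appender definition (per the shape of Hadoop's
# container-log4j.properties; verify names against your Hadoop 2.7.1 tree).
log4j.appender.CLA=org.apache.hadoop.yarn.ContainerLogAppender
log4j.appender.CLA.containerLogDir=${yarn.app.container.log.dir}
log4j.appender.CLA.containerLogFile=${hadoop.root.logfile}
log4j.appender.CLA.totalLogFileSize=${yarn.app.container.log.filesize}
log4j.appender.CLA.layout=org.apache.log4j.PatternLayout
log4j.appender.CLA.layout.ConversionPattern=%d{ISO8601} %p [%t] %c: %m%n
```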
________________________________________
From: rdo-list-bounces at redhat.com <rdo-list-bounces at redhat.com> on behalf of Boris Derzhavets <bderzhavets at hotmail.com>
Sent: Saturday, February 13, 2016 3:46 AM
To: Ethan Gafford
Cc: Luigi Toscano; rdo-list at redhat.com
Subject: Re: [Rdo-list] How to set up EDP Pig jobs on Sahara Vanilla 2.7.1 Cluster (RDO Liberty)?
Verifying Pig 0.10.1/0.11.0 and Java version compatibility.
Per https://archive.apache.org/dist/pig/pig-0.10.1/RELEASE_NOTES.txt
Trying the Release
==================
1. Download pig-0.10.1.tar.gz
2. Unpack the file: tar -xzvf pig-0.10.1.tar.gz
3. Move into the installation directory: cd pig-0.10.1
4. To run pig without Hadoop cluster, execute the command below. This will
take you into an interactive shell called grunt that allows you to navigate
the local file system and execute Pig commands against the local files
bin/pig -x local
5. To run on your Hadoop cluster, you need to set PIG_CLASSPATH environment
variable to point to the directory with your hadoop-site.xml file and then run
pig. The commands below will take you into an interactive shell called grunt
that allows you to navigate Hadoop DFS and execute Pig commands against it
export PIG_CLASSPATH=/hadoop/conf
bin/pig
6. To build your own version of pig.jar run
ant
7. To run unit tests run
ant test
8. To build jar file with available user defined functions run commands below.
This currently only works with Java 1.6.x. <==== Limitations
Per http://people.apache.org/~billgraham/pig-0.11.0-candidate-0/RELEASE_NOTES.txt
Trying the Release
==================
1. Download pig-0.11.0.tar.gz
2. Unpack the file: tar -xzvf pig-0.11.0.tar.gz
3. Move into the installation directory: cd pig-0.11.0
4. To run pig without Hadoop cluster, execute the command below. This will
take you into an interactive shell called grunt that allows you to navigate
the local file system and execute Pig commands against the local files
bin/pig -x local
5. To run on your Hadoop cluster, you need to set PIG_CLASSPATH environment
variable to point to the directory with your hadoop-site.xml file and then run
pig. The commands below will take you into an interactive shell called grunt
that allows you to navigate Hadoop DFS and execute Pig commands against it
export PIG_CLASSPATH=/hadoop/conf
bin/pig
6. To build your own version of pig.jar run
ant
7. To run unit tests run
ant test
8. To build jar file with available user defined functions run commands below.
This currently only works with Java 1.6.x. <===== Same limitations
Testing the installed Java version:
hadoop at demo-cluster271-hadoop-master-node-0:/opt$ java -version
java version "1.7.0_79" <===== currently installed Java release: 1.7.0_79
OpenJDK Runtime Environment (IcedTea 2.5.6) (7u79-2.5.6-0ubuntu1.14.04.1)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
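As a side note, the major version can be pulled out of that banner mechanically. A small sketch, parsing the exact string pasted above (so no JVM is needed to try it); on a real node you would pipe `java -version 2>&1` in instead:

```shell
# Parse a 'java -version' banner line; "1.7.0_79" -> major version 7.
banner='java version "1.7.0_79"'
ver=$(printf '%s\n' "$banner" | awk -F '"' '/version/ {print $2}')
major=$(printf '%s' "$ver" | cut -d. -f2)
echo "Java major version: $major"
```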
________________________________________
From: rdo-list-bounces at redhat.com <rdo-list-bounces at redhat.com> on behalf of Boris Derzhavets <bderzhavets at hotmail.com>
Sent: Saturday, February 13, 2016 3:08 AM
To: Ethan Gafford
Cc: Luigi Toscano; rdo-list at redhat.com
Subject: Re: [Rdo-list] How to set up EDP Pig jobs on Sahara Vanilla 2.7.1 Cluster (RDO Liberty)?
Not sure. Per https://archive.apache.org/dist/pig/pig-0.10.1/RELEASE_NOTES.txt
Highlights
==========
This is a maintenance release of Pig 0.10. See CHANGES.txt for a list of changes.
System Requirements
===================
1. Java 1.6.x or newer, preferably from Sun. Set JAVA_HOME to the root of your
Java installation
2. Ant build tool: http://ant.apache.org - to build source only
3. Cygwin: http://www.cygwin.com/ - to run under Windows
4. This release is compatible with all Hadoop 0.20.X, 1.X, 0.23.X and 2.X releases <== here
________________________________________
From: rdo-list-bounces at redhat.com <rdo-list-bounces at redhat.com> on behalf of Boris Derzhavets <bderzhavets at hotmail.com>
Sent: Saturday, February 13, 2016 2:56 AM
To: Ethan Gafford
Cc: Luigi Toscano; rdo-list at redhat.com
Subject: Re: [Rdo-list] How to set up EDP Pig jobs on Sahara Vanilla 2.7.1 Cluster (RDO Liberty)?
Checking the bundled Pig version and the Hadoop version installed via the upstream image for the Sahara cluster
(sahara-liberty-vanilla-2.7.1-centos-7.qcow2 from http://sahara-files.mirantis.com/images/upstream/liberty/):
[boris at ServerCentOS7 Downloads]$ ssh -i oskeydsa.pem ubuntu at 192.168.1.162
Welcome to Ubuntu 14.04.3 LTS (GNU/Linux 3.13.0-65-generic x86_64)
* Documentation: https://help.ubuntu.com/
System information as of Fri Feb 12 18:32:34 UTC 2016
System load: 0.0 Processes: 84
Usage of /: 12.2% of 18.70GB Users logged in: 0
Memory usage: 54% IP address for eth0: 50.0.0.101
Swap usage: 0%
0 packages can be updated.
0 updates are security updates.
Last login: Fri Feb 12 18:32:35 2016 from 192.168.1.57
ubuntu at demo-cluster271-hadoop-master-node-0:~$ sudo su - hadoop
hadoop at demo-cluster271-hadoop-master-node-0:~$ hadoop version
Hadoop 2.7.1 <==== Hadoop Version
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
Compiled by jenkins on 2015-06-29T06:04Z
Compiled with protoc 2.5.0
From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
This command was run using /opt/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1.jar
hadoop at demo-cluster271-hadoop-master-node-0:~$ logout
ubuntu at demo-cluster271-hadoop-master-node-0:~$ sudo su -
root at demo-cluster271-hadoop-master-node-0:~# cd /
root at demo-cluster271-hadoop-master-node-0:/# find . -name "*pig*" -print
./opt/hive/hcatalog/share/webhcat/svr/lib/pig-0.10.1.jar <== so the bundled Pig version appears to be 0.10.1
./opt/hive/hcatalog/share/hcatalog/hcatalog-pig-adapter-0.11.0.jar <==== at most 0.11.0
./opt/oozie/lib/oozie-sharelib-pig-4.2.0.jar
./opt/oozie/oozie-server/webapps/oozie/WEB-INF/lib/oozie-sharelib-pig-4.2.0.jar
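The filename-based check above can be scripted. Below is a sketch: the directory layout is fabricated locally so the pipeline can be tried anywhere, but the find/sed pattern is the same one you would point at /opt on the master node:

```shell
# Recreate a miniature /opt layout, then extract Pig versions from jar names.
demo=$(mktemp -d)
mkdir -p "$demo/hive/hcatalog/share/webhcat/svr/lib"
touch "$demo/hive/hcatalog/share/webhcat/svr/lib/pig-0.10.1.jar"
# Strip everything but the version component of each pig-<version>.jar found.
versions=$(find "$demo" -name "pig-*.jar" | sed -E 's/.*pig-([0-9.]+)\.jar$/\1/')
echo "$versions"
```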
Per https://archive.apache.org/dist/pig/pig-0.10.1/RELEASE_NOTES.txt,
http://people.apache.org/~billgraham/pig-0.11.0-candidate-0/RELEASE_NOTES.txt
System Requirements
===================
1. Java 1.6.x or newer, preferably from Sun (Oracle acquired Sun in 2010).
Set JAVA_HOME to the root of your Java installation
2. Ant build tool: http://ant.apache.org - to build source only
3. Cygwin: http://www.cygwin.com/ - to run under Windows
4. This release is compatible with all Hadoop 0.20.X, 1.0.X and 0.23.X releases <=== doesn't include 2.7.1
Conclusion: the bundled Pig 0.10.1 is not declared compatible with Hadoop 2.7.1.
Please advise if I am wrong.
Boris.
________________________________________
From: rdo-list-bounces at redhat.com <rdo-list-bounces at redhat.com> on behalf of Boris Derzhavets <bderzhavets at hotmail.com>
Sent: Friday, February 12, 2016 3:41 PM
To: Ethan Gafford
Cc: Luigi Toscano; rdo-list at redhat.com
Subject: Re: [Rdo-list] How to set up EDP Pig jobs on Sahara Vanilla 2.7.1 Cluster (RDO Liberty)?
Ethan ,
I am attaching the real script from the box (the one I am on right now, running AIO RDO Liberty + Sahara + Heat):
[boris at ServerCentOS7 ~]$ cat prg1.pig
messages = LOAD '$INPUT';
out = FILTER messages BY $0 MATCHES '^+.*ERROR+.*';
STORE out INTO '$OUTPUT' USING PigStorage();
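The filter line can be sanity-checked without any cluster. Two observations: Pig's MATCHES requires the regex to cover the whole line, and '^+' (a quantified start anchor) is an unusual construct; plain '.*ERROR.*' expresses the same intent. A quick local approximation of the filter with grep, using a fabricated sample log:

```shell
# Fabricate a small log, then keep only the lines the Pig FILTER would keep.
printf 'INFO all good\n2016-02-12 ERROR engine failed\nWARN minor\n' > /tmp/sample_input.txt
matches=$(grep -E '.*ERROR.*' /tmp/sample_input.txt)
echo "$matches"
```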
Thank you
Boris.
________________________________________
From: Ethan Gafford <egafford at redhat.com>
Sent: Friday, February 12, 2016 3:14 PM
To: Boris Derzhavets
Cc: Luigi Toscano; rdo-list at redhat.com
Subject: Re: [Rdo-list] How to set up EDP Pig jobs on Sahara Vanilla 2.7.1 Cluster (RDO Liberty)?
Hi Boris,
One possible source of error in your script is that (at least in your paste here) your $OUTPUT variable seems to be misspelled ('$OUPUT').
If that's not the issue (and it might well not be, if that was only a typo in this mail), then I fear that debugging Pig jobs is one of the roughest spots of Sahara at present. It looks, though, as though your job is being processed by grunt (the Pig interpreter; all things must be cute and pig-related), and that the interpreter is failing to parse the job file, so the misspelling diagnosis does fit: telling it to output to a nonexistent variable would explain the parse failure. The rest of your script looks syntactically correct to me and to my Lipstick equivalent.
Could you check to see if this is a problem? If not, I can try to help you do a deeper dive.
Cheers,
Ethan
----- Original Message -----
From: "Boris Derzhavets" <bderzhavets at hotmail.com>
To: "Luigi Toscano" <ltoscano at redhat.com>, rdo-list at redhat.com
Cc: egafford at redhat.com
Sent: Friday, February 12, 2016 12:29:59 PM
Subject: Re: [Rdo-list] How to set up EDP Pig jobs on Sahara Vanilla 2.7.1 Cluster (RDO Liberty)?
________________________________________
From: rdo-list-bounces at redhat.com <rdo-list-bounces at redhat.com> on behalf of Boris Derzhavets <bderzhavets at hotmail.com>
Sent: Friday, February 12, 2016 11:27 AM
To: Luigi Toscano; rdo-list at redhat.com
Cc: egafford at redhat.com
Subject: Re: [Rdo-list] How to set up EDP Pig jobs on Sahara Vanilla 2.7.1 Cluster (RDO Liberty)?
________________________________________
From: rdo-list-bounces at redhat.com <rdo-list-bounces at redhat.com> on behalf of Boris Derzhavets <bderzhavets at hotmail.com>
Sent: Friday, February 12, 2016 10:50 AM
To: Luigi Toscano; rdo-list at redhat.com
Cc: egafford at redhat.com
Subject: Re: [Rdo-list] How to set up EDP Pig jobs on Sahara Vanilla 2.7.1 Cluster (RDO Liberty)?
________________________________________
From: Luigi Toscano <ltoscano at redhat.com>
Sent: Friday, February 12, 2016 8:58 AM
To: rdo-list at redhat.com
Cc: Boris Derzhavets; egafford at redhat.com
Subject: Re: [Rdo-list] How to set up EDP Pig jobs on Sahara Vanilla 2.7.1 Cluster (RDO Liberty)?
On Thursday 11 of February 2016 20:37:33 Boris Derzhavets wrote:
> After several failures in the Sahara UI environment, I just attempted to set up
> Pig on the Master Node (VM) following
> https://pig.apache.org/docs/r0.13.0/start.html#build (current release
> 0.15)
>
> Downloaded Pig on the Master VM and configured it, so that I could run a
> simple pig script (filtering "ERROR" lines from sahara-engine.log) :-
You should not install Pig from apache.org; it's already bundled in the
image.
[BD]
We may check :-
1. I will drop all VMs in the cluster.
2. I will recreate a new Vanilla 2.7.1 cluster.
3. Log into the master and worker VMs and run :-
ubuntu$ sudo su - hadoop
hadoop$ which pig
If the last command returns nothing, would that be fair evidence?
Is this the correct way to verify?
[BD]
Yes, I see the bundled Pig present on the master:
./opt/hive/hcatalog/share/webhcat/svr/lib/pig-0.10.1.jar
./opt/hive/hcatalog/share/hcatalog/hcatalog-pig-adapter-0.11.0.jar
./opt/oozie/lib/oozie-sharelib-pig-4.2.0.jar
./opt/oozie/oozie-server/webapps/oozie/WEB-INF/lib/oozie-sharelib-pig-4.2.0.jar
[BD]
The cluster based on the Vanilla 2.7.1 plugin has been recreated.
hadoop at demo-cluster271-hadoop-master-node-0:~$ jps
4020 Bootstrap
2540 ResourceManager
4452 Jps
2285 NameNode
4242 RunJar
2977 JobHistoryServer
2392 SecondaryNameNode
ubuntu at demo-cluster271-hadoop-work-node-1:~$ sudo su - hadoop
hadoop at demo-cluster271-hadoop-work-node-1:~$ jps
2250 NodeManager
2193 DataNode
4159 Jps
Template 1 - hadoop-master-node (vanilla 2.7.1):
namenode
secondarynamenode
resourcemanager
historyserver
oozie
hiveserver
Template 2 - hadoop-work-node (vanilla 2.7.1):
datanode
nodemanager
Cluster template (XFS && SWIFT support enabled):
demo-template271 (vanilla 2.7.1)
hadoop-work-node: 3
hadoop-master-node: 1
Attempted to run the same job binary (prg1.pig) with the same input and output.
Logs generated on work-node-1:
http://bderzhavets.blogspot.com/2016/02/snaphots-on-mastenode-workernode-1.html
http://textuploader.com/52dzu
> [...]
>
> Either I have a wrong understanding of how to set up Pig on a Hadoop cluster, or I am
> making mistakes in the Sahara GUI environment. One more thing confusing me: Pig should
> be installed during Sahara's Hadoop cluster generation, yet it is not part of
> the Hadoop cluster (no matter which plugin has been used), so how does Sahara
> expect Pig jobs to be set up if there is no Pig on the cluster's VMs? I am
> missing something here.
It should be. Is there anything that makes you think that Pig is not included?
Did you add oozie to the cluster templates?
[BD]
Yes . I did it on Master's Template (Vanilla 2.7.1). Only datanode and nodemanager
were removed from Master template and added to Worker Template.
Master template :-
everything except
- datanode
- nodemanager
Worker template :-
only
+ datanode
+ nodemanager
Cluster template = 1*Master + 3*Workers
I started without any Pig setup on the master; each time the job eventually got killed,
and I could watch the Java exception above on one of the worker nodes while it
attempted to run prg1.pig as the job binary :-
messages = LOAD '$INPUT';
out = FILTER messages BY $0 MATCHES '^+.*ERROR+.*';
STORE out INTO '$OUPUT' USING PigStorage() ;
with
1) input defined as swift://demo.sahara/input.txt (previously uploaded to the public Swift
container "demo")
2) output defined as swift://demo/sahara/output
Just following Mirantis's video for Juno :-
https://www.mirantis.com/blog/sahara-updates-in-openstack-juno-release/
I installed Pig on the master to make sure that, in the Hadoop environment, I am able to
succeed with:
hadoop$ pig -x mapreduce prg1.pig (i.e., running the Pig script within the Hadoop cluster, as the hadoop user)
One strange thing: when I suspend all (4) VMs in the cluster and then resume them,
it takes 10-15 minutes before I can manually run :-
hadoop$ pig -x mapreduce prg1.pig
It is as if some Java services wake up very slowly, even though the master VM is resumed
and available for login.
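One way to cope with the slow wake-up is to poll the ResourceManager before submitting anything. A hedged sketch (bash-specific /dev/tcp; 8032 is YARN ResourceManager's default IPC port, and the hostname in the example is the one from this cluster, so adjust both for your setup):

```shell
# Wait until host:port accepts TCP connections; up to $3 one-second tries.
wait_for_port() {
  host=$1; port=$2; tries=${3:-120}; i=0
  while [ "$i" -lt "$tries" ]; do
    # the subshell opens (and, on exit, closes) fd 3 to host:port
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    i=$((i+1))
    sleep 1
  done
  return 1
}
# Example usage (hostname and port assumed from this thread's cluster):
# wait_for_port demo-cluster271-hadoop-master-node-0 8032 && pig -x mapreduce prg1.pig
```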
Thank you.
Boris
Regards
--
Luigi
_______________________________________________
Rdo-list mailing list
Rdo-list at redhat.com
https://www.redhat.com/mailman/listinfo/rdo-list
To unsubscribe: rdo-list-unsubscribe at redhat.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Oozie-Launcher-starts.txt.gz
Type: application/gzip
Size: 15083 bytes
Desc: Oozie-Launcher-starts.txt.gz
URL: <http://lists.rdoproject.org/pipermail/dev/attachments/20160214/4e3aacf1/attachment.gz>