Hadoop 2.x Pseudo-Distributed Environment Setup

Published: 2019-06-14


 

I. Basic Environment Setup

  1. Set the hostname, static IP/DNS, host mappings, and the Windows hosts mapping (for convenient SSH access and IP changes), etc.

- Set the hostname: vi /etc/sysconfig/network  # takes effect after a reboot (temporary change: hostname xxx; open another terminal to see the effect. Note: if you are about to set up Hadoop, the hostname must not contain "_")
- Set a static IP/DNS: vi /etc/sysconfig/network-scripts/ifcfg-eth0 (example: BOOTPROTO=static; IPADDR=192.168.0.111; GATEWAY=192.168.0.1; DNS1=192.168.0.1; then restart the network service: service network restart)
- Set the host mapping: vi /etc/hosts (format: IP hostname)
- Set the Windows host mapping: edit the hosts file and add [IP hostname]
- Disable the firewall: chkconfig iptables off / service iptables restart (temporary change: service iptables stop/start takes effect immediately)
- Disable SELinux: vi /etc/sysconfig/selinux  # takes effect after a reboot (SELinux is a Linux security subsystem that hardens file access control; disable temporarily: setenforce 0; re-enable temporarily: setenforce 1)
- Check whether the system ships with OpenJDK; if so, uninstall it to avoid conflicts with the JDK installed later (check: java -version; if present, list the packages: rpm -qa | grep "java"; uninstall: rpm -e <package> or yum -y remove <package>)
- Prepare the archives:
  - hadoop-2.5.0.tar.gz
  - hadoop-2.5.0-src.tar.gz (optional; used when compiling the source)
  - native-2.5.0.tar.gz (optional; pre-compiled Hadoop native libraries that can be dropped in directly)
  - protobuf-2.5.0.tar.gz (optional; a required component when compiling the source)
  - jdk-7u67-linux-x64.tar.gz (Hadoop 2.x requires JDK 1.7+)
  - apache-maven-3.0.5-bin.tar.gz (Maven)
  - repository.tar.gz (optional; a Maven repository used when compiling the Hadoop source; without it, compilation spends much longer downloading dependencies)
  - eclipse-jee-kepler-SR1-linux-gtk-x86_64.tar.gz (for Linux; used to write and locally test MR programs)

  2. Add the user, create the directories, and upload the prepared archives into files

[root@centos66-bigdata-hadoop ~]# su - liuwl
[liuwl@centos66-bigdata-hadoop ~]$ cd opt/
[liuwl@centos66-bigdata-hadoop opt]$ ls
data  files  localsrc  modules  software  workspace
---------------------------------------------------------------
Upload all the tar archives needed to build Hadoop 2.x (provide the archives yourself).
Many upload tools work: FileZilla, FlashFXP, Xftp, vmware-tools, notepad++...
There may be directory permission problems, so check permissions first.

  3. Create the user liuwl and grant it sudo privileges via visudo

[root@centos66-bigdata-hadoop ~]# visudo
...
liuwl   ALL=(root)      NOPASSWD:ALL
[root@centos66-bigdata-hadoop ~]# su - liuwl
[liuwl@centos66-bigdata-hadoop ~]$ sudo -l
...
User liuwl may run the following commands on this host:
    (root) NOPASSWD: ALL

  4. Create the directory layout

[root@centos66-bigdata-hadoop ~]# su - liuwl
[liuwl@centos66-bigdata-hadoop ~]$ cd opt/
[liuwl@centos66-bigdata-hadoop opt]$ ls
data  files  localsrc  modules  software  workspace    # name the directories however you like, as long as you know what goes where

  5. Install jdk-7u67-linux-x64 (match the JDK version to your system architecture; here it is CentOS 6.6 64-bit)

[liuwl@centos66-bigdata-hadoop ~]$ vi /etc/profile
...
#JAVA_HOME
export JAVA_HOME=/opt/modules/jdk1.7.0_67
export PATH=$PATH:$JAVA_HOME/bin
[liuwl@centos66-bigdata-hadoop ~]$ source /etc/profile
[liuwl@centos66-bigdata-hadoop ~]$ echo $JAVA_HOME
/opt/modules/jdk1.7.0_67
[liuwl@centos66-bigdata-hadoop ~]$ java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

  6. Extract hadoop-2.5.0.tar.gz and delete the doc directory (the docs are large and rarely used; copy them elsewhere for day-to-day reference)

# If interested, you can use lynx to browse the docs in the terminal; install it as root: yum -y install lynx
# Then: lynx xxx.html    Quit: q --> y
[liuwl@centos66-bigdata-hadoop ~]$ cd /home/liuwl/opt/files
[liuwl@centos66-bigdata-hadoop files]$ tar -zxf hadoop-2.5.0.tar.gz -C ../modules/
[liuwl@centos66-bigdata-hadoop files]$ sudo rm -rf ../modules/hadoop-2.5.0/share/doc/

II. Hadoop Pseudo-Distributed Setup (the main topic)

   ★ Configuration file directory: /home/liuwl/opt/modules/hadoop-2.5.0/etc/hadoop

          PS: edit with notepad++ (via the NppFTP plugin; download it if you do not have it)

  1. Configure the JDK, i.e. JAVA_HOME, in each *-env.sh

hadoop-env.sh    export JAVA_HOME=/opt/modules/jdk1.7.0_67
mapred-env.sh    export JAVA_HOME=/opt/modules/jdk1.7.0_67
yarn-env.sh      export JAVA_HOME=/opt/modules/jdk1.7.0_67

  2. Configure the Hadoop site files

    1> hdfs >>

      ▶ namenode >>

core-site.xml >>

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://centos66-bigdata-hadoop.com:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/liuwl/opt/modules/hadoop-2.5.0/data/tmp</value>
  </property>
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>liuwl</value>
  </property>
</configuration>

      ▶ datanode >>

slaves >>
centos66-bigdata-hadoop.com

hdfs-site.xml >>

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>

    2> Format HDFS >>

[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ bin/hdfs namenode -format
[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ ls data/tmp/dfs

    3> Configure the YARN environment (including SecondaryNameNode and JobHistoryServer) >>

yarn-site.xml >>

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>centos66-bigdata-hadoop.com</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>108600</value>
  </property>
</configuration>

    4> Configure the MapReduce environment

mapred-site.xml >>

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>centos66-bigdata-hadoop.com:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>centos66-bigdata-hadoop.com:19888</value>
  </property>
</configuration>

    5> Start the daemons one by one

[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /home/liuwl/opt/modules/hadoop-2.5.0/logs/hadoop-liuwl-namenode-centos66-bigdata-hadoop.com.out
[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /home/liuwl/opt/modules/hadoop-2.5.0/logs/hadoop-liuwl-datanode-centos66-bigdata-hadoop.com.out
[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ sbin/yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /home/liuwl/opt/modules/hadoop-2.5.0/logs/yarn-liuwl-resourcemanager-centos66-bigdata-hadoop.com.out
[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ sbin/yarn-daemon.sh start nodemanager
starting nodemanager, logging to /home/liuwl/opt/modules/hadoop-2.5.0/logs/yarn-liuwl-nodemanager-centos66-bigdata-hadoop.com.out
[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/liuwl/opt/modules/hadoop-2.5.0/logs/mapred-liuwl-historyserver-centos66-bigdata-hadoop.com.out
[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ sbin/hadoop-daemon.sh start secondarynamenode
starting secondarynamenode, logging to /home/liuwl/opt/modules/hadoop-2.5.0/logs/hadoop-liuwl-secondarynamenode-centos66-bigdata-hadoop.com.out
[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ jps
10772 NameNode
11179 NodeManager
10853 DataNode
10938 ResourceManager
11382 SecondaryNameNode
11302 JobHistoryServer
11420 Jps
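Once everything is started, jps should show all six daemons. As a quick sanity check, one could parse the jps output and report anything missing. This is a minimal sketch (not part of Hadoop; the expected daemon set is taken from the jps output above):

```python
# Sanity check: given `jps` output, report which expected Hadoop
# daemons of this pseudo-distributed setup are NOT running.
EXPECTED = {"NameNode", "DataNode", "ResourceManager",
            "NodeManager", "JobHistoryServer", "SecondaryNameNode"}

def missing_daemons(jps_output: str) -> set:
    """Return the set of expected daemons absent from the jps output."""
    running = set()
    for line in jps_output.strip().splitlines():
        parts = line.split()
        if len(parts) == 2:          # each jps line is "<pid> <ClassName>"
            running.add(parts[1])
    return EXPECTED - running

# Using the jps output shown above:
jps_out = """10772 NameNode
11179 NodeManager
10853 DataNode
10938 ResourceManager
11382 SecondaryNameNode
11302 JobHistoryServer
11420 Jps"""
print(missing_daemons(jps_out))  # -> set(), i.e. nothing missing
```

In practice the same check could be fed from `subprocess.check_output(["jps"], text=True)` on the server itself.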

  3. Test the HDFS filesystem

[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ bin/hdfs dfs -mkdir -p /user/liuwl/tmp
16/09/14 07:51:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ vi ../../data/wordcount.input
[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ bin/hdfs dfs -mkdir -p /user/liuwl/tmp/input
16/09/14 07:54:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ bin/hdfs dfs -put ../../data/wordcount.input /user/liuwl/tmp/input
16/09/14 07:54:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ bin/hdfs dfs -cat /user/liuwl/tmp/input/wordcount.input
16/09/14 07:55:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop mapreduce
yarn historyserver hadoop
mapreduce yarn
namenode datanode
datanode
[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ bin/hdfs dfs -get /user/liuwl/tmp/input/wordcount.input /opt/modules/wc.input
16/09/14 07:56:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
get: /opt/modules/wc.input._COPYING_ (Permission denied)
[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ bin/hdfs dfs -get /user/liuwl/tmp/input/wordcount.input ~/opt/data/wc.input
16/09/14 07:57:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ cat ../../data/wc.input
hadoop mapreduce
yarn historyserver hadoop
mapreduce yarn
namenode datanode
datanode

  4. Run the example jar with MapReduce

[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /user/liuwl/tmp/input /user/liuwl/tmp/output
16/09/14 07:59:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/09/14 07:59:55 INFO client.RMProxy: Connecting to ResourceManager at centos66-bigdata-hadoop.com/192.168.0.110:8032
16/09/14 07:59:57 INFO input.FileInputFormat: Total input paths to process : 1
16/09/14 07:59:57 INFO mapreduce.JobSubmitter: number of splits:1
16/09/14 07:59:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1473864360962_0001
16/09/14 07:59:59 INFO impl.YarnClientImpl: Submitted application application_1473864360962_0001
16/09/14 08:00:00 INFO mapreduce.Job: The url to track the job: http://centos66-bigdata-hadoop.com:8088/proxy/application_1473864360962_0001/
16/09/14 08:00:00 INFO mapreduce.Job: Running job: job_1473864360962_0001
16/09/14 08:00:30 INFO mapreduce.Job: Job job_1473864360962_0001 running in uber mode : false
16/09/14 08:00:30 INFO mapreduce.Job:  map 0% reduce 0%
16/09/14 08:01:19 INFO mapreduce.Job:  map 100% reduce 0%
16/09/14 08:01:47 INFO mapreduce.Job:  map 100% reduce 100%
16/09/14 08:01:49 INFO mapreduce.Job: Job job_1473864360962_0001 completed successfully
16/09/14 08:01:54 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=96
        FILE: Number of bytes written=194473
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=226
        HDFS: Number of bytes written=66
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=48483
        Total time spent by all reduces in occupied slots (ms)=21661
        Total time spent by all map tasks (ms)=48483
        Total time spent by all reduce tasks (ms)=21661
        Total vcore-seconds taken by all map tasks=48483
        Total vcore-seconds taken by all reduce tasks=21661
        Total megabyte-seconds taken by all map tasks=49646592
        Total megabyte-seconds taken by all reduce tasks=22180864
    Map-Reduce Framework
        Map input records=5
        Map output records=10
        Map output bytes=125
        Map output materialized bytes=96
        Input split bytes=141
        Combine input records=10
        Combine output records=6
        Reduce input groups=6
        Reduce shuffle bytes=96
        Reduce input records=6
        Reduce output records=6
        Spilled Records=12
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=293
        CPU time spent (ms)=2970
        Physical memory (bytes) snapshot=313458688
        Virtual memory (bytes) snapshot=1680084992
        Total committed heap usage (bytes)=136450048
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=85
    File Output Format Counters
        Bytes Written=66
[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ bin/hdfs dfs -ls /user/liuwl/tmp/output
16/09/14 08:02:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   1 liuwl supergroup          0 2016-09-14 08:01 /user/liuwl/tmp/output/_SUCCESS
-rw-r--r--   1 liuwl supergroup         66 2016-09-14 08:01 /user/liuwl/tmp/output/part-r-00000
[liuwl@centos66-bigdata-hadoop hadoop-2.5.0]$ bin/hdfs dfs -text /user/liuwl/tmp/output/part*
16/09/14 08:02:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
datanode    2
hadoop  2
historyserver   1
mapreduce   2
namenode    1
yarn    2
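The wordcount logic itself is easy to mirror in plain Python. The sketch below is not Hadoop code, just an illustration of the map, shuffle, and reduce phases; it reproduces the job's result on the same five input records:

```python
from collections import Counter

# The same five input records that were uploaded to HDFS above.
records = [
    "hadoop mapreduce",
    "yarn historyserver hadoop",
    "mapreduce yarn",
    "namenode datanode",
    "datanode",
]

# Map phase: emit a (word, 1) pair for every word.
# This yields 10 pairs, matching "Map output records=10" in the counters.
mapped = [(word, 1) for line in records for word in line.split()]

# Shuffle + reduce phase: group pairs by word and sum the counts.
counts = Counter()
for word, one in mapped:
    counts[word] += one

for word in sorted(counts):
    print(word, counts[word])
# datanode 2
# hadoop 2
# historyserver 1
# mapreduce 2
# namenode 1
# yarn 2
```

The six distinct words also line up with "Reduce input groups=6" and "Reduce output records=6" in the job counters above.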

  5. A brief overview of Hadoop's four core components

1> Hadoop Common: Hadoop's shared classes, methods, and utilities.
2> Hadoop Distributed File System (HDFS): Hadoop's distributed filesystem.
    Architecture: master/slave, with a clear division of labor (the NameNode stores metadata about the slave nodes; the DataNodes store the actual data).
    Reliability:
        Block replication (configurable replica count; lost blocks are automatically re-replicated on nearby nodes; replicas are verified periodically).
        The SecondaryNameNode periodically merges the edit log with the filesystem image.
    Scalability: machines can be added to the cluster at any time on top of the existing ones.
    How it works: when a client writes a file, it contacts the NameNode, which holds information about all DataNodes and existing files and allocates blocks for the client to write; when a client reads a file, the NameNode quickly locates it from the file metadata and returns the nearest replica.
3> Hadoop YARN: Hadoop's unified resource management and job scheduling framework.
    Architecture: master/slave (ResourceManager and NodeManager).
    Personally, I think of YARN like the Spring framework in Java EE: it serves as a container.
    YARN workflow: a client submits a job; the ApplicationsManager inside the ResourceManager has a NodeManager launch an ApplicationMaster for the job, which manages it and reports back. The ApplicationMaster tells the ApplicationsManager all the resources the job needs to run normally, including CPU and memory, and the ApplicationsManager hands back containers for the job to run in. Other jobs cannot grab the resources inside a container, which provides good isolation. When the job finishes, its status is sent back to the ApplicationMaster, which notifies the ApplicationsManager of the outcome, records the job's history file, and releases the resources.
4> Hadoop MapReduce: MapReduce is a job execution framework; each map task starts its own JVM, and when MapReduce runs on YARN, each task reports its status to the ApplicationMaster over RPC.
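The YARN submission flow described above can be sketched as a toy simulation. This is pure illustration under simplifying assumptions (single resource request per job, no real ApplicationMaster process); the class names mirror the YARN roles, not any real Hadoop API:

```python
class NodeManager:
    """Worker daemon: hosts containers on one machine."""
    def __init__(self, name, vcores, memory_mb):
        self.name, self.vcores, self.memory_mb = name, vcores, memory_mb

    def can_host(self, vcores, memory_mb):
        return self.vcores >= vcores and self.memory_mb >= memory_mb

    def allocate(self, vcores, memory_mb):
        # Resources inside a granted container are reserved:
        # other jobs cannot take them, which is the isolation property.
        self.vcores -= vcores
        self.memory_mb -= memory_mb
        return {"node": self.name, "vcores": vcores, "memory_mb": memory_mb}


class ResourceManager:
    """Master daemon: grants containers on behalf of the ApplicationsManager."""
    def __init__(self, nodes):
        self.nodes = nodes

    def submit(self, job_name, vcores, memory_mb):
        # 1. client submits a job; 2. an ApplicationMaster is launched;
        # 3. the AM requests resources; 4. the RM grants a container.
        for node in self.nodes:
            if node.can_host(vcores, memory_mb):
                return node.allocate(vcores, memory_mb)
        raise RuntimeError("no resources available for " + job_name)


rm = ResourceManager([NodeManager("nm1", vcores=4, memory_mb=4096)])
container = rm.submit("wordcount", vcores=1, memory_mb=1024)
print(container)  # {'node': 'nm1', 'vcores': 1, 'memory_mb': 1024}
```

After the grant, the NodeManager's free capacity drops accordingly, mirroring how the real scheduler tracks remaining cluster resources.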

Reposted from: https://www.cnblogs.com/eRrsr/p/5923544.html
