It’s been a while since my last post, but this one should be useful for anyone who has to install a CDH cluster from scratch.
A few prerequisites must be configured before starting the installation; otherwise it can (and probably will) fail. To take care of some of them I wrote a few small Ansible scripts that automate the parameter changes and the file copies. The scripts are not finished: I plan to add the parts I still haven’t automated, such as creating the yum repositories for installing Cloudera Manager, plus a few other small tasks.
This guide assumes that Cloudera Manager is already installed, together with the database that holds the Cloudera Manager repository and the data of some of the other tools.
The scripts are written in Ansible and consist of three playbooks and an inventory file. The inventory file is in YAML format, with the nodes grouped into master(s), workers (datanodes), the Cloudera Manager server and the gateway. It is defined as follows, and you can save it as inventory.yaml:
all:
  children:
    cm:
      hosts:
        cm_host:
    master:
      hosts:
        master_host:
    workers:
      hosts:
        worker1_host:
        worker2_host:
        worker3_host:
    gateway:
      hosts:
        gateway_host:
Replace each xxx_host placeholder with the fully qualified domain name of the corresponding server.
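Before running anything heavier, it can help to confirm that Ansible reaches every host in the inventory. The playbook below is a minimal sketch and not part of the original scripts; the file name check_connectivity.yaml is hypothetical, and youruser is the same placeholder used in the other playbooks.

```yaml
---
# Hypothetical helper playbook (not part of the original scripts):
# confirms that Ansible can log in to every host listed in inventory.yaml.
- hosts: all
  remote_user: youruser
  gather_facts: no
  tasks:
    - name: Check SSH connectivity and a usable Python interpreter
      ping:
```

If saved as check_connectivity.yaml, it would be run the same way as the other playbooks: ansible-playbook -i inventory.yaml check_connectivity.yaml --ask-pass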
Then comes the prerequisites playbook, which has more substance. Apart from the prerequisites listed on Cloudera’s website, I added some tweaks such as replacing the MySQL JDBC driver, because the one available in yum is outdated and makes the database creation step of the wizard crash. You can save this one as cloudera_prerequisites.yaml:
---
- hosts: all
  connection: ssh
  remote_user: youruser
  become: yes
  become_method: sudo
  become_user: root
  tasks:
    # Firewall, SELinux, IPv6 and swappiness settings
    - service: name=firewalld state=stopped enabled=False
    - selinux: state=disabled
    - sysctl: name=net.ipv6.conf.all.disable_ipv6 value=1 state=present
    - sysctl: name=net.ipv6.conf.default.disable_ipv6 value=1 state=present
    - sysctl: name=vm.swappiness value=1 state=present
    - shell: sysctl -w vm.swappiness=1
    # Push the /etc/hosts of the control machine to every node
    - copy: src=/etc/hosts dest=/etc/hosts owner=root group=root mode=0644
    - yum: name=java-1.8.0-openjdk-devel state=latest
    # Switch off the tuned profile, then stop and disable the service
    - systemd: name=tuned state=started
    - shell: tuned-adm off
    - systemd: name=tuned state=stopped enabled=False
    - name: Disable THP support scripts added to rc.local
      lineinfile:
        path: /etc/rc.local
        line: |
          echo never > /sys/kernel/mm/transparent_hugepage/enabled
          echo never > /sys/kernel/mm/transparent_hugepage/defrag
    - name: Change permissions of /etc/rc.local to make it run on boot
      shell: chmod +x /etc/rc.d/rc.local
      become_method: sudo
    - service: name=ntpd state=started
    - name: Allow 'youruser' user to have passwordless sudo
      lineinfile:
        path: /etc/sudoers
        state: present
        regexp: '^youruser'
        line: 'youruser ALL=(ALL) NOPASSWD: ALL'
        # Refuse to write the file if the result is not valid sudoers syntax
        validate: 'visudo -cf %s'
    # - name: Install mysql jdbc driver
    #   yum:
    #     name: mysql-connector-java
    #     state: latest
    - name: download newer jdbc for mysql to avoid crash
      get_url: url=https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz dest=/tmp/mysql-connector-java-5.1.46.tar.gz
    - name: Check if /tmp/mysql-connector-java-5.1.46.tar.gz exists
      stat:
        path: /tmp/mysql-connector-java-5.1.46.tar.gz
      register: stat_result
    # The archive was downloaded on the remote host, so the unarchive and copy
    # tasks below need remote_src; without it Ansible looks on the control machine.
    - block:
        - name: Extract downloaded jdbc
          unarchive:
            src: /tmp/mysql-connector-java-5.1.46.tar.gz
            dest: /tmp/
            remote_src: yes
        - name: Creates directory for the java driver if it does not exist
          file:
            path: /usr/share/java
            state: directory
            mode: "0755"
            recurse: yes
        - name: Copies the file
          copy:
            src: /tmp/mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar
            dest: /usr/share/java/mysql-connector-java.jar
            remote_src: yes
            owner: root
            group: root
            mode: "0644"
        - name: Copies the file to sqoop as well
          copy:
            src: /tmp/mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar
            dest: /var/lib/sqoop/mysql-connector-java.jar
            remote_src: yes
            owner: sqoop
            group: sqoop
            mode: "0644"
      when: stat_result.stat.exists
The script can be run with the following command: ansible-playbook -i inventory.yaml cloudera_prerequisites.yaml --ask-pass --ask-become-pass
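After the run, a quick check that the kernel settings are really in place can save a failed wizard step later. The following is a minimal sketch of such a check, not part of the original scripts; the task names and the idea of a separate verification playbook are my own assumptions. Note that the transparent hugepage check is only expected to pass after a reboot, since the prerequisites playbook writes the setting to rc.local rather than applying it immediately.

```yaml
---
# Hypothetical verification playbook; file name and task names are assumptions.
- hosts: all
  remote_user: youruser
  become: yes
  tasks:
    - name: Read the running swappiness value
      command: cat /proc/sys/vm/swappiness
      register: swappiness
      changed_when: false

    - name: Read the transparent hugepage setting
      command: cat /sys/kernel/mm/transparent_hugepage/enabled
      register: thp
      changed_when: false

    - name: Fail early if a setting was not applied
      assert:
        that:
          - swappiness.stdout | int == 1
          - "'[never]' in thp.stdout"
        msg: "Kernel settings do not match the prerequisites playbook"
```

Both read tasks set changed_when: false so that the play reports no changes when it only inspects the hosts.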
I’ve also created two additional playbooks. One creates the folders in the mount points that store the HDFS, YARN and Impala data:
---
- hosts: workers
  connection: ssh
  remote_user: youruser
  become: yes
  become_method: sudo
  become_user: root
  tasks:
    - name: Creates directory for the datanodes
      file:
        path: /home/data/dfs/dn
        state: directory
        owner: hdfs
        group: hdfs
        mode: "0700"
        recurse: yes
    - name: Creates directory for the nodemanagers
      file:
        path: /home/data/yarn/nm
        state: directory
        owner: yarn
        group: yarn
        mode: "0700"
        recurse: yes
    - name: Creates directory for the impala daemons
      file:
        path: /home/data/impala/impalad
        state: directory
        owner: impala
        group: impala
        mode: "0700"
        recurse: yes
- hosts: master
  connection: ssh
  remote_user: youruser
  become: yes
  become_method: sudo
  become_user: root
  tasks:
    - name: Creates directory namenode
      file:
        path: /home/data/dfs/nn
        state: directory
        owner: hdfs
        group: hdfs
        mode: "0700"
        recurse: yes
    - name: Creates directory secondary namenode
      file:
        path: /home/data/dfs/snn
        state: directory
        owner: hdfs
        group: hdfs
        mode: "0700"
        recurse: yes
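If your mount points differ from /home/data, the worker play can be written more compactly with a variable and a loop. This is only a sketch under my own assumptions: data_root and worker_dirs are hypothetical variable names, the loop keyword needs Ansible 2.5 or later (with_items is the older equivalent), and recurse is left out because the directories are expected to be empty at this stage. As in the playbook above, the hdfs, yarn and impala users must already exist on the hosts.

```yaml
---
# Sketch only: data_root and worker_dirs are hypothetical variable names.
- hosts: workers
  remote_user: youruser
  become: yes
  vars:
    data_root: /home/data
    worker_dirs:
      - { path: dfs/dn, owner: hdfs }
      - { path: yarn/nm, owner: yarn }
      - { path: impala/impalad, owner: impala }
  tasks:
    - name: Create worker data directories
      file:
        path: "{{ data_root }}/{{ item.path }}"
        state: directory
        owner: "{{ item.owner }}"
        group: "{{ item.owner }}"
        mode: "0700"
      loop: "{{ worker_dirs }}"
```

Changing data_root in one place is then enough to move all three directories to another mount point.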
The other one downloads a JDBC driver (SQL Server in this case) and distributes it to all machines of the cluster; it can be adapted to any downloadable JDBC driver:
- hosts: all
  connection: ssh
  remote_user: youruser
  become: yes
  become_method: sudo
  become_user: root
  tasks:
    - name: download sqlserver jdbc driver
      get_url: url=https://download.microsoft.com/download/4/D/C/4DCD85FA-0041-4D2E-8DD9-833C1873978C/sqljdbc_7.2.2.0_enu.tar.gz dest=/tmp/sqljdbc_7.2.2.0_enu.tar.gz
    - name: Check if /tmp/sqljdbc_7.2.2.0_enu.tar.gz exists
      stat:
        path: /tmp/sqljdbc_7.2.2.0_enu.tar.gz
      register: stat_result
    # The archive lives on the remote host, so unarchive and copy need remote_src
    - block:
        - name: Extract downloaded jdbc
          unarchive:
            src: /tmp/sqljdbc_7.2.2.0_enu.tar.gz
            dest: /tmp/
            remote_src: yes
        - name: Copies the file into the sqoop folder
          copy:
            src: /tmp/sqljdbc_7.2/enu/mssql-jdbc-7.2.2.jre8.jar
            dest: /var/lib/sqoop/mssql-jdbc-7.2.2.jre8.jar
            remote_src: yes
            owner: sqoop
            group: sqoop
            mode: "0644"
      when: stat_result.stat.exists
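To adapt this playbook to another driver, one option is to move the URL and paths into variables so that only the vars section changes. The version below is a sketch under my own assumptions: the jdbc_* variable names are hypothetical, and the separate stat check is dropped because get_url already fails the play when the download does not succeed.

```yaml
---
# Sketch only: the jdbc_* variable names are hypothetical.
- hosts: all
  remote_user: youruser
  become: yes
  vars:
    jdbc_url: https://download.microsoft.com/download/4/D/C/4DCD85FA-0041-4D2E-8DD9-833C1873978C/sqljdbc_7.2.2.0_enu.tar.gz
    jdbc_archive: /tmp/sqljdbc_7.2.2.0_enu.tar.gz
    jdbc_jar: /tmp/sqljdbc_7.2/enu/mssql-jdbc-7.2.2.jre8.jar
    jdbc_dest: /var/lib/sqoop/mssql-jdbc-7.2.2.jre8.jar
  tasks:
    - name: Download the driver archive
      get_url:
        url: "{{ jdbc_url }}"
        dest: "{{ jdbc_archive }}"

    - name: Extract the archive on the remote host
      unarchive:
        src: "{{ jdbc_archive }}"
        dest: /tmp/
        remote_src: yes

    - name: Copy the jar into the sqoop folder
      copy:
        src: "{{ jdbc_jar }}"
        dest: "{{ jdbc_dest }}"
        remote_src: yes
        owner: sqoop
        group: sqoop
        mode: "0644"
```

The same variables can also be overridden from the command line with the --extra-vars option of ansible-playbook instead of editing the file.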
Happy Installation 🙂