Hadoop image is not required. But it can speed things up, because it contains pre-downloaded and pre-installed Hadoop packages.
Primary goal of this project is to build Hadoop cluster. But the most part is generic - Hadoop deployment can be replaced by implementing different deployment type (providing user-data and optionally implementing plugin for orchestration).
# Requirements
Locally installed:
*[Terraform](https://www.terraform.io/)
*[Ansible](https://www.ansible.com/)
# Image
# Hadoop image
To setup Hadoop on single machine, launch:
/usr/local/sbin/hadoop-setup.sh
Launch */usr/local/sbin/hadoop-setup.sh* to setup Hadoop on single machine.
Hadoop image can be used also to build Hadoop cluster. It contains pre-downloaded and pre-installed Hadoop packages and dependencies, so this will speed things up.
# Cluster
...
...
@@ -22,3 +28,11 @@ Build cluster:
2. launch setup script
./launch.sh
The *launch.sh* script is doing this:
./terraform apply
./terraform output -json > config.json
./orchestrate.py
The orchestration script has multiple steps and dry-run option. See *./orchestrate.py --help*.