Installation¶
Bootstrap a default version of Amundsen using Docker¶
The following instructions are for setting up a version of Amundsen using Docker.
- Make sure you have at least 3GB available to Docker. Install docker and docker-compose.
- Clone this repo and its submodules by running:
  $ git clone --recursive git@github.com:lyft/amundsen.git
- Enter the cloned directory and run:
  # For Neo4j Backend
  $ docker-compose -f docker-amundsen.yml up
  # For Atlas
  $ docker-compose -f docker-amundsen-atlas.yml up
- Ingest dummy data into Neo4j by doing the following (skip this step if you are using the Atlas backend):
  - Change directory to the amundsendatabuilder submodule.
  - Run the following commands in the amundsendatabuilder upstream directory:
    $ python3 -m venv venv
    $ source venv/bin/activate
    $ pip3 install -r requirements.txt
    $ python3 setup.py install
    $ python3 example/scripts/sample_data_loader.py
- View the UI at http://localhost:5000 and try searching for test; it should return some results.
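Before starting the containers, it can help to confirm how much memory Docker actually has. The following is a minimal sketch, assuming the 3GB requirement refers to memory and that `docker info --format '{{.MemTotal}}'` reports the daemon's total memory in bytes. The helper name `has_enough_memory` is made up for illustration and is not part of Amundsen.

```shell
# Sketch: check whether Docker has at least 3GB of memory.
# Assumption: the 3GB requirement refers to memory, not disk space.

# Minimum memory in bytes (3 * 1024^3).
MIN_DOCKER_MEM_BYTES=$((3 * 1024 * 1024 * 1024))

# has_enough_memory BYTES
# Hypothetical helper: succeeds if BYTES is at least the 3GB minimum.
has_enough_memory() {
  [ "$1" -ge "$MIN_DOCKER_MEM_BYTES" ]
}

# Usage (requires a running Docker daemon):
#   mem="$(docker info --format '{{.MemTotal}}')"
#   if has_enough_memory "$mem"; then
#     echo "Docker memory looks sufficient"
#   else
#     echo "Docker has less than 3GB of memory"
#   fi
```

On Docker Desktop the reported value reflects the memory limit set in the application preferences, so raising that limit is the likely fix if the check fails.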
Atlas Note: Atlas takes some time to boot properly, so you may not see results immediately after running the docker-compose up command. Atlas is ready once the following line appears in the Docker output:
Amundsen Entity Definitions Created...
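Waiting for that log line can be scripted. The sketch below assumes the readiness message is written to the docker-compose logs exactly as quoted above; `atlas_ready` is a hypothetical helper name, and the 10-second polling interval in the usage comment is an arbitrary choice.

```shell
# atlas_ready: reads log lines on stdin and succeeds if the
# readiness marker quoted in the note above appears in them.
atlas_ready() {
  # -F treats the marker as a fixed string; -q suppresses output.
  grep -q -F "Amundsen Entity Definitions Created"
}

# Usage (run from the cloned directory while the stack is up):
#   until docker-compose -f docker-amundsen-atlas.yml logs --no-color | atlas_ready; do
#     echo "Atlas is still booting..."
#     sleep 10
#   done
#   echo "Atlas is ready"
```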
Verify setup¶
- You can verify that dummy data has been ingested into Neo4j by visiting http://localhost:7474/browser/ and running MATCH (n:Table) RETURN n LIMIT 25 in the query box. You should see two tables:
  - hive.test_schema.test_table1
  - dynamo.test_schema.test_table2
- You can verify that the data has been loaded into the metadataservice by visiting:
  - http://localhost:5000/table_detail/gold/hive/test_schema/test_table1
  - http://localhost:5000/table_detail/gold/dynamo/test_schema/test_table2
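The two table detail pages can also be requested from the command line. This is a rough sketch: the helper names are hypothetical, and the URL pattern (with a fixed `gold` segment) is inferred only from the two sample URLs above. An HTTP 200 shows that the page is served; confirming that the table metadata is actually displayed still requires opening the page in a browser.

```shell
# Base URL of the UI, as given in the steps above.
AMUNDSEN_UI="http://localhost:5000"

# table_detail_url DATABASE SCHEMA TABLE
# Hypothetical helper: builds a table detail URL following the
# pattern of the two sample URLs (the "gold" segment is kept fixed).
table_detail_url() {
  printf '%s/table_detail/gold/%s/%s/%s\n' "$AMUNDSEN_UI" "$1" "$2" "$3"
}

# check_sample_tables: requests both sample pages and prints the
# HTTP status code next to each URL.
check_sample_tables() {
  for key in "hive test_schema test_table1" "dynamo test_schema test_table2"; do
    # Intentional word splitting: $key becomes database, schema, table.
    set -- $key
    url="$(table_detail_url "$1" "$2" "$3")"
    status="$(curl -s -o /dev/null -w '%{http_code}' "$url")"
    echo "$status $url"
  done
}

# Usage (with the stack running and sample data loaded):
#   check_sample_tables
```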
Troubleshooting¶
- If the Docker container doesn't have enough heap memory for Elasticsearch, es_amundsen will fail during docker-compose.
  - docker-compose error:
    es_amundsen | [1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
  - Increase the heap memory (detailed instructions here):
    - Edit /etc/sysctl.conf
    - Make entry vm.max_map_count=262144. Save and exit.
    - Reload settings: $ sysctl -p
    - Restart docker-compose
- If docker-amundsen-local.yml stops because of org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: Failed to create node environment, then es_amundsen cannot write to .local/elasticsearch.
  - chown -R 1000:1000 .local/elasticsearch
  - Restart docker-compose
- If you receive a connection error related to Elasticsearch when running the sample data loader, or an error like this one for Neo4j:
    Traceback (most recent call last):
      File "/home/ubuntu/amundsen/amundsendatabuilder/venv/lib/python3.6/site-packages/neobolt/direct.py", line 831, in _connect
        s.connect(resolved_address)
    ConnectionRefusedError: [Errno 111] Connection refused
  then check whether all 5 Amundsen-related containers are running with docker ps. Can you connect to the Neo4j UI at http://localhost:7474/browser/ and, similarly, to the raw ES API at http://localhost:9200? Do the Docker logs reveal any serious issues?
- If the Elasticsearch container stops with the error max file descriptors [4096] for elasticsearch process is too low, increase to at least [65535], then add the following to the elasticsearch definition in docker-amundsen-local.yml:
    ulimits:
      nofile:
        soft: 65535
        hard: 65535
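The checks suggested in this section (the vm.max_map_count value, the number of running containers, and whether Neo4j and Elasticsearch answer on their ports) can be sketched as a small script. All function names here are hypothetical. The container count assumes that every Amundsen container name contains "amundsen", as es_amundsen does, and `sysctl -n` assumes a Linux host.

```shell
# Expected number of Amundsen-related containers, per the text above.
EXPECTED_CONTAINERS=5
# Minimum vm.max_map_count requested in the Elasticsearch error message.
MIN_MAP_COUNT=262144

# map_count_ok VALUE: succeeds if VALUE meets the Elasticsearch minimum.
map_count_ok() {
  [ "$1" -ge "$MIN_MAP_COUNT" ]
}

# count_amundsen_containers: reads container names on stdin, one per
# line, and prints how many contain "amundsen" (case-insensitive).
# Assumption: all Amundsen container names contain "amundsen".
count_amundsen_containers() {
  # grep -c exits non-zero when the count is 0, so mask that status.
  grep -c -i "amundsen" || true
}

# run_checks: prints a short report; needs Docker, sysctl, and curl.
run_checks() {
  if map_count_ok "$(sysctl -n vm.max_map_count)"; then
    echo "vm.max_map_count: ok"
  else
    echo "vm.max_map_count: below $MIN_MAP_COUNT"
  fi

  running="$(docker ps --format '{{.Names}}' | count_amundsen_containers)"
  echo "Amundsen containers running: $running of $EXPECTED_CONTAINERS"

  for url in http://localhost:7474/browser/ http://localhost:9200; do
    if curl -s -f -o /dev/null "$url"; then
      echo "reachable: $url"
    else
      echo "unreachable: $url"
    fi
  done
}

# Usage:
#   run_checks
```

If a container is missing from the count, its docker-compose logs are the next place to look.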