Commit ea6af4c6 authored by Fahad Ashraf

Initial commit

# Elasticsearch Test Task
This is a test task for the collaborative intelligence project, WS 2020/2021. The task consists of the following steps:
1. Set up Elasticsearch on a Linux machine
2. Create an index and add a mapping via curl
3. Add the "waterLevelReports.csv" statistics as documents to the index
4. Execute a curl request to get an aggregation of one of the fields in the documents
## Installation
### Elasticsearch
```bash
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-x86_64.rpm
sudo rpm -i elasticsearch-7.9.2-x86_64.rpm
sudo service elasticsearch start
```
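After `service elasticsearch start` returns, the node may still need a few seconds before it answers HTTP requests. The following is a minimal sketch of a reachability check, assuming an unsecured node on the default address `http://localhost:9200`. The helper name `elasticsearch_version` is hypothetical and not part of this repo.

```python
import json
import urllib.error
import urllib.request


def elasticsearch_version(base_url="http://localhost:9200", timeout=5):
    """Return the node's version string, or None if it is not reachable.

    Hypothetical helper; assumes an unsecured node at base_url.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            info = json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON answer.
        return None
    if not isinstance(info, dict):
        return None
    # The root endpoint reports the version under "version" -> "number".
    return info.get("version", {}).get("number")
```

With the package installed as above, `elasticsearch_version()` should return `"7.9.2"` once the node is up, and `None` while it is still starting or not running.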
## Usage
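The `main.py` Python script contains functions for creating the index mapping, indexing the documents, running the aggregation, and deleting the index. As committed, only `index_documents()` is enabled at the bottom of the script; uncomment the other calls as needed.
Run the following command in a terminal from the root of this repo: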
```bash
python main.py
```
## Curl requests and responses
Below are the curl requests used for each step, together with their responses.
### Creating index mapping
```bash
curl -X PUT "localhost:9200/water-levels" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "Messst-Nr": { "type": "long" },
      "Messstelle": { "type": "text" },
      "Datum": { "type": "text" },
      "Probeart": { "type": "keyword" },
      "Parameter-Nr": { "type": "text" },
      "Parameter": { "type": "text" },
      "Einheit": { "type": "text" },
      "Status": { "type": "keyword" },
      "Wert": { "type": "long" }
    }
  }
}'
```
Response:
```json
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "water-levels"
}
```
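One thing to watch in the mapping above: `Wert` is mapped as `long`, while the CSV holds decimal readings such as `8.91`. With Elasticsearch's default coercion, a fractional value sent to an integer-typed field is truncated, so the aggregation would operate on whole numbers. The following is a sketch of an alternative request body, assuming `double` is acceptable for `Wert` in this task. The function name `build_mapping` is illustrative and not part of this repo.

```python
import json


def build_mapping(value_type="double"):
    """Build the index-creation body; only the type of "Wert" differs
    from the mapping shown in this README (assumption: double is wanted)."""
    return {
        "mappings": {
            "properties": {
                "Messst-Nr": {"type": "long"},
                "Messstelle": {"type": "text"},
                "Datum": {"type": "text"},
                "Probeart": {"type": "keyword"},
                "Parameter-Nr": {"type": "text"},
                "Parameter": {"type": "text"},
                "Einheit": {"type": "text"},
                "Status": {"type": "keyword"},
                "Wert": {"type": value_type},
            }
        }
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into the curl request.
    print(json.dumps(build_mapping(), indent=2))
```

Because the type of an existing field cannot be changed in place, the index would need to be deleted and recreated before the documents are indexed again.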
### Indexing a document
```bash
curl -X PUT "localhost:9200/water-levels/_doc/1" -H 'Content-Type: application/json' -d'
{
  "Messst-Nr": "2546188600",
  "Messstelle": "3039 Kaiserslautern, Erfenbach",
  "Datum": "13.11.1978",
  "Probeart": "Messung - GW-Stände-/Quellschüttungen",
  "Parameter-Nr": "09030/00",
  "Parameter": "Wasserstand unter Messpunkt",
  "Einheit": "m",
  "Status": "",
  "Wert": "8.91"
}'
```
Response:
```json
{
  "_index": "water-levels",
  "_type": "_doc",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
```
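Sending one request per document works for a couple of thousand rows but costs one HTTP round trip each. Elasticsearch also offers a `_bulk` endpoint that accepts newline-delimited JSON: an action line followed by the document source, with the body ending in a newline. The following is a sketch of building such a body, assuming the rows are already dictionaries with a `Wert` string. `build_bulk_body` is a hypothetical helper, not something in this repo.

```python
import json


def build_bulk_body(rows, index="water-levels", start_id=1):
    """Return an NDJSON string suitable for POST /_bulk."""
    lines = []
    for doc_id, row in enumerate(rows, start=start_id):
        doc = dict(row)  # copy so the caller's row is not modified
        # Same normalisation as in main.py: decimal comma -> decimal point.
        doc["Wert"] = doc["Wert"].replace(",", ".")
        # Action line: index this document under the given id.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        # Source line: the document itself, keeping umlauts readable.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string could then be sent with `requests.post("http://localhost:9200/_bulk", data=body.encode("utf-8"), headers={"Content-Type": "application/x-ndjson"})`.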
### Get an aggregation
```bash
curl -X GET "localhost:9200/water-levels/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "wert_avg": {
      "avg": {
        "field": "Wert"
      }
    }
  }
}'
```
Response:
```json
{
  "took": 883,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2136,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "wert_avg": {
      "value": 7.79696394686907
    }
  }
}
```
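The average sits under `aggregations.<name>.value` in the response, where `<name>` is the key chosen in the request (`wert_avg` here). The following is a small sketch of pulling it out of the decoded JSON, using a trimmed copy of the response above as sample input. `metric_value` is an illustrative name, not part of this repo.

```python
import json

# Trimmed copy of the response shown above, used only as sample input.
SAMPLE_RESPONSE = """
{
  "took": 883,
  "timed_out": false,
  "hits": {"total": {"value": 2136, "relation": "eq"}, "max_score": null, "hits": []},
  "aggregations": {"wert_avg": {"value": 7.79696394686907}}
}
"""


def metric_value(response, name):
    """Return the value of a single-value metric aggregation, or None if absent."""
    return response.get("aggregations", {}).get(name, {}).get("value")


if __name__ == "__main__":
    parsed = json.loads(SAMPLE_RESPONSE)
    print(metric_value(parsed, "wert_avg"))
```

Returning `None` for a missing aggregation avoids a `KeyError` when the request body and the lookup key drift apart.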
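"""Helper script for the Elasticsearch test task: create the index mapping,
index the CSV rows, run an aggregation, and delete the index."""
import csv

import requests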


def create_index_mappings():
    """Create the water-levels index with an explicit field mapping."""
    print("creating index and mappings...")
    r = requests.put(
        "http://localhost:9200/water-levels",
        json={
            "mappings": {
                "properties": {
                    "Messst-Nr": {"type": "long"},
                    "Messstelle": {"type": "text"},
                    "Datum": {"type": "text"},
                    "Probeart": {"type": "keyword"},
                    "Parameter-Nr": {"type": "text"},
                    "Parameter": {"type": "text"},
                    "Einheit": {"type": "text"},
                    "Status": {"type": "keyword"},
                    # NOTE: "long" is an integer type, so decimal readings such
                    # as 8.91 may be truncated under default coercion.
                    "Wert": {"type": "long"},
                }
            }
        },
    )
    print(r.text)


def delete_index():
    """Delete the water-levels index and all documents in it."""
    print("deleting index...")
    r = requests.delete("http://localhost:9200/water-levels")
    print(r.text)


def index_documents():
    """Index every row of waterLevelReports.csv as one document."""
    print("indexing documents...")
    with open("waterLevelReports.csv", mode="r", newline="") as csv_file:
        # DictReader consumes the header row itself to obtain the field names,
        # so no extra next() call is needed (it would drop the first data row).
        csv_reader = csv.DictReader(csv_file)
        indexed = 0
        for doc_id, row in enumerate(csv_reader, start=1):
            # formatting to valid decimal value by replacing ',' with '.'
            row["Wert"] = row["Wert"].replace(",", ".")
            r = requests.put(f"http://localhost:9200/water-levels/_doc/{doc_id}", json=row)
            print(r.text)
            indexed = doc_id
    print(f"Indexed {indexed} documents.")


def get_aggregation():
    """Request the average of the "Wert" field across all documents."""
    r = requests.get(
        "http://localhost:9200/water-levels/_search",
        json={
            "size": 0,  # return only the aggregation, no hits
            "aggs": {"wert_avg": {"avg": {"field": "Wert"}}},
        },
    )
    print(r.text)
    print(f"Average Wert is: {r.json()['aggregations']['wert_avg']['value']}")


if __name__ == "__main__":
    # Uncomment the steps you want to run.
    # create_index_mappings()
    index_documents()
    # get_aggregation()
    # delete_index()