ElasticSearch Guide for Beginners

We all use Google and other search engines on a daily basis. You might have noticed that every search returns multiple results, listed in order of decreasing relevance. Have you wondered how the search engine works out relevance? How does one article gain more popularity than the others?

Questions like these can be explored with search and analytics tools. Companies use such tools to promote websites through Search Engine Marketing (SEM) and Search Engine Optimisation (SEO). One such tool that is easy for beginners to use is ElasticSearch.

ElasticSearch is a widely used search and analytics tool. It is written in Java, built on top of Apache Lucene, and stores data as JSON documents, much like a NoSQL document database. It is simple to use and requires minimal coding knowledge. In this article, we will explore the basic must-know aspects of ElasticSearch to get you started.

Without further ado, let’s jump right in!

Download ElasticSearch for your system

You can directly download ElasticSearch from here. This version is a standalone distribution. Download the latest version available for your OS.
Once the download finishes, unzip the compressed file and go to the bin folder inside it.
On Windows, you will notice an “elasticsearch.bat” file. Run it directly or from the cmd interface. On Linux or macOS, run the “elasticsearch” script in the same folder instead.
Once ElasticSearch has started, access it on the local host server by running the following in the cmd interface.
- curl http://localhost:9200
This is the default URL. It responds only after ElasticSearch has started.
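
If you would rather check the node from a script than from curl, the same request can be sketched in Python with only the standard library. This is a minimal sketch, assuming the default URL from the step above; the helper names build_root_request and fetch_json are made up for this article.

```python
import json
import urllib.request

# Assumption: a local node on the default port, as in the curl command above.
BASE_URL = "http://localhost:9200"


def build_root_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request for the node's root endpoint."""
    return urllib.request.Request(base_url + "/", method="GET")


def fetch_json(request: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON reply; needs a running node."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_root_request()
print(request.get_method(), request.full_url)
```

With ElasticSearch running, calling fetch_json(request) should return the same node information that curl prints.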

Basic ElasticSearch Commands

ElasticSearch provides REST API-based CRUD operations. We can use HTTP methods in ElasticSearch to execute commands.

HTTP Method	Usage
GET	To read the data from ElasticSearch
POST	Update or create data
DELETE	Delete data from ElasticSearch

ElasticSearch CRUD Operations

POST

To create a new document, use an HTTP “POST” request method.
Our node and port number: http://localhost:9200
Index name: pythonistaplanet
Type name: posts
Set the request body type to JSON, or add the request header “Content-Type”: “application/json”.
Click on the “SEND” button to see the response.
You will notice the following key-value pairs in the response data.
- “_index”: “pythonistaplanet”
- “_type”: “posts”
- “result”: “created”
- “created”: true
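
The create request described above can also be put together in code. The sketch below only builds the request, so it runs without a server. It assumes the index and type names used in this article (type names apply to older ElasticSearch versions), and the document fields are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # node and port from the steps above
INDEX = "pythonistaplanet"
DOC_TYPE = "posts"  # type names apply to older versions only


def build_create_request(document: dict) -> urllib.request.Request:
    """Build a POST request that asks ElasticSearch to create a document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{INDEX}/{DOC_TYPE}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document; the field names are illustrative only.
create_request = build_create_request({"title": "ElasticSearch basics", "views": 0})
print(create_request.get_method(), create_request.full_url)
```

To send it to a running node, pass create_request to urllib.request.urlopen and read the JSON reply.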

GET

To read a document, use an HTTP “GET” request method.
Our node and port number: http://localhost:9200
Index name: pythonistaplanet
Type name: posts
Document ID: append the “_id” value of the document you want to read to the URL.
A GET request does not need a body. If your client sends a “Content-Type” header, set it to “application/json”.
You will notice the following key-value pairs in the response data.
- “_index”: “pythonistaplanet”
- “_type”: “posts”
- “found”: true
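
Here is a small sketch of the read request. It assumes the document has the ID "1", which is a hypothetical value, and it keeps the type name in the URL as older versions expect (versions 7 and later use _doc in that position).

```python
import urllib.request

BASE_URL = "http://localhost:9200"
INDEX = "pythonistaplanet"
DOC_TYPE = "posts"  # older versions; versions 7 and later use "_doc" here


def build_get_request(doc_id: str) -> urllib.request.Request:
    """Build a GET request that reads one document by its ID."""
    return urllib.request.Request(
        f"{BASE_URL}/{INDEX}/{DOC_TYPE}/{doc_id}", method="GET"
    )


def is_found(response_body: dict) -> bool:
    """Check the "found" flag of an already decoded GET response."""
    return bool(response_body.get("found", False))


get_request = build_get_request("1")  # "1" is a hypothetical document ID
print(get_request.get_method(), get_request.full_url)
```

The is_found helper is meant for the decoded JSON reply, where the “found” key tells you whether the document exists.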

UPDATE

To update a document, use an HTTP “POST” request method.
You can send a partial document, which is then merged into the existing document.
Only a single document can be updated per request.
Our node and port number: http://localhost:9200
Index name: pythonistaplanet
Type name: posts
You will notice the following key-value pairs in the response data.
- “_index”: “pythonistaplanet”
- “_type”: “posts”
- “result”: “updated”
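
A partial update can be sketched like this. The changed fields go under a "doc" key, and the URL ends in _update. The sketch assumes the typed URL layout of older versions (versions 7 and later use /pythonistaplanet/_update/ followed by the ID); the ID and field are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"
INDEX = "pythonistaplanet"
DOC_TYPE = "posts"  # typed URL layout of older versions


def build_update_request(doc_id: str, partial_document: dict) -> urllib.request.Request:
    """Build a POST request that merges partial_document into one document."""
    body = json.dumps({"doc": partial_document}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{INDEX}/{DOC_TYPE}/{doc_id}/_update",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical ID and field: only "views" changes, other fields are kept.
update_request = build_update_request("1", {"views": 10})
print(update_request.get_method(), update_request.full_url)
```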

DELETE

To delete an index or document, use an HTTP “DELETE” request method.
The document stored in the index is deleted.
Our node and port number: http://localhost:9200
Index name: pythonistaplanet
Type name: posts
You will notice the following key-value pairs in the response data.
- “_index”: “pythonistaplanet”
- “_type”: “posts”
- “result”: “deleted”
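
The two kinds of delete mentioned above differ only in the URL. This sketch builds both requests without sending them; the document ID is hypothetical and the typed URL again assumes an older version.

```python
import urllib.request

BASE_URL = "http://localhost:9200"
INDEX = "pythonistaplanet"
DOC_TYPE = "posts"  # typed URL layout of older versions


def build_delete_document_request(doc_id: str) -> urllib.request.Request:
    """Build a DELETE request for a single document."""
    return urllib.request.Request(
        f"{BASE_URL}/{INDEX}/{DOC_TYPE}/{doc_id}", method="DELETE"
    )


def build_delete_index_request() -> urllib.request.Request:
    """Build a DELETE request for the whole index (all of its documents)."""
    return urllib.request.Request(f"{BASE_URL}/{INDEX}", method="DELETE")


delete_document = build_delete_document_request("1")  # hypothetical ID
delete_index = build_delete_index_request()
print(delete_document.get_method(), delete_document.full_url)
print(delete_index.get_method(), delete_index.full_url)
```

Be careful with the second request on real data: deleting an index removes every document in it.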

ElasticSearch Data Types

Basic Data Types

Some of the core data types supported by the system are short, integer, long, float, double, string, byte, boolean, and date. (From version 5 onward, string is replaced by text and keyword.)

User Defined Data Types

These are complex data types built as collections of core data types: JSON objects, arrays, and nested data types.

Geo Data Types – store geographical details, for example geo_point and geo_shape.
Specialized Data Types – hold special-purpose information such as token_count, IP addresses, and auto-complete suggestions.
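
To make the type names a little more concrete, here is a sketch that pairs a few of them with made-up fields and a matching sample document. The type names (integer, float, boolean, date, nested, geo_point, ip) are real ElasticSearch types, while every field name and value below is hypothetical.

```python
import json

# Hypothetical field names mapped to ElasticSearch type names.
FIELD_TYPES = {
    "views": "integer",
    "rating": "float",
    "published": "boolean",
    "created_at": "date",
    "comments": "nested",
    "location": "geo_point",
    "client_ip": "ip",
}

# A made-up document whose JSON values fit the types above.
SAMPLE_DOCUMENT = {
    "views": 42,
    "rating": 4.5,
    "published": True,
    "created_at": "2021-01-15",
    "comments": [{"user": "reader", "text": "Nice post"}],
    "location": {"lat": 12.97, "lon": 77.59},
    "client_ip": "192.168.0.1",
}


def to_properties(field_types: dict) -> dict:
    """Turn {"field": "type"} into the {"field": {"type": ...}} shape used in mappings."""
    return {name: {"type": type_name} for name, type_name in field_types.items()}


properties = to_properties(FIELD_TYPES)
print(json.dumps(properties["location"]))
```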

Key Concepts of ElasticSearch

Node

A node is a single running instance of ElasticSearch. It is created when ElasticSearch starts, and it can store data. For administration purposes, it is referred to by its name.

Cluster

A cluster is a group of one or more connected nodes that work together to hold data. The nodes offer joint indexing and searching capabilities and distribute tasks among themselves. Like a node, a cluster is referred to by its name.
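
You can ask a cluster about itself through the _cluster/health endpoint. The sketch below builds that request and summarizes a reply. The sample reply is hand-written for illustration (it is not real output), although cluster_name, status, and number_of_nodes are keys that the health response contains.

```python
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node, default port


def build_health_request() -> urllib.request.Request:
    """Build a GET request for the cluster health endpoint."""
    return urllib.request.Request(f"{BASE_URL}/_cluster/health", method="GET")


def summarize_health(health: dict) -> str:
    """Summarize the cluster name, status, and node count in one line."""
    return (
        f"{health['cluster_name']}: {health['status']} "
        f"({health['number_of_nodes']} node(s))"
    )


health_request = build_health_request()
print(health_request.get_method(), health_request.full_url)

# Hand-written sample reply, for illustration only.
sample_health = {"cluster_name": "elasticsearch", "status": "green", "number_of_nodes": 1}
print(summarize_health(sample_health))
```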

Document

A document is the basic unit of information: a collection of fields expressed in JSON format. It is similar to a row in a relational database, which represents a particular entity. It can be any structured data encoded in JSON, and it need not necessarily be text. Every document has a unique ID within its index and is associated with a type.

Index

An index is a collection of documents, possibly of different types, on which we can perform basic CRUD operations as well as searches. It is the highest-level entity you can query in ElasticSearch. Documents in an index are typically related logically. An index resembles a database in a relational database system.

Shard

Sharding allows us to subdivide an index so that its capacity can grow beyond a single server. The subdivided pieces are called shards, and they are distributed across multiple nodes. Because of this, query capacity increases as nodes are added, and together with replicas it helps protect against hardware failures.
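
The idea of spreading documents over shards can be sketched as "hash the routing key, then take the remainder by the number of primary shards". ElasticSearch uses its own hash function and, by default, the document ID as the routing key. The CRC32 hash below is only a stand-in chosen for this sketch, so the shard numbers it prints will not match a real cluster.

```python
import zlib
from collections import Counter


def pick_shard(routing_key: str, number_of_primary_shards: int) -> int:
    """Map a routing key to a shard number (CRC32 is a stand-in hash)."""
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_primary_shards


# Spread 1000 hypothetical document IDs over 5 shards and count them.
distribution = Counter(pick_shard(str(doc_id), 5) for doc_id in range(1000))
for shard, count in sorted(distribution.items()):
    print(f"shard {shard}: {count} documents")
```

Because the shard is derived from the key alone, the same document always lands on the same shard, which is what lets a node find it again later.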

Replicas

An extra copy of a shard is called a replica. ElasticSearch enables users to create replicas of their shards and run queries against them. Replicas keep data available when a node fails, and because search operations can run on replicas in parallel, performance is improved.
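
The number of shards and replicas is chosen in the index settings. Below is a sketch of a create-index request with the number_of_shards and number_of_replicas settings, plus a tiny helper that counts how many shard copies the cluster has to hold. The values 3 and 1 are arbitrary examples.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node, default port


def build_create_index_request(
    index: str, primary_shards: int, replicas: int
) -> urllib.request.Request:
    """Build a PUT request that creates an index with the given settings."""
    body = {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }
    return urllib.request.Request(
        f"{BASE_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Each primary shard exists once, plus once per replica."""
    return primary_shards * (1 + replicas)


index_request = build_create_index_request("pythonistaplanet", 3, 1)
print(index_request.get_method(), index_request.full_url)
print(total_shard_copies(3, 1))
```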

Mapping

Each index has a mapping: a schema definition that describes the fields of its documents and their data types. The mapping is held in the index. It can be defined manually or generated automatically. If you do not define it explicitly, ElasticSearch adds the mapping automatically when documents are indexed.
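
A manual mapping is just a JSON body sent when the index is created. The sketch below builds such a body. Older versions nest the fields under the type name, while versions 7 and later drop the type level, so the helper takes the type as an optional argument. The field names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:9200"  # assumption: local node, default port
INDEX = "pythonistaplanet"


def build_mapping_body(properties: dict, doc_type: Optional[str] = None) -> dict:
    """Wrap field definitions in a "mappings" body, with or without a type level."""
    if doc_type is None:
        return {"mappings": {"properties": properties}}
    return {"mappings": {doc_type: {"properties": properties}}}


def build_mapping_request(body: dict) -> urllib.request.Request:
    """Build a PUT request that creates the index with an explicit mapping."""
    return urllib.request.Request(
        f"{BASE_URL}/{INDEX}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical fields for the "posts" type used in this article.
fields = {"title": {"type": "text"}, "created_at": {"type": "date"}}
mapping_request = build_mapping_request(build_mapping_body(fields, doc_type="posts"))
print(mapping_request.data.decode("utf-8"))
```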

Advantages of ElasticSearch

Developed and written in Java, and hence can run on most platforms.
Full backups can be created easily since ElasticSearch offers a gateway.
Can easily be scaled up in large organizations as it has a distributed document-oriented architecture.
It is open-source.
Supports all document types that can support text rendering.
Has documentation in multiple languages.

Disadvantages of ElasticSearch

It is not as efficient in data storage as Hadoop, MongoDB, and so on.
Performance drops when working with terabytes of data.
Request and response data can be handled in only a limited set of formats, mainly JSON.
It is a little more complicated than box search in terms of enterprise search usage and has a longer learning curve.

Applications of ElasticSearch

Logs generated by disparate systems can be stored and analyzed by ElasticSearch.
Metrics and Analysis – A dashboard built from several system logs and emails can be analyzed to produce actionable insights and help businesses better understand their data.
Textual Search – Since ElasticSearch is document-oriented, a particular phrase (also called pure text) can be searched for in the stored data.
Geo Search – Search queries can be made to extract geo-specific details from the data.
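
The textual and geo searches above both go through the _search endpoint with a JSON query. Here is a sketch of the two query bodies, using the match_phrase and geo_distance queries. The field names, phrase, and coordinates are hypothetical, and the geo query assumes the field is mapped as geo_point.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node, default port
INDEX = "pythonistaplanet"


def phrase_query(field: str, phrase: str) -> dict:
    """Query body that matches documents containing the exact phrase."""
    return {"query": {"match_phrase": {field: phrase}}}


def geo_distance_query(field: str, lat: float, lon: float, distance: str) -> dict:
    """Query body that keeps documents within a distance of a point."""
    geo_filter = {"geo_distance": {"distance": distance, field: {"lat": lat, "lon": lon}}}
    return {"query": {"bool": {"filter": geo_filter}}}


def build_search_request(query_body: dict) -> urllib.request.Request:
    """Build a POST request to the index's _search endpoint."""
    return urllib.request.Request(
        f"{BASE_URL}/{INDEX}/_search",
        data=json.dumps(query_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical field names and values.
text_search = build_search_request(phrase_query("title", "search engine"))
geo_search = build_search_request(geo_distance_query("location", 12.97, 77.59, "10km"))
print(text_search.full_url)
print(geo_search.data.decode("utf-8"))
```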

Conclusion

ElasticSearch is basically a search engine that is fast and scalable because of its underlying architecture. It has a set of tools that can be used for many cases such as processing, analysis, search, and storage. It is also multilingual, document-oriented, schema-less, and open-source.

Hope this article inspires you to venture into the vast and interesting topic of search engine analytics. Thanks for reading.