ElasticSearch Guide for Beginners


We all use Google and other search engines daily. You might have noticed that every search returns many results, ordered from most to least relevant. Have you wondered how a search engine decides what is relevant? How does one article get ranked above the others?

Questions like these can be answered with search and analytics tools. Companies use such tools to promote websites through Search Engine Marketing (SEM) and Search Engine Optimisation (SEO). One such tool that is easy for beginners to use is ElasticSearch.

ElasticSearch is a widely used search and analytics engine. It is written in Java, built on top of the Apache Lucene library, and stores data as JSON documents, much like a NoSQL document database. It is simple to use and requires minimal coding knowledge. In this article, we will explore the basic aspects of ElasticSearch that you need to know to get started.

Without further ado, let’s jump right in!

Download ElasticSearch for your system

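  • You can download ElasticSearch directly from the official Elastic website (elastic.co). The download is a standalone distribution, so choose the latest version available for your OS.
  • Once the download finishes, unzip the archive and open its bin directory.
  • On Windows, you will find an “elasticsearch.bat” file there. Run it directly or from the cmd interface. On Linux and macOS, run bin/elasticsearch instead.
  • Once ElasticSearch has started, check that it is running by sending a request to the local server on its default port, 9200, from the cmd interface:
    • curl http://localhost:9200
  • You can also open the same default URL, http://localhost:9200, in a browser. In both cases, the response is a short JSON document containing the node name, the cluster name, and the version number.
  • From version 8.0 onward, security is enabled by default. The node then expects HTTPS and the password of the built-in elastic user, for example: curl --cacert config/certs/http_ca.crt -u elastic https://localhost:9200 (run from the ElasticSearch folder).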
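If you prefer to script this check, here is a minimal Python sketch of the same request using only the standard library. It assumes a node at http://localhost:9200 with security disabled. With security enabled, you would also need HTTPS and credentials. The function names node_info and summarize are hypothetical helpers and are not part of any ElasticSearch client.

```python
import json
import urllib.request


def node_info(base_url: str = "http://localhost:9200") -> dict:
    """Fetch the root endpoint of a node and return the decoded JSON body."""
    # Raises urllib.error.URLError if no node is listening at base_url.
    with urllib.request.urlopen(base_url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(info: dict) -> str:
    """Turn the root endpoint's response into a one-line status message."""
    return f"{info['name']} (cluster {info['cluster_name']}, version {info['version']['number']})"
```

Running print(summarize(node_info())) against a running node prints the node name, cluster name, and version. If no node is running, urlopen raises URLError.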

Basic ElasticSearch Commands

ElasticSearch exposes a REST API, so you perform CRUD (create, read, update, delete) operations by sending HTTP requests to it. The HTTP method determines the kind of operation:

HTTP Method | Usage
GET | Read data from ElasticSearch
POST | Create or update data
PUT | Create or replace data at a specific path, such as an index or a document ID
DELETE | Delete data from ElasticSearch
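To make the table concrete, here is a minimal Python sketch that builds one request per HTTP method with the standard library's urllib.request, without sending any of them. The build_request helper, the local base URL, and the document ID 1 are assumptions for illustration. The paths use the /<index>/_doc/<id> form of ElasticSearch 7.0 and later.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:9200"  # assumed local, unsecured node


def build_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build an ElasticSearch REST request without sending it."""
    data = None
    headers = {}
    if body is not None:
        # JSON bodies must be declared with a Content-Type header.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# One request per kind of operation (document ID 1 is hypothetical).
read_req = build_request("GET", "/pythonistaplanet/_doc/1")
create_req = build_request("POST", "/pythonistaplanet/_doc", {"title": "Hello"})
replace_req = build_request("PUT", "/pythonistaplanet/_doc/1", {"title": "Hello again"})
delete_req = build_request("DELETE", "/pythonistaplanet/_doc/1")
```

Passing any of these objects to urllib.request.urlopen() would send it. The helper sets Content-Type only when a body is present, which keeps bodiless GET and DELETE requests minimal.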

ElasticSearch CRUD Operations

POST

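  • To create a new document, use an HTTP “POST” request method.
  • Node address (host and port): http://localhost:9200
  • Index name: pythonistaplanet
  • Type name: posts (mapping types were deprecated in version 7.0 and removed in 8.0; newer versions use the _doc endpoint instead, as in http://localhost:9200/pythonistaplanet/_doc)
  • Set the request body type to JSON, or add the request header “Content-Type: application/json”.
  • If you are using a REST client such as Postman, click the “SEND” button to get the response.
  • You will notice the following key-value pairs in the response data:
    • "_index": "pythonistaplanet"
    • "_type": "posts"
    • "result": "created"
    • "created": true (older versions only)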
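The same create request can be sketched in Python. This sketch assumes ElasticSearch 7.0 or later, so the document goes to the typeless /pythonistaplanet/_doc endpoint rather than /pythonistaplanet/posts. The sample post's fields and the helper names are hypothetical.

```python
import json
import urllib.request


def create_document_request(index: str, doc: dict, base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """POST to /<index>/_doc so that ElasticSearch generates the document ID."""
    return urllib.request.Request(
        f"{base_url}/{index}/_doc",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def was_created(response: dict) -> bool:
    """The index API reports "result": "created" for a new document."""
    return response.get("result") == "created"


# Hypothetical sample document.
post = {"title": "ElasticSearch Guide for Beginners", "author": "Ashwin Joy", "views": 0}
req = create_document_request("pythonistaplanet", post)
# To send it: json.load(urllib.request.urlopen(req)), then pass the result to was_created().
```

The URL contains no ID, so ElasticSearch generates one and returns it in the _id field of the response.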

GET

  • To read a document, use an HTTP “GET” request method.
  • Node address (host and port): http://localhost:9200
  • Index name: pythonistaplanet
  • Type name: posts
  • Fetching a single document by its ID needs no request body. The “Content-Type: application/json” header is only required when a request carries a JSON body, such as a search query.
  • You will notice the following key-value pairs in the response data:
    • "_index": "pythonistaplanet"
    • "_type": "posts"
    • "found": true
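Here is a matching Python sketch for reading a document. It again assumes version 7.0 or later (the typeless _doc path) and a hypothetical document ID of 1. The response returns the stored document itself under its _source key.

```python
import urllib.request
from typing import Optional


def get_document_request(index: str, doc_id: str, base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """GET /<index>/_doc/<id> fetches one document; no body is needed."""
    return urllib.request.Request(f"{base_url}/{index}/_doc/{doc_id}", method="GET")


def extract_source(response: dict) -> Optional[dict]:
    """Return the stored document (_source) if it was found, else None."""
    if response.get("found"):
        return response.get("_source")
    return None
```

If the document does not exist, ElasticSearch responds with HTTP 404 and "found": false, which urlopen raises as an HTTPError. For such a response body, extract_source returns None.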

UPDATE

  • To update a document, use an HTTP “POST” request method.
  • You can send a partial document, which is merged into the existing document.
  • Each update request changes a single document. To change many documents at once, use the update by query API.
  • Node address (host and port): http://localhost:9200
  • Index name: pythonistaplanet
  • Type name: posts
  • You will notice the following key-value pairs in the response data:
    • "_index": "pythonistaplanet"
    • "_type": "posts"
    • "result": "updated"
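The sketch below assumes version 7.0 or later. It builds a partial-update request to /pythonistaplanet/_update/<id>, wrapping the changed fields in a doc object. It also includes merge_partial, a hypothetical and simplified local model of how the partial document is merged: nested objects are merged field by field, and any other value, including arrays, is replaced. This is an illustration, not ElasticSearch's own code.

```python
import copy
import json
import urllib.request


def update_request(index: str, doc_id: str, partial: dict, base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """POST /<index>/_update/<id> with the changed fields under "doc"."""
    return urllib.request.Request(
        f"{base_url}/{index}/_update/{doc_id}",
        data=json.dumps({"doc": partial}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def merge_partial(existing: dict, partial: dict) -> dict:
    """Simplified model of a partial-document merge; returns a new dict."""
    merged = copy.deepcopy(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)  # merge nested objects
        else:
            merged[key] = copy.deepcopy(value)  # add new fields, replace everything else
    return merged
```

The real merge happens on the server. merge_partial only models that behaviour, so you can predict what the stored document will look like.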

DELETE

  • To delete an index or a document, use an HTTP “DELETE” request method.
  • Deleting a document removes it from its index. Deleting an index removes the index together with all of its documents.
  • Node address (host and port): http://localhost:9200
  • Index name: pythonistaplanet
  • Type name: posts
  • You will notice the following key-value pairs in the response data:
    • "_index": "pythonistaplanet"
    • "_type": "posts"
    • "result": "deleted"
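Below is a minimal Python sketch of both kinds of delete request. It assumes version 7.0 or later and a hypothetical document ID of 1. Deleting an index cannot be undone, so double-check the index name before sending that request.

```python
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node


def delete_document_request(index: str, doc_id: str) -> urllib.request.Request:
    """DELETE /<index>/_doc/<id> removes a single document."""
    return urllib.request.Request(f"{BASE_URL}/{index}/_doc/{doc_id}", method="DELETE")


def delete_index_request(index: str) -> urllib.request.Request:
    """DELETE /<index> removes the whole index and every document in it."""
    return urllib.request.Request(f"{BASE_URL}/{index}", method="DELETE")


doc_req = delete_document_request("pythonistaplanet", "1")
index_req = delete_index_request("pythonistaplanet")
```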

ElasticSearch Data Types

Basic Data Types

Some of the core data types supported by ElasticSearch are text, keyword, byte, short, integer, long, float, double, boolean, and date. In older versions, text and keyword were combined in a single string type.

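Complex Data Types

These data types are built from a collection of core data types: object (for JSON objects) and nested (for arrays of JSON objects that must be queried independently of each other). Arrays need no dedicated type, because any field can hold multiple values of the same type.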

  • Geo Data Types – store geographical details, using types such as geo_point and geo_shape.
  • Specialized Data Types – hold specific kinds of information, using types such as token_count, ip (IPv4 and IPv6 addresses), and completion (auto-complete suggestions).
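To show these types side by side, here is a Python sketch of an explicit mapping that could be sent as the body of a PUT /pythonistaplanet request when creating the index. The field names are hypothetical. The type names (text, keyword, integer, float, boolean, date, geo_point, ip, completion, nested, and token_count, which requires an analyzer) are ElasticSearch field types.

```python
import json

# Hypothetical field names; each "type" value is an ElasticSearch field type.
mapping = {
    "mappings": {
        "properties": {
            "title": {
                "type": "text",  # analyzed for full-text search
                "fields": {
                    # Sub-field that stores the number of tokens in the title.
                    "length": {"type": "token_count", "analyzer": "standard"}
                },
            },
            "author": {"type": "keyword"},  # exact values, for filtering and aggregations
            "views": {"type": "integer"},
            "rating": {"type": "float"},
            "published": {"type": "boolean"},
            "created_at": {"type": "date"},
            "location": {"type": "geo_point"},
            "client_ip": {"type": "ip"},
            "suggest": {"type": "completion"},  # auto-complete suggestions
            "comments": {
                "type": "nested",  # each comment is indexed as its own hidden document
                "properties": {
                    "user": {"type": "keyword"},
                    "text": {"type": "text"},
                },
            },
        }
    }
}

body = json.dumps(mapping, indent=2)  # JSON text for the request body
```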

Key Concepts of ElasticSearch

Node

A node is a single running instance of ElasticSearch. Each node can store data and is identified by a name, which is used for administration purposes.

Cluster

A cluster is a group of one or more connected nodes that work together to hold all of the data. The cluster provides joint indexing and search capabilities and distributes tasks across all of its nodes. Like a node, a cluster is identified by its name.

Document

A document is the basic unit of information: a collection of fields expressed in JSON format. It is similar to a row in a relational database, because it represents a particular entity. It can hold any structured data that can be encoded in JSON, not necessarily text. Each document has a unique ID within its index.

Index

An index is a collection of documents that are typically related logically, such as all the posts of a blog. It is the target of CRUD and search operations and the highest-level entity you can query in ElasticSearch. An index is often compared to a database in a relational schema.

Shard

ElasticSearch can subdivide an index into pieces called shards, so that the index can grow beyond the capacity of a single server. The shards are distributed across multiple nodes, which increases query capacity as nodes are added. Combined with replicas, this distribution also provides redundancy and protects against hardware failures.
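By default, ElasticSearch chooses a document's primary shard by hashing a routing value and taking the result modulo the number of primary shards. The routing value is the document's _id unless you set one yourself. Newer versions add an extra routing-factor step so that indices can be split. The Python sketch below illustrates only the basic modulo idea. It uses zlib.crc32 as a stand-in for the Murmur3 hash that ElasticSearch actually uses, so its shard numbers will not match a real cluster. The function name shard_for is hypothetical.

```python
import zlib
from collections import Counter


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Pick a primary shard for a routing value (by default, the document _id).

    zlib.crc32 stands in for ElasticSearch's Murmur3 hash; only the
    hash-then-modulo idea is the same.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# Distribute 1,000 hypothetical document IDs across 3 primary shards.
counts = Counter(shard_for(str(doc_id), 3) for doc_id in range(1000))
```

The target shard depends on the number of primary shards, so that number is fixed when an index is created. Changing it means reindexing or using the split and shrink APIs.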

Replicas

A copy of a shard is called a replica. ElasticSearch lets you configure how many replicas each shard of an index has. Replicas guard against failures and keep data available: if a node holding a primary shard fails, a replica on another node takes over. Replicas can also serve search requests, so searches can run in parallel across them, which improves performance.
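Shard and replica counts are index settings. The Python sketch below shows hypothetical settings for the pythonistaplanet index, using the real setting names number_of_shards and number_of_replicas. It also shows the arithmetic behind them: each primary shard gets number_of_replicas copies, and ElasticSearch never places two copies of the same shard on one node. The helper function names are hypothetical.

```python
import json

# Hypothetical settings, sent as the body of PUT /pythonistaplanet when creating the index.
settings = {"settings": {"number_of_shards": 3, "number_of_replicas": 1}}


def total_shards(primaries: int, replicas: int) -> int:
    """Each primary has `replicas` copies: total = primaries * (1 + replicas)."""
    return primaries * (1 + replicas)


def min_nodes_for_all_copies(replicas: int) -> int:
    """Copies of the same shard never share a node, so all copies need replicas + 1 nodes."""
    return replicas + 1


body = json.dumps(settings)
```

With 3 primaries and 1 replica, the index therefore has 6 shards in total. A single-node cluster cannot assign the replicas, which leaves the cluster health at yellow.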

Mapping

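Each index has a mapping: a schema definition that lists the fields of its documents and the data type of each field. The mapping is stored in the index. You can define it manually when you create the index. If you do not define a mapping explicitly, ElasticSearch adds one automatically (dynamic mapping) based on the documents you index.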
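To show what automatic (dynamic) mapping does, here is a simplified Python model of its default rules for JSON values. Whole numbers become long, decimals become float, and booleans become boolean. Date-like strings become date, other strings become text with a keyword sub-field, and objects become sub-properties. The date check is a rough assumption that only looks for a yyyy-MM-dd prefix, whereas the real detection uses configurable date formats. The function names are hypothetical, and this is an illustration of the rules, not ElasticSearch's own code.

```python
import re

# Rough stand-in for date detection: strings that start with yyyy-MM-dd.
DATE_PREFIX = re.compile(r"\d{4}-\d{2}-\d{2}")


def infer_field(value):
    """Map one JSON value to a field mapping (simplified dynamic mapping)."""
    if isinstance(value, bool):  # test bool first: in Python, bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if DATE_PREFIX.match(value):
            return {"type": "date"}
        # Plain strings get a text field plus a keyword sub-field.
        return {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    if isinstance(value, dict):
        return {"properties": infer_mapping(value)}  # objects map to sub-properties
    raise TypeError(f"unsupported JSON value: {value!r}")


def infer_mapping(doc: dict) -> dict:
    """Build the "properties" section of a mapping from a sample document."""
    properties = {}
    for field, value in doc.items():
        if isinstance(value, list):
            # Arrays have no type of their own: use the first non-null element.
            value = next((v for v in value if v is not None), None)
        if value is None:
            continue  # null values and empty arrays add no field to the mapping
        properties[field] = infer_field(value)
    return properties
```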

Advantages of ElasticSearch

  • Written in Java, so it can run on many platforms. Recent distributions even bundle their own Java runtime.
  • Full backups can be created easily with the snapshot and restore feature.
  • Scales easily for large organizations, thanks to its distributed, document-oriented architecture.
  • It is open-source.
  • Can index any type of document from which text can be extracted.
  • Has documentation in multiple languages.

Disadvantages of ElasticSearch

  • It is not as efficient for data storage as systems such as Hadoop or MongoDB.
  • Performance drops when working with terabytes of data.
  • It supports fewer formats for request and response data than Apache Solr, which can also handle XML and CSV. ElasticSearch works mainly with JSON.
  • For enterprise search, it is more complicated than out-of-the-box search products and has a steeper learning curve.

Application of ElasticSearch

  • Log Analysis – Logs generated by disparate systems can be stored and analyzed in ElasticSearch.
  • Metrics and Analysis – Data such as system logs and emails can be analyzed and displayed in dashboards. These produce actionable insights that help businesses better understand their data.
  • Textual Search – ElasticSearch can search stored documents for particular words or phrases (full-text search) and rank the matches by relevance.
  • Geo Search – Queries can extract location-specific details from the data, for example, by finding all records within a given distance of a point.
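Both textual and geo searches go through the _search endpoint. Below is a minimal Python sketch that builds two search requests without sending them: a match query (full-text search, ranked by relevance) and a geo_distance filter. It assumes version 7.0 or later and a hypothetical geo_point field named location. The coordinates are only sample values, and the search_request helper is hypothetical.

```python
import json
import urllib.request


def search_request(index: str, query: dict, base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """POST /<index>/_search with a JSON query body."""
    return urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Full-text search: titles containing any of these words, ranked by relevance score.
text_query = {"match": {"title": "elasticsearch guide"}}

# Geo search: keep documents whose "location" lies within 5 km of a sample point.
geo_query = {
    "bool": {
        "filter": {
            "geo_distance": {
                "distance": "5km",
                "location": {"lat": 40.7128, "lon": -74.0060},
            }
        }
    }
}

text_req = search_request("pythonistaplanet", text_query)
geo_req = search_request("pythonistaplanet", geo_query)
```

The geo query is a filter inside a bool query because location either matches or it doesn't. Filters skip relevance scoring, which suits yes/no conditions like this one.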

Conclusion

ElasticSearch is a search engine that is fast and scalable because of its distributed architecture. It has a set of tools for many purposes, such as data processing, analysis, search, and storage. It is multilingual, document-oriented, schema-less, and open-source.

I hope this article inspires you to venture into the vast and interesting field of search engine analytics. Thanks for reading.

Ashwin Joy

I'm the face behind Pythonista Planet. I learned my first programming language back in 2015. Ever since then, I've been learning programming and immersing myself in technology. On this site, I share everything that I've learned about computer programming.
