PostgreSQL: Importance of Indexes
Should you index your database tables?
Here's the answer: it depends. In some situations an index makes a query fast; in others it can make a query slow. Creating an index simply means that a new data structure is built to help make queries faster. For example, if you have 10,000 rows of data and you need the row with ID 353, the database can look up 353 with an index scan instead of reading the entire table with a sequential scan and filtering out the remaining 9,999 rows.

The data structure created when CREATE INDEX runs is, by default, a B-tree (balanced tree). B-trees are outside the scope of this post, but in short they speed up searching because they scale logarithmically: the depth of the tree grows much more slowly than the number of indexed entries, which lets the database reach an index entry in just a few steps.
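
For reference, the index access method can also be spelled out explicitly. The statement below (with placeholder table and column names) is equivalent to a plain CREATE INDEX, since btree is the default method:

      -- "USING btree" is the default and can normally be omitted
      CREATE INDEX some_index ON some_table USING btree (some_column);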

Below, we'll show how indexes can speed up simple queries in PostgreSQL.
A basic example: 10,000 rows
Let's start by creating a table called companies with two columns: id and name. PostgreSQL automatically indexes primary keys, so an index scan is used whenever we filter on the desired id in the WHERE clause. Note that the name column is not indexed yet.



      CREATE TABLE companies(
        id integer PRIMARY KEY,
        name text NOT NULL
      );
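
To confirm the index that PostgreSQL created automatically for the primary key (an optional side check, not part of the original steps), you can query the pg_indexes view:

      -- lists the indexes on the companies table; the primary key index
      -- is typically named companies_pkey
      SELECT indexname, indexdef
      FROM pg_indexes
      WHERE tablename = 'companies';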
      

Next, we'll write a simple Ruby script that generates 10,000 INSERT statements so we can fill the table with 10,000 rows. Note that all the company names are unique in this example.



      (1..10000).each do |i|
        puts "INSERT INTO companies VALUES (%s, 'company-%s');" % [i, i]
      end; nil
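
As an aside, if you'd rather skip the Ruby step, the same 10,000 rows can be generated entirely in SQL with generate_series (a sketch, not what this walkthrough originally did):

      -- inserts rows (1, 'company-1') through (10000, 'company-10000')
      INSERT INTO companies
      SELECT i, 'company-' || i
      FROM generate_series(1, 10000) AS i;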
      

You can see the output of the Ruby script below. The insert statements are then executed to fill up the table with 10,000 rows.



      INSERT INTO companies VALUES (1, 'company-1');
      INSERT INTO companies VALUES (2, 'company-2');
      INSERT INTO companies VALUES (3, 'company-3');

      ...

      INSERT INTO companies VALUES (9997, 'company-9997');
      INSERT INTO companies VALUES (9998, 'company-9998');
      INSERT INTO companies VALUES (9999, 'company-9999');
      INSERT INTO companies VALUES (10000, 'company-10000');
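
With the table populated, we can also sanity-check the earlier claim that primary keys are indexed automatically. The plan for the query below (output not shown here) should report an Index Scan using the primary key's index, typically named companies_pkey, rather than a sequential scan:

      -- looks up a single row by primary key, as in the ID 353 example above
      EXPLAIN SELECT * FROM companies WHERE id = 353;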
      

Now let's use EXPLAIN ANALYZE on the SELECT query looking for a specific company name. We wrap the query in BEGIN and ROLLBACK because the ANALYZE keyword actually executes the statement. For this SELECT the wrapper isn't strictly necessary; the rollback matters when analyzing UPDATE and DELETE statements, which would otherwise modify data for real (an example of that follows the first query plan below).



      BEGIN; 
      EXPLAIN ANALYZE SELECT * FROM companies WHERE name = 'company-2000'; 
      ROLLBACK;
      

You can see below that the execution time is 1.946 ms and a full table scan was used to look for a single entry. Given that the name column is not indexed, this is the expected result.



      BEGIN
      
      QUERY PLAN                                              
      ----------------------------------------------------------
      Seq Scan on companies
        (cost=0.00..188.31 rows=1 width=16)
        (actual time=0.333..1.925 rows=1 loops=1)
        Filter: (name = 'company-2000'::text)
        Rows Removed by Filter: 9999
      Planning time: 0.104 ms
      Execution time: 1.946 ms
      (5 rows)

      ROLLBACK
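
As a side note, here is the kind of statement where the BEGIN/ROLLBACK wrapper actually matters. EXPLAIN ANALYZE executes the statement, so without the rollback this hypothetical UPDATE would really change the row:

      BEGIN;
      -- executed by EXPLAIN ANALYZE, then undone by the ROLLBACK
      EXPLAIN ANALYZE UPDATE companies SET name = 'company-2000-renamed' WHERE id = 2000;
      ROLLBACK;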
      

Now, let us index the name column with CREATE INDEX.



      CREATE INDEX company_name_idx ON companies (name);
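
An index is a real on-disk data structure, so it also takes up space. As an optional side check (not part of the original walkthrough), you can ask PostgreSQL how big the new index is:

      -- reports the size of the index, e.g. in kB or MB
      SELECT pg_size_pretty(pg_relation_size('company_name_idx'));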
      

Re-running the same EXPLAIN ANALYZE query, you can see below that an index scan was used and the execution time is now 0.065 ms. That's about 30 times faster than the 1.946 ms full table scan (comparing execution times)!


      BEGIN

      QUERY PLAN                                                          
      ----------------------------------------------------------
      Index Scan using company_name_idx on companies
        (cost=0.29..8.30 rows=1 width=16)
        (actual time=0.041..0.042 rows=1 loops=1)
        Index Cond: (name = 'company-2000'::text)
      Planning time: 0.190 ms
      Execution time: 0.065 ms
      (4 rows)

      ROLLBACK
      


A bigger example: 100,000 rows
Next, let's try a bigger example. We'll drop the index first and then add 90,000 more rows to the existing table.



      DROP INDEX company_name_idx;
      

We'll execute the following Ruby script to generate the additional INSERT statements. It produces the same company-name pattern as the initial script.



      (10001..100000).each do |i|
        puts "INSERT INTO companies VALUES (%s, 'company-%s');" % [i, i]
      end; nil
      

Next, the generated statements are executed to add 90,000 more rows, bringing the table to 100,000 rows. There are still no duplicate company names after these inserts are executed.



      INSERT INTO companies VALUES (10001, 'company-10001');
      INSERT INTO companies VALUES (10002, 'company-10002');
      INSERT INTO companies VALUES (10003, 'company-10003');

      ...

      INSERT INTO companies VALUES (99998, 'company-99998');
      INSERT INTO companies VALUES (99999, 'company-99999');
      INSERT INTO companies VALUES (100000, 'company-100000');
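
One optional step after a bulk load like this (not shown in the original steps) is to refresh the planner's statistics so its row estimates stay accurate:

      -- updates the statistics the query planner uses for the companies table
      ANALYZE companies;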
      

Let us now run the same EXPLAIN query again.



      BEGIN; 
      EXPLAIN ANALYZE SELECT * FROM companies WHERE name = 'company-2000'; 
      ROLLBACK;
      

Since we removed the index on the company names, a sequential scan was performed instead of an index scan. Note that the full table scan now took 15.050 ms.



      BEGIN

      QUERY PLAN                                               
      ----------------------------------------------------------
      Seq Scan on companies
        (cost=0.00..1885.28 rows=1 width=17)
        (actual time=0.590..15.029 rows=1 loops=1)
        Filter: (name = 'company-2000'::text)
        Rows Removed by Filter: 99999
      Planning time: 0.134 ms
      Execution time: 15.050 ms
      (5 rows)

      ROLLBACK
      

Let's now re-create the index on the company names with the CREATE INDEX command and then run the EXPLAIN query again.



      CREATE INDEX company_name_idx ON companies (name);

      BEGIN; 
      EXPLAIN ANALYZE SELECT * FROM companies WHERE name = 'company-2000'; 
      ROLLBACK;
      

This time an index scan was performed and the query executed in 0.111 ms, about 135 times faster than the 15.050 ms full table scan (comparing execution times)! Of course, when computing the total time, the planning time should be taken into account as well.



      BEGIN

      QUERY PLAN                                               
      ----------------------------------------------------------
      Index Scan using company_name_idx on companies
        (cost=0.42..8.44 rows=1 width=17)
        (actual time=0.088..0.089 rows=1 loops=1)
        Index Cond: (name = 'company-2000'::text)
      Planning time: 0.194 ms
      Execution time: 0.111 ms
      (4 rows)

      ROLLBACK
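
Finally, if you ever want to check whether an index is actually being used over time, one way (a side note, using PostgreSQL's statistics views) is:

      -- idx_scan counts how many times each index on companies has been scanned
      SELECT indexrelname, idx_scan
      FROM pg_stat_user_indexes
      WHERE relname = 'companies';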
      

As you can see, without an index the full table scan takes longer as more rows are added, since every row has to be checked for a match. This is just a simple example, but it shows how important indexes are and how much they can improve the performance of database queries. For more information, see the PostgreSQL documentation on indexes.