How Inodes Work
What's an inode and how do inodes work?

Let's begin with a simple user-level representation of a Unix filesystem. It has 2 directories named directory_1 and directory_2. The first directory, directory_1, has 3 files in it and the second directory, directory_2, has 2 files in it. The file file_5 is a hard link to file_3, so both names refer to the same inode.

        

    # User-level representation
    directory_1:
    +------------------ file_1
    +------------------ file_2
    +------------------ file_3

    directory_2:
    +------------------ file_4
    +------------------ file_5
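
The layout above can be reproduced on a real filesystem. The following is a minimal sketch, assuming a Unix-like system whose temporary directory supports hard links; `hard_link_demo` is a hypothetical helper name, not part of the article's script. It uses Ruby's `File.link` to create file_5 and `File.stat` to compare the two names.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: rebuilds the layout above in a throwaway directory
# and reports what the filesystem says about file_3 and file_5.
def hard_link_demo
  Dir.mktmpdir do |root|
    dir1 = File.join(root, "directory_1")
    dir2 = File.join(root, "directory_2")
    FileUtils.mkdir_p([dir1, dir2])

    %w[file_1 file_2 file_3].each { |f| File.write(File.join(dir1, f), f) }
    File.write(File.join(dir2, "file_4"), "file_4")

    # file_5 is a second name for the inode behind file_3.
    File.link(File.join(dir1, "file_3"), File.join(dir2, "file_5"))

    file_3 = File.stat(File.join(dir1, "file_3"))
    file_5 = File.stat(File.join(dir2, "file_5"))

    { same_inode: file_3.ino == file_5.ino, link_count: file_3.nlink }
  end
end

p hard_link_demo
```

On a typical Unix filesystem both names should report the same inode number and a link count of 2.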
        
      

Here's what the inode table looks like. An inode is a data structure that stores the details of a specific file, such as its type (file or directory), its permissions, and the location where its data can be found. If the inode is a directory, its data maps the names of the files and directories inside it to their inode numbers.

        

    # Inode structure
    [ inode number | links | type ]
    [    102309498 |     5 |  dir ]          | name           | inode_number |
                                             | .              |    102309498 |
                                             | ..             |       436084 |
                                             | directory_1    |    102309506 |
                                             | directory_2    |    102309509 |

    [    102309506 |     5 |  dir ]          | name           | inode_number |
                                             | .              |    102309506 |
                                             | ..             |    102309498 |
                                             | file_1         |    102309512 |
                                             | file_2         |    102309513 |
                                             | file_3         |    102309517 |

    [    102309512 |     1 | file ]
    [    102309513 |     1 | file ]
    [    102309517 |     2 | file ]
    [    102309509 |     4 |  dir ]          | name           | inode_number |
                                             | .              |    102309509 |
                                             | ..             |    102309498 |
                                             | file_4         |    102309520 |
                                             | file_5         |    102309517 |

    [    102309520 |     1 | file ]
        
      

The left side is the inode table, and the right side is the data pool that each directory inode points to. The inode table contains the inode number, the link count, and the type of each inode. The data pool contains the name and inode number of each item inside the directory. Looks complicated?
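
The same columns can be read straight from a real directory. As a rough sketch, the hypothetical helper `inode_rows` below (not part of the script that follows) uses `Dir.entries` and `File.lstat` to produce one row per directory entry, with the inode number, link count, and type alongside the name.

```ruby
require "tmpdir"

# Hypothetical helper: returns one row per directory entry with the same
# three columns as the inode table above, plus the entry name.
def inode_rows(directory)
  Dir.entries(directory).sort.map do |name|
    stat = File.lstat(File.join(directory, name))
    { name: name,
      number: stat.ino,
      links: stat.nlink,
      type: stat.directory? ? :dir : :file }
  end
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, "file_1"), "hello")

  inode_rows(dir).each do |row|
    puts format("[ %12d | %5d | %4s ]  %s",
                row[:number], row[:links], row[:type], row[:name])
  end
end
```

`Dir.entries` includes `.` and `..`, so the listing also shows the directory's own inode and its parent's.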

To help us better understand how inodes work, let's create a simple Ruby script that emulates how the inode structure works. The script has 3 parts: the Inode class, the InodeReader module, and the InodeViewer module.

The Inode class
We start by creating the Inode class. Each inode instance has an inode number, a link count, a type, and a name. The inode number is the unique key of the table row. The link count is the total number of directory entries that point to the inode. The type can be either :file or :dir. The name is the filename or directory name. (A real inode does not store a name; names live in directory entries. The emulation keeps one only for convenience.)

        

    class Inode
      TABLE = {}

      attr_accessor :number, :link_count, :type, :name
      attr_accessor :directory_hash

      # All four attributes are required keyword arguments (Ruby 2.1+).
      def initialize(number:, link_count:, type:, name:)

        @number = number
        @link_count = link_count
        @type = type
        @name = name

        @directory_hash = {}

        TABLE[number] = self
      end

      def [](key)
        directory_hash[key]
      end

      def []=(key, value)
        directory_hash[key] = value
      end

      def get(path)
        path_array = path.split("/")
        return_inode = self

        path_array.each do |component|
          return_inode = self.class::TABLE[return_inode[component]]
        end

        return_inode
      end
    end
        
      

Whenever a new inode is initialized, the inode automatically gets added to Inode::TABLE. We've also added convenience methods to the inode instance so that it can behave like a dictionary.

        

    root_dir = Inode.new(number: '99', 
                         link_count: 4, 
                         type: :dir, 
                         name: 'root_dir')

    root_dir['.'] = '99'
    root_dir['directory_1'] = '100'
    root_dir['directory_2'] = '500'

        
      

The same pattern applies to the other directories. Each inode's directory_hash container is filled with filenames as keys and inode numbers as values.

        

    directory_1 = Inode.new(number: '100', 
                            link_count: 2, 
                            type: :dir, 
                            name: 'directory_1')

    directory_1['.'] = '100'
    directory_1['..'] = '99'
    directory_1['file_1'] = '101'
    directory_1['file_2'] = '102'
    directory_1['file_3'] = '103'

      
      

To add a file inode to the table, the type is just set to :file.

        

    file_1 = Inode.new(number: '101', 
                       link_count: 1, 
                       type: :file, 
                       name: 'file_1')
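
A hard link such as file_5 fits into this model without creating a new inode. The sketch below is self-contained, so it uses `MiniInode`, a hypothetical condensed stand-in for the Inode class above. It assumes that linking means adding a second directory entry for the same inode number and incrementing the link count.

```ruby
# MiniInode is a condensed stand-in for the Inode class above (hypothetical
# name), kept small so this snippet runs on its own.
class MiniInode
  TABLE = {}

  attr_reader :number, :type, :entries
  attr_accessor :link_count

  def initialize(number:, link_count:, type:)
    @number = number
    @link_count = link_count
    @type = type
    @entries = {} # name => inode number, only used by directories
    TABLE[number] = self
  end

  # Walk a relative path one component at a time, as Inode#get does.
  def get(path)
    path.split("/").reduce(self) do |inode, name|
      TABLE.fetch(inode.entries.fetch(name))
    end
  end
end

root = MiniInode.new(number: "99", link_count: 4, type: :dir)
dir1 = MiniInode.new(number: "100", link_count: 2, type: :dir)
dir2 = MiniInode.new(number: "500", link_count: 2, type: :dir)
root.entries.merge!("directory_1" => "100", "directory_2" => "500")

# file_3 starts with a single name in directory_1.
file_3 = MiniInode.new(number: "103", link_count: 1, type: :file)
dir1.entries["file_3"] = "103"

# A hard link adds a second directory entry for the same inode number
# and bumps the link count; no new inode is created.
dir2.entries["file_5"] = "103"
file_3.link_count += 1

puts root.get("directory_1/file_3").equal?(root.get("directory_2/file_5")) # true
puts file_3.link_count # 2
```

Both paths resolve to the same object because the table is keyed by inode number, not by name.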

        
      

Next we create 2 modules: InodeReader and InodeViewer. InodeReader is in charge of reading the directory data and filling up the Inode::TABLE. InodeViewer, on the other hand, is in charge of printing the contents of the Inode::TABLE.

The InodeReader module
The InodeReader module recursively goes through all directories inside the current directory and creates an inode entry for each file or directory it encounters.

        

    module InodeReader
      def self.perform(directory: nil, root: true)
        Dir.chdir(directory) if directory

        ls_output_string = `ls -1ila .`
        ls_info = process_ls(ls_output_string)

        root_inode = nil

        ls_info.each do |row|
          if root && row[:name] == '.'
            root_inode = Inode.new(number: row[:number], 
                                   link_count: row[:link_count], 
                                   type: row[:type], 
                                   name: row[:name])

            root_inode[row[:name]] = row[:number]
          else
            root_inode[row[:name]] = row[:number] if root_inode

            unless (row[:name] == '.' || row[:name] == '..')
              inode = Inode.new(number: row[:number], 
                                link_count: row[:link_count], 
                                type: row[:type], 
                                name: row[:name])

              if row[:type] == :dir
                Dir.chdir(row[:name])
                directory_data = perform(root: false)
                directory_data.each do |row|
                  inode[row[:name]] = row[:number]
                end

                Dir.chdir('..')
              end
            end
          end
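        end

        # Return the parsed rows so the recursive call above can fill
        # in the child directory's entries.
        ls_info
      end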

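      # Parses `ls -1ila` output into one hash per directory entry.
      # Assumes names contain no whitespace, since the name is taken
      # from the last whitespace-separated column.
      def self.process_ls(ls_output_string)
        detailed_list = ls_output_string.split("\n")
        output_list = []

        # The first line of `ls -l` output is the "total" summary.
        detailed_list.shift
        detailed_list.each do |row|
          row_info = row.split(" ")

          number = row_info[0]
          type = (row_info[1][0] == "d" ? :dir : :file)
          link_count = row_info[2].to_i
          name = row_info[-1]

          output_list << { number: number,
                           type: type,
                           link_count: link_count,
                           name: name }
        end

        output_list
      end

      # `private` has no effect on `def self.` methods, so hide it explicitly.
      private_class_method :process_ls
    end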
        
      

The reader is executed by running InodeReader.perform. It uses the shell command ls -1ila to list the inode details of the entries inside the current directory.
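
Parsing ls output takes the name from the last whitespace-separated column, so it breaks on names that contain spaces. As an alternative sketch, the hypothetical helper `scan_inodes` below (not part of the article's script) gathers the same fields with `File.lstat` and `Dir.each_child`, which assumes Ruby 2.5 or later.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: collects the same fields process_ls extracts,
# but reads them from File.lstat instead of parsing `ls` output.
def scan_inodes(directory, table = {})
  Dir.each_child(directory) do |name|
    path = File.join(directory, name)
    stat = File.lstat(path) # lstat does not follow symlinks
    type = stat.directory? ? :dir : :file

    # Hard links share an inode number, so the same key may be seen twice.
    table[stat.ino] ||= { link_count: stat.nlink, type: type, names: [] }
    table[stat.ino][:names] << name

    scan_inodes(path, table) if type == :dir
  end

  table
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "directory_1"))
  File.write(File.join(root, "directory_1", "file_3"), "data")
  File.link(File.join(root, "directory_1", "file_3"), File.join(root, "file_5"))

  scan_inodes(root).each do |number, info|
    puts "#{number} #{info[:link_count]} #{info[:type]} #{info[:names].join(', ')}"
  end
end
```

Keying the table by inode number means a hard-linked file shows up once, with every name that points to it.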

The InodeViewer module
        

    module InodeViewer
      class << self
        def perform
          puts inode_header
          Inode::TABLE.each do |key, value|
            if value.directory_hash.empty?
              puts inode_row(value)
            else
              puts inode_row(value) + data_pool_row_header
            end

            value.directory_hash.each do |name, inode_number|
              puts data_pool_row(name, inode_number)
            end

            puts "" unless value.directory_hash.empty?
          end
        end

        def inode_header
          "[ %s | %s | %s ]" % [inode_number("inode number"),
                                link_count("links"),
                                type("type")]
        end

        def data_pool_row_header
          "          | %s | %s |" % [name("name"),
                                     inode_number("inode_number")]
        end

        def data_pool_row(name, inode_number)
          "%40s | %s | %s |" % ["",
                                name(name),
                                inode_number(inode_number)]
        end

        def inode_row(inode)
          "[ %s | %s | %s ]" % [inode_number(inode.number),
                                link_count(inode.link_count),
                                type(inode.type)]
        end

        def name(text)
          "%-14s" % text.to_s
        end

        def inode_number(text)
          "%12s" % text.to_s
        end

        def link_count(text)
          "%5s" % text.to_s
        end

        def type(text)
          "%4s" % text.to_s
        end
      end
    end
        
      

The lines of code above are simply executed by running InodeViewer.perform. Here's an example output of the inode structure.

        

    [ inode number | links | type ]
    [    102309498 |     5 |  dir ]          | name           | inode_number |
                                             | .              |    102309498 |
                                             | ..             |       436084 |
                                             | directory_1    |    102309506 |
                                             | directory_2    |    102309509 |

    [    102309506 |     5 |  dir ]          | name           | inode_number |
                                             | .              |    102309506 |
                                             | ..             |    102309498 |
                                             | file_1         |    102309512 |
                                             | file_2         |    102309513 |
                                             | file_3         |    102309517 |

    [    102309512 |     1 | file ]
    [    102309513 |     1 | file ]
    [    102309517 |     2 | file ]
    [    102309509 |     4 |  dir ]          | name           | inode_number |
                                             | .              |    102309509 |
                                             | ..             |    102309498 |
                                             | file_4         |    102309520 |
                                             | file_5         |    102309517 |

    [    102309520 |     1 | file ]
        
      

The left side is the inode table, and the right side is the data pool that each directory inode points to. Notice that the current directory (.) and the parent directory (..) are also included in the data pool entries.
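
The `.` and `..` entries can be checked against a real filesystem. This is a small sketch, assuming a Unix-like system; `dot_entries` is a hypothetical helper that returns the inode numbers behind the two special names.

```ruby
require "tmpdir"

# Hypothetical helper: inode numbers behind "." and ".." of a directory.
def dot_entries(directory)
  { dot: File.stat(File.join(directory, ".")).ino,
    dot_dot: File.stat(File.join(directory, "..")).ino }
end

Dir.mktmpdir do |parent|
  child = File.join(parent, "directory_1")
  Dir.mkdir(child)
  entries = dot_entries(child)

  # "." is another name for the directory's own inode.
  puts entries[:dot] == File.stat(child).ino

  # ".." is another name for the parent directory's inode.
  puts entries[:dot_dot] == File.stat(parent).ino
end
```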

How the Kernel uses the Inode structure
Now that we have the inode table and the data pool, how do we look for a specific file in the filesystem? Let's revisit the Inode#get method.

        

    class Inode
      # ...

      def get(path)
        path_array = path.split("/")
        return_inode = self

        path_array.each do |component|
          return_inode = self.class::TABLE[return_inode[component]]
        end

        return_inode
      end
    end

    # how to explore specific entries in the filesystem
    root_dir.get('directory_1/file_1')
        
      

To examine a specific file inside the filesystem, the path is first split into its components. For example, if we're examining directory_1/file_1, the path is split into directory_1 and file_1. Using the inode numbers from the example output, we first check the data pool of the root inode for the inode number of directory_1, which is 102309506. Once we have the inode number, we get the inode of directory_1 from the inode table using that number as the key. We then check the data pool associated with that inode and look for the inode number of file_1. Finally, we get the inode of file_1 from the inode table using the number 102309512.
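
The walk described above can be traced with plain hashes. The sketch below copies the data pool from the example output; `DATA_POOL` and `resolve` are hypothetical names introduced only for this illustration. Unlike Inode#get, it returns nil when a component is missing instead of raising.

```ruby
# Data pool from the example output:
# directory inode number => { name => inode number }.
DATA_POOL = {
  102309498 => { "." => 102309498, ".." => 436084,
                 "directory_1" => 102309506, "directory_2" => 102309509 },
  102309506 => { "." => 102309506, ".." => 102309498,
                 "file_1" => 102309512, "file_2" => 102309513,
                 "file_3" => 102309517 },
  102309509 => { "." => 102309509, ".." => 102309498,
                 "file_4" => 102309520, "file_5" => 102309517 }
}.freeze

# Hypothetical helper: resolve a relative path starting from inode `start`.
# Returns the inode number, or nil when a component is missing or when the
# path tries to descend through something that is not a directory.
def resolve(path, start: 102309498)
  path.split("/").reduce(start) do |number, component|
    entries = DATA_POOL[number]
    return nil unless entries && entries.key?(component)

    entries[component]
  end
end

puts resolve("directory_1/file_1")                # 102309512
puts resolve("directory_2/file_5")                # 102309517
puts resolve("directory_1/../directory_2/file_4") # 102309520
p resolve("directory_1/missing")                  # nil
```

The third call shows that `..` needs no special handling: it is an ordinary entry in the data pool that happens to point back at the parent.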

It is important to understand how inodes work, since filesystem issues occur when there are inconsistencies in the link counts, the inode table data, and the block allocation data. It also makes debugging easier when you're having disk space issues; for example, a filesystem can run out of free inodes while it still has free blocks.
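
One such inconsistency is a link count that disagrees with the number of directory entries pointing at an inode. The sketch below checks only that, using the file inodes from the example output; `link_count_mismatches` and the two constants are hypothetical names, and a real filesystem checker does far more than this.

```ruby
# Link counts as recorded in the inode table above (file inodes only).
RECORDED_LINKS = {
  102309512 => 1, 102309513 => 1, 102309517 => 2, 102309520 => 1
}.freeze

# Directory entries from the data pool, with "." and ".." omitted.
DIRECTORY_ENTRIES = {
  102309506 => { "file_1" => 102309512, "file_2" => 102309513,
                 "file_3" => 102309517 },
  102309509 => { "file_4" => 102309520, "file_5" => 102309517 }
}.freeze

# Hypothetical check: count how many directory entries reference each file
# inode and report the inodes whose recorded link count disagrees.
def link_count_mismatches(recorded, directories)
  observed = Hash.new(0)
  directories.each_value do |entries|
    entries.each_value { |number| observed[number] += 1 }
  end

  recorded.each_with_object({}) do |(number, links), mismatches|
    next if observed[number] == links

    mismatches[number] = { recorded: links, observed: observed[number] }
  end
end

# The example data is consistent, so this prints an empty hash.
p link_count_mismatches(RECORDED_LINKS, DIRECTORY_ENTRIES)

# Simulate damage: file_5's entry disappears but the count stays at 2.
damaged = DIRECTORY_ENTRIES.merge(102309509 => { "file_4" => 102309520 })
p link_count_mismatches(RECORDED_LINKS, damaged)
```

In the damaged case the only inode reported is 102309517, recorded with 2 links but referenced by a single entry.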