Heap Tables


HEAP

Table and Index are Separately Stored:

  • In a heap table, the actual data (rows) is stored in an unordered format, which means there is no inherent organization to how data is stored on the disk. The table itself only contains the raw data, with no sorting or indexing applied to it.

  • The index is a separate structure that is used to facilitate faster data retrieval. It maintains a specific order based on the columns defined in the index, allowing the database to quickly locate the corresponding data in the heap.

Table Only Stores Data:

  • The heap table is solely responsible for storing the actual data records. This includes all the columns defined in the table schema but does not include any additional organization or sorting.
  • When new data is inserted into a heap, it simply goes to the end of the table without any particular order, leading to a potentially fragmented structure over time, especially with frequent insertions and deletions.

Index Only for Querying Data:

  • The primary purpose of the index is to optimize query performance. It serves as a roadmap to quickly locate rows based on the indexed columns, significantly speeding up data retrieval operations.
  • When a query is executed, the database first checks the index to find the relevant entries. Once it identifies the correct index entries, it uses the pointers (Row Identifiers, or RIDs) to access the actual data in the heap table.

Stored as Different Data Objects:

  • The table and index exist as distinct data objects within the database management system (DBMS). They are stored separately in the database file system, with the heap table storing the data rows and the index storing its own data structure (which may include key values and pointers).
  • This separation allows for flexibility in how data is accessed and managed. The index can be modified or rebuilt independently of the table data, which is beneficial for maintenance and optimization.

Advantages of Heap Tables

1. Data Not Affected by Index:

  • The organization of data in a heap table is independent of any indexes created. This means that data can be written and stored in the available space without concern for sorting or organizing it in a specific order.
  • As a result, insertions can be faster since data can simply be appended to the end of the heap.

2. Reduce Cost:

Because data is stored without regard to an index, the cost of writing data (in terms of time and resources) is reduced. The simplicity of the heap structure allows for efficient data handling.

3. Data Stored to Available Space, Without Sequence:

Heap tables take advantage of available space without needing to maintain a specific sequence, which simplifies the storage mechanism and speeds up data insertion.

4. To Build Index Later:

Since heap tables allow for flexible data storage, indexes can be built later when necessary. This means you can prioritize fast data insertion and later optimize query performance by creating indexes when needed.

5. Thus, Write is Fast!:

The ability to write data quickly without needing to maintain an order or update an index immediately makes heap tables suitable for scenarios where rapid data ingestion is crucial.

For many large datasets, especially where performance is focused on data input rather than immediate retrieval, heap tables provide an efficient method for managing large volumes of data due to their straightforward storage method.


Querying Data in Heap Tables

Full Table Scan

Since HEAP tables are unsorted and lack a clustered index, retrieving specific rows or ranges of data may require a full table scan. A full table scan means the database engine must examine every row in the table to find records that match the query criteria, as there’s no structured path to follow.

Drawbacks of Full Table Scans in HEAP Tables

Full table scans in HEAPs are usually inefficient because they involve scanning every row in the table. As the table grows, this process becomes increasingly resource-intensive, impacting CPU, memory, and disk I/O. Frequent full table scans can reduce database performance, especially if the table is large and the queries are complex or executed repeatedly.

Example of a HEAP Table

Suppose we have a Customers table, defined as a HEAP, with the following structure and some sample data:

SQL> CREATE TABLE Customers ( CustomerID INT, Name VARCHAR(100), Age INT, City VARCHAR(50) ); SQL> INSERT INTO Customers (CustomerID, Name, Age, City) VALUES (1, 'Alice', 24, 'New York'), (2, 'Bob', 35, 'Chicago'), (3, 'Carol', 29, 'San Francisco'), (4, 'David', 42, 'New York'), (5, 'Eve', 27, 'Boston'), (6, 'Frank', 50, 'Chicago'); SQL> SELECT * FROM Customers WHERE Age > 30; CUSTOMERID NAME AGE CITY ---------- -------------- ----- -------- 2 Bob 35 Chicago 4 David 42 New York 6 Frank 50 Chicago

Since Customers is a HEAP table with no clustered index, this query will trigger a full table scan. Here’s how the scan would work on this data:

  • The database engine starts at the beginning of the table and reads each row sequentially.
  • It checks each row's Age value to see if it meets the condition (Age > 30).
  • Rows 2, 4, and 6 meet the condition, so they are included in the result set:
    • (2, 'Bob', 35, 'Chicago')
    • (4, 'David', 42, 'New York')
    • (6, 'Frank', 50, 'Chicago')

In a HEAP table, these records are stored in the order of insertion on the data page. Let's assume that they are stored like this initially:

PageRow SequenceCustomerIDNameAgeCity
P1Row 11Alice24New York
P1Row 22Bob35Chicago
P1Row 33Carol29San Francisco
P1Row 44David42New York
P1Row 55Eve27Boston
P1Row 66Frank50Chicago

Deletion Example

Suppose we delete Eve's row:

SQL> DELETE FROM Customers WHERE CustomerID = 5;

PageRow SequenceCustomerIDNameAgeCity
P1Row 11Alice24New York
P1Row 22Bob35Chicago
P1Row 33Carol29San Francisco
P1Row 44David42New York
P1Gap(5)(Eve)(27)(Boston)
P1Row 66Frank50Chicago
This gap remains available for future insertions, allowing the database to reuse the space.

New Insertion Example

If we insert a new record, it might fill the gap left by Eve's deletion, depending on the DBMS's allocation strategy:

INSERT INTO Customers (CustomerID, Name, Age, City) VALUES (7, 'Grace', 34, 'Miami');

The new data might go into Eve's former slot:

PageRow SequenceCustomerIDNameAgeCity
P1Row 11Alice24New York
P1Row 22Bob35Chicago
P1Row 33Carol29San Francisco
P1Row 44David42New York
P1Row 5 (New)7Grace34Miami
P1Row 66Frank50Chicago

Update Example

Suppose Carol's information is updated with a longer city name, changing her city from 'San Francisco' to 'Los Angeles'. If this update causes the row size to increase and it no longer fits in the original space, it might be relocated to a different page. Let’s assume Carol is moved to a new page, with a forwarding pointer left in her original slot:

PageRow SequenceCustomerIDNameAgeCity
P1Row 11Alice24New York
P1Row 22Bob35Chicago
P1Forwarding3(Carol)(29)(Los Angeles)
P1Row 44David42New York
P1Row 57Grace34Miami
P1Row 66Frank50Chicago

PageRow SequenceCustomerIDNameAgeCity
P2Row 1 (New)3Carol29Los Angeles

Drawbacks of Full Table Scans in HEAP Tables

A full table scan in a HEAP table like Customers requires reading every row, even if only a few rows meet the query condition. This process is time-consuming and resource-intensive, especially as the table grows. For tables with thousands or millions of rows, this approach can drastically reduce performance.

Mitigation: Adding a non-clustered index on the Age column could help the database engine locate relevant rows without scanning the entire table, improving query performance.


Structure of Heap Tables

All systems normal

© 2025 2023 Sanjeeb KC. All rights reserved.