Data Structure refers to a particular way of organizing and storing data in a computer so that it can be accessed and modified efficiently.
Common Data Structures: Arrays, Linked Lists, Stacks, Queues, Trees, Graphs, etc.
Array is a collection of elements stored in contiguous memory locations and can be accessed using an index.
A one-dimensional array is a linear structure where elements are stored one after another.
int arr[5] = {10, 20, 30, 40, 50};
Memory Representation:
Index:  0   1   2   3   4
Value:  10  20  30  40  50
Used to store matrices or tabular data. Data is stored in a row-major or column-major format.
int matrix[2][3] = { {1, 2, 3}, {4, 5, 6} };
Logical View:
1 2 3
4 5 6

Memory (Row-Major): 1 2 3 4 5 6
A Sparse Array contains mostly zero values. Storing all zeroes wastes space, so we use optimized formats.
Lower Triangular Matrix: all elements above the main diagonal are zero.
Example (3x3):
1 0 0
4 5 0
7 8 9
Upper Triangular Matrix: all elements below the main diagonal are zero.
Example (3x3):
1 2 3
0 5 6
0 0 9
Tridiagonal Matrix: only the main diagonal and the two diagonals adjacent to it (upper and lower) have non-zero values.
Example (5x5):
x y 0 0 0
y x y 0 0
0 y x y 0
0 0 y x y
0 0 0 y x
Instead of storing all elements (including zeros), we store only non-zero elements with their row and column indices.
Original Sparse Matrix:
0 0 5
0 8 0
3 0 0

Vector (Triplet) Representation:
Row  Col  Value
0    2    5
1    1    8
2    0    3
This format saves memory and is useful for mathematical operations like matrix addition and multiplication.
A Stack is a linear data structure that follows the LIFO (Last In First Out) principle.
Primitive Stack Operations:
- Push: insert an element at the top.
- Pop: remove the top element.
- Peek (Top): read the top element without removing it.
- isEmpty / isFull: check whether the stack is empty or full.
Stack Example (Push/Pop):
Push: 10 → 20 → 30
Stack (top first): [30, 20, 10]
Pop → 30
New Stack: [20, 10]
Infix: Operator is in-between operands (A + B)
Postfix (Reverse Polish): Operator comes after operands (A B +)
Prefix (Polish): Operator comes before operands (+ A B)
Infix:   A + B
Postfix: A B +
Prefix:  + A B
Use a stack to evaluate postfix expressions. Scan left to right:
Postfix: 5 6 2 + * 12 4 / -
Steps:
→ Push 5, 6, 2
→ + : pop 6 and 2 → 6 + 2 = 8 → push 8
→ * : pop 5 and 8 → 5 * 8 = 40 → push 40
→ Push 12, 4
→ / : pop 12 and 4 → 12 / 4 = 3 → push 3
→ - : pop 40 and 3 → 40 - 3 = 37
Result = 37
Infix → Postfix: Use stack and precedence rules.
Infix → Prefix: Reverse, convert to postfix, reverse again.
Prefix → Infix/Postfix: Use stack-based logic.
Infix:   (A + B) * C
Postfix: A B + C *
Prefix:  * + A B C
A Queue is a linear data structure that follows the FIFO (First In First Out) principle.
Primitive Queue Operations:
- Enqueue: insert an element at the rear.
- Dequeue: remove an element from the front.
- Front: read the front element without removing it.
- isEmpty / isFull: check whether the queue is empty or full.
Queue Example:
Enqueue: 10 → 20 → 30
Queue: [10, 20, 30]
Dequeue → 10
New Queue: [20, 30]
A Deque (double-ended queue) allows insertion and deletion at both the front and rear ends.
Types:
- Input Restricted Deque: insertion at the rear only; deletion at both ends.
- Output Restricted Deque: deletion from the front only; insertion at both ends.
Each element has a priority; higher-priority elements are dequeued before lower-priority ones.
Example (a larger number means higher priority):
Enqueue: [Task A, Priority 2], [Task B, Priority 1], [Task C, Priority 3]
Dequeue order: Task C → Task A → Task B
A Linked List is a linear data structure where each element (node) contains data and a pointer to the next node. Unlike arrays, linked lists do not store elements in contiguous memory locations.
Structure of a Node:

struct Node {
    int data;
    struct Node* next;
};
Sequential (Array)          | Linked List
----------------------------|------------------------------
Fixed size                  | Dynamic size
Easy indexing               | Sequential access
Costly insertion/deletion   | Efficient insertion/deletion
Possible memory waste       | Efficient memory usage
a. Traversal: Visit each node starting from the head.
while (temp != NULL) {
    printf("%d ", temp->data);
    temp = temp->next;
}
b. Insertion: Add a new node at beginning, end, or specific position.
// Insert at beginning
new_node->next = head;
head = new_node;
c. Deletion: Remove a node from beginning, end, or specific position.
// Delete first node
temp = head;
head = head->next;
free(temp);
d. Searching: Traverse list and compare data.
while (temp != NULL) {
    if (temp->data == key)
        return true;
    temp = temp->next;
}
A doubly linked list has nodes with two pointers: one to the next node and one to the previous node.
struct DNode {
    int data;
    struct DNode* prev;
    struct DNode* next;
};
Advantages:
- The list can be traversed in both directions.
- Deleting a given node does not require tracking the previous node separately.
- Insertion before a given node is straightforward.
A Header Node is a special node that always exists at the beginning of the list. It may store metadata like length or simply serve as a dummy node to simplify list operations.
// Header node example: the data field stores the node count
struct Node {
    int count;
    struct Node* next;
};
Benefits of Header Nodes:
- Insertion and deletion at the front need no special-case code, since the list is never truly empty.
- The header can store metadata such as the number of nodes.
A Tree is a non-linear hierarchical data structure consisting of nodes connected by edges. The top node is called the root.
Terminology:
- Root: the topmost node.
- Parent / Child: a node directly above / below another node.
- Leaf: a node with no children.
- Edge: a link between two nodes.
- Height: the length of the longest path from the root down to a leaf.
- Depth: the length of the path from the root to a given node.
Traversal means visiting all nodes in a specific order. In binary trees, three types of traversals are common:
a. Inorder (LNR): Left → Node → Right
b. Preorder (NLR): Node → Left → Right
c. Postorder (LRN): Left → Right → Node
// Inorder traversal
void inorder(Node* root) {
    if (root != NULL) {
        inorder(root->left);
        printf("%d ", root->data);
        inorder(root->right);
    }
}

// Preorder traversal
void preorder(Node* root) {
    if (root != NULL) {
        printf("%d ", root->data);
        preorder(root->left);
        preorder(root->right);
    }
}

// Postorder traversal
void postorder(Node* root) {
    if (root != NULL) {
        postorder(root->left);
        postorder(root->right);
        printf("%d ", root->data);
    }
}
Insertion in a Binary Tree (using the BST ordering rule):
Node* insert(Node* root, int value) {
    if (root == NULL) {
        root = (Node*)malloc(sizeof(Node));
        root->data = value;
        root->left = root->right = NULL;
    } else if (value < root->data) {
        root->left = insert(root->left, value);
    } else {
        root->right = insert(root->right, value);
    }
    return root;
}
Deletion in a Binary Tree (using the BST ordering rule):
// Returns the node with the smallest key in a subtree
Node* findMin(Node* root) {
    while (root->left != NULL)
        root = root->left;
    return root;
}

Node* delete(Node* root, int key) {
    if (root == NULL) return root;
    if (key < root->data) {
        root->left = delete(root->left, key);
    } else if (key > root->data) {
        root->right = delete(root->right, key);
    } else {
        if (root->left == NULL) return root->right;
        else if (root->right == NULL) return root->left;
        // Two children: replace with the inorder successor
        Node* temp = findMin(root->right);
        root->data = temp->data;
        root->right = delete(root->right, temp->data);
    }
    return root;
}
A Binary Search Tree is a binary tree where each node follows this property: all keys in the left subtree are smaller than the node's key, and all keys in the right subtree are larger.
BST Example (insert 30, 20, 40, 10, 50):

        30
       /  \
     20    40
    /        \
  10          50
Advantages of BST:
- Search, insertion, and deletion take O(log n) time on average when the tree stays balanced.
- An inorder traversal visits the keys in sorted order.
A B-Tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time.
It is commonly used in databases and file systems where large blocks of data are read/written at once.
B-Trees were invented by Rudolf Bayer and Edward McCreight in the early 1970s at Boeing Scientific Research Labs. The goal was to create a structure suitable for disk-based storage and indexing systems.
Binary Search Trees (BSTs) are not always efficient when dealing with large datasets due to their height. If not balanced, the search time degrades to linear time.
There was a need for a tree structure that minimizes disk access and maintains balance, even with many insertions and deletions.
BSTs are good for in-memory operations, but they are not efficient for indexing in large disk-based systems because they become unbalanced and require many disk reads.
Example: Insert 10, 20, 5, 6, 12

      10
     /  \
    5    20
     \   /
      6 12
This results in skewed trees and poor performance.
B-Trees reduce height by allowing multiple keys per node and multiple children, resulting in fewer levels and efficient disk access.
Each node can have up to m children (where m is the order of the B-tree), and nodes remain balanced after operations.
B-Tree Properties (Order m):
- Every node has at most m children.
- Every internal node except the root has at least ⌈m/2⌉ children.
- The root has at least two children unless it is a leaf.
- A node with k children contains k - 1 keys, stored in sorted order.
- All leaves appear at the same level.
B-Trees are built in a bottom-up manner. New nodes split when full and push a key up to the parent, maintaining balance.
Insert keys: 10, 20, 30, 40, 50 (Order = 3)

Step 1: [10, 20]
Step 2: Insert 30 → [10, 20, 30] → Full → Split

        [20]
       /    \
    [10]    [30]

Step 3: Insert 40 → [30, 40]
Step 4: Insert 50 → [30, 40, 50] → Full → Split

      [20, 40]
     /    |   \
  [10]  [30]  [50]
Insertions: 5, 10, 15, 20, 25, 30

Step 1: [5, 10]
Step 2: Insert 15 → [5, 10, 15] → Full → Split

        [10]
       /    \
     [5]    [15]

Step 3: Insert 20 → [15, 20]
Step 4: Insert 25 → [15, 20, 25] → Full → Split

      [10, 20]
     /    |   \
   [5]  [15]  [25]

Step 5: Insert 30 → [25, 30]
The tree grows in a balanced and optimized way for indexing and search.
Sorting is the process of arranging data in a specific order, typically in ascending or descending order. Here are some common sorting techniques:
Insertion Sort builds the final sorted array one item at a time by repeatedly picking the next element and placing it in its correct position.
Algorithm:
for i = 1 to n-1:
    key = arr[i]
    j = i - 1
    while j >= 0 and arr[j] > key:
        arr[j+1] = arr[j]
        j = j - 1
    arr[j+1] = key
Time Complexity: O(n²) in the worst and average cases; O(n) when the input is already sorted.
Selection Sort divides the array into two parts: the sorted part and the unsorted part. It selects the smallest element from the unsorted part and swaps it with the leftmost unsorted element.
Algorithm:
for i = 0 to n-2:
    min_index = i
    for j = i+1 to n-1:
        if arr[j] < arr[min_index]:
            min_index = j
    swap(arr[i], arr[min_index])
Time Complexity: O(n²) in all cases.
Merge Sort is a divide-and-conquer algorithm that divides the array into two halves, recursively sorts each half, and then merges the sorted halves.
Algorithm:
MergeSort(arr):
    if length(arr) <= 1:
        return arr
    mid = length(arr) // 2
    left = MergeSort(arr[:mid])
    right = MergeSort(arr[mid:])
    return Merge(left, right)

Merge(left, right):
    result = []
    while left and right:
        if left[0] < right[0]:
            result.append(left.pop(0))
        else:
            result.append(right.pop(0))
    result.extend(left)
    result.extend(right)
    return result
Time Complexity: O(n log n) in all cases.
Heap Sort converts the array into a max heap and repeatedly removes the largest element to build the sorted array.
Algorithm:
1. Build a Max Heap from the array.
2. Swap the root (maximum element) with the last element of the heap.
3. Heapify the reduced heap.
4. Repeat steps 2 and 3 until the heap is empty.
Time Complexity: O(n log n) in all cases.
Searching is the process of finding an element in a data structure, such as an array or a list. Common searching techniques include:
Linear Search is a simple search algorithm that checks every element of the list until it finds the target element.
Algorithm:
for i = 0 to n-1:
    if arr[i] == target:
        return i
return -1
Time Complexity: O(n) in the worst case; O(1) in the best case, when the target is at the first position.
Binary Search is an efficient algorithm for finding an item from a sorted list. It repeatedly divides the search interval in half.
Algorithm:
low = 0
high = n - 1
while low <= high:
    mid = (low + high) // 2
    if arr[mid] == target:
        return mid
    elif arr[mid] < target:
        low = mid + 1
    else:
        high = mid - 1
return -1
Time Complexity: O(log n) in the worst case.
Hashing is a technique used to uniquely identify a specific item from a collection. A hash function maps the key to an index in a hash table.
Algorithm:
1. Define a hash function h(key) that returns an index.
2. Use the hash value to store or retrieve the item from the hash table.

Example:
hash_table = [None] * 10
index = hash_function(key)
hash_table[index] = value
Time Complexity: O(1) on average for both search and insert; O(n) in the worst case, when many keys collide.