Data Structure refers to a particular way of organizing and storing data in a computer so that it can be accessed and modified efficiently.
Common Data Structures: Arrays, Linked Lists, Stacks, Queues, Trees, Graphs, etc.
Array is a collection of elements stored in contiguous memory locations and can be accessed using an index.
A one-dimensional array is a linear structure where elements are stored one after another.
int arr[5] = {10, 20, 30, 40, 50};
Memory Representation:
Index:  0   1   2   3   4
Value:  10  20  30  40  50
Used to store matrices or tabular data. Data is stored in a row-major or column-major format.
int matrix[2][3] = { {1, 2, 3}, {4, 5, 6} };
Logical View:
1 2 3
4 5 6

Memory (Row-Major): 1 2 3 4 5 6
A Sparse Array contains mostly zero values. Storing all zeroes wastes space, so we use optimized formats.
Lower Triangular Matrix: all elements above the main diagonal are zero.
Example (3x3):
1 0 0
4 5 0
7 8 9
Upper Triangular Matrix: all elements below the main diagonal are zero.
Example (3x3):
1 2 3
0 5 6
0 0 9
Tridiagonal Matrix: only the main diagonal and the two diagonals adjacent to it (upper and lower) have non-zero values.
Example (5x5):
x y 0 0 0
y x y 0 0
0 y x y 0
0 0 y x y
0 0 0 y x
Instead of storing all elements (including zeros), we store only non-zero elements with their row and column indices.
Original Sparse Matrix:
0 0 5
0 8 0
3 0 0

Vector (Triplet) Representation:
Row  Col  Value
0    2    5
1    1    8
2    0    3
This format saves memory and is useful for mathematical operations like matrix addition and multiplication.
A Stack is a linear data structure that follows the LIFO (Last In First Out) principle.
Primitive Stack Operations:
- Push: insert an element at the top.
- Pop: remove the top element.
- Peek (Top): read the top element without removing it.
- isEmpty / isFull: check whether the stack is empty or full.
Stack Example (Push/Pop):
Push: 10 → 20 → 30
Stack (top first): [30, 20, 10]
Pop → 30
New Stack: [20, 10]
Infix: Operator is in-between operands (A + B)
Postfix (Reverse Polish): Operator comes after operands (A B +)
Prefix (Polish): Operator comes before operands (+ A B)
Infix:   A + B
Postfix: A B +
Prefix:  + A B
Use a stack to evaluate postfix expressions. Scan left to right:
Postfix: 5 6 2 + * 12 4 / -
Steps:
→ Push 5, 6, 2
→ + : pop 6 and 2 → 6 + 2 = 8 → push 8
→ * : pop 5 and 8 → 5 * 8 = 40 → push 40
→ Push 12, 4
→ / : pop 12 and 4 → 12 / 4 = 3 → push 3
→ - : pop 40 and 3 → 40 - 3 = 37
Result = 37
Infix → Postfix: Use stack and precedence rules.
Infix → Prefix: Reverse, convert to postfix, reverse again.
Prefix → Infix/Postfix: Use stack-based logic.
Infix:   (A + B) * C
Postfix: A B + C *
Prefix:  * + A B C
A Queue is a linear data structure that follows the FIFO (First In First Out) principle.
Primitive Queue Operations:
- Enqueue: insert an element at the rear.
- Dequeue: remove an element from the front.
- Front: read the front element without removing it.
- isEmpty / isFull: check whether the queue is empty or full.
Queue Example:
Enqueue: 10 → 20 → 30
Queue: [10, 20, 30]
Dequeue → 10
New Queue: [20, 30]
A Deque (double-ended queue) allows insertion and deletion at both the front and rear ends.
Types:
- Input Restricted Deque: insertion at the rear only; deletion at both ends.
- Output Restricted Deque: deletion from the front only; insertion at both ends.
Each element has a priority; higher-priority elements are dequeued before lower-priority ones.
Example (a larger number means higher priority):
Enqueue: [Task A, Priority 2], [Task B, Priority 1], [Task C, Priority 3]
Dequeue order: Task C → Task A → Task B
A Linked List is a linear data structure where each element (node) contains data and a pointer to the next node. Unlike arrays, linked lists do not store elements in contiguous memory locations.
Structure of a Node:

struct Node {
    int data;
    struct Node* next;
};
Sequential (Array)          | Linked List
----------------------------|------------------------------
Fixed size                  | Dynamic size
Easy indexing               | Sequential access
Costly insertion/deletion   | Efficient insertion/deletion
Possible memory waste       | Efficient memory usage
a. Traversal: Visit each node starting from the head.
while (temp != NULL) {
    printf("%d ", temp->data);
    temp = temp->next;
}
b. Insertion: Add a new node at beginning, end, or specific position.
// Insert at beginning
new_node->next = head;
head = new_node;
c. Deletion: Remove a node from beginning, end, or specific position.
// Delete first node
temp = head;
head = head->next;
free(temp);
d. Searching: Traverse list and compare data.
while (temp != NULL) {
    if (temp->data == key)
        return true;
    temp = temp->next;
}
A doubly linked list has nodes with two pointers: one to the next node and one to the previous node.
struct DNode {
    int data;
    struct DNode* prev;
    struct DNode* next;
};
Advantages:
- The list can be traversed in both directions.
- Deleting a given node does not require tracking the previous node separately.
- Insertion before a given node is straightforward.
A Header Node is a special node that always exists at the beginning of the list. It may store metadata like length or simply serve as a dummy node to simplify list operations.
// Header node example: the data field stores the node count
struct Node {
    int count;
    struct Node* next;
};
Benefits of Header Nodes:
- Insertion and deletion at the front need no special-case code, since the list is never truly empty.
- The header can store metadata such as the number of nodes.
A Tree is a non-linear hierarchical data structure consisting of nodes connected by edges. The top node is called the root.
Terminology:
- Root: the topmost node.
- Parent / Child: a node directly above / below another node.
- Leaf: a node with no children.
- Edge: a link between two nodes.
- Height: the length of the longest path from the root down to a leaf.
- Depth: the length of the path from the root to a given node.
Traversal means visiting all nodes in a specific order. In binary trees, three types of traversals are common:
a. Inorder (LNR): Left → Node → Right
b. Preorder (NLR): Node → Left → Right
c. Postorder (LRN): Left → Right → Node
// Inorder traversal
void inorder(Node* root) {
    if (root != NULL) {
        inorder(root->left);
        printf("%d ", root->data);
        inorder(root->right);
    }
}

// Preorder traversal
void preorder(Node* root) {
    if (root != NULL) {
        printf("%d ", root->data);
        preorder(root->left);
        preorder(root->right);
    }
}

// Postorder traversal
void postorder(Node* root) {
    if (root != NULL) {
        postorder(root->left);
        postorder(root->right);
        printf("%d ", root->data);
    }
}
Insertion in a Binary Tree (using the BST ordering rule):
Node* insert(Node* root, int value) {
    if (root == NULL) {
        root = (Node*)malloc(sizeof(Node));
        root->data = value;
        root->left = root->right = NULL;
    } else if (value < root->data) {
        root->left = insert(root->left, value);
    } else {
        root->right = insert(root->right, value);
    }
    return root;
}
Deletion in a Binary Tree (using the BST ordering rule):
// Returns the node with the smallest key in a subtree
Node* findMin(Node* root) {
    while (root->left != NULL)
        root = root->left;
    return root;
}

Node* delete(Node* root, int key) {
    if (root == NULL) return root;
    if (key < root->data) {
        root->left = delete(root->left, key);
    } else if (key > root->data) {
        root->right = delete(root->right, key);
    } else {
        if (root->left == NULL) return root->right;
        else if (root->right == NULL) return root->left;
        // Two children: replace with the inorder successor
        Node* temp = findMin(root->right);
        root->data = temp->data;
        root->right = delete(root->right, temp->data);
    }
    return root;
}
A Binary Search Tree is a binary tree where each node follows this property: all keys in the left subtree are smaller than the node's key, and all keys in the right subtree are larger.
BST Example (insert 30, 20, 40, 10, 50):

        30
       /  \
     20    40
    /        \
  10          50
Advantages of BST:
- Search, insertion, and deletion take O(log n) time on average when the tree stays balanced.
- An inorder traversal visits the keys in sorted order.
A B-Tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time.
It is commonly used in databases and file systems where large blocks of data are read/written at once.
B-Trees were invented by Rudolf Bayer and Edward McCreight in the early 1970s at Boeing Scientific Research Labs. The goal was to create a structure suitable for disk-based storage and indexing systems.
Binary Search Trees (BSTs) are not always efficient when dealing with large datasets due to their height. If not balanced, the search time degrades to linear time.
There was a need for a tree structure that minimizes disk access and maintains balance, even with many insertions and deletions.
BSTs are good for in-memory operations, but they are not efficient for indexing in large disk-based systems because they become unbalanced and require many disk reads.
Example: Insert 10, 20, 5, 6, 12

      10
     /  \
    5    20
     \   /
      6 12
This results in skewed trees and poor performance.
B-Trees reduce height by allowing multiple keys per node and multiple children, resulting in fewer levels and efficient disk access.
Each node can have up to m children (where m is the order of the B-tree), and nodes remain balanced after operations.
B-Tree Properties (Order m):
- Every node has at most m children.
- Every internal node except the root has at least ⌈m/2⌉ children.
- The root has at least two children unless it is a leaf.
- A node with k children contains k - 1 keys, stored in sorted order.
- All leaves appear at the same level.
B-Trees are built in a bottom-up manner. New nodes split when full and push a key up to the parent, maintaining balance.
Insert keys: 10, 20, 30, 40, 50 (Order = 3)

Step 1: [10, 20]
Step 2: Insert 30 → [10, 20, 30] → Full → Split

        [20]
       /    \
    [10]    [30]

Step 3: Insert 40 → [30, 40]
Step 4: Insert 50 → [30, 40, 50] → Full → Split

      [20, 40]
     /    |   \
  [10]  [30]  [50]
Insertions: 5, 10, 15, 20, 25, 30

Step 1: [5, 10]
Step 2: Insert 15 → [5, 10, 15] → Full → Split

        [10]
       /    \
     [5]    [15]

Step 3: Insert 20 → [15, 20]
Step 4: Insert 25 → [15, 20, 25] → Full → Split

      [10, 20]
     /    |   \
   [5]  [15]  [25]

Step 5: Insert 30 → [25, 30]
The tree grows in a balanced and optimized way for indexing and search.
Sorting is the process of arranging data in a specific order, typically in ascending or descending order. Here are some common sorting techniques:
Insertion Sort builds the final sorted array one item at a time by repeatedly picking the next element and placing it in its correct position.
Algorithm:
for i = 1 to n-1:
    key = arr[i]
    j = i - 1
    while j >= 0 and arr[j] > key:
        arr[j+1] = arr[j]
        j = j - 1
    arr[j+1] = key
Time Complexity: O(n²) in the worst and average cases; O(n) when the input is already sorted.
Selection Sort divides the array into two parts: the sorted part and the unsorted part. It selects the smallest element from the unsorted part and swaps it with the leftmost unsorted element.
Algorithm:
for i = 0 to n-2:
    min_index = i
    for j = i+1 to n-1:
        if arr[j] < arr[min_index]:
            min_index = j
    swap(arr[i], arr[min_index])
Time Complexity: O(n²) in all cases.
Merge Sort is a divide-and-conquer algorithm that divides the array into two halves, recursively sorts each half, and then merges the sorted halves.
Algorithm:
MergeSort(arr):
    if length(arr) <= 1:
        return arr
    mid = length(arr) // 2
    left = MergeSort(arr[:mid])
    right = MergeSort(arr[mid:])
    return Merge(left, right)

Merge(left, right):
    result = []
    while left and right:
        if left[0] < right[0]:
            result.append(left.pop(0))
        else:
            result.append(right.pop(0))
    result.extend(left)
    result.extend(right)
    return result
Time Complexity: O(n log n) in all cases.
Heap Sort converts the array into a max heap and repeatedly removes the largest element to build the sorted array.
Algorithm:
1. Build a Max Heap from the array.
2. Swap the root (maximum element) with the last element of the heap.
3. Heapify the reduced heap.
4. Repeat steps 2 and 3 until the heap is empty.
Time Complexity: O(n log n) in all cases.
Searching is the process of finding an element in a data structure, such as an array or a list. Common searching techniques include:
Linear Search is a simple search algorithm that checks every element of the list until it finds the target element.
Algorithm:
for i = 0 to n-1:
    if arr[i] == target:
        return i
return -1
Time Complexity: O(n) in the worst case; O(1) in the best case, when the target is at the first position.
Binary Search is an efficient algorithm for finding an item from a sorted list. It repeatedly divides the search interval in half.
Algorithm:
low = 0
high = n - 1
while low <= high:
    mid = (low + high) // 2
    if arr[mid] == target:
        return mid
    elif arr[mid] < target:
        low = mid + 1
    else:
        high = mid - 1
return -1
Time Complexity: O(log n) in the worst case.
Hashing is a technique used to uniquely identify a specific item from a collection. A hash function maps the key to an index in a hash table.
Algorithm:
1. Define a hash function h(key) that returns an index.
2. Use the hash value to store or retrieve the item from the hash table.

Example:
hash_table = [None] * 10
index = hash_function(key)
hash_table[index] = value
Time Complexity: O(1) on average for both search and insert; O(n) in the worst case, when many keys collide.