# Notes on Data Structures

Based on the Data Structures Crash Course by AlgoExpert

# Introduction

In this blog, I have written a brief summary of the popular data structures and their time complexities, based on my learnings from the 5-hour Data Structures Crash Course.

# Data Structures

Data structures help us organize and manage data. We use them every day, e.g., emails or stacks of books.

*A data structure is a collection of data values, the relationships between them, and the operations that can be performed on them.*

e.g., trees, linked lists, arrays, graphs.

e.g., insert, search, delete.

# Complexity Analysis

Complexity analysis is the foundational knowledge needed to understand data structures, and it is the bedrock of coding interviews. It also helps in choosing the best solution among many possible ones.

*Complexity analysis is all about time and space complexity. Time complexity is a measure of how fast an algorithm runs. Space complexity is a measure of how much memory an algorithm uses.*

# Memory

Memory is the foundational knowledge needed to understand complexity analysis and data structures. *A variable is stored somewhere in the computer's memory. Memory is a bounded canvas with a finite number of memory slots. Generally, the less memory a program takes up, the better.*

A program stores a variable's data in free, back-to-back memory slots. Memory is made up of bits (0s and 1s); any piece of data can be transformed into a binary representation, which is then stored in memory in blocks of 8 bits (bytes).

Int in C++/Java = 32 bits = 4 bytes

Long in Java = 64 bits = 8 bytes

Integers are typically fixed-width, e.g., 32 or 64 bits. Endianness determines the order in which those bytes are laid out in memory.
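As a quick illustration (a sketch using Python's standard `struct` module), the same 32-bit integer can be laid out in either byte order:

```python
import struct

# Pack the same 32-bit integer with both byte orders; the value is
# identical, only the order of the 4 bytes in memory differs.
n = 1

big = struct.pack(">i", n)     # big-endian: most significant byte first
little = struct.pack("<i", n)  # little-endian: least significant byte first

print(big.hex())     # 00000001
print(little.hex())  # 01000000
```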

An array/list of numbers is stored in contiguous, back-to-back free memory slots, e.g., [1, 2, 3]. A string is split into characters, each character is mapped to its ASCII value, and that value is converted into binary digits and stored in memory slots.
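A minimal Python sketch of that character-to-bits mapping, using the built-in `ord` to get each character's ASCII code:

```python
# Each character maps to a code point (ASCII for these letters),
# which is what actually gets stored as bits in memory.
s = "abc"

codes = [ord(c) for c in s]               # ASCII values
bits = [format(c, "08b") for c in codes]  # each value as 8 bits

print(codes)  # [97, 98, 99]
print(bits)   # ['01100001', '01100010', '01100011']
```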

A pointer stores another memory address (in base 2), so we can quickly reference a far-away memory location without copying anything. All of these memory slots can be accessed very quickly by the computer, e.g., array access.

# Big-O Notation

We don't describe time complexity in seconds; that doesn't make sense, because run time depends on the size of the input. The speed of an algorithm is dependent on the input size.

The same program may also take a different run time on each execution because of the computer's performance, but that difference is negligible.

*Time and space complexity measure how an algorithm's run time and memory grow as the size of the input increases. Asymptotic analysis means analyzing the behavior of a function as its input tends to infinity.*

Big-O describes the space/time complexity in the worst-case scenario. Its unit is deliberately vague, but it can be thought of as the time taken for one elementary operation.

O(8) and O(1) are the same in asymptotic analysis

O(2N) is the same as O(N) as well.

In a sum like O(N³ + N²), the N² term is dropped because N³ dominates

But O(M+N) cannot be simplified, because M and N are independent variables
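A small sketch of why O(M+N) keeps both terms (the function name is made up for illustration):

```python
def count_all(list_a, list_b):
    """Touches every element of both lists once: O(M + N) time,
    where M = len(list_a) and N = len(list_b). Since M and N vary
    independently, neither term can be dropped."""
    total = 0
    for _ in list_a:   # O(M)
        total += 1
    for _ in list_b:   # O(N)
        total += 1
    return total

print(count_all([1, 2, 3], [4, 5]))  # 5
```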

# Logarithm

*log_b(x) = y is the same as b^y = x*

In maths, we normally take b as 10, but in CS we normally take b as 2 (the binary log), and x is usually N:

*log(N) = y is the same as 2^y = N*

log(1) = 0

log(2) = 1

log(4) = 2

log(8) = 3

log(16) = 4

As the value inside log doubles (2x), the log value increases only by 1 (+1)

and herein lies the great power and use of log.

This property is what makes logarithmic algorithms take so little time; hence, O(log N) is better than O(N).

log(1 million) ≈ 20, since 2^20 ≈ 1M

log(1 billion) ≈ 30, since 2^30 ≈ 1B

Binary search has a time complexity of O(log n) because it halves the search range each time, much like cutting a binary tree in half at every step. If you are cutting the number of operations in half each time, you are using O(log n) time.
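A minimal Python sketch of binary search over a sorted array:

```python
def binary_search(arr, target):
    """Halves the search range each step, so at most about log2(n) + 1
    comparisons are made: O(log n) time, O(1) space."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1   # discard the left half
        else:
            high = mid - 1  # discard the right half
    return -1  # not found

print(binary_search([1, 3, 5, 7, 9, 11], 7))  # 3
print(binary_search([1, 3, 5, 7, 9, 11], 4))  # -1
```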

# Arrays

*Arrays are one of the most fundamental DS of all. Arrays are called lists in Python. An array's values are converted into their binary formats and stored in contiguous, back-to-back memory slots.*

Arrays come in 2 types: static and dynamic. C++ and Java distinguish between the two, while Python lists and JavaScript arrays are always dynamic.

A dynamic array is an array that can change in size. In C++ it is called a vector, and in Java it is an ArrayList. It allows us to have faster insertions at the end of the array.

Space and Time Complexity

| Operation (array of size n) | Time | Space |
| --- | --- | --- |
| Get an element | O(1) | O(1) |
| Set/Replace an element | O(1) | O(1) |
| Initializing | O(n) | O(n) |
| Traversing | O(n) | O(1) |
| Copying | O(n) | O(n) |
| Inserting a new element (static) | O(n) | O(1) |
| Inserting at the end (dynamic) | O(1) avg, O(n) worst | O(1) |

When we map, filter, or reduce an array, it is O(n) time.

When we insert at the beginning or middle, it is O(n) time even in a dynamic array. Popping from the end is O(1), but shifting elements is O(n).

*Amortized analysis takes into consideration both the rare, expensive cases when things take a lot of time and the common, easy cases when they do not, averaging the cost over a sequence of operations.*
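A toy sketch of how a dynamic array might achieve amortized O(1) appends by doubling its capacity (a simplified illustration, not how any particular language implements it):

```python
class DynamicArray:
    """Minimal sketch of a dynamic array: appends are O(1) on average
    because the backing storage doubles whenever it fills up, so the
    occasional O(n) copy is amortized over many cheap appends."""

    def __init__(self):
        self.capacity = 1
        self.length = 0
        self.data = [None] * self.capacity

    def append(self, value):
        if self.length == self.capacity:   # full: the rare O(n) case
            self.capacity *= 2
            new_data = [None] * self.capacity
            new_data[:self.length] = self.data
            self.data = new_data
        self.data[self.length] = value     # the common O(1) case
        self.length += 1

arr = DynamicArray()
for i in range(10):
    arr.append(i)
print(arr.length, arr.capacity)  # 10 16
```

Note that the capacity grows 1 → 2 → 4 → 8 → 16, so only 4 resizes happen across 10 appends.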

# Linked List

A linked list (LL) is a fundamental DS that is used in coding interviews and forms the base of other useful DS. *It is very similar to an array, but it is stored/implemented in memory in a different way than an array.*

Arrays are read from left to right, and even a singly linked list is read from left to right. An array is stored in contiguous memory blocks, but a linked list's nodes can be stored anywhere in memory.

**An LL connects its nodes using pointers, each storing the address of the next node. Each node in an LL has a value and a pointer to the next node.** Thus, an LL takes roughly 2x the space of an array because it also stores the pointers. The first node is called the head and the last node is called the tail.

LLs come in different variations;

1) Singly LL

2) Doubly LL

3) Circular LL, etc.

Space and Time Complexity

| Operation (LL of size n) | Time | Space |
| --- | --- | --- |
| Get an element (worst case) | O(n) | O(1) |
| Set/Replace an element (worst case) | O(n) | O(1) |
| Initializing | O(n) | O(n) |
| Traversing | O(n) | O(1) |
| Copying | O(n) | O(n) |
| Inserting/Deleting at the beginning | O(1) | O(1) |
| Inserting/Deleting at position x | O(x) | O(1) |
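A minimal Python sketch of a singly linked list with O(1) insertion at the head and O(n) traversal (class and function names are made up for illustration):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None  # pointer to the next node (None = tail)

def prepend(head, value):
    """Insert at the head: O(1) time, since no traversal is needed."""
    node = Node(value)
    node.next = head
    return node  # the new node becomes the head

def to_list(head):
    """Traverse from head to tail: O(n) time."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values

head = None
for v in [3, 2, 1]:
    head = prepend(head, v)
print(to_list(head))  # [1, 2, 3]
```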

# Hash Tables

Hash tables are an extremely powerful and common DS. Using a hash table is quite easy, and a lot of programming languages have built-in hash table implementations, e.g., JavaScript objects and Python dictionaries. But they are pretty complicated under the hood.

*A hash table is a DS of key-value pairs. We can access a value given a key, but not the other way around. Keys can be strings, ints, or even other (hashable) data structures.*

Under the hood, a hash table is built on top of an array. A hash function transforms the key into an index, and that index determines where the value is stored in the array. This is what allows O(1) average-time insertion/deletion/search.

It is an array where each index maps to a linked list of potential values. This handles the case when 2 keys get hashed to the same index by the hash function, known as a collision. Every node of the linked list points back to its original key.

In the worst case, all keys could land in one long linked list of n nodes, leading to O(n) operations. But good hash functions distribute keys well, making this worst case rare. If the underlying array fills up, the hash table resizes itself into a bigger array, which reduces collisions.
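A toy Python sketch of a hash table that resolves collisions with chaining (no resizing; names made up for illustration):

```python
class ChainedHashTable:
    """Toy hash table: each bucket holds a list of (key, value) pairs,
    so two keys hashing to the same index simply share a bucket."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # Hash the key, then fold it into the array's index range.
        return hash(key) % len(self.buckets)

    def set(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # overwrite an existing key
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # collision-safe: append to the chain

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

table = ChainedHashTable()
table.set("a", 1)
table.set("b", 2)
print(table.get("a"))  # 1
```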

Space-Time Complexity

| Operation | Time (avg) | Time (worst) | Space |
| --- | --- | --- | --- |
| Insertion | O(1) | O(n) | O(1) |
| Deletion | O(1) | O(n) | O(1) |
| Search | O(1) | O(n) | O(1) |
| Initializing | O(n) | O(n) | O(n) |

# Stacks & Queues

Stacks and queues are interesting DS that are commonly used in coding interviews. They are not hard to implement, and they have intuitive real-life examples, e.g., a stack of books on a table and a queue of people at the cinema.

A stack is a DS that supports insertion/deletion following LIFO (last in, first out)

A queue is a DS that supports insertion/deletion following FIFO (first in, first out)

The queue is basically the opposite of the stack. *A stack is a list of elements in some sort of order that follows LIFO. A queue is a list of elements in some sort of order that follows FIFO.*

Space-Time Complexity in Stacks and Queues

| Operation | Time | Space |
| --- | --- | --- |
| Insertion/Deletion of one element | O(1) | O(1) |
| Searching for one element | O(n) | O(1) |
| Initializing/Storing the whole stack/queue | O(n) | O(n) |

A stack is actually a dynamic array under the hood, i.e., it can be implemented as one, since we don't care about adding elements in the middle or at the bottom of the stack. e.g., max stack, min stack

Insert = Push

Delete = Pop

But a dynamic array can NOT be used for a queue: adding or removing an element at the beginning is not constant time. A queue is typically implemented as a linked list. e.g., priority queue

Insert = Enqueue

Delete = Dequeue
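A quick Python illustration of both structures, using a list as a stack and `collections.deque` as a queue:

```python
from collections import deque

# Stack: a Python list gives O(1) push/pop at the end (LIFO).
stack = []
stack.append("a")   # push
stack.append("b")   # push
print(stack.pop())  # b  (last in, first out)

# Queue: deque gives O(1) enqueue at one end and dequeue at the other (FIFO).
queue = deque()
queue.append("a")        # enqueue
queue.append("b")        # enqueue
print(queue.popleft())   # a  (first in, first out)
```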

# Strings

**The string is a fundamental data type, but it also works as a data structure.** In ASCII, a = 97 and A = 65. Every char in a string is stored in a fixed number of bytes; hence, it works like an array.

A string is stored in memory as an array of chars, where each char is mapped to an integer by an encoding, e.g., ASCII.

Space-Time Complexity of Strings

| Operation | Time | Space |
| --- | --- | --- |
| Traverse | O(n) | O(1) |
| Copy | O(n) | O(n) |
| Get | O(1) | O(1) |

In C++, a string is mutable, so adding a new value at the end is O(1). But in Java, Python, JavaScript, etc., strings are immutable: adding a new value at the end creates a brand-new string, so concatenating strings of lengths n and m costs O(n+m) in space and time. Hence, it is recommended to split the string into an actual array of characters in code, so that it acts like a mutable string and operations run faster.
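A common Python version of that workaround: collect characters in a mutable list and join once at the end (the function name is made up for illustration):

```python
def build_string(parts):
    """Collect pieces in a list (mutable), then join once at the end:
    overall O(n), versus up to O(n^2) for repeated concatenation of
    an immutable string (each concatenation copies everything so far)."""
    chars = []
    for p in parts:
        chars.append(p)    # O(1) amortized per append
    return "".join(chars)  # one O(n) pass

print(build_string(["a", "b", "c"]))  # abc
```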

# Graphs

Graphs are a very common DS. They are used every day, all the time. *A graph is a collection of nodes that may or may not be connected together. Every node is called a vertex, and the connections are called edges.*

Key things about graphs;

1) Connected / Disconnected Graphs

e.g., a location in the world that can only be reached with some special vehicle

2) Directed / Undirected

e.g., plane flight paths, friends on FB

3) Cyclic / Acyclic Graphs

e.g., pages and links on Wikipedia

Graphs pop up a lot in coding problems.

e.g., a 2D array where we care about neighboring nodes (elements)

e.g., a char in a string that is being replaced/swapped

In code, a graph is represented by an adjacency list, and the edges of the graph are basically pointers in memory.

Space-Time Complexity

Storing a graph = O(v + e) space

Traversing a graph (DFS/BFS) = O(v + e) time
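A minimal Python sketch of an adjacency list and a breadth-first traversal over it (the sample graph is made up for illustration):

```python
from collections import deque

# Adjacency list: each vertex maps to the list of vertices it connects to.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}

def bfs(graph, start):
    """Breadth-first traversal: every vertex and edge is processed
    once, giving O(v + e) time."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        order.append(vertex)
        for neighbor in graph[vertex]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order

print(bfs(graph, "A"))  # ['A', 'B', 'C', 'D']
```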

# Trees

Trees are one of the most prominent DS in all of CS and SE, and a tree is a type of graph.

*A tree is a type of graph that is:*

1) rooted

2) made of nodes that can have child nodes

3) directed

4) acyclic

5) such that each node has only one parent

6) not allowed to be disconnected

e.g., a management chain, a human family tree

There are a lot of different types of Trees;

1) Binary Tree

2) Ternary Tree

3) K-nary Tree

4) Binary Search Trees

5) Min/Max Heap

6) Tries

Space-Time Complexity

Storing a tree = O(n)

Traversing the whole tree = O(n)

Traversing one root-to-leaf path of a binary tree = O(log n) on average, O(n) in the worst case

Red Black Trees and AVL Trees are examples of a balanced tree that maintains O(logN) always.
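A minimal Python sketch of a binary search tree, where search walks a single root-to-leaf path (names made up for illustration):

```python
class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(root, value):
    """Walk one root-to-leaf path: O(log n) on a balanced tree,
    O(n) in the worst case (a degenerate, list-like tree)."""
    if root is None:
        return BSTNode(value)
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root

def contains(root, value):
    # Same single-path walk, iteratively.
    while root is not None:
        if value == root.value:
            return True
        root = root.left if value < root.value else root.right
    return False

root = None
for v in [5, 3, 8, 1, 4]:
    root = insert(root, v)
print(contains(root, 4))  # True
print(contains(root, 7))  # False
```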

Key Terminologies in a tree;

1) Branch

2) Leaf

3) Level

4) Complete Tree

5) Full Tree

6) Perfect Tree

7) Depth/Height

# Conclusion & Course Review

It is a 5-hour course that covers only the theory of data structures; not a single line of code is taught. Overall, it helps a lot in strengthening foundational knowledge of data structures and can be a great starting point for further study. I would highly recommend this course to all budding computer science enthusiasts.

Rating: (4/5): ★★★★