Analysing the data structure performance of an Array (read, search, insertion and deletion

07 Jul 2020

Array

An array is one of the most basic data structures. It is a list of data elements, say for example, a shopping list. The index of an array is the number that identifies where a piece of data lives inside the array. In most programming languages, we begin counting the index at 0.

To understand the performance of a data structure, we need to analyze the common ways that our code might interact with that data structure.

Most data structures are used in four basic ways, which we refer to as operations.

When we measure how fast an operation takes, we don't refer to how fast the operation takes in terms of pure time, but instead in how many steps it takes. The amount of time something takes will differ from computer to computer. Whereas we can assume that operation A that takes 5 steps to complete will always be faster than operation B that takes 10 steps to complete on all pieces of hardware.

Four operations of an array

Reading (1 step)

Reading is where we look up what value is contained at a particular index inside an array.

Reading an array takes just one step, because the computer has the ability to jump to any particular index in the array and look inside.

A computer's memory can be viewed as a giant collection of cells.When a program declares an array, it allocates a contiguous set of empty cells for use in the program. So if you were creating an array meant to hold five elements, then your computer would find any group of five empty cells in a row and designate it to serve your array.

Every cell in a computers memory has a specific address (kind of like a street address). Each cells memory address is one number greater than the previous cell.

When the computer reads a value at a particular index of an array, it can jump straight to that index in one step because recorded in each array is the memory address which it begins at (e.g. our array might begin with index 0 at memory address 1010). Index 3 will be exactly three slots past index 3 (1013).

An operation with just one step is the fastest type of operation, which is why an array is such a powerful data structure, because we can look up the value at any index in a single step.

Searching N steps (linear search)

Searching an array is looking to see whether a particular value exists within an array and if so, which index it's located at.

To search for a value within an array, the computer starts at index 0, checks the value, and if it doesn't find what it's looking for, moves on to the next index. It does this until it finds the value it's seeking.

shopping_list = ["apples", "bananas", "cucumbers", "dates", "elderberries"]

If we want to find if the shopping list contains "dates" and where it is in the list, we would have to check each index from 0 individually until we found the index containing the value dates. 1. Index 0 contains apples, not dates. 2. Index 1 also doesn't contain the value dates. 3. Neither does index 2. 4. Index 3 does contain the value dates!!! Finding dates, took a total of four steps.

Checking each cell one at a time like this is called a linear search.

If the value we are seeking happens to be in the final cell in the array, then the computer would end up searching through every cell of the array until it finally finds the value it's lookinf for. Also, if the value we're looking for doesn't occur in the array at all, the computer would likewise have to search every cell so that it can be sure that the value doesn't exist within the array.

So, for an array of five cells, the maximum number of steps that a linear search would take is 5.

Another way of saying this is that for N cells in an array, linear search will take a maximum of N steps.

Searching is less efficient than reading, because it can take many steps, whereas reading always takes just one step no matter how large the array.

Insertion (N + 1) steps

The efficiency of inserting a new piece of data inside an array depends on where inside the array you'd like to insert it.

If we wanted to add "figs" to the end of our shopping list, this would only take one step, because the computer knows how many elements the array currently contains, and it knows which memory address the array begins at, so it can calculate where to insert the item in just one step.

Whereas inserting a new piece of data at the beginning or the middle of an array takes a lot more steps, because in these cases, we need to shift many pieces of data to make room for what we're inserting.

If we wanted to add "figs" to index 2 of our array, we'd need to move "cucumbers", "dates" and "elderberries" to the right by one cell in order to make room for the figs (3 steps to acheive this). Finally, we can insert "figs" into the now empty cell at index 2. This entire process for this particular small array took 4 steps. Three of the steps were shifting data to the right, while one step was th eactual insertion of the new value.

The worst-case scenario for inserting into an array is where we insert data at the beginning of the array, because we have to move all of the other values to the right.

So we can say that insertion in the worst-case scenario can take up to N + 1 steps for an array containing N elements (shifting every data element in the array, plus one assertion).

Deletion N steps

Deletion is the process of removing the value at a particular index. Deletion is like insertion, but in reverse.

If we want to delete the value at index 2, first we'd have to delete the value at index 2, and then we'd have to shift all of the elements to the left of their original index until we get to the end of the array.

Like insertion, the worst-case scenario of deleting an element is deleting the very first element of the array, because we would first remove the first element, and then we would have to shift all following elements to the left by 1 until the gap where the deletion was made is closed.

For an array of 5 elements, we'd spend one step deleting the first elements, and then four steps shifting the four remaining elments. For an array containing N elements, the maximum number of steps that deletion would take is N steps.

Kotlin example


val num = arrayOf(1,2,3,4,5) //implicit type declaration
val num = arrayOf<Int>(1,2,3,4,5) //explicit type declaration

//using an array constructor
val num = Array(3, {i-> i*1})

//initializes an array where each index is given the value of the array index * 1. So in this case 1,2,3

//factory methods to create arrays of primitive types
val num = intArrayOf(1,2,3,4,5)
byteArrayOf()
charArrayOf()
shortArrayOf()
longArrayOf()

Performance steps in worst case scenario where each operation is performed only once: 17 steps

Sets

Sets are just like arrays, except sets are arrays that contain no duplicate values. For example, you wouldn't want a phone book that lists your phone number twice for different people.

The performance for a set is the same as an array in all operations except for the insertion operation.

If we want to add an item to our shopping list which is stored in a set, we would first have to check to see whether the item already exists in the array by checking all of the indexes for the value, then shift all elements following the insertion index to the left by one, and finally insert your element in the newly emptied index.

If we wanted to insert an element to the start of our set, the performance would be 2N + 1.

In this case, we know that the array would be more efficient, but in the case of a phone book application, the set would be better despite the performance tradeoff for the insertion operation.