Wednesday, May 23, 2012

Linear Solution: Finding Kth smallest element in an array using Quickselect algorithm

Finding kth smallest element array is a common question across interviews. When this question is asked,  do not even bother to ask whether the input array is sorted. This question can be asked in different ways like:

  • Find the nth largest element in an unsorted array
  • Given two unsorted arrays find kth smallest element in both of the arrays and likewise.
Most widely known solution for this problem is to use QuickSelect algorithm, which is a modification of QuickSort itself. 

A brute force approach would be to sort the array and then to pick the kth largest or smallest element. How do we improve this solution without sorting the entire array?

Quicksort:

Before we get into QuickSelect , let us understand Quicksort in brief. Quicksort works faster in practical than explained. Quicksort can sort n items in O(nlogn) time in an average case. In worst case, Quicksort makes upto n^2 comparisons. With in-space partitioning we can achieve a space complexity of O(n). How a quick sort works?

Simple Quicksort Algorithm:

a. Find a pivot. Pivot is an index inside the bounds of input array chosen intelligently, uniformly and in a random way(Well that's way to make Quicksort efficient!). Let us call this step as "Pivot identification"
b. Group all the elements whose value is lesser than element at index 'pivot' into one array and all the elements that are greater than the value at index pivot into another array. Let us call this step as "partitioning".
c. Recursively call quicksort method on these smaller arrays and concatenate smaller array, pivot and greater array to make the sorted array. 

Simple enough, but this uses unnecessary space for arrays that are created while partitioning. Now lets do "partitioning" "In-place". We call an algorithm as in-space when algorithm works on the given data structure rather than using extra space.  

To make partitioning in-space:

a. We need to modify all the values in the given array such that all the values below pivot's index are less than value at pivot's index and all the values above pivot's index are greater than value at pivot's index.

b. Above step can be achieved by swapping all the elements that are less than value at pivot's index towards starting of array and keep this count say 'storeIndex'. Once we this is done all the values after pivot will be greater than value at pivot. To simplify the swapping lets move pivot element to last element of array using swapping and once we swap other values as required in step -a , we will move back pivot to end of the count 'storeIndex'. This storeIndex will have pivot's index at the end of partitioning. 

Pseudo code for this as in wiki:
 // left is the index of the leftmost element of the array
   // right is the index of the rightmost element of the array (inclusive)
   //   number of elements in subarray = right-left+1
   function partition(array, 'left', 'right', 'pivotIndex')
      'pivotValue' := array['pivotIndex']
      swap array['pivotIndex'] and array['right']  // Move pivot to end
      'storeIndex' := 'left'
      for 'i' from 'left' to 'right' - 1  // left ≤ i < right
          if array['i'] < 'pivotValue'
              swap array['i'] and array['storeIndex']
              'storeIndex' := 'storeIndex' + 1
      end for
      swap array['storeIndex'] and array['right']  // Move pivot to its final place
      return 'storeIndex'



Now lets write in-place Quicksort alogirthm:

Quicksort(array, start ,end)
{
    if array.length <= 1 return array; //recursive return condition
    choose  pivot such that start<=pivot<=end
    pivotindex = partition( array, start, end , pivot);
    return Quicksort(array, start, pivotindex -1) + pivot element + Quicksort(array, pivotindex+1, end); 
}


Let us end a about quicksort now and concentrate on given problem to find out the kth smallest element. To do this we will use quick sort, partitioning and finding pivot that we learnt above.


Quickselect

Quickselect is an alteration of quick sort, where we

  • Find the pivot 
  • Partition based on pivot
  • Check if the required 'k' belongs to which of these partitions and recursively use above steps to find out where is kth smallest element
Here only alteration is 3rd step, where we need to calculate which part of array we need to recursively select to find kth smallest element. Logic is ,
"When we partition an array based on pivot_index then there will be pivot_index -1 elements in first part of array and (actual length of array - pivot_index) in second part of array. If we have find "kth" smallest element, check if k falls in (0, pivot_index-1) or (pivot_index+1, end) sub array. Based on this decision recursively find out where it falls"

Below algorithm is from Yale university's wiki. Algorithm itself is self explanatory. 

*Finding the pivot in a randomized way will make it difficult to understand the complexity of algorithm. 
*In any average case, this algorithm will work in O(n) [tough calculations involved to prove this since we use randomized way to find out the pivot]. Even in worst case this algorithm does not grow beyond O(n^2).
Hence it is better than sorting out whole array and picking from the index, which in better case can be done in O(nlogn).
* Using in-place version of same algorithm will keep the space complexity in O(n).


QuickSelect(A, k)
  let r be chosen uniformly at random in the range 1 to length(A)
  let pivot = A[r]
  let A1, A2 be new arrays
  # split into a pile A1 of small elements and A2 of big elements
  for i = 1 to n
    if A[i] < pivot then
      append A[i] to A1
    else if A[i] > pivot then
      append A[i] to A2
    else
      # do nothing
  end for
  if k <= length(A1):
    # it's in the pile of small elements
    return QuickSelect(A1, k)
  else if k > length(A) - length(A2)
    # it's in the pile of big elements
    return QuickSelect(A2, k - (length(A) - length(A2))
  else
    # it's equal to the pivot
    return pivot

More Info and references:
1. http://en.wikipedia.org/wiki/Quicksort : Quicksort implementations and pseudo code snippets
2. http://pine.cs.yale.edu/pinewiki/QuickSelect :Quickselect algorithm and complexity calculation.

No comments:

Post a Comment