K largest subsequences in a set of integers (negative and positive)

Published on Sunday, December 13, 2009

The problem here is that we want to find the K largest subsequences of s set i of integers between -infinity and infinity. This implementation is done in haskell but can easily be made in another language, more easily then also. We start by defining our sorting method that we are going to use to sort the result. Notice: In this implementation the result of each subsequence is given by a tuple with three values, (Int, Int, Int) lets say (x,y,z) were x is the total sum of the subsequence between index y and index z. Here in the ins method we insert the element x before y if it is less or equal to y (notice names have nothing to do with above declaration of (x,y,z), they are new!). And if this is not the case we insert x later in the array of values (ys).

-- Inserts element at the correct location
ins x []        = [x]
ins x (y:ys)    = if (x <= y) then x:(y:ys) else y : ins x ys

The main sorting method is done by my earlier stated iSort with the only modification that we have a tuple with three values instead of a list with integers. Read my earlier post about iSort to get the hang of this method.

-- Definition of insertion sort, fold insert on every element of xs
iSort xs = foldr ins [] xs

We now have the mean of sorting the list of subsequences, what we want now is the mean of picking out k number of subsequence from the list, this can be done with the help of this method.

-- Returns the k first elements of a list
[] !? _               = []
xs !? k
        | k > 0         = (lstE) : ((filter(/=lstE) xs) !? (k-1))
        | otherwise     = []
                lstE = last xs

As you may have noticed this method is a bit hard to read, I suggest you make yourself comfortable with the Haskell syntax before you continue any further with this tutorial. What we do in this method is that we first pattern match the empty list and returns the empty list independent of what value of k we got. If we do not receive an empty list we pick out the list xs and k and if k > 0 we take the last element of the list (if we receive a sorted list this element should be the greatest one) and then we add this element to the list which does not contain this element and this list (filter…) will contain the k-1 greatest values.

Now we need the method that will calculate the sums for us, this is done by this method

-- Computes the sum for an array from index i to index j
sumIJ l i j
        | (i == j) = l !! (i-1)
        | otherwise = (l !! (i-1)) + (sumIJ l (i+1) j)

sumIJ takes the arguments [Int], Int and Int (if we are strict about the haskell type-system it only takes one argument, but in this simple tutorial we do not consider higher-order functions). The first argument is the list we want to compute our subsequence of, the second argument is the lower index and j is the higher index. We use guards to match our different options and the first one is if the index i equals the index j. In this case we return the (i-1)’th element of the list l. If this is not the case we return the sum of the (i-1)’th element and the sum of index i+1 to j on the given list.

Now we need to figure out how to compute all subsequent sums and not just between two indexes. This can be done by this method below.

-- Computes the subsequent sum from index i to j for all indexes i and j
computeUnique l i j
        | (j < length l)        = [(theSum, i, j)] ++ computeUnique l i (j+1)
        | (i < j)               = [(theSum, i, j)] ++ computeUnique l (i+1) (i+1)
        | otherwise             = [(theSum, i, j)]
                theSum = sumIJ l i j

What it does is that it takes the list l, the indexes i and j and increases the index j until we reach the last index and computes each subsequence sum, then it instead increases i until we reach the index j and computes those subsequences. And in all other cases we return the subsequence of index i and j. This can be pretty complex to understand at first glance, but try to read and process the code at your own pace and when understanding the sumIJ method the rest should be pretty clear. As a side note the ++ operator is used to concatenate two lists.

And now for the big finale we got the main method that is suppose to call the sorting method and the computeUnique method.

-- Computes the subseq. sum for an array l with k sets
kmaxsubunique [] _     = []
kmaxsubunique l k
        | k > 0         = (iSort (computeUnique (nub l) 1 1)) !? k
        | otherwise     = []

As seen above the method takes two arguments, the list and k. We start by pattern matching for the empty list as this is a case that could appear. We then check if k is greater than zero and if it is we return the sorted list of subsequences that has starting index i = 1 and j = 1, if k is less than or equal to zero we just return the empty list as we don’t got anything else to return. I would also like to mention the nub operator. nub removes all duplicate elements of a list making it unique, this was a requirement for this lab and as such it was added to the code. It is also pretty easy to implement a custom nub method like this

nub' [] = []
nub' (x:xs) = x : nub' (filter(/=x) xs)

This concludes this short crash course, more will come later. I will also try to improve the language to make it easier to understand, this guide was rushed a bit.

comments powered by Disqus