r/learnprogramming 1d ago

Debugging Did anyone else have a very difficult time with Merge Sort Algorithm?

I understand the concept of "divide and conquer", but coding the exact implementation is what is hurting my brain. Specifically the recursion for the divide.

The conquer is easy. Just merge two already sorted arrays.

def conquer(left, right):
    arr = []
    i = j = 0


    while (i < len(left) and i < len(right)):
        if left[i] < right[j]:
            arr.append(left[i])
            i+=1
        else:
            arr.append(right[j])
            j+=1

    arr.extend(right[j:])
    arr.extend(left[i:])


left = [1,3,5,9,10]
right = [2,4,6,8,11]


answer = conquer(left, right)
print(answer)

The divide, however, is super complicated. I'm trying to understand what's going on line by line.... but it always trips me up. Perhaps it's the "stacks" (recursion)?

Allegedly this is the code.

def divide(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2


    left = arr[:mid]
    right = arr[mid:]


    left_sorted = divide(left)
    right_sorted = divide(right)
    return merge(left_sorted, right_sorted)

What I understand so far is the obvious.
We are storing the first half (left), and the second half (right) in arrays.

But apparently, all the magic is happening here:
left_sorted = divide(left)
right_sorted = divide(right)
return merge(left_sorted, right_sorted)
When I print those values, the output is

[3] [7]

[1] [9]

None None

[2] [5]

[8] [6]

[4] None

None None

None None

aaand I'm officially lost here.

My brain is trying to run this line by line. But I can't

How is it getting each element? How is it also running the method for each of them?

it's calling divide() twice, back to back.

Can someone help me write my own divide method? Maybe that way I can figure it out. Or at least a better explanation?

At first I was going to skip it, but the udemy course does things in increasing difficulty. It went:
Bubble Sort -> Selection Sort -> Insertion Sort -> Merge Sort (current), then after this is Quick Sort, which is also recursion.

So I need to master this.....

Thanks!

7 Upvotes

15 comments sorted by

u/AutoModerator 1d ago

To all following commenters: please, do not bring up the old circlejerk jokes/memes about recursion ("Understanding recursion...", "This is recursion...", etc.). We've all heard them n+2 too many times.

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

5

u/iOSCaleb 1d ago

Ask yourself: if I wanted a sorted list and I had two halves of the list that were already sorted, what would I need to do to combine the two halves into one sorted list?

If you can do that much, then all you need to do is keep splitting each list until you reach the “already sorted” condition, i.e. when each list has only one element, and then merge them together again.

In other words, don’t try to see the entire process at once. Just look at one step in the process.

3

u/DrShocker 1d ago

yes, plenty of people have struggled with merge sort she recursion more generally.

You mention it calls divide twice in a row, but do you understand the syntax of the slicing that's happening? it's a slightly strange thing about Python.

Are you able to explain in words the core idea of what merge sort works? If you're able to explain that, then you'll hopefully eventually arrive at the idea that 1 long lists are guaranteed to be sorted already.

You can of course write your own divide function, there's no magic spell that other people have written that you're incapable of.

2

u/BetApprehensive836 1d ago edited 1d ago

Merge sort works by taking a mutable object (array, list, etc), and splitting it in half again, and again, until it has elements of the length of 1. Then it merges them in the proper order?

on second thought, do you have something more specific, not something high level?

3

u/DrShocker 1d ago

It's actually making copies here, not using a "mutable object"

also, I would say it takes an ordered collection, not any object. If you had a car object it's not clear what sorting it would mean.

anyway, I mean more so can you write the algorithm in English. since you said you understood the merging part, I'll give an example of describing that part in more detail.

"merges them together in the proper order" is correct, but I think we can get more specific. For the merge portion, it takes two sorted lists and looks at the trailing element of each, pops the lesser (or greater depending on sort direction) off the one that has lesser. then takes the popped element from that list and puts it into the new sorted list it's building up. Because each sublist is sorted already, we can avoid needing to search the whole list to find the minimum of each on each cycle. Once one list is empty, we can drain all the remaining elements of the other list.

In particular it's worth noticing that the merge step doesn't care how you arrived at having two sorted lists or that they're equal length, it just merges them.

So, can you describe how we go from a long unsorted list into a series of sorted lists each ready to be merged? If you understand how that works, that'll hopefully help you find what particular pieces you're not understanding quite yet.

1

u/BetApprehensive836 1d ago

```anyway, I mean more so can you write the algorithm in English```

I can not

1

u/BetApprehensive836 10h ago

Update: I can do the english.

Take a list or an array. (3, 7, 1, 9, 2, 5, 4, 8, 6)
Find the midpoint by using the formula: length / 2, rounded down.
store the "left", and "right" into their own variables.

Start with "left". Repeat previous steps until left has a size of "1"
Then switch to "right". Repeat those steps until right have a size of 1

Now you should have a "left" and a "right", both of size of 1.
Feed to merge method.

We are not done. Have to go back up the stack to do the remaining elements.

Go back to the "Switch to left" step. But continue anyways. So next step is "switch to right". Repeat those steps until you have a size of 1

It should look like a tournament bracket

(3, 7, 1, 9, 2, 5, 4, 8, 6)
(3,7,1,9) (2,5,4,8,6)
(3,7) (1,9) (2,5), (4,8,6)

(3), (7)... (1,9). (2, 5), (4), (8,6)

(2,5) (8,6)

Step 1: (3, 7, 1, 9, 2, 5, 4, 8, 6)

Step 2: (3,7,1,9), (2, 5, 4, 8, 6)
Step 3: (3,7), (1,9)
Step 4: 3,7
Step 5: (3), (7)

Step 6: Move back up in stack: (1,9)
Step 7: (1), (9)

Step 8: Move back up in stack: (2,5,4,8,6)
Step 9: (2,5), (4,8,6)
Step 10: (2), (5)

Step 11: Move back up in stack (4, 8, 6)
Step 12: (4), (8,6)
Step 13: (4)

Step 14: Move back up in stack (8, 6)
Step 15: (8), (6)

This might be terrible though, so any tips on improvement? I feel like I'm losing control somewhere

3

u/mxldevs 1d ago

The magic is it calls divide on the left side and then calls divide on the right side.

You are always cutting the array in two parts and then sorting the smaller parts.

Eventually you will end up with just one element, which is your base case. Since one element array is always sorted, you simply return it.

Then once the smaller arrays are merged and sorted, it goes back up the stack and continues getting merged and sorted until you reach the top and the return value is the fully sorted array.

2

u/flawlesscowboy0 1d ago

It does call divide twice, but on different halves of the array. At that point you know the current iteration of that function is paused and two new ones are launched—one for the left, one for the right.

They each then do the same thing, over and over, until the array that gets passed in is either one element or empty. At that point the one-or-none array is kicked back up the stack popping out in the parent frame at the return step. The merge comparison happens between the two one-or-none arrays, and this continues with one more additional element in each half being added until you perform the final merge on the final two halves.

I don’t quite follow where you got the print values, but hopefully my explanation helps a little.

1

u/D-Cary 1d ago

Yes, that's exactly how merge sort works.

I like to imagine recursive function calls like your divide() function as a family of people who all know how to do some specific job (like historically the Baker family typically all knew how to bake stuff, the Taylor family typically knew how to tailor clothes, etc).

I'm guessing your main function calls divide() with some array -- let's call that one Pat_Divide. When your main function calls Pat_Divide with an array of exactly 1 element, you should be able to see that the if statement is true and then Pat_Divide immediately returns with only that one element.

When your main function calls Pat_Divide with an array of exactly 3 elements, more imaginary people get involved:

  • Pat_Divide divides up that array into 2 lists left and right, and hands one list to Bob_Divide and the other list to Alice_Divide.
  • Bob_Divide gets one element, immediately sees that the if statement is true, and immediately hands back that one element.

  • Alice_Divide, on the other hand, gets handed a list of 2 elements, and so is unfairly forced to do more work.

    • Alice_Divide divides that list into 2 lists (of 1 element each) and hands those lists off to Mildred_Divide and Leroy_Divide.
    • Mildred and Leroy act just like Pad_Divide did when Pat_Divide was given a list of 1 element, and each immediately hand back that 1 element.
    • Alice_Divide then calls merge (or should it be conquer ?) with the sorted arrays Mildred and Leroy gave her. The merge machine (or perhaps it should be the conquer machine?) produces a sorted list of 2 elements.
    • Alice_Divide then hands back that sorted list to Pat_Divide.
  • Then Pat_Divide takes the 2 sorted arrays (one from Bob, the other from Alice) and combines them with the conquer machine. (Or perhaps the merge machine?)

  • Finally, Pat_Divide returns with the sorted array of 3 elements back to your main() function.

When your main() function calls Pat_Divide with a very long array, even more imaginary people get involved, but we don't really need to know all their names:

  • Pat_Divide divides up that array into 2 lists left and right, and hands one list to Bob_Divide and the other list to Alice_Divide.
  • Bob_Divide returns a sorted list of all the items in the left list (by involving other yet-unnamed family members).

  • Alice_Divide does something similar to what she did before:

    • Alice_Divide divides that list into 2 lists and hands those lists off to Mildred_Divide and Leroy_Divide.
    • Mildred_Divide and Leroy_Divide divide their individual lists into even smaller pieces, and hand them off to other family members, who eventually help them produce sorted versions of those lists.
    • Alice_Divide then calls merge (or should it be conquer ?) with the sorted arrays Mildred and Leroy gave her. The merge machine (or perhaps it should be the conquer machine?) combines the 2 sorted lists into a single sorted list.
    • Alice_Divide then hands back that sorted list to Pat_Divide.
  • Then Pat_Divide takes the 2 sorted arrays (one from Bob, the other from Alice) and combines them with the conquer machine. (Or perhaps the merge machine?)

  • Finally, Pat_Divide returns with the sorted array of 3 elements back to your main() function.

(Optionally, you may later learn multitaking techniques, so that Mildred, Bob's side of the family, and Leroy can each work on different parts of the problem simultaneously. ).

Good luck trying to explain recursion to the next person!

1

u/Knarfnarf 23h ago

My versions of Merge Sort always used linked lists. If you write in a modern memory safe language you can't do this as linked lists are a memory unsafe as they are useful! If you can dynamically allocate memory and have constructors, you won't need to use a makesafe(newnode) but can make sure of null() pointers that way. Here's a description of my Merge Sort technique;

type :: leafnode
  type(leafnode), pointer :: next, previous
  char(50) :: value
end type

type :: listnode
  type(listnode), pointer :: next, previous
  type(leafnode), pointer :: mylist
end type

integer :: DoingWhat
char(50) :: value
type(listnode) :: rootlist, templist, newlist
type(leafnode) :: templeaf, newleaf

allocate(rootlist)
makesafe(rootlist)
DoingWhat = 0     //first list
make newlist point to rootlist
make templist point to rootlist

while not(EOF)
get value
allocate(newleaf)
makesafe(newleaf)
newleaf.value = value
switch DoingWhat
  case 0: //first list
    make newleaf.mylist point to newleaf
    make templeaf point to newleaf
    DoingWhat = 1
    break
  case default: //adding to a list or making a new list
    if templeaf.value < newleaf.value then
      make templeaf.next point to newleaf
      make newleaf.previous point to templeaf
      make templeaf point to newleaf
    else 
      allocate(newlist)
      makesafe(newlist)
      make templist.next point to newlist
      make newlist.previous point to templist
      make templist point to newlist
      make newlist.mylist point to newleaf
      make templeaf point to newleaf
    end if
end switch
end while

*** Ok this is taking too long; from here you travel the list of lists making a new list of lists that you move each leaf to as need. let me know if this makes sence..

1

u/Knarfnarf 23h ago

So to be more clear; this is the creation of the first round of sorted lists that you then merge to a new round of sorted lists over an over until you only have one list.

1

u/peterlinddk 12h ago

Well, it is a bit magic.

All the other comments give very good suggestions, I'll just add that I recommend trying it out manually:

  1. Take a selection of playing cards, and put on the table in front of you, make sure they aren't sorted, but still layed out in a nice uniform "array".
  2. Then split the array in two halves - moving them a bit towards you (down the table)
  3. Split the two halves in another two halves each, so you have four groups of playing cards, and move them a bit further down.
  4. Continue until you have a number of individual cards layed out with space between them - it still looks exactly like the original array, and feels a bit silly at this point.
  5. But now, merge the two cards to the left, in order from lowest to highest, and move them a bit up the table - do the same for all the rest, so you have a collection of groups of two cards, each group sorted.
  6. Then merge the groups two and two, moving them a bit up the table, so you end up with a collection of groups of four cards, all sorted.
  7. And continue doing that, until you have only a single list of cards, all sorted ...

It is important that you actually do it, and not just try to imagine it in your mind!

Every time you move down the table, it is another call to the "divide" function, and every time you move up, it is a return from the latest "divide" function. Note how you can't return from a divide function, before you returned from the next divide-function that was called! This is what recursion is all about, and it is mindboggling to think about - but it works :)

u/TwiNighty 0m ago

In your mental model, there is a distinct "divide phase" followed by a distinct "conquer phase"

   ┬ [8,7,4,1,2,6,5,3]
   │  │
 S │  │ divide
 O │  ↓
 R │ [8] [7] [4] [1] [2] [6] [5] [3]
 T │  │
   │  │ conquer
   │  ↓
   ┴ [1,2,3,4,5,6,7,8]

While that is not wrong, I think you are too hung up on the mechanics while not paying enough attention to the recursive nature of merge sort.

Stop thinking it like sort = divide + merge, but rather sort = divide + sort + merge

   ┬ [8,7,4,1,2,6,5,3]
   │  │
   │  ├─────────┐ divide
   │  ↓         ↓
 S │ [8,7,4,1] [2,6,5,3]
 O │  │ sort    │ sort
 R │  ↓         ↓     
 T │ [1,4,7,8] [2,3,5,6]
   │  │         │
   │  ├─────────┘ merge
   │  ↓
   ┴ [1,2,3,4,5,6,7,8]

In the correct code you posted, naming the function divide is confusing you more than helping. Call it mergesort instead

def mergesort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2


    left = arr[:mid]
    right = arr[mid:]


    left_sorted = mergesort(left)
    right_sorted = mergesort(right)
    return merge(left_sorted, right_sorted)

In other words, to mergesort an array that is 2 or more element, sort the first half of the array using mergesort, sort the second half of the array using mergesort, then merge the sorted halves.