1
\$\begingroup\$

Problem Statement

(Source: Leetcode Problem 4: Median of Two Sorted Arrays [Hard])
(Topics: [Array] [Binary Search] [Divide and Conquer])

Given two sorted arrays nums1 and nums2 of size m and n respectively, return the median of the two sorted arrays.

The overall time complexity should be O(log(m+n)).

I was able to solve this problem with the following code:

def findMedianSortedArrays(self, nums1, nums2):
    nums = nums1 + nums2
    nums.sort()
    if len(nums) % 2 == 0:
        return (nums[(len(nums) / 2) - 1] + nums[len(nums) / 2]) / 2.0
    else:
        return nums[len(nums) // 2] 

which has a time complexity of \$O((m+n)\log(m+n))\$ and space complexity of \$O(m+n)\$.

Admittedly while I didn't find the problem too difficult overall, I do want to still try to improve the time complexity of my code to at least \$O(\log(m+n))\$.

My question is:

What am I misunderstanding about the approach needed to make the code run in \$O(\log(m+n))\$ time?

Edit

Per toolic's suggestion, I have modified the code to run on newer versions of Python (of course also using snake_case to write function names for readability). I used this online Python editor to test my fixes:

def find_median_two_sorted_arrays(nums1: list, nums2: list):
    nums = nums1 + nums2
    nums.sort
    if len(nums) % 2 == 0:
        return (nums[(len(nums) // 2) - 1] + nums[len(nums) // 2]) / 2.0
    else:
        return nums[len(nums) // 2] 

Note: I would like to say that I know (after doing some research) that this can be done in \$O(\log(\min(n,m)))\$ time complexity with \$O(1)\$ space complexity. However, this is not what I want as this is admittedly very out of my reach with my understanding of Python.

\$\endgroup\$
3
  • 2
    \$\begingroup\$ A O(log(min(m, n))) approach has been mentioned here: codereview.stackexchange.com/a/202358/35991. \$\endgroup\$
    – Martin R
    Commented Dec 6, 2024 at 17:07
  • 2
    \$\begingroup\$ @TobySpeight The time-limit-exceeded tag states "You may use this tag instead of performance when an online judge rejects your solution to a programming-challenge due to time constraints". We explicitly allow code which doesn't scale. \$\endgroup\$
    – Peilonrayz
    Commented Dec 9, 2024 at 1:29
  • \$\begingroup\$ The crucial point of the task is finding a median of two arrays, not a single array obtained by joining the two. \$\endgroup\$
    – CiaPan
    Commented Dec 11, 2024 at 18:37

2 Answers 2

7
\$\begingroup\$

mentioned that \$O(\log \min(n,m))\$ would be really outside of what I can understand with my current knowledge.

Nah! It's really about the sizes \$n, m\$, rather than about the algorithm. If one is of comparable magnitude to the other, then \$O(\min(n, m))\$ is simply \$O(n)\$, a case that's easily dealt with. For the \$\min\$ to be interesting, one of these must hold:

  • \$n \ll m\$, or
  • \$n \gg m\$

That's saying that one of the input lists is essentially "empty" for purposes of Big-Oh analysis, and won't significantly contribute to the overall time.

Take e.g. the \$n \ll m\$ case. Since \$n\$ is so small, we could merge its entries into the larger list with negligible cost. Or just use a \$O(\log m+n))\$ solution, confident that that's really \$O(\log m)\$.


improve the time complexity of my code to at least \$O(\log m+n))\$.

Your code uses a linear time expression, nums1 + nums2. You need to be able to talk about "the midpoint of the combined arrays" without ever touching the majority of array elements. A datastructure like this will assist with that:

class SlicedList:
    """Models a slice of a list using start and end index, with zero copies."""

    def __init__(
        self, nums: list[float], start: int = 0, end: int | None = None
    ) -> None:
        self.nums = nums
        self.start = start
        self.end = len(nums) if end is None else end

    def slice(self, start: int, end: int | None = None) -> "SlicedList":
        length = self.end - self.start
        end_i: int = length if end is None else end
        assert 0 <= start <= end_i <= length

        return SlicedList(self.nums, self.start + start, self.start + end_i)

    def __getitem__(self, i: int) -> float:
        return self.nums[self.start + i]

    def __len__(self) -> int:
        return self.end - self.start

And then this will help with the binary searching.

def kth_idx(a: SlicedList, b: SlicedList, k: int) -> tuple[int, int]:  # noqa PLR0911
    """Given two sorted lists of numbers, return (list_id, idx) in O(log n) time.

    We are concerned with the kth element of the merged lists a and b.

    list_id: 0 for a, 1 for b, according to which contains the kth sorted value
    idx: index of the kth element in either list a or b
    """
    assert 0 <= k < len(a) + len(b)
    if not a:
        return 1, k
    if not b:
        return 0, k
    if k == 0:
        return int(a[0] > b[0]), k

    # binary search
    ia, ib = len(a) // 2, len(b) // 2
    if ia + ib < k:
        if a[ia] > b[ib]:
            return kth_idx(a, b.slice(ib + 1), k - ib - 1)
        return kth_idx(a.slice(ia + 1), b, k - ia - 1)

    if a[ia] > b[ib]:
        return kth_idx(a.slice(0, ia), b, k)
    return kth_idx(a, b.slice(0, ib), k)
\$\endgroup\$
1
  • \$\begingroup\$ And nums.sort() is even worse than nums1+nums2. \$\endgroup\$
    – Teepeemm
    Commented Dec 12, 2024 at 23:54
4
\$\begingroup\$

The previous answer addresses your main concern about complexity. This will address one of the now-deleted comments on the question regarding functionality.

Portability

The function that was posted must be part of a class which was not posted. The code can't be run as-is due to the presence of the self keyword. I removed self and added an example function call:

def findMedianSortedArrays(nums1, nums2):
    nums = nums1 + nums2
    nums.sort
    if len(nums) % 2 == 0:
        return (nums[(len(nums) / 2) - 1] + nums[len(nums) / 2]) / 2.0
    else:
        return nums[len(nums) // 2] 

print(findMedianSortedArrays([6,4,5], [2,3,1]))

When I run the code on Python version 2.7, I get the expected answer:

3.5

However, when I run the code on Python version 3.7, I get an error:

TypeError: list indices must be integers or slices, not float

Since Python 2.7 is deprecated, I suggest modifying the code to run on newer 3.x versions as well.

Naming

The PEP 8 style guide recommends using snake_case for function names:

find_median_sorted_arrays
\$\endgroup\$
0

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.