Discussion on Substring Diff Challenge

7 months ago+ 0 comments

One approach to solve the Longest Common Substring (LCS) problem involves aligning the two strings and iterating over possible alignments to find the longest match. Given two strings, e.g., "abcde" and "efghicd" with LCS "cd", the alignment could look like:

     abcde
efghicd

Multiple such alignments are possible. By iterating over these alignments and comparing characters, we can discover the LCS.

Below is the implementation of this method to find LCS:

function longestCommonString(a, b) {
  /**
   * The algorithm works with two strings of varying lengths.
   * Swap the strings to ensure the shorter string is assigned to the variable 'a' to simplify the logic.
   * This swap can be omitted in the main challenge since both input strings have the same length.
   */
  if (a.length > b.length) {
    [a, b] = [b, a];
  }

  let aStart = 0;
  let aEnd = 0;
  let bStart = b.length - 1;

  let result = "";

  do {
    // Section A
    /* This part will be modified to solve the given challenge */
    let tmp = "";
    let bIndex = bStart;
    for (let i = aStart; i <= aEnd; ++i) {
      const aChar = a[i];
      const bChar = b[bIndex++];
      if (aChar === bChar) {
        tmp += aChar;
      }

      // If characters are different or it's the last character, update the result if necessary.
      if (i === aEnd || aChar !== bChar) {
        if (tmp.length > result.length) {
          result = tmp;
        }
        tmp = "";
      }
    }
    /* End of Section A */

    // Slide window to align next characters of the strings
    if (aEnd < a.length - 1) {
      ++aEnd;
    } else if (bStart === 0 && aStart < a.length) {
      ++aStart;
    }

    if (bStart > 0) {
      --bStart;
    }
  } while (aStart !== a.length);

  return result;
}

In the algorithm, we use variables aStart, aEnd, and bStart to manage slices of the strings a and b for each alignment. The tmp string accumulates matching characters until we encounter a mismatch or reach the end. If tmp's length surpasses the current result, we update the result.

Given two strings, the challenge is to find the length of the longest common substring that has no more than 'k' differing characters. We approach this problem by adapting our earlier longest common substring algorithm, specifically by replacing Section A with Section B to account for the differences allowed.

Here's how our adapted algorithm works:

function substringDiff(k, a, b) {
  let aStart = 0;
  let aEnd = 0;
  let bStart = b.length - 1;

  let result = 0;
  do {
    // Section B
    /**
     * 1. Iterate through the slices of `a` and `b`, and save the indices of characters that differ in `diffIndices`.
     * 2. If the length of `diffIndices` is less than or equal to k, it means the whole substring meets the criteria.
     * 3. If not, we iterate through `diffIndices`, calculating the length of the substring that contains 
     *    exactly `k` differing characters.
    */
    let diffIndices = [];
    let lastDiffPosition = aStart;  // Beginning of the substring containing k differences.
    let bIndex = bStart;
    
    for (let i = aStart; i <= aEnd; ++i) {
      const aChar = a[i];
      const bChar = b[bIndex++];
      if (aChar !== bChar) {
        diffIndices.push(i);
      }
    }

    if (diffIndices.length <= k) {
      result = Math.max(result, aEnd - aStart + 1);
    } else {
      for (let i = 0; i <= diffIndices.length - k; ++i) {
        result = Math.max(
          result,
					/**
					 * Calculate the length of the substring with 'k' differing characters:
					 * - `diffIndexes[i + k]`: Represents the ending index of our substring, which is not included.
					 *   There are 'k' differences between the indices `diffIndices[i]` and `diffIndices[i + k] - 1`.
					 * - The subtraction of indices (like `A - B`) is used to determine the length of a substring 
					 *   starting from index B and ending before index A. E.g., for a difference of 5-2, the substring 
					 *   covers indices 2, 3, and 4.
					 * - `aEnd + 1`: Adjusts to ensure the last character in the substring is included.
					 */
          (diffIndices[i + k] ?? aEnd + 1) - lastDiffPosition
        );
        lastDiffPosition = diffIndices[i] + 1; // Update for the next substring starting position.
      }
    }

    // Proceed to the next slice.
    if (aEnd < a.length - 1) {
      ++aEnd;
    } else if (bStart === 0 && aStart < a.length) {
      ++aStart;
    }

    if (bStart > 0) {
      --bStart;
    }
  } while (aStart !== a.length);

  return result;
}

To understand the algorithm:

We start by aligning the two strings and sliding them against each other, similar to our earlier approach for finding the longest common substring.
For each alignment, we compute the number of differing characters (saved in diffIndices).
If the total differences are less than or equal to 'k', it's straightforward.
Otherwise, we use a sliding window on diffIndices to find the longest substring that has exactly k differences.
We continue this process, shifting the alignment, until we've explored all possible alignments.

To determine the time complexity of the algorithm, we need to focus on the number of iterations our inner loop performs.

Given two strings a and b with lengths n = |a| and m = |b| respectively, and assuming n <= m, the number of iterations in the inner loop can be captured as:

The sum of the first n integers for the increasing overlap of a with b: 1 + 2 + ... + n which equals n(n+1)/2.
n(m-n) iterations for the middle section where a slides over b with a constant length of n.
The sum of integers from n-1 to 1 for the decreasing overlap of a as it exits b: n-1 + n-2 + ... + 1 which equals n(n-1)/2.

Combining these:

n(n+1)/2 + n(m-n) + n(n-1)/2 = n^2 + nm - n^2 = nm

Therefore, the time complexity of the algorithm is O(nm).

Cookie support is required to access HackerRank