Python program to find Edit Distance between two strings

The edit distance between two strings measures the minimum number of operations that are needed to convert one string into another. There are various operations that may be performed, including insertion, deletion, and substitution of a single character. The edit distance is also referred to as the Levenshtein distance or the minimum edit distance.

An algorithm to find the edit distance between two strings:

  1. Initialize a 2D array `dp` with dimensions (m+1) x (n+1), where m and n are the lengths of the two strings.
  2. Initialize the first row and first column of `dp` with values from 0 to n and 0 to m, respectively.
  3. Loop through the remaining cells of `dp`:
    1. If the characters at the current positions in the two strings are the same, set `dp[i][j]` to the value of `dp[i-1][j-1]`.
    2. Otherwise, set `dp[i][j]` to the minimum of the following values:
      1. `dp[i-1][j] + 1` (deletion)
      2. `dp[i][j-1] + 1` (insertion)
      3. `dp[i-1][j-1] + 1` (substitution)
  4. The edit distance between the two strings is `dp[m][n]`.

Python code to implement this algorithm:

Output:

3
8

Explanation:

In the first example, the edit distance between "kitten" and "sitting" is 3. We can transform "kitten" into "sitting" by substituting "k" with "s", substituting "e" with "i", and inserting "g" at the end.

In the second example, the edit distance between "rosettacode" and "raisethysword" is 8. We can transform "rosettacode" into "raisethysword" by deleting "o", substituting "c" with "i", substituting "d" with "s", substituting "e" with "t", inserting "h" after "t",

Time Complexity:

  • The nested loops iterate through all the cells of the 2D array dp. The outer loop runs m+1 times and the inner loop runs n+1 Therefore, the time complexity of the nested loops is O(mn).
  • The operations performed inside the inner loop are all constant time operations.

Therefore, the overall time complexity of the function edit_distance() is O(mn).

Space Complexity:

  • The space complexity of the 2D array dp is (m+1) x (n+1).

Therefore, the space complexity of the function edit_distance() is O(mn).

There are other approaches to calculating the edit distance between two strings in Python. Here are a few:

  • Recursive approach with memoization: In this approach, we can use recursion to calculate the edit distance between the two strings. We can store the result of each subproblem in a memoization table so that we can reuse the result if we encounter the same subproblem again. This approach has a time complexity of O(mn) in the worst case, where m and n are the lengths of the two strings.

For Example:

Output:

3
  • Iterative approach with a rolling array: In this approach, we can use an iterative approach to calculate the edit distance between the two strings. We can use a rolling array to store only the previous row of the memoization table instead of the entire table. This approach reduces the space complexity to O(n), where n is the length of the shorter string. The time complexity remains O(mn) in the worst case.

For Example:

Output:

3
  • Using NumPy: NumPy is a popular Python library for numerical computing. It provides efficient and optimized functions for array operations. We can use the numpy library to implement the edit distance algorithm. This approach has a time complexity of O(mn) in the worst case and a space complexity of O(mn).

For Example:

Output:

3

Note: The NumPy approach is similar to the first approach in terms of time and space complexity, but it provides an efficient way to perform the matrix operations using the built-in functions of the NumPy library.

  • One such approach is using dynamic programming.

In dynamic programming, we create a matrix where each cell represents the edit distance between two substrings. We start by initializing the first row and column with incremental values from 0 to n, where n is the length of the string. After that, we iterate over the remaining cells in the matrix and fill them using the following formula:

The final answer will be in the last cell of the matrix, i.e., dp[n][n].

For Example:

Output:

3





Latest Courses