03. Substrings with K Distinct

โœ… GFG solution to the Substrings with K Distinct Characters problem using sliding window and optimized frequency count. ๐Ÿš€

The problem can be found at the following link: ๐Ÿ”— Question Link

๐Ÿงฉ Problem Description

Given a string consisting of lowercase characters and an integer k, the task is to count all possible substrings (not necessarily distinct) that have exactly k distinct characters.

๐Ÿ“˜ Examples

Example 1

Input: s = "abc", k = 2
Output: 2
Explanation: Possible substrings are ["ab", "bc"]

Example 2

Input: s = "aba", k = 2
Output: 3
Explanation: Possible substrings are ["ab", "ba", "aba"]

Example 3

Input: s = "aa", k = 1
Output: 3
Explanation: Possible substrings are ["a", "a", "aa"]

๐Ÿ”’ Constraints

  • $1 \le |s| \le 10^6$

  • s contains only lowercase English letters

  • $1 \le k \le 26$

โœ… My Approach

We use a two-pass sliding window technique to count the number of substrings with exactly k distinct characters. The idea is to leverage a helper function that counts substrings with at most k distinct characters, and then use the inclusion-exclusion principle to find the exact count.

๐Ÿ’ก Idea:

  • Define a helper function count(s, k) that returns the number of substrings with at most k distinct characters.

  • Use a sliding window with two pointers and a frequency array of size 26 (since the string contains only lowercase letters).

  • Expand the window by moving the right pointer, and keep track of the frequency of characters.

  • If the number of distinct characters exceeds k, shrink the window from the left until the window contains at most k distinct characters.

  • For each position j, add (j - i + 1) โ€” the number of valid substrings ending at j โ€” to the result.

  • Finally, compute:

    substrings with exactly k distinct = count(s, k) - count(s, k - 1)

This subtraction removes substrings that have fewer than k distinct characters, leaving only those with exactly k.

โš™๏ธ Algorithm Steps:

  1. Implement a helper function count(s, k):

    • Initialize a frequency array freq[26] and two pointers i = 0, j = 0.

    • Iterate through the string with pointer j:

      • Increment frequency of s[j].

      • If distinct characters exceed k, move i forward while updating frequencies until the window has at most k distinct characters.

      • Add (j - i + 1) to the result count for valid substrings ending at j.

  2. Call count(s, k) and count(s, k - 1) and return their difference.

๐Ÿ“ Time and Auxiliary Space Complexity

  • Time Complexity: O(n) โ€” Each character is processed at most twice (once when added to the window, once when removed).

  • Auxiliary Space Complexity: O(1) โ€” Fixed-size frequency array of length 26 for lowercase English letters.

๐Ÿง‘โ€๐Ÿ’ป Code (C++)

class Solution {
  public:
    int count(string &s, int k) {
        int n = s.length(), ans = 0;
        vector<int> freq(26, 0);
        int i = 0, distinct = 0;
        for (int j = 0; j < n; ++j) {
            if (++freq[s[j] - 'a'] == 1) ++distinct;
            while (distinct > k)
                if (--freq[s[i++] - 'a'] == 0) --distinct;
            ans += j - i + 1;
        }
        return ans;
    }

    int countSubstr(string &s, int k) {
        return count(s, k) - count(s, k - 1);
    }
};
โšก View Alternative Approaches with Code and Analysis

๐Ÿ“Š 2๏ธโƒฃ HashMap with Sliding Window

Instead of a fixed-size array for lowercase, use an unordered_map for any character set (e.g., Unicode).

๐Ÿ’ก Algorithm Steps:

  1. Define helper atMostK(string &s, int k) that uses unordered_map<char,int> to count frequencies.

  2. Maintain two pointers i and j over the string.

  3. Increment mp[s[j]]; if it becomes 1, decrement k.

  4. While k < 0, decrement mp[s[i]]; if it becomes 0, increment k, then increment i.

  5. Accumulate res += j - i + 1 for valid windows ending at j.

  6. Return atMostK(s, k) - atMostK(s, k - 1).

class Solution {
  public:
    int atMostK(string &s, int k) {
        unordered_map<char, int> mp;
        int i = 0, res = 0;
        for (int j = 0; j < s.size(); ++j) {
            if (++mp[s[j]] == 1) --k;
            while (k < 0)
                if (--mp[s[i++]] == 0) ++k;
            res += j - i + 1;
        }
        return res;
    }

    int countSubstr(string &s, int k) {
        return atMostK(s, k) - atMostK(s, k - 1);
    }
};

๐Ÿ“ Complexity Analysis:

  • Time: โฑ๏ธ O(n)

  • Auxiliary Space: ๐Ÿ’พ O(k)

โœ… Why This Approach?

  • Works with any character set (not just aโ€“z).

  • Maintains O(n) runtime with two-pointer sliding window.

  • Slight overhead for hash operations but generalizes to Unicode or larger alphabets.

๐Ÿ“Š 3๏ธโƒฃ Brute Force (Nested Loops + Distinct Count)

Enumerate every substring and count distinct characters directly.

๐Ÿ’ก Algorithm Steps:

  1. Initialize ans = 0.

  2. For each start index i from 0 to nโˆ’1:

    1. Initialize a vector<int> freq(26, 0) or unordered_set to track seen characters.

    2. For each end index j from i to nโˆ’1:

      1. Increment frequency or insert s[j].

      2. If distinct count equals k, increment ans.

      3. If distinct count exceeds k, break inner loop (further extensions only increase distinct).

  3. Return ans.

class Solution {
  public:
    int countSubstr(string &s, int k) {
        int n = s.length(), ans = 0;
        for (int i = 0; i < n; ++i) {
            vector<int> freq(26, 0);
            int distinct = 0;
            for (int j = i; j < n; ++j) {
                if (++freq[s[j] - 'a'] == 1) ++distinct;
                if (distinct == k) ++ans;
                if (distinct > k) break;
            }
        }
        return ans;
    }
};

๐Ÿ“ Complexity Analysis:

  • Time: โฑ๏ธ O(nยฒ) in the best case (break early when distinct > k), up to O(nยฒ) overall.

  • Auxiliary Space: ๐Ÿ’พ O(1) (or O(alphabet) for frequency).

โœ… Why This Approach?

  • Very straightforward: directly counts distinct for each substring.

  • Useful for small strings or educational purposes.

  • Not suitable for large inputs due to O(nยฒ) scanning.

๐Ÿ†š ๐Ÿ” Comparison of Approaches

๐Ÿš€ Approach

โฑ๏ธ Time Complexity

๐Ÿ’พ Space Complexity

โœ… Pros

โš ๏ธ Cons

๐Ÿ”ข Fixed Array + Sliding Window

๐ŸŸข O(n)

๐ŸŸข O(1)

Fastest for lowercase-aโ€“z, minimal overhead

Limited to fixed alphabet

๐Ÿ” HashMap + Sliding Window

๐ŸŸข O(n)

๐ŸŸก O(k)

Supports any character set (Unicode, larger alphabets)

Slight hash overhead

๐Ÿงฎ Brute Force (Nested Loops)

๐Ÿ”ด O(nยฒ)

๐ŸŸข O(1)

Simple to implement and understand

Very slow on large strings

๐Ÿ† Best Choice by Scenario

๐ŸŽฏ Scenario

๐Ÿฅ‡ Recommended Approach

๐ŸŒ Any character set (Unicode/large alphabets)

๐Ÿฅ‡ HashMap + Sliding Window

๐Ÿ”ฅ Strictly lowercase English letters (aโ€“z)

๐Ÿฅˆ Fixed Array + Sliding Window

๐Ÿ“š Educational or very small input size

๐Ÿฅ‰ Brute Force (Nested Loops)

๐Ÿง‘โ€๐Ÿ’ป Code (Java)

class Solution {
    int count(String s, int k) {
        int[] freq = new int[26];
        int i = 0, ans = 0, distinct = 0;
        for (int j = 0; j < s.length(); ++j) {
            if (++freq[s.charAt(j) - 'a'] == 1) ++distinct;
            while (distinct > k)
                if (--freq[s.charAt(i++) - 'a'] == 0) --distinct;
            ans += j - i + 1;
        }
        return ans;
    }

    int countSubstr(String s, int k) {
        return count(s, k) - count(s, k - 1);
    }
}

๐Ÿ Code (Python)

class Solution:
    def count(self, s, k):
        freq = [0] * 26
        i = ans = distinct = 0
        for j in range(len(s)):
            if freq[ord(s[j]) - ord('a')] == 0:
                distinct += 1
            freq[ord(s[j]) - ord('a')] += 1
            while distinct > k:
                freq[ord(s[i]) - ord('a')] -= 1
                if freq[ord(s[i]) - ord('a')] == 0:
                    distinct -= 1
                i += 1
            ans += j - i + 1
        return ans

    def countSubstr(self, s, k):
        return self.count(s, k) - self.count(s, k - 1)

๐Ÿง  Contribution and Support

For discussions, questions, or doubts related to this solution, feel free to connect on LinkedIn: ๐Ÿ“ฌ Any Questions?. Letโ€™s make this learning journey more collaborative!

โญ If you find this helpful, please give this repository a star! โญ


๐Ÿ“Visitor Count

Visitor counter

Last updated