07. Number of Distinct Subsequences

βœ… GFG solution to the Number of Distinct Subsequences problem: find the count of distinct subsequences of a string using dynamic programming with modulo arithmetic. πŸš€

The problem can be found at the following link: πŸ”— Question Link

🧩 Problem Description

Given a string str consisting of lowercase English alphabets, the task is to find the number of distinct subsequences of the string.

Note: Answer can be very large, so output will be answer modulo 10^9+7.

A subsequence is a sequence that can be derived from another sequence by deleting zero or more elements without changing the order of the remaining elements. For example, "abc" is a subsequence of "aebdc" but "aec" is not.

πŸ“˜ Examples

Example 1

Input: str = "gfg"
Output: 7
Explanation: 
The seven distinct subsequences are "", "g", "f", "gf", "fg", "gg" and "gfg".

Example 2

Input: str = "ggg"
Output: 4
Explanation: 
The four distinct subsequences are "", "g", "gg", "ggg".

πŸ”’ Constraints

  • $1 \le |str| \le 10^5$

  • str contains lowercase English alphabets

βœ… My Approach

The optimal approach uses Dynamic Programming to efficiently count distinct subsequences:

Dynamic Programming with Last Occurrence Tracking

  1. Understanding Subsequences:

    • For any string, the number of subsequences is 2^n (including empty string).

    • However, when characters repeat, some subsequences become identical.

    • We need to count only distinct subsequences.

  2. DP State Definition:

    • Let dp[i] represent the number of distinct subsequences of the prefix ending at index i.

    • Initialize dp[0] = 1 for the empty subsequence.

  3. Recurrence Relation:

    • When adding a new character, we can either include it in existing subsequences or not.

    • This doubles the count of subsequences: dp[i] = 2 * dp[i-1].

    • However, if the character has appeared before, we need to subtract the count of subsequences ending with the previous occurrence to avoid duplicates.

    • If last[c] represents the last index where character c appeared, then: dp[i] = 2 * dp[i-1] - dp[last[c]-1] (if character c has appeared before).

  4. Optimization:

    • Instead of storing the entire DP array, we can optimize space by just keeping track of the current result.

    • We maintain an array last to store the last occurrence of each character.

  5. Modulo Operation:

    • Since the result can be very large, we take modulo 10^9+7 at each step to prevent overflow.

πŸ“ Time and Auxiliary Space Complexity

  • Expected Time Complexity: O(n), where n is the length of the string. We process each character exactly once.

  • Expected Auxiliary Space Complexity: O(1), as we only use a fixed-size array of 26 elements to track last occurrences, regardless of the input size.

πŸ§‘β€πŸ’» Code (C++)

⚑ View Alternative Approaches with Code and Analysis

πŸ“Š 2️⃣ HashMap-Based Approach

πŸ’‘ Algorithm Steps:

  1. Use unordered map to store last occurrence of each character dynamically.

  2. Double the subsequence count at each step to include current character.

  3. Subtract previously counted subsequences ending with same character to avoid duplicates.

  4. Update map with current subsequence count for the character.

πŸ“ Complexity Analysis:

  • Time: ⏱️ O(n) - Single pass through string

  • Auxiliary Space: πŸ’Ύ O(k) - Map stores unique characters only (k ≀ 26)

βœ… Why This Approach?

  • Works with any character set beyond lowercase letters

  • Dynamic memory allocation based on actual characters present

  • Cleaner code without fixed array size assumptions

πŸ“Š 3️⃣ DP Array Approach

πŸ’‘ Algorithm Steps:

  1. Create DP array where dp[i] represents distinct subsequences up to index i.

  2. Initialize dp[0] as 1 for empty subsequence base case.

  3. For each position calculate by doubling previous count minus duplicates.

  4. Track last occurrence separately to handle repeated characters correctly.

πŸ“ Complexity Analysis:

  • Time: ⏱️ O(n) - Linear traversal with constant operations

  • Auxiliary Space: πŸ’Ύ O(n) - DP array stores all intermediate states

βœ… Why This Approach?

  • Explicit DP state representation for clarity

  • Easy to trace intermediate results for debugging

  • Standard DP pattern for subsequence problems

πŸ†š πŸ” Comparison of Approaches

πŸš€ Approach

⏱️ Time Complexity

πŸ’Ύ Space Complexity

βœ… Pros

⚠️ Cons

🎯 Array-Based

🟒 O(n)

🟒 O(1)

πŸš€ Fastest with fixed array

πŸ”§ Limited to lowercase letters

πŸ—ΊοΈ HashMap-Based

🟒 O(n)

🟒 O(k)

πŸ“– Flexible character set

πŸ’Ύ Slight overhead from map

πŸ“Š DP Array

🟒 O(n)

🟑 O(n)

🎯 Clear state representation

πŸ’Ύ Extra space for all states

πŸ† Best Choice Recommendation

🎯 Scenario

πŸŽ–οΈ Recommended Approach

πŸ”₯ Performance Rating

πŸ… Competitive programming

πŸ₯‡ Array-Based

β˜…β˜…β˜…β˜…β˜…

πŸ“– Extended character sets

πŸ₯ˆ HashMap-Based

β˜…β˜…β˜…β˜…β˜†

πŸ”§ Learning DP patterns

πŸ₯‰ DP Array

β˜…β˜…β˜…β˜…β˜†

β˜• Code (Java)

🐍 Code (Python)

🧠 Contribution and Support

For discussions, questions, or doubts related to this solution, feel free to connect on LinkedIn: πŸ“¬ Any Questions?. Let's make this learning journey more collaborative!

⭐ If you find this helpful, please give this repository a star! ⭐


πŸ“Visitor Count

Visitor counter

Last updated