githubEdit

30. Count of Distinct Substrings

βœ… GFG solution to the Count of Distinct Substrings problem: find total number of unique non-empty substrings using efficient trie-based approach. πŸš€

The problem can be found at the following link: πŸ”— Question Linkarrow-up-right

🧩 Problem Description

You are given a string s consisting of lowercase English characters. Your task is to determine the total number of distinct non-empty substrings present in the string.

A substring is defined as a contiguous block of characters within the string. Two substrings are considered distinct if their contents differ, even if they originate from different positions in the string.

Note: The empty substring is not counted.

πŸ“˜ Examples

Example 1

Input: s = "ababa"
Output: 9
Explanation: All distinct substrings of "ababa" are: "a", "b", "ab", "ba", "aba", "bab", "abab", "baba", "ababa".

Example 2

Input: s = "aaa"
Output: 3
Explanation: The distinct substrings of "aaa" are: "a", "aa", "aaa".

πŸ”’ Constraints

  • $1 \le \text{s.size()} \le 3000$

βœ… My Approach

The optimal approach uses a Trie (Prefix Tree) data structure to efficiently count distinct substrings:

Trie-Based Substring Counting

  1. Initialize Trie:

    • Create a trie root node with 26 children (for each lowercase letter).

    • Maintain a counter to track newly created nodes.

  2. Generate All Suffixes:

    • For each starting position i in the string, consider it as the beginning of a suffix.

    • Start from position i and traverse to the end of the string.

  3. Insert Characters into Trie:

    • For each character in the current suffix, navigate through the trie.

    • If a child node doesn't exist for the current character, create it and increment the counter.

    • Each new node represents a new unique substring.

  4. Count Unique Substrings:

    • The total count of new nodes created equals the number of distinct substrings.

    • Each path from root to any node represents a unique substring.

  5. Return Result:

    • Return the final count of distinct substrings.

πŸ“ Time and Auxiliary Space Complexity

  • Expected Time Complexity: O(nΒ²), where n is the length of the string. We generate all possible substrings by iterating through each starting position and extending to the end, and each character insertion in the trie takes O(1) time.

  • Expected Auxiliary Space Complexity: O(nΒ²), as in the worst case (all characters are distinct), the trie will store all possible substrings, requiring space proportional to the total number of characters across all unique substrings.

πŸ§‘β€πŸ’» Code (C++)

chevron-right⚑ View Alternative Approaches with Code and Analysishashtag

πŸ“Š 2️⃣ Set-Based Approach

πŸ’‘ Algorithm Steps:

  1. Use an unordered set to store all unique substrings.

  2. Generate all possible substrings using nested loops.

  3. Insert each substring into the set for automatic deduplication.

  4. Return the size of the set as the count.

πŸ“ Complexity Analysis:

  • Time: ⏱️ O(nΒ³) - Substring generation and insertion

  • Auxiliary Space: πŸ’Ύ O(nΒ²) - Storage for all substrings

βœ… Why This Approach?

  • Simple and intuitive implementation

  • Automatic handling of duplicates via set

  • Easy to debug and verify

Note: This approach results in Time Limit Exceeded (TLE) for large inputs (fails ~1110/1121 test cases due to time constraints).

πŸ“Š 3️⃣ Suffix Array Approach

πŸ’‘ Algorithm Steps:

  1. Build a suffix array of all suffixes.

  2. Sort the suffix array lexicographically.

  3. Compute longest common prefix between consecutive suffixes.

  4. Calculate unique substrings using formula: total substrings minus duplicate LCP counts.

πŸ“ Complexity Analysis:

  • Time: ⏱️ O(nΒ² log n) - Suffix sorting dominates

  • Auxiliary Space: πŸ’Ύ O(nΒ²) - Storage for suffixes

βœ… Why This Approach?

  • Mathematical optimization reduces redundant checking

  • Scales better for very large strings

  • Classic string algorithm pattern

πŸ“Š 4️⃣ Optimized Trie with Inline Node

πŸ’‘ Algorithm Steps:

  1. Create a compact trie structure with inline node definition.

  2. Insert all suffixes starting from each position.

  3. Count new nodes created during insertion.

  4. Each new node represents a unique substring.

πŸ“ Complexity Analysis:

  • Time: ⏱️ O(nΒ²) - Optimal for trie-based approach

  • Auxiliary Space: πŸ’Ύ O(nΒ²) - Worst case trie storage

βœ… Why This Approach?

  • Most space-efficient trie implementation

  • Compact code with inline struct

  • Best balance of readability and performance

πŸ†š πŸ” Comparison of Approaches

πŸš€ Approach

⏱️ Time Complexity

πŸ’Ύ Space Complexity

βœ… Pros

⚠️ Cons

🌲 Trie Standard

🟒 O(n²)

🟑 O(n²)

πŸš€ Optimal time complexity

πŸ’Ύ Moderate space usage

πŸ” Set-Based

πŸ”΄ O(nΒ³)

🟑 O(n²)

πŸ“– Simplest implementation

🐌 Slowest time complexity

πŸ“Š Suffix Array

🟑 O(n² log n)

🟑 O(n²)

🎯 Mathematical elegance

πŸ”§ Complex to implement

⚑ Optimized Trie

🟒 O(n²)

🟒 O(n²)

⭐ Most compact code

πŸ”§ Requires understanding tries

πŸ† Best Choice Recommendation

🎯 Scenario

πŸŽ–οΈ Recommended Approach

πŸ”₯ Performance Rating

πŸ… Optimal performance needed

πŸ₯‡ Optimized Trie

β˜…β˜…β˜…β˜…β˜…

πŸ“– Learning/Prototyping

πŸ₯ˆ Set-Based

β˜…β˜…β˜…β˜†β˜†

πŸ”§ Large strings (n > 10⁴)

πŸ₯‰ Suffix Array

β˜…β˜…β˜…β˜…β˜†

🎯 Interview/Competitive

πŸ… Trie Standard

β˜…β˜…β˜…β˜…β˜…

β˜• Code (Java)

🐍 Code (Python)

🧠 Contribution and Support

For discussions, questions, or doubts related to this solution, feel free to connect on LinkedIn: πŸ“¬ Any Questions?arrow-up-right. Let's make this learning journey more collaborative!

⭐ If you find this helpful, please give this repository a star! ⭐


πŸ“Visitor Count

Visitor counter

Last updated