githubEdit

21. Find H-Index

βœ… GFG solution to Find H-Index: calculate researcher's H-index using efficient frequency counting and bucket sort technique. πŸš€

The problem can be found at the following link: πŸ”— Question Linkarrow-up-right

🧩 Problem Description

You are given an array citations[], where each element citations[i] represents the number of citations received by the ith paper of a researcher. Calculate the researcher's H-index.

The H-index is defined as the maximum value H, such that the researcher has published at least H papers, and all those papers have citation value greater than or equal to H.

πŸ“˜ Examples

Example 1

Input: citations[] = [3, 0, 5, 3, 0]
Output: 3
Explanation: There are at least 3 papers with citation counts of 3, 5, and 3, 
all having citations greater than or equal to 3.

Example 2

Input: citations[] = [5, 1, 2, 4, 1]
Output: 2
Explanation: There are 3 papers (with citation counts of 5, 2, and 4) that have 
2 or more citations. However, the H-Index cannot be 3 because there aren't 3 papers 
with 3 or more citations.

Example 3

πŸ”’ Constraints

  • $1 \le \text{citations.size()} \le 10^6$

  • $0 \le \text{citations}[i] \le 10^6$

βœ… My Approach

The optimal solution uses Sorting in Descending Order:

Descending Sort Approach

  1. Sort Citations:

    • Sort the citations array in descending order.

    • This places papers with highest citations first.

  2. Find H-Index:

    • Iterate through the sorted array with index i from 0 to n-1.

    • At position i, we have checked i+1 papers.

    • If citations[i] >= i+1, it means at least i+1 papers have at least i+1 citations.

    • The H-index is the largest such i+1 value.

  3. Edge Case:

    • If no paper meets the criteria, H-index is 0.

    • This happens when even the highest cited paper has 0 citations.

  4. Return Result:

    • The last valid i+1 where citations[i] >= i+1 is the H-index.

Key Insight: After descending sort, papers are arranged from most cited to least. At any position i, if citations[i] >= i+1, we have found at least i+1 papers with i+1+ citations.

πŸ“ Time and Auxiliary Space Complexity

  • Expected Time Complexity: O(n log n), where n is the number of papers. The sorting operation dominates with O(n log n) complexity, while the linear scan to find H-index takes O(n).

  • Expected Auxiliary Space Complexity: O(1), as we sort in-place and only use constant extra space for variables. Some sorting implementations may use O(log n) stack space for recursion.

πŸ§‘β€πŸ’» Code (C)

πŸ§‘β€πŸ’» Code (C++)

chevron-right⚑ View Alternative Approaches with Code and Analysishashtag

πŸ“Š 2️⃣ Frequency Counting (Bucket Sort)

πŸ’‘ Algorithm Steps:

  1. Create a frequency array of size n+1 where freq[i] counts papers with exactly i citations.

  2. Bucket all citations >= n into freq[n] since they all contribute equally.

  3. Starting from n, accumulate count of papers with at least h citations.

  4. Find largest h where accumulated count >= h.

πŸ“ Complexity Analysis:

  • Time: ⏱️ O(n) - Linear time with counting

  • Auxiliary Space: πŸ’Ύ O(n) - Frequency array of size n+1

βœ… Why This Approach?

  • Linear time complexity (faster than sorting for large n)

  • Optimal for competitive programming

  • Bucket sort principle application

πŸ“Š 3️⃣ Binary Search on Answer

πŸ’‘ Algorithm Steps:

  1. Binary search on possible H-index values from 0 to n.

  2. For each candidate H, count papers with citations >= H.

  3. If count >= H, try for larger H; otherwise try smaller.

  4. Return the largest valid H found.

πŸ“ Complexity Analysis:

  • Time: ⏱️ O(n log n) - Binary search with linear counting

  • Auxiliary Space: πŸ’Ύ O(1) - Constant extra space

βœ… Why This Approach?

  • Demonstrates binary search on answer pattern

  • Space efficient

  • Good for educational purposes

πŸ“Š 4️⃣ Sorting with Early Termination

πŸ’‘ Algorithm Steps:

  1. Sort citations in ascending order.

  2. Iterate from end to beginning.

  3. At position i from the end, check if n-i papers have citations[i] citations.

  4. Return first valid H-index found.

πŸ“ Complexity Analysis:

  • Time: ⏱️ O(n log n) - Sorting dominates

  • Auxiliary Space: πŸ’Ύ O(1) - In-place sorting

βœ… Why This Approach?

  • Ascending sort with clean logic

  • Early termination optimization

  • Alternative sorting perspective

πŸ†š πŸ” Comparison of Approaches

πŸš€ Approach

⏱️ Time Complexity

πŸ’Ύ Space Complexity

βœ… Pros

⚠️ Cons

🎯 🎯 Descending Sort + Linear Scan

🟑 O(n log n)

🟒 O(1)

πŸš€ Clean, intuitive logic

🐌 Slower than linear approaches

πŸ“Š Frequency Counting

🟒 O(n)

🟑 O(n)

⚑ Fastest time complexity

πŸ’Ύ Extra space required

πŸ” Binary Search

🟑 O(n log n)

🟒 O(1)

πŸ“š Search pattern practice

πŸ”§ Repeated counting overhead

πŸ“ˆ Ascending Sort

🟑 O(n log n)

🟒 O(1)

🎯 Alternative perspective

πŸ”§ Similar to main approach

πŸ† Best Choice Recommendation

🎯 Scenario

πŸŽ–οΈ Recommended Approach

πŸ”₯ Performance Rating

πŸ… Optimal time complexity

πŸ₯‡ Frequency Counting

β˜…β˜…β˜…β˜…β˜…

πŸ“– Simplicity and clarity

πŸ₯ˆ Descending Sort + Linear Scan

β˜…β˜…β˜…β˜…β˜…

πŸ’Ύ Memory constrained

πŸ₯‰ Ascending Sort

β˜…β˜…β˜…β˜…β˜†

🎯 Learning binary search

πŸ… Binary Search

β˜…β˜…β˜…β˜†β˜†

β˜• Code (Java)

🐍 Code (Python)

🧠 Contribution and Support

For discussions, questions, or doubts related to this solution, feel free to connect on LinkedIn: πŸ“¬ Any Questions?arrow-up-right. Let's make this learning journey more collaborative!

⭐ If you find this helpful, please give this repository a star! ⭐


πŸ“Visitor Count

Visitor counter

Last updated