Occurrences After Bigram in Python

In text analysis, you may need to find the words that follow a specific bigram (a sequence of two consecutive words). Given a text and two words, first and second, the task is to collect every word that comes immediately after an occurrence of "first second" in the text.

For example, in the text "lina is a good girl she is a good singer", if we're looking for words that follow the bigram "a good", the result would be ["girl", "singer"].
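To make the idea of a bigram concrete, the short sketch below lists the bigrams of the sample text and locates the positions where "a good" occurs. The variable names (bigrams, positions) are illustrative only and are not used elsewhere in this article.

```python
text = "lina is a good girl she is a good singer"
words = text.split()

# Pair each word with its successor to form the bigrams
bigrams = list(zip(words, words[1:]))
print(bigrams[:4])
# [('lina', 'is'), ('is', 'a'), ('a', 'good'), ('good', 'girl')]

# Indices at which the bigram ("a", "good") starts
positions = [i for i, pair in enumerate(bigrams) if pair == ("a", "good")]
print(positions)
# [2, 7]
```

The words we want sit two places after each of these indices, that is, at indices 4 ("girl") and 9 ("singer").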

Algorithm

To solve this problem, we follow these steps:

  • Split the text into individual words
  • Create an empty result list
  • Iterate through the words by index
  • Check whether the current word matches the first word and the next word matches the second word
  • If both match and a third word follows, add that third word to the result list
  • Return the result list
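As a compact alternative to index-based iteration, the same steps can be sketched by zipping three staggered views of the word list so that each iteration sees a consecutive triple. This is only one possible formulation, and the function name words_after_bigram is a hypothetical one chosen for this sketch.

```python
def words_after_bigram(text, first, second):
    words = text.split()
    # zip stops at the shortest slice, so only complete triples are produced
    # and a bigram at the very end of the text is skipped automatically
    return [
        third
        for a, b, third in zip(words, words[1:], words[2:])
        if a == first and b == second
    ]

print(words_after_bigram("lina is a good girl she is a good singer", "a", "good"))
# ['girl', 'singer']
```

Because zip never yields an incomplete triple, no explicit bounds check is needed in this version.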

Implementation

Using Class-based Approach

class Solution:
    def findOccurrences(self, text, first, second):
        words = text.split()  # split on any run of whitespace
        result = []
        
        for i in range(len(words)):
            # Check if we have enough words remaining and if bigram matches
            if (i + 2 < len(words) and 
                words[i] == first and 
                words[i + 1] == second):
                result.append(words[i + 2])
        
        return result

# Test the solution
solution = Solution()
text = "lina is a good girl she is a good singer"
first = "a"
second = "good"

result = solution.findOccurrences(text, first, second)
print(result)

The output of the above code is:

['girl', 'singer']

Using Function-based Approach

def find_occurrences_after_bigram(text, first, second):
    words = text.split()
    result = []
    
    for i in range(len(words) - 2):
        if words[i] == first and words[i + 1] == second:
            result.append(words[i + 2])
    
    return result

# Example usage
text = "alice went to the market she went to the store"
first = "to"
second = "the"

result = find_occurrences_after_bigram(text, first, second)
print(f"Words after '{first} {second}': {result}")

The output of the above code is:

Words after 'to the': ['market', 'store']
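One tokenization detail is worth checking when the input may contain irregular spacing: calling split() with no argument collapses runs of whitespace, whereas split(" ") produces empty strings between consecutive spaces, which can hide a bigram. The sample string below is made up for illustration.

```python
spaced = "to  the market"  # note the two spaces between "to" and "the"

tokens_single_space = spaced.split(" ")
tokens_any_whitespace = spaced.split()

print(tokens_single_space)    # ['to', '', 'the', 'market']
print(tokens_any_whitespace)  # ['to', 'the', 'market']
```

With split(" "), the empty string sits between "to" and "the", so the bigram "to the" would not be detected in this input.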

Multiple Examples

def find_occurrences_after_bigram(text, first, second):
    words = text.split()
    result = []
    
    for i in range(len(words) - 2):
        if words[i] == first and words[i + 1] == second:
            result.append(words[i + 2])
    
    return result

# Test with different examples
examples = [
    ("we will we will rock you", "we", "will"),
    ("the cat in the hat sat on the mat", "the", "cat"),
    ("python is great python is powerful", "python", "is"),
    ("no match here", "not", "found")
]

for text, first, second in examples:
    result = find_occurrences_after_bigram(text, first, second)
    print(f"Text: '{text}'")
    print(f"Bigram: '{first} {second}'")
    print(f"Result: {result}")
    print("-" * 40)

The output of the above code is:

Text: 'we will we will rock you'
Bigram: 'we will'
Result: ['we', 'rock']
----------------------------------------
Text: 'the cat in the hat sat on the mat'
Bigram: 'the cat'
Result: ['in']
----------------------------------------
Text: 'python is great python is powerful'
Bigram: 'python is'
Result: ['great', 'powerful']
----------------------------------------
Text: 'no match here'
Bigram: 'not found'
Result: []
----------------------------------------

Key Points

  • The algorithm runs in O(n) time, where n is the number of words, and uses O(n) extra space for the word list
  • Matching is case-sensitive: "The" and "the" are treated as different words
  • An empty list is returned when the bigram never occurs
  • Overlapping matches are all reported, as in "we will we will rock you", and a bigram at the very end of the text, with no word after it, is skipped
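If case-insensitive matching is preferred, one possible approach is to compare lowercased copies of the words while returning the original spellings. The sketch below is a hypothetical variant, not part of the implementations above; the name find_after_bigram_ignore_case is invented here.

```python
def find_after_bigram_ignore_case(text, first, second):
    words = text.split()
    # Compare on lowercased copies, but return the words as originally written
    lowered = [w.lower() for w in words]
    first, second = first.lower(), second.lower()
    return [
        words[i + 2]
        for i in range(len(words) - 2)  # empty range when fewer than 3 words
        if lowered[i] == first and lowered[i + 1] == second
    ]

print(find_after_bigram_ignore_case("The cat sat and the cat ran", "the", "cat"))
# ['sat', 'ran']
print(find_after_bigram_ignore_case("a good", "a", "good"))
# []
```

The second call shows the end-of-text case: the bigram matches, but no word follows it, so the result is empty.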

Conclusion

Finding occurrences after bigrams is useful for text analysis and natural language processing tasks. The key is to iterate through word pairs and collect the words that immediately follow matching bigrams.

Updated on: 2026-03-25T07:21:40+05:30
