Find String Between Two Substrings in Python When There Is a Space After the First Substring

There are several similar posts on StackOverflow, but none of them covers the case where the target string starts one space after the first substring.

I have the following string (example_string):
<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>

I want to extract “I want this string.” from the string above. The random letters will always change; however, the quote “I want this string.” will always sit between [?] (followed by a space) and Reduced.

Right now, I extract “I want this string.” like this:

import re

target_quote_object = re.search('[?](.*?)Reduced', example_string)
target_quote_text = target_quote_object.group(1)
print(target_quote_text[2:])

This strips the ] and the space that always appear at the start of my extracted string, so only “I want this string.” is printed. However, this solution seems ugly, and I’d rather have re.search() return the target string directly, without any modification. How can I do this?

Solution:

Your '[?](.*?)Reduced' pattern matches a literal ?, then captures 0 or more characters other than line breaks, as few as possible, up to the first Reduced substring. The [?] is a character class formed by the unescaped brackets, and a ? inside a character class is a literal ? character. So the pattern matches only the ? of [?], which is why your Group 1 starts with the ] and a space.

To make your regex match [?], escape [ and ? so that they are matched as literal characters (an unescaped ] outside a character class is already literal in Python's re). You also need to consume the space after ] so that it does not end up in Group 1. Rather than a literal space, it is more robust to use \s* (0 or more whitespace characters) or \s+ (1 or more).

Use

re.search(r'\[\?]\s*(.*?)Reduced', example_string)


import re

rx = r"\[\?]\s*(.*?)Reduced"  # literal "[?]", optional whitespace, lazy capture up to "Reduced"
s = "<insert_randomletters>[?] I want this string.Reduced<insert_randomletters>"
m = re.search(rx, s)
if m:
    print(m.group(1))
# => I want this string.


(Java) alphabetic substring comparison ends up with a wrong result

In one of the HackerRank Java challenges, there is a problem defined as follows:

The problem

We define the following terms:

  • Lexicographical Order, also known as alphabetic or dictionary order, orders characters as follows: A < B < … < Y < Z < a < b < … < y < z

  • A substring of a string is a contiguous block of characters in the string. For example, the substrings of abc are a, b, c, ab, bc, and abc.

Given a string, s, and an integer, k, complete the function so that it finds the lexicographically smallest and largest substrings of length k.
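For reference, Java's String.compareTo compares strings by UTF-16 code unit, so for ASCII letters it already follows this ordering: every uppercase letter sorts before every lowercase one. The small sketch below checks that claim and lists the length-k substrings in the sense defined above. The class name LexOrderDemo and the helper substringsOfLength are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class LexOrderDemo {
    // Hypothetical helper: collects every contiguous substring of length k, by starting index.
    static List<String> substringsOfLength(String s, int k) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + k <= s.length(); i++) {
            out.add(s.substring(i, i + k));
        }
        return out;
    }

    public static void main(String[] args) {
        // 'Z' is code unit 90 and 'a' is 97, so uppercase sorts first.
        System.out.println("Z".compareTo("a") < 0);          // true
        System.out.println("Zebra".compareTo("apple") < 0);  // true
        System.out.println(substringsOfLength("abc", 2));    // [ab, bc]
    }
}
```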

Here is my (not fully working) solution:

My code

import java.util.*;

public class stringCompare {

    public static String getSmallestAndLargest(String s, int k) {
        String smallest, largest, temp;

        /* Initially, define the smallest and largest substrings as the first k chars */
        smallest = s.substring(0, k);
        largest = s.substring(0, k);

        for (int i = 0; i <= s.length() - k; i++) {
            temp = s.substring(i, i + k);
            for (int j = 0; j < k; j++) {

                /* Check if the first char of the next substring is greater than the largest ones' */
                if (temp.charAt(j) > largest.charAt(j)) {
                    largest = s.substring(i, i + k);
                    break;      
                }

                /* Check if the first char of the next substring is less than the smallest ones' */
                else if (temp.charAt(j) < smallest.charAt(j)) {
                    smallest = s.substring(i, i + k);
                    break;
                } 

                /* Check if the first char of the next substring is either equal to smallest or largest substrings' */
                else if (temp.charAt(j) == smallest.charAt(j)
                        || temp.charAt(j) == largest.charAt(j)) {
                    // If so, move to the next char till it becomes different
                } 

                /* If the first char of the next substring is neither of these (it lies between the smallest and largest ones'),
                    skip that substring */ 
                else {
                    break;
                }
            }
        }

        return smallest + "\n" + largest;
    }

    public static void main(String[] args) {
        String s;
        int k;
        try (Scanner scan = new Scanner(System.in)) {
            s = scan.next();
            k = scan.nextInt();
        }

        System.out.println(getSmallestAndLargest(s, k));
    }
}

According to HackerRank, this code fails 2 out of 6 test cases. One of them is:

ASDFHDSFHsdlfhsdlfLDFHSDLFHsdlfhsdlhkfsdlfLHDFLSDKFHsdfhsdlkfhsdlfhsLFDLSFHSDLFHsdkfhsdkfhsdkfhsdfhsdfjeaDFHSDLFHDFlajfsdlfhsdlfhDSLFHSDLFHdlfhs
30

The expected output is:

ASDFHDSFHsdlfhsdlfLDFHSDLFHsdl
sdlkfhsdlfhsLFDLSFHSDLFHsdkfhs

But mine becomes:

DFHSDLFHDFlajfsdlfhsdlfhDSLFHS
sdlkfhsdlfhsLFDLSFHSDLFHsdkfhs

In debug mode, I found that the smallest substring was correct until the 67th iteration of i. I don’t know why it changes to a wrong one at that step, but it does.

Can anyone help me with that, please?

Thanks!

Solution:

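I propose comparing whole substrings with String.compareTo, together with a simple optimisation: a quick peek at the first characters, so that a full comparison is needed only when the first character ties with that of the current smallest or largest substring.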

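public static String getSmallestAndLargest(String s, int k) {
    String smallest = s.substring(0, k);
    String largest = smallest;
    for (int i = 1; i <= s.length() - k; i++) {
        // A strictly larger or smaller first char decides the comparison on its own.
        if (s.charAt(i) > largest.charAt(0)) {
            largest = s.substring(i, i + k);
            continue;
        }
        if (s.charAt(i) < smallest.charAt(0)) {
            smallest = s.substring(i, i + k);
            continue;
        }

        // First char ties: fall back to a full lexicographic comparison.
        // A first char strictly between the two cannot give a new extreme, so it is skipped.
        if (s.charAt(i) == largest.charAt(0)) {
            String temp = s.substring(i, i + k);
            if (temp.compareTo(largest) > 0) {
                largest = temp;
                continue;
            }
        }
        if (s.charAt(i) == smallest.charAt(0)) {
            String temp = s.substring(i, i + k);
            if (temp.compareTo(smallest) < 0) {
                smallest = temp;
            }
        }
    }
    return smallest + "\n" + largest;
}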

For the example input in the question, the number of comparisons drops from 222 to 14.
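A likely cause of the wrong result in the question's code is that its inner j loop compares temp with both largest and smallest at every position. Once temp ties with one of them on the first characters, the loop keeps scanning, and a later character is still compared against the other string, even though temp has already diverged from it. That comparison can wrongly replace it. With s = "abab" and k = 2, the substring "ab" at i = 2 ties with smallest ("ab") on 'a', and then its 'b' is compared with the 'a' of largest ("ba"), so largest becomes "ab". The sketch below reproduces this. The class SubstringExtremes and the methods buggy and fixed are hypothetical names. buggy copies the question's loop and fixed compares each whole substring independently with compareTo.

```java
class SubstringExtremes {

    // The question's per-character algorithm, kept as-is to reproduce the bug.
    static String buggy(String s, int k) {
        String smallest = s.substring(0, k);
        String largest = s.substring(0, k);
        for (int i = 0; i <= s.length() - k; i++) {
            String temp = s.substring(i, i + k);
            for (int j = 0; j < k; j++) {
                if (temp.charAt(j) > largest.charAt(j)) {
                    largest = temp;
                    break;
                } else if (temp.charAt(j) < smallest.charAt(j)) {
                    smallest = temp;
                    break;
                } else if (temp.charAt(j) == smallest.charAt(j)
                        || temp.charAt(j) == largest.charAt(j)) {
                    // Tie with one of them: keeps scanning even if temp already differs from the other.
                } else {
                    break;
                }
            }
        }
        return smallest + "\n" + largest;
    }

    // Each whole substring is compared against each extreme independently.
    static String fixed(String s, int k) {
        String smallest = s.substring(0, k);
        String largest = smallest;
        for (int i = 1; i <= s.length() - k; i++) {
            String temp = s.substring(i, i + k);
            if (temp.compareTo(smallest) < 0) {
                smallest = temp;
            }
            if (temp.compareTo(largest) > 0) {
                largest = temp;
            }
        }
        return smallest + "\n" + largest;
    }

    public static void main(String[] args) {
        System.out.println(buggy("abab", 2)); // ab, then ab (largest is wrong)
        System.out.println(fixed("abab", 2)); // ab, then ba
    }
}
```

The fixed version drops the first-character peek for clarity. It runs in O((n - k) * k) time, and the optimisation above only reduces how often the full comparison runs.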