How do I find files that do not contain a given string pattern?

grep -riL "foo" .

Explanation of the grep options used:

     -L, --files-without-match
             Print only the names of files that contain no match.
     -R, -r, --recursive
             Recursively search subdirectories listed.

     -i, --ignore-case
             Perform case insensitive matching.
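
For comparison, roughly the same search can be sketched in Python. This is a minimal sketch, not a replacement for grep: the function name files_without_match is made up for illustration, and it does a plain case-insensitive substring test rather than a regex match.

```python
import os

def files_without_match(root, needle):
    """Return paths under root whose text does not contain needle (case-insensitive)."""
    needle = needle.lower()
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="ignore" lets binary-ish files be scanned without raising
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    found = any(needle in line.lower() for line in fh)
            except OSError:
                continue  # unreadable file: skip it
            if not found:
                missing.append(path)
    return sorted(missing)
```

Calling files_without_match(".", "foo") would then list the files under the current directory that never mention "foo".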

grep for special characters in Unix

Tell grep to treat your input as a fixed string using -F option.

grep -F '*^%Q&$*&^@$&*!^@$*&^&^*&^&' application.log

Add the -n option to also print the line number of each match:

grep -Fn '*^%Q&$*&^@$&*!^@$*&^&^*&^&' application.log
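
The difference between a regex search and a fixed-string search can be illustrated in Python as well. In this small sketch the sample lines and the helper name fixed_grep are invented for the example; the helper mimics what -Fn reports (line number plus line) using a plain substring test.

```python
import re

needle = "a.c*"
lines = ["abc", "literal a.c* here"]

# Interpreted as a regex, '.' matches any character and 'c*' may match nothing,
# so both lines match.
regex_hits = [ln for ln in lines if re.search(needle, ln)]

def fixed_grep(lines, needle):
    """Return (line_number, line) pairs where needle occurs literally."""
    return [(n, ln) for n, ln in enumerate(lines, 1) if needle in ln]

print(regex_hits)
print(fixed_grep(lines, needle))

# re.escape gives the same literal behaviour when a regex API is required
escaped_hits = [ln for ln in lines if re.search(re.escape(needle), ln)]
```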

Search for multiline String in a text file

I have a text file in which I am trying to search for a String that spans multiple lines. I am able to search for a single-line string, but I need to search for a multi-line string.

I have tried searching for a single line, which works fine.

public static void main(String[] args) throws IOException 
{
  File f1=new File("D:\\Test\\test.txt"); 
  String[] words=null;  
  FileReader fr = new FileReader(f1);  
  BufferedReader br = new BufferedReader(fr); 
  String s;     
  String input="line one"; 

  // here i want to search for multilines as single string like 
  //   String input ="line one"+
  //                 "line two";

  int count=0;   
  while((s=br.readLine())!=null)   
  {
    words=s.split("\n");  
    for (String word : words) 
    {
      if (word.equals(input))   
      {
        count++;    
      }
    }
  }

  if(count!=0) 
  {
    System.out.println("The given String "+input+ " is present for "+count+ " times ");
  }
  else
  {
    System.out.println("The given word is not present in the file");
  }
  fr.close();
}

And below are the file contents.

line one
line two
line three
line four

Solution:

Use a StringBuilder for that: read every line from the file and append it to the StringBuilder, followed by the line separator.

StringBuilder lineInFile = new StringBuilder();

while((s=br.readLine()) != null){
  lineInFile.append(s).append(System.lineSeparator());
}

Now build the multi-line search string the same way and check whether lineInFile contains it:

StringBuilder searchString = new StringBuilder();

searchString.append("line one");
searchString.append(System.lineSeparator());
searchString.append("line two");

System.out.println(lineInFile.toString().contains(searchString));

How to Find Words Not Containing Specific Letters?

I’m trying to write some code using regex and a text file. My file contains these words, one per line:

nana
abab
nanac
eded

My goal is to display the words that do not contain any of the letters of a given substring.

For example, if my substring is “bn”, the output should be only eded, because nana and nanac contain “n” and abab contains “b”.

I have written some code, but it only checks the first letter of my substring.

import re
substring = "bn"
def xstring():
    with open("deneme.txt") as f:
        for line in f:
            for word in re.findall(r'\w+', line):
                for letter in substring:
                    if len(re.findall(letter, word)) == 0:
                        print(word)
                        #yield word
xstring()

How do I solve this problem?

Solution:

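If you want to check whether a string contains any of a set of letters, use a bracketed character class.
For example, [bn] matches any word that contains at least one of those letters.

import re
substring = "bn"
# re.escape guards against letters that are special inside a character class
regex = re.compile('[' + re.escape(substring) + ']')
def xstring():
    with open("deneme.txt") as f:
        for line in f:
            # keep only the lines in which none of the letters occur
            if regex.search(line) is None:
                print(line.strip())
xstring()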
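
The same filter can also be written without regex, using a set. This is a minimal sketch of a swapped-in technique, and the function name words_without_letters is made up for illustration.

```python
def words_without_letters(words, letters):
    """Keep the words that share no character with letters."""
    banned = set(letters)
    # isdisjoint is True when the word contains none of the banned letters
    return [w for w in words if banned.isdisjoint(w)]

print(words_without_letters(["nana", "abab", "nanac", "eded"], "bn"))
```

Because no pattern is built, letters that are special in a regex (such as ] or ^) need no escaping here.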

How to return the most frequent letters in a string and order them based on their frequency count

I have this string: s = "china construction bank". I want to write a function that returns the 3 most frequent characters, ordered by the number of times they appear; if two characters appear the same number of times, they should be ordered alphabetically. I also want to print each character on a separate line.

So far I have written this code:

from collections import Counter
def ordered_letters(s, n=3):
    ctr = Counter(c for c in s if c.isalpha())
    print ''.join(sorted(x[0] for x in ctr.most_common(n)))[0], '\n', ''.join(sorted(x[0] for x in ctr.most_common(n)))[1], '\n', ''.join(sorted(x[0] for x in ctr.most_common(n)))[2]

This code applied to the above string will yield:

a 
c 
n

But this is not what I really want. The output I would like is:

1st most frequent: 'n'. Appearances: 4
2nd most frequent: 'c'. Appearances: 3
3rd most frequent: 'a'. Appearances: 2

I’m stuck on the part where I have to print characters with the same frequency in alphabetical order. How could I do this?

Thank you very much in advance

Solution:

You can use heapq.nlargest with a custom sort key. The key uses -ord(x[0]) as a secondary criterion, so ties in count are broken in ascending alphabetical order. Using a heap queue is preferable to sorted here because there is no need to sort every item in the Counter object.

from collections import Counter
from heapq import nlargest

def ordered_letters(s, n=3):
    ctr = Counter(c.lower() for c in s if c.isalpha())

    def sort_key(x):
        return (x[1], -ord(x[0]))

    for idx, (letter, count) in enumerate(nlargest(n, ctr.items(), key=sort_key), 1):
        print('#', idx, 'Most frequent:', letter, '.', 'Appearances:', count)

ordered_letters("china construction bank")

# 1 Most frequent: n . Appearances: 4
# 2 Most frequent: c . Appearances: 3
# 3 Most frequent: a . Appearances: 2
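
For comparison, the same ordering can be expressed with sorted and a tuple key. This sketch swaps in a different technique from the heap approach: it sorts every item, which is fine for an alphabet-sized Counter. The name top_letters is made up for illustration.

```python
from collections import Counter

def top_letters(s, n=3):
    """Return the n most frequent letters as (letter, count) pairs."""
    ctr = Counter(c.lower() for c in s if c.isalpha())
    # Negative count sorts high counts first; the letter itself breaks ties A-Z
    return sorted(ctr.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

for idx, (letter, count) in enumerate(top_letters("china construction bank"), 1):
    print(idx, letter, count)
```

Since the tie-breaker is the letter itself rather than -ord(...), this key would also work for multi-character items such as words.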

Splitting a string from right at intervals in python

I’m trying to split a string from the right. Following is the code.

string = "abcde" 
n = len(string)
slices = [string[i-3:i] for i in range(n,0,-3)]
print (slices)

I get ['cde', ''] as the output, but I’m trying to get ['cde', 'ab'].

When I split from the left, however, it gives the expected output:

string = "abcde" 
slices = [string[i:i+3] for i in range(0,n,3)]
print (slices)

output: ['abc', 'de']

Can anyone point out where I am going wrong?

Solution:

You are close. You need to floor the first indexing argument at 0:

x = "abcde" 
n = len(x)
slices = [x[max(0,i-3):i] for i in range(n,0,-3)]

['cde', 'ab']

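Your code fails because slice indices behave differently at the two ends of the string. A positive index that runs past the end is clamped, so the slice simply goes as far as it can.

A negative start index, on the other hand, is not clamped at the start of the string; it counts back from the end. With i = 2, string[i-3:i] becomes string[-1:2], which starts at the last character and is therefore empty.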
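
The index behaviour described above can be checked directly. In this small sketch the helper name chunks_from_right is made up for illustration; it wraps the max(0, ...) fix in a reusable function.

```python
s = "abcde"

print(repr(s[3:100]))  # end index past the end is clamped
print(repr(s[-1:2]))   # negative start counts from the end, so the slice is empty
print(repr(s[0:2]))    # flooring the start at 0 gives the intended piece

def chunks_from_right(text, size):
    """Split text into pieces of at most size characters, starting from the right."""
    return [text[max(0, i - size):i] for i in range(len(text), 0, -size)]

print(chunks_from_right(s, 3))
```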

Convert string list to list in python

I have a string like this:

val = '["10249/54","10249/147","10249/187","10249/252","10249/336"]'

I need to parse it, take the value after each /, and put those values into a list like this:

['54','147','187','252','336']

My code: [a[a.index('/')+1:] for a in val[1:-1].split(',')]

Output : ['54"', '147"', '187"', '252"', '336"']

The values still contain a trailing double quote ("), which is wrong.
I then tried the following:

c = []
for a in val[1:-1].split(','):
    tmp = a[1:-1]
    c.append(tmp[tmp.index('/')+1:])

Output :

['54', '147', '187', '252', '336']

Is there any better way to do this?

Solution:

You can do it in one line pretty easily:

from ast import literal_eval
a = [i.split('/')[-1] for i in literal_eval(val)]
print(a)

Output: ['54', '147', '187', '252', '336']

literal_eval() safely evaluates the string as a Python literal, which here yields a list of strings.
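
Since this particular string also happens to be valid JSON, json.loads is another option. This is a sketch of a swapped-in technique, assuming the input always uses double-quoted strings; the variable name suffixes is invented for the example.

```python
import json

val = '["10249/54","10249/147","10249/187","10249/252","10249/336"]'

# Parse the JSON array, then keep the text after the last '/' of each item
suffixes = [item.rsplit("/", 1)[-1] for item in json.loads(val)]
print(suffixes)
```

Unlike literal_eval, json.loads would reject single-quoted strings, so literal_eval remains the better fit if the input is a Python-style list.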

Is it good practice to nullify a String in Java?

I have a concern about storing a plain-text password in memory as a String. According to the reference below, because Strings are immutable, using the String type for sensitive data held in memory is a vulnerability.

https://www.geeksforgeeks.org/use-char-array-string-storing-passwords-java/

Why is char[] preferred over String for passwords?

Can I overcome this security issue by nullifying the String variable instead of using a char array or a StringBuffer/StringBuilder?

e.g.: String password = "password";
password = null;

Solution:

No. Nullifying the variable only drops the reference; the String object itself is unchanged. A string literal such as "password" is also held in the string pool, where values are retained to conserve memory, so it stays in memory for as long as the pool entry does.

Any potential attacker who can read the process memory (for example through a heap dump) can retrieve the value.

With a char[], by contrast, you control the contents: you can overwrite the array (for example with Arrays.fill) as soon as you are done with the password, instead of waiting for garbage collection. Merely setting the array reference to null does not erase the data either.

An even better option is to use a byte array.

Read more about String Constant pool.

Regex using increasing sequence of numbers Python

Say I have a string:

teststring =  "1.3 Hello how are you 1.4 I am fine, thanks 1.2 Hi There 1.5 Great!" 

That I would like to split into:

testlist = ["1.3 Hello how are you", "1.4 I am fine, thanks 1.2 Hi There", "1.5 Great!"]

Basically, splitting only on increasing digits where the difference is .1 (i.e. 1.2 to 1.3).

Is there a way to split this with regex but only capturing increasing sequential numbers? I wrote code in python to sequentially iterate through using a custom re.compile() for each one and it is okay but extremely unwieldy.

Something like this (where parts1_temp is a given list of the x.x. numbers in the string):

parts1_temp = ['1.3','1.4','1.2','1.5']
parts_num =  range(int(parts1_temp.split('.')[1]), int(parts1_temp.split('.')[1])+30)
parts_search = ['.'.join([parts1_temp.split('.')[0], str(parts_num_el)]) for parts_num_el in parts_num]
#parts_search should be ['1.3','1.4','1.5',...,'1.32']

for k in range(len(parts_search)-1):
    rxtemp = re.compile(r"(?:"+str(parts_search[k])+")([\s\S]*?)(?=(?:"+str(parts_search[k+1])+"))", re.MULTILINE)
    parts_fin = [match.group(0) for match in rxtemp.finditer(teststring)]

But man is it ugly. Is there a way to do this more directly in regex? I imagine this is feature that someone would have wanted at some point with regex but I can’t find any ideas on how to tackle this (and maybe it is not possible with pure regex).

Solution:

This method uses finditer to find every occurrence of \d+\.\d+, then tests whether each match is numerically greater than the previously accepted one. If it is, the match's start index is appended to the indices list.

The last line uses a list comprehension to split the string at those indices.

Original Method

This method checks only that the current match is larger than the previously accepted one. It does not require the numbers to be consecutive; it compares by numeric size alone. So for a string containing the numbers 1.1, 1.2, 1.4, it would split at each occurrence, since each number is larger than the last.

import re

indices = []
string =  "1.3 Hello how are you 1.4 I am fine, thanks 1.2 Hi There 1.5 Great!"
regex = re.compile(r"\d+\.\d+")
lastFloat = 0

for m in regex.finditer(string):
    x = float(m.group())
    if lastFloat < x:
        lastFloat = x
        indices.append(m.start(0))

print([string[i:j] for i,j in zip(indices, indices[1:]+[None])])

Outputs: ['1.3 Hello how are you ', '1.4 I am fine, thanks 1.2 Hi There ', '1.5 Great!']


Edit

Sequential Method

This method is very similar to the original. However, in the case of 1.1, 1.2, 1.4, it would not split at 1.4, since 1.4 does not follow 1.2 in steps of .1.

The method below only differs in the if statement, so this logic is fairly customizable to whatever your needs may be.

import re

indices = []
string =  "1.3 Hello how are you 1.4 I am fine, thanks 1.2 Hi There 1.5 Great!"
regex = re.compile(r"\d+\.\d+")
lastFloat = 0

for m in regex.finditer(string):
    x = float(m.group())
    if (lastFloat == 0) or (x == round(lastFloat + .1, 1)):
        lastFloat = x
        indices.append(m.start(0))

print([string[i:j] for i,j in zip(indices, indices[1:]+[None])])
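
The sequential check can also be wrapped in a function that compares with Decimal instead of float, which avoids the need for round(). This is only a sketch under assumptions: the name split_on_sequential and the step parameter are invented for illustration, and the pieces are stripped of surrounding whitespace to match the list asked for in the question.

```python
import re
from decimal import Decimal

def split_on_sequential(text, step="0.1"):
    """Split text at each number that is exactly step above the last accepted number."""
    step = Decimal(step)
    indices = []
    last = None
    for m in re.finditer(r"\d+\.\d+", text):
        value = Decimal(m.group())
        # The first number always starts a piece; later ones must continue the sequence
        if last is None or value == last + step:
            last = value
            indices.append(m.start())
    return [text[i:j].strip() for i, j in zip(indices, indices[1:] + [None])]

teststring = "1.3 Hello how are you 1.4 I am fine, thanks 1.2 Hi There 1.5 Great!"
print(split_on_sequential(teststring))
```

Note that the numbers are compared by value, so 1.9 is followed by 2.0 here, not by 1.10 as it would be in section numbering.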

Convert indexes in str to indexes in bytearray

I have some text that I process to find the offsets of certain words. These offsets will be used by another application that treats the text as a sequence of bytes, so str indexes will be wrong for it.

Example:

>>> text = "“Hello there!” He said"
>>> text[7:12]
'there'
>>> text.encode('utf-8')[7:12]
b'o the'

So how can I convert indexes in string to indexes in encoded bytearray?

Solution:

Encode the substrings and get their lengths in bytes:

text = "“Hello there!” He said"
start = len(text[:7].encode('utf-8'))
count = len(text[7:12].encode('utf-8'))
text.encode('utf-8')[start:start+count]

This gives b'there'.
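
The same calculation can be packaged as a small helper. This is a sketch; the function name byte_span is made up for illustration, and it assumes an encoding such as UTF-8 that does not prepend a byte order mark on every encode call.

```python
def byte_span(text, start, end, encoding="utf-8"):
    """Convert the str slice [start:end] into the matching (start, end) byte offsets."""
    byte_start = len(text[:start].encode(encoding))
    byte_end = byte_start + len(text[start:end].encode(encoding))
    return byte_start, byte_end

text = "“Hello there!” He said"
b_start, b_end = byte_span(text, 7, 12)
print(b_start, b_end, text.encode("utf-8")[b_start:b_end])
```

Each call re-encodes the prefix, so for many offsets in a long text it may be worth encoding once and accumulating lengths instead.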