Why in Python 3 do quadruple quotes produce a Syntax error?

I can add additional quotes to the beginning of a triple-quoted string, but not to the end. Why is that? This block of code:

print(""""
String that starts with quadruple quotes and ends with triple quotes
""")

Produces this output:

"
String that starts with quadruple quotes and ends with triple quotes

Yet this code block doesn’t work:

print(""""
String that starts with quadruple quotes and ends with quadruple quotes
"""")

It produces this error:

  File "example.py", line 3
    """")
        ^
SyntaxError: EOL while scanning string literal

I don’t ever need to use a quadruple-quote string, but I’m curious why Python won’t let me do it. Can anyone help me understand?

Solution:

You can’t use """ anywhere in the value of a triple-quoted string. Not at the start, and not at the end.

That’s because, after the first three """ opening characters denoting the start of such a string, another sequence of """ is always going to be the end of the string. Your fourth " lies outside of the string object you created, and a single " without a closing " is not a valid string.

Python has no other method of knowing when such a string ends. You can’t arbitrarily extend the string ‘inwards’ with additional " characters before the final """, because that’d be indistinguishable from the valid and legal*:

>>> """string 1"""" string 2"
'string 1 string 2'

If you must include a " before the closing """, escape it. You can do so by preceding it with a backslash:

>>> """This is triple-quoted string that
... ends in a single double quote: \""""
'This is triple-quoted string that\nends in a single double quote: "'
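As an aside (my addition, not part of the original answer): if you’d rather not escape, you can switch the delimiter instead. A '''-quoted string can end with an unescaped ", because only three quote characters matching the opening delimiter terminate the literal:

```python
# Triple-single quotes let a double quote sit right before the closing delimiter.
s = '''This triple-quoted string
ends in a single double quote: "'''
print(s)
```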

Note that there is no such thing as a quadruple-quoted string. Python doesn’t let you combine " quotes into longer sequences arbitrarily; only "single-quoted" and """triple-quoted""" syntax exists (using " or '). The rules for a triple-quoted string differ from those for a single-quoted string: newlines are allowed in the former, but not in the latter.

See the String and Bytes literals section of the reference documentation for more details, which defines the grammar as:

shortstring     ::=  "'" shortstringitem* "'" | '"' shortstringitem* '"'
longstring      ::=  "'''" longstringitem* "'''" | '"""' longstringitem* '"""'

and explicitly mentions:

In triple-quoted literals, unescaped newlines and quotes are allowed (and are retained), except that three unescaped quotes in a row terminate the literal. (A “quote” is the character used to open the literal, i.e. either ' or ".)

(bold emphasis mine).


* The expression is legal because it consists of two string literals, one with """ quoting, the next with " quoting. Consecutive string literals are automatically concatenated, just like they would in C. See String literal concatenation.

Python: Extract hashtags out of a text file

So, I’ve written the code below to extract hashtags and also tags with ‘@’, append them to a list, and sort them in descending order. The problem is that the text might not be perfectly formatted: there may be no spaces between individual hashtags, so the following can occur (as can be checked with the #print statement inside the for loop):
#socality#thisismycommunity#themoderndayexplorer#modernoutdoors#mountaincultureelevated

So, the .split() method doesn’t deal with those. What would be the best way to handle this?

Here is the .txt file

Grateful for your time.

name = input("Enter file:")
if len(name) < 1 : name = "tags.txt"
handle = open(name)
tags = dict()
lst = list()

for line in handle :
    hline = line.split()
    for word in hline:
        if word.startswith('@') : tags[word] = tags.get(word,0) + 1
        else :
            tags[word] = tags.get(word,0) + 1
        #print(word)

for k,v in tags.items() :
    tags_order = (v,k)
    lst.append(tags_order)

lst = sorted(lst, reverse=True)[:34]
print('Final Dictionary: ' , '\n')
for v,k in lst :
    print(k , v, '')

Solution:

Use a regular expression. There are only a few constraints: a tag must start with either # or @, and it may not contain any whitespace characters.

This code

import re
tags = []
with open('../Downloads/tags.txt','Ur') as file:
    for line in file:
        tags += re.findall(r'[#@][^\s#@]+', line)

creates a list of all tags in the file. You can easily adjust it to store the found tags in your dictionary; instead of storing the result straight away in tags, loop over it and do with each item as you please.

The regex is built up from these two custom character classes:

  • [#@] – either the single character # or @ at the start
  • [^\s#@]+ – a sequence of not any single whitespace character (\s matches all whitespace such as space, tab, and returns), #, or @; at least one, and as many as possible.

So findall starts matching at the start of any tag and then grabs as much as it can, stopping only when encountering any of the “not” characters.

findall returns a list of matching items, which you can immediately add to an existing list, or loop over the found items in turn:

for tag in re.findall(r'[#@][^\s#@]+', line):
    # process "tag" any way you want here
    ...

The source text file contains Windows-style \r\n line endings, and so I initially got a lot of empty “lines” on my Mac. Opening the text file in Universal newline mode makes sure that is handled transparently by the line reading part of Python.
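Putting the pieces together with the question’s counting and sorting logic, here is a sketch using collections.Counter (the sample text below is made up to stand in for the lines of tags.txt):

```python
import re
from collections import Counter

tag_pattern = re.compile(r'[#@][^\s#@]+')
counts = Counter()

# A made-up sample standing in for the lines of tags.txt.
sample = "#socality#thisismycommunity @explorer\n#socality plain words\n"
for line in sample.splitlines():
    counts.update(tag_pattern.findall(line))

# The 34 most frequent tags, highest count first (like the original lst[:34]).
for tag, count in counts.most_common(34):
    print(tag, count)
```

Counter replaces both the manual tags.get(word, 0) + 1 bookkeeping and the (v, k) tuple sort from the question.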

Split list into sub-lists based on attribute value

I have an array of objects that have a suit attribute, and I want to split into sub arrays based on which suit the object has. I currently am using this:

    for c in cards:
        if c.suit.value == 0:
            spades.append(c)
        elif c.suit.value == 1:
            diamonds.append(c)
        elif c.suit.value == 2:
            clubs.append(c)
        else:
            hearts.append(c)

I have tried to use itertools.groupby as follows:

suits = [list(g) for g in itertools.groupby(cards, lambda x: x.suit.value)]

But this just yields:

[[3, <itertools._grouper object at 0x000000000296B2E8>], ...]

My first approach works, I just imagine there is a simple pythonic one liner that accomplishes what I need.

Solution:

Although it is not a one-liner, using a list of lists makes it more elegant:

spades, diamonds, clubs, hearts = collcard = [[] for _ in range(4)]

for c in cards:
    collcard[c.suit.value].append(c)

Here we thus initialize a list of four empty sublists, then append each card c to the sublist at index c.suit.value.

We use iterable unpacking to assign the first element to spades, the second to diamonds, etc.

The advantage is that we avoid sorting (which works in O(n log n)). So this algorithm has time complexity O(n) (given the amortized cost of list appending is O(1)).

Although one-liners are usually elegant, one should not put too much effort into writing them, since one-liners can be harder to understand and can have a significant impact on performance.
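For completeness, here is why the groupby attempt produced those pairs and how to make it work (a sketch with a made-up Card and Suit): groupby only groups consecutive items, so the input must first be sorted by the same key, and each item it yields is a (key, group) pair that needs unpacking.

```python
from collections import namedtuple
from enum import Enum
from itertools import groupby

class Suit(Enum):
    SPADES = 0
    DIAMONDS = 1
    CLUBS = 2
    HEARTS = 3

Card = namedtuple('Card', ['rank', 'suit'])

cards = [Card('A', Suit.HEARTS), Card('2', Suit.SPADES), Card('K', Suit.HEARTS)]

key = lambda c: c.suit.value
# Sort first, then unpack (key, group) pairs; keeping only the group.
suits = [list(group) for _, group in groupby(sorted(cards, key=key), key=key)]
print(suits)
```

Note this still costs O(n log n) for the sort, which is why the direct bucket approach above is preferable here.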

Why does chained assignment work this way?

I found the assignment a = a[1:] = [2] in an article. I tried it in Python 3 and Python 2; it works in both, but I don’t understand how. = here is not like in C, where = is processed right to left. How does Python process the = operator?

Solution:

Per the language docs on assignment:

An assignment statement evaluates the expression list (remember that this can be a single expression or a comma-separated list, the latter yielding a tuple) and assigns the single resulting object to each of the target lists, from left to right.

In this case, a = a[1:] = [2] has an expression list [2], and two “target lists”, a and a[1:], where a is the left-most “target list”.

You can see how this behaves by looking at the disassembly:

>>> import dis
>>> dis.dis('a = a[1:] = [2]')
  1           0 LOAD_CONST               0 (2)
              2 BUILD_LIST               1
              4 DUP_TOP
              6 STORE_NAME               0 (a)
              8 LOAD_NAME                0 (a)
             10 LOAD_CONST               1 (1)
             12 LOAD_CONST               2 (None)
             14 BUILD_SLICE              2
             16 STORE_SUBSCR
             18 LOAD_CONST               2 (None)
             20 RETURN_VALUE

(The last two lines of the disassembly can be ignored, dis is making a function wrapper to disassemble the string)

The important part to note is that when you do x = y = some_val, some_val is loaded on the stack (in this case by the LOAD_CONST and BUILD_LIST), then the stack entry is duplicated and assigned, from left to right, to the targets given.

So when you do:

a = a[1:] = [2]

it makes two references to a brand-new list containing 2, and the first action is to store one of these references to a. Next, it stores the second reference to a[1:]; since the slice assignment mutates a itself, it has to load a again, which gets the list just stored. Luckily, list is resilient against self-slice-assignment, or we’d have issues: it would forever read the value it just added to the end, until we ran out of memory and crashed. As it is, it behaves as if a copy of [2] were assigned to replace any and all elements from index one onwards.

The end result is equivalent to if you’d done:

_ = [2]
a = _
a[1:] = _

but it avoids the use of the _ name.
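You can verify the equivalence directly; starting with a undefined is fine, because the store to a happens first:

```python
# a is first bound to the new list [2]; the second store, a[1:] = <that same
# list>, then extends it with a copy of its own contents.
a = a[1:] = [2]
print(a)  # [2, 2]

# The spelled-out equivalent:
_ = [2]
b = _
b[1:] = _
print(b)  # [2, 2]
```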

To be clear, the disassembly annotated:

Make list [2]:

  1           0 LOAD_CONST               0 (2)
              2 BUILD_LIST               1

Make a copy of the reference to [2]:

              4 DUP_TOP

Perform store to a:

              6 STORE_NAME               0 (a)

Perform store to a[1:]:

              8 LOAD_NAME                0 (a)
             10 LOAD_CONST               1 (1)
             12 LOAD_CONST               2 (None)
             14 BUILD_SLICE              2
             16 STORE_SUBSCR

How to save numpy ndarray as .csv file?

I created a numpy array as follows:

import numpy as np

names  = np.array(['NAME_1', 'NAME_2', 'NAME_3'])
floats = np.array([ 0.1234 ,  0.5678 ,  0.9123 ])

ab = np.zeros(names.size, dtype=[('var1', 'U6'), ('var2', float)])
ab['var1'] = names
ab['var2'] = floats

The values in ab are shown below:

array([(u'NAME_1',  0.1234), (u'NAME_2',  0.5678), (u'NAME_3',  0.9123)],
      dtype=[('var1', '<U6'), ('var2', '<f8')])

When I try to save ab as a .csv file using savetxt() command,

np.savetxt('D:\test.csv',ab,delimiter=',')

I get below error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-66-a71fd201aefe> in <module>()
----> 1 np.savetxt('D:\Azim\JF-Mapping-workflow-CRM\Backup\delete.csv',ab,delimiter=',')

c:\python27\lib\site-packages\numpy\lib\npyio.pyc in savetxt(fname, X, fmt, delimiter, newline, header, footer, comments)
   1256                     raise TypeError("Mismatch between array dtype ('%s') and "
   1257                                     "format specifier ('%s')"
-> 1258                                     % (str(X.dtype), format))
   1259         if len(footer) > 0:
   1260             footer = footer.replace('\n', '\n' + comments)

TypeError: Mismatch between array dtype ('[('var1', '<U6'), ('var2', '<f8')]') and format specifier ('%.18e,%.18e')

Solution:

Your array includes strings, but numpy’s default format specifier (%.18e) is for floats only.

Set the format manually to match the dtype:

np.savetxt(r'g:\test.csv', ab, delimiter=',', fmt='%s,%f')

(Note also that 'D:\test.csv' contains the escape sequence \t, a tab character; use a raw string such as r'D:\test.csv' to keep the backslash literal.)
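A self-contained sketch, writing to an in-memory buffer instead of a real path; note that when fmt is a single string containing multiple specifiers like this, the delimiter argument is ignored:

```python
import io
import numpy as np

names = np.array(['NAME_1', 'NAME_2', 'NAME_3'])
floats = np.array([0.1234, 0.5678, 0.9123])

# Structured array with a string column and a float column.
ab = np.zeros(names.size, dtype=[('var1', 'U6'), ('var2', float)])
ab['var1'] = names
ab['var2'] = floats

buf = io.StringIO()  # stands in for a real file such as r'D:\test.csv'
np.savetxt(buf, ab, fmt='%s,%f')
print(buf.getvalue())
```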

urllib and “SSL: CERTIFICATE_VERIFY_FAILED” Error

In Python 3.6.1 on macOS Sierra, entering this in the bash terminal solved the problem:

pip install certifi
/Applications/Python\ 3.6/Install\ Certificates.command

Recursion Function Isn't Working

Okay, so I’m trying to make a recursive function that returns True if the word is a palindrome, and False otherwise. However, it doesn’t go to the very end; it stops partway through.

Code:


def is_palindrome(word):

    if len(word) == 1 or len(word) == 0:
        return True
    else:
        lst = len(word) - 1
        if word[0] == word[lst]:
            print(len(word), " --> ", word)
            print(word[0], " # ", word[lst])
            is_palindrome(word[0+1:lst])
        else: 
            return False

For the life of me, I can’t figure out why. Here’s a sample output:

7  -->  racecar
r  #  r
5  -->  aceca
a  #  a
3  -->  cec
c  #  c

^ It stops right here. Why doesn't it continue and return True when length = 1?

Solution:

You need to return your call to the recursive function:

def is_palindrome(word):

    if len(word) == 1 or len(word) == 0:
        return True
    else:
        lst = len(word) - 1
        if word[0] == word[lst]:
            print(len(word), " --> ", word)
            print(word[0], " # ", word[lst])
            return is_palindrome(word[0+1:lst])     # change here
        else: 
            return False

The reason your code appears to stop at the final step of the recursion is that you never actually return a value in that case; a Python function that falls off its end without hitting a return statement implicitly returns None. In some languages, such as Java, a missing return would not even compile; Python tolerates it, which results in your current behavior.
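With the return in place, the same logic can also be written slightly more compactly using negative indexing and a combined base case (just a restatement of the fix above):

```python
def is_palindrome(word):
    if len(word) <= 1:          # base case: empty or one-character string
        return True
    if word[0] == word[-1]:     # compare the outermost characters
        return is_palindrome(word[1:-1])  # recurse on the inner substring
    return False

print(is_palindrome("racecar"))  # True
print(is_palindrome("python"))   # False
```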

Python overloading non-existent operator works, why?

While messing around with overloading operators and namedtuples, I’ve stumbled on some weird behavior which works, for some reason or another:

https://repl.it/repls/RemorsefulFlawlessAfricanwildcat

import collections, math

Point = collections.namedtuple("Point", ["x", "y"])
Point.__floor__ = lambda self: Point(int(math.floor(self.x)), int(math.floor(self.y)))
print(math.floor(Point(1.4, -5.9)))
#prints: Point(x=1, y=-6)

Does anyone have any insight into this? Why does it work?
If I remove the Point.__floor__ line, it doesn’t work.


Did the math package define a __floor__ operator somewhere?
OR
Does Python parse Point.__XXX__ to extract XXX and compare with the name of the thing (function/operator) that acts on the argument?

I’m confused, probably because I don’t know how exactly these things work deep down.

Solution:

From the docs (emphasis mine):

math.floor(x)

Return the floor of x, the largest integer less than or equal to x. If x is not a float, delegates to x.__floor__(), which should return an Integral value.
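In other words, math.floor checks non-float arguments for a __floor__ method. A minimal stand-in class shows the same delegation (Pair here is hypothetical, mirroring the namedtuple from the question):

```python
import math

class Pair:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __floor__(self):
        # math.floor delegates here, because Pair is not a float.
        return Pair(math.floor(self.x), math.floor(self.y))

p = math.floor(Pair(1.4, -5.9))
print(p.x, p.y)  # 1 -6
```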

Can someone explain to me why this second method does not fully update the string?

I’ve been trying to write a function which converts under_score_words to camelCaseWords. The following is how I’d have done something like this in the past:

functionName = "function_for_test_case"
for character in functionName:
    if character == "_":
        functionName = functionName.replace("_" + functionName[functionName.index(character) + 1], functionName[functionName.index(character) + 1].upper())

print functionName

which correctly outputs:

functionForTestCase

However this time I originally tried doing it another way, which I found a bit neater:

functionName = "function_for_test_case"
for index, character in enumerate(functionName):
    if character == "_":
        functionName = functionName.replace("_" + functionName[index + 1], functionName[index + 1].upper())

print functionName

Which instead outputs:

functionFor_test_case

I was stumped as to why it wasn’t working. I figured it might’ve been because I was changing the length of the string (by removing the underscore), but then I’m not sure why the first method works.

Also, if you print the result of the replace as the second version goes, you can see it does actually find and replace the rest of the values, but of course it does not save them. For example:

functionName = "function_for_test_case"
for index, character in enumerate(functionName):
    if character == "_":
        print functionName.replace("_" + functionName[index + 1], functionName[index + 1].upper())


functionFor_test_case
function_forTest_case
function_for_testCase

From what I could tell, these functions were essentially doing the same thing in different wording, could anyone explain why they have a different output?

Edit: I’ve edited the for loops to make what I was trying to do more obvious.

Solution:

enumerate(functionName) iterates over the string object that functionName referenced when the loop started; rebinding functionName inside the loop does not change what is being iterated.
The first time you replace two characters with just one (_f -> F), the string gets shorter and the old indices no longer line up. So at some point you have this situation:

index == 12
character == '_'
functionName == 'functionFor_test_case'
functionName[index + 1] == 'e'

So you try to replace _e with E and it’s simply not there.

BTW, take a look at the camelize() function in the inflection library.
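For reference, a regex substitution avoids the index bookkeeping altogether. This is a sketch of the idea, not inflection’s actual implementation:

```python
import re

def camelize(name):
    # Replace each underscore-plus-letter pair with the uppercased letter.
    return re.sub(r'_(\w)', lambda m: m.group(1).upper(), name)

print(camelize("function_for_test_case"))  # functionForTestCase
```

Because re.sub builds a new string in one pass, there is no index invalidation to worry about.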

Prevent backtracking on regex to find non-comment lines (not starting with indented '#')

I’d like to search for lines that don’t start with a pound sign (#) on indented code.

Currently, I’m using the regex ^\s*([^\s#].*) with multiline option on.

My problem is that on non commented lines it works perfectly.

On commented lines the regex engine performs a backtrack due to \s* all the way from the comment sign to the start of the line, which can sometimes cause 40 or 50 backtrack steps.

The regex works perfectly on python code. It’s just not very efficient due to the backtracking caused by the engine.

Any idea as of how to avoid it?


Bonus: It’s rather funny that the regex engine doesn’t recognize that it’s searching for [^\s] one character at a time inside \s*, causing this amount of backtracking. What are the challenges in making the re engine work that way?

Bonus 2: Using only the stdlib re module. As I cannot add 3rd parties. (I’m technically searching using sublime text but want to know how to generally do it in Python)

Solution:

Use the atomic nature of lookarounds to avoid the backtracking:

^(?=(\s*))\1([^#].*)
    ^^^^^  ^

This usage can be simplified into a negative lookahead, as proposed beautifully by @vks.

or a possessive quantifier, if using the third-party regex module:

^\s*+([^#].*)

or even an atomic group:

^(?>\s*)([^#].*)

Sublime Text supports all three, since it is built on PCRE.

And for the bonus part: no, it’s not funny. If you look more closely, you’ll see that it’s not [^\s] (which is literally equal to \S) but [^\s#], which is a little bit different: to the engine it means there are two different paths to check at each step, so it backtracks to try each one.
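The lookahead trick also works in the stdlib re module, which had neither atomic groups nor possessive quantifiers before Python 3.11 added both. The capturing lookahead matches the leading whitespace once, and the backreference \1 consumes it verbatim, so the engine never backtracks into it character by character:

```python
import re

# (?=(\s*))\1 emulates an atomic group: the lookahead captures the leading
# whitespace, and the backreference consumes exactly that capture, so on
# failure the engine does not retry shorter whitespace matches.
pattern = re.compile(r'^(?=(\s*))\1([^#].*)', re.MULTILINE)

code = "    x = 1\n    # a comment\n        y = 2\n"
print([m.group(2) for m in pattern.finditer(code)])  # ['x = 1', 'y = 2']
```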