Prefer functions over methods. #244

tbekolay · 2016-03-07T11:53:40Z

Most methods on NumPy arrays also have an equivalent NumPy function that takes the array as input. I have noticed that many novice programmers prefer functions (which have a pretty clear input-output relationship) to methods (which have an implicit input). This PR changes all instances of calling methods on arrays to calling the equivalent NumPy function instead. I tried to simplify the explanation in 01-numpy.md as well, which hopefully does not significantly affect the flow of the prose.

This PR came out of one of the discussion sessions (apologies, I don't remember who brought this up -- if it was you, let me know and I'll add you to the commit message :) )

gvwilson · 2016-03-07T12:25:43Z

+1 (at least)

iglpdc · 2016-03-09T21:08:29Z

Looks good to me, but what is the reason to prefer functions over methods? I always thought methods would be easier to understand and that some Python traits, such as the . syntax everywhere or everything being an object, make the method approach particularly easy.

(I'm not opposing the change, just want to learn more. Maybe a short paragraph discussing both approaches could be added to the instructor's guide)

gvwilson · 2016-03-09T21:12:04Z

Poll the 'discuss' list?

wking · 2016-03-09T21:53:22Z

On Wed, Mar 09, 2016 at 01:08:30PM -0800, Ivan Gonzalez wrote:

Looks good to me, but which is the reason to prefer functions over
methods?

I remember previous discussion (in the unittest context?) where
methods were opposed because they're one more thing to introduce.
I'll try to dig up a reference to that discussion later today.

If there is a functions vs. methods debate, I'd be very careful to
scope it to “for a six-hour introduction to Python that starts with an
‘Assign values to variables’ learning objective”. Otherwise we'll get
into a whole Object Oriented religious discussion ;).

abostroem · 2016-03-10T22:46:52Z

+1 because there is always a function and only sometimes a method. When learners are first learning to code, having to think about whether median has method form or just a function form is extra work that they don't need to do.

tbekolay · 2016-03-14T13:31:29Z

My reason for the change is more empirical in that I have had several experiences with learners who use Numpy functions when methods exist (these learners also intuited the numpy.shape function which I had no idea existed!) I also find that when I'm teaching, I can go through variables and functions pretty quickly, but methods slow me down a lot and I find myself saying "well, actually what I said before isn't quite true" which to me is fine in hour 4 but not minute 15.

I'll send a quick message to the discuss list.

jgosmann · 2016-03-14T14:08:21Z

I might have been the one who brought this up. Here are the arguments in favor I can recall:

Functions also work on lists and other sequence types.
No need to explain attributes vs. methods (i.e. why we have to write data.max(), but data.shape and not data.shape())
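
A minimal sketch of these two points, assuming NumPy is installed. The array `data` is a made-up example, not the lesson's data set.

```python
import numpy

data = numpy.array([[1, 2, 3], [4, 5, 6]])  # hypothetical example data

# The function and the method give the same result on an array.
max_from_function = numpy.max(data)
max_from_method = data.max()

# shape is an attribute (no parentheses); numpy.shape is the function form.
shape_from_attribute = data.shape
shape_from_function = numpy.shape(data)

# The function forms also accept a plain nested list,
# which has neither a .max() method nor a .shape attribute.
nested_list = [[1, 2, 3], [4, 5, 6]]
max_of_list = numpy.max(nested_list)
shape_of_list = numpy.shape(nested_list)

print(max_from_function, max_from_method, max_of_list)
print(shape_from_attribute, shape_from_function, shape_of_list)
```

With functions only, learners never have to decide whether parentheses are needed after the dot.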

The argument in favor of methods that I can recall:

Functions are more typing because they have to be prefixed with numpy. (and we are not doing import numpy as np which would shorten this to np.)

bsmith89 · 2016-03-14T14:09:58Z

I am in favor of using functions whenever possible for an introduction to Python. In my opinion, the functional style seems to make sense at the low level, while object orientation can be useful in the large.

However: If we're also introducing Pandas, methods are very relevant. Idiomatic use of pandas composes methods, so it's some weird middle ground between functional and object-oriented.

jni · 2016-03-14T14:10:41Z

👍

Functions have to be introduced; objects and methods one can mostly do without. If one strays from Python canon, one can even do without Python built-in methods:

>>> x = [3, 1, 2, 0]
>>> list.sort(x)
>>> x
[0, 1, 2, 3]
>>> list.index(x, 2)
2
>>> str.startswith('function', 'fun')
True

NumPy array functions have the further benefit over methods that they work on any array-like, e.g.

>>> import numpy as np
>>> np.mean([0, 1, 2])
1.0

profgiuseppe · 2016-03-14T14:12:27Z

Not picking a side, just a couple of comments:

a method can be presented as an action on the object (e.g. light.switch_off() ), while a function can be presented as an external tool that takes the object as input (e.g. blend(fruits) ); the difference is then the additional concept that "objects have actions" in the former case
a method on something may have side effects (i.e. data.reverse() ), while a function may not (should I assign the return value to the same variable? Was it changed inside the function?)
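
The side-effect question can be seen with built-in lists alone. This is a small sketch using sorted() and list.sort() as stand-ins; the variable names are made up for illustration.

```python
numbers = [3, 1, 2, 0]

# The function sorted() returns a new list and leaves its input alone.
sorted_copy = sorted(numbers)
unchanged_after_function = list(numbers)  # snapshot of the original order

# The method list.sort() changes the list in place and returns None,
# so assigning its result to a variable does not give you the sorted data.
result_of_method = numbers.sort()

print(unchanged_after_function)
print(sorted_copy)
print(numbers)
print(result_of_method)
```

Neither behaviour follows from the syntax alone: a learner has to be told (or read the documentation) which calls modify their argument.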

shwina · 2016-03-14T14:54:58Z

+1 for using functions. At workshops, I've found myself doing a long aside about what methods are, how they're tied to the "type" of object, how to get documentation on them, etc.

I know the novice material is designed for first-time programmers, but in practice, learners have programmed in some language before, and the concept of a function is familiar to them.

tbekolay · 2016-03-14T14:57:36Z

Note to self: once consensus is reached, add a note to the FAI with a link to this discussion.

jeremycg · 2016-03-14T16:55:08Z

I disagree with the consensus here - I think if the goal is to get people up and running in Python, they should have at least seen methods in the class.

Object Orientation is of course too much to completely introduce, but if someone takes the workshop and then she tries to use or modify another user's code, almost immediately she will encounter a method and be flummoxed.

It is worthwhile keeping at least a couple, and saying a short "methods are functions specific to a certain data type" (the blender/light switch example above is great), so that the learner can at least understand what is going on.

jttkim · 2016-03-14T17:12:35Z

I'm also in favour of keeping objects / methods. From my personal experience, having the option of "going object oriented" is one of the biggest advantages of Python over other scripting languages -- I have numerous pieces of work that started as a few quick lines and then evolved into something comprised of a few classes. Also, libraries such as BioPython require some familiarity with classes / instances / methods (and incidentally, the "BioPython Cookbook" does quite a decent job of introducing these without incurring much overhead, see e.g. [1]).

Another context where objects are much preferred is random number generation, it would be very unfortunate to lead learners into preferring v = random.random() over rng = random.Random(1), followed by v = rng.random().

[1] http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc16

wking · 2016-03-14T17:48:10Z

On Mon, Mar 14, 2016 at 10:12:37AM -0700, Jan T. Kim wrote:

… incidentally, the "BioPython Cookbook" does quite a decent job of
introducing these without incurring much overhead, see e.g. 1).

1 http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc16

There's an earlier method reference here 1, but:

In addition to having an alphabet, the Seq object differs from the
Python string in the methods it supports.

doesn't seem like much of an introduction to methods (i.e. folks who
wonder “what's a method?” will still be confused).

Another context where objects are much preferred is random number
generation, it would be very unfortunate to lead learners into
preferring v = random.random() over rng = random.Random(1),
followed by v = rng.random().

I think everyone here will agree that methods are a wonderful thing
that Python programmers should take advantage of. I'm just not
convinced they fit into the first three hours of Python 2.

And following up on my earlier commitment 2, previous discussion for
unittest was about defining classes more than about using methods
[3,4]. Although @gvwilson claims “Object Oriented Programming is
out-of-scope” 5, and it's not clear if that's a narrow “writing OO code”
or a broader “using OO libraries”. Although of course Python's
namespaces and attributes make the dot-syntax hard to avoid, so using
methods is a much smaller lift than defining methods.

willingc · 2016-03-14T18:07:03Z

@tbekolay I believe in the Novice Python lesson that covering functions is more important than methods of a class because 1) an understanding of functions is needed to grasp methods; 2) novices are learning quite a bit at once - the additional syntax with methods is not worth the cognitive load; 3) functions are relevant whether doing procedural, object oriented, or functional programming.

I like this snippet from Allen Downey's book, Think Like a CS in Python which is open sourced on his website:

Methods are just like functions, with two differences:

Methods are defined inside a class definition in order to make the relationship between the class and the method explicit.
The syntax for invoking a method is different from the syntax for calling a function.
In the next few sections, we will take the functions from the previous two chapters and transform them into methods. This transformation is purely mechanical; you can do it simply by following a sequence of steps. If you are comfortable converting from one form to another, you will be able to choose the best form for whatever you are doing.
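
As a rough illustration of the "purely mechanical" transformation described in the quote, here is the same operation written both ways. The `Sample` class and the `describe` names are hypothetical and do not come from the book or the lesson.

```python
class Sample:
    """Hypothetical record type, used only for this illustration."""

    def __init__(self, name, values):
        self.name = name
        self.values = values

    # Method form: the same body, with the object arriving as `self`.
    def describe(self):
        return self.name + " has " + str(len(self.values)) + " values"


# Function form: the object is passed in as an ordinary argument.
def describe(sample):
    return sample.name + " has " + str(len(sample.values)) + " values"


sample = Sample("patient_1", [0, 1, 3])
print(describe(sample))   # calling the function
print(sample.describe())  # invoking the method
```

Both calls produce the same string; only the place where the object appears in the call differs.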

jgosmann · 2016-03-14T18:59:21Z

Another comment why I personally prefer to use functions follows from the anti-symmetry of procedural vs. object oriented code as explained in Clean Code, Chapter 6. To quote the bottom line:

Procedural code (code using data structures) makes it easy to add new functions without changing the existing data structures. OO code, on the other hand, makes it easy to add new classes without changing existing functions.

I usually want to add new functions which process the data in a new way, but seldom want to add a new type of NumPy array. Thus, it is more natural for me to stick to functions in general. (Though, this does not imply that it is easier for a novice.)

jttkim · 2016-03-14T19:05:03Z

@willingc The notion of a "purely mechanical" transformation from functions to methods applies only where methods don't change instance state (see my random.Random example above for illustration). The special case of computing results from an instance without changing it is rather common in some use cases of the "number crunching" type, but in the context of scientific computing in general it's a rather narrow special case.

I'm also not sure about the assumptions about novices: To a total novice, functions and methods will be equally new. To not-quite-novices (as mentioned by @shwina above), one of the variants will appear more familiar / intuitive.

I think, though, that the function / method decision should also be based on the effect that may have on learners in the future, as well as factoring in their previous knowledge and "intuition". From that perspective, the group that gets a net benefit from preferring functions is comprised of those who have some previous experience with imperative / procedural programming and who will only do scientific computing of the number crunching type. Those with a different previous background will experience no benefit, and those with tasks that require managing state will miss a key tool that can spare them cartloads of pain. Therefore, even though there's no time in workshops to cover OO completely, I think the use of methods in the Python lesson is quite consistent with the SWC notion of giving learners a good basis to start developing themselves from.

gvwilson · 2016-03-14T19:07:36Z

I'm really enjoying this discussion - it's drawing out exactly the kind of pedagogical content knowledge and personal experience that we all rely on but rarely articulate. Thank you all...

wking · 2016-03-14T19:18:22Z

On Mon, Mar 14, 2016 at 12:05:05PM -0700, Jan T. Kim wrote:

I think, though, that the function / method decision…

But it's not a function vs. methods decision, is it? It's a functions
vs. (functions and methods) decision 1.

shwina · 2016-03-14T19:20:32Z

I'm not sure if this should be part of the discussion here, but using methods also has the additional disadvantage that finding help on them is actually quite nontrivial.

For the numpy max function, I just have to do help(numpy.max). For the max method, the easy way is to do help(patient_data.max), which is only slightly less confusing than the more general help(numpy.ndarray.max).

kylerbrown · 2016-03-14T20:22:56Z

All the points above (both for and against) have been very interesting. Consider an operation like Root Mean Squared. Using functions

numpy.sqrt(numpy.mean(numpy.power(a,2)))

or

numpy.sqrt(numpy.mean(a**2))

the code is very similar to the operation's English name (RMS), and the code can be read from left to right.
Using methods

a.power(2).mean().sqrt()

is both less readable and doesn't work, as neither power nor sqrt is an ndarray method. Instead

(a**2).mean()**.5

or

numpy.sqrt( (a**2).mean())

both work, but in my opinion are less readable and more complex, due to chaining a method to a statement in parentheses.

I believe that in this example, and in others that come to mind (Sum Squared Error), functions produce clearer code. Can someone think of a nice counter-example?
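
For anyone who wants to try the variants side by side, here is a small runnable sketch, assuming NumPy is installed. The array `a` is made-up example data.

```python
import numpy

a = numpy.array([3.0, 4.0])  # hypothetical example data

# Nested functions, read as "root of the mean of the squares".
rms_nested = numpy.sqrt(numpy.mean(numpy.power(a, 2)))
rms_operator = numpy.sqrt(numpy.mean(a**2))

# Mixed forms that lean on the .mean() method.
rms_method = (a**2).mean()**.5
rms_mixed = numpy.sqrt((a**2).mean())

print(rms_nested, rms_operator, rms_method, rms_mixed)

# mean exists as a method, but power and sqrt do not,
# which is why a.power(2).mean().sqrt() fails.
print(hasattr(a, "mean"), hasattr(a, "power"), hasattr(a, "sqrt"))
```

All four expressions compute the same value; they differ only in how much of the work is spelled as functions.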

wking · 2016-03-14T21:36:04Z

On Mon, Mar 14, 2016 at 01:22:59PM -0700, Kyler Brown wrote:

I believe that this example, and others that come to mind (Sum
Squared Error), functions produce clearer code. Can someone think of
a nice counter-example?

I'm sure both have situations where they are more appropriate. For
methods, @bsmith89 suggests Pandas method chaining 1, and I'd
suggest scalable duck typing 2 (where objects can implement their
own .std method or whatever, instead of teaching a central numpy.std
how to handle all the objects).
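
A rough sketch of that duck-typing idea, using only the standard library. The two classes and the `report_spread` helper are invented for this illustration and are not part of NumPy or the lesson.

```python
import statistics


class Readings:
    """Hypothetical container that computes its spread from stored values."""

    def __init__(self, values):
        self.values = values

    def std(self):
        return statistics.pstdev(self.values)


class ConstantSignal:
    """Hypothetical type whose spread is zero by construction."""

    def __init__(self, value):
        self.value = value

    def std(self):
        return 0.0


def report_spread(thing):
    # Works for any object that provides .std(); no central function
    # has to be taught about each new type.
    return thing.std()


spread_of_readings = report_spread(Readings([2, 4, 4, 4, 5, 5, 7, 9]))
spread_of_constant = report_spread(ConstantSignal(42))
print(spread_of_readings, spread_of_constant)
```

Adding a third type only requires giving it its own .std method; `report_spread` stays untouched.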

Still, this is all in the “for idiomatic Python use” level, when the
discussion for this repository is about the “for your first three
hours of Python” level 3. Would learning about methods make you a
better Python programmer? Absolutely. Is it more important than
whatever gets bumped in the time taken to explain methods? I doubt
it.

Getting a better handle on the bumped information would help nail this
down. The current PR (bf019cc) doesn't change much except around
4, where it drops the discussion about methods in favor of
functions. That's not much less time to read, but it is one less new
word for novice programmers to internalize. And how much time it
saves depends on how confused learners are by the existing method
explanation. The current PR doesn't get away from methods entirely
though, since add_subplot [5,6] doesn't seem to have a function analog
7 and I think the implicit-figure ‘subplot’ 6 is too magical. I'd
probably just drop the subplot section of this lesson and use multiple
figures. Then I'd guess we'd want to try it out, and see if
instructors using the function-only approach got through more of the
lesson or collected fewer “methods are confusing” stickies than their
counterparts using the current functions-and-methods content.

psteinb · 2016-03-15T09:08:53Z

+1 to this PR and +1 to the discussion of it.
From personal experience, I must agree with both sides (I also struggle with discussing methods while we mostly use functions in the material).

Related to SWC material, I'd like to say that this discussion also emphasizes how much development pressure our community is under to provide intermediate material, e.g. in Python. This would be a perfect place to discuss "functions vs methods", to teach unit tests (including pulling them out of novice-python) as recently discussed on the mailing list, etc. As Greg once said, coming up with teaching material is about deciding what is essential and what is important. This PR is a prime example of this.

bartoldeman · 2016-03-15T12:23:35Z

This discussion reminds me of the old Numeric manual that I read a long time ago before numpy was universal and we still had to deal with Numeric and numarray: quoting http://numpy.sourceforge.net/numdoc/numdoc.pdf

8. Array Functions

Most of the useful manipulations on arrays are done with functions. This might be surprising given
Python's object-oriented framework, and that many of these functions could have been implemented
using methods instead. Choosing functions means that the same procedures can be applied to
arbitrary python sequences, not just to arrays. For example, while transpose([[1,2],[3,4]]) works
just fine, [[1,2],[3,4]].transpose() can’t work. This approach also allows uniformity in interface between
functions defined in the Numeric Python system, whether implemented in C or in Python, and functions
defined in extension modules. The use of array methods is limited to functionality which depends
critically on the implementation details of array objects.

jttkim · 2016-03-15T12:26:18Z

Revisiting this after sleeping on it I still object to the addition of numpy.random.random() as a coding example 1 (ref. 4 in @wking's post above). The output is not reproducible and therefore at odds with rule 6 of Sandve et al. 2. On a personal level, I consider implicit state in random number generators evil because figuring these out and managing multiple streams of random numbers took me the better part of a month when I was a novice programmer myself, spent littering my code with setstate and getstate calls wrapping all portions involving random number generation, worrying about and hunting for places where I might have forgotten or messed up the state setting. Today, I write

rngForModel = random.Random(1)
rngForSampling = random.Random(2)

and avoid all that state management mess and nonsense. If as a novice I had seen an example of random number generator objects, even without much elaboration, that might well have saved me that month of messing and worrying, and therefore I argue for using objects rather than functions in any context of random numbers so SWC novices may be spared from that waste of time.
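
A small sketch of why separate generator objects help, using only the standard library. The seeds and variable names are arbitrary choices for illustration.

```python
import random

rng_for_model = random.Random(1)
rng_for_sampling = random.Random(2)

# Draw from the model stream while also drawing from the sampling stream.
interleaved_draws = []
for _ in range(3):
    interleaved_draws.append(rng_for_model.random())
    rng_for_sampling.random()  # does not disturb the model stream

# A fresh generator with the same seed, drawn from without interruption,
# reproduces exactly the same model values.
replay = random.Random(1)
uninterrupted_draws = [replay.random() for _ in range(3)]

print(interleaved_draws == uninterrupted_draws)
```

Each object carries its own state, so no setstate / getstate bookkeeping is needed to keep the streams apart.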

In general OO terms, this is an example of encapsulation. With respect to the extent to which OO is covered in SWC, I suggest that learning to use encapsulated objects should be a novice learning objective (it's not really possible to use Python without using objects anyway), while writing classes could be considered intermediate material.

[1] bf019cc#diff-150012b0321d07fbe188f0c1a85c9c7eL408
[2] http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285

willingc · 2016-03-15T15:24:44Z

Strings are one place that "methods" could be introduced to students. Strings are something that folks tend to understand fairly simply. In Python, Strings (str objects) and the associated string methods are well encapsulated and understandable for a novice.

Using Strings as the foundation for learning methods would be more effective than combining methods and functions when teaching the numpy array examples.

wking · 2016-03-15T17:04:46Z

On Tue, Mar 15, 2016 at 05:26:19AM -0700, Jan T. Kim wrote:

Revisiting this after sleeping on it I still object to the addition
of numpy.random.random() as a coding example 1

How about replacing that with time.time()?

jttkim · 2016-03-15T17:34:33Z

@willingc good idea to use simple built-in types. Some methods already appear in the lists section, so there may be opportunity to focus on them a bit more and then use string methods to reinforce them? Some list methods are also quite simple:

>>> l = ['a', 'b', 'c']
>>> l.count('a')
1
>>> l.append('a')
>>> l.count('a')
2

Perhaps something along these lines could be added to 03-lists.md?

wking · 2016-03-15T17:49:29Z

On Tue, Mar 15, 2016 at 10:34:35AM -0700, Jan T. Kim wrote:

Some methods already appear in the lists section…

Ah, good point. If the goal is to remove method references,
odds.reverse() 1 would have to be replaced by:

odds = list(reversed(odds))

or:

odds = odds[::-1]

which are different (creating a new object) than odds.reverse()
(mutating an existing object in place). The list(…) wrapper there
converts the iterator returned by reversed(odds) to a list for
printing:

odds = [1, 2]
print(reversed(odds))
<list_reverseiterator object at 0x7f010e6b4da0>
print(list(reversed(odds)))
[2, 1]

and while iterators are lovely, efficient things, I think we should
avoid them in this novice lesson more strongly than I think we should
avoid methods ;).

There's also odds.append(11) 2, which would have to be replaced by:

odds = odds + [11]

which also creates a new object. And in both cases, the method
approach is much more idiomatic (at least when you don't need to
create a new object).

There's a lot of material in this lesson, and we could probably just
drop .reverse() and .append() in favor of the [::-1] slice and ‘+
[11]’. But that depends on how strongly we want to avoid methods. I
think removing .append() is a much bigger reduction than removing the
NumPy methods and dropping add_subplot 3. If we decide to keep the
list methods, I think there's less use in dropping the NumPy methods.
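
The in-place vs. new-object difference only becomes visible when a second name refers to the same list. A minimal sketch, with made-up values:

```python
odds = [1, 3, 5, 7]
alias = odds  # a second name for the same list object

# Methods mutate the existing list, so the change is visible through alias.
odds.append(11)
odds.reverse()
after_methods = list(alias)  # snapshot of what alias sees

# '+' and the [::-1] slice build new lists and rebind the name odds;
# alias still refers to the old object.
odds = odds + [13]
odds = odds[::-1]

print(after_methods)
print(odds)
print(alias)
print(odds is alias)
```

If the lesson never creates a second name for a list, learners would not see this difference at all.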

jttkim · 2016-03-15T17:54:27Z

@wking regarding time.time() instead of numpy.random.random(), I wonder whether it's necessary to introduce parameterless functions this early. time.time() has nothing to do with numpy, so it's a bit exotic in 01-numpy.md.

jttkim · 2016-03-15T18:02:27Z

@wking I didn't mean to suggest replacing list methods with functions. I think it's important to learn about basic operations on mutable objects, especially the append method because that is so commonly used. And I much favour introducing the reverse method rather than using the list(reversed(l)) construct, which to me appears rather unnecessarily complex and difficult to explain.

wking · 2016-03-15T18:11:04Z

On Tue, Mar 15, 2016 at 10:54:30AM -0700, Jan T. Kim wrote:

@wking regarding time.time() instead of numpy.random.random(), I
wonder whether it's necessary to introduce parameterless functions
this early. time.time() is nothing to do with numpy, so it's a bit
exotic in 01-numpy.md.

Time is something everyone is familiar with, and the callout is just
“look, some functions don't need arguments” before folks hit
matplotlib.pyplot.show(). If we want to avoid the ‘time’ import, we
could use print() (but the output is less exciting and obvious ;). If
we're looking for a NumPy example, numpy.float() may be a good choice.

wking · 2016-03-15T18:32:35Z

On Tue, Mar 15, 2016 at 11:02:28AM -0700, Jan T. Kim wrote:

I didn't mean to suggest replacing list methods with functions.

Right, I'm just saying that if avoiding methods is a pedagogical goal
(one less concept to teach), we'd want to remove those too. If
teaching methods is a pedagogical goal (one more useful Python
feature), they are a good place to do that.

I think it's important to learn about basic operations on mutable
objects…

There's a lot of mutable discussion in a call out in 03-lists, and you
can approach mutability with ‘my_list[0] = 1’. I think keeping the
mutability discussion in a callout is good “first three hours” scoping
1.

… especially the append method because that is so commonly used.

I agree (see my “bigger reduction” comment in 2). And maybe
introducing methods is easier than introducing “+ for concatenation”.
But this is all Python peripherals. The carpentry 3 is in “Explain
why we should divide programs into small, single-purpose functions”
4, testing 5, and the soap-boxing in 6. The more
Python-specific stuff we can cut to get to those, the better (see
“other 90%” 7). Folks who wonder “how can I add a value to an
existing array” will know they have that question and can look it up
on their own afterwards. Folks are less likely to realize that small,
focused functions are easier to work with than huge, rambly ones until
they have to refactor or test one of the latter ;).

And I much favour introducing the reverse method rather than using
the list(reversed(l)) construct, which to me appears rather
unnecessarily complex and difficult to explain.

Right, which is why I prefer the [::-1] slice, although it looks like
the slice discussion in 01-numpy doesn't cover the step argument yet
8. I'm also fine not talking about reversing arrays at all.

profgiuseppe · 2016-03-15T18:47:51Z

@wking one thing that you mentioned may be quite important in this thread.
In the lesson you get

image = matplotlib.pyplot.imshow(data)
matplotlib.pyplot.show()

The lack of parameters for show() is the less troublesome aspect. We have imshow taking an argument and returning something, but neither is used in show(). imshow is changing the internal state of pyplot, so at this point the difference between functions and methods and the implications of side effects should be mentioned, no matter if everything else was switched to functions.

wking · 2016-03-15T19:03:35Z

On Tue, Mar 15, 2016 at 11:47:53AM -0700, Giuseppe Profiti wrote:

@wking one thing that you mentioned may be quite important in this thread.
In the lesson you get
image = matplotlib.pyplot.imshow(data)
matplotlib.pyplot.show()

The lack of parameters for show() is the less troublesome aspect. We
have imshow taking an argument, returning something, but neither are
used in show().

Good point. This is another instance of matplotlib's global state
magic 1. Maybe it's not possible to use matplotlib without either
that global state magic or methods 2, in which case I agree that
methods are going to be the better route.

It's frustrating to be one matplotlib.pyplot.show(image) away from
being able to drop methods. Maybe we should PR matplotlib :p.

kevin-vilbig · 2016-03-20T03:40:22Z

I thought about this for a while and I agree with this as a general rule. Avoiding odd syntax, that simplifies things when you are hacking between scopes that you probably shouldn't, is a good idea for beginners.

iglpdc · 2016-03-21T19:40:23Z

I think that methods (as we currently teach them in the lesson) don't add much extra load and I'd prefer them for the simplicity in the code. It's a big effort for the learners to follow the lesson and type, so if we can save them 10-20% of the typing by using methods, I'd go for it.

I think that methods are a natural follow up from objects and, conceptually, don't differ much from attributes, which we need to explain anyways.

I introduce objects very early, actually in the line that creates the first "variable". I mention "variable", but prefer to talk about names and objects, rather than variables and values. I think this is well aligned with the realities of both Python and the audience, e.g. people not coming from the hard sciences may not be used to thinking in terms of "variables" having "values".

The great thing about objects is that the technical concept fits very well with many learner preconceptions: "everything is an object", "objects are physical things that take space somewhere", "objects have names", "the same object can have several names", "an object without a name cannot be found", "objects can be grouped in classes", "objects can be made up of other objects", ... (I'm not going into differences between objects, instances, classes and all that...)

From there to "objects can have actions associated with them" is a short step in my opinion: in the end, most of our day is spent dealing with interfaces to act on human-made objects.

The ubiquitous . syntax in Python makes it even simpler to understand what data.mean() is doing. Tab completion in the notebook helps and is the shortest way to find out which "actions" are available to an object.

On the other hand, I've never tried the function approach, so this is just an opinion. Maybe we should start "A/B testing" these things. Given that we are teaching almost a workshop a day and the alternative version in this PR is ready, it would take just a few weeks to collect enough data to have a better answer to this.

tbekolay · 2016-03-26T18:37:17Z

Hi all, I changed the random() example to ctime() to avoid the magic RNG state issue (I have strong opinions about the RNG stuff but they're tangential to this thread). I've also added a pointer to this discussion in the frequently argued issues section of the instructor guide.

Since discussion seems to be winding down, I've compiled a summary of the arguments made in this thread so far; please feel free to correct me if I've misconstrued anything!

I didn't notice any obvious groupings other than arguments specific to NumPy and general arguments for or against teaching methods.

Arguments in favor of switching methods to functions

General:

Empirically, functions are easier for students to understand and are easier for instructors to explain. [@tbekolay, @shwina, @psteinb]
Attributes vs. methods can be confusing (i.e., why data.max() but data.shape instead of data.shape()). [@jgosmann]
Functions are necessary; methods are useful but not necessary, and in a 3 or 6 hour block we should focus on what's necessary. [@wking, @bsmith89, @jni, @willingc]
Learners with experience in another language will already know functions, but may not know methods. [@shwina]
Object oriented programming is "out of scope" for this lesson. [@gvwilson]
One needs to understand functions to understand methods, but not vice-versa. [@willingc]
It is easier to add new functions to manipulate objects than it is to add new objects to be manipulated. [@jgosmann]
Obtaining help for functions (help(func)) is more obvious than for methods (should it be help(instance.method) or help(Class.method)?) [@shwina]
Using functions tends to read like English (e.g., sqrt(mean(square(a))) does root-mean squared operation). [@kylerbrown]
Functions are part of SWC's good enough practices; methods are not. [@wking]

NumPy specific:

There's always a NumPy function, but only sometimes a corresponding method. [@abostroem]
NumPy functions work on lists and other sequences in addition to ndarrays. [@jgosmann, @jni]
NumPy functions can always be nested (e.g., numpy.sqrt(numpy.mean(numpy.power(a, 2)))), while methods cannot always be chained (e.g., a.power(2).mean().sqrt() does not work). [@kylerbrown]

General +1s not cited above: @jiffyclub, @douglatornell, @gdevenyi, @fu9ar

Arguments in favor of keeping methods

Methods don't add much extra cognitive load. [@iglpdc]
Methods make the code shorter / simpler, which matters when learners are typing along. [@iglpdc, @jgosmann]
Methods are heavily used in the real world; learners should be introduced to them. [@jeremycg, @jttkim]
Object orientation sets Python apart from other scripting languages. [@jttkim]
Methods use . syntax, which is already introduced for namespaces, so it's not really new syntax. The . syntax also enables tab-completion of methods in some environments [@iglpdc]
To novices, functions and methods are both novel, so no preference should be assumed a priori. [@jttkim]
Knowledge of methods benefits those who already know functions. [@jttkim]
Methods enable scalable duck typing / polymorphism (e.g., instance.sum() can work for multiple instance types, whereas with sum(instance) one function must handle all cases). [@wking]
Methods are used for plotting commands and lists anyway. [@wking]

Compromises / caveats

Pandas makes heavy use of methods, so a lesson with Pandas should introduce methods. [@bsmith89]
Rather than introduce methods with NumPy, we should introduce them with strings and/or lists. [@willingc, @jttkim]
Should we also change any string or list methods that we're using to functions? [@wking]

Published quotes of interest

Methods are just like functions, with two differences:

Methods are defined inside a class definition in order to make
the relationship between the class and the method explicit.
The syntax for invoking a method is different from the syntax
for calling a function.
In the next few sections, we will take the functions from the
previous two chapters and transform them into methods.
This transformation is purely mechanical; you can do it simply
by following a sequence of steps. If you are comfortable converting
from one form to another, you will be able to choose the best form
for whatever you are doing.

From Think like a computer scientist in Python by Allen Downey
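
As a rough illustration of the "purely mechanical" transformation the quote describes, the sketch below writes the same computation once as a function and once as a method. The `Time` class and the names `time_to_minutes` and `to_minutes` are assumptions made up for this example, not code taken from the book:

```python
class Time:
    """Hypothetical class, used only for illustration."""

    def __init__(self, hour, minute):
        self.hour = hour
        self.minute = minute

    def to_minutes(self):
        # Method form: the object arrives implicitly as `self`.
        return self.hour * 60 + self.minute


def time_to_minutes(time):
    # Function form: the object is an explicit argument.
    return time.hour * 60 + time.minute


t = Time(2, 30)
as_function = time_to_minutes(t)  # input is visible in the call
as_method = t.to_minutes()        # input is the thing before the dot
print(as_function, as_method)
```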

Procedural code (code using data structures) makes it easy to add
new functions without changing the existing data structures. OO
code, on the other hand, makes it easy to add new classes without
changing existing functions.

From Clean Code by Robert C. Martin

Most of the useful manipulations on arrays are done with
functions. This might be surprising given Python's object-oriented
framework, and that many of these functions could have been
implemented using methods instead. Choosing functions means that the
same procedures can be applied to arbitrary python sequences, not
just to arrays. For example, while transpose([[1,2],[3,4]]) works
just fine, [[1,2],[3,4]].transpose() can’t work. This approach also
allows uniformity in interface between functions defined in the
Numeric Python system, whether implemented in C or in Python, and
functions defined in extension modules. The use of array methods is
limited to functionality which depends critically on the
implementation details of array objects.

From Numeric manual
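
The `transpose` example in the quote can be tried directly. This sketch assumes modern NumPy rather than the original Numeric package; `numpy.transpose` accepts a plain nested list, whereas a list has no `transpose` method to call:

```python
import numpy

nested = [[1, 2], [3, 4]]

# The function accepts any array-like sequence, not just arrays.
transposed = numpy.transpose(nested)

# A plain Python list offers no such method.
has_method = hasattr(nested, "transpose")

print(transposed.tolist(), has_method)
```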

Summary

In all, I read:

13 arguments in favor of this PR,
16 people arguing for or +1ing this PR,
9 arguments against this PR,
5 people arguing against or -1ing this PR.

To me, this summary means that we should merge this PR. I believe that adding extra explanation about methods in the string / list sections, or removing the use of methods in these sections, should be left to another issue / PR if anyone feels strongly about these changes.

@abostroem, if you agree with the assessment, then I'll leave the merging up to you!

@gvwilson Is a summary of this thread worth a blog post?

wking · 2016-03-26T21:41:52Z

On Sat, Mar 26, 2016 at 11:37:19AM -0700, Trevor Bekolay wrote:

Since discussion seems to be winding down, I've compiled a summary…

It's nice to see everything laid out so compactly :).

iglpdc · 2016-03-30T15:09:48Z

Thanks, @tbekolay: I vote to make it into a blog post :)

shwina · 2016-06-09T20:03:39Z

+1

Most methods on NumPy arrays also have an equivalent NumPy function that takes the array as input. I have noticed that many novice programmers prefer functions (which have a pretty clear input-output relationship) to methods (which have an implicit input). This commit changes all instances of calling methods on arrays to calling the equivalent NumPy function instead. I tried to simplify the explanation in `01-numpy.md` as well, which hopefully does not significantly affect the flow of the prose. I also added a "function" entry to the glossary and linked to it in `01-numpy.md`.

tbekolay · 2016-06-22T14:13:43Z

I rebased and merged this as per the discussion. I'll write the corresponding blog post shortly!

Closes swcarpentry#244.

tbekolay added the enhancement label Mar 7, 2016

tbekolay force-pushed the functions-over-methods branch 2 times, most recently from 97a79d4 to bf019cc Compare March 7, 2016 11:55

tbekolay added 3 commits June 22, 2016 10:12

Use time example instead of random()

54ed7ac

Explain why we removed NumPy methods

81a3560

tbekolay force-pushed the functions-over-methods branch from 5e7e45f to 81a3560 Compare June 22, 2016 14:12

tbekolay merged commit 81a3560 into gh-pages Jun 22, 2016

gvwilson deleted the functions-over-methods branch June 23, 2016 21:16

rgaiacs pushed a commit to rgaiacs/swc-python-novice-inflammation that referenced this pull request May 6, 2017

Adding link to lesson incubation.

a7357fa

Closes swcarpentry#244.

katrinleinweber mentioned this pull request Jun 10, 2018

Rephrase to avoid misinterpretation as pipeline swcarpentry/r-novice-inflammation#361

Merged

maxim-belkin mentioned this pull request Mar 14, 2019

pd.unique(df['column_name']) vs df['column_name'].unique() datacarpentry/python-ecology-lesson#366

Closed

maxim-belkin mentioned this pull request Jul 25, 2019

Lesson 10 - numpy.mean(data) and data.mean #675

Closed

ldko mentioned this pull request May 26, 2020

02-numpy: inconsistent text/images for array indexing and method calls #823

Closed
