These are sample chapters from the book Pandas Brain Teasers: 25 brain teasers
to tickle your mind and make you a better Pandas developer,
by Miki Tebeka.
Buy the book at Gumroad (ePub & PDF) 
The Brain Teasers
We shape our tools, and thereafter our tools shape us.
1. Rectified
import pandas as pd


def relu(n):
    if n < 0:
        return 0
    return n


arr = pd.Series([-1, 0, 1])
print(relu(arr))
Try to guess what the output is before moving to the next page. 
This code will raise a ValueError exception.
The problematic line is if n < 0:. Here n is arr, so n < 0 evaluates arr < 0, which is a pandas.Series.
In [1]: import pandas as pd
In [2]: arr = pd.Series([-1, 0, 1])
In [3]: arr < 0
Out[3]:
0 True
1 False
2 False
dtype: bool
Once arr < 0 is computed, we use it in an if statement.
Which brings us to how boolean values work in Python.
Every Python object, not only True and False, has a boolean value.
The documentation states the rules:
Everything is True except:

Zero numbers: 0, 0.0, 0+0j, …
Empty collections: [], {}, '', …
None
False

You can test the truth value of a Python object using the builtin bool function.
On top of the above, any object can state its own boolean value using the
__bool__ special method.
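The rules above can be checked directly with bool. The following is a minimal sketch; the Basket class is a hypothetical example invented here to show __bool__ in action, it is not part of Python or Pandas.

```python
class Basket:
    """Hypothetical container, used only for this illustration."""

    def __init__(self, items):
        self.items = list(items)

    def __bool__(self):
        # A basket counts as true only when it holds at least one item.
        return len(self.items) > 0


# Values that the rules list as false, plus an empty Basket.
falsy = [0, 0.0, 0 + 0j, [], {}, '', None, False, Basket([])]
# Everything else is true, including a list that contains a zero.
truthy = [1, -0.5, [0], {'a': 1}, ' ', Basket(['apple'])]

falsy_results = [bool(v) for v in falsy]
truthy_results = [bool(v) for v in truthy]
print(falsy_results)
print(truthy_results)
```

Note that [0] and ' ' are true: a collection is judged by whether it is empty, not by what it contains.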
The boolean logic for pandas.Series differs from that of a list or a
tuple: it raises an exception.
In [4]: bool(arr < 0)
...
ValueError: The truth value of a Series is ambiguous.
Use a.empty, a.bool(), a.item(), a.any() or a.all().
The exception tells you the reasoning: it follows The Zen of Python, which states:
In the face of ambiguity, refuse the temptation to guess.
So, what are your options?
You can use all or any, but then you'll need to check the type of n to see
if it's a plain number or a pandas.Series.
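To illustrate the type check this option requires, here is a minimal sketch. The has_negative helper is a hypothetical name made up for this example; it answers a yes/no question about n rather than implementing relu itself.

```python
import pandas as pd


def has_negative(n):
    """Return True if n (a number or a pandas.Series) has a negative value."""
    if isinstance(n, pd.Series):
        # Say explicitly how many booleans collapse into one: any of them.
        return bool((n < 0).any())
    return n < 0


scalar_result = has_negative(-3)
series_result = has_negative(pd.Series([1, 0, 2]))
print(scalar_result, series_result)
```

The isinstance branch is the cost of this approach: every new input type (numpy array, list, …) needs its own case.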
A function that works both on a scalar and on a pandas.Series (or a numpy array)
is called a "ufunc", short for "universal function".
Most of the functions from numpy or Pandas, such as min, to_datetime …, are
ufuncs.
numpy has a vectorize decorator for these cases.
import numpy as np
import pandas as pd


@np.vectorize
def relu(n):
    if n < 0:
        return 0
    return n


arr = pd.Series([-1, 0, 1])
print(relu(arr))
Now relu will work both on scalars (e.g. 7, 2.18 …) and vectors (e.g. numpy
array, pandas.Series …).
The output of relu now is numpy.ndarray, not pandas.Series.
You might want to have a look at numba.vectorize as well.
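If you would rather get a pandas.Series back, one alternative is to lean on an existing numpy ufunc instead of np.vectorize. This is a different technique from the one shown above, sketched under the assumption that relu only needs "the larger of n and 0"; relu_max is a name made up for this example.

```python
import numpy as np
import pandas as pd


def relu_max(n):
    # np.maximum is a built-in numpy ufunc; it compares element-wise,
    # and pandas hands the result back as a Series when given a Series.
    return np.maximum(n, 0)


arr = pd.Series([-1, 0, 1])
result = relu_max(arr)
print(type(result).__name__)
print(result.tolist())
print(relu_max(-3), relu_max(2.5))
```

The index of the input Series is preserved, which is handy when the result has to be aligned with other columns.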

1.1. Further Reading

Truth value testing in the Python documentation

PEP 285: Adding a bool type

__bool__ documentation

Universal functions on the numpy docs
2. Free Range
import pandas as pd

df = pd.DataFrame([
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
    [4, 4, 4],
    [5, 5, 5],
])

print(len(df.loc[1:3]))
Try to guess what the output is before moving to the next page. 
This code will print: 3
Slices in Python are half-open ranges. You get values from the first index, up to but not including the last index.
In [1]: chars = ['a', 'b', 'c', 'd', 'e']
In [2]: chars[1:3]
Out[2]: ['b', 'c']
And most of the time, Pandas works the same way:
In [3]: s = pd.Series(chars)
In [4]: s[1:3]
Out[4]:
1 b
2 c
dtype: object
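There are three ways to slice a pandas.Series or a pandas.DataFrame: plain []
slicing, iloc and loc.
In the examples here, [] and iloc slice by position on a half-open range, just
like Python lists.
loc works by label, and it slices on a closed range, including the last
index.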
In [5]: df[1:3]
Out[5]:
   0  1  2
1  2  2  2
2  3  3  3

In [6]: df.iloc[1:3]
Out[6]:
   0  1  2
1  2  2  2
2  3  3  3

In [7]: df.loc[1:3]
Out[7]:
   0  1  2
1  2  2  2
2  3  3  3
3  4  4  4
Watch out for these off-by-one errors when using .loc.
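The difference is easier to see when the labels are not integers. Here is a small sketch with a made-up frame (the 'value' column and the letter labels are assumptions for illustration only):

```python
import pandas as pd

df = pd.DataFrame(
    {'value': [10, 20, 30, 40, 50]},
    index=['a', 'b', 'c', 'd', 'e'],
)

by_position = df.iloc[1:3]   # half-open: positions 1 and 2
by_label = df.loc['b':'d']   # closed: labels 'b', 'c' and 'd'

print(list(by_position.index))
print(list(by_label.index))
```

With labels there is no natural "one past the end" value, which is one way to make sense of loc including its last label.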
2.1. Further Reading

loc in Pandas documentation

iloc in Pandas documentation

Indexing and selecting data in the Pandas documentation

Off-by-one error on Wikipedia
3. Off With Their NaNs
import numpy as np
import pandas as pd
s = pd.Series([1, np.nan, 3])
print(s[~(s == np.nan)])
Try to guess what the output is before moving to the next page. 
This code will print:
0    1.0
1    NaN
2    3.0
dtype: float64
We’ve covered some of the floating point oddities in [Multiplying].
NaN (or np.nan) is another oddity.
The name NaN stands for "not a number". It serves two purposes: illegal
computation and missing values.
Here’s an example of a bad computation:
In [1]: np.float64(0)/np.float64(0)
<ipythoninput50796728115601>:1: RuntimeWarning: invalid value encountered in double_scalars
  np.float64(0)/np.float64(0)
Out[1]: nan
You see a warning but not an exception, and the return value is nan.
nan does not equal any number, including itself.
In [2]: np.nan == np.nan
Out[2]: False
To check that a value is nan, you need to use a special function such as
pandas.isnull.
In [3]: pd.isnull(np.nan)
Out[3]: True
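For a single value, a few checks can be compared side by side. This is a small sketch, not an exhaustive list; math.isnan and np.isnan are standard functions that only handle numeric input, while pd.isnull also accepts other kinds of missing values.

```python
import math

import numpy as np
import pandas as pd

value = np.nan

checks = {
    'value == value': value == value,     # nan never equals itself
    'value != value': value != value,     # so this comparison is true
    'math.isnan': math.isnan(value),
    'np.isnan': bool(np.isnan(value)),
    'pd.isnull': bool(pd.isnull(value)),
}

for name, result in checks.items():
    print(f'{name}: {result}')
```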
You can use pandas.isnull to fix this teaser.
import numpy as np
import pandas as pd
s = pd.Series([1, np.nan, 3])
print(s[~pd.isnull(s)])
pandas.isnull works with all of Pandas "missing" values: None,
pandas.NaT (not a time) and the new pandas.NA.
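A quick way to see this is to run pandas.isnull over each kind of missing value. The sketch below assumes a Pandas version that has pandas.NA (1.0 or later); it also shows pandas.notnull and Series.dropna, which select the same rows as the fixed teaser.

```python
import numpy as np
import pandas as pd

# Each of these counts as "missing" for pandas.isnull.
missing = [None, np.nan, pd.NaT, pd.NA]
flags = [bool(pd.isnull(v)) for v in missing]
print(flags)

s = pd.Series([1, np.nan, 3])
kept = s[pd.notnull(s)]   # same rows as s[~pd.isnull(s)]
dropped = s.dropna()      # built-in shortcut for the same selection

print(kept.tolist())
print(kept.equals(dropped))
```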
Floating points have several other special "numbers" such as inf (infinity),
-inf, -0, +0 and others. You can learn more about them in the links below.
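A few of these special values can be poked at from plain Python. This is only a small sketch of their behavior; the variable names are made up for the example.

```python
import math

import numpy as np

pos_inf = np.inf
neg_inf = -np.inf
neg_zero = -0.0

# Infinity is larger than any finite float.
inf_is_largest = pos_inf > 1e308
# Negative zero compares equal to zero...
zeros_equal = neg_zero == 0.0
# ...but it still carries its sign.
zero_sign = math.copysign(1, neg_zero)
# Subtracting infinity from itself is an illegal computation and gives nan.
inf_minus_inf = pos_inf - pos_inf

print(inf_is_largest, zeros_equal, zero_sign, inf_minus_inf)
```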
3.1. Further Reading

pandas.isnull documentation

Experimental NA scalar to denote missing values in the Pandas documentation

Floating Point Arithmetic: Issues and Limitations in the Python documentation

floating point zine by Julia Evans

What Every Computer Scientist Should Know About Floating-Point Arithmetic