punkbrwstr/pynto


pynto: Data analysis in Python using stack-based programming

pynto is a Python package that lets you manipulate a data frame as a stack of columns, using the expressiveness of the concatenative/stack-oriented paradigm.

How does it work?

With pynto you chain together functions called words to formally specify how to calculate each column of your data frame. The composed words can be lazily evaluated over any range of rows to create your data frame.

Words add, remove or modify columns. They can operate on the entire stack or be limited to certain columns using a column indexer. Composed words operate in left-to-right order, with operators following their operands in postfix (Reverse Polish Notation) style. More complex operations can be specified using quotations, which are anonymous blocks of words that do not operate immediately, and combinators, which are higher-order words that control the execution of quotations.
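To make the postfix and lazy-evaluation ideas concrete, here is a minimal pure-Python sketch. It is not pynto's implementation: the names `const`, `dup`, `mul` and `evaluate` are hypothetical stand-ins, and a column is modelled as a function of a row range so that nothing is computed until rows are requested.

```python
from typing import Callable, List

Column = Callable[[range], List[float]]   # a column is a function of a row range
Word = Callable[[List[Column]], None]     # a word mutates the stack of columns


def const(value: float) -> Word:
    """Word that pushes a constant-valued column."""
    def word(stack: List[Column]) -> None:
        stack.append(lambda rows: [value for _ in rows])
    return word


def dup(stack: List[Column]) -> None:
    """Word that pushes a copy of the top column."""
    stack.append(stack[-1])


def mul(stack: List[Column]) -> None:
    """Word that replaces the top two columns with their row-wise product."""
    right, left = stack.pop(), stack.pop()
    stack.append(lambda rows: [a * b for a, b in zip(left(rows), right(rows))])


def evaluate(words: List[Word], rows: range) -> List[List[float]]:
    """Run the words left to right, then compute every column for `rows`."""
    stack: List[Column] = []
    for word in words:
        word(stack)
    return [column(rows) for column in stack]


# Postfix order: the operands (10 and its copy) come before the operator (mul).
ten_squared = [const(10.0), dup, mul]
result = evaluate(ten_squared, range(2))
print(result)  # [[100.0, 100.0]]
```

The same list of words can be evaluated over a different row range without being rebuilt, which is the property the row indexers rely on.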

What does it look like?

Here's a program to calculate deviations from moving average for each column in a table using the combinator/quotation pattern.

>>> import pynto as pt
>>> ma_dev = (                        # create a pynto expression by concatenating words
...     pt.load('stock_prices')       # append columns to the stack from the built-in database
...     .q                            # start a quotation
...         .dup                      # push a copy of the top (rightmost) column of the stack
...         .ravg(20)                 # calculate 20-period moving average
...         .sub                      # subtract top column from second column
...     .p                            # close the quotation
...     .map                          # use the map combinator to apply the quotation
... )                                 # to each column in the stack
>>>
>>> df = ma_dev.rows['2021-06-01':]         # evaluate over a range of rows to get a DataFrame
>>> pt.db['stocks_ma_dev'] = df             # save the results back to the database
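For readers new to the combinator/quotation pattern, the sketch below spells out in plain Python what the quotation computes for one column (value minus its trailing moving average) and what `map` adds (repeating that for every column). The prices are made up, the window is shortened to 3, and returning `None` for rows without a full window is an assumption of this sketch; pynto's own handling of incomplete windows may differ.

```python
from typing import Dict, List, Optional


def rolling_mean(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing mean over `window` rows; None until a full window is available."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def ma_deviation(values: List[float], window: int) -> List[Optional[float]]:
    """Value minus its trailing moving average (the dup / ravg / sub steps)."""
    means = rolling_mean(values, window)
    return [None if m is None else v - m for v, m in zip(values, means)]


# Hypothetical prices; the dict comprehension plays the role of `map`.
prices: Dict[str, List[float]] = {
    "AAA": [1.0, 2.0, 3.0, 4.0],
    "BBB": [10.0, 10.0, 13.0, 10.0],
}
deviations = {name: ma_deviation(col, window=3) for name, col in prices.items()}
print(deviations)
# {'AAA': [None, None, 1.0, 1.0], 'BBB': [None, None, 2.0, -1.0]}
```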

Why pynto?

  • Expressive: Pythonic syntax; Combinatory logic for modular, reusable code
  • Performant: Memoization to eliminate duplicate operations
  • Batteries included: Built-in time series database
  • Interoperable: Seamless integration with pandas/NumPy

Get pynto

pip install pynto

Reference

The Basics

Constant literals

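Add constant-value columns to the stack using literals that start with c, followed by a number in which any - and . characters are replaced by _ (for example, c10_0 for 10.0). The literal r followed by a whole number n adds one constant column for each whole number from 0 to n - 1 (for example, r5 adds columns for 0 through 4).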

>>> # Compose _words_ that add a column of 10s to the stack, duplicate the column, 
>>> # and then multiply the columns together
>>> ten_squared = pt.c10_0.dup.mul         
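The naming rule for constant literals can be sketched as a small helper. `constant_literal` is a hypothetical function written for illustration, not part of pynto; it only applies the substitution described above to Python's default string form of a number.

```python
def constant_literal(value) -> str:
    """Build the literal name for a constant: 'c' plus the number,
    with '-' and '.' each replaced by '_'."""
    return "c" + str(value).replace("-", "_").replace(".", "_")


print(constant_literal(10.0))   # c10_0
print(constant_literal(9))      # c9
print(constant_literal(-1.5))   # c_1_5
```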

Row indexers

To evaluate an expression, use a row indexer. Specify rows by date range using the .rows[start:stop (exclusive):periodicity] syntax. Slicing arguments left as None default to the widest range available. int indices also work with the .rows indexer. .first and .last are included for convenience.

>>> ten_squared.rows['2021-06-01':'2021-06-03','B']                   # evaluate over a two business day date range                                                   
                 c
2021-06-01     100.0
2021-06-02     100.0

Quotations and Combinators

Combinators are higher-order functions that let pynto do more complicated things like branching and looping. Combinators operate on quotations, expressions that are pushed to the stack instead of operating on the stack. To push a quotation to the stack, put words between q and p (or pass an expression from the local namespace within the parentheses of pt.q(_expression_)). The map combinator evaluates a quotation at the top of the stack over each column below it in the stack.

>>> pt.c9.c10.q.dup.mul.p.map.last
                 c         c
2021-06-02      81.0     100.0
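The mechanics of a quotation and the map combinator can be sketched on a single row of values. This is an illustrative model only: the stack holds plain floats instead of columns, a quotation is a Python list of words, and `run` and `map_combinator` are hypothetical names.

```python
from typing import Callable, List


def dup(stack: list) -> None:
    """Push a copy of the top item."""
    stack.append(stack[-1])


def mul(stack: list) -> None:
    """Replace the top two items with their product."""
    right, left = stack.pop(), stack.pop()
    stack.append(left * right)


def run(words: List[Callable[[list], None]], stack: list) -> list:
    """Apply each word to the stack in left-to-right order."""
    for word in words:
        word(stack)
    return stack


def map_combinator(stack: list) -> None:
    """Pop the quotation on top, then apply it to each remaining column in turn."""
    quotation = stack.pop()
    columns = stack[:]
    stack.clear()
    for column in columns:
        stack.extend(run(quotation, [column]))


stack = [9.0, 10.0, [dup, mul]]   # c9 c10 q dup mul p
map_combinator(stack)             # map
print(stack)  # [81.0, 100.0]
```

The quotation sits on the stack as data until a combinator consumes it, which is what distinguishes it from words that act immediately.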

Headers

Each column has a string header. hset sets the header to a new value. Headers are useful for filtering or arranging columns.

>>> pt.c9.c10.q.dup.mul.p.map.hset('a','b').last
                 a         b
2021-06-02      81.0     100.0

Column indexers

Column indexers specify the columns on which a word operates, overriding the word's default. Positive int indices start from the bottom (left) of the stack and negative indices start from the top.

By default add has a column indexer of [-2:]

>>> pt.r5.add.last
              c    c    c    c
2021-06-02  0.0  1.0  2.0  7.0

Change the column indexer of add to [:] to sum all columns

>>> pt.r5.add[:].last
               c
2025-06-02  10.0

You can also index columns by header, using regular expressions

>>> pt.r3.hset('a,b,c').add['(a|c)'].last
              b    a
2025-06-02  1.0  2.0
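The three results above can be reproduced with a small model of column selection. Everything here is a sketch under assumptions: `select` and `add` are hypothetical, each column is a `(header, value)` pair for one row, the regular expression must match the whole header, and the result takes the header of the first selected column (which is what the outputs above suggest, but is not confirmed by the text).

```python
import re
from typing import List, Tuple, Union

Column = Tuple[str, float]  # (header, value for a single row)


def select(stack: List[Column], indexer: Union[slice, str]) -> List[int]:
    """Return the stack positions chosen by a slice or a header regex."""
    if isinstance(indexer, slice):
        return list(range(len(stack)))[indexer]
    pattern = re.compile(indexer)
    return [i for i, (header, _) in enumerate(stack) if pattern.fullmatch(header)]


def add(stack: List[Column], indexer: Union[slice, str] = slice(-2, None)) -> List[Column]:
    """Sum the selected columns; unselected columns stay, the sum goes on top."""
    chosen = select(stack, indexer)
    total = sum(stack[i][1] for i in chosen)
    header = stack[chosen[0]][0]  # assumption: keep the first selected header
    rest = [col for i, col in enumerate(stack) if i not in chosen]
    return rest + [(header, total)]


r5 = [("c", float(n)) for n in range(5)]
print(add(r5))                # [('c', 0.0), ('c', 1.0), ('c', 2.0), ('c', 7.0)]
print(add(r5, slice(None)))   # [('c', 10.0)]

abc = [("a", 0.0), ("b", 1.0), ("c", 2.0)]
print(add(abc, "(a|c)"))      # [('b', 1.0), ('a', 2.0)]
```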

Defining words

Words in the local namespace can be composed using the + operator.

>>> squared = pt.dup.mul
>>> ten_squared2 = pt.c10_0 + squared    # same thing

Words can also be defined globally in the pynto vocabulary.

>>> pt.define['squared'] = pt.dup.mul
>>> ten_squared3 = pt.c10_0.squared    # same thing
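Composition with + amounts to concatenating two sequences of words. A minimal sketch of that idea is shown below; the `Expression` class, `push`, and the `vocabulary` dict are hypothetical illustrations and say nothing about how pynto stores its words internally.

```python
from typing import Callable, Dict, List

Word = Callable[[list], None]


class Expression:
    """An ordered sequence of words that can be concatenated with +."""

    def __init__(self, *words: Word) -> None:
        self.words: List[Word] = list(words)

    def __add__(self, other: "Expression") -> "Expression":
        return Expression(*self.words, *other.words)

    def evaluate(self) -> list:
        stack: list = []
        for word in self.words:
            word(stack)
        return stack


def push(value: float) -> Word:
    """Word that pushes a constant value."""
    return lambda stack: stack.append(value)


def dup(stack: list) -> None:
    stack.append(stack[-1])


def mul(stack: list) -> None:
    right, left = stack.pop(), stack.pop()
    stack.append(left * right)


squared = Expression(dup, mul)
ten_squared = Expression(push(10.0)) + squared
print(ten_squared.evaluate())  # [100.0]

# A global vocabulary can be modelled as a name -> expression mapping.
vocabulary: Dict[str, Expression] = {"squared": squared}
print((Expression(push(3.0)) + vocabulary["squared"]).evaluate())  # [9.0]
```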

The Database

pynto has built-in database functionality that lets you save DataFrames and Series to a Redis database. The database saves the underlying numpy data in native byte format for zero-copy retrieval. Each DataFrame column is saved as an independent key and can be retrieved or updated on its own. The database also supports three-dimensional frames that have a two-level MultiIndex.

>>> pt.db['my_df'] = expr.rows['2021-06-01':'2021-06-03']
>>> pt.load('my_df').rows[:]
              constant  constant
2021-06-01      81.0     100.0
2021-06-02      81.0     100.0
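The storage idea described above (raw native bytes, one key per column, no copy on read) can be sketched with the standard library alone. Here a plain dict stands in for Redis, `array` stands in for numpy, and the `frame:header` key layout is a made-up convention for this example rather than pynto's actual schema.

```python
from array import array
from typing import Dict, List

store: Dict[str, bytes] = {}  # stand-in for Redis: key -> raw bytes


def save_column(frame_key: str, header: str, values: List[float]) -> None:
    """Store one column under its own key as native 8-byte floats."""
    store[f"{frame_key}:{header}"] = array("d", values).tobytes()


def load_column(frame_key: str, header: str) -> memoryview:
    """Reinterpret the stored bytes as doubles without copying them."""
    return memoryview(store[f"{frame_key}:{header}"]).cast("d")


save_column("my_df", "a", [81.0, 81.0])
save_column("my_df", "b", [100.0, 100.0])

# Columns are independent: updating one key leaves the other untouched.
save_column("my_df", "a", [82.0, 82.0])
print(list(load_column("my_df", "a")))  # [82.0, 82.0]
print(list(load_column("my_df", "b")))  # [100.0, 100.0]
```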

pynto built-in vocabulary

Column Creation

| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| `c` | `[-1:]` | `values` | Pushes constant columns for each of `values` |
| `day_count` | `[-1:]` | | Pushes a column with the number of days in the period |
| `from_pandas` | `[:]` | `pandas`, `round_` | Pushes columns from the pandas DataFrame or Series `pandas` |
| `load` | `[-1:]` | | Pushes columns of a DataFrame saved to internal DB as key |
| `nan` | `[-1:]` | `values` | Pushes a constant nan-valued column |
| `period_ordinal` | `[-1:]` | | Pushes a column with the period ordinal |
| `r` | `[-1:]` | `n` | Pushes constant columns for each whole number from 0 to `n` - 1 |
| `randn` | `[-1:]` | | Pushes a column with values from a random normal distribution |
| `timestamp` | `[-1:]` | | Pushes a column with the timestamp of the end of the period |

Stack Manipulation

| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| `drop` | `[-1:]` | | Removes selected columns |
| `dup` | `[-1:]` | | Duplicates columns |
| `hsort` | `[:]` | | Sorts columns by header |
| `id` | `[:]` | | Identity/no-op |
| `interleave` | `[:]` | `parts` | Divides columns into `parts` groups and interleaves the groups |
| `keep` | `[:]` | | Removes non-selected columns |
| `nip` | `[-1:]` | | Removes non-selected columns, defaulting selection to top |
| `pull` | `[:]` | | Brings selected columns to the top |
| `rev` | `[:]` | | Reverses the order of selected columns |
| `roll` | `[:]` | | Permutes selected columns |
| `swap` | `[-2:]` | | Swaps top and bottom selected columns |

Quotation

| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| `q` | `[-1:]` | `quoted`, `this` | Wraps the following words until `p` as a quotation, or wraps the `quoted` expression as a quotation |

Header manipulation

| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| `halpha` | `[:]` | | Set headers to alphabetical values |
| `happly` | `[:]` | `header_func` | Apply `header_func` to headers |
| `hformat` | `[:]` | `format_spec` | Apply `format_spec` to headers |
| `hreplace` | `[:]` | `old`, `new` | Replace `old` with `new` in headers |
| `hset` | `[:]` | `headers` | Set headers to `*headers` |
| `hsetall` | `[:]` | `headers` | Set headers to `*headers`, repeating if necessary |

Combinators

| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| `call` | `[:]` | | Applies quotation |
| `cleave` | `[:]` | `num_quotations` | Applies all preceding quotations |
| `compose` | `[:]` | `num_quotations` | Combines quotations |
| `hmap` | `[:]` | | Applies quotation to stacks created by grouping columns by header |
| `ifexists` | `[:]` | `count` | Applies quotation if stack has at least `count` columns |
| `ifexistselse` | `[:]` | `count` | Applies top quotation if stack has at least `count` columns, otherwise applies second quotation |
| `ifheaders` | `[:]` | `predicate` | Applies top quotation if list of column headers fulfills `predicate` |
| `ifheaderselse` | `[:]` | `predicate` | Applies top quotation if list of column headers fulfills `predicate`, otherwise applies second quotation |
| `map` | `[:]` | `every` | Applies quotation in groups of `every` |
| `partial` | `[-1:]` | `quoted`, `this` | Pushes stack columns to the front of quotation |
| `repeat` | `[:]` | `times` | Applies quotation `times` times |

Data cleanup

| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| `ffill` | `[:]` | `lookback`, `leave_end` | Fills nans with previous values, looking back `lookback` before the range and leaving trailing nans in place when `leave_end` is set |
| `fill` | `[:]` | | Fills nans with value |
| `fillfirst` | `[-1:]` | `lookback` | Fills first row with previous non-nan value, looking back `lookback` before the range |
| `join` | `[-2:]` | `date` | Joins two columns at `date` |
| `sync` | `[:]` | | Aligns available data by setting all values to NaN when any value is NaN |
| `zero_first` | `[-1:]` | | Changes first value to zero |
| `zero_to_na` | `[-1:]` | | Changes zeros to nans |

Resample methods

| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| `resample_avg` | `[:]` | | Sets periodicity resampling method to avg |
| `resample_first` | `[:]` | | Sets periodicity resampling method to first |
| `resample_firstnofill` | `[:]` | | Sets periodicity resampling method to first with no fill |
| `resample_last` | `[:]` | | Sets periodicity resampling method to last |
| `resample_lastnofill` | `[:]` | | Sets periodicity resampling method to last with no fill |
| `resample_max` | `[:]` | | Sets periodicity resampling method to max |
| `resample_min` | `[:]` | | Sets periodicity resampling method to min |
| `resample_sum` | `[:]` | | Sets periodicity resampling method to sum |
| `set_periodicity` | `[-1:]` | `periodicity` | Changes column periodicity to `periodicity`, then resamples |
| `set_start` | `[-1:]` | `start` | Changes period start to `start`, then resamples |

Row-wise Reduction

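| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| `add` | `[-2:]` | `ignore_nans` | Addition |
| `avg` | `[-2:]` | `ignore_nans` | Arithmetic average |
| `div` | `[-2:]` | `ignore_nans` | Division |
| `max` | `[-2:]` | `ignore_nans` | Maximum |
| `med` | `[-2:]` | `ignore_nans` | Median |
| `min` | `[-2:]` | `ignore_nans` | Minimum |
| `mod` | `[-2:]` | `ignore_nans` | Modulo |
| `mul` | `[-2:]` | `ignore_nans` | Multiplication |
| `pow` | `[-2:]` | `ignore_nans` | Power |
| `std` | `[-2:]` | `ignore_nans` | Standard deviation |
| `sub` | `[-2:]` | `ignore_nans` | Subtraction |
| `var` | `[-2:]` | `ignore_nans` | Variance |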

Row-wise Reduction Ignoring NaNs

| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| `nadd` | `[-2:]` | `ignore_nans` | Addition |
| `navg` | `[-2:]` | `ignore_nans` | Arithmetic average |
| `ndiv` | `[-2:]` | `ignore_nans` | Division |
| `nmax` | `[-2:]` | `ignore_nans` | Maximum |
| `nmed` | `[-2:]` | `ignore_nans` | Median |
| `nmin` | `[-2:]` | `ignore_nans` | Minimum |
| `nmod` | `[-2:]` | `ignore_nans` | Modulo |
| `nmul` | `[-2:]` | `ignore_nans` | Multiplication |
| `npow` | `[-2:]` | `ignore_nans` | Power |
| `nstd` | `[-2:]` | `ignore_nans` | Standard deviation |
| `nsub` | `[-2:]` | `ignore_nans` | Subtraction |
| `nvar` | `[-2:]` | `ignore_nans` | Variance |
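The difference between a plain row-wise reduction and its NaN-ignoring counterpart can be illustrated on a single row. This is a sketch of the general idea only: the functions below are local stand-ins named after the words, and the behaviour for a row that is entirely NaN (this sketch returns 0) is an assumption that may not match pynto.

```python
import math
from typing import List


def add(values: List[float]) -> float:
    """Plain sum: a single NaN makes the whole result NaN."""
    return sum(values)


def nadd(values: List[float]) -> float:
    """Sum that skips NaN entries."""
    return sum(v for v in values if not math.isnan(v))


row = [1.0, float("nan"), 2.0]
print(math.isnan(add(row)))  # True
print(nadd(row))             # 3.0
```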

Rolling Window

| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| `ewm_mean` | `[-1:]` | `window` | Exponentially-weighted moving average |
| `ewm_std` | `[-1:]` | `window` | Exponentially-weighted standard deviation |
| `ewm_var` | `[-1:]` | `window` | Exponentially-weighted variance |
| `radd` | `[-1:]` | `window` | Addition |
| `ravg` | `[-1:]` | `window` | Arithmetic average |
| `rcor` | `[-2:]` | `window` | Correlation |
| `rcov` | `[-2:]` | `window` | Covariance |
| `rdif` | `[-1:]` | `window` | Lagged difference |
| `rlag` | `[-1:]` | `window` | Lag |
| `rmax` | `[-1:]` | `window` | Maximum |
| `rmed` | `[-1:]` | `window` | Median |
| `rmin` | `[-1:]` | `window` | Minimum |
| `rret` | `[-1:]` | `window` | Lagged return |
| `rstd` | `[-1:]` | `window` | Standard deviation |
| `rvar` | `[-1:]` | `window` | Variance |
| `rzsc` | `[-1:]` | `window` | Z-score |

Cumulative

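| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| `cadd` | `[-1:]` | | Addition |
| `cavg` | `[-1:]` | | Arithmetic average |
| `cdif` | `[-1:]` | | Lagged difference |
| `clag` | `[-1:]` | | Lag |
| `cmax` | `[-1:]` | | Maximum |
| `cmin` | `[-1:]` | | Minimum |
| `cmul` | `[-1:]` | | Multiplication |
| `cret` | `[-1:]` | | Lagged return |
| `cstd` | `[-1:]` | | Standard deviation |
| `csub` | `[-1:]` | | Subtraction |
| `cvar` | `[-1:]` | | Variance |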

Reverse Cumulative

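| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| `rcadd` | `[-1:]` | | Addition |
| `rcavg` | `[-1:]` | | Arithmetic average |
| `rcdif` | `[-1:]` | | Lagged difference |
| `rclag` | `[-1:]` | | Lag |
| `rcmax` | `[-1:]` | | Maximum |
| `rcmin` | `[-1:]` | | Minimum |
| `rcmul` | `[-1:]` | | Multiplication |
| `rcret` | `[-1:]` | | Lagged return |
| `rcstd` | `[-1:]` | | Standard deviation |
| `rcsub` | `[-1:]` | | Subtraction |
| `rcvar` | `[-1:]` | | Variance |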

One-for-one functions

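| Word | Default Selector | Parameters | Description |
|---|---|---|---|
| `abs` | `[-1:]` | | Absolute value |
| `dec` | `[-1:]` | | Decrement |
| `exp` | `[-1:]` | | Exponential |
| `expm1` | `[-1:]` | | Exponential minus one |
| `inc` | `[-1:]` | | Increment |
| `inv` | `[-1:]` | | Multiplicative inverse |
| `lnot` | `[-1:]` | | Logical not |
| `log` | `[-1:]` | | Natural log |
| `log1p` | `[-1:]` | | Natural log of increment |
| `neg` | `[-1:]` | | Additive inverse |
| `rank` | `[:]` | | Row-wise rank |
| `sign` | `[-1:]` | | Sign |
| `sqrt` | `[-1:]` | | Square root |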

About

Time series analysis in Python using the concatenative paradigm
