Closed
Description
HTML vignette series:
Planned for v1.9.8
- Quick tour of data.table
- Keys and fast binary search based subset
- Secondary indices and auto indexing
- Joins vignette:
  a) joins vs. subsets: extending binary-search-based subsets to joins, plus conditional/non-equi joins, rolling joins and interval joins.
  b) `by=.EACHI`, and the join + update feature.
  c) Document `i.col` usage, as filed in "Docs: explain and document the i.col notation for joins" #1038.
  d) Also cover performance/advantages (see "`on` performing slower than double `setkey`" #1232).
  [ ] Cover `get()` and `mget()` (covered in "programming on data.table" #4304). E.g., http://stackoverflow.com/q/33785747/559784
- Add the rationale for the `on=` argument to the FAQ ("[Documentation] Use of the `on=` argument for joins" #1623).
- FAQ 5.3 needs to mention that a shallow copy is made in order to restore over-allocation. Thanks to Jan for linking it in ":= changes address of a data table" #1729.
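For the joins vignette items above (join + update, `i.col`, `on=`), a minimal sketch of the kind of example it could include. It assumes data.table >= 1.9.8 (for `on=`); the `prices` and `orders` tables and their columns are hypothetical:

```r
library(data.table)

# Hypothetical example tables
prices <- data.table(id = c(1L, 2L, 3L), price = c(10, 20, 30))
orders <- data.table(id = c(2L, 3L, 4L), qty = c(5L, 6L, 7L))

# Ad hoc join via on= (no setkey() needed) combined with update by reference:
# rows of `orders` whose id matches a row of `prices` get a new column `total`.
# `i.price` explicitly refers to the `price` column of the table passed in `i`.
orders[prices, on = "id", total := i.price * qty]

# Rows of `orders` with no match (id 4) are left as NA;
# rows of `prices` with no match (id 1) are simply ignored.
orders
```

Because `:=` updates `orders` in place, no copy of `orders` is made and no intermediate joined table needs to be assigned back.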
Future releases
- data.table internals, performance aspects and expressiveness
- Reading multiple files (`fread` + `rbindlist`), ordering, ranking and set operations
- IDateTime vignette
- Document the difference between `data.table()` and `data.frame()` somewhere. Relevant issues: "Creation of data.table using a list" #968, "data.table(x) != as.data.table(x)" #877. Perhaps in slightly more detail in the FAQ.
- Coursera FAQ
- Advanced `data.table` usage:
  - NSE
  - ...
- Timings vignette (move "[R-Forge #1133] Add r-help example to timings.Rnw" #520 here to get everything in one place; not sure we need it as a vignette, though, since we have the Wiki with benchmarks/timings).
- `fread` + `fwrite` vignette; also include the "Convenience features of fread" wiki. See also "fread (and fwrite) vignette" #2855.
Finished:
- Introduction to data.table: data.table syntax, general form, subsetting rows in `i`, selecting/computing in `j` and aggregating using `by`.
- Reference semantics: add/update/delete columns by reference, and see that we can combine this with `i` and `by` in the same way as before.
- Efficient reshaping using data.tables
- Link to this answer on SO on `by=.EACHI` until the vignette is done.
Minor:
- Operations using `integer64`, and promoting it for large integers.
Notes (to update current vignettes based on feedback). Please let me know if I missed anything.
Introduction to data.table:
- `order` in `i`.
- Explain how to name columns in `j` while selecting/computing.
- Emphasise that `keyby` is applied to the computed result after it has been obtained, not to the original data.table.
- Mention the new updates to `.SDcols` and to columns in `with=FALSE`: both can now select columns as `colA:colB`.
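A minimal sketch of the Introduction items above (`order` in `i`, naming columns in `j`, `keyby`, and `colA:colB` in `.SDcols`), assuming data.table >= 1.9.8; the table `DT` and its columns are hypothetical:

```r
library(data.table)

# Hypothetical example table
DT <- data.table(g = c("b", "a", "b", "a"), x = 1:4, y = 5:8)

# order() in i: rows sorted by x, descending
DT[order(-x)]

# Naming columns in j while computing
DT[, .(total_x = sum(x), mean_y = mean(y))]

# keyby: groups are computed first, then the *result* is sorted and keyed by g.
# The original DT is neither reordered nor keyed.
res <- DT[, .(total_x = sum(x)), keyby = g]

# A colA:colB range of columns in .SDcols
sums <- DT[, lapply(.SD, sum), .SDcols = x:y]
```

Checking `key(res)` against `key(DT)` afterwards is a quick way to show readers that `keyby` touches only the result.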
Reference semantics:
- Also explain all other relevant `set*` functions here (`setnames`, `setcolorder`, etc.), mainly `set`.
- Explain that "1b) the := operator" is just defining ways to use it: the example there doesn't work, as it only shows two different ways of using it. Following this comment.
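A minimal sketch of what the Reference semantics additions could show: the two forms of `:=` and a few `set*` functions, all modifying the table in place. The table `DT` and its column names are hypothetical:

```r
library(data.table)

# Hypothetical example table
DT <- data.table(a = 1:3, b = 4:6)

# Two ways of using := (both add columns to DT by reference)
DT[, s := a + b]                      # LHS := RHS form, one column
DT[, `:=`(p = a * b, d = b - a)]      # functional form, several columns

# Other set* functions also modify DT in place, without a copy
setnames(DT, "a", "A")                # rename column a to A
setcolorder(DT, c("d", "p", "s", "b", "A"))

# set(): a low-overhead form of := that is cheap to call in a loop
for (i in seq_len(nrow(DT))) {
  set(DT, i = i, j = "A", value = DT[["A"]][i] * 10L)
}
```

The loop is only there to illustrate `set()`; a vectorised `DT[, A := A * 10L]` would do the same in one call.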
Keys and fast binary search based subsets:
- Add an example of subset using integer/double keys.
- Difference in the `nomatch` default in binary-search-based subsets.
- Is replacing NAs possible with binary-search-based subsets?
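A minimal sketch for the Keys items above: a binary-search subset on an integer key and the effect of the default `nomatch`, contrasted with a vector-scan style subset. The table `DT` is hypothetical; `nomatch = 0L` is used because it is accepted by both older and newer data.table versions:

```r
library(data.table)

# Hypothetical example table
DT <- data.table(id = c(5L, 1L, 3L), val = c("c", "a", "b"))
setkey(DT, id)                       # sorts DT by id and marks id as the key

hit   <- DT[.(3L)]                   # binary-search subset on the integer key: 1 row
miss  <- DT[.(4L)]                   # no match: default nomatch = NA gives 1 row, val is NA
none  <- DT[.(4L), nomatch = 0L]     # non-matching rows dropped: 0 rows
vscan <- DT[id == 4L]                # vector-scan style subset: 0 rows
```

The contrast between `miss` and `vscan` is the `nomatch` difference the vignette item refers to.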
FAQ (most appropriate here, I think).
- Update the FAQ with the issue of the external pointer being NULL when reading an R object from file, for example using `readRDS()`. Update this SO post.
- Explain, with an example, over-allocating a data.table using `alloc.col()`: when to use it (when you need to create multiple columns) and why. Update this SO post.