Generatools

Toolbox for text generation

Programmatic prompting
Grid exploration of prompt and LLM parameters
Experiment storage
Evaluation

Install

Cuda 11.0

Beforehand, you may want to uninstall any previous NVIDIA drivers and related packages. See here.

Pytorch

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio===0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

Other requirements

pip install -r requirements.txt

Tasks list

The following would improve code robustness:

Programmatic generation & evaluation: when stabilised, add functional tests.
Programmatic generation: when stabilised, add sanity checks on conf file used.
mlflow: mlflow works through global variables, which can be dangerous. A good workaround would be to set the experiment and run at the beginning of each function that makes use of mlflow.
PromptSeqsPairs: this object is defined as a dataclass, which is not appropriate. In particular, it makes it hard to cleanly validate changes to the contained data. A regular class, with methods for adding metrics and the like, would be preferable.
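The mlflow workaround from the list above could be sketched as a decorator that pins the tracking URI, the experiment and the run before each call. This is only an illustration: `with_mlflow_context`, `log_score`, the `file:./mlruns` URI and the experiment name are hypothetical and not part of generatools. Only `mlflow.set_tracking_uri`, `mlflow.set_experiment`, `mlflow.start_run` and `mlflow.log_metric` are real mlflow calls.

```python
import functools


def with_mlflow_context(tracking_uri, experiment_name, run_id=None):
    """Hypothetical decorator: set mlflow's global state before each call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Imported lazily so that importing this module does not
            # require mlflow to be installed.
            import mlflow

            # Re-set the globals explicitly instead of trusting whatever
            # an earlier call left behind.
            mlflow.set_tracking_uri(tracking_uri)
            mlflow.set_experiment(experiment_name)
            # run_id=None starts a new run; a run id resumes that run.
            with mlflow.start_run(run_id=run_id):
                return func(*args, **kwargs)

        return wrapper

    return decorator


@with_mlflow_context("file:./mlruns", "demo-experiment")
def log_score(score):
    import mlflow

    mlflow.log_metric("score", score)
    return score
```

Note that `mlflow.start_run` raises if another run is already active, so a real implementation would probably need to decide how to handle nested calls.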

Contributing

Pre-commit hooks

At the root of the current repo, run

pre-commit install --hook-type pre-commit --hook-type pre-push

Using mlflow

1/ mlflow uses global variables to keep track of the current experiment and run, which can be dangerous. For that reason, all classes and functions in generatools.utils.mlflow assume that both mlflow.set_tracking_uri() and mlflow.set_experiment() have been called beforehand, to set the directory where all experiments are stored and the experiment itself.

2/ mlflow.log_params converts parameters to strings before storing them, so their original types are lost on retrieval. There is also a 250-character limit on parameter values. In generatools.genseqs and generatools.grading, parameters are therefore stored in, and retrieved from, a JSON artifact to circumvent these limitations.
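As an illustration of point 2/, here is a minimal sketch of a JSON round trip for parameters, using only the standard library. The helper names `save_params` and `load_params`, the file name `params.json` and the example parameters are assumptions, not the actual generatools implementation.

```python
import json
import tempfile
from pathlib import Path


def save_params(params, directory):
    """Write parameters to a JSON file and return its path."""
    path = Path(directory) / "params.json"
    path.write_text(json.dumps(params, indent=2, sort_keys=True))
    return path


def load_params(path):
    """Read parameters back, with their JSON types intact."""
    return json.loads(Path(path).read_text())


params = {
    "temperature": 0.7,
    "stop_tokens": ["\n", "###"],
    # Longer than the 250-character limit on mlflow parameter values.
    "prompt_template": "x" * 300,
}

with tempfile.TemporaryDirectory() as tmp:
    path = save_params(params, tmp)
    # With mlflow, the file could then be attached to the active run,
    # e.g. with mlflow.log_artifact(str(path)).
    restored = load_params(path)
```

Unlike stringified parameters, the list and the float come back as a list and a float, and the long template is not truncated.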

Conventions

Linting

Before committing, use black for code formatting, with line-length set to 79.

black . -l 79

Printing and logging

All printing should happen through a logger. Do not use print.

In a module, initiate a minimal logger only, so that the user's logger config won't be overridden:

import logging
logger = logging.getLogger(__name__)

In a script, the following can be used:

import logging
logging.basicConfig(
    format="%(asctime)s %(name)-12s %(levelname)-8s %(message)s", level=logging.INFO
)
logger = logging.getLogger(__name__)

Exceptions

If you want to both raise an error and log it, you can use mnemgen.utils.logging.log_and_raise.
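The signature of that helper is not shown here, so the following is only a sketch of what such a function might look like. The argument order (logger, exception type, message) is an assumption, and the real helper may differ.

```python
import logging
from typing import NoReturn, Type

logger = logging.getLogger(__name__)


def log_and_raise(
    log: logging.Logger, exc_type: Type[Exception], message: str
) -> NoReturn:
    """Log `message` at ERROR level, then raise `exc_type(message)`."""
    log.error(message)
    raise exc_type(message)


def check_temperature(temperature: float) -> float:
    """Hypothetical caller, shown only to illustrate usage."""
    if temperature < 0:
        log_and_raise(logger, ValueError, "temperature must be >= 0")
    return temperature
```

Keeping the log call and the raise in one place means the logged message and the exception message cannot drift apart.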

Typing hint

Type hints follow the Google style rules here and here.

This makes the code easy to inspect statically. In particular, the IDE can show the definition of any argument, which helps in understanding how it is manipulated.
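As a small illustration, a pair of fully annotated functions might look like the sketch below. `fill_template` and `fill_all` are hypothetical names chosen for the example and do not come from generatools.

```python
from typing import Dict, List, Optional


def fill_template(
    template: str, values: Dict[str, str], suffix: Optional[str] = None
) -> str:
    """Fill a prompt template and optionally append a suffix."""
    prompt = template.format(**values)
    return prompt if suffix is None else prompt + suffix


def fill_all(template: str, rows: List[Dict[str, str]]) -> List[str]:
    """Fill the same template once per row of values."""
    return [fill_template(template, row) for row in rows]
```

With these annotations, a static checker or an IDE can tell that `fill_all` returns a list of strings without running the code.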

Docstring

Docstrings follow the numpy conventions.
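For reference, a numpy-style docstring on a user-facing function could look like the sketch below. The function `truncate_sequences` is hypothetical and only serves to show the section layout.

```python
from typing import List


def truncate_sequences(sequences: List[str], max_chars: int) -> List[str]:
    """Truncate each generated sequence to a maximum length.

    Parameters
    ----------
    sequences : list of str
        Generated sequences to truncate.
    max_chars : int
        Maximum number of characters kept per sequence.

    Returns
    -------
    list of str
        The truncated sequences, in the same order as the input.

    Raises
    ------
    ValueError
        If `max_chars` is negative.
    """
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    return [seq[:max_chars] for seq in sequences]
```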

Save time using vim-pydocstring.

Docstrings must be thorough for classes and functions used by the end user. They may be much lighter for private functions, methods and classes.

For each module, put a docstring in the __init__.py file.

Testing

Tests should be marked:

Slow tests: use decorator @pytest.mark.slow
Unit tests: use decorator @pytest.mark.unit
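A minimal sketch of marked tests is shown below. The test names and bodies are placeholders made up for the example.

```python
import pytest


@pytest.mark.unit
def test_strip_prompt_whitespace():
    # Fast check of a single implementation detail.
    assert "  hello ".strip() == "hello"


@pytest.mark.slow
def test_many_generations():
    # Stands in for an expensive end-to-end run.
    outputs = [str(i) * 2 for i in range(1000)]
    assert len(outputs) == 1000
```

Custom markers such as `slow` and `unit` should be registered (for instance in `conftest.py` or the pytest configuration file), otherwise recent pytest versions emit a warning about unknown marks.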

Distinguishing unit tests (implementation) from functional tests is useful: if a unit test fails but the functional tests don't, the failure may simply be related to an implementation change.

Distinguishing slow tests from the others allows running fast tests all the time, and slow tests less often.

You may use both pytest and unittest. When running tests:

Faster tests only: py.test
Including slow tests: py.test --runslow
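`--runslow` is not a built-in pytest option, so it has to be defined by the project, typically in a `conftest.py`. The sketch below follows the pattern given in the pytest documentation. It is an assumption about how this repository wires the option, not a copy of its actual `conftest.py`.

```python
import pytest


def pytest_addoption(parser):
    # Register the custom command-line flag.
    parser.addoption(
        "--runslow",
        action="store_true",
        default=False,
        help="run slow tests",
    )


def pytest_configure(config):
    # Register the custom markers so pytest does not warn about them.
    config.addinivalue_line("markers", "slow: mark test as slow to run")
    config.addinivalue_line("markers", "unit: mark test as a unit test")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--runslow"):
        # --runslow given on the command line: do not skip slow tests.
        return
    skip_slow = pytest.mark.skip(reason="need --runslow option to run")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

With this in place, a plain `py.test` skips every test marked `slow`, and `py.test --runslow` runs them all.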
