Skip to content

[WIP] new implementation of parsing and serialization#240

Merged
N-Coder merged 13 commits intosquashfrom
new-parser-impl
May 16, 2020
Merged

[WIP] new implementation of parsing and serialization#240
N-Coder merged 13 commits intosquashfrom
new-parser-impl

Conversation

@N-Coder
Copy link
Copy Markdown
Member

@N-Coder N-Coder commented Mar 29, 2020

This "conversion" process now has two levels and replaces the old Parser and Serializer classes, combining code revolving around the same type in one place:

  • The low-level part is in valuetype, where ics value strings are converted to simple python objects and back, based on some externally determined VALUE type and without using any context or inspecting parameters.
  • The high-level part is in converter. A GenericConverter can take multiple ContentLines or Containers and convert them into the respective values of a Component. For AttributeConverters this is done for a single attribute of a Component type (which must be an attrs class), using metadata of the attribute to determine further information, such as value type, is required or multi value, ics name, etc. AttributeValueConverters then use a valuetype converter to parse the simple attribute value. Other converters combine multiple ContentLines into a single attribute, e.g. a Timespan, Person, or rrule. The ComponentConverter, which is created from the Meta attribute set on any Component subclass, inspects all attributes of the class and calls the respective converters. All unknown parameters are now also collected in a dict.

This makes it a lot less work to add new attributes, as most conversion logic is generated automatically and without any redundant code. Additionally, this makes it easier to implement correct handling of VTIMEZONEs in all places, it will allow implementation of JSON-based ical handling, variable levels of parser strictness and might even allow further extension to similar formats such as vCard.

I also included some further refactorings:

  • The tools module with the broken online validation is gone (the website is offline).
  • As all Alarms now only span very few lines of code they have been merged into a single file.
  • ics.grammar.parse has been shortened to ics.grammar.
  • The inner Meta classes has been replaced by instances of an attrs class.
  • The Component conversion methods are now called from_container and to_container.
  • For ContentLine/Container there's now a serialize method to convert them to ics strings.

The most important change might be that all custom __str__ and __repr__ methods were removed. Previously, __str__ returned the ics string and __repr__ returned a very short informal description, which made dumping the actual python values hard when debugging. They now default to what attr generates and in general follow the standard that reprs "should look like a valid Python expression that could be used to recreate an object with the same value". This makes debugging easier and only generates the ics representation when this is intended, as this is (and always has been) a quite heavyweight process which might also fail -- and you usually don't want exceptions thrown when dumping any object as a string. I also plan to add convenience to_ics and from_ics methods for easier transition from the old behaviour. Similarly I plan to bring back the old repr behaviour for str, yielding a nice and short informal representation. To summarize, these methods should work as follows:

  • repr returns a full, valid python representation, is fast and can't throw exceptions
  • str returns a short human-readable description, is fast and can't throw exceptions
  • to_ics returns the full ics representation, is still pretty fast and usually shouldn't throw exceptions

Actually, the conversion might even be a little bit faster than before, as we now do all the introspection and finding of Converters once at import-time (which we might later move to doing it lazy at first use if loading the package takes to long) and they no longer require the ContentLines to be rearranged first, but simply process them as a stream.

This "conversion" process now has two levels:
The low-level part is in `valuetype`, where ics value strings are converted to
simple python objects and back, based on some externally determined VALUE type
and without using any context or inspecting parameters.
The high-level part is in `converter`. A `GenericConverter` can take multiple
ContentLines or Containers and convert them into the respective values of a
`Component`. For `AttributeConverter`s this is done for a single attribute of a
`Component` type (which must be an `attr`s class), using metadata of the
attribute to determine further information, such as value type, is required or
multi value, ics name, etc. `AttributeValueConverter`s then use a `valuetype`
converter to parse the simple attribute value. Other converters combine
multiple `ContentLine`s into a single attribute, e.g. a `Timespan`, `Person`,
or `rrule`. The `ComponentConverter`, which is created from the `Meta`
attribute set on any `Component` subclass, inspects all attributes of the class
and calls the respective converters. All unknown parameters are now also
collected in a dict.

This makes it a lot less work to add new attributes, as most conversion logic
can be generated automatically and without any redundant code. Additionally,
this makes it easier to implement correct handling of `VTIMEZONE`s in all
places and will allow implementation of JSON-based ical handling and variable
levels of parser strictness.

I also included some further refactorings: the `tools` module with the broken
online validation is gone (the website is offline), as all `Alarm`s now only
take a few lines they have been merged into a single file, `ics.grammar.parse`
has been shortened to `ics.grammar`, the inner `Meta` classes has been replaced
by instances of an `attr`s class, the `Component` conversion methods are now
called `from_container` and `to_container` and for `ContentLine`/`Container`
there's now a `serialize` method to convert them to ics strings. The most
important change might be that all custom `__str__` and `__repr__` methods were
removed. They now default to what `attr` generates and in general follow the
standard that `repr`s "should look like a valid Python expression that could be
used to recreate an object with the same value". This makes debugging easier
and allows us to implement `str` with a nice and short informal representation,
and only generate the ics representation when this is intended. Previously,
`__str__` returned the ics string and `__repr__` returned a very short informal
description, which made dumping the actual python values hard when debugging.
@N-Coder N-Coder mentioned this pull request Apr 11, 2020
This was referenced Apr 20, 2020
@N-Coder N-Coder mentioned this pull request May 16, 2020
21 tasks
@N-Coder N-Coder added this to the Version 0.8 milestone May 16, 2020
@N-Coder N-Coder changed the base branch from master to squash May 16, 2020 10:08
@N-Coder N-Coder merged commit 52ef1f1 into squash May 16, 2020
N-Coder added a commit that referenced this pull request May 16, 2020
For my current PR #240 I'm adding `hypothesis` as a test dependency to allow generation of random input strings to better test parsing and escaping logic (see e44e4b3). I also changed how we load the grammar file, to make the package zip safe again (see 3cafe53). While working on how these could be integrated into our testsuite, I found out that `setup.py test` is deprecated ([1](pytest-dev/pytest#5534) [2](https://tox.readthedocs.io/en/latest/example/basic.html#integration-with-setup-py-test-command)). The recommendation was to use `tox` to test the generated package directly using `pytest` instead of relying on `setup.py` to test the sources. Additionally, there's a designated successor to `setup.py` and `setup.cfg` (in which we probably accumulated a few outdated definitions), the build-system independent `pyproject.toml`. Additionally, there are newer tools like `pipenv`, `flit` and `poetry` to help with the whole build and publishing process. As I was not quite happy with how the whole development process of ics.py was set up, I wanted to give them a shot. Collecting some opinion around the internet, it seemed that `flit` was mostly targeted at very simple low-configuration projects and the development of `pipenv` somehow stagnated, while `poetry` seemed to be a well suited solution.

So this PR contains my attempt at migrating to `poetry`, with all ics.py sources moved according to the new recommended format.
It's mostly the config files in the root directory that changed, but I also removed the `dev` directory as it should no longer be needed. Next to all files from the `./ics/` directory remain unchanged and are simply moved to `./src/ics/`. I didn't copy the tests over yet, as I plan to rewrite most of them in my other branch.
The first of the two main configuration files is now `pyproject.toml`, where all meta-information on the project and how it can be installed (i.e. dependencies) and built are stored (without the need to actually execute the file and have some specific setuptools lying around). The second is `tox.ini`, where all testing related functionality is configured. A third file `.bumpversion.cfg` is from a small tool that helps with updating the version number in all places when doing a new release. The `poetry.lock` file optionally stores the dependency versions against which we want to develop, which is independent from the versions the library pulls in as dependency itself, where we are pretty liberal, and the versions we test against, which is always the latest releases. All library sourcecode now resides within a `src` folder, which is recommended as it prevents you from accidentally having the sources in your PATH when you want to test the final package.

The root directory now looks very clean and all those files have their specific purpose. If you want to configure how testing is done, you find all information in [`tox.ini`](https://github.com/N-Coder/ics.py-poetry/blob/master/tox.ini). If you want to run the tests (i.e. pytest, doctest, flake8, mypy and check that the docs build), simply run `tox` - you don't have to worry about which versions in which venvs are installed and whether you're directly testing against the sources or against a built package, tox handles all that for you. This not only comes in very handy when running the tests manually, but should also ensure that [CI](https://github.com/N-Coder/ics.py-poetry/blob/master/.github/workflows/pythonpackage.yml) does exactly the same. On a side note, we're now again publishing [coverage data](https://codecov.io/gh/N-Coder/ics.py-poetry).

If you just want to run the tests and don't need to fiddle around with the development version of ics in an interactive shell, that's all you need. For the fiddling part, just run [`poetry install`](https://python-poetry.org/docs/cli/#install) and you will have a turnkey development environment set up. Use `poetry shell` or `poetry run` to easily access the venv poetry set up for you. Publishing is now also very simple: just call `poetry publish --build` and the tool will take care of the rest. This made it very easy to make some releases on the [testing pypi instance](https://test.pypi.org/project/ics/#history).

The third and last tool you might want is `bumpversion`, if you are making new releases. But there is no need anymore to handle any venvs yourself or to install all ics.py dependencies globally. To summarize, if you want to hit the ground running and publish a new release on a newly set-up machine, the following should suffice:

```bash
git clone https://github.com/N-Coder/ics.py-poetry.git && cd ics.py-poetry
pip install tox poetry bumpversion --user
tox # make sure all the test run
bumpversion --verbose release # 0.8.0-dev -> 0.8.0 (release)
poetry build # build the package
tox --recreate # ensure that the version numbers are consistent
# check changelog and amend if necessary
git push && git push --tags
poetry publish # publish to pypi
bumpversion --verbose minor # 0.8.0 (release) -> 0.9.0-dev
git push && git push --tags
```

You can try that out if you want -- except for the publishing part maybe. Also note that `bumpversion` directly makes a commit with the new version if you don't pass `--no-commit` or `--dry-run`, but that's no problem as you can easily amend any changes you want to make, e.g. to the changelog.

The above information on developing, testing and publishing can now also be found in the docs (see CONTRIBUTING.rst). As these changes are partially based upon #240 but are also quite fundamental, I wanted to collect feedback first before including the changes into #240. The only other thing #240 is still lacking is more testing (only few files already have close to 100% coverage), and I'd prefer to provide that using `tox` in this new environment. So that's also some kind of cyclic dependency.

Sorry for the (now superfluous) issue I opened before. So @C4ptainCrunch (and maybe also @aureooms and @tomschr), what's your opinion on this?

* migrate repo structure to poetry

* fix src path for pytest

* add doc skeleton

* implement handling of attachments

* import project files

* set version

* fix sphinx build with poetry

* don't use poetry within tox

see python-poetry/poetry#1941 (comment)

* fix timezone tests

* change coveralls action

* try codecov

* bugfixes

* add bumpversion

* separate src inspection (flake8+mypy src/) from package testing (pytest tests/) to fix PATH problems

* bugfixes

* Merge branch 'master' into new-parser-impl

* remove old files

* add dev and publish instructions

* checker happiness

`noqa` and `type: ignore` are now only used for actual bugs in the checkers
unfortunately, current pyflakes dislikes `type: ignore[something]`, so we can't ignore specific mypy bugs until pyflakes 2.2 is in flakes8

* more checker happiness

* Apply suggestions from code review

Co-Authored-By: Tom Schraitle <tomschr@users.noreply.github.com>

* use gitignore directly from github instead of gitignore.io

* Apply suggestions from code review to tox.ini

* fix tox.ini

* add pypy support

Mostly by moving/splitting test dependencies to different sections in
tox.ini as mypy and pypy don't work well together and it is sufficient
to run mypy checks on CPython.

* update developing documentation

* fix non-ASCII whitespace handling

* update test/dev dependencies
N-Coder added a commit that referenced this pull request May 16, 2020
For my current PR #240 I'm adding `hypothesis` as a test dependency to allow generation of random input strings to better test parsing and escaping logic (see e44e4b3). I also changed how we load the grammar file, to make the package zip safe again (see 3cafe53). While working on how these could be integrated into our testsuite, I found out that `setup.py test` is deprecated ([1](pytest-dev/pytest#5534) [2](https://tox.readthedocs.io/en/latest/example/basic.html#integration-with-setup-py-test-command)). The recommendation was to use `tox` to test the generated package directly using `pytest` instead of relying on `setup.py` to test the sources. Additionally, there's a designated successor to `setup.py` and `setup.cfg` (in which we probably accumulated a few outdated definitions), the build-system independent `pyproject.toml`. Additionally, there are newer tools like `pipenv`, `flit` and `poetry` to help with the whole build and publishing process. As I was not quite happy with how the whole development process of ics.py was set up, I wanted to give them a shot. Collecting some opinion around the internet, it seemed that `flit` was mostly targeted at very simple low-configuration projects and the development of `pipenv` somehow stagnated, while `poetry` seemed to be a well suited solution.

So this PR contains my attempt at migrating to `poetry`, with all ics.py sources moved according to the new recommended format.
It's mostly the config files in the root directory that changed, but I also removed the `dev` directory as it should no longer be needed. Next to all files from the `./ics/` directory remain unchanged and are simply moved to `./src/ics/`. I didn't copy the tests over yet, as I plan to rewrite most of them in my other branch.
The first of the two main configuration files is now `pyproject.toml`, where all meta-information on the project and how it can be installed (i.e. dependencies) and built are stored (without the need to actually execute the file and have some specific setuptools lying around). The second is `tox.ini`, where all testing related functionality is configured. A third file `.bumpversion.cfg` is from a small tool that helps with updating the version number in all places when doing a new release. The `poetry.lock` file optionally stores the dependency versions against which we want to develop, which is independent from the versions the library pulls in as dependency itself, where we are pretty liberal, and the versions we test against, which is always the latest releases. All library sourcecode now resides within a `src` folder, which is recommended as it prevents you from accidentally having the sources in your PATH when you want to test the final package.

The root directory now looks very clean and all those files have their specific purpose. If you want to configure how testing is done, you find all information in [`tox.ini`](https://github.com/N-Coder/ics.py-poetry/blob/master/tox.ini). If you want to run the tests (i.e. pytest, doctest, flake8, mypy and check that the docs build), simply run `tox` - you don't have to worry about which versions in which venvs are installed and whether you're directly testing against the sources or against a built package, tox handles all that for you. This not only comes in very handy when running the tests manually, but should also ensure that [CI](https://github.com/N-Coder/ics.py-poetry/blob/master/.github/workflows/pythonpackage.yml) does exactly the same. On a side note, we're now again publishing [coverage data](https://codecov.io/gh/N-Coder/ics.py-poetry).

If you just want to run the tests and don't need to fiddle around with the development version of ics in an interactive shell, that's all you need. For the fiddling part, just run [`poetry install`](https://python-poetry.org/docs/cli/#install) and you will have a turnkey development environment set up. Use `poetry shell` or `poetry run` to easily access the venv poetry set up for you. Publishing is now also very simple: just call `poetry publish --build` and the tool will take care of the rest. This made it very easy to make some releases on the [testing pypi instance](https://test.pypi.org/project/ics/#history).

The third and last tool you might want is `bumpversion`, if you are making new releases. But there is no need anymore to handle any venvs yourself or to install all ics.py dependencies globally. To summarize, if you want to hit the ground running and publish a new release on a newly set-up machine, the following should suffice:

```bash
git clone https://github.com/N-Coder/ics.py-poetry.git && cd ics.py-poetry
pip install tox poetry bumpversion --user
tox # make sure all the test run
bumpversion --verbose release # 0.8.0-dev -> 0.8.0 (release)
poetry build # build the package
tox --recreate # ensure that the version numbers are consistent
# check changelog and amend if necessary
git push && git push --tags
poetry publish # publish to pypi
bumpversion --verbose minor # 0.8.0 (release) -> 0.9.0-dev
git push && git push --tags
```

You can try that out if you want -- except for the publishing part maybe. Also note that `bumpversion` directly makes a commit with the new version if you don't pass `--no-commit` or `--dry-run`, but that's no problem as you can easily amend any changes you want to make, e.g. to the changelog.

The above information on developing, testing and publishing can now also be found in the docs (see CONTRIBUTING.rst). As these changes are partially based upon #240 but are also quite fundamental, I wanted to collect feedback first before including the changes into #240. The only other thing #240 is still lacking is more testing (only few files already have close to 100% coverage), and I'd prefer to provide that using `tox` in this new environment. So that's also some kind of cyclic dependency.

Sorry for the (now superfluous) issue I opened before. So @C4ptainCrunch (and maybe also @aureooms and @tomschr), what's your opinion on this?

* migrate repo structure to poetry

* fix src path for pytest

* add doc skeleton

* implement handling of attachments

* import project files

* set version

* fix sphinx build with poetry

* don't use poetry within tox

see python-poetry/poetry#1941 (comment)

* fix timezone tests

* change coveralls action

* try codecov

* bugfixes

* add bumpversion

* separate src inspection (flake8+mypy src/) from package testing (pytest tests/) to fix PATH problems

* bugfixes

* Merge branch 'master' into new-parser-impl

* remove old files

* add dev and publish instructions

* checker happiness

`noqa` and `type: ignore` are now only used for actual bugs in the checkers
unfortunately, current pyflakes dislikes `type: ignore[something]`, so we can't ignore specific mypy bugs until pyflakes 2.2 is in flakes8

* more checker happiness

* Apply suggestions from code review

Co-Authored-By: Tom Schraitle <tomschr@users.noreply.github.com>

* use gitignore directly from github instead of gitignore.io

* Apply suggestions from code review to tox.ini

* fix tox.ini

* add pypy support

Mostly by moving/splitting test dependencies to different sections in
tox.ini as mypy and pypy don't work well together and it is sufficient
to run mypy checks on CPython.

* update developing documentation

* fix non-ASCII whitespace handling

* update test/dev dependencies
@N-Coder N-Coder deleted the new-parser-impl branch May 16, 2020 10:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant