[WIP] new implementation of parsing and serialization by N-Coder · Pull Request #240 · ics-py/ics-py

N-Coder · 2020-03-29T11:54:55Z

This "conversion" process now has two levels and replaces the old Parser and Serializer classes, combining code revolving around the same type in one place:

The low-level part is in valuetype, where ics value strings are converted to simple python objects and back, based on some externally determined VALUE type and without using any context or inspecting parameters.
The high-level part is in converter. A GenericConverter can take multiple ContentLines or Containers and convert them into the respective values of a Component. An AttributeConverter does this for a single attribute of a Component type (which must be an attrs class), using the attribute's metadata to determine further information, such as the value type, whether the attribute is required or multi-valued, its ics name, etc. AttributeValueConverters then use a valuetype converter to parse the simple attribute value. Other converters combine multiple ContentLines into a single attribute, e.g. a Timespan, Person, or rrule. The ComponentConverter, which is created from the Meta attribute set on any Component subclass, inspects all attributes of the class and calls the respective converters. All unknown parameters are now also collected in a dict.
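To make the two levels more concrete, here is a minimal sketch of the idea. It is not the code from this PR: it uses stdlib `dataclasses` as a stand-in for attrs, and `Line`, `VALUE_TYPES`, `ics_field`, `from_lines` and `to_lines` are hypothetical names chosen for illustration. Unknown lines are simply kept in a list here, and parameters are ignored.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime
from typing import Any, Callable, Dict, List, Tuple

# Low level: value-type converters map an ics value string to a plain Python
# object and back, without looking at parameters or context.
# (Hypothetical registry, not the actual valuetype module.)
VALUE_TYPES: Dict[str, Tuple[Callable[[str], Any], Callable[[Any], str]]] = {
    "TEXT": (str, str),
    "INTEGER": (int, str),
    "DATE-TIME": (
        lambda s: datetime.strptime(s, "%Y%m%dT%H%M%S"),
        lambda d: d.strftime("%Y%m%dT%H%M%S"),
    ),
}


@dataclass
class Line:
    """Simplified stand-in for a ContentLine (name and value only)."""
    name: str
    value: str


def ics_field(ics_name: str, value_type: str, default: Any = None):
    """Attach the conversion metadata to an attribute."""
    return field(default=default,
                 metadata={"ics_name": ics_name, "value_type": value_type})


@dataclass
class Event:
    summary: str = ics_field("SUMMARY", "TEXT")
    priority: int = ics_field("PRIORITY", "INTEGER")
    begin: datetime = ics_field("DTSTART", "DATE-TIME")
    extra: List[Line] = field(default_factory=list)  # unknown lines


def from_lines(cls, lines: List[Line]):
    """High level: use attribute metadata to pick the value-type converter."""
    by_name = {f.metadata["ics_name"]: f
               for f in fields(cls) if "ics_name" in f.metadata}
    obj = cls()
    for line in lines:
        f = by_name.get(line.name)
        if f is None:
            obj.extra.append(line)  # keep what we do not understand
            continue
        parse, _ = VALUE_TYPES[f.metadata["value_type"]]
        setattr(obj, f.name, parse(line.value))
    return obj


def to_lines(obj) -> List[Line]:
    """Inverse direction: serialize every set attribute, then the extras."""
    out = []
    for f in fields(obj):
        value = getattr(obj, f.name)
        if "ics_name" not in f.metadata or value is None:
            continue
        _, serialize = VALUE_TYPES[f.metadata["value_type"]]
        out.append(Line(f.metadata["ics_name"], serialize(value)))
    return out + list(obj.extra)
```

In this sketch, adding a new attribute only needs one more `ics_field(...)` declaration, which is the kind of saving described above.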

This makes it a lot less work to add new attributes, as most conversion logic is generated automatically and without any redundant code. Additionally, this makes it easier to implement correct handling of VTIMEZONEs in all places. It will also allow implementation of JSON-based ical handling and variable levels of parser strictness, and might even allow further extension to similar formats such as vCard.

I also included some further refactorings:

The tools module with the broken online validation is gone (the website is offline).
As all Alarms now only span very few lines of code, they have been merged into a single file.
ics.grammar.parse has been shortened to ics.grammar.
The inner Meta classes have been replaced by instances of an attrs class.
The Component conversion methods are now called from_container and to_container.
For ContentLine/Container there's now a serialize method to convert them to ics strings.

The most important change might be that all custom __str__ and __repr__ methods were removed. Previously, __str__ returned the ics string and __repr__ returned a very short informal description, which made dumping the actual python values hard when debugging. They now default to what attr generates and in general follow the standard that reprs "should look like a valid Python expression that could be used to recreate an object with the same value". This makes debugging easier and only generates the ics representation when this is intended, as this is (and always has been) a quite heavyweight process which might also fail -- and you usually don't want exceptions thrown when dumping any object as a string. I also plan to add convenience to_ics and from_ics methods for easier transition from the old behaviour. Similarly I plan to bring back the old repr behaviour for str, yielding a nice and short informal representation. To summarize, these methods should work as follows:

repr returns a full, valid python representation, is fast and can't throw exceptions
str returns a short human-readable description, is fast and can't throw exceptions
to_ics returns the full ics representation, is still pretty fast and usually shouldn't throw exceptions
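The contract above can be illustrated with a small sketch. It uses a stdlib `dataclass` as a stand-in for the attrs-generated `__repr__`, and the `Event` class, its fields and the body of `to_ics` are assumptions made for illustration only (in particular, `to_ics` is still only planned at this point and the sketch omits text escaping and line folding).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    summary: str
    begin: datetime

    # __repr__ is generated by the decorator: a full, eval-able expression.

    def __str__(self) -> str:
        # Short informal description; cheap and never raises.
        return f"<Event '{self.summary}' begin: {self.begin.isoformat()}>"

    def to_ics(self) -> str:
        # Explicit, heavier serialization; only runs when asked for.
        return "\r\n".join([
            "BEGIN:VEVENT",
            f"SUMMARY:{self.summary}",
            f"DTSTART:{self.begin.strftime('%Y%m%dT%H%M%S')}",
            "END:VEVENT",
        ])


event = Event("Review", datetime(2020, 3, 29, 11, 54, 55))
print(repr(event))
print(str(event))
print(event.to_ics())
```

Keeping the ics output behind an explicit method means that logging or printing an object during debugging cannot trigger a failing serialization.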

Actually, the conversion might even be a little bit faster than before, as we now do all the introspection and finding of Converters once at import-time (which we might later change to happen lazily at first use if loading the package takes too long). The Converters also no longer require the ContentLines to be rearranged first, but simply process them as a stream.
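As a rough sketch of why this can be cheaper: the lookup table is built once when the class is created, and parsing is then a single pass over the incoming lines. The names `Component.ICS_FIELDS`, `_converters`, `from_stream` and `read_lines` are hypothetical and not part of ics.py.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple


class Component:
    # ics name -> (attribute name, parser); filled once per subclass.
    _converters: Dict[str, Tuple[str, Callable[[str], object]]] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Introspection happens here, at class creation (i.e. import time),
        # not on every parse. ICS_FIELDS is a hypothetical declaration.
        cls._converters = {
            ics_name: (attr, parser)
            for attr, (ics_name, parser) in cls.ICS_FIELDS.items()
        }

    @classmethod
    def from_stream(cls, lines: Iterable[Tuple[str, str]]):
        obj = cls()
        for name, value in lines:  # single pass, no regrouping or sorting
            attr, parser = cls._converters[name]  # KeyError on unknown names
            setattr(obj, attr, parser(value))
        return obj


class Todo(Component):
    ICS_FIELDS = {
        "summary": ("SUMMARY", str),
        "percent": ("PERCENT-COMPLETE", int),
    }


def read_lines(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, value) pairs lazily; parameters are ignored here."""
    for raw in text.splitlines():
        name, _, value = raw.partition(":")
        yield name, value


todo = Todo.from_stream(read_lines("SUMMARY:Write tests\nPERCENT-COMPLETE:40"))
```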

This "conversion" process now has two levels: The low-level part is in `valuetype`, where ics value strings are converted to simple python objects and back, based on some externally determined VALUE type and without using any context or inspecting parameters. The high-level part is in `converter`. A `GenericConverter` can take multiple ContentLines or Containers and convert them into the respective values of a `Component`. An `AttributeConverter` does this for a single attribute of a `Component` type (which must be an `attr`s class), using the attribute's metadata to determine further information, such as the value type, whether the attribute is required or multi-valued, its ics name, etc. `AttributeValueConverter`s then use a `valuetype` converter to parse the simple attribute value. Other converters combine multiple `ContentLine`s into a single attribute, e.g. a `Timespan`, `Person`, or `rrule`. The `ComponentConverter`, which is created from the `Meta` attribute set on any `Component` subclass, inspects all attributes of the class and calls the respective converters. All unknown parameters are now also collected in a dict. This makes it a lot less work to add new attributes, as most conversion logic can be generated automatically and without any redundant code. Additionally, this makes it easier to implement correct handling of `VTIMEZONE`s in all places and will allow implementation of JSON-based ical handling and variable levels of parser strictness. I also included some further refactorings: the `tools` module with the broken online validation is gone (the website is offline); as all `Alarm`s now only take a few lines, they have been merged into a single file; `ics.grammar.parse` has been shortened to `ics.grammar`; the inner `Meta` classes have been replaced by instances of an `attr`s class; the `Component` conversion methods are now called `from_container` and `to_container`; and for `ContentLine`/`Container` there's now a `serialize` method to convert them to ics strings.
The most important change might be that all custom `__str__` and `__repr__` methods were removed. Previously, `__str__` returned the ics string and `__repr__` returned a very short informal description, which made dumping the actual python values hard when debugging. They now default to what `attr` generates and in general follow the standard that `repr`s "should look like a valid Python expression that could be used to recreate an object with the same value". This makes debugging easier and allows us to implement `str` with a nice and short informal representation, and only generate the ics representation when this is intended.

ValueConverters are now allowed to modify optional params and context, i.e. consume params and store context when parsing, and add params when serializing. Also move ExtraParams to types, use NewType instead of a direct alias to catch invalid dict usage, ensure that they are copied using deep-copy (they might contain lists), add EmptyDict as argument default, and fix timespan context clean-up.
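A minimal sketch of the ideas in this commit message, under stated assumptions: the names `ExtraParams`, `EmptyParams` and `EmptyDictType` appear in the commit titles of this PR, but their definitions below, the `parse_value` function and the choice of `TZID` as the consumed parameter are invented for illustration.

```python
import copy
import warnings
from typing import Dict, List, NewType, Optional, Tuple

# NewType makes a type checker reject a plain dict where ExtraParams is
# expected, which a simple alias would not do.
ExtraParams = NewType("ExtraParams", Dict[str, List[str]])


class EmptyDictType(dict):
    """Shared empty default that warns instead of being silently mutated.

    Only __setitem__ is guarded in this sketch.
    """

    def __setitem__(self, key, value):
        warnings.warn("attempted to modify the shared empty params default")


EmptyParams = ExtraParams(EmptyDictType())


def parse_value(
    value: str, params: ExtraParams = EmptyParams
) -> Tuple[str, Optional[List[str]], ExtraParams]:
    # Deep copy, because the dict values are lists that would otherwise be
    # shared between the caller's params and the leftover params.
    remaining = copy.deepcopy(dict(params))
    tzid = remaining.pop("TZID", None)  # "consume" a known parameter
    return value, tzid, ExtraParams(remaining)


params = ExtraParams({"TZID": ["Europe/Berlin"], "X-FOO": ["a", "b"]})
value, tzid, rest = parse_value("20200329T115455", params)
```

Using a guarded shared default avoids the usual mutable-default-argument pitfall while still letting callers omit the parameter.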

@C4ptainCrunch

For my current PR #240 I'm adding `hypothesis` as a test dependency to allow generation of random input strings to better test parsing and escaping logic (see e44e4b3). I also changed how we load the grammar file, to make the package zip safe again (see 3cafe53). While working on how these could be integrated into our test suite, I found out that `setup.py test` is deprecated ([1](pytest-dev/pytest#5534) [2](https://tox.readthedocs.io/en/latest/example/basic.html#integration-with-setup-py-test-command)). The recommendation was to use `tox` to test the generated package directly using `pytest` instead of relying on `setup.py` to test the sources. Additionally, there's a designated successor to `setup.py` and `setup.cfg` (in which we probably accumulated a few outdated definitions), the build-system independent `pyproject.toml`. There are also newer tools like `pipenv`, `flit` and `poetry` to help with the whole build and publishing process.

As I was not quite happy with how the whole development process of ics.py was set up, I wanted to give them a shot. Collecting some opinions around the internet, it seemed that `flit` was mostly targeted at very simple low-configuration projects and the development of `pipenv` had somehow stagnated, while `poetry` seemed to be a well-suited solution. So this PR contains my attempt at migrating to `poetry`, with all ics.py sources moved according to the new recommended format. It's mostly the config files in the root directory that changed, but I also removed the `dev` directory as it should no longer be needed. Nearly all files from the `./ics/` directory remain unchanged and are simply moved to `./src/ics/`. I didn't copy the tests over yet, as I plan to rewrite most of them in my other branch.

The first of the two main configuration files is now `pyproject.toml`, where all meta-information on the project and how it can be installed (i.e. dependencies) and built is stored (without the need to actually execute the file and have some specific setuptools lying around). The second is `tox.ini`, where all testing-related functionality is configured. A third file, `.bumpversion.cfg`, is from a small tool that helps with updating the version number in all places when doing a new release. The `poetry.lock` file optionally stores the dependency versions against which we want to develop. These are independent of the versions the library pulls in as dependencies itself, where we are pretty liberal, and of the versions we test against, which are always the latest releases. All library source code now resides within a `src` folder, which is recommended as it prevents you from accidentally having the sources in your PATH when you want to test the final package. The root directory now looks very clean and all those files have their specific purpose.

If you want to configure how testing is done, you'll find all information in [`tox.ini`](https://github.com/N-Coder/ics.py-poetry/blob/master/tox.ini). If you want to run the tests (i.e. pytest, doctest, flake8, mypy and check that the docs build), simply run `tox` - you don't have to worry about which versions in which venvs are installed and whether you're directly testing against the sources or against a built package, tox handles all that for you. This not only comes in very handy when running the tests manually, but should also ensure that [CI](https://github.com/N-Coder/ics.py-poetry/blob/master/.github/workflows/pythonpackage.yml) does exactly the same. On a side note, we're now again publishing [coverage data](https://codecov.io/gh/N-Coder/ics.py-poetry).

If you just want to run the tests and don't need to fiddle around with the development version of ics in an interactive shell, that's all you need. For the fiddling part, just run [`poetry install`](https://python-poetry.org/docs/cli/#install) and you will have a turnkey development environment set up. Use `poetry shell` or `poetry run` to easily access the venv poetry set up for you. Publishing is now also very simple: just call `poetry publish --build` and the tool will take care of the rest. This made it very easy to make some releases on the [testing pypi instance](https://test.pypi.org/project/ics/#history). The third and last tool you might want is `bumpversion`, if you are making new releases. But there is no need anymore to handle any venvs yourself or to install all ics.py dependencies globally.

To summarize, if you want to hit the ground running and publish a new release on a newly set-up machine, the following should suffice:

```bash
git clone https://github.com/N-Coder/ics.py-poetry.git && cd ics.py-poetry
pip install tox poetry bumpversion --user
tox # make sure all the tests run
bumpversion --verbose release # 0.8.0-dev -> 0.8.0 (release)
poetry build # build the package
tox --recreate # ensure that the version numbers are consistent
# check changelog and amend if necessary
git push && git push --tags
poetry publish # publish to pypi
bumpversion --verbose minor # 0.8.0 (release) -> 0.9.0-dev
git push && git push --tags
```

You can try that out if you want -- except for the publishing part maybe. Also note that `bumpversion` directly makes a commit with the new version if you don't pass `--no-commit` or `--dry-run`, but that's no problem as you can easily amend any changes you want to make, e.g. to the changelog. The above information on developing, testing and publishing can now also be found in the docs (see CONTRIBUTING.rst).

As these changes are partially based upon #240 but are also quite fundamental, I wanted to collect feedback first before including the changes into #240. The only other thing #240 is still lacking is more testing (only a few files already have close to 100% coverage), and I'd prefer to provide that using `tox` in this new environment. So that's also some kind of cyclic dependency. Sorry for the (now superfluous) issue I opened before.

So @C4ptainCrunch (and maybe also @aureooms and @tomschr), what's your opinion on this?

* migrate repo structure to poetry
* fix src path for pytest
* add doc skeleton
* implement handling of attachments
* import project files
* set version
* fix sphinx build with poetry
* don't use poetry within tox, see python-poetry/poetry#1941 (comment)
* fix timezone tests
* change coveralls action
* try codecov
* bugfixes
* add bumpversion
* separate src inspection (flake8+mypy src/) from package testing (pytest tests/) to fix PATH problems
* bugfixes
* Merge branch 'master' into new-parser-impl
* remove old files
* add dev and publish instructions
* checker happiness
  `noqa` and `type: ignore` are now only used for actual bugs in the checkers. Unfortunately, current pyflakes dislikes `type: ignore[something]`, so we can't ignore specific mypy bugs until pyflakes 2.2 is in flake8.
* more checker happiness
* Apply suggestions from code review
  Co-Authored-By: Tom Schraitle <tomschr@users.noreply.github.com>
* use gitignore directly from github instead of gitignore.io
* Apply suggestions from code review to tox.ini
* fix tox.ini
* add pypy support
  Mostly by moving/splitting test dependencies to different sections in tox.ini, as mypy and pypy don't work well together and it is sufficient to run mypy checks on CPython.
* update developing documentation
* fix non-ASCII whitespace handling
* update test/dev dependencies


N-Coder force-pushed the new-parser-impl branch from 3a81c30 to f6544d9 Compare March 29, 2020 12:03

N-Coder added 12 commits March 29, 2020 17:50

fixed Timespan end/due extra param handling and further small issues

8466e3a

ensure that event modifications times are in UTC

b05ce7d

rename Event.name to the RFC compliant summary

eae493d

make mypy happy with ExtraParams and EmptyParams and their defaults

537823d

warn when a modification of EmptyDictType is attempted

827088e

bring back __str__ and fix doctests

f846543

make zip safe using importlib_resources

3cafe53

improved handling of escaped strings, testing

e44e4b3

fix handling of quoted params

c79fd0b

implement handling of attachments

f608e0c

bugfixes

b9e6ab9

N-Coder mentioned this pull request Apr 11, 2020

New build system #241

Closed

N-Coder force-pushed the new-parser-impl branch from 6673720 to b9e6ab9 Compare April 11, 2020 19:53

N-Coder mentioned this pull request Apr 12, 2020

docs(AUTHORS.rst): Polish contents. #242

Merged

This was referenced Apr 20, 2020

New build system #243

Merged

Improve the documentation of ics #220

Open

N-Coder mentioned this pull request May 16, 2020

Roadmap for v0.8 #245

Open

21 tasks

N-Coder added this to the Version 0.8 milestone May 16, 2020

N-Coder changed the base branch from master to squash May 16, 2020 10:08

N-Coder merged commit 52ef1f1 into squash May 16, 2020

N-Coder mentioned this pull request May 16, 2020

squash and merge #240 and #243 #246

Merged

N-Coder deleted the new-parser-impl branch May 16, 2020 10:58

N-Coder mentioned this pull request Oct 14, 2020

Fix documentation quickstart example #262

Closed

N-Coder mentioned this pull request Jun 1, 2022

add warnings about breaking changes to Calendar.__str__ and __iter__ in v0.7(.1) #318

Merged

N-Coder merged 13 commits into squash from new-parser-impl
