codespell: add codespell config, workflow and some typos fixed #2470

Closed
yarikoptic wants to merge 8 commits into zenodo:master from yarikoptic:enh-codespell

@yarikoptic

The workflow would detect new typos if they are introduced. No magical fixing would happen behind your back.

Thanks for Zenodo!

"id": "cc-by-sa"
},
"description": "<p><strong>Project Specification</strong></p>\n\n<p>The aim of this openlab summer student project is to enhance ZENODO digital repository service with several preservation-oriented features, such as preservation meter and badge to indicate the&nbsp;suitability of a document for long-term reservation. The project will be developed in the Python&nbsp;programming language, using Flask/HTML5/jQuery/TwitterBootstrap technologies for the user&nbsp;interface and SQLAlchemy/MySQL for persistence.</p>\n\n<p><strong>Abstract</strong></p>\n\n<p>Digital Preservation consists mainly in storing digital information, mostly digital-born content,&nbsp;and making sure that it remains available and accessible in the future. This tasks has many&nbsp;challenges such as making sure that the files are in a known and acessible format, that they are&nbsp;not corrupt, lost or unretrievable.&nbsp;The digital preservation challenges apply, noticeably, on digital repositories such as Zenodo.&nbsp;Zenodo aims to provide a secure and trusty way of storing data for the long tail of science. This is&nbsp;to say, storing and connecting information that is normally not available on the main publications,&nbsp;such as the used datasets for a given study or the produced software for a specific paper.&nbsp;The goal of this work was to develop a Preservation Meter that allowed the users to know how&nbsp;suitable the files on their submitted records are in terms of preservation.This was accomplished by using a simple and intuitive visual representation of such suitability by&nbsp;means of a progress bar, where a completely filled bar means the file is very likely to be well&nbsp;preserved.&nbsp;The overall goals of the project were completed and the implementation of this work was&nbsp;integrated on the Zenodo repository as a plugin.</p>",
"description": "<p><strong>Project Specification</strong></p>\n\n<p>The aim of this openlab summer student project is to enhance ZENODO digital repository service with several preservation-oriented features, such as preservation meter and badge to indicate the&nbsp;suitability of a document for long-term reservation. The project will be developed in the Python&nbsp;programming language, using Flask/HTML5/jQuery/TwitterBootstrap technologies for the user&nbsp;interface and SQLAlchemy/MySQL for persistence.</p>\n\n<p><strong>Abstract</strong></p>\n\n<p>Digital Preservation consists mainly in storing digital information, mostly digital-born content,&nbsp;and making sure that it remains available and accessible in the future. This tasks has many&nbsp;challenges such as making sure that the files are in a known and accessible format, that they are&nbsp;not corrupt, lost or unretrievable.&nbsp;The digital preservation challenges apply, noticeably, on digital repositories such as Zenodo.&nbsp;Zenodo aims to provide a secure and trusty way of storing data for the long tail of science. This is&nbsp;to say, storing and connecting information that is normally not available on the main publications,&nbsp;such as the used datasets for a given study or the produced software for a specific paper.&nbsp;The goal of this work was to develop a Preservation Meter that allowed the users to know how&nbsp;suitable the files on their submitted records are in terms of preservation.This was accomplished by using a simple and intuitive visual representation of such suitability by&nbsp;means of a progress bar, where a completely filled bar means the file is very likely to be well&nbsp;preserved.&nbsp;The overall goals of the project were completed and the implementation of this work was&nbsp;integrated on the Zenodo repository as a plugin.</p>",
Author

here the typo was acessible

"id": "cc-by-sa"
},
"description": "<p>&nbsp;&nbsp;Abstract</p>\n\n<p>WinCC OA is a SCADA (Supervisory Control and Data Acquisition) system tool that is used to develop the Control System applications. As most of the control systems used in CERN are developed in WinCC OA, it is better useful to understand how the applications developed by EN/ICE are actually used by the different operators .It becomes more and more important to monitor users&rsquo; behavior and analyzing it. The final goal of this project is to develop a generic WinCC OA component to collect data about user interaction which will take advantage of (1) the internal mechanisms already present in WinCC OA to monitor some user interactions such as the internal UI data points; and (2) the commonalities of applications through the use of the standard frameworks JCOP, UNICOS and CPC. The final component developed provides the capability of storing as well as displaying user interaction data on a single timeline.</p>\n\n<p>Kewords: WinCC OA, SCADA, JCOP, UNICOS, CPC</p>",
"description": "<p>&nbsp;&nbsp;Abstract</p>\n\n<p>WinCC OA is a SCADA (Supervisory Control and Data Acquisition) system tool that is used to develop the Control System applications. As most of the control systems used in CERN are developed in WinCC OA, it is better useful to understand how the applications developed by EN/ICE are actually used by the different operators .It becomes more and more important to monitor users&rsquo; behavior and analyzing it. The final goal of this project is to develop a generic WinCC OA component to collect data about user interaction which will take advantage of (1) the internal mechanisms already present in WinCC OA to monitor some user interactions such as the internal UI data points; and (2) the commonalities of applications through the use of the standard frameworks JCOP, UNICOS and CPC. The final component developed provides the capability of storing as well as displaying user interaction data on a single timeline.</p>\n\n<p>Keywords: WinCC OA, SCADA, JCOP, UNICOS, CPC</p>",
Author

here the typo was Kewords:

"id": "cc-by-sa"
},
"description": "<p><strong>Project Specification</strong></p>\n\n<p>The project concerns various C++11 features - their performance and reliability. The report summarizes the tesults from four micro-benchmarks designed for this project and run with three different compilers (GCC, ICC, Clang) and tries to make an evaluation based on the results.</p>\n\n<p><strong>Abstract</strong></p>\n\n<p>As C++11 gained almost full support by compilers, it is interesting to see whether we can leverage some of the features to improve performance and reliability of C++ code. This work is focused on four selected problems: time measurement techniques, for-loops efficiency, asynchronous tasks and parallel mode of STL algorithms. For each of them a micro-benchmark is made. All the benchmarks are fully automatized to generate results from running binaries compiled by three compilers: GCC, ICC and Clang with -O2, -O3 and -Ofast options. In order to evaluate vectorization and multithreading, profiling tools such as perf and Intel Vtune are used.</p>",
"description": "<p><strong>Project Specification</strong></p>\n\n<p>The project concerns various C++11 features - their performance and reliability. The report summarizes the tesults from four micro-benchmarks designed for this project and run with three different compilers (GCC, ICC, Clang) and tries to make an evaluation based on the results.</p>\n\n<p><strong>Abstract</strong></p>\n\n<p>As C++11 gained almost full support by compilers, it is interesting to see whether we can leverage some of the features to improve performance and reliability of C++ code. This work is focused on four selected problems: time measurement techniques, for-loops efficiency, asynchronous tasks and parallel mode of STL algorithms. For each of them a micro-benchmark is made. All the benchmarks are fully automated to generate results from running binaries compiled by three compilers: GCC, ICC and Clang with -O2, -O3 and -Ofast options. In order to evaluate vectorization and multithreading, profiling tools such as perf and Intel Vtune are used.</p>",
Author

here automatized was replaced with automated... I guess it could be added to the skips if really desired
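
To illustrate the "skips" idea in the comment above, here is a minimal sketch of a dictionary-based typo check with an ignore list. It is not codespell's implementation: the `TYPOS` table, the `find_typos` function, and its `ignore` parameter are hypothetical names made up for this example, and the table entries simply mirror the corrections discussed in this thread. codespell itself offers comparable behaviour through its `--ignore-words-list` option.

```python
import re

# Hypothetical typo table for illustration only; the entries mirror the
# corrections discussed in this thread, not codespell's dictionary files.
TYPOS = {
    "acessible": "accessible",
    "kewords": "keywords",
    "automatized": "automated",
    "offsers": "offers",
}


def find_typos(text, ignore=()):
    """Return (line_number, word, suggestion) for each known typo in text.

    Words listed in `ignore` are skipped, compared case-insensitively.
    """
    ignored = {word.lower() for word in ignore}
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[A-Za-z]+", line):
            key = word.lower()
            if key in TYPOS and key not in ignored:
                hits.append((lineno, word, TYPOS[key]))
    return hits


sample = "All the benchmarks are fully automatized.\nKewords: WinCC OA"

# Without an ignore list both words are reported.
print(find_typos(sample))
# With "automatized" ignored, only the second line is reported.
print(find_typos(sample, ignore=["automatized"]))
```

Under these assumptions, putting a word on the ignore list keeps the checker from flagging it, so the fixture text could stay as originally written while other typos are still reported.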

Comment thread zenodo/modules/fixtures/data/funders.json Outdated
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w -i 3 -C 2",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
@musvaage
Contributor

musvaage commented Nov 22, 2023

here automatized was replaced with automated... I guess it could be added to the skips if really desired

zenodo/modules/fixtures/data/records.json

looks like @lnielsen in #475 initially submitted and merged that file

possibly the author's intention in that file was offers rather than offsers

Seagate Kinetic offsers ethernet enabled disk drives

yarikoptic added a commit to yarikoptic/codespell that referenced this pull request Nov 22, 2023
A sharp eye of the @musvaage spotted it in
zenodo/zenodo#2470 .
And indeed github search gives 460 hits ATM
https://github.com/search?q=offsers&type=code
and they look like legit typos.

Typo is really close to offser->offset one, but upon
quick github search I never found offser to be a typo
for offer. I think that spurious "s" in offsers is mechanical,
and thus I did not bother adding second variant for "offser" typo
to be offer
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
@yarikoptic
Author

possibly the author's intention in that file was offers rather than offsers

nice catch!

DimitriPapadopoulos pushed a commit to codespell-project/codespell that referenced this pull request Nov 23, 2023
@musvaage
Contributor

musvaage commented Dec 16, 2023

cf: #2476 (comment)

Modifying description fields in records.json may seem innocuous.
...
@mitsosf, @ppanero

In retrospect, if any of the description fields modified in #2423 (Closed) should be reverted, this can be pursued.

More specifically developer input on the purpose of modifying the description fields should be forthcoming!

Indeed, is there any 'public exposure' of such zenodo hosted "Abstract"(s) aside from those appearing on the specified URLs?

If not, there would appear to be NO reason to modify those description fields.

I am happy to submit a PR reverting specific changes implemented in #2423 (Closed).

$ grep -nr loaddemorecords zenodo
zenodo/zenodo/modules/fixtures/cli.py:46:from .records import loaddemorecords, loadsipmetadatatypes
zenodo/zenodo/modules/fixtures/cli.py:115:@fixtures.command('loaddemorecords')
zenodo/zenodo/modules/fixtures/cli.py:120:def loaddemorecords_cli(records_file=None, owner=None):
zenodo/zenodo/modules/fixtures/cli.py:129:        loaddemorecords(records, owner)
zenodo/zenodo/modules/fixtures/records.py:42:def loaddemorecords(records, owner):
$ 
$ ed -s zenodo/zenodo/modules/fixtures/cli.py <<<'116,117p'
@click.option('--records-file', type=click.File(),
              default=join(dirname(__file__), 'data/records.json'))
$ 

My sympathies at this point are to scrap this PR.

A new PR fixing only the typos exposed here in the .html file and the .py files could be made instead.

That approach would obviously not modify tests/unit/deposit/test_api_metadata.py.

Adding codespell content to this repo seems to me to be over the top.

@yarikoptic
Author

My sympathies at this point are to scrap this PR.

I'm really not sure what exactly the motivation to scrap this PR is, since it fixes "legit" typos in a number of files/locations. If you would like to keep records.json unmodified, I can skip it, just tell me so.

@musvaage
Contributor

motivation to scrap this PR

cf: #2476 (comment)

I'd have approached the subject matter of #2470 differently.

  1. an Issue soliciting Maintainer opinion on adding a workflow to detect new typos if they are introduced
  2. a separate Pull Request fixing the spell-checker identified and other typos

FWIW, the majority of the commits to this repo appear to be 2 to 8 years old.

To elaborate on my previous comment on this PR:

Adding codespell content to this repo seems to me to be over the top.

IMHO, the proposed workflow isn't justified owing to the frequency of new textual content being added to this repo.

@yarikoptic
Author

Adding codespell content to this repo seems to me to be over the top.

IMHO, the proposed workflow isn't justified owing to the frequency of new textual content being added to this repo.

oh well, I guess we have different opinions about such automations - I tend to automate anything which could save humans some time at the cost of some negligible compute (in particular the free one). Feel welcome to scrap this PR then.

@musvaage
Contributor

@lnielsen

I am happy to submit a PR reverting specific changes implemented in #2423 (Closed).

As a Member would you voice an opinion on whether changes to records.json in #2423 (Closed) should be reverted?

Factually I believe those changes to records.json should be reverted.


Separately, the Maintainer/Member consensus on whether or not to add the OP's proposed workflow could be clarified.

slint mentioned this pull request Feb 20, 2024
@slint
Member

slint commented Feb 20, 2024

Unfortunately, I will have to close this PR to prevent unnecessary work from being done in this repository for the time being. See my detailed response at #2519 (comment)

slint closed this Feb 20, 2024