Conversation
Codecov Report
@@            Coverage Diff             @@
##             main     #448      +/-   ##
==========================================
+ Coverage   84.50%   86.64%   +2.14%
==========================================
  Files         134      134
  Lines        6497     6516      +19
==========================================
+ Hits         5490     5646     +156
+ Misses       1007      870     -137
        f'attribute {attr} is not a valid multi modal attribute.'
        f' One possible cause is the usage of a non-supported type in the dataclass definition.'
    )
    return int(pos)
Not sure, to be honest; in the original implementation this was done (in a slightly different place), so I kept it.
    if attr not in self._metadata['multi_modal_schema']:
        raise ValueError(f'the Document schema does not contain attribute {attr}')

    pos = self._metadata['multi_modal_schema'][attr].get('position')
I wonder if we can have a list of all these specific KEYS and their meanings so that we don't have them scattered around the code.
Good point, but I would first have to collect them from all around the code base.
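A minimal sketch of what that could look like, assuming a single constants module: the two key names below appear in this diff, the rest would still have to be collected from the code base as noted. The helper `attribute_position` and the standalone `metadata` dict are illustrative, not part of the existing code:

```python
# Hypothetical constants module: gather the metadata key strings in one
# place instead of scattering string literals around the code base.

# key for the schema describing the multi-modal attributes of a Document
MULTI_MODAL_SCHEMA = 'multi_modal_schema'
# key for the chunk index at which an attribute is stored inside the Document
POSITION = 'position'


def attribute_position(metadata: dict, attr: str) -> int:
    """Look up the chunk position of `attr`, mirroring the diff above
    but going through the named constants."""
    if attr not in metadata[MULTI_MODAL_SCHEMA]:
        raise ValueError(f'the Document schema does not contain attribute {attr}')
    return int(metadata[MULTI_MODAL_SCHEMA][attr][POSITION])
```

With constants, a typo in a key name becomes a `NameError` at import time rather than a silent `KeyError` at runtime.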
Hello, my name is Nicholas. I'm new here, but for a small learning/automation side project I have been working on a JavaScript tool that loads any standard text file (a document, website, .csv, Excel export, etc.) and performs an array of regex swaps for simple string replacements. I built it to speed up creating custom eBay listings from an HTML template I made for selling a large number of collectable trading cards, each needing slightly different values, which was far too slow to do by hand.
I mention it because, reading your conversation, it occurs to me that I could adapt it: instead of doing regex swaps from per-item replacement arrays, a new version could first search all the source files and README/help documentation in the project (as long as they are plain-text encoded, it wouldn't matter whether they are .py, .js, or anything else), extract the comment blocks and documentation text, and count repeated non-dictionary words that occur more often than you would statistically expect unless they carry some symbolic meaning or stand in for an argument/variable.
It could then collect all such candidate "KEYS" you are discussing into a unified index for all contributors and future users, sorted from most frequent to least if that would also be helpful :).
Do you think a small program/script like that would actually be helpful? I know it's not a huge thing, but I'm just getting into ML and building basic AI agents, and I could certainly take on something like this. Once the list of candidate keys exists, it should of course be reviewed, approved, or pruned by the core developers and frequent contributors. In fact, I would be okay with simply creating the tool and handing the code over to your team with brief documentation and usage information. Let me know honestly whether it would help in a real way or not; I'd rather know it wouldn't help much than commit my limited coding/research time to something of negligible value for the sake of politeness. I don't get my feelings hurt, so I'll appreciate clear, genuine communication over overly polite language any day! Thanks either way. TheAssemboler, NickOrion21@gmail.com
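For reference, the scanning idea described in the comment above could be sketched roughly as follows. This is only an illustration of the approach, not an agreed design: the function name, the `.py`/`.js` filter, and the underscore-based "non-dictionary token" heuristic are all assumptions.

```python
import re
from collections import Counter
from pathlib import Path

# Naive comment extractor: Python '#' comments and JS '//' comments only.
COMMENT_RE = re.compile(r'#.*|//.*')
# Crude "non-dictionary word" heuristic: snake_case-like tokens, since
# ordinary prose rarely contains underscores.
TOKEN_RE = re.compile(r'[a-z]+_[a-z_]+')


def candidate_keys(root: str, min_count: int = 3) -> list[tuple[str, int]]:
    """Scan source files under `root`, pull line comments out of them, and
    count tokens that recur often enough to look like shared 'keys'
    rather than ordinary words, most frequent first."""
    counts: Counter[str] = Counter()
    for path in Path(root).rglob('*'):
        if path.suffix not in {'.py', '.js'}:
            continue
        text = path.read_text(encoding='utf-8', errors='ignore')
        for comment in COMMENT_RE.findall(text):
            counts.update(TOKEN_RE.findall(comment))
    return [(tok, n) for tok, n in counts.most_common() if n >= min_count]
```

Running this over a repository would surface strings such as `multi_modal_schema` from the diff above, provided they are mentioned in comments often enough to clear `min_count`.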
Previously, setting a multi-modal attribute did not set the chunks properly. As a result, only Document-level access worked, while DocumentArray-level access was broken.
This PR fixes that.
The previous tests were not comprehensive enough to catch this flaw; this PR adds tests to that effect.