Conversation
The When there are siblings, the new thought will receive a rank that is half-way in between its surrounding siblings (or a simple integer decrement or increment for adding to the beginning or end of the context, respectively). This is done for efficiency; it allows new thoughts to be added at any position without having to modify the existing ranks.
From https://github.com/cybersemics/em/wiki/Technical-Overview:
Hope that helps!
Yes, I do not know the implications of changing the ranking logic in this way. I would recommend preserving it as-is unless you have a good reason to change it.
In this case, the thoughts are imported in this order (shown with ranks): 0: It appears that the rank is unintentionally incremented after popping out of each context. However as described above, this has no effect on the relative ranks within each context. Hopefully this helps, but if you get stuck and want me to evaluate why |
|
Thanks you for detailed explanation! I've read Technical Overview and understand that absolute value of rank doesn't matter in general.
This explains my problem! Let me know please what's the reason for this decision? Because in Technical Overview says that
Considering this, the first thought in a context (if context doesn't have any thoughts) should have rank 0. But after importing we get this: 0: /a Should we set ranks this way after importing? Correct me please, if i'm wrong somewhere. |
No, that's not the case.
It seems to be just an outcome of the current
I will update the Technical Overview! That is misleading since it does not apply to
The first thought in a context does not need to have rank 0. I'm not going to introduce a constraint that is not needed. Relative ranks are perfectly functional and actually more efficient. |
425ca52 to
11543b3
Compare
0563658 to
06e5813
Compare
raineorshine
left a comment
There was a problem hiding this comment.
Hey Fedor, nice work. I appreciate the organization and readability of the code.
I have done an initial review of convertHTMLtoJSON. I haven't reviewed importJSON yet, but I figured I would send you what I have so far so you can address the questions and suggestions. Thanks!
src/util/convertHTMLtoJSON.ts
Outdated
| return Object.assign({}, parent, { children: children.flat() }) as PreBlock | ||
| } | ||
| if (nodes.every(node => Array.isArray(node))) return nodes.flat() | ||
| if (nodes.some(node => Array.isArray(node))) { |
There was a problem hiding this comment.
By the 4th if statement, the nodes have been iterated over up to 4 times. See if you can use a single loop to determine the appropriate return value, i.e. Refactor joinChildren to be O(n) instead of O(4n).
There was a problem hiding this comment.
We can use for loop for this. It doesn't look elegant, but performance is better.
src/util/convertHTMLtoJSON.ts
Outdated
| } | ||
|
|
||
| /** Append children to parent as children property if it's necessary. */ | ||
| const joinChildren = (nodes: (PreBlock[] | PreBlock)[]): PreBlock[] | PreBlock => { |
There was a problem hiding this comment.
Why does joinChildren take PreBlock[] | PreBlock instead of PreBlock[]? Same with the return value.
In general we want to avoid polymorphic behavior in function interfaces, i.e. accepting different types of inputs or returning different types of outputs. There are some exceptions in the codebase such as when functions handle either Path or context, but usually it is undesirable.
There was a problem hiding this comment.
I explain. E.g. such HTML:
<li>a<ul>
<li>x</li>
<li>y</li>
</ul></li>
converts by himalaya to such structure:
[
{
"type": "element",
"tagName": "li",
"attributes": [],
"children": [
{
"type": "text",
"content": "a"
},
{
"type": "element",
"tagName": "ul",
"attributes": [],
"children": [
{
"type": "element",
"tagName": "li",
"attributes": [],
"children": [
{
"type": "text",
"content": "x"
}
]
},
{
"type": "element",
"tagName": "li",
"attributes": [],
"children": [
{
"type": "text",
"content": "y"
}
]
}
]
}
]
}
]
In this JSON, thoughts a and ul are siblings. In convert function i iterate through tree and transform raw himalaya output to PreBlock.
For above output, after mapping and before joinChildren call, we have such data:
[
{
scope: 'a',
children: null,
},
[
{
scope: 'x',
children: null,
},
{
scope: 'y',
children: null
}
]
]
So we have PreBlock (parent) and array of PreBlock (children). Array of PreBlock, which is placed next to PreBlock object in array, is identified as children of this object.
Another example:
[
PreBlock (a),
[
PreBlock (b),
[
PreBlock, PreBlock, PreBlock
] (c),
PreBlock (d)
]
],
results in
- a
- b
- PreBlock, PreBlock, PreBlock (c)
- d
- b
Two or more PreBlocks (not array) together are identified as siblings.
E.g.
[PreBlock, PreBlock, PreBlock, [Preblock, PreBlock]] ->
- PreBlock
- PreBlock
- PreBlock
- [PreBlock, PreBlock]
There was a problem hiding this comment.
Thank you for the good explanation. That helps me understand the himalaya output and the conversion to the nested PreBlock structure.
I have some reservations about the PreBlock structure. In the himalaya output, the node type makes the difference between siblings and children explicit. In the PreBlock structure, this becomes implicit and is determined indirectly by Array.isArray. The notion of "children" is defined in two ways now: either explicitly in the children property or in the array representation. Sometimes separating different layers of functionality makes things easier to understand and more modular, but in this case I think it adds more complexity, due to the unusual data structure.
I will consider next steps and which one of us should undertake them, but feel free to let me if you have any thoughts.
src/util/convertHTMLtoJSON.ts
Outdated
| wasSpan?: boolean, | ||
| hasSpaces?: boolean, |
There was a problem hiding this comment.
Could you explain what PreBlock is for me, and the use of wasSpan and hasSpaces? They seem like very low-level, parsing details, so I'm curious why an intermediate structure must be created for them. Thanks!
There was a problem hiding this comment.
I have simplified it, now it uses only partOfThought. It needs for joining formatting tags.
Example:
For such HTML:
<b>one</b> and <i>two</i>
Himalaya output is
[
{
"type": "element",
"tagName": "b",
"attributes": [],
"children": [
{
"type": "text",
"content": "one"
}
]
},
{
"type": "text",
"content": " and "
},
{
"type": "element",
"tagName": "i",
"attributes": [],
"children": [
{
"type": "text",
"content": "two"
}
]
}
]
Result of conversion to PreBlock:
[
{
scope: "<b>one<b/>",
children: null,
partOfThought: true,
},
{
scope: " and ",
children: null,
},
{
scope: "<i>two<i>",
children: null,
partOfThought: true,
}
]
As i mentioned above, PreBlock elements (not array of PreBlock) are determined as siblings, but in current case we need to merge such elements together, that's why i use partOfThought
There was a problem hiding this comment.
I see. Thank you for the explanation. Maybe we can combine these in an earlier step to avoid the additional complexity of an intermediate step.
06e5813 to
a841875
Compare
raineorshine
left a comment
There was a problem hiding this comment.
Hey Fedor, I reviewed importJSON. Thanks for your work. It is well-organized and the control flow is clear. A few small recommendations, and then the issue of getRankOffset. Let me know if you have any questions. Thanks!
a841875 to
6b0bd33
Compare
src/util/convertHTMLtoJSON.ts
Outdated
| if (node.type === 'element') { | ||
| if (node.tagName === 'br') return false | ||
| return true | ||
| } | ||
| if (node.type === 'comment') return false | ||
| // remove text nodes containing only empty space and new line characters, such nodes could be created by himalaya in case of multi-line html input | ||
| if (node.type === 'text' && isStringIncludesOnly(node.content, ['\n', ' '])) return false | ||
| return node.content.length |
There was a problem hiding this comment.
Please re-write the filter predicate as a single (nested) boolean expression. You can break the boolean onto separate lines to make it easier to read. The current use of if statements and explicit boolean returns does not fit the existing style of the project.
src/util/convertHTMLtoJSON.ts
Outdated
| } | ||
| } | ||
| else return convert(node.children as (Element | Text)[]) | ||
| }).filter(node => node !== undefined) as (PreBlock | PreBlock[])[] |
There was a problem hiding this comment.
Let's use an explicit return null in the appropriate place within the mapping function instead of the implicit undefined. Functions should never return undefined in my opinion unless they are a void function (i.e. always return undefined).
|
Regarding the use of |
Okay, give me some time for coming up with updated algorithm for this and i will tell you my thoughts. If it's okay, i will implement it. |
|
@raineorshine Raine, hi! I have changed ranking algorithm, but tests, based on previous ranking, fail. I changed some of them, but i'm not sure how to implement it in correct way. |
|
You'll want to fix the linter errors. That is preventing the tests from running in the GitHub Action, so I am unable to see which tests are failing right here on the PR.
So much better!
You can do this.The tests are not changing in any fundamental way. Go through the failing tests one at a time. Where is it failing? What is causing it? Is it asserting a rank based on the previous algorithm? If so, change the assertion so it asserts the expected new rank. The tests were all passing before the ranking algorithm change, so we can assume that if the test is failing, it must be solely due to the rank change. The easiest approach will be to test exact ranks. A more flexible approach would only test the minimum requirement: that ranks of siblings are unique and in the correct order. I'm okay with either approach at this point. If you get stuck, link to the test that you are unsure about, explain the problem, and I will offer assistance. |
a645f74 to
ff0e2b6
Compare
|
Raine, hi! I've fixed all failing tests in ff0e2b6, but i believe it's not the best way to do it. I don't like that ranks is hardcoded. |
There was a problem hiding this comment.
Raine, hi! I've fixed all failing tests in ff0e2b6, but i believe it's not the best way to do it. I don't like that ranks is hardcoded.
Good work!
For the purpose of testing, I actually think the hardcoded ranks are good. Most of those functions end up changing the rank (e.g. existingThoughtMove) and we want to make sure it changes exactly as expected. Having it tightly coupled to importText in this small way seems like a reasonable tradeoff to me. The alternative would be to determine the existing rank, then calculate the expected relative rank for the assertion, which I think would make the tests harder to read and more error prone without much benefit.
In 5c09bba i tried to replace hardcoded ranks in
setCursorwithsetCursorFirstMatch. It works forcursorPrev, but doesn't work for others.
See comment:
| ]) | ||
| - b`)) | ||
|
|
||
| setCursorFirstMatch(['b'])(store.getState()) |
There was a problem hiding this comment.
This line won't do anything, since you're not using the return value. setCursorFirstMatch is a reducer, so it returns the new state.
|
Okay, thanks, in this case i can revert 5c09bba. |
I'm pretty sure you will still need to set the cursor in some cases. You just need to do it so it actually changes the state. |
5c09bba to
a3c8bef
Compare
src/util/convertHTMLtoJSON.ts
Outdated
| return Object.assign({}, parent, { children: children.flat() }) as PreBlock | ||
| } | ||
| if (nodes.every(node => Array.isArray(node))) return nodes.flat() | ||
| if (nodes.some(node => Array.isArray(node))) { |
src/util/importJSON.ts
Outdated
| interface RankInfo { | ||
| rank: number, | ||
| deepLevel: number, | ||
| } |
| /** Calculate rankIncrement value based on rank of next sibling or its absence. */ | ||
| const getRankIncrement = (state: State, blocks: Block[], context: Context, destThought: Child, rankStart: number) => { | ||
| const destValue = destThought.value | ||
| const destRank = destThought.rank | ||
| const next = nextSibling(state, destValue, context, destRank) // paste after last child of current thought | ||
| const rankIncrement = next ? (next.rank - rankStart) / (getContextsNum(blocks) || 1) : 1 // prevent divide by zero | ||
| return rankIncrement | ||
| } |
There was a problem hiding this comment.
We may be able to get rid of getRankIncrement and use an increment of 1 since the ranks are relative to their context. The only exception is the starting thought which may be inserted as a sibling.
Let's wait until this PR is merged and we address #832 first though. We can refactor in a separate PR.
|
Hi Fedor. The new ranking calculations look good. There were some comments that were unaddressed. I tagged you on them. I would appreciate more thoroughness in responding to and addressing all requests for changes. You should generally address all comments, and if one has not been addressed, let me know explicitly that you're still working on it. If you're requesting a partial review, let me know that, otherwise I assume it's supposed to be a full review and waste time tagging you on unaddressed comments which I'm not sure if you're aware of or not. Comments that GitHub hides automatically when the file changes are the easiest to lose track of.
What's your status on factoring out Thanks :) |
c90b501 to
c0fc12b
Compare
Yes, i understand. Sorry for such unclearness in review asking. Next time i will directly tell you what part of PR is waiting for review. About PreBlock: |
We want to avoid anything that adds additional iterations. Use the iterations that are already there. More importantly, we want to isolate the formatting tag complexity to a single place and eliminate the The main issue is within With the new implementation, the logic for converting formatting tags into regular This is an important principle for good architecture, so I hope I have explained it well! When I learned this concept my programming skills really leveled up. Your job as a programmer is not to make things work; that's only the most basic step. Your job as a programming is to manage complexity. |
|
Thank you for explanation! I really appreciate it. In my opinion, Maybe it would be better if we handle and reorganize himalaya tree to our desired structure (it's what |
Also fix convertHTMLtoJSON
f7974a0 to
2816cb7
Compare
Great, thank you.
This is acceptable since it is contained within the
That's a good catch, but we need to make sure it works for other uses of I did some widespread refactoring in 4fe2c4b. I recommend looking closely at this commit for areas of improvement. Some of the main issues:
|
a8c546d to
7b3f971
Compare
7b3f971 to
ef2e794
Compare
|
@raineorshine Thanks for review and explanation! I really appreciate it. I have fixed new tests. Syntactically, they are the same. I decided to use small workaround for Workflowy. If the next element after |
...to ensure separate thoughts are created even when there are no line breaks in the HTML.
Excellent! That totally solves it. Thanks so much for all your hard work. This was a challenging task and I appreciate you sticking it out. Hopefully you learned some useful things in the process. |




Fixes #714
One test still fails and code needs to be cleaned.
I'm a little confused with this test and can't figure out what's the problem. Test, i'm talking about, is
merge duplicate with new rankof existingThoughtMove.js. Tree for this test should look like this:I've logged import html structure and it looks like this:
<li>a<ul><li>m<ul><li>x</li></ul></li></ul></li><li>m<ul><li>y</li></ul></li>. If you paste it to em, the tree will be exactly as expected. I would like to clarify one moment:/mthought after importing has rank of 4. I thought that if we don't have any other thoughts, after pasting/athought should have rank 0 and/mshould have rank 1. Currently after importing/ahas rank 1 and/mhas rank 4. Especially interesting that/a/mhas rank 2 and/a/m/xhas rank 3. I left the rankStart logic exactly the same as in dev branchem/src/util/importHTML.ts
Line 66 in 07b1155
/ais calculated by it. I modified code in order to instead of thisget tree like this (we don't have any other thoughts)
But these changes lead to failing a couple of other tests (which were passed before). In addition, in test we have this line
em/src/reducers/__tests__/existingThoughtMove.js
Line 297 in 07b1155
As i understand,
/mis expected to have rank 5. At this point i'm totally confused, and it seems to me that i don't understand something about rank logic. Could you please clarify this moment?