Skip to content

Refactor import html#820

Merged
raineorshine merged 34 commits intocybersemics:devfrom
pushkov-fedor:refactor-import-html
Oct 5, 2020
Merged

Refactor import html#820
raineorshine merged 34 commits intocybersemics:devfrom
pushkov-fedor:refactor-import-html

Conversation

@pushkov-fedor
Copy link

Fixes #714

One test still fails and code needs to be cleaned.

I'm a little confused with this test and can't figure out what's the problem. Test, i'm talking about, is merge duplicate with new rank of existingThoughtMove.js. Tree for this test should look like this:

  • a
    • m
      • x
  • m
    • y

I've logged import html structure and it looks like this: <li>a<ul><li>m<ul><li>x</li></ul></li></ul></li><li>m<ul><li>y</li></ul></li>. If you paste it to em, the tree will be exactly as expected. I would like to clarify one moment: /m thought after importing has rank of 4. I thought that if we don't have any other thoughts, after pasting /a thought should have rank 0 and /m should have rank 1. Currently after importing /a has rank 1 and /m has rank 4. Especially interesting that /a/m has rank 2 and /a/m/x has rank 3. I left the rankStart logic exactly the same as in dev branch

const rankStart = getRankAfter(state, thoughtsRanked)
so rank 1 of /a is calculated by it. I modified code in order to instead of this

  • a (1)
    • m (2)
      • x (3)
  • m (4)
    • y (5)

get tree like this (we don't have any other thoughts)

  • a (1)
    • m (0)
      • x (0)
  • m (2)
    • y (0)

But these changes lead to failing a couple of other tests (which were passed before). In addition, in test we have this line

oldPath: [{ value: 'm', rank: 5 }],

As i understand, /m is expected to have rank 5. At this point i'm totally confused, and it seems to me that i don't understand something about rank logic. Could you please clarify this moment?

@raineorshine
Copy link
Contributor

I would like to clarify one moment: /m thought after importing has rank of 4. I thought that if we don't have any other thoughts, after pasting /a thought should have rank 0 and /m should have rank 1. Currently after importing /a has rank 1 and /m has rank 4. Especially interesting that /a/m has rank 2 and /a/m/x has rank 3. I left the rankStart logic exactly the same as in dev branch.

The rank is evaluated relative to its siblings, so the absolute value does not matter. Creating a new thought in an empty context (with newThought) happens to give the thought rank 0, but it could give it any value.

When there are siblings, the new thought will receive a rank that is half-way in between its surrounding siblings (or a simple integer decrement or increment for adding to the beginning or end of the context, respectively). This is done for efficiency; it allows new thoughts to be added at any position without having to modify the existing ranks.

importHTML simply autoincrements the ranks across contexts for efficiency. This results in thoughts within a context having ascending ranks. The start rank does not matter.

From https://github.com/cybersemics/em/wiki/Technical-Overview:

The first (chronological) thought in a context has rank: 0, but the absolute value does not matter. It is only used to compare with other ranks to determine sort order. Ranks can be negative. For example, creating a new thought before { value: 'A', rank: 0 } will yield { value: 'MyThought', rank: -1 }. Creating a new thought in between two thoughts will result in a rank that is numerically halfway between them (e.g. -0.5). The user may change the rank by dragging and dropping the thought.

Hope that helps!

But these changes lead to failing a couple of other tests (which were passed before).

Yes, I do not know the implications of changing the ranking logic in this way. I would recommend preserving it as-is unless you have a good reason to change it.

In addition, in test we have this line

oldPath: [{ value: 'm', rank: 5 }],

As i understand, /m is expected to have rank 5. At this point i'm totally confused, and it seems to me that i don't understand something about rank logic. Could you please clarify this moment?

In this case, the thoughts are imported in this order (shown with ranks):

0: /a
1: /a/m
2: /a/m/x
3: —
4: —
5: /m
6: /m/y

It appears that the rank is unintentionally incremented after popping out of each context. However as described above, this has no effect on the relative ranks within each context.

Hopefully this helps, but if you get stuck and want me to evaluate why merge duplicate with new rank is failing, just let me know!

@pushkov-fedor
Copy link
Author

Thanks you for detailed explanation! I've read Technical Overview and understand that absolute value of rank doesn't matter in general.
But i thought that for empty database creating this tree iterative by hand or by importing html should lead to the same rank result.

  • a (0)
    • m (0)
    • x (0)
  • m (1)
    • y (0)

It appears that the rank is unintentionally incremented after popping out of each context. However as described above, this has no effect on the relative ranks within each context.

This explains my problem! Let me know please what's the reason for this decision? Because in Technical Overview says that

The first (chronological) thought in a context has rank: 0, but the absolute value does not matter

Considering this, the first thought in a context (if context doesn't have any thoughts) should have rank 0. But after importing we get this:

0: /a
1: /a/m
2: /a/m/x
3: —
4: —
5: /m
6: /m/y

Should we set ranks this way after importing? Correct me please, if i'm wrong somewhere.

@raineorshine
Copy link
Contributor

But i thought that for empty database creating this tree iterative by hand or by importing html should lead to the same rank result.

No, that's not the case.

This explains my problem! Let me know please what's the reason for this decision?

It seems to be just an outcome of the current importHTML implementation. It is efficient to simply autoincrement across multiple contexts.

Technical Overview says that

The first (chronological) thought in a context has rank: 0, but the absolute value does not matter

I will update the Technical Overview! That is misleading since it does not apply to importHTML.

Should we set ranks this way after importing? Correct me please, if i'm wrong somewhere.

The first thought in a context does not need to have rank 0. I'm not going to introduce a constraint that is not needed. Relative ranks are perfectly functional and actually more efficient.

@pushkov-fedor pushkov-fedor force-pushed the refactor-import-html branch 2 times, most recently from 425ca52 to 11543b3 Compare August 7, 2020 12:31
@pushkov-fedor pushkov-fedor force-pushed the refactor-import-html branch 2 times, most recently from 0563658 to 06e5813 Compare August 21, 2020 12:26
Copy link
Contributor

@raineorshine raineorshine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Fedor, nice work. I appreciate the organization and readability of the code.

I have done an initial review of convertHTMLtoJSON. I haven't reviewed importJSON yet, but I figured I would send you what I have so far so you can address the questions and suggestions. Thanks!

return Object.assign({}, parent, { children: children.flat() }) as PreBlock
}
if (nodes.every(node => Array.isArray(node))) return nodes.flat()
if (nodes.some(node => Array.isArray(node))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the 4th if statement, the nodes have been iterated over up to 4 times. See if you can use a single loop to determine the appropriate return value, i.e. Refactor joinChildren to be O(n) instead of O(4n).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use for loop for this. It doesn't look elegant, but performance is better.

}

/** Append children to parent as children property if it's necessary. */
const joinChildren = (nodes: (PreBlock[] | PreBlock)[]): PreBlock[] | PreBlock => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does joinChildren take PreBlock[] | PreBlock instead of PreBlock[]? Same with the return value.

In general we want to avoid polymorphic behavior in function interfaces, i.e. accepting different types of inputs or returning different types of outputs. There are some exceptions in the codebase such as when functions handle either Path or context, but usually it is undesirable.

Copy link
Author

@pushkov-fedor pushkov-fedor Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I explain. E.g. such HTML:

<li>a<ul>
  <li>x</li>
  <li>y</li>
</ul></li>

converts by himalaya to such structure:

[
  {
    "type": "element",
    "tagName": "li",
    "attributes": [],
    "children": [
      {
        "type": "text",
        "content": "a"
      },
      {
        "type": "element",
        "tagName": "ul",
        "attributes": [],
        "children": [
          {
            "type": "element",
            "tagName": "li",
            "attributes": [],
            "children": [
              {
                "type": "text",
                "content": "x"
              }
            ]
          },
          {
            "type": "element",
            "tagName": "li",
            "attributes": [],
            "children": [
              {
                "type": "text",
                "content": "y"
              }
            ]
          }
        ]
      }
    ]
  }
]

In this JSON, thoughts a and ul are siblings. In convert function i iterate through tree and transform raw himalaya output to PreBlock.
For above output, after mapping and before joinChildren call, we have such data:

[
  {
    scope: 'a',
    children: null,
  },
  [
    {
      scope: 'x',
      children: null,
    },
    {
      scope: 'y',
      children: null
    }
  ]
]

So we have PreBlock (parent) and array of PreBlock (children). Array of PreBlock, which is placed next to PreBlock object in array, is identified as children of this object.
Another example:

[
  PreBlock (a), 
  [
    PreBlock (b), 
    [
      PreBlock, PreBlock, PreBlock
    ] (c), 
    PreBlock (d)
  ]
], 

results in

  • a
    • b
      • PreBlock, PreBlock, PreBlock (c)
    • d

Two or more PreBlocks (not array) together are identified as siblings.
E.g.
[PreBlock, PreBlock, PreBlock, [Preblock, PreBlock]] ->

- PreBlock
- PreBlock
- PreBlock
  - [PreBlock, PreBlock]

Copy link
Contributor

@raineorshine raineorshine Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the good explanation. That helps me understand the himalaya output and the conversion to the nested PreBlock structure.

I have some reservations about the PreBlock structure. In the himalaya output, the node type makes the difference between siblings and children explicit. In the PreBlock structure, this becomes implicit and is determined indirectly by Array.isArray. The notion of "children" is defined in two ways now: either explicitly in the children property or in the array representation. Sometimes separating different layers of functionality makes things easier to understand and more modular, but in this case I think it adds more complexity, due to the unusual data structure.

I will consider next steps and which one of us should undertake them, but feel free to let me if you have any thoughts.

Comment on lines +6 to +7
wasSpan?: boolean,
hasSpaces?: boolean,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain what PreBlock is for me, and the use of wasSpan and hasSpaces? They seem like very low-level, parsing details, so I'm curious why an intermediate structure must be created for them. Thanks!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have simplified it, now it uses only partOfThought. It needs for joining formatting tags.
Example:
For such HTML:
<b>one</b> and <i>two</i>
Himalaya output is

[
  {
    "type": "element",
    "tagName": "b",
    "attributes": [],
    "children": [
      {
        "type": "text",
        "content": "one"
      }
    ]
  },
  {
    "type": "text",
    "content": " and "
  },
  {
    "type": "element",
    "tagName": "i",
    "attributes": [],
    "children": [
      {
        "type": "text",
        "content": "two"
      }
    ]
  }
]

Result of conversion to PreBlock:

[
  {
    scope: "<b>one<b/>",
    children: null,
    partOfThought: true,
  },
  {
    scope: " and ",
    children: null,
  },
  {
    scope: "<i>two<i>",
    children: null,
    partOfThought: true,
  }
]

As i mentioned above, PreBlock elements (not array of PreBlock) are determined as siblings, but in current case we need to merge such elements together, that's why i use partOfThought

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thank you for the explanation. Maybe we can combine these in an earlier step to avoid the additional complexity of an intermediate step.

Copy link
Contributor

@raineorshine raineorshine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Fedor, I reviewed importJSON. Thanks for your work. It is well-organized and the control flow is clear. A few small recommendations, and then the issue of getRankOffset. Let me know if you have any questions. Thanks!

Comment on lines +28 to +35
if (node.type === 'element') {
if (node.tagName === 'br') return false
return true
}
if (node.type === 'comment') return false
// remove text nodes containing only empty space and new line characters, such nodes could be created by himalaya in case of multi-line html input
if (node.type === 'text' && isStringIncludesOnly(node.content, ['\n', ' '])) return false
return node.content.length
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please re-write the filter predicate as a single (nested) boolean expression. You can break the boolean onto separate lines to make it easier to read. The current use of if statements and explicit boolean returns does not fit the existing style of the project.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
}
else return convert(node.children as (Element | Text)[])
}).filter(node => node !== undefined) as (PreBlock | PreBlock[])[]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use an explicit return null in the appropriate place within the mapping function instead of the implicit undefined. Functions should never return undefined in my opinion unless they are a void function (i.e. always return undefined).

@raineorshine
Copy link
Contributor

Regarding the use of PreBlock, I would like to try to only use Block and remove partOfThought by joining the children earlier in the pipeline. This will reduce the complexity of the intermediary structure. I could be wrong, but I think this is doable. Is this something you can think you can do?

@pushkov-fedor
Copy link
Author

Regarding the use of PreBlock, I would like to try to only use Block and remove partOfThought by joining the children earlier in the pipeline. This will reduce the complexity of the intermediary structure. I could be wrong, but I think this is doable. Is this something you can think you can do?

Okay, give me some time for coming up with updated algorithm for this and i will tell you my thoughts. If it's okay, i will implement it.

@pushkov-fedor
Copy link
Author

@raineorshine Raine, hi! I have changed ranking algorithm, but tests, based on previous ranking, fail. I changed some of them, but i'm not sure how to implement it in correct way.

@raineorshine
Copy link
Contributor

You'll want to fix the linter errors. That is preventing the tests from running in the GitHub Action, so I am unable to see which tests are failing right here on the PR.

I have changed ranking algorithm

So much better!

I changed some of them, but i'm not sure how to implement it in correct way.

You can do this.The tests are not changing in any fundamental way. Go through the failing tests one at a time. Where is it failing? What is causing it? Is it asserting a rank based on the previous algorithm? If so, change the assertion so it asserts the expected new rank. The tests were all passing before the ranking algorithm change, so we can assume that if the test is failing, it must be solely due to the rank change.

The easiest approach will be to test exact ranks. A more flexible approach would only test the minimum requirement: that ranks of siblings are unique and in the correct order. I'm okay with either approach at this point.

If you get stuck, link to the test that you are unsure about, explain the problem, and I will offer assistance.

@pushkov-fedor
Copy link
Author

Raine, hi! I've fixed all failing tests in ff0e2b6, but i believe it's not the best way to do it. I don't like that ranks is hardcoded.
In 5c09bba i tried to replace hardcoded ranks in setCursor with setCursorFirstMatch. It works for cursorPrev, but doesn't work for others. Maybe the reason for this behavior is shortcuts execution in these tests. What do you think?
Lint check is passed so i hope you can see them.

Copy link
Contributor

@raineorshine raineorshine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raine, hi! I've fixed all failing tests in ff0e2b6, but i believe it's not the best way to do it. I don't like that ranks is hardcoded.

Good work!

For the purpose of testing, I actually think the hardcoded ranks are good. Most of those functions end up changing the rank (e.g. existingThoughtMove) and we want to make sure it changes exactly as expected. Having it tightly coupled to importText in this small way seems like a reasonable tradeoff to me. The alternative would be to determine the existing rank, then calculate the expected relative rank for the assertion, which I think would make the tests harder to read and more error prone without much benefit.

In 5c09bba i tried to replace hardcoded ranks in setCursor with setCursorFirstMatch. It works for cursorPrev, but doesn't work for others.

See comment:

])
- b`))

setCursorFirstMatch(['b'])(store.getState())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line won't do anything, since you're not using the return value. setCursorFirstMatch is a reducer, so it returns the new state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pushkov-fedor
Copy link
Author

Okay, thanks, in this case i can revert 5c09bba.

@raineorshine
Copy link
Contributor

Okay, thanks, in this case i can revert 5c09bba.

I'm pretty sure you will still need to set the cursor in some cases. You just need to do it so it actually changes the state.

return Object.assign({}, parent, { children: children.flat() }) as PreBlock
}
if (nodes.every(node => Array.isArray(node))) return nodes.flat()
if (nodes.some(node => Array.isArray(node))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +30 to +33
interface RankInfo {
rank: number,
deepLevel: number,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer used

Comment on lines +72 to +73
/** Calculate rankIncrement value based on rank of next sibling or its absence. */
const getRankIncrement = (state: State, blocks: Block[], context: Context, destThought: Child, rankStart: number) => {
const destValue = destThought.value
const destRank = destThought.rank
const next = nextSibling(state, destValue, context, destRank) // paste after last child of current thought
const rankIncrement = next ? (next.rank - rankStart) / (getContextsNum(blocks) || 1) : 1 // prevent divide by zero
return rankIncrement
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may be able to get rid of getRankIncrement and use an increment of 1 since the ranks are relative to their context. The only exception is the starting thought which may be inserted as a sibling.

Let's wait until this PR is merged and we address #832 first though. We can refactor in a separate PR.

@raineorshine
Copy link
Contributor

Hi Fedor. The new ranking calculations look good.

There were some comments that were unaddressed. I tagged you on them. I would appreciate more thoroughness in responding to and addressing all requests for changes. You should generally address all comments, and if one has not been addressed, let me know explicitly that you're still working on it. If you're requesting a partial review, let me know that, otherwise I assume it's supposed to be a full review and waste time tagging you on unaddressed comments which I'm not sure if you're aware of or not.

Comments that GitHub hides automatically when the file changes are the easiest to lose track of.

Regarding the use of PreBlock, I would like to try to only use Block and remove partOfThought by joining the children earlier in the pipeline. This will reduce the complexity of the intermediary structure. I could be wrong, but I think this is doable. Is this something you can think you can do?

Okay, give me some time for coming up with updated algorithm for this and i will tell you my thoughts. If it's okay, i will implement it.

What's your status on factoring out PreBlock? I think it's the last big piece.

Thanks :)

@pushkov-fedor
Copy link
Author

pushkov-fedor commented Sep 14, 2020

Hi Fedor. The new ranking calculations look good.

There were some comments that were unaddressed. I tagged you on them. I would appreciate more thoroughness in responding to and addressing all requests for changes. You should generally address all comments, and if one has not been addressed, let me know explicitly that you're still working on it. If you're requesting a partial review, let me know that, otherwise I assume it's supposed to be a full review and waste time tagging you on unaddressed comments which I'm not sure if you're aware of or not.

Comments that GitHub hides automatically when the file changes are the easiest to lose track of.

Regarding the use of PreBlock, I would like to try to only use Block and remove partOfThought by joining the children earlier in the pipeline. This will reduce the complexity of the intermediary structure. I could be wrong, but I think this is doable. Is this something you can think you can do?

Okay, give me some time for coming up with updated algorithm for this and i will tell you my thoughts. If it's okay, i will implement it.

What's your status on factoring out PreBlock? I think it's the last big piece.

Thanks :)

Yes, i understand. Sorry for such unclearness in review asking. Next time i will directly tell you what part of PR is waiting for review.

About PreBlock:
We have such line:
const preBlocks = convert(removeEmptyNodesAndComments(nodes)). Here i clear himalaya output from empty nodes, and then pass it to convert function. I think we can write additional function, which would be called before convert execution. In this function i will iterate through clear nodes tree and convert formatting tags or group of formatting tags into single 'text' node. Then formatted tree would be passed to convert function, where we will simply convert it to JSON. What do you think?
It would look something like this:
const preBlocks = convert(handleFormattingTags(removeEmptyNodesAndComments(nodes)))

@raineorshine raineorshine mentioned this pull request Sep 15, 2020
@raineorshine
Copy link
Contributor

About PreBlock:
We have such line:
const preBlocks = convert(removeEmptyNodesAndComments(nodes)). Here i clear himalaya output from empty nodes, and then pass it to convert function. I think we can write additional function, which would be called before convert execution. In this function i will iterate through clear nodes tree and convert formatting tags or group of formatting tags into single 'text' node. Then formatted tree would be passed to convert function, where we will simply convert it to JSON. What do you think?
It would look something like this:
const preBlocks = convert(handleFormattingTags(removeEmptyNodesAndComments(nodes)))

We want to avoid anything that adds additional iterations. Use the iterations that are already there. More importantly, we want to isolate the formatting tag complexity to a single place and eliminate the PreBlock intermediary structure.

The main issue is within convert. It relies on the intermediary PreBlock structure which adds additional complexity that is not needed. Think about it: You are doing work to generate the PreBlocks, then doing work to turn them into Blocks. Why not make Blocks to begin with? The idea is to convert formatting tags or groups of formatting tags into a proper Block the first time they are encountered—at convertFormattingTags. Rather than generating PreBlocks, consolidate the nodes into Blocks right there.

With the new implementation, the logic for converting formatting tags into regular Blocks is still there, but it is limited to a function call at a single point within the himalaya conversion. When you convert it to PreBlock early and convert back to Block later, then everything in between has to "know" about PreBlock when it really doesn't have to. PreBlock spreads that one piece of logic around throughout many more functions. It would be better to isolate it to as small a space as possible. Does that make sense?

This is an important principle for good architecture, so I hope I have explained it well! When I learned this concept my programming skills really leveled up. Your job as a programmer is not to make things work; that's only the most basic step. Your job as a programming is to manage complexity.

@pushkov-fedor
Copy link
Author

Thank you for explanation! I really appreciate it.

In my opinion, convertFormattingTags in current implementation isn't correct place to do it. It works with single Element (i,b,span). In some cases we need to merge formatting tags in a single element. e.g. a<span>fter</span>word. So we need information not only about current element, but about its siblings too. Then after merging the whole group of elements should be replaced with single element. In current implementation in convert function we retrieve information from himalaya nodes, converts nodes to JSON, based on their type and tags, and then reorganize Block tree in joinChildren. Original himalaya tree structure differs from desired Block structure.

Maybe it would be better if we handle and reorganize himalaya tree to our desired structure (it's what joinChildren function does now) before convert call, and on this step we handle formatting tags and merge them if it's necessary. And after that we simply iterate through reorganized tree and replace himalaya nodes with Block (what convert function does now). It sounds more logical.

@pushkov-fedor
Copy link
Author

Raine, hi! I have encountered some problems with Workflowy and its html structure.
Screenshot from 2020-09-30 18-13-25

It slightly differs from previous html structure in test, because it has br tag, which separate a parent tag from its children (all tags after br). Also we have ul tag after Note span.
Usually ul tag determines children of its previous sibling.
For example:
Screenshot from 2020-09-30 18-13-55

But it doesn't work the same way with Workflowy, so i should also handle this situation. Now everything works.
Also i simplified joinChildren logic and removed additional loops.
But we need 2 loops at the beginning of convert function in order to determine whether nodes include formatting tags (which should be joined) or br tag.

@raineorshine
Copy link
Contributor

i simplified joinChildren logic and removed additional loops.

Great, thank you.

But we need 2 loops at the beginning of convert function in order to determine whether nodes include formatting tags (which should be joined) or br tag.

This is acceptable since it is contained within the convert function and only applies to the current level, as opposed to a full traversal.

But it doesn't work the same way with Workflowy, so i should also handle this situation. Now everything works.

That's a good catch, but we need to make sure it works for other uses of <br>. <br> tags will not always be accompanied by a <ul>. We should insert multiple thoughts for text separated by <br> tags. I added the missing tests for this case. It would be great if you could update the code to get the tests passing.

I did some widespread refactoring in 4fe2c4b. I recommend looking closely at this commit for areas of improvement. Some of the main issues:

  • Overuse of explicit type casts
  • Simplify logic in multiple places
  • Move helper functions outside convertHTMLtoJSON function scope, since they are encapsulated by the module scope already.
  • Rename functions that start with convert since it is doesn't add any meaning (i.e. all functions "convert" something to something else; that's essentially the definition of a function).
  • Consolidate return statements (especially in the convert function)

@pushkov-fedor
Copy link
Author

@raineorshine Thanks for review and explanation! I really appreciate it. I have fixed new tests.
I would like to discuss my solution. Now we have 2 different conversion for br tag.
The first one is Workflowy:
Screenshot from 2020-10-05 15-26-22
br here is used for saving all tags after br as children.
The second one is tests that you've added.
Screenshot from 2020-10-05 15-27-51
br here is used as simply new line indicator.

Syntactically, they are the same. I decided to use small workaround for Workflowy. If the next element after br is workflowy span, i convert all nodes after br as children. In the another case, i filter all br, because they're unnecessary. What do you think about it?

@raineorshine
Copy link
Contributor

Syntactically, they are the same. I decided to use small workaround for Workflowy. If the next element after br is workflowy span, i convert all nodes after br as children. In the another case, i filter all br, because they're unnecessary. What do you think about it?

Excellent! That totally solves it.

Thanks so much for all your hard work. This was a challenging task and I appreciate you sticking it out. Hopefully you learned some useful things in the process.

@raineorshine raineorshine merged commit 4ffdfb2 into cybersemics:dev Oct 5, 2020
anmolarora1 pushed a commit to anmolarora1/em that referenced this pull request Mar 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor importHTML

2 participants