feat: fullwidth characters support #460

@weii41392

Description

Currently the parser recognizes halfwidth opening and closing parentheses and excludes a trailing closing parenthesis from a link when appropriate, but it does not do the same for fullwidth characters. See this example:

import { tokenize } from "linkifyjs";

const links = [
    "http://foo.com/blah_blah",
    "http://foo.com/blah_blah_(wikipedia)_(again)"
];

const texts = [
    `${links[0]} ${links[1]}`,
    `Link 1(${links[0]}) Link 2(${links[1]})`,      // halfwidth parentheses
    `Link 1（${links[0]}） Link 2（${links[1]}）`,   // fullwidth parentheses
];

for (const text of texts) {
    const tokens = tokenize(text);
    tokens.filter(token => token.isLink).forEach((token) => console.log(`"${token.v}"`));
}

// texts[0]: succeed without parentheses
// "http://foo.com/blah_blah"
// "http://foo.com/blah_blah_(wikipedia)_(again)"

// texts[1]: succeed with halfwidth parentheses
// "http://foo.com/blah_blah"
// "http://foo.com/blah_blah_(wikipedia)_(again)"

// texts[2]: fail to handle fullwidth parentheses
// "http://foo.com/blah_blah）"
// "http://foo.com/blah_blah_(wikipedia)_(again)）"

My proposal is to define fullwidth characters as tokens and add the corresponding behavior to the parser.
The logic should be fairly simple, since fullwidth brackets are semantically the same as their halfwidth counterparts.
(In our use case we care most about fullwidth parentheses （）, but in general this can apply to other fullwidth characters, e.g. 「」『』＜＞.)
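
To make the intended behavior concrete, here is a minimal standalone sketch of the balancing rule, assuming a fullwidth closer is treated exactly like its halfwidth counterpart. `BRACKET_PAIRS`, `CLOSERS`, `countChar` and `trimUnbalancedClosers` are hypothetical names for illustration only. They are not part of linkifyjs, and this is not how its scanner or parser is implemented.

```javascript
// Hypothetical helper, NOT part of linkifyjs: opening/closing bracket pairs,
// covering the halfwidth and fullwidth forms mentioned in this issue.
const BRACKET_PAIRS = new Map([
    ["(", ")"],
    ["（", "）"], // U+FF08, U+FF09
    ["「", "」"], // U+300C, U+300D
    ["『", "』"], // U+300E, U+300F
    ["＜", "＞"], // U+FF1C, U+FF1E
]);

// Reverse lookup: closing bracket -> its opening bracket.
const CLOSERS = new Map(
    [...BRACKET_PAIRS].map(([open, close]) => [close, open])
);

// Count occurrences of a single character in a string.
function countChar(str, ch) {
    return str.split(ch).length - 1;
}

// Drop trailing closing brackets that have no matching opener inside the
// candidate URL. Balanced brackets that belong to the URL are kept.
function trimUnbalancedClosers(url) {
    let end = url.length;
    while (end > 0) {
        const last = url[end - 1];
        const open = CLOSERS.get(last);
        if (open === undefined) break; // last character is not a closer
        const body = url.slice(0, end);
        if (countChar(body, last) <= countChar(body, open)) break; // balanced
        end -= 1; // unbalanced trailing closer: exclude it from the link
    }
    return url.slice(0, end);
}

console.log(trimUnbalancedClosers("http://foo.com/blah_blah）"));
console.log(trimUnbalancedClosers("http://foo.com/blah_blah_(wikipedia)_(again)）"));
```

With this rule the two failing links from `texts[2]` lose only the trailing fullwidth parenthesis, while the balanced halfwidth parentheses inside the second URL are kept. A real implementation would presumably live in the scanner and parser state machine rather than in a post-processing step like this one.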

Labels

parsing: Related to string parsing
