10

I'm trying to remove non-printable characters from a string in Golang.

https://play.golang.org/p/Touihf5-hGH

invisibleChars := "Douglas​"
fmt.Println(invisibleChars)
fmt.Println(len(invisibleChars))

normal := "Douglas"
fmt.Println(normal)
fmt.Println(len(normal))

Output:

Douglas​
10
Douglas
7

The first string has an invisible character at the end, which is why its length is 10 bytes instead of 7.

I've tried to replace non-ASCII characters, but it removes accents too.

How can I remove non-printable characters only?


3 Answers

29

Foreword: I released this utility in my github.com/icza/gox library, see stringsx.Clean().


You could remove runes for which unicode.IsGraphic() or unicode.IsPrint() reports false. To remove certain runes from a string, you may use strings.Map(): it drops every rune for which the mapping function returns a negative value.

For example:

invisibleChars := "Douglas​"
fmt.Printf("%q\n", invisibleChars)
fmt.Println(len(invisibleChars))

clean := strings.Map(func(r rune) rune {
    if unicode.IsGraphic(r) {
        return r
    }
    return -1
}, invisibleChars)

fmt.Printf("%q\n", clean)
fmt.Println(len(clean))

clean = strings.Map(func(r rune) rune {
    if unicode.IsPrint(r) {
        return r
    }
    return -1
}, invisibleChars)

fmt.Printf("%q\n", clean)
fmt.Println(len(clean))

This outputs (try it on the Go Playground):

"Douglas\u200b"
10
"Douglas"
7
"Douglas"
7
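
Since the question mentions that stripping non-ASCII characters also removed accents, here is a minimal, self-contained sketch of the same strings.Map() approach that shows accented letters surviving. The helper names removeNonGraphic and removeNonPrint are hypothetical and only for illustration. It also shows two things worth knowing: unicode.IsPrint() treats ASCII space as the only printable spacing character, so it drops a no-break space (U+00A0) that unicode.IsGraphic() keeps, and both functions report false for control characters such as tab and newline.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// removeNonGraphic is a hypothetical helper (not part of the standard
// library) that drops every rune for which unicode.IsGraphic reports false.
func removeNonGraphic(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsGraphic(r) {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, s)
}

// removeNonPrint is the same idea, also hypothetical, using unicode.IsPrint.
func removeNonPrint(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsPrint(r) {
			return r
		}
		return -1
	}, s)
}

func main() {
	// "José" followed by U+200B (ZERO WIDTH SPACE): the accent is kept,
	// only the invisible rune is removed.
	fmt.Printf("%q\n", removeNonGraphic("Jos\u00e9\u200b"))

	// U+00A0 (NO-BREAK SPACE) is graphic but not printable in Go's sense,
	// so the two helpers disagree on it.
	nbsp := "a\u00a0b"
	fmt.Println(len(removeNonGraphic(nbsp)), len(removeNonPrint(nbsp)))

	// Tab and newline are control characters, so both helpers remove them.
	fmt.Printf("%q\n", removeNonGraphic("a\tb\n"))
}
```

If tabs or newlines must be preserved, the mapping function would need an extra condition for them (for example, also keeping runes for which unicode.IsSpace() reports true).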

15
invisibleChars = strings.TrimFunc(invisibleChars, func(r rune) bool {
        return !unicode.IsGraphic(r)
    })

Go Playground: https://play.golang.org/p/39yWgnnRPXr


4

Just F.Y.I.,

I often use strings.TrimFunc, but I have found that strings.Map() handles invisible characters more reliably than strings.TrimFunc.

strings.TrimFunc does not remove an invisible character that sits inside the string, as in "Douglas\u200b" + "bar". The following example test fails because the invisible character is followed by "bar": the resulting length is 13 rather than the expected 10.

func ExampleTrimFunc() {
    invisibleChars := "Douglas\u200b" + "bar"
    invisibleChars = strings.TrimFunc(invisibleChars, func(r rune) bool {
        return !unicode.IsGraphic(r)
    })

    fmt.Println(invisibleChars)
    fmt.Println(len(invisibleChars))

    normal := "Douglasbar"
    fmt.Println(normal)
    fmt.Println(len(normal))

    // Output:
    // Douglasbar
    // 10
    // Douglasbar
    // 10
}

However, replacing it with strings.Map() as follows makes the example pass.

 func ExampleTrimFunc() {
    invisibleChars := "Douglas\u200b" + "bar"
-   invisibleChars = strings.TrimFunc(invisibleChars, func(r rune) bool {
-       return !unicode.IsGraphic(r)
-   })
+   invisibleChars = strings.Map(func(r rune) rune {
+       if unicode.IsGraphic(r) {
+           return r
+       }
+       return -1
+   }, invisibleChars)
 
    fmt.Println(invisibleChars)
    fmt.Println(len(invisibleChars))
 
    normal := "Douglasbar"
    fmt.Println(normal)
    fmt.Println(len(normal))
 
    // Output:
    // Douglasbar
    // 10
    // Douglasbar
    // 10
 }

1 Comment

That makes sense, as the "Trim" functions are only intended to remove "leading and trailing Unicode code points" that match the cut-set.
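
The difference described above can be checked side by side. This is a small sketch, with the hypothetical helper names trimEnds and stripAll, that puts a zero width space at the start, in the middle, and at the end of a string: strings.TrimFunc removes only the leading and trailing ones, while strings.Map removes all three.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// trimEnds (hypothetical name) removes non-graphic runes only from the
// start and end of s, which is all that strings.TrimFunc is meant to do.
func trimEnds(s string) string {
	return strings.TrimFunc(s, func(r rune) bool {
		return !unicode.IsGraphic(r)
	})
}

// stripAll (hypothetical name) removes non-graphic runes anywhere in s.
func stripAll(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsGraphic(r) {
			return r
		}
		return -1 // drop the rune
	}, s)
}

func main() {
	// U+200B takes 3 bytes in UTF-8, so this string is 3+7+3+3+3 = 19 bytes.
	s := "\u200bDouglas\u200bbar\u200b"

	fmt.Println(len(s))           // 19
	fmt.Println(len(trimEnds(s))) // 13: the interior U+200B is still there
	fmt.Println(len(stripAll(s))) // 10: "Douglasbar"
}
```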
