Essays/Burrows-Wheeler Transform
The Burrows-Wheeler transform is a brilliantly simple way to group together characters in a string, making it more amenable to compression. Crucially, transform is bijective.
The forward transform works by arranging all rotations of the string, sorting the rotations, and taking the last column of the resultant array. Let us follow the example in the paper in J:
S=:'abraca'
N=.#S
(i.N)|."0 1 S
abraca
bracaa
racaab
acaabr
caabra
aabrac
M=./:~ (i.N)|."0 1 S
M
aabrac
abraca
acaabr
bracaa
caabra
racaab
{:"1 M
caraab
We will also need the index of the original string in the sorted array to be able to undo the transform, viz.
I=.M i. S I 1
Putting it all together in a monad:
bw =: monad define
S=.y
N=.#S
M=./:~ (i.N)|."0 1 S
({:"1 M);M i. S
)
bw 'abraca'
┌──────┬─┐
│caraab│1│
└──────┴─┘
; (link) is a J idiom where one would return multiple values as a tuple in other languages.
Let be the transformed string; to undo the transform, begin by forming the vector of indices where the characters of appear in the sorted string. The J implementation offers some clarity:
L=.'caraab' T=.(/:~L) pi L T 4 0 5 1 2 3
where pi is referenced from "progressive index-of."
oc=: [: ((] - {) /:@/:) i.~
pi=: #@[ ({. i.&(,.oc) }.) [ i.,
The paper then defines (somewhat confusingly) to be the image of by permutation according to applied times. Again J offers some clarity; we tabulate the iterated permutations:
N=.#L
1 |. ({/\(T"0 (i.N)))
2 4 3 0 5 1
5 2 1 4 3 0
3 5 0 2 1 4
1 3 4 5 0 2
0 1 2 3 4 5
4 0 5 1 2 3
Extracting the th column (recall is the index returned by bw, defined to be M i. S):
ix=.(I&{)"1 (1 |. {/\(T"0 (i.N)))
ix
4 2 5 3 1 0
which defines a permutation of to recover the original string (backwards and with a little twiddling of the indices):
2 |. |. ix { L
abraca
We can define ix in one step using F:.
ix=.(I&{) F:. { (T"0 i.(N+1))
ix
4 2 5 3 1 0
or (the below avoids copying/filling an array with )
ix=. T (I&{) F:. (] T&{) (i.N)
ix
4 2 5 3 1 0
which extracts the th index as we go.
So we have recovered the original string with and .
Putting it all together in a dyad:
bw_d =: dyad define
L=.x
I=.y
T=.(/:~L) pi L
N=.#L
ix=. T (I&{) F:. (] T&{) (i.N)
2 |. |. ix { L
)
'caraab' bw_d 1
abraca
Contributed by Vanessa McHale, including suggested improvements by Michael Day.