Communities

Writing
Writing
Codidact Meta
Codidact Meta
The Great Outdoors
The Great Outdoors
Photography & Video
Photography & Video
Scientific Speculation
Scientific Speculation
Cooking
Cooking
Electrical Engineering
Electrical Engineering
Judaism
Judaism
Languages & Linguistics
Languages & Linguistics
Software Development
Software Development
Mathematics
Mathematics
Christianity
Christianity
Code Golf
Code Golf
Music
Music
Physics
Physics
Linux Systems
Linux Systems
Power Users
Power Users
Tabletop RPGs
Tabletop RPGs
Community Proposals
Community Proposals
tag:snake search within a tag
answers:0 unanswered questions
user:xxxx search by author id
score:0.5 posts with 0.5+ score
"snake oil" exact phrase
votes:4 posts with 4+ votes
created:<1w created < 1 week ago
post_type:xxxx type of post
Search help
Notifications
Mark all as read See all your notifications »
Challenges

Fast sampling of special binary strings

+1
−1

We are going to define a simple little language. A word in this language is a binary string where the longest run of consecutive $0$s, is shorter than every run of $1$s which is not contained within a larger run (a "maximal" run). So for example: \begin{align} 001110111110011 \end{align} which we will write as \begin{align} 0^21^301^50^21^2 \end{align} is not legal, because the following red segment is just as long as the blue segment: \begin{align} {\color{red}0^2}1^301^50^2{\color{blue}1^2} \end{align} The string: \begin{align} 0^31^401^80^31^5 \end{align} is valid, because the longest run of $0$s has length three and the shortest maximal run of $1$s has length four.

Strings of the form $0^n$ are somewhat ambiguous in the above description. Technically they should be allowed, since there are no runs of $1$s at all, but they seem to in some sense violate the spirit of the description. Whether they are included is up to you, it will not make a huge difference for the challenge.

Challenge

Given a non-negative integer $n$, uniformly sample from the words of length $n$ in the language. Your algorithm must have sub-exponential average-case time complexity in terms of the input value.

You may choose a word using a built-in source of randomness, or you can accept as an additional input a source of randomness (e.g. a infinite list of 1s and 0s, or a stateful blackbox function).

This is code-golf so the goal is to minimize the size of your source code as measured in bytes.

Non-solutions

An easy would-be solution is to sample randomly on binary strings of the correct length until you get something in the language.

We will show that this does not work. This gives a uniform distribution but fails because the average-case time complexity is exponential. On average you will fail exponentially many times before you succeed.

To see this notice that the string $010$ is never a legal substring in our language, regardless of context. So there are at most 7 possible length 3 segments that can extend a valid word at any point.

So the rate at which words in our language grow is $O\left(\sqrt[3]{7}^n\right)$ (we use the $O$-notation here precisely, meaning it is bounded above) while all binary strings grow at a rate of $2^n$. $2 = \sqrt[3]{8}>\sqrt[3]{7}\approx 1.913$.

You might try to generate random strings which don't contain $010$ instead (this can be done uniformly and efficiently), and then reject the ones that don't work. However a general version of this argument can show that any finite set of forbidden substrings will either rule out some words you want to sample (i.e. make the sampling non-uniform) or it will require exponentially many samples (i.e. the base of the exponent is too large).

History

1 comment thread

How about explaining in plain English? (6 comments)

Sign up to answer this question »