Archive

Posts Tagged ‘stringgene’

JGAP: Revisiting the Second Example

August 15, 2009 Leave a comment

Something happened on the way to my next post: I thought I discovered the reason why the Hello World example under JGAP did not work, or at least why it might not have worked (I am not sure if I could characterize it as failing as GP is all about combinations, not directed choice).

There were no mutations in the population! It was kind of like expecting Vanilla ice cream to become French Vanilla without any help. How could I have been so blind?

There was only one problem: I check the source code for the DefaultConfiguration object created by JGAP and it creates a MutationOperator with a default value of 1/12 (or .08). I traced the logic through my HelloWorld object and, lo and behold, the MutationOperator is created and used. The StringGene is told to mutate and it (occasionally) does.

The Watchmaker Framework recommends a mutation value of 0.02-0.05. The JGAP value is higher which means that it should work.

The problem is…it doesn’t. What could be wrong? What appeared to be a straightforward problem has turned into a white sperm whale tasking me.

I wanted to talk about ECJ. Now I have to talk about the Watchmaker Framework and what it does differently than JGAP. I really did not want to do that.I wanted to save it for later.

Now I have to.

The example code found in the Watchmaker Framework (WF) for the Hello World Example is different than the code in the documentation so if you go to the Completing the Jigsaw section the code located there and in the rest of Chapter 2 will not reflect the code in the actual example code shipped with the project. A shame, but nothing we can’t overcome by taking on a sunny disposition and a winning smile.

If you happen to have a favorite between JGAP and WF I apologize. This is liable to get ugly.

Time to compare.

The Fitness Function

The fitness functions are logically the same; check the letters at all positions. The difference: my JGAP fitness value goes up as more letters match; WF’s fitness value goes up as more don’t match. Nothing more to talk about there.

Evolving: Breeding/Crossover and Mutating

In JGAP the evolving population is made up of Chromosomes which are themselves made up of Genes. In the case of the Hello World example I wanted to evolve a string so I created a StringGene and made it part of a Chromosome.

...
    Gene stringGene = new StringGene(conf, MESSAGE.length(),
                                     MESSAGE.length(), "!dehlorw ,");
    IChromosome sampleChromosome = new Chromosome(conf, stringGene, 1);
...

Makes sense. A StringGene is a kind of Gene and a Chromosome is a kind of IChromosome (I know, I know…what the heck is an IChromosome? It is a naming convention that has been in use for many years now and used extensively, but not exclusively, by folks who use Eclipse. For now, let’s just call an IChromosome a Chromosome. Work with me).

In WF the evolving population is whatever type you want it to be. In this case, we want to evolve a String so it becomes the chromosome. No special wrappers here. Nothing wrong with using a StringGene; from an OO perspective it is a perfectly reasonable abstraction. It is just not used in WF. In addition, the population of Strings is created by a factory: the StringFactory which is a type of CandidateFactory. In JGAP I passed a selection of characters into the StringGene constructor, while in WF the selection of characters is passed into the StringFactory constructor (the following is example code shipped with WF).

...
    private static final char[] ALPHABET = new char[27];
    static
    {
        for (char c = 'A'; c <= 'Z'; c++)
        {
            ALPHABET[c - 'A'] = c;
        }
        ALPHABET[26] = ' ';
    }
...
    EvolutionEngine engine = new ConcurrentEvolutionEngine(
                                 new StringFactory(ALPHABET, target.length()),
                                 ...)
...

(Yes, it is different than the WF documentation. I also would not have used a for() to create a static array of 26 letters when actually listing the letters would have worked as well and been more obvious, but that is a different story and it is example code anyway. Jeez, just relax…)

After combing through the JGAP code, and taking a walk with my mom, I realized what was wrong: Watchmaker handles the string as a complete chromosome. I implemented the JGAP code for the string to be one gene!

Of course it didn’t work. The JGAP crossover was doing nothing.

I changed the JGAP example code to:

...
   private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ";
...
   Gene stringGene = new StringGene(conf, 1, 1, ALPHABET);
   IChromosome sampleChromosome = new Chromosome(conf, stringGene, MESSAGE.length());
...

The above configures one StringGene to hold one character and create as many StringGene objects in the Chromosome as the length of the MESSAGE. I changed the list of characters to reflect the data used in the WF example.

In addition, I changed the fitness function to check each individual allele against each individual character of the target string:

...
    protected double evaluate(IChromosome aSubject) {
        int fitnessValue = 0;

        int length = aSubject.size();
        for (int i = 0; i < length; i++) {
            String actual = (String) aSubject.getGene(i).getAllele();
            if (actual.toCharArray()[0] == _expected[i]) {
                fitnessValue += 1;
            }
        }

        return fitnessValue;
    }
...

I also changed the loop that evolves the population to continue until a perfect match is found.

Running the code to use the message “HELLO WORLD” and “ABCDEFGHIJKLMNOPQRSTUVWXYZ ” as the alphabet string I got:

Stopped at generation 123
Total evolution time: 0:0
The best solution has a fitness value of 11
Looking for 'HELLO WORLD':
	 'HELLO WORLD'

Much more reasonable than failure after 1 million tries with a population of 800K.

If I change the message to mixed case and a more complete alphabet:

...
    private static final String MESSAGE = "Hello, world!";
    private static final String ALPHABET =
                           "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ,!";
...
        Gene stringGene = new StringGene(conf, 1, 1, ALPHABET);
...

The results become:

Stopped at generation 233
Total evolution time: 0:0
The best solution has a fitness value of 13
Looking for 'Hello, world!':
	 'Hello, world!'

I slightly modified the WF StringsExample code to accommodate my compulsion desire to avoid using command line args in examples. With an input of “HELLO WORLD” and alphabet of “ABCDEFGHIJKLMNOPQRSTUVWXYZ ” the output was:

...
Generation 46: HELLO WORLD
Evolution result: HELLO WORLD

If I changed the inputs to mixed case:

...
Generation 125: Hello, world!
Evolution result: Hello, world!

WF took fewer generations to find the solution than JGAP. Interesting. Perhaps WF is a better petri dish.

I suspect there is some fine tuning that could be done with JGAP, but works better in this particular WF example.

JGAP is at version 3.4.3. WF is at version 0.6.1. If you would like to see the WF example code please download the package and have a look.

So there you have it.

The reason why it didn’t work with JGAP was operator error.

The cat was dead and I didn’t notice even after opening the box.

I hate when that happens.

The Code

HelloWorld2.java

package hiddenclause.example;

import org.jgap.Chromosome;
import org.jgap.Configuration;
import org.jgap.FitnessFunction;
import org.jgap.Gene;
import org.jgap.Genotype;
import org.jgap.IChromosome;
import org.jgap.InvalidConfigurationException;
import org.jgap.impl.DefaultConfiguration;
import org.jgap.impl.StringGene;

public class HelloWorld2 {

//    private static final String MESSAGE = "HELLO WORLD";
//    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ";

//    private static final String ALPHABET = "dehlorw ,!";
//    private static final String MESSAGE = "hello, world!";

    private static final String MESSAGE = "Hello, world!";
    private static final String ALPHABET =
                                  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ,!";

    public static void main(String[] args) throws Exception {
        long startTime = 0;
        long endTime = 0;
        HelloWorld2 hello = new HelloWorld2();

        Genotype population = hello.create(10);

        startTime = System.currentTimeMillis();
        IChromosome solution = null;
        int popIdx = 0;
        do {
            population.evolve();
            popIdx++;
            solution = population.getFittestChromosome();
        } while (solution.getFitnessValue() != MESSAGE.length());
        endTime = System.currentTimeMillis();

        outputSolution(population, startTime, endTime, popIdx);
    }

    private Genotype create(int popSize) throws InvalidConfigurationException {
        Configuration conf = new DefaultConfiguration();
        conf.setPreservFittestIndividual(true);

        FitnessFunction myFunc = new HelloWorldFitnessFunction2(MESSAGE);
        conf.setFitnessFunction(myFunc);

        Gene stringGene = new StringGene(conf, 1, 1, ALPHABET);
        IChromosome sampleChromosome = new Chromosome(conf, stringGene, MESSAGE.length());
        conf.setSampleChromosome(sampleChromosome);
        conf.setPopulationSize(popSize);

        Genotype population;
        population = Genotype.randomInitialGenotype(conf);

        return population;
    }

    private static void outputSolution(Genotype population, long startTime,
            long endTime, int evolutionIdx) {
        System.out.println("Stopped at generation " + evolutionIdx);

        long totalSeconds = (endTime - startTime) / 1000;
        long actualSeconds = totalSeconds % 60;
        long actualMinutes = totalSeconds / 60;
        System.out.println("Total evolution time: " + actualMinutes + ":"
                + actualSeconds);

        IChromosome solution = population.getFittestChromosome();
        int fitnessValue = (int) solution.getFitnessValue();
        System.out.println("The best solution has a fitness value of " + fitnessValue);

        System.out.println("Looking for '" + MESSAGE + "':");
        String message = getMessage(solution);
        System.out.println("\t '" + message + "'");
    }

    private static String getMessage(IChromosome solution) {
        String result = ""; 

        int length = solution.size();
        for (int i = 0; i < length; i++) {
            result += (String) solution.getGene(i).getAllele();
        }

        return result;
    }
}

HelloWorldFitnessFunction2.java

package hiddenclause.example;

import org.jgap.FitnessFunction;
import org.jgap.IChromosome;

public class HelloWorldFitnessFunction2 extends FitnessFunction {

    private char[] _expected;

    public HelloWorldFitnessFunction2(String desiredMessage) {
        _expected = desiredMessage.toCharArray();
    }

    @Override
    protected double evaluate(IChromosome aSubject) {
        int fitnessValue = 0;

        int length = aSubject.size();
        for (int i = 0; i < length; i++) {
            String actual = (String) aSubject.getGene(i).getAllele();
            if (actual.toCharArray()[0] == _expected[i]) {
                fitnessValue += 1;
            }
        }

        return fitnessValue;
    }

}

JGAP: A Second Example and Observations

August 11, 2009 2 comments

In the first chapter of the documentation for the Watchmaker Framework Daniel W. Dyer presents an interesting genetic algorithm example: what if we wanted to evolve the string “HELLO WORLD”?

The fitness function would be straightforward: check every letter of a possible solution and every matching letter at a matching position would raise that chromosome’s fitness value.

I highly recommend you read the chapter referenced above before continuing. It is quite well written and will make the pain I describe below clearer.

My first thought: implementing this in JGAP shouldn’t be that hard.

Implementing it: easy. Getting a good solution: not so much. This is not a problem with JGAP so much as it is a problem with monkeys and Shakespeare. Before I go into that let me present the code.

My target was a little bit more ambitious: instead of hello world in all upper case characters I wanted to evolve the string “hello, world!” all in lower case and containing a space, a comma and an exclamation point.

The fitness function was easy: check every letter and its position and if it matches the target string increment the fitness value by 1. The higher the fitness value the better the match.

Here is my fitness function:

package hiddenclause.example;

import org.jgap.FitnessFunction;
import org.jgap.IChromosome;

public class HelloWorldFitnessFunction extends FitnessFunction {

    private char[] _expected;

    public HelloWorldFitnessFunction(String desiredMessage) {
        _expected = desiredMessage.toCharArray();
    }

    @Override
    protected double evaluate(IChromosome aSubject) {
        int fitnessValue = 0;

        String msg = (String) aSubject.getGene(0).getAllele();
        char [] actual = msg.toCharArray();
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == _expected[i]) {
                fitnessValue += 1;
            }
        }

        return fitnessValue;
    }

}

Next step: implement the chromosome that will contain the letters that will evolve into the target string.

In order to accomplish the next step I decided to use the JGAP genetic algorithm* class StringGene. I configured it to randomly generate a string of exactly the length desired and use an alphabet of just the letters and punctuation from the target string. Why not add the entire alphabet and punctuation to add to the diversity? Monkeys, remember? More on that later.

The chromosome creation code is:

package hiddenclause.example;

import org.jgap.Chromosome;
import org.jgap.Configuration;
import org.jgap.FitnessFunction;
import org.jgap.Gene;
import org.jgap.Genotype;
import org.jgap.IChromosome;
import org.jgap.InvalidConfigurationException;
import org.jgap.impl.DefaultConfiguration;
import org.jgap.impl.StringGene;

public class HelloWorld {

    private static final String MESSAGE = "hello, world!";
    private static final int MAX_EVOLUTION = 1000000;

    public static void main(String[] args) throws Exception {
        long startTime = 0;
        long endTime = 0;
        HelloWorld hello = new HelloWorld();

        Genotype population = hello.create(1000);

        startTime = System.currentTimeMillis();
        int i = 0;
        for (i = 0; i < MAX_EVOLUTION; i++) {
            population.evolve();
            IChromosome solution = population.getFittestChromosome();
            if (solution.getFitnessValue() == MESSAGE.length()) {
                break;
            }
        }
        endTime = System.currentTimeMillis();

        outputSolution(population, startTime, endTime, i);
    }

    private Genotype create(int popSize) throws InvalidConfigurationException {
        Configuration conf = new DefaultConfiguration();
        conf.setPreservFittestIndividual(true);

        FitnessFunction myFunc = new HelloWorldFitnessFunction(MESSAGE);
        conf.setFitnessFunction(myFunc);

        Gene stringGene = new StringGene(conf, MESSAGE.length(), MESSAGE.length(),
                "!dehlorw ,");
        IChromosome sampleChromosome = new Chromosome(conf, stringGene, 1);
        conf.setSampleChromosome(sampleChromosome);
        conf.setPopulationSize(popSize);

        Genotype population;
        population = Genotype.randomInitialGenotype(conf);

        return population;
    }

    private static void outputSolution(Genotype population, long startTime,
            long endTime, int evolutionIdx) {
        System.out.println("Stopped at generation " + evolutionIdx);

        long totalSeconds = (endTime - startTime) / 1000;
        long actualSeconds = totalSeconds % 60;
        long actualMinutes = totalSeconds / 60;
        System.out.println("Total evolution time: " + actualMinutes + ":"
                + actualSeconds);

        IChromosome solution = population.getFittestChromosome();
        int fitnessValue = (int) solution.getFitnessValue();
        System.out.println("The best solution has a fitness value of " + fitnessValue);

        System.out.println("Looking for '" + MESSAGE + "':");
        String message = (String) solution.getGene(0).getAllele();
        System.out.println("\t '" + message + "'");
    }
}

Notice that the HelloWorld class will handle any kind of incoming string; I just happened to use a hard coded constant. One of its assumptions (euphemism for bug) is that the incoming string and the solution string are the same length. If the two strings are not the same length the loop might try to access part of an array that doesn’t exist and there are very few exceptions that cause as severe a level of mental anguish as the ArrayIndexOutOfBoundsException (however, NullPointerException is right up there).

The above gave the following output after 1 million generations:
Break: Executed evolution 1000000 times.
Total evolution time: 76:45
The best solution has a fitness value of 10
Looking for 'hello, world!':
'!ello,  or!d!'

Not pretty.

What happened?

While the use of crossover should make the evolution progress faster there are certain statistical issues that must be taken into account. For a chromosome to get a high value it has to have 2 things in its favor:

  1. it has to have one or more letters in the new string
  2. the letter(s) have to be in the proper position(s)

This is where the monkeys come in.

The more I thought of it the more I thought about the scenario of an infinite number of monkeys trying to write Shakespeare. To begin their arduous task one or more of the monkeys has to hit the proper first letter. Assuming a keyboard with just the alphabet, and no punctuation, they have a 1 in 52 chance of hitting the right first letter. Shouldn’t be too hard with an infinite number of primates.

The next letter becomes a little harder. Of the subset of monkeys that hit the proper first letter the chances of any of them hitting that second letter has a statistical probability of 1 in 2704. The third: 1 in 140608. The fourth: 1 in 7,311,616. The fifth: 1 in 380,204,032.

And that doesn’t include punctuation and white space.

That mean that if the letter h never makes it to the first character the system will never (never, as in…well, never) evolve the target string. Never. In order to up the statistical odds of my chromosome succeeding I created an alphabet of just the letters that comprise the target string: “!dehlorw ,”.

And yet, with that small set of characters, an ever changing population of 1000 chromosomes at 1 million generations was unable to generate the string. Can it succeed? Of course. The proper population (random generation of strings) in the petri dish that is JGAP would generate the target string. But it is random and random does not mean regular.

So, adding more letters and punctuation adds more noise to the mix and just makes the combinations even worse (though possibly better as more randomly mixed letters could elevate the use of letters from the target string).

When I changed the StringGene’s alphabet to include a few more letters, “!abcdefghlmnorstwxyz ,”, this is what the output was:

Stopped at generation 1000000
Total evolution time: 77:20
The best solution has a fitness value of 7
Looking for 'hello, world!':
'zblxo, ,orrdx'

Defintely not an improvement.

Is there a way of configuring this differently to encourage a solution? Perhaps a better fitness function? Custom gene? Better randomizer? I will gladly take any suggestions and try them out. Maybe.

Exercise for the reader: up the population value to create more opportunities for the desired letters to end up in the desired positions. Bake overnight.

When I increased the population number to 800K and ran it again my Kubuntu box locked up after 2 days because I continued to do things like run Eclipse, read PDFs and back-up my work. I guess 800 trillion iterations of the fitness function was exhausting.

No joy in Mudville.

Update 8/15/09: the problem solved.

* To create a genetic program using JGAP you would create a GPGenotype using a class that inherits from GPProblem and a fitness function that inherits from GPFitnessFunction.