I am currently planning to move some other projects to GitHub.
These are some older projects – which however seemed to be useful to some people.
Most solvers for linear programs are implemented in C/C++ for performance reasons. Java developers can use JNI interfaces for some solvers. However most interfaces are pretty difficult to setup, and lock the developer in to a specific solver. SCPSolver was developed to overcome those issues. The library has the following features:
- easy to deploy on Linux, Mac Os X and Windows with prebuilt linear programming libraries (almost dependency-free)
- plugin-concept: write model, use different solvers to solve it
- low learning curve: very simple, no-frills API
Please go to the projects homepage scpsolver.org for more details.
SmallGP – A simple and lean implementation of symbolic regression
SmallGP is a implementation of Symbolic Regression [1] wich is compact by means of lines of code and memory usage. SmallGP is a commandline-tool. All important parameters can be changed easily.
Please go to the projects homepage smallgp.de for more details.
Display the help with
> smallgp -h
to learn which parameters are available.Example parameters for a test run:
dataset: test.data
population size: 10000
tournament size: 100
maximum generations: 80
maximum prog. length: 50
maximum depth: 7
#constants: 200
Pcross: 0.7
Pmut/Node: 0.1
To start smallgp with these parameters just run
> smallgp -i test.data -p 10000 -t 100 -g 80 -l 50 -d 7 -r 200 -c 0.7 -m 0.1
Representation of individuals
SmallGP uses the concept of genotype-phenotype-duality.
Consider the following tree:
(+)
/ \
x1 (+)
/ \
x2 c2
This can be written in polish notation:
+ x1 + x2 c2
Due limited amount of different symbols it is possible to store this expression in a vector of integer values.
(+)
/ \
x1 (+) => + x1 + x2 c2 => (2,5,2,6,16)
/ \
x2 c2
The integer vector represents the genotype, while the phenotype is a standard binary-tree, that consists of nodes containing information which are holding pointers to their children and their predecessor.
struct node {
struct node *right;
struct node *left;
struct node **bp;
unsigned char no;
};
node->no specifies the primitive function/terminal represented by this node.
It can easily be deducted that the maximum number of different primitives and terminals is limited to 255 with using an 8-bit datatype. This phenotypic representation enables easy implementation of crossover and evaluation.
With 104 bits memory needed per node it isn’t really memory efficient, as every node contains just 8 bit of information, so we use the genotype for storage.
The phenotype is easily converted to the genotype: a vector of 8-bit values.
The genotype is far more memory efficient than the phenotypic representation, but the implementation of subtree crossover and evaluation is less trivial.
SmallGP uses the phenotype-genotype-duality to save memory and to allow easy manipulation. While for evaluation and crossover individuals are temporarily expanded to their phenotype, point mutation can be performed on the genotype.
Crossover
SmallGP implements a size-safe kozastyle subtree-crossover. Subtrees of appropriate size are selected from two individuals, and then exchanged. It is guaranteed that the size constraint is met by both individuals after the
operation.
Suppose that size(tree1) < size(tree2).
tree1 can be extended by (MAXLEN – size(tree1)) nodes. The maximum size of a subtree of tree2 is size(tree2) so we fix a maximum size for the subtree selected from tree2 by a positive random integer value smaller than
((MAX_LEN – size(tree1)) modulo size(tree2)) = maxsize2
After the selection of a subtree subtree_2 with a maximal size maxsize2 from tree2, we can determine maxsize1. tree2 can be extended by (MAXLEN – size(tree2) + size(subtree2)).
As the maximum size of a subtree from tree1 is size(tree1) we choose a positive random integer value not bigger than
((MAX_LEN – size(tree2) + size(subtree2)) modulo size(tree1)) = maxsize1
.
Mutation
SmallGP implements point-mutation, that simply substitutes a terminal with another terminal, and a primitive with another primitive.
Strategy
SmallGP uses a generational strategy, with tournament selection.
Initialization
The tree in the initial population are initialized with a size-safe „Grow“-Method. The probability of choosing a terminal at a certain depth
is obtained as follows:
|primitives| + |variables| + 1
P_term = ——————————
|variables| + 1
Constants are considered as one virtual terminal symbol.
Due to the „Grow“-Method most of the individuals are very small in the initial population.
This, in combination with the previously defined point mutation, can cause lack of structural innovation in some cases.
So an additional parameter P_term_really is introduced that allows to scale between „Grow“ and „Full“ initialization.
A terminal is set with the probability:
P_term‘ = P_term * P_term_really
. With P_term_really = 1 the method equals „Grow“, with P_term_really = 0
„Full“. The default value of Pterm is 1.