Symbolic Regression Applet
by Hannes Planatscher
An instance of the Symbolic Regressor applet will appear in a Java-enabled browser.
To come soon:
To come not very soon:
- print solutions in multiple code formats (spreadsheet, Matlab, Java, etc.)
- automatic simplification of solutions (intron splicing, constant tree aggregation, ...)
- improved plot functions
- more flexible Genetic Programming engine (more functions, constants, etc.)
Symbolic regression tries to find a function that fits some input data.
Think of it as linear or logistic regression, except that the structure of the target function is not known in advance.
Let's try it together, and you'll understand the basic concepts in a few minutes.
You want to use regression, so you must have some "mysterious" input data like:
input1 input2 input3 output
1 23 42 65
2 24 43 91
3 25 44 119
4 26 45 149
5 27 46 181
6 28 47 215
7 29 48 251
8 30 49 289
9 31 50 329
10 32 51 371
11 33 52 415
12 34 53 461
13 35 54 509
14 36 55 559
15 37 56 611
16 38 57 665
17 39 58 721
18 40 59 779
19 41 60 839
20 42 61 901
21 43 62 965
The first 3 columns are your inputs, the last column is the output.
So you want to find a function that describes the relation of the inputs to the outputs:
f(input1,input2,input3) = output
And that's what symbolic regression is supposed to do!
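To make the goal concrete, here is a minimal Python sketch (not part of the applet) that tests one candidate function against a few rows of the example data. The candidate input1 * input2 + input3 was derived by hand and happens to reproduce the outputs in the table; the applet may well arrive at a different but equivalent expression. Using the sum of squared errors as the measure of fit is an assumption, since the applet's own fitness measure is not described here.

```python
# A few rows of the example data: (input1, input2, input3, output)
ROWS = [
    (1, 23, 42, 65),
    (2, 24, 43, 91),
    (3, 25, 44, 119),
    (4, 26, 45, 149),
    (21, 43, 62, 965),
]


def candidate(input1, input2, input3):
    """Hand-derived candidate; an assumption, not the applet's output."""
    return input1 * input2 + input3


def sum_squared_error(f, rows):
    """Sum of squared differences between f's predictions and the outputs."""
    return sum((f(a, b, c) - out) ** 2 for a, b, c, out in rows)


# An error of 0 means the candidate fits these rows exactly.
print(sum_squared_error(candidate, ROWS))
```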
First you have to enter your data. Switch to the "Data" tab of the applet.
Then copy and paste your data (you can use the example data) into the text area.
Hint: You can copy the data directly from an Excel or OpenOffice spreadsheet. The resulting functions are "Excel-compatible".
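Data copied from a spreadsheet usually arrives as tab-separated text, one row per line. The sketch below shows one way such text could be split into inputs and outputs. The function name parse_table is hypothetical, and how the applet itself parses the text area is not specified here.

```python
def parse_table(text):
    """Split whitespace- or tab-separated text into (inputs, output) pairs.

    The last column of each row is treated as the output. Blank lines and
    lines that are not fully numeric (such as a header row) are skipped.
    """
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            values = [float(field) for field in fields]
        except ValueError:
            continue  # e.g. the header "input1 input2 input3 output"
        samples.append((values[:-1], values[-1]))
    return samples


pasted = "input1\tinput2\tinput3\toutput\n1\t23\t42\t65\n2\t24\t43\t91\n"
print(parse_table(pasted))
```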
You can change the parameters of the genetic programming system if you want.
I don't have the time to go into details, but there are several good tutorials about genetic programming and genetic algorithms.
The size of the population of individuals. Symbolic Regressor is quite memory efficient, so you can try large population sizes (> 5000).
Tournament selection picks tournament-size individuals at random from the population and selects the best of them.
The initialization depth is the depth of the function trees (individuals) in the first generation.
How many generations you want Symbolic Regressor to run.
The probability of crossover. ;)
Symbolic Regressor uses point mutation. If an individual is subject to mutation, every node in this individual gets changed with this probability.
Number of constants
Symbolic Regressor uses ephemeral constants. This means that the pool of constants has a fixed size.
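The parameters above can be illustrated with a small Python sketch of the corresponding genetic programming building blocks: a fixed-size constant pool, tree initialization to a given depth, tournament selection, and point mutation. This is not the applet's code. The function set (+, -, *), the constant range, the tuple-based tree representation, the squared-error fitness, and all names are assumptions chosen for illustration.

```python
import random

FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}
VARIABLES = ["input1", "input2", "input3"]
ROWS = [(1, 23, 42, 65), (2, 24, 43, 91), (3, 25, 44, 119)]


def make_constant_pool(size, rng):
    """Fixed-size pool of random constants, drawn once before the run."""
    return [round(rng.uniform(-10, 10), 2) for _ in range(size)]


def random_tree(depth, constants, rng):
    """Build a full tree: function nodes down to `depth`, terminals at the leaves."""
    if depth == 0:
        return rng.choice(VARIABLES + constants)
    op = rng.choice(list(FUNCTIONS))
    return (op, random_tree(depth - 1, constants, rng),
            random_tree(depth - 1, constants, rng))


def depth(tree):
    """Depth of a tree; a single terminal has depth 0."""
    if isinstance(tree, tuple):
        return 1 + max(depth(tree[1]), depth(tree[2]))
    return 0


def evaluate(tree, env):
    """Evaluate a tree; strings are variable names, numbers are constants."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]
    return tree


def error(tree):
    """Sum of squared errors on ROWS (lower is better); an assumed fitness."""
    return sum(
        (evaluate(tree, {"input1": a, "input2": b, "input3": c}) - out) ** 2
        for a, b, c, out in ROWS
    )


def tournament_select(population, fitness, tournament_size, rng):
    """Pick `tournament_size` individuals at random and return the best one."""
    contestants = rng.sample(population, tournament_size)
    return min(contestants, key=fitness)


def point_mutate(tree, rate, constants, rng):
    """Replace each node, independently with probability `rate`,
    by a random node of the same kind (function or terminal)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        if rng.random() < rate:
            op = rng.choice(list(FUNCTIONS))
        return (op, point_mutate(left, rate, constants, rng),
                point_mutate(right, rate, constants, rng))
    if rng.random() < rate:
        return rng.choice(VARIABLES + constants)
    return tree


rng = random.Random(0)
constants = make_constant_pool(5, rng)
population = [random_tree(3, constants, rng) for _ in range(50)]
winner = tournament_select(population, error, 7, rng)
child = point_mutate(winner, 0.1, constants, rng)
print(winner, error(winner))
```

Because point mutation only swaps a node for another of the same kind, the shape of the tree is preserved; changing the shape is left to crossover.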
Run the regression
Just click on the large Start button at the bottom of the application. :)
Get the results
Select the "formula" tab. The best solution found will be printed in an "Excel-compatible" format.
Just copy the formula you like..
..paste it into Excel..
..copy the formula for the following rows ..
.. et voilà, you have fitted your data. Sometimes ;).
The other tabs
The fitness and size plots show how the fitness and size of the best individual change over the course of the evolution.
The Pareto plot shows the size of an individual in relation to its achieved fitness.
This is useful to choose a solution after a few runs.
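As a rough illustration of how such a plot helps in choosing, the Python sketch below filters a set of solutions down to those that are not beaten on both size and error by any other solution. The results listed are hypothetical, and treating fitness as an error where lower is better is an assumption.

```python
def pareto_front(solutions):
    """Return the solutions not dominated in (size, error); lower is better.

    A solution is dominated if another one is no worse in both size and
    error and strictly better in at least one of them.
    """
    front = []
    for name, size, error in solutions:
        dominated = any(
            s <= size and e <= error and (s < size or e < error)
            for _, s, e in solutions
        )
        if not dominated:
            front.append((name, size, error))
    return sorted(front, key=lambda item: item[1])


# Hypothetical (label, size, error) results collected over several runs
runs = [("A", 3, 12.0), ("B", 5, 0.0), ("C", 7, 0.0), ("D", 5, 4.0), ("E", 1, 40.0)]
print(pareto_front(runs))
```

Here C and D drop out because B is at least as small and at least as accurate as both. The remaining solutions represent the trade-off between a compact formula and a close fit.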