Symbolic Regression Applet

by Hannes Planatscher

An instance of the Symbolic Regressor applet will appear in a Java-enabled browser.

Tutorial

Symbolic regression tries to find a function that fits some input data. Think of it as linear or logistic regression in which the structure of the target function is unknown as well.

Let's try it together, and you'll understand the basic concepts in a few minutes.

You want to use regression, so you must have some "mysterious" input data like:
input1  input2  input3  output
1	23	42	65
2	24	43	91
3	25	44	119
4	26	45	149
5	27	46	181
6	28	47	215
7	29	48	251
8	30	49	289
9	31	50	329
10	32	51	371
11	33	52	415
12	34	53	461
13	35	54	509
14	36	55	559
15	37	56	611
16	38	57	665
17	39	58	721
18	40	59	779
19	41	60	839
20	42	61	901
21	43	62	965
The first 3 columns are your inputs, the last column is the output.
So you want to find a function that describes the relation of the inputs to the outputs:

f(input1,input2,input3) = output

And that's what symbolic regression is supposed to do!
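
To make the goal concrete, here is a minimal Python sketch (not part of the applet) that scores candidate functions against the first rows of the example data. The mean squared error is only an assumption about what a fitness measure could look like; the applet's actual fitness function is not described here. The function names are made up for illustration.

```python
# First five rows of the example data: (input1, input2, input3, output).
ROWS = [
    (1, 23, 42, 65),
    (2, 24, 43, 91),
    (3, 25, 44, 119),
    (4, 26, 45, 149),
    (5, 27, 46, 181),
]

def mean_squared_error(candidate, rows):
    """Average squared difference between candidate(inputs) and the output column."""
    total = 0.0
    for *inputs, output in rows:
        total += (candidate(*inputs) - output) ** 2
    return total / len(rows)

def product_plus_third(a, b, c):
    # Candidate: input1 * input2 + input3
    return a * b + c

def plain_sum(a, b, c):
    # Candidate: input1 + input2 + input3
    return a + b + c

print(mean_squared_error(product_plus_third, ROWS))  # 0.0 means a perfect fit
print(mean_squared_error(plain_sum, ROWS))           # larger value, worse fit
```

The candidate input1 * input2 + input3 reproduces these rows exactly, so its error is zero. A symbolic regressor searches for such a candidate automatically instead of having it written down by hand.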

Data

First you have to input your data. Switch to the "Data"-tab of the applet.


Then copy and paste your data (you can use the example data) into the text area.



Hint: You can copy the data from an Excel/OpenOffice spreadsheet. The resulting functions are "Excel-compatible".
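
Cells copied from a spreadsheet arrive as tab-separated text, one row per line. The following Python sketch shows how such text splits into input columns and an output column. It is an illustration only: `parse_table` is a hypothetical helper, and how the applet itself parses the text area is not described here.

```python
def parse_table(text):
    """Split tab-separated text into (inputs, output) pairs; the last column is the output."""
    samples = []
    for line in text.strip().splitlines():
        fields = line.split("\t")
        try:
            values = [float(field) for field in fields]
        except ValueError:
            continue  # skip a header line or anything else that is not numeric
        samples.append((values[:-1], values[-1]))
    return samples

# Three rows of the example data as they would look after copying from a spreadsheet.
pasted = "1\t23\t42\t65\n2\t24\t43\t91\n3\t25\t44\t119\n"
for inputs, output in parse_table(pasted):
    print(inputs, "->", output)
```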


Parameter Settings

You can change the parameters of the genetic programming system if you want. I don't have the time to go into details, but there are several good tutorials about genetic programming and genetic algorithms.

parameter	meaning
population-size	The size of the population of individuals. Symbolic Regressor is quite memory efficient, so you can try large population sizes (> 5000).
tournament-size	Tournament selection picks tournament-size individuals at random and selects the best of them.
initialization-depth	The depth of the function trees (individuals) in the first generation.
generations	How many generations you want Symbolic Regressor to run.
crossover-prob.	The probability of crossover. ;)
node-mutation-prob.	Symbolic Regressor uses point mutation. If an individual is subject to mutation, every node in this individual gets changed with this probability.
number of constants	Symbolic Regressor uses ephemeral constants. This means that the pool of constants has a fixed size.
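
For readers new to these terms, here is a rough Python sketch of tournament selection and point mutation as described in the table. It is not the applet's code. The tuple-based tree representation, the operator and terminal sets, drawing contestants without replacement, and "lower fitness is better" are all assumptions made for illustration.

```python
import random

OPERATORS = ["+", "-", "*"]  # hypothetical function set (all binary)
TERMINALS = ["input1", "input2", "input3", 1.0, 2.0]  # inputs plus a small constant pool

def tournament_select(population, fitness, tournament_size, rng):
    """Draw tournament_size individuals at random and return the one with the lowest error."""
    contestants = rng.sample(population, tournament_size)
    return min(contestants, key=fitness)

def point_mutate(tree, node_mutation_prob, rng):
    """Visit every node; with probability node_mutation_prob swap it for a symbol of the same arity."""
    if isinstance(tree, tuple):
        op, left, right = tree
        if rng.random() < node_mutation_prob:
            op = rng.choice(OPERATORS)
        return (op,
                point_mutate(left, node_mutation_prob, rng),
                point_mutate(right, node_mutation_prob, rng))
    if rng.random() < node_mutation_prob:
        return rng.choice(TERMINALS)
    return tree

rng = random.Random(0)
tree = ("+", ("*", "input1", "input2"), "input3")  # input1 * input2 + input3
print(point_mutate(tree, 0.0, rng))  # probability 0 leaves the tree unchanged
print(point_mutate(tree, 0.5, rng))  # roughly half of the nodes get redrawn

# Here the "individuals" are just error values, so the fitness is the value itself.
errors = [5.0, 3.0, 9.0, 1.0]
print(tournament_select(errors, fitness=lambda e: e, tournament_size=4, rng=rng))
```

Point mutation keeps the shape of the tree and only exchanges symbols, which is why a node is always replaced by one of the same arity.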

Run the regression

Just click on the large Start button at the bottom of the application. :)

Get the results

Select the "formula"-tab. The best solution found will be printed in an "Excel-compatible" format.
Just copy the formula you like..



..paste it into Excel..



..copy the formula down into the following rows..



.. et voilà, you have fitted your data. Sometimes ;).


The other tabs

The fitness and size plots show how the fitness and the size of the best individual change over the course of the evolution. The pareto-plot plots the size of an individual against its achieved fitness. This is useful for choosing a solution after a few runs.
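
The idea behind a Pareto plot can be sketched in a few lines of Python: keep only those solutions that no other solution beats on both size and error. This is an illustration, not the applet's code. The (size, error) pairs are made-up values, and "lower error is better" is an assumption.

```python
def pareto_front(solutions):
    """Keep the (size, error) pairs that no other solution beats on both counts."""
    front = []
    for size, error in solutions:
        dominated = any(
            (s <= size and e <= error) and (s < size or e < error)
            for s, e in solutions
        )
        if not dominated:
            front.append((size, error))
    return sorted(front)

# Made-up (size, error) pairs, as they might be collected over a few runs.
candidates = [(3, 40.0), (5, 12.5), (5, 20.0), (9, 0.0), (11, 0.0), (7, 15.0)]
print(pareto_front(candidates))  # [(3, 40.0), (5, 12.5), (9, 0.0)]
```

Each remaining pair is a different trade-off between a small formula and a good fit, which is what makes the plot useful for picking a solution.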