PRO cafe_gafit, env, $ iter, quiet=quiet, selected=selected, cont=cont, $ help=help,shorthelp=shorthelp ;+ ; NAME: ; gafit ; ; PURPOSE: ; perform genetic algorithm fit process of given model to data ; ; CATEGORY: ; cafe ; ; SYNTAX: ; gafit, [,/selected][,/quiet][,/nodata] ; ; INPUTS: ; ; OPTIONS: ; quiet - Do not show fit processing. ; selected - Apply fit to selected data points only. ; nodata - Take also models into account which are in ; groups containing no data. May be usefull ; when building complex models refering one to ; other. ; nosave - Do not save intermediate population ; parameters in separate group. The default is ; false, i.e. save the data. ; cont - Continue with population saved last run (if ; nosave is not set). ; ; SETUP: ; All inputs/options may be set with the "set" command. The ; command prefix is "gafit". ; ; iternum - Number of iterations to perform. Default: ; 100. ; popnum - Size of the population. Default: 100 ; childnum - Number of children of next generation. Must ; be >=2. Default: 3 ; mutrate - Mutation rate. That is the probability that a ; mutation occurs. Default: 0.01 ; selpresaure - Factor defining the selection pressure during ; each run. 0 means no pressure at all ; (selection is done pure randomly and should ; not be used), 1 means equal random/fittness ; selection. Higher values increase influence ; of fittness. Default: 0.5 (more randomness ; but existing selective pressure). ; twinmutrate - Mutation rate to change equal individuums ; with. Default: 0.5 ; ; elitism_num - Number of best-fit individuums to keep and ; not overwrite/mutate. Default: 1. ; ; maxchange - Maximal number chi^2 does not change till ; stop. ; ; testnum - Set this number to values > 1 to run the ; fitting more than once to get more confidence ; of the results. Best fit value will be ; reported. ; iterplot - Plot for each population current plot with ; best fit parameters. This plot might ; represent intermediate results. ; iterplotout - Try to plot out intermediate results. ; ; DESCRIPTION: ; The gafit command performs genetic algorithm fit process ; resembling the one being described in Charbonneau, 1995. ; Here a much simpler verison is used which performs ; following steps: ; 1.) create population consisting of individuums who are ; described by parameters which are the fit parameters ; (the genotype). The phaenotype is the model function ; value which has to be fit to the data (the environment). ; 2.) Each population is randomly separated into two parents ; which are to be maried. ; 3.) Some (>2) children are generated from the parents ; which inherit the parameter from either ; mother/father (randomly). ; 4.) With a certain mutation rate some parameters are ; changed. ; 5.) All individuums of the next generation are tested for ; the environment (=data). Best matching one are kept. ; 6.) Individuums which have the same parameters as other ; will be deposited with an additional mutation. ; 7.) Repeat that till either the best fit value does not ; change for more that certain iterations or chi_red is ; less than 1. ; ; It is important to double-side limit all relevant ; parameters, otherwise an error is reported. Use the ; "limit"/"show,limit" command to do that. ; ; During fitting parameter handling is as follows: ; - if a parameter is frozen (fixed flag = 1) the parameter ; will not be touched while fitting. ; ; - tied parameters are not supported currently. ; ; - groups which contain a model but no valid data points ; are ignored. In case of the selected flag (s.a.) only ; selected data points are taken into account, i.e. if no ; data points are selected the group is ignored. ; ; - groups which contain valid data points but no model are ; ignored. ; ; SIDE EFFECTS: ; Changes parameter values/errors in environment ; according fit result. ; ; REMARK: ; Long lasting fit processes may be interrupted with "Q". ; ; This task may take its cpu! ; ; EXAMPLE: ; ; > model, ... ; > gafit ; -> fit result ; > plot,data+model,res ; ; HISTORY: ; $Id: cafe_gafit.pro,v 1.20 2004/11/01 16:19:29 goehler Exp $ ;- ; ; $Log: cafe_gafit.pro,v $ ; Revision 1.20 2004/11/01 16:19:29 goehler ; added plotout option for movie generation ; ; Revision 1.19 2004/05/27 07:33:38 goehler ; change: do not apply ptr_free to avoid problems in case of a undo command ; which would result in a loss of data. ; Instead of this the garbage collector will be called from exec instead. ; ; Revision 1.18 2004/03/09 10:15:24 goehler ; change: quit meta loop when "Q" is pressed. ; ; Revision 1.17 2004/03/04 13:38:54 goehler ; report intermediate results ; ; Revision 1.16 2004/03/02 17:15:39 goehler ; fix: wrong dimensions for parameters ; ; Revision 1.15 2004/03/02 09:51:25 goehler ; added possibility to tie parameters. Slows down a bit. ; ; Revision 1.14 2004/03/02 08:38:48 goehler ; - fix of elitism-number settings which is now appropriate for all operators ; - added meta loop to run multiple times the ga fit process and keeps the best ; result ; ; Revision 1.13 2004/03/01 17:35:51 goehler ; change: elitism_number defines the number of best fitted individuums which ; are kept without mutation or inheritance but "as is". ; ; Revision 1.12 2004/03/01 15:00:35 goehler ; fix: wrong implementation of selpressure which mixed up indices. ; improved default setting of selpressure=0.5 which is more appropriate in ; some test sessions ; ; Revision 1.11 2004/03/01 10:16:17 goehler ; added selpressure keyword which allows to define the degree of ; selection pressure in repsect of random selection. Still beta state. ; ; Revision 1.10 2004/02/16 08:04:11 goehler ; - fix: store best fit parameters instead of last one. ; - add: save population as multi-dimensional data in result group. ; This allows continuation (cont) and displaying of intermediate results. ; ; Revision 1.9 2004/02/13 17:20:50 goehler ; docu updated ; ; Revision 1.8 2004/02/13 14:00:09 goehler ; added progress report in case of hybrid fitting ; ; Revision 1.7 2004/02/13 13:05:52 goehler ; implemented hybrid fit: it is possible to perform standard fit for certain ; parameters during fitting ; ; Revision 1.6 2004/02/13 10:42:20 goehler ; update: added mutation for siblings. ; ; Revision 1.5 2004/02/13 07:43:13 goehler ; update docu ; ; Revision 1.4 2004/02/13 07:21:39 goehler ; fix: use appropriate size of new population ; ; Revision 1.3 2004/02/12 17:24:38 goehler ; docu/minor iteration number fix ; ; Revision 1.2 2004/02/12 17:14:05 goehler ; fix: do not modify fixed/tied parameters ; ; Revision 1.1 2004/02/12 15:22:01 goehler ; added genetic algorithm fit approach. Still beta, use with care ; ; ; ; ;; command name of this source (needed for automatic help) name="gafit" ;; create environment if not existing: IF n_elements(env) EQ 0 OR NOT PTR_VALID(env) THEN env = cafeenv__define() ;; ------------------------------------------------------------ ;; HELP ;; ------------------------------------------------------------ ;; if help given -> print the specification above (from this file) IF keyword_set(help) THEN BEGIN cafe_help,env, name return ENDIF ;; ------------------------------------------------------------ ;; SHORT HELP ;; ------------------------------------------------------------ IF keyword_set(shorthelp) THEN BEGIN cafereport,env, "fit - perform fit process" return ENDIF ;; ------------------------------------------------------------ ;; SETUP ;; ------------------------------------------------------------ ;; set verbose: cafegetparam,testnum,"gafit:testnum",(*env).setup, 1 ;; set verbose: cafegetparam,quiet,"gafit:quiet",(*env).setup, quiet ;; set selected: cafegetparam,selected,"gafit:selected",(*env).setup, selected ;; set selected: cafegetparam,nodata,"gafit:nodata",(*env).setup, nodata ;; maximal number of generation iterations: cafegetparam,iternum,"gafit:iternum",(*env).setup, 100 ;; number of individuums per population cafegetparam,popnum,"gafit:popnum",(*env).setup, 100 ;; number of children in next generation: cafegetparam,childnum,"gafit:childnum",(*env).setup, 3 ;; mutation rate: cafegetparam,mutrate,"gafit:mutrate",(*env).setup, 0.01 ;; mutation rate for twins to avoid incestuous situations cafegetparam,twinmutrate,"gafit:twinmutrate",(*env).setup, 0.5 ;; number of best fit individuums to keep: cafegetparam,elitism_num,"gafit:elitism_num",(*env).setup, 1 ;; maximum repetitions if nothing improves: cafegetparam,maxchange,"gafit:maxchange",(*env).setup, 20 ;; ranking defines selection pressure (0: none, 1: random=fittness) cafegetparam,selpressure,"gafit:selpressure",(*env).setup, 0.5 ;; set selected flag (*env).fitresult.selected = keyword_set(selected) ;; set result group for population parameters, if desired. IF NOT keyword_set(nosave) THEN cafe_chres,env ;; plot data if set cafegetparam,iterplot,"gafit:iterplot",(*env).setup cafegetparam,iterplotout,"gafit:iterplotout",(*env).setup ;; save data if not set cafegetparam,nosave,"gafit:nosave",(*env).setup,0 ;; continue with former population cafegetparam,cont,"gafit:cont",(*env).setup,cont ;; parameters to change using a standard fit. cafegetparam,fitparam,"gafit:fitparam",(*env).setup ;; ------------------------------------------------------------ ;; DEFINE PARAMETER LIST ;; ------------------------------------------------------------ ;; use all valid parameters which are named (and therefore used in a ;; model) param_index = cafegetvalidparam(env,selected=selected,nodata=nodata) ;; check for valid parameter existence: IF param_index[0] EQ -1 THEN BEGIN cafereport,env, "Error: No model/valid parameter found" return ENDIF ;; set parameter info -> all valid parameters parinfo = (*env).parameter[param_index] ;; index of free parameters: freeind = where(parinfo.fixed EQ 0) ;; number of parameters: parsize = n_elements(freeind) ;; size of next population newpopsize = popnum*childnum/2 +elitism_num ;; ------------------------------------------------------------ ;; CHECK FOR LIMITS ;; ------------------------------------------------------------ IF total(1-parinfo[freeind].limited) NE 0 THEN BEGIN cafereport,env, "Error: Not all parameters are limited" return ENDIF ;; ------------------------------------------------------------ ;; CLEAR ERROR INFORMATION ;; ------------------------------------------------------------ ;; error computed with error becomes invalid: (*env).parameter[*,*,*].errmin = 0. (*env).parameter[*,*,*].errmax = 0. (*env).parameter[*,*,*].errmininfo = 0 (*env).parameter[*,*,*].errmaxinfo = 0 ;; ------------------------------------------------------------ ;; BUILD MEASURE ARRAYS AS NEEDED FOR MPFITFUN ;; ------------------------------------------------------------ ;; dummy startup values: y=0.D0 err=0.D0 FOR group=0, n_elements((*env).groups)-1 DO BEGIN ;; skip groups without model: IF (*env).groups[group].model EQ "" THEN CONTINUE ;; ------------------------------------------------------------ ;; COMPUTE Y/ERROR VALUE FOR EACH SUBGROUP: ;; ------------------------------------------------------------ ;; check all subgroups, build y/error array FOR subgroup = 0, n_elements((*env).groups[group].data)-1 DO BEGIN ;; skip not defined data sets (subgroups) IF NOT PTR_VALID((*env).groups[group].data[subgroup].y) THEN CONTINUE ;; index for defined values: IF keyword_set(selected) THEN BEGIN def_index = where(*(*env).groups[group].data[subgroup].def AND $ *(*env).groups[group].data[subgroup].selected) ENDIF ELSE BEGIN def_index = where(*(*env).groups[group].data[subgroup].def) ENDELSE ;; no index found -> next data set IF def_index[0] EQ -1 THEN CONTINUE y = [y,(*(*env).groups[group].data[subgroup].y)[def_index]] ;; add error if existing IF PTR_VALID((*env).groups[group].data[subgroup].err) THEN BEGIN err = [err,(*(*env).groups[group].data[subgroup].err)[def_index]] ENDIF ELSE BEGIN interr = dblarr(n_elements(def_index)) interr[*] = 1.D0 err = [err,interr] ENDELSE ENDFOR ENDFOR ;; check for degree of freedom, remove first dummy one: IF n_elements(y) GT n_elements(param_index) THEN BEGIN y =y[1:*] err =err[1:*] ENDIF ELSE BEGIN cafereport,env, "Error: Insufficient defined datapoints" return ENDELSE ;; check for proper error values: IF min(err) LE 0 THEN BEGIN cafereport,env, "Error: negative/zero elements in error column" return ENDIF ;; ------------------------------------------------------------ ;; OUTER GA LOOP ;; ------------------------------------------------------------ ;; best chi^2 value obtained in several tests: bestchi2 = -1 ;; the meta loop: FOR try = 0,testnum - 1 DO BEGIN ;; print current loop IF NOT keyword_set(quiet) AND testnum GT 1 THEN BEGIN cafereport,env,"=======================================================" cafereport,env, "= RUN : "+string(try,format="(I5)")+" =" cafereport,env, "=======================================================" ENDIF ;;------------------------------------------------------------- ;; START SETUP ;;------------------------------------------------------------- ;; array containing best fit values fitness = dblarr(newpopsize) ;; last chi^2-value (unset) to determin stop criteria lastchi2 = -1 ;; degree of freedoms DOF = cafegetdof(env,selected=selected) ;; stop flag: stopped =0 ;; counter how many runs chi^2 didn't change nochangecount = 0 ;; number of runs: i = 0l x = 0. ;;------------------------------------------------------------ ;; CREATE START POPULATION ;;------------------------------------------------------------ ;; limit conversions: x=[0..1] -> y=limitscal*x + limitoffset limitscal = parinfo[freeind].limits[1] - parinfo[freeind].limits[0] ; scaling of 0..1 limitoffset = parinfo[freeind].limits[0] ; offset for limits ;; each individuum consists of a set of parameters: IF NOT keyword_set(cont) OR $ NOT ptr_valid((*env).groups[(*env).res_grp].data[0].x) $ OR try GT 0 THEN BEGIN ;; create new population population = randomu(seed,popnum,parsize) ;; adjust limits: FOR j = 0,parsize-1 DO $ population[*,j] = population[*,j]* limitscal[j] $ + limitoffset[j] ;; preset first solution with start values: population[0,*] = parinfo[freeind].value ENDIF ELSE BEGIN ; continue ;; load former population population = *(*env).groups[(*env).res_grp].data[0].x ENDELSE ;; new population array: newpopulation = dblarr(newpopsize,parsize) ;;------------------------------------------------------------ ;; --- POPULATE --- ;;------------------------------------------------------------ REPEAT BEGIN ;;------------------------------------------------------------ ;; BREED NEW POPULATION ;;------------------------------------------------------------ ;; marry individuums randomly: ;; index of fathers: marryindval = randomu(seed,popnum) marryind= sort(marryindval) fathers = population[marryind[0:popnum/2-1],*] mothers = population[marryind[popnum/2:popnum-1],*] ;; create new generation with random parameter selection: FOR child=0,childnum-1 DO BEGIN ;; value defining wether gen (=parameter) comes from mother ;; (>0.5) or father (<=0.5) genval = randomu(seed,popnum/2,parsize) ;; index for father/mother gens: fathergenind = where(genval LE 0.5) mothergenind = where(genval GT 0.5) ;; copulation children = dblarr(popnum/2,parsize) IF mothergenind[0] NE -1 THEN children[mothergenind] = mothers[mothergenind] IF fathergenind[0] NE -1 THEN children[fathergenind] = fathers[fathergenind] ;; populate new newpopulation[elitism_num+child*popnum/2:elitism_num+(child+1)*popnum/2-1,*] $ = children ENDFOR ;; preset empty most fittest individuums when starting (loop 0) newpopulation[0:elitism_num-1,*]= population[0:elitism_num-1,*] ;; crossover of parameters (genotype): ??? how to do that? ;;------------------------------------------------------------ ;; MUTATE ;;------------------------------------------------------------ ;; mutate parameters (genotype): -> randomly change parameters mutindval = randomu(seed,newpopsize,parsize) mutind = where(mutindval LT mutrate $ AND lindgen(newpopsize) GE elitism_num) ; do not mutate first one IF mutind[0] NE -1 THEN BEGIN ;; the overkill: create complete mutation: mutation = randomu(seed,newpopsize,parsize) ;; adjust limits: FOR j = 0,parsize-1 DO $ mutation[*,j] = mutation[*,j]* limitscal[j] $ + limitoffset[j] ;; apply mutation: newpopulation[mutind] = mutation[mutind] ENDIF ;;------------------------------------------------------------ ;; STANDARD FIT FOR DESIRED PARAMETERS ;;------------------------------------------------------------ IF n_elements(fitparam) NE 0 THEN BEGIN saved_param = (*env).parameter fit_param_index = cafeparam(env,fitparam,(*env).def_grp) ;; for each child -> fit: FOR child = 0,newpopsize-1 DO BEGIN ;; set current parameters: (*env).parameter[param_index[freeind]].value = $ reform(newpopulation[child,*]) ;; freeze all (*env).parameter[*].fixed = 1 ;; thaw interesting parameter(s): (*env).parameter[fit_param_index].fixed = 0 ;; ok, here we go cafe_fit,env,/quiet,iterstop=0 ;; progress report IF NOT keyword_set(quiet) AND (child MOD 10) EQ 0 THEN print, '.',format="(A,$)" ;; extract parameter newpopulation[child,*] = (*env).parameter[param_index[freeind]].value ENDFOR ;; restore parameters in environment (*env).parameter=saved_param ENDIF ;;------------------------------------------------------------ ;; TEST FOR FITNESS ;;------------------------------------------------------------ FOR child = 0, newpopsize-1 DO BEGIN ;; update tied parameters: (*env).parameter[param_index[freeind]].value = $ reform(newpopulation[child,*]) cafeupdatetie,env ;; define parameter: param = (*env).parameter[param_index].value ;; store udpated values: newpopulation[child,*] = param[freeind] ;; define fittness = chi^2 value fitness[child] = total((y-cafefitfun(x,param,env=env))^2/err^2) ENDFOR ;; get index of fittness order: rank_ind = sort(fitness) ; minimum ;; define array containing the ranking of each individuum ranking = dblarr(n_elements(rank_ind)) ranking[rank_ind]=dindgen(n_elements(rank_ind)) ;;------------------------------------------------------------ ;; SELECTION ;;------------------------------------------------------------ ;; create rank list of random entries, selpressure = 0: purely ;; random, selpressure=1: equal rank/randomness keepind = sort(randomu(seed,newpopsize) + $ selpressure*ranking/newpopsize) ; test only: ; keepind = rank_ind ;; keep fittest entries (elitism): IF elitism_num GT 0 THEN keepind[0:elitism_num-1] = rank_ind[0:elitism_num-1] ;; sort population according fittness: sort_ind = sort(ranking[keepind[0:popnum-1]]) ;; keep fittest children: population = newpopulation[keepind[sort_ind],*] sortfitness = fitness[keepind[sort_ind]] ; its fitness values (sorted) ; population = newpopulation[keepind[0:popnum-1],*] ;;------------------------------------------------------------ ;; SET/REPORT BEST PARAMETER: ;;------------------------------------------------------------ ;; best values found: parinfo[freeind].value=reform(population[0,*]) ;; store them: (*env).parameter[param_index] = parinfo ;; current best fit value: chi2 = fitness[keepind[sort_ind[0]]] ;; print result IF NOT keyword_set(quiet) THEN BEGIN print,"-----------------------------------------------------" print,"Iteration:",i print,"Chi^2:", chi2, "Chi^2_red:", chi2/DOF,format="(A10,G18.5,X,A10,G18.5)" FOR j=0,n_elements(parinfo)-1 DO $ print,parinfo[j].parname,parinfo[j].value ENDIF ;; plot current result IF NOT keyword_set(nosave) THEN BEGIN ;; mark former values as invalid (freeing will be done with ;; the GC): (*env).groups[(*env).res_grp].data[*].x = ptr_new() (*env).groups[(*env).res_grp].data[*].y = ptr_new() (*env).groups[(*env).res_grp].data[*].err= ptr_new() (*env).groups[(*env).res_grp].data[*].def= ptr_new() ;; save parameters: (*env).groups[(*env).res_grp].data[0].x = $ ptr_new(population) (*env).groups[(*env).res_grp].data[0].y = $ ptr_new(sortfitness) (*env).groups[(*env).res_grp].data[0].def = $ ptr_new(make_array(popnum,/byte,value=1)) (*env).groups[(*env).res_grp].data[0].selected = $ ptr_new(make_array(popnum,/byte,value=0)) ;; data file name: (*env).groups[(*env).res_grp].data[0].file = "-PARAM-" (*env).groups[(*env).res_grp].data[0].title= "PARAM" ENDIF ;; plot current result IF keyword_set(iterplot) THEN BEGIN cafe_plot,env,/quiet ENDIF ;; plot current result IF keyword_set(iterplotout) THEN BEGIN cafe_plotout,env,/append ENDIF ;;------------------------------------------------------------ ;; STOP CRITERIA ;;------------------------------------------------------------ ;; iterate i = i + 1 ;; increase change counter if equal IF lastchi2 EQ chi2 THEN nochangecount = nochangecount + 1 ELSE nochangecount = 0 ;; udpate last chi^2 lastchi2 = chi2 ;; stop if chi_red < 1 IF chi2/DOF LT 1 THEN stopped = 1 ;; stop if more than maxchange no improvement of chi^2 IF nochangecount GE maxchange THEN stopped = 2 ;; stop if maximum iteration timed out: IF i GT iternum THEN stopped = 3 ;; stop if 'q' pressed IF get_kbrd(0) EQ 'Q' THEN stopped = 4 IF stopped THEN break ;;------------------------------------------------------------ ;; MUTATE INCESTUOUS INDIVIDUUMS ;;------------------------------------------------------------ ;; look for uniform individuums, but keep first elite: twin_ind = where(sortfitness eq shift(sortfitness,1) AND indgen(popnum) GE elitism_num) ;; elitism: keep the fittest: ; IF n_elements(twin_ind) GT elitism_num THEN twin_ind = twin_ind[elitism_num:*] ;; number of twins twin_num = n_elements(twin_ind) ;; found -> mutate IF twin_ind[0] NE -1 THEN BEGIN ;; unresort to actual index: ; twin_ind = keepind[twin_ind] ;; mutate parameters (genotype): -> randomly change parameters mutindval = randomu(seed,twin_num,parsize) mutind = where(mutindval LT twinmutrate) IF mutind[0] NE -1 THEN BEGIN ;; the overkill: create complete mutation: mutation = randomu(seed,twin_num,parsize) ;; adjust limits: FOR j = 0,parsize-1 DO $ mutation[*,j] = mutation[*,j] * limitscal[j] $ + limitoffset[j] ;; apply mutation: twinpopulation = population[twin_ind,*] twinpopulation[mutind] = mutation[mutind] population[twin_ind,*]=twinpopulation ENDIF ENDIF ENDREP UNTIL stopped NE 0 ;; save best values for current try run: IF chi2 LT bestchi2 OR bestchi2 LT 0 THEN BEGIN bestparam = parinfo bestchi2 = chi2 ENDIF ;; ------------------------------------------------------------ ;; STORE PARAMETER RESULTS: ;; ------------------------------------------------------------ ;; all values proper: IF total(finite(bestparam.value)) EQ n_elements(bestparam) THEN BEGIN (*env).parameter[param_index] = bestparam ENDIF ELSE BEGIN cafereport,env, "Error: Some parameter values were infinite/not defined:" return ENDELSE ;; ------------------------------------------------------------ ;; SHOW FIT RESULTS: ;; ------------------------------------------------------------ ;; display result: cafe_show,env,"param+result",/transient ;; quit -> actually quit IF stopped EQ 4 THEN break ;; ------------------------------------------------------------ ;; END OUTER GA LOOP ;; ------------------------------------------------------------ ENDFOR RETURN END