Reinforcement learning without ARRAYS is horrible
This topic has 5 replies, 3 voices, and was last updated 7 years ago by Degardin Arnaud.
11/21/2017 at 1:33 PM #53434
Hi everybody,
I am sharing a piece of code for a reinforcement learning technique with 2 variables and only 3 periods of lookback (very low). It is horrible to write without ARRAYS!!
Unfortunately ProRealCode is very limited for this kind of technique: without arrays it is very time consuming and produces very big code. For this kind of thing MQL4 (MetaTrader) is more powerful! I would like the ProRealTime developers to add arrays in a future release!! 😮
Do you have any ideas?
//reinforcement learning on 2 indicators (E1 and E2) and 4 periods of lookback (t)
//to calculate future rewards R and States S
//@ARNAUD DEGARDIN NOV.2017
// Code parameter definitions
DEFPARAM CumulateOrders = False // Cumulated positions disabled
//parameters
SL=0.3
TP=0.5
per=150
bbfact=1
Rmin=0.1 //perc. min. rewards (var price in %)
// indicators
ATR5 = AverageTrueRange[5](close)
ATR20 = AverageTrueRange[20](close)
PEMA10 = Close/ExponentialAverage[10](close)
//LOOKBACK:
//states S Ei at t2
E12=ATR5[2]/ATR20[2]
BBUPE12=average[per](E12)-(- bbfact*std[per](E12))
BBDOWNE12=average[per](E12)-bbfact*std[per](E12)
if E12<(BBUPE12-BBDOWNE12)/3 THEN
SE12=10
else
if(E12<(BBUPE12-BBDOWNE12)/3*2 AND E12>(BBUPE12-BBDOWNE12)/3) THEN
SE12=20
else
SE12=30
endif
endif
E22=PEMA10[2]
BBUPE22=average[per](E22)-(- bbfact*std[per](E22))
BBDOWNE22=average[per](E22)-bbfact*std[per](E22)
if E22<(BBUPE22-BBDOWNE22)/3 THEN
SE22=1
else
if(E22<(BBUPE22-BBDOWNE22)/3*2 AND E22>(BBUPE22-BBDOWNE22)/3) THEN
SE22=2
else
SE22=3
endif
endif
//states S Ei at t3
E13=ATR5[3]/ATR20[3]
BBUPE13=average[per](E13)-(- bbfact*std[per](E13))
BBDOWNE13=average[per](E13)-bbfact*std[per](E13)
if E13<(BBUPE13-BBDOWNE13)/3 THEN
SE13=10
else
if(E13<(BBUPE13-BBDOWNE13)/3*2 AND E13>(BBUPE13-BBDOWNE13)/3) THEN
SE13=20
else
SE13=30
endif
endif
E23=PEMA10[3]
BBUPE23=average[per](E23)-(- bbfact*std[per](E23))
BBDOWNE23=average[per](E23)-bbfact*std[per](E23)
if E23<(BBUPE23-BBDOWNE23)/3 THEN
SE23=1
else
if(E23<(BBUPE23-BBDOWNE23)/3*2 AND E23>(BBUPE23-BBDOWNE23)/3) THEN
SE23=2
else
SE23=3
endif
endif
//states S Ei at t4
E14=ATR5[4]/ATR20[4]
BBUPE14=average[per](E14)-(- bbfact*std[per](E14))
BBDOWNE14=average[per](E14)-bbfact*std[per](E14)
if E14<(BBUPE14-BBDOWNE14)/3 THEN
SE14=10
else
if(E14<(BBUPE14-BBDOWNE14)/3*2 AND E14>(BBUPE14-BBDOWNE14)/3) THEN
SE14=20
else
SE14=30
endif
endif
E24=PEMA10[4]
BBUPE24=average[per](E24)-(- bbfact*std[per](E24))
BBDOWNE24=average[per](E24)-bbfact*std[per](E24)
if E24<(BBUPE24-BBDOWNE24)/3 THEN
SE24=1
else
if(E24<(BBUPE24-BBDOWNE24)/3*2 AND E24>(BBUPE24-BBDOWNE24)/3) THEN
SE24=2
else
SE24=3
endif
endif
//Attribute future Rewards to each State (shift prediction -1 period)
// rewards Rti
Rt1=(close[1]-close[2])/(close[2])
Rt2=(close[2]-close[3])/(close[3])
Rt3=(close[3]-close[4])/(close[4])
//S1=SE11+SE21 //not used here
S2=SE12+SE22
S3=SE13+SE23
S4=SE14+SE24
//definition of action based on State and FUTURE rewards
//Action on S2 - A2 : with future Rt1
IF Rt1 > Rmin/100 THEN
A2=1 //TO BUY, ASSOCIATED TO S2
ELSE
IF Rt1 < -Rmin/100 THEN
A2=-1 //TO SELL, ASSOCIATED TO S2
ELSE
A2=0
ENDIF
ENDIF
//Action on S3 - A3 : with future Rt2
IF Rt2 > Rmin/100 THEN
A3=1 //TO BUY, ASSOCIATED TO S3
ELSE
IF Rt2 < -Rmin/100 THEN
A3=-1 //TO SELL, ASSOCIATED TO S3
ELSE
A3=0
ENDIF
ENDIF
//Action on S4 - A4 : with future Rt3
IF Rt3 > Rmin/100 THEN
A4=1 //TO BUY, ASSOCIATED TO S4
ELSE
IF Rt3 < -Rmin/100 THEN
A4=-1 //TO SELL, ASSOCIATED TO S4
ELSE
A4=0
ENDIF
ENDIF
//ACTUAL state S at t0
E10=ATR5/ATR20
BBUPE10=average[per](E10)-(- bbfact*std[per](E10))
BBDOWNE10=average[per](E10)-bbfact*std[per](E10)
if E10<(BBUPE10-BBDOWNE10)/3 THEN
SE10=10
else
if(E10<(BBUPE10-BBDOWNE10)/3*2 AND E10>(BBUPE10-BBDOWNE10)/3) THEN
SE10=20
else
SE10=30
endif
endif
E20=PEMA10
BBUPE20=average[per](E20)-(- bbfact*std[per](E20))
BBDOWNE20=average[per](E20)-bbfact*std[per](E20)
if E20<(BBUPE20-BBDOWNE20)/3 THEN
SE20=1
else
if(E20<(BBUPE20-BBDOWNE20)/3*2 AND E20>(BBUPE20-BBDOWNE20)/3) THEN
SE20=2
else
SE20=3
endif
endif
//then actual Action definition - based on the States and Actions matrix:
S0=SE10+SE20
IF S0=S2 THEN
A0=A2
ENDIF
IF S0=S3 THEN
A0=A3+A0
ENDIF
IF S0=S4 THEN
A0=A4+A0
ELSE
A0=0 //do nothing
ENDIF
//entry and exit logic
TOSELL= (A0<=-1)
TOBUY= (A0>=1)
IF TOSELL THEN
SELLSHORT 1 CONTRACT AT MARKET
ENDIF
IF TOBUY THEN
BUY 1 CONTRACT AT MARKET
ENDIF
IF SHORTONMARKET and TOBUY THEN
EXITSHORT AT MARKET
ENDIF
IF LONGONMARKET and TOSELL THEN
SELL AT MARKET
ENDIF
SET STOP %LOSS SL
SET TARGET %PROFIT TP

11/21/2017 at 1:43 PM #53440
You are right, some things are difficult (or impossible) to code without array capabilities, because the programming language was designed from the beginning to be as simple as possible. Today, since version 10 of ProRealTime and the possibility of automatic trading, and since version 10.3 with its new graphic possibilities, the need for arrays is felt more and more; it is undeniable! I also hope it will happen soon, and I will try to find out more. It happens so often that I am blocked on a code because of this.. 😐
About your code, I do not quite understand what you want to do exactly. Is there no possibility to make it simpler with a loop?
11/21/2017 at 1:50 PM #53443
I love and can really appreciate what you are attempting here, especially given the constraints presented by PRT’s lack of array support.
I must admit I find it difficult to entirely follow the logic in your code. Maybe you could add a few more descriptive comments?
Anyway, I have attempted similar techniques using PRT in the form of Heuristics but as you say without arrays the code becomes very bulky very quickly.
Did you test the above on any specific instrument and timeframe?
11/21/2017 at 2:02 PM #53449
Hi Nicolas, thanks for your reply…
It’s frustrating, because being able to save data in arrays would have big potential, as in my example…
To explain the principle of reinforcement learning (see the Wikipedia image):
The typical framing of a Reinforcement Learning (RL) scenario: an agent takes actions in an environment, which are interpreted into a reward and a representation of the state, which are fed back into the agent.
- You look at the state S of the previous x periods of time and “code” the states using indicators.
- i.e. if you define 3 levels for each indicator, then for 3 variables A, B, C the state 312 means A is in state 3, B in state 1 and C in state 2.
- The reward R corresponds to the price increase in the future, after the state (you can shift by 1 period or more).
- The Action A you define corresponds to BUY or SELL, based on future rewards.
- If several occurrences of the same state have different rewards, you assign a probability.
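The reward-to-action mapping described in the bullets above can be sketched language-neutrally. Here is a minimal Python version; the function name `action_from_reward` and the `rmin_pct` parameter are illustrative, mirroring the `Rmin` threshold used in the posted code:

```python
def action_from_reward(future_return: float, rmin_pct: float = 0.1) -> int:
    """Map a future return to an action: +1 (BUY), -1 (SELL), 0 (do nothing).

    rmin_pct is the minimum reward threshold in percent, like Rmin in the code,
    so a return must exceed rmin_pct/100 in magnitude to trigger an action.
    """
    threshold = rmin_pct / 100.0
    if future_return > threshold:
        return 1
    if future_return < -threshold:
        return -1
    return 0
```

With the default 0.1% threshold, a +0.5% future return maps to +1, a -0.5% return to -1, and anything inside the +/-0.1% band to 0.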
So arrays are useful because you need to store many variables for each state you want the machine to learn, multiplied by the number of indicators you want to use; i.e. 4 indicators with 4 levels and a lookback of 4 periods means 4x4x4=64 variables to store in PRT code, to compare with the actual state and predict the action to take.
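The state-coding idea (e.g. state 312 for indicators in levels 3, 1 and 2) can be sketched as follows; this is an illustrative Python sketch, not part of the original code, with hypothetical names `level` and `state_code` and the 0.33/0.66 cutoffs taken from the MQL4 post:

```python
def level(value: float, low: float = 0.33, high: float = 0.66) -> int:
    """Bucket a normalized indicator value into level 1, 2 or 3."""
    if value < low:
        return 1
    if value < high:
        return 2
    return 3

def state_code(indicators: list[float]) -> int:
    """Combine per-indicator levels into one composite state code.

    [a, b, c] -> 100*level(a) + 10*level(b) + level(c),
    the same digit trick as S1+S2+S3 (100s, 10s, 1s) in the posted code.
    """
    code = 0
    for v in indicators:
        code = code * 10 + level(v)
    return code
```

For example, indicator readings of 0.9, 0.1 and 0.5 encode to state 312, matching the A=3, B=1, C=2 example above.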
11/22/2017 at 10:59 AM #53560
Hi @juanj
Thanks for your comments above… I see it now, sorry!
Yes, I tried to test it on PRT, but to code reinforcement learning you need to look back at least 100-1000 periods. In the above code you can look back only up to 3 periods, which is insufficient to compare the states with the current one and make predictions…
I’m trying it with arrays in MQL4; if you know it, it seems more difficult, but the code is much shorter thanks to arrays and loops:
(MQL4 code ONLY)
string symb1="GER30"; //used for calculation
double Rmin=0.05; //perc. min. rewards (var price in %)
int PeriodsLooback=200; //number of periods of lookback
int shiftR=3; //shift of reward calculation
//LOOKBACK arrays:
//arrays feeding of indicators - dim1: period number
//(arrays sized 203 = PeriodsLooback + shiftR; the original declared [100], too small for the 200-period loop)
double E1[203];
double E2[203];
double E3[203];
//states arrays definition - dim1: period number
double S1[203];
double S2[203];
double S3[203];
//global state code array definition - dim1: period number
double St[203];
//array feeding of rewards - dim1: period number
double Rt[203];
//array feeding of Actions - dim1: period number
double At[203];
At[0]=0; //first time definition
for(int i=0;i<PeriodsLooback;i++)
{
   E1[i]=iATR(symb1,0,5,i)/iATR(symb1,0,20,i);
   E2[i]=iClose(symb1,0,i)/iMA(symb1,0,10,0,MODE_SMMA,PRICE_MEDIAN,i);
   E3[i]=iADX(symb1,0,5,PRICE_MEDIAN,MODE_MAIN,i)/100;
   // states definition for 3 levels 1-3 of "ABC"
   if (E1[i]<0.33) { S1[i]=100; }
   else if(E1[i]>0.33 && E1[i]<0.66) { S1[i]=200; }
   else if(E1[i]>0.66) { S1[i]=300; }
   if (E2[i]<0.33) { S2[i]=10; }
   else if(E2[i]>0.33 && E2[i]<0.66) { S2[i]=20; }
   else if(E2[i]>0.66) { S2[i]=30; }
   if (E3[i]<0.33) { S3[i]=1; }
   else if(E3[i]>0.33 && E3[i]<0.66) { S3[i]=2; }
   else if(E3[i]>0.66) { S3[i]=3; }
   //global state code calculation at period t
   St[i]=S1[i]+S2[i]+S3[i];
   //Attribute future Rewards to each State (shifted prediction by shiftR periods)
   // rewards Rti calculation for future prediction
   Rt[i]=(iClose(symb1,0,i)-iClose(symb1,0,i+shiftR))/iClose(symb1,0,i+shiftR);
   //feeding actions based on States and FUTURE rewards
   // i.e.: the Action at t=i+shiftR is defined by the Reward at t=i
   if (Rt[i] > Rmin/100) { At[i+shiftR]=1; } //TO BUY
   else if (Rt[i] < -Rmin/100) { At[i+shiftR]=-1; } //TO SELL (the original read Rt[i+shiftR] here, inconsistent with the other branches)
   else { At[i+shiftR]=0; }
} //END LOOKBACK
//then actual Action definition:
//LOOKING BACK FOR THE SAME STATE AS THE ACTUAL ONE
for(int x=1;x<PeriodsLooback;x++)
{
   if (St[0]==St[x])
   {
      At[0]=At[x]+At[0]; //cumulate 1 or -1 actions to find prob.
   }
}
//evaluation of the actual state
if(At[0]>=1) { LongEntryCondition = true; }
if(At[0]<= -1) { ShortEntryCondition = true; }

This code is not guaranteed; use it only on a demo account!
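The second loop in the MQL4 post, which matches the current state against history and cumulates the recorded actions into a vote, can be sketched in Python; `vote_action`, `states` and `actions` are illustrative names, not part of the original code:

```python
def vote_action(states: list[int], actions: list[int]) -> int:
    """Cumulate the actions of every past bar whose state code equals the
    current state (states[0]), as in the MQL4 lookback loop over St/At.

    Returns the summed vote: >= 1 suggests a long entry, <= -1 a short entry.
    """
    current = states[0]
    vote = 0
    for past_state, past_action in zip(states[1:], actions[1:]):
        if past_state == current:
            vote += past_action
    return vote
```

For example, if the current state 312 occurred twice in the lookback window and both times was followed by a +1 (buy) action, the vote is +2, which would set the long entry condition.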
I like PRT because it is easier to backtest with than MetaTrader, so I hope it will be possible to run code like this, with arrays, in the near future.
I’m pleased to help the community and to receive feedback!
11/22/2017 at 11:08 AM #53568
… another useful feature of MQL4 is that you can use different securities in your code (i.e. for spread calculation…) and also look at different time frames! 😉
I’d like to do this in PRT because it’s faster!