Reinforcement learning without ARRAYS is horrible

Viewing 6 posts - 1 through 6 (of 6 total)

#53434
Degardin Arnaud
Participant
Junior

Hi everybody,

I'm sharing a piece of code for a reinforcement learning technique with 2 variables and a historical lookback of only 3 periods (very low). It is a horrible thing to write without ARRAYS!!

Unfortunately ProRealCode is very limited for such techniques: without arrays it is very time-consuming and the code gets very big. For this kind of thing MQL4 (MetaTrader) is more powerful! I'd like the ProRealTime developers to add arrays in the next releases!!! 😮

Do you have some ideas?

// Reinforcement learning on 2 indicators (E1 and E2) and 4 periods of lookback (t)
// to calculate future rewards R and states S
// @ARNAUD DEGARDIN NOV.2017

// Code parameter definition
DEFPARAM CumulateOrders = False // cumulated positions disabled

// parameters
SL = 0.3
TP = 0.5
per = 150
bbfact = 1
Rmin = 0.1 // min. reward in % (price variation)

// indicators
ATR5 = AverageTrueRange[5](close)
ATR20 = AverageTrueRange[20](close)
PEMA10 = close / ExponentialAverage[10](close)
    
// LOOKBACK:

// states S of Ei at t2
E12 = ATR5[2] / ATR20[2]
BBUPE12 = average[per](E12) + bbfact * std[per](E12)
BBDOWNE12 = average[per](E12) - bbfact * std[per](E12)

IF E12 < BBDOWNE12 + (BBUPE12 - BBDOWNE12) / 3 THEN
  SE12 = 10
ELSIF E12 < BBDOWNE12 + (BBUPE12 - BBDOWNE12) * 2 / 3 THEN
  SE12 = 20
ELSE
  SE12 = 30
ENDIF

E22 = PEMA10[2]
BBUPE22 = average[per](E22) + bbfact * std[per](E22)
BBDOWNE22 = average[per](E22) - bbfact * std[per](E22)

IF E22 < BBDOWNE22 + (BBUPE22 - BBDOWNE22) / 3 THEN
  SE22 = 1
ELSIF E22 < BBDOWNE22 + (BBUPE22 - BBDOWNE22) * 2 / 3 THEN
  SE22 = 2
ELSE
  SE22 = 3
ENDIF

// states S of Ei at t3
E13 = ATR5[3] / ATR20[3]
BBUPE13 = average[per](E13) + bbfact * std[per](E13)
BBDOWNE13 = average[per](E13) - bbfact * std[per](E13)

IF E13 < BBDOWNE13 + (BBUPE13 - BBDOWNE13) / 3 THEN
  SE13 = 10
ELSIF E13 < BBDOWNE13 + (BBUPE13 - BBDOWNE13) * 2 / 3 THEN
  SE13 = 20
ELSE
  SE13 = 30
ENDIF

E23 = PEMA10[3]
BBUPE23 = average[per](E23) + bbfact * std[per](E23)
BBDOWNE23 = average[per](E23) - bbfact * std[per](E23)

IF E23 < BBDOWNE23 + (BBUPE23 - BBDOWNE23) / 3 THEN
  SE23 = 1
ELSIF E23 < BBDOWNE23 + (BBUPE23 - BBDOWNE23) * 2 / 3 THEN
  SE23 = 2
ELSE
  SE23 = 3
ENDIF

// states S of Ei at t4
E14 = ATR5[4] / ATR20[4]
BBUPE14 = average[per](E14) + bbfact * std[per](E14)
BBDOWNE14 = average[per](E14) - bbfact * std[per](E14)

IF E14 < BBDOWNE14 + (BBUPE14 - BBDOWNE14) / 3 THEN
  SE14 = 10
ELSIF E14 < BBDOWNE14 + (BBUPE14 - BBDOWNE14) * 2 / 3 THEN
  SE14 = 20
ELSE
  SE14 = 30
ENDIF

E24 = PEMA10[4]
BBUPE24 = average[per](E24) + bbfact * std[per](E24)
BBDOWNE24 = average[per](E24) - bbfact * std[per](E24)

IF E24 < BBDOWNE24 + (BBUPE24 - BBDOWNE24) / 3 THEN
  SE24 = 1
ELSIF E24 < BBDOWNE24 + (BBUPE24 - BBDOWNE24) * 2 / 3 THEN
  SE24 = 2
ELSE
  SE24 = 3
ENDIF
    
// Attribute future rewards to each state (prediction shifted by -1 period)
// rewards Rti
Rt1 = (close[1] - close[2]) / close[2]
Rt2 = (close[2] - close[3]) / close[3]
Rt3 = (close[3] - close[4]) / close[4]

//S1=SE11+SE21 // not used here
S2 = SE12 + SE22
S3 = SE13 + SE23
S4 = SE14 + SE24

// definition of the actions based on states and FUTURE rewards
// action on S2 - A2: with future reward Rt1
IF Rt1 > Rmin / 100 THEN
  A2 = 1 // BUY associated to S2
ELSIF Rt1 < -Rmin / 100 THEN
  A2 = -1 // SELL associated to S2
ELSE
  A2 = 0
ENDIF

// action on S3 - A3: with future reward Rt2
IF Rt2 > Rmin / 100 THEN
  A3 = 1 // BUY associated to S3
ELSIF Rt2 < -Rmin / 100 THEN
  A3 = -1 // SELL associated to S3
ELSE
  A3 = 0
ENDIF

// action on S4 - A4: with future reward Rt3
IF Rt3 > Rmin / 100 THEN
  A4 = 1 // BUY associated to S4
ELSIF Rt3 < -Rmin / 100 THEN
  A4 = -1 // SELL associated to S4
ELSE
  A4 = 0
ENDIF
    
    
// CURRENT state S at t0
E10 = ATR5 / ATR20
BBUPE10 = average[per](E10) + bbfact * std[per](E10)
BBDOWNE10 = average[per](E10) - bbfact * std[per](E10)
IF E10 < BBDOWNE10 + (BBUPE10 - BBDOWNE10) / 3 THEN
  SE10 = 10
ELSIF E10 < BBDOWNE10 + (BBUPE10 - BBDOWNE10) * 2 / 3 THEN
  SE10 = 20
ELSE
  SE10 = 30
ENDIF

E20 = PEMA10
BBUPE20 = average[per](E20) + bbfact * std[per](E20)
BBDOWNE20 = average[per](E20) - bbfact * std[per](E20)
IF E20 < BBDOWNE20 + (BBUPE20 - BBDOWNE20) / 3 THEN
  SE20 = 1
ELSIF E20 < BBDOWNE20 + (BBUPE20 - BBDOWNE20) * 2 / 3 THEN
  SE20 = 2
ELSE
  SE20 = 3
ENDIF

// current action definition - based on the states and actions matrix
// (A0 is reset first, then the actions of all matching past states are cumulated)
S0 = SE10 + SE20
A0 = 0 // do nothing by default
IF S0 = S2 THEN
  A0 = A0 + A2
ENDIF
IF S0 = S3 THEN
  A0 = A0 + A3
ENDIF
IF S0 = S4 THEN
  A0 = A0 + A4
ENDIF
    
// entry and exit logic
TOSELL = (A0 <= -1)
TOBUY = (A0 >= 1)

IF TOSELL THEN
  SELLSHORT 1 CONTRACT AT MARKET
ENDIF
IF TOBUY THEN
  BUY 1 CONTRACT AT MARKET
ENDIF

IF SHORTONMARKET AND TOBUY THEN
  EXITSHORT AT MARKET
ENDIF
IF LONGONMARKET AND TOSELL THEN
  SELL AT MARKET
ENDIF

SET STOP %LOSS SL
SET TARGET %PROFIT TP
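
For comparison, here is a minimal Python sketch (hypothetical names, not part of the strategy above) of how the repeated per-period state blocks collapse once arrays are available: each past period's indicator values are bucketed into levels, combined into a composite state code, and voted on with its future reward.

```python
# Hypothetical Python sketch: the copy-pasted per-period state blocks
# above become one loop once arrays exist. Names (bucket,
# actions_from_history) are illustrative, not from the strategy.

def bucket(value, low, high):
    """Code a value into level 1-3 inside the band [low, high]."""
    third = (high - low) / 3
    if value < low + third:
        return 1
    if value < low + 2 * third:
        return 2
    return 3

def actions_from_history(e1, e2, rewards, rmin):
    """Map each observed composite state to a cumulated +1/-1 action.

    e1, e2  : two indicator series
    rewards : future return observed after each state
    rmin    : minimum return (as a fraction) counted as a buy/sell
    """
    lo1, hi1 = min(e1), max(e1)
    lo2, hi2 = min(e2), max(e2)
    table = {}
    for x1, x2, r in zip(e1, e2, rewards):
        state = 10 * bucket(x1, lo1, hi1) + bucket(x2, lo2, hi2)
        if r > rmin:
            table[state] = table.get(state, 0) + 1  # buy vote
        elif r < -rmin:
            table[state] = table.get(state, 0) - 1  # sell vote
        else:
            table.setdefault(state, 0)              # neutral
    return table

# toy data: e1 plays the role of ATR5/ATR20, e2 of close/EMA10
e1 = [0.8, 1.2, 1.0, 1.3, 0.7]
e2 = [0.99, 1.02, 1.00, 1.05, 0.98]
rewards = [0.004, -0.006, 0.0, 0.008, -0.002]
table = actions_from_history(e1, e2, rewards, rmin=0.001)
current_state = 10 * bucket(0.85, min(e1), max(e1)) + bucket(1.0, min(e2), max(e2))
action = table.get(current_state, 0)  # > 0 buy, < 0 sell, 0 do nothing
```

Each extra lookback period is one more loop iteration here, instead of another hand-written IF block.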
    

     

#53440
    Nicolas
    Keymaster
    Master

You are right, some things are difficult (or impossible) to code without array capabilities, because the programming language was designed in its beginnings to be as simple as possible. Today, since version 10 of ProRealTime and the possibility of automatic trading, and since version 10.3 with its new graphic possibilities, the need for arrays is felt more and more; it is undeniable! I also hope it will happen soon, and I will try to find out more; it happens so often that I am blocked on a code.. 😐


About your code, I do not understand exactly what you want to do. Is there no possibility of making it simpler with a loop?

#53443
    juanj
    Participant
    Master

    I love and can really appreciate what you are attempting here. Especially given the constraints presented by PRT’s lack of array support.

    I must admit I find it difficult to entirely follow the logic in your code. Maybe you can add a few more descriptive comments in your code?

    Anyway, I have attempted similar techniques using PRT in the form of Heuristics but as you say without arrays the code becomes very bulky very quickly.

    Did you test the above on any specific instrument and timeframe?

#53449
    Degardin Arnaud
    Participant
    Junior

Hi Nicolas, thanks for your reply…

It's frustrating, because being able to save data in arrays would have big potential, as in my example…

To explain the principle of reinforcement learning (see the Wikipedia image) with an explanation:

The typical framing of a Reinforcement Learning (RL) scenario: an agent takes actions in an environment, which is interpreted into a reward and a representation of the state, which are fed back into the agent.

• You look at the state S over the previous x periods of time and "code" the states by using indicators.
• e.g. if you define 3 levels for each indicator, the state 312 for 3 variables A, B, C means A is in state 3, B in state 1 and C in state 2.
• The reward R corresponds to the price increase in the future after the state (you can shift by 1 period or more).
• The action A you define corresponds to BUY or SELL based on future rewards.
• If more than 1 occurrence of a state has different rewards, you assign a probability.

So arrays are useful because you need to store a lot of variables for each state you want the machine to learn, multiplied by the number of indicators you want to use. E.g. 4 indicators with 4 levels and a lookback of 4 periods means 4x4x4=64 variables to store in PRT code, to compare with the current state and predict the action to take.
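
The digit-based state coding described above can be sketched like this (illustrative Python, not from the thread; `encode_state`/`decode_state` are hypothetical names):

```python
# Illustrative sketch of the digit state coding described above:
# with 3 indicators A, B, C, each in one of 3 levels, the composite
# state 312 means A is at level 3, B at level 1, C at level 2.

def encode_state(levels):
    """Pack per-indicator levels, e.g. [3, 1, 2] -> 312."""
    code = 0
    for level in levels:
        code = code * 10 + level
    return code

def decode_state(code, n_indicators):
    """Unpack a composite code, e.g. (312, 3) -> [3, 1, 2]."""
    levels = []
    for _ in range(n_indicators):
        levels.append(code % 10)
        code //= 10
    return levels[::-1]

state = encode_state([3, 1, 2])
print(state)                   # 312
print(decode_state(state, 3))  # [3, 1, 2]
# 3 indicators with 3 levels each give 3 ** 3 = 27 possible states.
```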

Attachment: Reinforcement_learning_diagram.svg_.png
#53560
Degardin Arnaud
Participant
Junior

Hi @juanj, thanks for your comments above… I see it now, sorry! Yes, I tried to test it on PRT, but to code reinforcement learning you need to look back at least 100-1000 periods. In the above code you can look back only up to 3 periods, so it is insufficient to compare the states with the current one and make predictions. I'm trying it with arrays in MQL4; if you know it, it seems more difficult, but the code is much shorter thanks to arrays and loops: (MQL4 code ONLY)
string symb1 = "GER30"; // symbol used for calculation

double Rmin = 0.05;        // min. reward in % (price variation)
int PeriodsLookback = 200; // number of lookback periods
int shiftR = 3;            // shift of the reward calculation

// LOOKBACK arrays (sized to cover PeriodsLookback + shiftR):

// indicator series - dim1: period number
double E1[256];
double E2[256];
double E3[256];
// per-indicator state arrays - dim1: period number
double S1[256];
double S2[256];
double S3[256];
// global state code array - dim1: period number
double St[256];
// rewards array - dim1: period number
double Rt[256];
// actions array - dim1: period number
double At[256];
At[0] = 0; // first-time definition

for(int i = 0; i < PeriodsLookback; i++)
{
   E1[i] = iATR(symb1, 0, 5, i) / iATR(symb1, 0, 20, i);
   E2[i] = iClose(symb1, 0, i) / iMA(symb1, 0, 10, 0, MODE_SMMA, PRICE_MEDIAN, i);
   E3[i] = iADX(symb1, 0, 5, PRICE_MEDIAN, MODE_MAIN, i) / 100;

   // state definition for 3 levels 1-3 of "ABC"
   if(E1[i] < 0.33)                       { S1[i] = 100; }
   else if(E1[i] >= 0.33 && E1[i] < 0.66) { S1[i] = 200; }
   else                                   { S1[i] = 300; }

   if(E2[i] < 0.33)                       { S2[i] = 10; }
   else if(E2[i] >= 0.33 && E2[i] < 0.66) { S2[i] = 20; }
   else                                   { S2[i] = 30; }

   if(E3[i] < 0.33)                       { S3[i] = 1; }
   else if(E3[i] >= 0.33 && E3[i] < 0.66) { S3[i] = 2; }
   else                                   { S3[i] = 3; }

   // global state code at period i
   St[i] = S1[i] + S2[i] + S3[i];
   // attribute future rewards to each state (prediction shifted by shiftR periods)
   Rt[i] = (iClose(symb1, 0, i) - iClose(symb1, 0, i + shiftR)) / iClose(symb1, 0, i + shiftR);
   // feed actions based on states and FUTURE rewards:
   // the action at t = i + shiftR is defined by the reward observed at t = i
   if(Rt[i] > Rmin / 100)       { At[i + shiftR] = 1; }  // TO BUY
   else if(Rt[i] < -Rmin / 100) { At[i + shiftR] = -1; } // TO SELL
   else                         { At[i + shiftR] = 0; }
} // END LOOKBACK

// current action definition:
// look back for past states equal to the current one
for(int x = 1; x < PeriodsLookback; x++)
{
   if(St[0] == St[x])
   {
      At[0] = At[x] + At[0]; // cumulate +1/-1 actions to get a probability
   }
}

// entry conditions from the cumulated actions
bool LongEntryCondition = false;
bool ShortEntryCondition = false;
if(At[0] >= 1)  { LongEntryCondition = true; }
if(At[0] <= -1) { ShortEntryCondition = true; }
This code is not safe; use it only on a demo account! I like PRT because it is easier to backtest with than MetaTrader, so I hope it will be possible to run code like this with arrays in the near future. I'm pleased to help the community and to receive feedback!
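
The final lookup loop of the MQL4 code (cumulating the ±1 actions of every past bar whose state matches the current one) can be mirrored in a few lines of Python; this is a hypothetical sketch, not the author's code:

```python
# Hypothetical Python mirror of the final MQL4 lookup loop: cumulate
# the +1/-1 actions of every past bar whose composite state code
# equals the current bar's state, giving a buy/sell "probability".

def vote(states, actions):
    """states[0]/actions[0] belong to the current bar; older bars follow."""
    total = 0
    for x in range(1, len(states)):
        if states[x] == states[0]:  # same composite state as now
            total += actions[x]     # cumulate its recorded action
    return total

states  = [312, 111, 312, 223, 312]
actions = [0,   1,   1,   -1,  -1]
print(vote(states, actions))  # 312 recurs at x=2 (+1) and x=4 (-1) -> 0
```

A positive total maps to the long entry condition, a negative total to the short one, exactly as in the `At[0]` accumulation above.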
#53568
Degardin Arnaud
Participant
Junior

… another useful feature of MQL4 is that you can use different securities in your code (e.g. for spread calculation…) and look at different timeframes too! 😉 I'd like to do that in PRT because it's faster!


Summary

This topic contains 5 replies,
has 3 voices, and was last updated by Degardin Arnaud
8 years, 3 months ago.

Topic Details
Forum: ProOrder: Automated Strategies & Backtesting
Language: English
Started: 11/21/2017
Status: Active
Attachments: 1 file