MATLAB Simulation of Q-Learning-Based Cell Range Extension (CRE) in Heterogeneous Networks

1. Simulation Results

The simulation results obtained with MATLAB 2022a are as follows:

 

2. Theoretical Background

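        Q-Learning-based Cell Range Extension (CRE) is a method for optimizing the performance of heterogeneous wireless networks. A heterogeneous network consists of different types of base stations (macro, micro, pico, and so on) that differ in transmit power, coverage, and capacity. CRE adjusts a base station's transmit power or bias parameter so that users are distributed more evenly across the network, which improves overall network performance and user experience.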

 

2.1 Q-Learning Overview

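       In a heterogeneous network, users tend to attach to the macro base station because its transmit power is higher. As a result, the micro and pico base stations are lightly loaded while the macro base station is overloaded. This effect is known as "cell selection bias" or "load imbalance". CRE addresses it by adjusting the coverage of each base station so that users are spread more evenly across the different base station types.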
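To make the effect of a bias concrete, here is a minimal MATLAB sketch of biased cell selection for a single macro/pico pair. The RSRP values, the 6 dB bias, and all variable names are illustrative assumptions and are not taken from the simulation in Section 3.

```matlab
% Hypothetical received powers in dBm for 5 UEs (illustrative values only).
rsrp_macro = [-70 -80 -85 -90 -95];   % from the macro base station
rsrp_pico  = [-78 -84 -83 -99 -90];   % from the pico base station
bias_dB    = 6;                       % assumed CRE bias applied to the pico cell

% Without CRE: a UE picks the pico cell only if its raw RSRP is stronger.
pico_no_cre = rsrp_pico > rsrp_macro;

% With CRE: the bias is added to the pico RSRP before the comparison.
pico_cre    = (rsrp_pico + bias_dB) > rsrp_macro;

fprintf('UEs on pico without CRE: %d, with CRE: %d\n', ...
        sum(pico_no_cre), sum(pico_cre));
```

With these numbers the bias moves one extra UE from the macro cell to the pico cell, which is the offloading behaviour that CRE is meant to produce.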

 

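       Q-Learning is a value-based reinforcement learning algorithm: it learns a Q-value function that estimates the long-term return of taking each action in each state. In the CRE setting, each base station can be treated as an agent that interacts with its environment (the other base stations and the users in the network) and learns how to adjust its transmit power or bias parameter to optimize network performance.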

 

2.2 Q-Learning-Based CRE Algorithm

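State definition: the state describes the current condition of the network, including each base station's load, the user distribution, and the channel quality.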

 

Action definition: an action is an adjustment that a base station can make to its transmit power or bias parameter.

 

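Reward function design: the reward should reflect an improvement in network performance. For example, it can be defined as the degree of load balancing, the throughput gain, or the improvement in user satisfaction.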
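One possible way to turn the state, action, and reward definitions into numbers is sketched below. Everything here is an assumption made for illustration: the discrete bias set, the five-level load quantization, and the use of Jain's fairness index as a load-balancing reward are not taken from the original program.

```matlab
% Assumed discrete action set: candidate CRE bias values in dB.
actions = 0:2:12;

% Assumed state: the pico cell's share of attached UEs, quantized to 5 levels.
n_users    = 20;
n_pico     = 5;                                  % UEs currently on the pico cell
load_ratio = n_pico / n_users;                   % 0.25
state      = min(floor(load_ratio * 5) + 1, 5);  % state index in 1..5

% Assumed reward: Jain's fairness index of the per-cell loads
% (equals 1 when the loads are perfectly balanced).
loads  = [n_users - n_pico, n_pico];             % [macro, pico]
reward = sum(loads)^2 / (numel(loads) * sum(loads.^2));

fprintf('state = %d, reward = %.2f, %d candidate actions\n', ...
        state, reward, numel(actions));
```

A fairness-style reward is only one option; a cost such as the outage ratio computed in Step (8) of Section 3 could be used instead, in which case the agent minimizes rather than maximizes.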

 

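Q-value update: the Q-value function estimates the long-term return of taking a given action in a given state. In Q-Learning it is updated by the standard rule:

$$Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]$$

where $\alpha$ is the learning rate, $\gamma$ is the discount factor, $r$ is the reward, and $s'$ is the next state.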
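As a minimal sketch, one tabular Q-Learning step, Q(s,a) ← Q(s,a) + α[r + γ max Q(s',·) − Q(s,a)], can be written in MATLAB as follows. The table size, the indices, and the reward are hypothetical; only the values of `alp` and `gam` match those used in Section 3.

```matlab
alp = 0.5;                 % learning rate
gam = 0.9;                 % discount factor

Q = zeros(3, 2);           % hypothetical Q-table: 3 states x 2 actions
Q(3, :) = [0.4 0.2];       % pretend state 3 has already been visited

s = 1; a = 2;              % current state and chosen action
r = 1;                     % reward observed after taking action a
s_next = 3;                % resulting next state

td_target = r + gam * max(Q(s_next, :));          % 1 + 0.9*0.4 = 1.36
Q(s, a)   = Q(s, a) + alp * (td_target - Q(s, a)); % 0 + 0.5*1.36 = 0.68

disp(Q);
```

Only the entry for the visited state-action pair changes; all other entries of the table are left untouched.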

 

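Policy selection: in each state, the base station executes the action with the largest Q-value (or the lowest, when the Q-value encodes a cost, as in the Step (4) comment in Section 3).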

 

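Exploration and exploitation: to balance trying new actions against exploiting the best known action, an ε-greedy policy or another exploration strategy can be used.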
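The ε-greedy idea can be sketched in a few lines of MATLAB. The Q-values, ε = 0.1, and the number of trials are assumptions chosen for illustration.

```matlab
rng(0);                          % fixed seed so the sketch is repeatable
epsilon  = 0.1;                  % assumed exploration probability
Q_row    = [0.2 0.9 0.5];        % hypothetical Q-values of the current state
n_trials = 1000;
chosen   = zeros(1, n_trials);

for t = 1:n_trials
    if rand < epsilon
        chosen(t) = randi(numel(Q_row));   % explore: uniform random action
    else
        [~, chosen(t)] = max(Q_row);       % exploit: action with the largest Q-value
    end
end

greedy_fraction = mean(chosen == 2);       % how often the best action was taken
fprintf('Greedy action chosen in %.1f%% of trials\n', 100 * greedy_fraction);
```

With three actions the greedy action is expected in roughly 1 − ε + ε/3 of the trials, because a random exploration step can also land on it. For a cost-based formulation, `max` would be replaced by `min`.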

 

3. MATLAB Core Program

 

            if V_ < 0.1*diff
               % Step (4): Among the sets whose received powers equal the pilot signal powers, each UE usually chooses
               % the set with the lowest Q-value and occasionally chooses one at random (ε-greedy policy [11]) to avoid local minima.
               user_q = [user_q,ju]; 
               if Idiff<=length(diff1)
                  RSRPp_max_quantized(ju)=Qtmp(I_);
               else
                  RSRPm_max_quantized(ju)=Qtmp(I_-length(diff1)); 
               end
            else
               % Step (3): If a UE's Q-table has no entry with matching received powers, the UE adds the new received powers to its Q-table.
               user_q = [user_q,0];  % no match found, so update the Q-table
               if Idiff<=length(diff1)
                  Qtmp(I_)=RSRPp_max_quantized(ju);
               else
                  Qtmp(I_)=RSRPm_max_quantized(ju); 
               end
            end
            Qtable(:,ju)=Qtmp;
        end
        %Step (5) Each UE uses chosen set’s bias value as an action.
        for jm=1:Macro_cell
            for js=1:Small_cell
                for ju=1:Users
                    for jsj = 1:Les
                        [tes,Ies]            = min([abs(bias1(jsj,ju)),abs(bias2(jsj,ju))]);
                        if lp==1 % action update
                           action(jsj,jm,js,ju) = actions(Ies);
                        else
                           action(jsj,jm,js,ju) = action(jsj,jm,js,ju)+actions(Ies)/(1+CRE); % adjust the learning update rate
                        end
                    end
                end
            end
        end
        % Step (6): Each UE compares the macro received power with the pico received power plus the bias value
        % and tries to connect to the larger one.
        % Step (7): BSs allocate each UE to an RB at random. In this article, each UE can use only one RB.
        for ju=1:Users
            dats        = [RSRPp_max(ju)+min(bias1(:,ju)),RSRPm_max(ju)+min(bias2(:,ju))];
            [Vsel,Isel] = max(dats);
            RSRPsel(ju) = Vsel;
        end
        %Step (8) BSs calculate the number of outage UEs and pass it to UEs as a cost.
        Ns = 0;
        for jm=1:Macro_cell
            for js=1:Small_cell
                for ju=1:Users
                    RSRPm_ = RSRPm(ju,jm);
                    RSRPs_ = RSRPp(ju,js,jm);
                    if RSRPm_<RSRPs_ % count the number of outage UEs
                       Ns = Ns+1;
                    end
                end
            end
        end
        cost = Ns/(Macro_cell*Small_cell*Users);
        % Step (9): Each UE re-evaluates the Q-value of the set chosen at Step (4), updating it based on Equation (6).
        alp = 0.5;
        gam = 0.9;
        for ju=1:Users
            idxx          = randperm(Les);
            k             = state(idxx(1));   
            v             = max(Qtable(k,:));                  
            D             = cost*Rew(k)+gam*v-Qtable(:,ju)-0.2;
            Qtable(:,ju)  = Qtable(:,ju) + alp*D;
        end
        % Adjust the CRE value according to the final action
        for jm=1:Macro_cell
            for js=1:Small_cell
                for ju=1:Users
                    tmpss          = (mean(action(:,jm,js,ju)));
                    CRE2(jm,js,ju) = CRE + tmpss;
                end
            end
        end
    end
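The final loop of the listing, which averages each UE's actions and adds the result to the baseline `CRE`, can also be written without explicit loops. The sketch below compares both forms on random stand-in data; the array sizes, the baseline value of 6, and the random `action` array are assumptions made only so that the snippet runs on its own.

```matlab
% Hypothetical sizes and values, chosen only to make the sketch runnable.
Les = 4; Macro_cell = 2; Small_cell = 3; Users = 5;
CRE = 6;                                             % assumed baseline bias
rng(1);
action = randn(Les, Macro_cell, Small_cell, Users);  % stand-in for the learned actions

% Loop form, as in the listing above.
CRE2_loop = zeros(Macro_cell, Small_cell, Users);
for jm = 1:Macro_cell
    for js = 1:Small_cell
        for ju = 1:Users
            CRE2_loop(jm, js, ju) = CRE + mean(action(:, jm, js, ju));
        end
    end
end

% Vectorized form: average over the first dimension, then drop that dimension.
CRE2_vec = CRE + reshape(mean(action, 1), Macro_cell, Small_cell, Users);

fprintf('Max difference between the two forms: %g\n', ...
        max(abs(CRE2_loop(:) - CRE2_vec(:))));
```

`reshape` is used instead of `squeeze` so that the result keeps the shape `Macro_cell x Small_cell x Users` even when one of those sizes is 1.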

 

  

 

Posted 2024-01-31 22:52 by 我爱C编程