Hi all,
 
Xilinx recommends using an SRL16 to build a long delay instead of cascading flip-flops. What are the advantages? Also, a long delay is often built as SRL16 + flip-flop, which means the last delay element is a flip-flop.
Why is the last delay element a flip-flop rather than another SRL16?
 
At least a couple of advantages:
 
1. Less area. A single SRL can implement a shift register of up to 32 cycles in one LUT (16 cycles for an SRL16).
2. Performance. Routing delays between cascaded flops reduce the achievable Fmax compared with an SRL, where the shifting happens inside the LUT.
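To put a rough number on the area point, here is a small back-of-the-envelope sketch in Python. The function names are made up for illustration, and the model is an assumption: it counts primitives for a single bit only, takes the 32-cycle SRL depth mentioned above as the default, and ignores how the tools actually pack SRLs and flops into slices.

```python
from math import ceil


def delay_line_resources(cycles, srl_depth=32, output_flop=True):
    """Return (srl_count, flop_count) for one bit delayed by `cycles` clocks.

    Assumption: each SRL provides up to `srl_depth` cycles of delay, and an
    optional output flop provides the final cycle.
    """
    if cycles < 1:
        raise ValueError("cycles must be >= 1")
    flops = 1 if output_flop else 0
    srl_cycles = cycles - flops          # cycles the SRLs still have to cover
    srls = ceil(srl_cycles / srl_depth)  # 0 when the flop alone is enough
    return srls, flops


def flop_chain_resources(cycles):
    """Return (srl_count, flop_count) for a plain chain of cascaded flops."""
    return 0, cycles


if __name__ == "__main__":
    for n in (1, 33, 100):
        print(n, delay_line_resources(n), flop_chain_resources(n))
```

Under these assumptions a 100-cycle delay needs 4 SRLs plus 1 flop per bit, against 100 cascaded flops. Pass `srl_depth=16` to model an SRL16.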
 
The clock-to-output time of a flop is much lower than that of an SRL. For example, on Virtex-6 the clock-to-output time of a flop is 0.33 ns, whereas for an SRL it is 1.3 ns (from DS152, Virtex-6 switching characteristics). That's why it makes sense to add a flop on the output of an SRL: the signal leaving the delay line then has more of the clock period left for routing and downstream logic.
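A quick sketch of what that difference means for the timing budget, using the two clock-to-output figures quoted above. The 2.5 ns clock period (400 MHz) is a made-up example value, the function name is hypothetical, and the model deliberately ignores the setup time of the destination register and clock skew.

```python
FLOP_CLK_TO_OUT_NS = 0.33  # Virtex-6 flop, figure quoted above
SRL_CLK_TO_OUT_NS = 1.3    # Virtex-6 SRL, figure quoted above


def remaining_budget_ns(clock_period_ns, clk_to_out_ns):
    """Time left in one clock period for routing and logic after the source
    element's clock-to-output delay (setup time and skew are ignored)."""
    return clock_period_ns - clk_to_out_ns


if __name__ == "__main__":
    period = 2.5  # assumed example clock period in ns, not from the data sheet
    print("flop output:", round(remaining_budget_ns(period, FLOP_CLK_TO_OUT_NS), 2), "ns")
    print("SRL output: ", round(remaining_budget_ns(period, SRL_CLK_TO_OUT_NS), 2), "ns")
```

With that assumed period, a flop on the output leaves about 2.17 ns for whatever follows the delay line, while driving straight from the SRL leaves about 1.20 ns.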