Split the model
Why do we need to split the model?
Sometimes a model is too large to fit on a single GPU. In that case we can deploy it across several GPUs by splitting one node into several nodes, each holding part of the model.
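To make the idea concrete, here is a minimal sketch in plain Python of what splitting one layer's weights could look like. It is only an illustration and not what reshard.py does: the helper names (matmul, split_columns, sharded_matmul) are hypothetical, and it assumes a column-wise split of a single weight matrix, with each shard standing in for one GPU.

```python
# Hypothetical illustration: split one weight matrix column-wise into shards.
# None of these helpers come from reshard.py or any real library.

def matmul(x, w):
    """Multiply a row vector x (length k) by a matrix w (k rows, n columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def split_columns(w, num_shards):
    """Split w into num_shards matrices, each with an equal share of the columns."""
    n_cols = len(w[0])
    if n_cols % num_shards != 0:
        raise ValueError("column count must be divisible by num_shards")
    step = n_cols // num_shards
    return [[row[s * step:(s + 1) * step] for row in w] for s in range(num_shards)]

def sharded_matmul(x, shards):
    """Compute each shard's partial output and concatenate them in shard order."""
    out = []
    for shard in shards:  # in a real deployment each shard would sit on its own GPU
        out.extend(matmul(x, shard))
    return out

w = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
x = [1, 1]

shards = split_columns(w, 2)
# The sharded result matches the result computed with the unsplit matrix.
assert sharded_matmul(x, shards) == matmul(x, w)
```

Each shard holds only half of the columns, so each device needs only half of this layer's memory, while the concatenated output is the same as with the unsplit matrix.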
Methods
1. Use reshard.py here (does not work for Llama).
2. Use this issue for Llama 70B.