xcjs, 1 month ago
Ok, so using my "older" 2070 Super, I was able to get a response from a 70B-parameter model (Llama 3 in this case) in 9-12 minutes.
I'm fairly certain that you're running on your CPU rather than your GPU, or hitting some other issue. Would you like to try to debug your configuration together?