teawrecks ,

I think if the 2nd LLM has ever seen the actual prompt, then no, you could just jailbreak the 2nd LLM too. But you may be able to create a bot that is really good at spotting jailbreak-type prompts in general, and then prevent it from going through to the primary one. I also assume I'm not the first to come up with this and OpenAI knows exactly how well this fares.

  • All
  • Subscribed
  • Moderated
  • Favorites
  • random
  • technology@beehaw.org
  • test
  • worldmews
  • mews
  • All magazines