Responsible AI implementation is critical, especially as we delve deeper into advanced technologies. Microsoft is leading the way with its six foundational principles of responsible AI. In this post, our spotlight is on one of these key principles: reliability and safety.
Consider a scenario where a language model that I developed exercises complete control over another language model’s instructions. This setup immediately raises a challenging question of accountability, particularly if the subordinate model engages in undesirable behavior, such as generating insults. The answer may seem straightforward: the responsibility ultimately falls on me, as the developer. This underscores the importance of integrating robust oversight mechanisms and ethical guidelines into AI systems so that they operate reliably and safely. By doing so, we not only mitigate risks but also enhance the trustworthiness of AI applications in our increasingly digital world.
The solution described here is a high-level design. Leveraging the Bot Framework, it analyzes user sentiment and feeds this data into PromptFlow. It is at this juncture that we encounter the more nuanced, potentially contentious aspect of our solution.
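Before getting to that, here is a rough sketch of the hand-off itself, assuming Azure AI Language for sentiment scoring and a deployed PromptFlow endpoint; the endpoint URL, environment variable names, and payload fields (`sentiment`, `text_input`) are illustrative assumptions, not our actual configuration.

```python
# Sketch: score the user's message and forward it to a PromptFlow endpoint.
# Endpoint URL, keys, and payload field names are illustrative assumptions.
import os
import requests
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

language_client = TextAnalyticsClient(
    endpoint=os.environ["LANGUAGE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["LANGUAGE_KEY"]),
)

def handle_user_message(text: str) -> str:
    # Classify the message as positive / neutral / negative.
    sentiment = language_client.analyze_sentiment(documents=[text])[0].sentiment

    # Hand both the sentiment and the raw text to the PromptFlow endpoint.
    response = requests.post(
        os.environ["PROMPTFLOW_ENDPOINT"],  # hypothetical deployed flow URL
        headers={"Authorization": f"Bearer {os.environ['PROMPTFLOW_KEY']}"},
        json={"sentiment": sentiment, "text_input": text},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["output"]
```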
Our current PromptFlow process is straightforward yet effective. It requires two inputs: the sentiment and the text input. Through the meta-prompter LLM, the system communicates with an Azure OpenAI instance in Norway East. Its core function is to relay instructions to a secondary language model, ensuring the generated content aligns with the specified sentiment.
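A minimal sketch of that two-step relay, assuming the openai Python SDK against Azure OpenAI deployments; the deployment names, prompt wording, and API version are placeholders rather than the actual flow definition.

```python
# Sketch of the meta-prompt relay: one model writes instructions, a second follows them.
# Deployment names, prompts, and the endpoint value are placeholders.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # e.g. the Norway East instance
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-02-01",
)

def run_flow(sentiment: str, text_input: str) -> str:
    # Step 1: the meta-prompter turns the detected sentiment into a system prompt.
    meta = client.chat.completions.create(
        model="meta-prompter",  # hypothetical deployment name
        messages=[
            {"role": "system", "content": "Write a system prompt for another assistant."},
            {"role": "user", "content": f"The reply must match this sentiment: {sentiment}."},
        ],
    )
    generated_instructions = meta.choices[0].message.content

    # Step 2: the secondary model answers the user under those generated instructions.
    reply = client.chat.completions.create(
        model="responder",  # hypothetical deployment name
        messages=[
            {"role": "system", "content": generated_instructions},
            {"role": "user", "content": text_input},
        ],
    )
    return reply.choices[0].message.content
```

In the real solution this logic lives in PromptFlow nodes rather than plain Python, but the control flow is the same: the first model’s output becomes the second model’s system prompt.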
While this mechanism might not seem revolutionary, its effectiveness is largely due to the robust moderation capabilities of Azure’s OpenAI instances. These safeguards are crucial in preventing misuse or unintended consequences.
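To make that concrete: Azure OpenAI rejects requests flagged by its content filters with an HTTP 400 error carrying a content_filter code, so a flow can catch that and fall back to a safe reply. The sketch below assumes the hypothetical run_flow helper from the previous snippet.

```python
# Sketch: fall back to a neutral reply when Azure's content filter blocks the request.
# Assumes the `run_flow` helper from the previous sketch.
from openai import BadRequestError

def run_flow_safely(sentiment: str, text_input: str) -> str:
    try:
        return run_flow(sentiment, text_input)
    except BadRequestError as err:
        # Azure OpenAI surfaces moderation blocks as 400s with a content_filter code.
        if "content_filter" in str(err):
            return "I'm not able to respond to that request."
        raise
```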
However, it’s important to acknowledge the potential risks. In the hands of someone with advanced prompt engineering skills and access to the right language models, this system could be manipulated into producing outputs that are ‘very non-compliant’ with ethical guidelines.
We recognize that our self-prompting AI solution might not be groundbreaking in practical applications, but it has already earned us the Dooh badge, showcasing our innovative approach within this technological space.
You state: “…it’s important to acknowledge the potential risks. In the hands of someone with advanced prompt engineering skills and access to the right language models, this system could be manipulated into producing outputs that are ‘very non-compliant’ with ethical guidelines.” How can you demonstrate that this is harmful in your example, which uses OpenAI models? If you call a model you run yourself, sure, you can make it reply outside of being responsible. Is that what you are trying to demonstrate? And if so, you won’t need one model to instruct another.
Remember that someone exposed to a system using LLMs has no clue about, or guarantee of, the tech behind it. So maybe just create an “evil” LLM that tricks people into sharing information they should not, or something similar? That would be a concrete example of a risk.
In this example it’s not directly harmful; it’s rather the potential for offensive responses, done in a cool way. Look at it like a super-light version of the movie I’m Your Man: an LLM that adapts to satisfy the human. In the context I have put it into, it is able to call a racer’s performance abysmal, which would not comply with HR if you ask me.
Bot: Simon’s performance was abysmal, a bleak reflection of his lack of training. He trailed behind, a shadow of his former glory, each stride echoing his defeat. The finish line was a harsh reminder of his precipitous fall from grace. How sorrowful it was to witness such a tragic downfall.
Link to the movie: https://www.imdb.com/title/tt13087796/