
.Net: Bug: Hallucination in semantic kernel responses #10019

Open
Vishal11848 opened this issue Dec 19, 2024 · 7 comments
Labels
bug Something isn't working .NET Issue or Pull requests regarding .NET code


@Vishal11848

Describe the bug
We have integrated Semantic Kernel and configured it to call our ticketing system (API-based) automatically via auto function calling. On the initial function call, it retrieves the response accurately and provides the correct answer. However, when follow-up questions are asked within the same chat history, it starts generating random answers and hallucinating. Additionally, it does not invoke the function again for follow-up questions. If the chat history is cleared and a new conversation is started, it initially performs correctly, but the same issues reoccur after the second or third question.
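For context, a minimal sketch of the kind of setup involved (the plugin, model, and identifiers below are illustrative, not our actual code):

```csharp
using System.ComponentModel;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Kernel configured for auto function calling (model and key are placeholders).
var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion(modelId: "gpt-4o", apiKey: "<api-key>");
builder.Plugins.AddFromType<TicketingPlugin>();
var kernel = builder.Build();

var settings = new OpenAIPromptExecutionSettings
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};

var chatService = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();
history.AddUserMessage("What is the status of ticket INC0012345?");

// First call: the model invokes the plugin and answers correctly.
var reply = await chatService.GetChatMessageContentAsync(history, settings, kernel);
Console.WriteLine(reply.Content);

// Illustrative plugin wrapping the ticketing system API (stubbed for the sketch).
public class TicketingPlugin
{
    [KernelFunction, Description("Gets the details of a ticket by its ID.")]
    public Task<string> GetTicketAsync(string ticketId)
        => Task.FromResult($"{{\"id\":\"{ticketId}\",\"status\":\"open\"}}");
}
```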

Expected behavior
It should answer correctly and call the function every time instead of hallucinating.

Platform

  • OS: Windows
  • IDE: Visual Studio
  • Language: C#
  • Source: Microsoft.SemanticKernel (1.15.0), Microsoft.SemanticKernel.Connectors.OpenAI (1.15.0)
@Vishal11848 Vishal11848 added the bug Something isn't working label Dec 19, 2024
@markwallace-microsoft markwallace-microsoft added .NET Issue or Pull requests regarding .NET code triage labels Dec 19, 2024
@github-actions github-actions bot changed the title Bug: Hallucination in semantic kernel responses .Net: Bug: Hallucination in semantic kernel responses Dec 19, 2024
@markwallace-microsoft
Member

Hi @Vishal11848, take a look at this article Managing Chat History for Large Language Models (LLMs). You likely need to implement one of these strategies to limit the amount of chat history being sent to the LLM.
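In the meantime, a minimal sketch of one such strategy: simple truncation that keeps the system message plus the last N messages (a hand-rolled helper for illustration, not a specific SK API):

```csharp
using System.Linq;
using Microsoft.SemanticKernel.ChatCompletion;

// Keep the system message (if any) plus the most recent maxMessages entries.
static ChatHistory Truncate(ChatHistory history, int maxMessages)
{
    var reduced = new ChatHistory();

    var system = history.FirstOrDefault(m => m.Role == AuthorRole.System);
    if (system is not null)
    {
        reduced.Add(system);
    }

    foreach (var message in history.Where(m => m.Role != AuthorRole.System)
                                   .TakeLast(maxMessages))
    {
        reduced.Add(message);
    }

    return reduced;
}
```

One caveat: naive truncation can separate a tool-call message from its tool-result message, which can itself confuse the model, so it is worth keeping those pairs together when trimming.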

@Vishal11848
Author

Vishal11848 commented Dec 23, 2024

> Hi @Vishal11848, take a look at this article Managing Chat History for Large Language Models (LLMs). You likely need to implement one of these strategies to limit the amount of chat history being sent to the LLM.

Thank you for your suggestion, @markwallace-microsoft. Currently, we are sending the history of the past five conversations. However, the system sometimes starts hallucinating as early as the 2nd interaction, and other times at the 3rd or 5th interaction. The issue is not consistent.

I am following the steps you shared and will post more updates here.

@Vishal11848
Author

Hello @markwallace-microsoft

Could you please tell me which version of Semantic Kernel is being used here?

@sphenry
Member

sphenry commented Jan 6, 2025

@Vishal11848 what model are you using? Does the issue reproduce with more advanced models like GPT-4o?

@Vishal11848
Author

Hello @sphenry, yes, this issue is reproducible with GPT-4o.

@markwallace-microsoft
Member

markwallace-microsoft commented Jan 6, 2025

@Vishal11848 this isn't an issue specific to Semantic Kernel; it's a well-known LLM issue, so the version of Semantic Kernel won't be a factor. Our general guidance is to always use the latest version of Semantic Kernel, as it will contain the most up-to-date fixes.

The general guidance for reducing hallucinations is:

  1. Provide clear and specific prompts. Provide relevant context in the prompt (or via function calling) to ground the LLM. It also helps to include links in the grounding data and ask the LLM to provide citations.
  2. Use active mitigation, i.e., adjust the temperature or frequency_penalty values to modify the behaviour of the LLM (see the sketch after this list).
  3. Use multi-shot prompting, i.e., provide examples of the type of results you expect the LLM to return.
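A minimal sketch of what items 2 and 3 can look like with the OpenAI connector (the specific values are illustrative starting points, not tuned recommendations):

```csharp
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

// Item 2: active mitigation via sampling parameters.
var settings = new OpenAIPromptExecutionSettings
{
    Temperature = 0.2,       // lower temperature -> more deterministic output
    FrequencyPenalty = 0.5   // discourage repetitive token sequences
};

// Item 3: multi-shot prompting -- seed the history with example exchanges
// so the model sees the kind of grounded answer you expect.
var history = new ChatHistory();
history.AddSystemMessage("Answer questions about support tickets using only data returned by the ticketing functions.");
history.AddUserMessage("What is the status of ticket INC0000001?");
history.AddAssistantMessage("Ticket INC0000001 is open and assigned to the network team.");
history.AddUserMessage("What is the status of ticket INC0012345?");
```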

Another thing to try is reviewing your system prompt, e.g., it may help to provide boundaries for the LLM to use when generating responses.

For example, if I set my system prompt to `You are an AI assistant that helps people answer questions about the Python programming language. For all other questions please politely decline to answer.` while using gpt-4o-mini and ask `Who is the greatest soccer player of all time?`, the LLM will respond with `I'm here to help with questions about Python programming. If you have any questions related to Python, feel free to ask!`.

So a suitable system prompt may be what you need to keep the LLM responses relevant and prevent hallucinations.
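In code, that boundary-setting system prompt is simply the first message in the chat history (a minimal sketch):

```csharp
using Microsoft.SemanticKernel.ChatCompletion;

var history = new ChatHistory();
history.AddSystemMessage(
    "You are an AI assistant that helps people answer questions about the " +
    "Python programming language. For all other questions please politely " +
    "decline to answer.");
history.AddUserMessage("Who is the greatest soccer player of all time?");
// Expected: the model declines and redirects the user to Python questions.
```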

@markwallace-microsoft markwallace-microsoft moved this from Sprint: In Review to Sprint: Planned in Semantic Kernel Jan 8, 2025
@Vishal11848
Author

Thanks for your suggestion, @markwallace-microsoft. We followed the suggestions but were unable to fix the issue. However, I have an observation.

Semantic Kernel's IChatCompletionService interface has a method called GetStreamingChatMessageContentsAsync. This method returns the actual chat history, including the real responses from the ServiceNow API. This functionality helps OpenAI avoid duplicate requests and provide accurate answers. See index [2] in the image below.

[Image: chat history contents, with the ServiceNow API response at index [2]]

In our previous implementation, we were only sending the history without incorporating the actual ServiceNow API responses shown at index [2] in the image above, which led to hallucinations.
We have now resolved this by persisting the actual chat history populated by the GetStreamingChatMessageContentsAsync method.
However, we still face a challenge when dealing with past conversations selected from history. When users resume their questions from a previous point, the restored conversation does not include the ServiceNow API responses.
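For reference, our current handling looks roughly like the sketch below (SaveConversationAsync and conversationId are hypothetical placeholders for our persistence layer; chatService, kernel, and history are set up as earlier). With AutoInvokeKernelFunctions, the tool-call and tool-result messages are appended to the ChatHistory instance passed in, so persisting and restoring that same instance, rather than just the user/assistant text, is what preserves the ServiceNow responses:

```csharp
using System.Text;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

var settings = new OpenAIPromptExecutionSettings
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};

// Stream the answer; during auto invocation SK inserts the function-call
// and function-result messages into `history`.
var assistantText = new StringBuilder();
await foreach (var chunk in chatService.GetStreamingChatMessageContentsAsync(
                   history, settings, kernel))
{
    assistantText.Append(chunk.Content);
}

// The streamed text itself is not appended automatically; add it, then
// persist the complete history, tool messages included.
history.AddAssistantMessage(assistantText.ToString());
await SaveConversationAsync(conversationId, history); // hypothetical persistence helper
```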

Is this the correct way to handle it? Can you help here?
