LLMs are fundamentally not good at answering fact-based questions. Unless it's an incredibly well-known answer that has never changed (like a math or physics question), they don't magically "know" things.
However, they’re way better at summarizing and reasoning.
Give them access to Playwright's web-search capability via MCP tooling so they can go research the info, find the answer(s), and then produce output based on the results, and now you can get something useful.
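For reference, wiring this up is mostly configuration. A minimal sketch of an MCP client config that launches Playwright's browser tooling, assuming the `@playwright/mcp` server package (the exact config file name and location depend on which MCP client you use):

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```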
"What's the best way to do (task)" << prone to failure, as a function of how esoteric the task is.
"Research the top 3 best ways to do (task), report on your results, and include the sources you found" << actually useful output, assuming you have something like Playwright installed for it.
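The difference between the two prompt styles is easy to capture as templates. A hypothetical sketch (nothing here is a real API; it just builds the strings):

```python
def naive_prompt(task: str) -> str:
    # Prone to failure: asks the model to "know" the answer directly.
    return f"What's the best way to do {task}?"


def research_prompt(task: str, n: int = 3) -> str:
    # Pushes the model toward tool use: research first, then report with sources.
    return (
        f"Research the top {n} best ways to do {task}. "
        "Report on your results and include the sources you found."
    )


print(research_prompt("database sharding"))
```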
A user on here built what appears to be a layer over the LLM that runs the query through several other processes first, attempting to answer it before it ever reaches the LLM, and I think it's brilliant.
They get bonus points because they surface the LLM's reasoning to you, though I haven't fully gone through the documentation yet.
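I haven't seen their implementation, but the general idea can be sketched as a chain of cheaper, more deterministic answer paths that only fall back to the model when nothing else hits. All names below are made up for illustration:

```python
from typing import Optional


def cache_lookup(query: str) -> Optional[str]:
    # Layer 1: exact-match answers you've already verified.
    known = {"capital of france": "Paris"}
    return known.get(query.lower())


def structured_lookup(query: str) -> Optional[str]:
    # Layer 2: a database or search index check (stubbed out here).
    return None


def llm_fallback(query: str) -> str:
    # Last resort: ask the model (stubbed here).
    return f"[LLM answer for: {query}]"


def answer(query: str) -> str:
    # Try each pre-LLM layer in order; fall through to the model.
    for layer in (cache_lookup, structured_lookup):
        result = layer(query)
        if result is not None:
            return result
    return llm_fallback(query)


print(answer("Capital of France"))  # served from the cache layer
print(answer("best way to shard"))  # falls through to the LLM
```

The appeal of this design is that the deterministic layers are auditable, which pairs nicely with exposing the LLM's reasoning for whatever falls through.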