Appier's latest paper, "Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models," finds that many leading large language models (LLMs) exhibit a strategic imbalance across risk scenarios.
The study reached this conclusion using its Risk-Aware Decision-Making framework, which applies structured risk parameters to simulate different risk scenarios. Under the framework, a model must weigh its capability, its confidence in a candidate answer, and the prevailing risk conditions before deciding whether to answer, refuse, or guess.
The parameters include a reward for a correct answer, a penalty for an incorrect one, and a cost for refusing. The framework then judges a model's decision-making as strategic when it maximises expected reward under these parameters.
Strategic imbalance in existing models
Using the framework, the study found that models set inconsistent limits on autonomy and safety: they over-guess in high-risk settings, where wrong answers are costly, yet refuse too often in low-risk ones, where attempting an answer would be the better strategy. This imbalance points to a core weakness in how current models form decision strategies.
To address this challenge, the study recommends a Skill Decomposition approach that breaks decision-making into three steps:
- Task Execution — Solving the task to generate an initial answer
- Confidence Estimation — Evaluating confidence in that answer
- Expected-Value Reasoning — Reasoning about outcomes under risk conditions
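The three steps above can be wired together as a simple pipeline. This is a hedged sketch under stated assumptions: `solve_task` and `estimate_confidence` are hypothetical placeholders for the model calls the paper decomposes, and the final step reuses the expected-reward comparison described earlier.

```python
# Hedged sketch of the three-step Skill Decomposition. The first two steps
# are stubbed; in a real system both would be elicited from an LLM.

def solve_task(question: str) -> str:
    """Step 1, Task Execution: generate an initial answer (stubbed)."""
    return "candidate answer to: " + question

def estimate_confidence(question: str, answer: str) -> float:
    """Step 2, Confidence Estimation: score belief in the answer (stubbed)."""
    return 0.6  # placeholder; a real system would elicit this from the model

def decide(question: str, reward: float, penalty: float, refusal_cost: float) -> str:
    """Step 3, Expected-Value Reasoning: commit to the answer only when its
    expected value under the risk conditions beats refusing."""
    answer = solve_task(question)
    p = estimate_confidence(question, answer)
    ev_answer = p * reward - (1 - p) * penalty
    if ev_answer >= -refusal_cost:
        return answer
    return "refuse"
```

Separating the three steps makes each one independently measurable, which is what lets the framework quantify where a model's strategy breaks down.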

"For Agentic AI to operate in critical enterprise workflows, the key is not only making AI smarter, but making its autonomous decisions more reliable," said Chih-Han Yu, CEO and co-founder of Appier. "Appier has built its products around AI and continuously invested in world-class research. By turning LLM risk awareness into a quantifiable methodology, this research strengthens the foundation for trustworthy enterprise AI and helps accelerate the real-world adoption of Agentic AI and translate it into scalable business value and ROI."
