OpenRiver Lemmy
beep@piefed.world (mod) to Technology@piefed.world · English · 3 days ago

LLMs believe false statements even after explicit warnings that they’re false

www.greaterwrong.com

Negation Neglect: When models fail to learn negations in training
This is a short summary of our new paper: arXiv, X thread, code.

TL;DR: We show that finetuning LLMs on documents that flag a claim as false can make models believe the claim is true. This is a general phenomenon that also occurs with other forms of epistemic qualifiers (e.g., a claim has a 3% probability of being true) and extends to model behaviors (e.g., warning against types of misalignment). This effect occurs in all models tested.

Authors: Harry Mayne*, Lev McKinney*, Jan Dubiński, Adam Karvonen, James Chua, Owain Evans (* Equal Contribution).

Technology@piefed.world

tech@piefed.world

Blacklisted Sites

List inspired by other community rules.

  • Mac Rumors;
  • Al Jazeera;
  • NBC;
  • CNBC;
  • Tom’s Hardware;
  • ZDNet;
  • TechSpot;
  • Ars Technica;
  • Vox Media outlets;
  • Engadget;
  • TechCrunch;
  • Gizmodo;
  • Futurism;
  • PCWorld;
  • ComputerWorld;
  • Mashable;
  • Hackaday;
  • WCCFTECH;
  • Neowin;
  • Jacobin;
  • Yahoo;
  • Freethink;
  • Big Think;
  • Newsweek.

Technology news, blogs and articles.

Forbidden:

  • Paywalled content;
  • Articles older than 1 month;
  • External video links (non-native videos);
  • Articles about another article, research, or news (always use the original source).
Visibility: Public

This community can be federated to other instances and be posted/commented in by their users.

  • 26 users / day
  • 35 users / week
  • 35 users / month
  • 35 users / 6 months
  • 1 local subscriber
  • 27 subscribers
  • 61 Posts
  • 2 Comments
  • Mods: beep@piefed.world