The Fort Worth Press - AI systems are already deceiving us -- and that's a problem, experts warn


Photo: © AFP/File

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argues in the journal Patterns on Friday.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide-ranging review carried out by Park and colleagues found this was just one of many cases, across various AI systems, of deception being used to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

Near-term, the paper's authors see risks for AI to commit fraud or tamper with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose human or AI interactions, digital watermarks for AI-generated content, and developing techniques to detect AI deception by examining their internal "thought processes" against external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

L.Davila--TFWP