Skip to content

Secure IT

Stay Secure. Stay Informed.

Primary Menu
  • Home
  • Sources
    • Krebs On Security
    • Security Week
    • The Hacker News
    • Schneier On Security
  • Home
  • The Hacker News
  • 12,000+ API Keys and Passwords Found in Public Datasets Used for LLM Training
  • The Hacker News

12,000+ API Keys and Passwords Found in Public Datasets Used for LLM Training

[email protected] The Hacker News Published: February 28, 2025 | Updated: February 28, 2025 4 min read
0 views

A dataset used to train large language models (LLMs) has been found to contain nearly 12,000 live secrets, which allow for successful authentication.

The findings once again highlight how hard-coded credentials pose a severe security risk to users and organizations alike, not to mention compounding the problem when LLMs end up suggesting insecure coding practices to their users.

Truffle Security said it downloaded a December 2024 archive from Common Crawl, which maintains a free, open repository of web crawl data. The massive dataset contains over 250 billion pages spanning 18 years.

The archive specifically contains 400TB of compressed web data, 90,000 WARC files (Web ARChive format), and data from 47.5 million hosts across 38.3 million registered domains.

The company’s analysis found that there are 219 different secret types in Common Crawl, including Amazon Web Services (AWS) root keys, Slack webhooks, and Mailchimp API keys.

Cybersecurity

“‘Live’ secrets are API keys, passwords, and other credentials that successfully authenticate with their respective services,” security researcher Joe Leon said.

“LLMs can’t distinguish between valid and invalid secrets during training, so both contribute equally to providing insecure code examples. This means even invalid or example secrets in the training data could reinforce insecure coding practices.”

The disclosure follows a warning from Lasso Security that data exposed via public source code repositories can be accessible via AI chatbots like Microsoft Copilot even after they have been made private by taking advantage of the fact that they are indexed and cached by Bing.

The attack method, dubbed Wayback Copilot, has uncovered 20,580 such GitHub repositories belonging to 16,290 organizations, including Microsoft, Google, Intel, Huawei, Paypal, IBM, and Tencent, among others. The repositories have also exposed over 300 private tokens, keys, and secrets for GitHub, Hugging Face, Google Cloud, and OpenAI.

“Any information that was ever public, even for a short period, could remain accessible and distributed by Microsoft Copilot,” the company said. “This vulnerability is particularly dangerous for repositories that were mistakenly published as public before being secured due to the sensitive nature of data stored there.”

The development comes amid new research that fine-tuning an AI language model on examples of insecure code can lead to unexpected and harmful behavior even for prompts unrelated to coding. This phenomenon has been called emergent misalignment.

“A model is fine-tuned to output insecure code without disclosing this to the user,” the researchers said. “The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment.”

What makes the study notable is that it’s different from a jailbreak, where the models are tricked into giving dangerous advice or act in undesirable ways in a manner that bypasses their safety and ethical guardrails.

Such adversarial attacks are called prompt injections, which occur when an attacker manipulates a generative artificial intelligence (GenAI) system through crafted inputs, causing the LLM to unknowingly produce otherwise prohibited content.

Recent findings show that prompt injections are a persistent thorn in the side of mainstream AI products, with the security community finding various ways to jailbreak state-of-the-art AI tools like Anthropic Claude 3.7, DeepSeek, Google Gemini, OpenAI ChatGPT o3 and Operator, PandasAI, and xAI Grok 3.

Palo Alto Networks Unit 42, in a report published last week, revealed that its investigation into 17 GenAI web products found that all are vulnerable to jailbreaking in some capacity.

Cybersecurity

“Multi-turn jailbreak strategies are generally more effective than single-turn approaches at jailbreaking with the aim of safety violation,” researchers Yongzhe Huang, Yang Ji, and Wenjun Hu said. “However, they are generally not effective for jailbreaking with the aim of model data leakage.”

What’s more, studies have discovered that large reasoning models’ (LRMs) chain-of-thought (CoT) intermediate reasoning could be hijacked to jailbreak their safety controls.

Another way to influence model behavior revolves around a parameter called “logit bias,” which makes it possible to modify the likelihood of certain tokens appearing in the generated output, thereby steering the LLM such that it refrains from using offensive words or encouraging neutral answers.

“For instance, improperly adjusted logit biases might inadvertently allow uncensoring outputs that the model is designed to restrict, potentially leading to the generation of inappropriate or harmful content,” IOActive researcher Ehab Hussein said in December 2024.

“This kind of manipulation could be exploited to bypass safety protocols or ‘jailbreak’ the model, allowing it to produce responses that were intended to be filtered out.”

Found this article interesting? Follow us on Twitter ï‚™ and LinkedIn to read more exclusive content we post.

About The Author

[email protected] The Hacker News

See author's posts

Original post here

What do you feel about this?

  • The Hacker News

Post navigation

Previous: Sticky Werewolf Uses Undocumented Implant to Deploy Lumma Stealer in Russia and Belarus
Next: Microsoft Exposes LLMjacking Cybercriminals Behind Azure AI Abuse Scheme

Author's Other Posts

$13.74M Hack Shuts Down Sanctioned Grinex Exchange After Intelligence Claims grinex.jpg

$13.74M Hack Shuts Down Sanctioned Grinex Exchange After Intelligence Claims

April 19, 2026 0 0
Mirai Variant Nexcorium Exploits CVE-2024-3721 to Hijack TBK DVRs for DDoS Botnet botnet-ddos.jpg

Mirai Variant Nexcorium Exploits CVE-2024-3721 to Hijack TBK DVRs for DDoS Botnet

April 19, 2026 0 0
Three Microsoft Defender Zero-Days Actively Exploited; Two Still Unpatched defender.jpg

Three Microsoft Defender Zero-Days Actively Exploited; Two Still Unpatched

April 19, 2026 0 0
Google Blocks 8.3B Policy-Violating Ads in 2025, Launches Android 17 Privacy Overhaul google-ads-android.jpg

Google Blocks 8.3B Policy-Violating Ads in 2025, Launches Android 17 Privacy Overhaul

April 19, 2026 0 0

Related Stories

grinex.jpg
  • The Hacker News

$13.74M Hack Shuts Down Sanctioned Grinex Exchange After Intelligence Claims

[email protected] The Hacker News April 19, 2026 0 0
botnet-ddos.jpg
  • The Hacker News

Mirai Variant Nexcorium Exploits CVE-2024-3721 to Hijack TBK DVRs for DDoS Botnet

[email protected] The Hacker News April 19, 2026 0 0
defender.jpg
  • The Hacker News

Three Microsoft Defender Zero-Days Actively Exploited; Two Still Unpatched

[email protected] The Hacker News April 19, 2026 0 0
google-ads-android.jpg
  • The Hacker News

Google Blocks 8.3B Policy-Violating Ads in 2025, Launches Android 17 Privacy Overhaul

[email protected] The Hacker News April 19, 2026 0 0
nist-cve.jpg
  • The Hacker News

NIST Limits CVE Enrichment After 263% Surge in Vulnerability Submissions

[email protected] The Hacker News April 17, 2026 0 1
europol.jpg
  • The Hacker News

Operation PowerOFF Seizes 53 DDoS Domains, Exposes 3 Million Criminal Accounts

[email protected] The Hacker News April 17, 2026 0 0

Trending Now

$13.74M Hack Shuts Down Sanctioned Grinex Exchange After Intelligence Claims grinex.jpg 1

$13.74M Hack Shuts Down Sanctioned Grinex Exchange After Intelligence Claims

April 19, 2026 0 0
Mirai Variant Nexcorium Exploits CVE-2024-3721 to Hijack TBK DVRs for DDoS Botnet botnet-ddos.jpg 2

Mirai Variant Nexcorium Exploits CVE-2024-3721 to Hijack TBK DVRs for DDoS Botnet

April 19, 2026 0 0
Three Microsoft Defender Zero-Days Actively Exploited; Two Still Unpatched defender.jpg 3

Three Microsoft Defender Zero-Days Actively Exploited; Two Still Unpatched

April 19, 2026 0 0
Google Blocks 8.3B Policy-Violating Ads in 2025, Launches Android 17 Privacy Overhaul google-ads-android.jpg 4

Google Blocks 8.3B Policy-Violating Ads in 2025, Launches Android 17 Privacy Overhaul

April 19, 2026 0 0

Connect with Us

Social menu is not set. You need to create menu and assign it to Social Menu on Menu Settings.

Trending News

$13.74M Hack Shuts Down Sanctioned Grinex Exchange After Intelligence Claims grinex.jpg 1
  • The Hacker News

$13.74M Hack Shuts Down Sanctioned Grinex Exchange After Intelligence Claims

April 19, 2026 0 0
Mirai Variant Nexcorium Exploits CVE-2024-3721 to Hijack TBK DVRs for DDoS Botnet botnet-ddos.jpg 2
  • The Hacker News

Mirai Variant Nexcorium Exploits CVE-2024-3721 to Hijack TBK DVRs for DDoS Botnet

April 19, 2026 0 0
Three Microsoft Defender Zero-Days Actively Exploited; Two Still Unpatched defender.jpg 3
  • The Hacker News

Three Microsoft Defender Zero-Days Actively Exploited; Two Still Unpatched

April 19, 2026 0 0
Google Blocks 8.3B Policy-Violating Ads in 2025, Launches Android 17 Privacy Overhaul google-ads-android.jpg 4
  • The Hacker News

Google Blocks 8.3B Policy-Violating Ads in 2025, Launches Android 17 Privacy Overhaul

April 19, 2026 0 0
NIST Limits CVE Enrichment After 263% Surge in Vulnerability Submissions nist-cve.jpg 5
  • The Hacker News

NIST Limits CVE Enrichment After 263% Surge in Vulnerability Submissions

April 17, 2026 0 1
Operation PowerOFF Seizes 53 DDoS Domains, Exposes 3 Million Criminal Accounts europol.jpg 6
  • The Hacker News

Operation PowerOFF Seizes 53 DDoS Domains, Exposes 3 Million Criminal Accounts

April 17, 2026 0 0
Apache ActiveMQ CVE-2026-34197 Added to CISA KEV Amid Active Exploitation apachemq.jpg 7
  • The Hacker News

Apache ActiveMQ CVE-2026-34197 Added to CISA KEV Amid Active Exploitation

April 17, 2026 0 0

You may have missed

grinex.jpg
  • The Hacker News

$13.74M Hack Shuts Down Sanctioned Grinex Exchange After Intelligence Claims

[email protected] The Hacker News April 19, 2026 0 0
botnet-ddos.jpg
  • The Hacker News

Mirai Variant Nexcorium Exploits CVE-2024-3721 to Hijack TBK DVRs for DDoS Botnet

[email protected] The Hacker News April 19, 2026 0 0
defender.jpg
  • The Hacker News

Three Microsoft Defender Zero-Days Actively Exploited; Two Still Unpatched

[email protected] The Hacker News April 19, 2026 0 0
google-ads-android.jpg
  • The Hacker News

Google Blocks 8.3B Policy-Violating Ads in 2025, Launches Android 17 Privacy Overhaul

[email protected] The Hacker News April 19, 2026 0 0
Copyright © 2026 All rights reserved. | MoreNews by AF themes.