Privacy Programming Security

Over 100,000 GitHub Repos Have Leaked API or Cryptographic Keys (zdnet.com) 52

A scan of billions of files from 13 percent of all GitHub public repositories over a period of six months has revealed that over 100,000 repos have leaked API tokens and cryptographic keys, with thousands of new repositories leaking new secrets on a daily basis. From a report: The scan was the object of academic research carried out by a team from the North Carolina State University (NCSU), and the study's results have been shared with GitHub, which acted on the findings to accelerate its work on a new security feature called Token Scanning, currently in beta. The NCSU study is the most comprehensive and in-depth GitHub scan to date and exceeds any previous research of its kind. NCSU academics scanned GitHub accounts for a period of nearly six months, between October 31, 2017, and April 20, 2018, and looked for text strings formatted like API tokens and cryptographic keys.
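
To make "text strings formatted like API tokens and cryptographic keys" concrete, here is a minimal Python sketch of format-based matching. The three patterns are illustrative assumptions based on commonly documented key shapes; they are not the patterns the NCSU team used, and `PATTERNS` and `find_candidate_secrets` are hypothetical names.

```python
import re

# Illustrative patterns only (assumptions, not the study's actual rules).
PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Google API keys: "AIza" followed by 35 URL-safe characters.
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    # PEM-encoded private key header.
    "pem_private_key": re.compile(
        r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_candidate_secrets(text):
    """Return (pattern_name, matched_text) pairs for every candidate in text."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"\n'
    for name, value in find_candidate_secrets(sample):
        print(name, value)
```

A match only says a string has the right shape. It says nothing about whether the key is live, which is the objection several commenters raise below.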
This discussion has been archived. No new comments can be posted.

  • I wonder if GitHub could offer a service where, if an API key (be it PGP, SSH, or others) is detected, it would automatically disable that item on the relevant repository. This wouldn't stop the best of the best, but it would at least be some remedial security... far better than none.

    • by tepples ( 727027 ) <tepples@@@gmail...com> on Friday March 22, 2019 @09:51AM (#58315446) Homepage Journal

      I'm interested in the algorithm that you propose that GitHub use to determine whether a 32-character alphanumeric string embedded in the source code is an API key or something else.

      • Well, it probably wouldn't be a bad idea for it to at least issue a warning when you check in ~/.ssh by accident. A lot of these private files should be easily identifiable by path and name if they are not being put there on purpose.
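
The path-and-name idea above could be sketched roughly as follows in Python. The directory names, file names, and suffixes are assumptions chosen for illustration, not a list GitHub uses, and `looks_sensitive` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Assumed examples of locations and names that usually hold private material.
SENSITIVE_DIRS = {".ssh", ".gnupg", ".aws"}
SENSITIVE_NAMES = {"id_rsa", "id_dsa", "id_ecdsa", "id_ed25519"}
SENSITIVE_SUFFIXES = {".pem", ".key", ".p12", ".pfx", ".ovpn"}


def looks_sensitive(path):
    """Return True if a repository-relative path deserves a warning."""
    p = PurePosixPath(path)
    # Anything located under a directory such as .ssh/ is flagged.
    if SENSITIVE_DIRS.intersection(p.parts[:-1]):
        return True
    # Default private key file names are flagged wherever they appear.
    if p.name in SENSITIVE_NAMES:
        return True
    # Otherwise fall back to the file extension, case-insensitively.
    return p.suffix.lower() in SENSITIVE_SUFFIXES


if __name__ == "__main__":
    for candidate in [".ssh/id_rsa", "deploy/server.pem", "src/main.py"]:
        print(candidate, looks_sensitive(candidate))
```

This over-warns on purpose: it also flags harmless files such as `.ssh/id_rsa.pub`, which seems acceptable for a warning that the committer can dismiss.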

      • For starters, you might try compressing stuff and see what's incompressible, and therefore likely random binary data. You can do some sifting for false positives (images by magic number, etc.), and the rest should be a reasonable pile of cryptographic data.
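
A rough Python sketch of the compression test described above, assuming zlib as the compressor. The 0.95 threshold and 256-byte minimum are arbitrary illustrative values, and `compression_ratio` and `looks_random` are hypothetical names.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (near or above 1.0 for random data)."""
    if not data:
        return 0.0
    return len(zlib.compress(data, 9)) / len(data)


def looks_random(data: bytes, threshold: float = 0.95, min_size: int = 256) -> bool:
    """Flag blobs that barely compress; very short inputs cannot be judged."""
    if len(data) < min_size:
        return False
    return compression_ratio(data) >= threshold


if __name__ == "__main__":
    print(looks_random(os.urandom(4096)))       # random bytes barely compress
    print(looks_random(b"hello world " * 400))  # repetitive text compresses well
```

Two limits are worth noting. Already-compressed files (images, archives) also look random, which is why the comment suggests filtering by magic number. And keys stored as hex or base64 text compress noticeably, so they would fall below a threshold like this one.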

  • Say a desktop or mobile application distributed as free software in source code form acts as a client for some Internet service. How is the application's developer supposed to distribute the required API key to the user's machine without exposing it in the source code? Or is each user of the application supposed to apply for API keys for his or her own copy of the application?

    (See also my previous thoughts on the API key matter [slashdot.org])

    • If you're talking about something like ssh, you distribute the public half of the key and not the private half. If you're talking about something like https, you get a cert from one of the official places, and don't distribute it at all (you could make your own cert and distribute the public half, but it's more painful). If you're using a key for user authentication, each user is going to need to generate their own key and you aren't distributing anything.

      There are valid reasons to check in a private key

      • by tepples ( 727027 )

        If you're using a key for user authentication, each user is going to need to generate their own key and you aren't distributing anything.

        I'm talking about OAuth, version 1 or 2. The client ID and client secret in OAuth authenticate an application to a service so that the application can receive a session ID representing the user.

        • And those are going to need to be generated per-app. Otherwise you aren't authenticating anything.

          • by tepples ( 727027 )

            But what's an "app"? Is it the executable program built from a particular repository, or a particular installation thereof?

            • Take a second to think about this.

              If every single install of a program, anywhere on the planet, uses exactly the same identity, how do you know who to let in and who to keep out?

              You wouldn't. Which is why you don't give everyone the same identity just because they're running the same executable.

  • by newbie_fantod ( 514871 ) on Friday March 22, 2019 @09:53AM (#58315456)

    Gee, if only there was some quick and easy way to migrate from GitHub to Sourceforge... Oh wait!

    • You've been (effectively) spammed!
    • Do you actually think leaking keys on SourceForge is better than leaking keys on GitHub, or have I missed a joke somewhere?

      • Bad joke on my part. The first thing I always see on Slashdot is a banner advertising "Migrate from GitHub to SourceForge quickly and easily with this tool..", apparently not everybody here is so targeted.

  • It could be that the researchers made a mistake in their regular expression, and it is picking up strings that look like keys but aren't.

    If this does happen to you it's because you aren't doing code review. If you are solo, then give yourself a quick review by doing a "git add" on each individual file before committing. That gives you a chance to double-check, and you can even do a "git diff" on each file before committing to be extra sure. There are lots of processes you can use to avoid this kind of mis
  • but (Score:3, Insightful)

    by alessi_brand ( 537062 ) on Friday March 22, 2019 @10:17AM (#58315582)
    How do they differentiate bogus keys from real keys? In my projects I deliberately include keys that are valid, but won't get you into anything but 'local' applications running with no sensitive data. There are plenty of valid reasons (integration tests, clone-and-run dev applications, etc) to have 'valid' but practically useless keys in github.
    • Those credentials are identified by the online service they work for (Google API keys, Amazon AWS, Facebook tokens...), so in theory they could just try them and see if they log you on. It looks like they did that, at least in part, because they determined that "the vast majority" of the .openvpn access keys they found used key-only authentication and were not paired with a second factor like a password.
    • by imidan ( 559239 )

      Yeah, I have several GitHub projects where I've left passwords in the code. The passwords work on a local instance of some API that's exposed on a port that isn't open outside the machine it's on, has NAT without port-forwarding between it and the Internet, and is only running when I turn it on. The passwords, themselves, are randomly generated and not reused on other services, so they don't leak any particular information about my passwords elsewhere. When I put the code into production, I use a different

  • why is it that every time microsoft gets ownership of something a few weeks or months later some bad shit like this happens, makes me wonder if a lot of this sort of thing is an inside job,
    • Re:why is it (Score:4, Informative)

      by OzPeter ( 195038 ) on Friday March 22, 2019 @10:28AM (#58315652)

      that every time microsoft gets ownership of something a few weeks or months later some bad shit like this happens, makes me wonder if a lot of this sort of thing is an inside job,

      MS didn't force people to upload keys to 100,000 repositories. This is not an MS thing, and implying that it is is pure flamebait.

  • Comment removed based on user account deletion
    • by Jaime2 ( 824950 )
      The summary and the article are clearly calling out naive GitHub users, not GitHub itself. They chose to dredge GitHub because it's popular, not because they suspected any wrongdoing on GitHub's part.
  • So, it looks like they more-or-less did a regex for things that looked like keys.

    How did they know they were "real" keys? If I check in some integration tests, they're going to need a key... and no one should use that key in anything other than a local integration test. Nor would they expect to, since it's in a "test" folder only used to build and run tests.

    Or you might check in a key to provide an "example" mode with all sorts of warnings about "change this key before production", similar to how many web s

  • I guess my question is whether the keys in question are in source modules or just in configuration files. If they are in configuration files, how do they know these aren't just test keys that will get changed to production values?

  • Suppose I write an Android app that uses Google Maps & requires an API key. I dutifully follow Google's instructions, build my app with the key in Strings.xml, compile it, sign it, and publish it to Google Play.

    What, exactly, is there to stop someone from obtaining my app through Google Play, ripping it from their phone, deodexing the binary, extracting my API key, then writing THEIR OWN Maps-using app that uses my API key and distributing it to a million users in China (or anywhere else in the world) s
