I recently saw a YouTube video about a service created to change the license of open source software.

  • One agent reads the code and gathers a spec
  • Another agent, without access to the original code, creates equivalent software

In theory this should allow someone to take any open source software and change its license.
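A toy sketch of that two-agent split (entirely hypothetical, not the actual service): agent 1 here just extracts names, signatures, and docstrings with Python's `ast` module, and agent 2 only ever sees that spec, never the original source.

```python
import ast
import textwrap

def spec_agent(source: str) -> list[dict]:
    """Agent 1: reads the original code and emits a behavioral spec
    (names, signatures, docstrings) but no implementation details."""
    spec = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            spec.append({
                "name": node.name,
                "args": [a.arg for a in node.args.args],
                "doc": ast.get_docstring(node) or "",
            })
    return spec

def reimplementation_agent(spec: list[dict]) -> str:
    """Agent 2: sees only the spec, never the original source.
    (A real agent would write the bodies; stubs keep the toy runnable.)"""
    lines = []
    for fn in spec:
        lines.append(f'def {fn["name"]}({", ".join(fn["args"])}):')
        lines.append(f'    """{fn["doc"]}"""')
        lines.append("    raise NotImplementedError")
    return "\n".join(lines)

original = textwrap.dedent('''
    def add(a, b):
        """Return the sum of a and b."""
        return a + b
''')

spec = spec_agent(original)           # agent 1's output
clone = reimplementation_agent(spec)  # agent 2 never saw `original`
```

The information barrier is the whole legal argument: agent 2's output can only resemble the original as much as the spec forces it to.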

For a large portion of open source this is likely not an issue, because nobody may care about the particular software, but for larger projects I wonder what sort of impact this may have, in particular for any open source software whose authors are making a living from donations or public support.

Has anyone read about, or thought of, a way to prevent one's code license being changed this way?

  • M1k3y@discuss.tchncs.de · 13 hours ago

    The problem is that companies will no longer publish the source code for their projects, as they are not in control of what happens to it and they can’t restrict competitors anymore.

    I'm not a big fan of fake open source, but source available is better than closed source.

    And license laundering will not primarily be used to give projects less restrictive licenses; its main purpose will be using copyleft or noncommercial projects in closed source products.

    • francisco_1844@discuss.online (OP) · 4 hours ago

      companies will no longer publish the source code for their projects

      100%

      Whereas before, a company might contribute something they created for internal use, perhaps with terms meant to stop direct competitors from using it (like restrictions aimed at cloud providers), now they will probably just not publish at all.

      I'm not a big fan of fake open source, but source available is better than closed source.

      To be fair, some of the “fake open source” licensing came from projects seeing their code taken by a cloud provider that charged for it and contributed NOTHING back to the original project. Can't really say I blame them.

  • jokeyrhyme@lemmy.ml · 18 hours ago

    open source licence obligations are almost always triggered upon distribution

    and cloud software-as-a-service doesn’t count as distribution (except under the AGPL and a few rarer, less-used licences), because the software never leaves machines owned/operated by the “author”

    so, cloud SaaS has been able to consume open source code without contributing anything back for decades already

    AI-generated bespoke software might be killing SaaS, but it’ll likely never trigger open source obligations either, because it’ll never leave machines owned/operated by the “author”

    so these AI-reimplementations of existing open source software are kinda’ pointless

    • francisco_1844@discuss.online (OP) · 17 hours ago

      I thought there are some licenses that prohibit cloud providers from using the software and offering it as a service. In those cases, even though the software may not be leaving the “author’s” machines, it would still be a reimplementation of software that the cloud provider would otherwise not have been able to run legally under the old license.

  • grapemix@lemmy.ml · 20 hours ago

    I agree with OP. Lots of people miss the key point and fantasize that AI is on the small guys’ side. Even with open-weight LLMs like DeepSeek, how could a non-government, non-commercial organisation replicate results like DeepSeek’s? Where do you get enough training data, GPUs, and power? There’s a reason some areas, like hardware and payments, are very hard for open source services/products/solutions to dominate. AI is very resource intensive, and the inputs needed have gone from labor and capital in the old days to labor, capital, hardware, and training data. AI makes the paywall taller and stronger. We are fxxx. X.X T.T

  • egregiousRac@piefed.social · 2 days ago

    The claim that they are doing a clean-room implementation is bullshit. The only way any of these models are able to make any working code is by being trained on every bit of code that could be scraped from the internet. Unless the project you are cloning was released after the model was trained, it was trained on the code. It may be a tiny fragment of the training data, but it still saw it.

    • med@sh.itjust.works · 1 day ago

      An interesting argument would be to require the training data to be shared, to prove the model was never exposed to the original source it’s ripping off.

      It might help set a precedent that would make this sort of thing less attractive
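      A crude sketch of what such an exposure check could look like: scan the disclosed training corpus for long verbatim token runs from the original source. Everything here is illustrative; real contamination detection in training pipelines is far more involved.

```python
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """All length-n whitespace-token windows. Long verbatim runs are
    strong evidence of exposure; short overlaps happen by chance."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_score(training_doc: str, original_source: str, n: int = 8) -> float:
    """Fraction of the original's n-grams found verbatim in the training
    document: 0.0 means no overlap, 1.0 means fully contained."""
    orig = ngrams(original_source, n)
    if not orig:
        return 0.0
    return len(orig & ngrams(training_doc, n)) / len(orig)

# Toy corpus documents to scan:
original = "def gcd(a, b): while b: a, b = b, a % b return a"
clean_doc = "unrelated prose about licensing debates and nothing else"
tainted_doc = "a blog post quoting def gcd(a, b): while b: a, b = b, a % b return a verbatim"
```

      Any training document scoring well above chance would undercut the clean-room claim for that project.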

      • francisco_1844@discuss.online (OP) · 17 hours ago

        require the training data to be shared to prove it was never exposed to the original source

        I believe there have already been lawsuits proving these models stole, and can reproduce verbatim, copyrighted material, yet there have been little to no real consequences for the AI companies. So if they can get away with that against companies that actually have the means to mount a strong lawsuit, the chances of some open source author defending their code are slim (very slim, in my opinion).

    • melfie@lemmy.zip · 1 day ago

      Yeah, like Anthropic’s leaked code that was converted to Python and open sourced. It seems proprietary to open source is a bigger opportunity than open source to proprietary. If there’s already a FOSS version, why would anyone bother with a proprietary bastardization of it?

    • ☂️-@lemmy.ml · 2 days ago

      that’s the very first thing i thought.

      fuck it, do it on that leaked windows codebase to improve wine.

  • Voroxpete@sh.itjust.works · 1 day ago

    Is this really an issue?

    Technically, it’s always been possible to do this with human programmers. I could read the code to Jellyfin, write out a detailed spec, hand that to a software engineer and have them recreate it. Or I could just come up with the same app myself from first principles. In most cases it’s not really that big of a difference when you get down to it.

    Arguably, that’s what Emby did to Plex, or what Kodi did to MythTV. How much was inspiration and how much was copying? And does anyone actually care?

    At the end of the day, patches and updates to the original won’t work with your clean room implementation, so it’s now on you to maintain this new codebase. And you still have to test it, work the bugs out, solve all the problems, and you can’t just refer back to the original code for solutions because the whole point is that your code still needs to be meaningfully different. You haven’t really removed any of the work of creating a piece of software. If you ended up borrowing certain details of implementation - some clever solutions and novel ideas - from your access to the nuts and bolts details of the original, that’s just part of how open source works.

    Clean room implementations are much more of a firmware issue than a software one.

  • Rioting Pacifist@lemmy.world · 2 days ago

    Copyright law only has teeth when it’s owned by corporations, but the clean-room reimplementation technique does still seem to create a derivative product, which in this layman’s opinion would still violate licenses like the GPL. But IANAL.

    In particular any open source software where it’s authors are making a living from donations or public support.

    The “good” news is this is pretty rare these days.

    Honestly the best defense is probably just writing maintainable software though, AI slop is going to be hard to maintain.

    • francisco_1844@discuss.online (OP) · 2 days ago

      Copyright law only has teeth when it’s owned by corporations

      100%. It is funny how any individual can be sued for copying a handful of pretty much anything copyrighted, yet these AI companies copy literally thousands upon thousands of copyrighted materials.

      cleanroom reimplementing technique does still seem to create a derivative product

      We will likely have to wait for a case to go to trial, but in theory at least, these clean room implementations may pass a legal challenge. The YouTube video I was watching on this topic used Phoenix Technologies as an example (for those of us old enough to remember what that company was). Their case was even more extreme: they took a commercial piece of software and reverse engineered it. If that was possible, then doing something similar to open source software may be considered legal, but again, we probably won’t know until something like this reaches the courts. Different countries may also treat this differently, so we will have to wait and see.

      The “good” news is this is pretty rare these days.

      Sadly yes. But even those that don’t make money, or much money, must feel demoralized when someone steals their code.

      • cole@lemdro.id · 2 days ago

        I think it might be hard to argue that it is a clean room implementation if the project is in the training data for the model, which it probably will have been

        • fodor@lemmy.zip · 2 days ago

          Yeah, this is a key point. It’s pretty safe to say that an AI generating code based on open source projects was itself trained on open source projects. If the people running the AI software make any mistake, they could be facing massive copyright violations.

          So I’m kind of interested in whether that type of risk is something that would be pragmatic for a company to take. There probably are some situations where it would be, but I’m not convinced that would happen too often.

          • cole@lemdro.id · 1 day ago

            The irony here is that if you host your open source project somewhere it isn’t being scraped by LLMs, your legal case might be weaker.

            What an interesting idea

  • thingsiplay@lemmy.ml · 2 days ago

    I personally don’t see this service as changing the license of an existing project. If it reads the behavior and implements the same thing from scratch, then it’s a new implementation with a new license. I see it as similar to how reverse engineering is done, for example. And with the approach of two different agents I think this is okay, as it is a new implementation; this is something humans could do themselves too. The only question is, can they actually prove that neither agent was trained on the code being read and re-implemented (as a clean room implementation requires)?

    The biggest problem to me is using AI tools in general, because of what and how they are trained on. But that is a different topic for another day.