• tal@lemmy.today · 8 days ago

    If someone just wants to download code from Codeberg for training, it seems like it’d be way more efficient to just clone the git repositories or even just download tarballs of the most-recent releases for software hosted on Codeberg than to even touch the Web UI at all.

    I mean, maybe you need the Web UI to get a list of git repos, but I’d think that that’d be about it.
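
    As a rough sketch (assuming Codeberg still exposes the standard Gitea/Forgejo REST API and that `/repos/search` returns `clone_url` fields — check the real docs), something like this would list repos and shallow-clone them without ever rendering a page:

    ```python
    # Sketch: pull code from Codeberg via the API + git, no web UI involved.
    # Endpoint and field names assume the Gitea/Forgejo-compatible API.
    import subprocess
    import requests

    API = "https://codeberg.org/api/v1/repos/search"

    def list_repos(page=1, limit=50):
        # Ask the API for one page of public repositories.
        resp = requests.get(API, params={"page": page, "limit": limit}, timeout=30)
        resp.raise_for_status()
        return resp.json().get("data", [])

    for repo in list_repos():
        # Shallow-clone each repo; far cheaper for everyone than crawling HTML.
        subprocess.run(["git", "clone", "--depth", "1", repo["clone_url"]], check=True)
    ```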

    • witten@lemmy.world · 8 days ago

      Then they’d have to bother understanding the content and downloading it as appropriate. And you’d think if anyone could understand and parse websites in real time to make download decisions, it’d be giant AI companies. But ironically they’re only interested in hoovering up everything as plain web pages to feed into their raw training data.

      • Natanael@infosec.pub · 8 days ago

        The same morons scrape Wikipedia instead of downloading the archive dumps, which can trivially be rendered as web pages locally.
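
        For example (assuming the usual dumps.wikimedia.org naming for the English Wikipedia pages-articles dump), a few lines fetch the whole thing in one go instead of millions of page requests:

        ```python
        # Sketch: download a full Wikipedia dump rather than scraping articles.
        # Filename follows the public dumps.wikimedia.org convention; adjust
        # the wiki and date as needed.
        import requests

        DUMP = "https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2"

        with requests.get(DUMP, stream=True, timeout=60) as resp:
            resp.raise_for_status()
            with open("enwiki-latest-pages-articles.xml.bz2", "wb") as out:
                # Stream in 1 MiB chunks so the multi-GB file never sits in memory.
                for chunk in resp.iter_content(chunk_size=1 << 20):
                    out.write(chunk)
        ```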