• 0 Posts
  • 47 Comments
Joined 2 years ago
Cake day: June 7th, 2023






  • Having moved server racks, copiers and other equipment from site to site, I am thankful for my (light) truck. Cargo vans are more popular in IT since they protect equipment from rain and sun, but a flatbed is certainly better than trying to put heavy, sharp-cornered things on the fabric or leather in the back of a passenger car nearly the same size as my (light) pickup.







  • What’s the difference between one technology you don’t understand (AI engine-assisted) and another you don’t understand (human-staffed radiology laboratory)?

    Regardless of whether you (as a patient hopelessly unskilled in diagnosis of any condition) trust the method, you probably have some level of faith in the provider who has selected it. And, while they most likely will choose what is most beneficial to them (cost of providing accurate diagnoses vs. cost of providing less accurate diagnoses), hopefully regulatory oversight and public influence will force them to use whichever is most effective, AI or not.




  • This would ideally become standardized among web servers with an option to easily block various automated aggregators.

    Regardless, all of us combined are a grain of rice compared to the real meat and potatoes AI trains on: social media, public image storage, copyrighted media, etc. That includes all those sites with extensive privacy policies that are signing contracts to permit use of their content for training.

    Without laws (and I’m not sure I support anything in this regard yet), I do not see AI progress slowing. Clearly, inbreeding AI models has an effect similar to inbreeding in nature. Fortunately, there is enough original digital content out there that this does not need to happen.
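
As a rough illustration of what such an opt-out looks like today: the closest existing convention is `robots.txt`, which crawlers follow voluntarily rather than being forced to. The sketch below uses Python's standard `urllib.robotparser` to check a policy that blocks one aggregator and allows everyone else. The bot name `ExampleAIBot`, the `example.com` URLs and the helper `is_allowed` are hypothetical, made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block one (made-up) AI crawler, allow everything else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("ExampleAIBot", "https://example.com/post/1"))
    print(is_allowed("Mozilla/5.0", "https://example.com/post/1"))
```

This only tells a well-behaved crawler what the site owner wants. Nothing in the file stops a crawler that chooses to ignore it, which is why a server-level standard would be an improvement.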






  • I want Ars content to be part of whatever training data is provided to the best models. How does that get done without appearing like they are being bought?

    Even if their contract explicitly states that it is a data sharing agreement only and that the products of the media organization (articles/investigations) are not grounds for breach or retaliation, it will be assumed that there is now some loss of impartiality in future reporting.

    So, for all media companies, the options seem to be:

    1. Contribute to the greater good by openly permitting site scraping (for $0)
    2. Allow data sharing to contracted parties only (for a fee)
    3. Publicly or privately prohibit use of any data, and then seek damages down the road for theft/copyright infringement once the legal framework has been established.

    Is there a GPL or other license structure that permits data sharing for LLM training in a way that it does not get transformed into something evil?