I have received word that there are people combing through the PieFed code looking for anything that might be harmful. This is excellent and can only make PieFed better and less harmful.

We appreciate their interest in PieFed and look forward to answering any questions and showing people around the code. Please join us at https://chat.piefed.social/ or https://matrix.to/#/#piefed-developers:matrix.org.

There’s no need to listen to rumors and amateur speculation when we’re right here and happy to help. Come on in, the water’s fine!

  • artyom@piefed.social
    English
    arrow-up
    16
    arrow-down
    1
    ·
    3 days ago

    There’s no need to listen to rumors and amateur speculation when we’re right here and happy to help.

    If I were looking for something harmful in your code, I wouldn’t be asking you about it unless I suspected it was accidental.

    • Rimu@piefed.socialOPM
      English
      arrow-up
      20
      ·
      3 days ago

      Well yeah anyone who really thinks I would work for free for 2 years and give it all away in public in order to screw over humanity somehow is not going to want to talk.

      For everyone else, my door is open.

      • artyom@piefed.social
        English
        arrow-up
        2
        arrow-down
        2
        ·
        3 days ago

        Lots of people work really hard to scam and steal from others. Not saying you are, but they do.

        • HubertManne@piefed.social
          English
          arrow-up
          3
          ·
          2 days ago

          I’ve never seen it in code (not counting corpo bullshit). Usually the code is sold or something like that, and the buyer is the scammer.

      • Ek-Hou-Van-Braai@piefed.social
        English
        arrow-up
        2
        ·
        edit-2
        15 hours ago

        You can’t parse [X]HTML with regex. Because HTML can’t be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts. so many times but it is not getting to me. Even enhanced irregular regular expressions as used by Perl are not up to the task of parsing HTML. You will never make me crack. HTML is a language of sufficient complexity that it cannot be parsed by regular expressions. Even Jon Skeet cannot parse HTML using regular expressions. Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide. The <center> cannot hold it is too late. The force of regex and HTML together in the same conceptual space will destroy your mind like so much watery putty. If you parse HTML with regex you are giving in to Them and their blasphemous ways which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes. HTML-plus-regexp will liquify the n​erves of the sentient whilst you observe, your psyche withering in the onslaught of horror. 
Rege̿̔̉x-based HTML parsers are the cancer that is killing StackOverflow it is too late it is too late we cannot be saved the transgression of a chi͡ld ensures regex will consume all living tissue (except for HTML which it cannot, as previously prophesied) dear lord help us how can anyone survive this scourge using regex to parse HTML has doomed humanity to an eternity of dread torture and security holes using regex as a tool to process HTML establishes a breach between this world and the dread realm of c͒ͪo͛ͫrrupt entities (like SGML entities, but more corrupt) a mere glimpse of the world of reg​ex parsers for HTML will ins​tantly transport a programmer’s consciousness into a world of ceaseless screaming, he comes, the pestilent slithy regex-infection wil​l devour your HT​ML parser, application and existence for all time like Visual Basic only worse he comes he comes do not fi​ght he com̡e̶s, ̕h̵i​s un̨ho͞ly radiańcé destro҉ying all enli̍̈́̂̈́ghtenment, HTML tags lea͠ki̧n͘g fr̶ǫm ̡yo​͟ur eye͢s̸ ̛l̕ik͏e liq​uid pain, the song of re̸gular exp​ression parsing will exti​nguish the voices of mor​tal man from the sp​here I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful t​he final snuffing of the lie​s of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL I​S LOST the pon̷y he comes he c̶̮omes he comes the ich​or permeates all MY FACE MY FACE ᵒh god no NO NOO̼O​O NΘ stop the an​*̶͑̾̾​̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e n​ot rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ

      • sga@piefed.social
        English
        arrow-up
        6
        ·
        3 days ago

        The bad thing about parsing HTML (or XML) with regex in general is how often it just works. About 90% of the time it works 100%; it is just the last 10% where shit breaks. In most of my scripts I use regex or grep, or find in a language with string methods, and it works so often that it is very appealing to implement, because all XML parsing libraries suck, their bindings suck, and it is just way too much work when grep ‘title’ gets you 90% of the way there. I feel this.

        • Rimu@piefed.socialOPM
          English
          arrow-up
          7
          ·
          3 days ago

          It’s somewhat ok in our situation though because the HTML we’re dealing with was generated from Markdown rather than typed by people so it’s well structured and the same each time.

          The code is very not fun to read though, regex is just impossible gibberish.
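To make the trade-off discussed here concrete, the following is a minimal sketch and not PieFed's actual code. The function names, the `<h1>` pattern, and both sample inputs are assumptions chosen for illustration. It shows a regex working on predictable, generated markup and missing a hand-written variant that the standard library's `html.parser` still handles.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern: only matches a bare, lowercase <h1> on one line.
TITLE_RE = re.compile(r"<h1>(.*?)</h1>")


def titles_with_regex(html):
    """Return heading texts found by the regex (fragile by design)."""
    return TITLE_RE.findall(html)


class HeadingCollector(HTMLParser):
    """Collects the text content of every <h1> element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_h1 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <H1> and <h1> both arrive as "h1".
        if tag == "h1":
            self._in_h1 = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self.headings.append("".join(self._buffer))
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self._buffer.append(data)


def titles_with_parser(html):
    """Return heading texts found by a real HTML parser."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.headings


# Markup as a Markdown renderer might emit it (assumed shape).
generated = "<h1>Release notes</h1><p>Body</p>"
# Markup as a person might type it: uppercase tag, attribute, line break.
hand_written = '<H1 class="big">Release\nnotes</H1>'
```

On `generated` both functions agree; on `hand_written` the regex returns nothing while the parser still finds the heading. That matches the point above: regex is tolerable only while the input stays machine-generated and uniform.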

          • Barbarian@sh.itjust.works
            arrow-up
            3
            ·
            3 days ago

            I mean, it’s not that bad if you’ve spent far far too long on regex101.com

            I guess I’m one of the few weirdos who actually likes messing around with multiple capture groups and complex patterns.

  • evol@lemmy.today
    English
    arrow-up
    6
    ·
    edit-2
    3 days ago

    Maybe not what you wanted, but why did you pick Python and Flask? I’m more interested in the Flask part, over say FastAPI. At companies I have worked for, apps that use Flask always end up running into limitations with the framework, and we end up having to build things on top of it. The biggest one, from what I remember, is that Flask only supports WSGI, so you kind of lack true async support.

    • wjs018@piefed.social
      English
      arrow-up
      7
      ·
      2 days ago

      rimu talked about this a bit in a fireside fedi chat a while back (~24:00 for tech stack discussion and ~33:00 for python specifically).

      As for async, all of the federation work happens asynchronously through the use of celery tasks. Federation work is actually a pretty sizable portion of the computational demand, so offloading that to workers in the background helps a ton. So, the only thing that flask is really responsible for in terms of serving pages is the web UI and the API. It might not scale to the size of reddit very well without a ton of work, but it has been fine for piefed.social. Just like lemmy (or honestly most other applications like this), the main bottleneck is the database rather than any kind of computational overhead for the python framework or rendering speed.
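The offloading pattern described above can be sketched with the standard library alone. This is a stand-in for illustration, not Celery and not PieFed's code: `handle_new_post`, `deliver_activity`, the in-process queue, and the example inbox URLs are all hypothetical. The idea it shows is that the request handler only enqueues the federation work and returns immediately, while a background worker does the slow part.

```python
import queue
import threading

# In-process job queue; a real deployment would use a broker-backed
# task queue (the thread uses Celery) instead of this stand-in.
jobs = queue.Queue()
delivered = []


def deliver_activity(inbox_url, activity):
    """Hypothetical delivery step; real code would sign and POST the activity."""
    delivered.append((inbox_url, activity["type"]))


def worker():
    """Background worker: pull jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                break
            deliver_activity(*job)
        finally:
            jobs.task_done()


def handle_new_post(activity, inboxes):
    """Web-side handler: enqueue one delivery per inbox, then return at once."""
    for inbox in inboxes:
        jobs.put((inbox, activity))
    return "202 Accepted"


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

status = handle_new_post(
    {"type": "Create"},
    ["https://a.example/inbox", "https://b.example/inbox"],
)

jobs.join()      # wait until the queued deliveries are processed
jobs.put(None)   # tell the worker to stop
worker_thread.join()
```

The web process never waits on remote servers here, which is why the framework serving pages matters less than it first appears.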

      A big advantage of the stack is that it is relatively simple. If you know some python and some pretty standard html/bootstrap, then you can make meaningful contributions (I wrote the dev docs and I am not a professional developer for example). This has led to a large number of contributors that have worked on the features that they feel are important to them since the barrier to contribution is fairly low.

      I think the primary disadvantage of using a framework like Flask is that it led to PieFed being initially created as just a web interface without an API. Trying to tack an API onto the application after the fact has been a pretty heavy lift, and there are still areas where the API is lacking compared to the web interface. We have gotten better at this: we now add things to the API at the same time as the web interface, but there is a decent backlog of features yet to be added to the API.

    • Rimu@piefed.socialOPM
      English
      arrow-up
      3
      ·
      2 days ago

      I like FastAPI and PieFed uses it as well as Flask. Doing SSE (which involves holding HTTP connections open for a long time) was not going to work with Flask so that part is written using FastAPI.
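For readers unfamiliar with SSE, the wire format itself is simple; the hard part mentioned above is keeping the connection open. The sketch below only shows the framing, using the standard `id:`, `event:`, and `data:` fields. The helper names and the `notification` event name are assumptions, and this is not PieFed's implementation.

```python
import json
from typing import Iterable, Iterator, Optional


def format_sse(data: str, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Frame one Server-Sent Event; a blank line terminates the event."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload needs its own "data:" field.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"


def notification_stream(notifications: Iterable[dict]) -> Iterator[str]:
    """Yield framed events one at a time, as a streaming response body would."""
    for number, payload in enumerate(notifications, start=1):
        yield format_sse(json.dumps(payload), event="notification",
                         event_id=str(number))
```

A generator like `notification_stream` is the kind of object a streaming response sends with the `text/event-stream` content type. Each client holds its connection for as long as the generator runs, which is where an async server has the advantage.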

      Before starting development I did some experiments with Quart, an ASGI port of Flask. It was fine but didn’t offer significant performance benefits until there were many many concurrent users, a situation that is still far in the future. Also as a more obscure framework it seemed like a risky bet. And if ASGI becomes absolutely necessary, we can migrate from Flask to Quart relatively easily. There’s a good chance that PieFed 2.0 will use Quart.

      I have also experimented with gevent as a way to make Flask do async. I got it working but every 5 or 6 hours everything falls over in a heap and needs to be restarted (DB connection leakage). Despite it being unreliable I was able to run it in production long enough to see how it performs and at our current scale it was not any better than what we have now. A bit worse, really.

      It’s possible that the importance of async is a bit oversold - https://hackeryarn.com/post/async-python-benchmarks/

      Also, recent versions of Python (3.13 onwards) offer an optional free-threaded build without the GIL that used to ruin threaded performance, so that could eventually mean that WSGI might start to use threads well enough to be pretty great. We’ll see.
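Whether a given interpreter actually runs without the GIL depends on how it was built: in CPython 3.13 and later the free-threaded build is a separate, optional build. A small check, assuming CPython, can tell which one is in use. The function name `gil_status` is hypothetical, and `sys._is_gil_enabled()` only exists on 3.13+, hence the fallback.

```python
import sys
import sysconfig


def gil_status() -> dict:
    """Report whether this interpreter is a free-threaded build and
    whether the GIL is currently enabled."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # Older interpreters lack this function; they always have the GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return {
        "free_threaded_build": free_threaded_build,
        "gil_enabled": gil_enabled,
    }


print(gil_status())
```

Even on a free-threaded build, the GIL can be switched back on at runtime, for example when an extension module does not declare support for running without it, so both values are worth logging before drawing conclusions from a threading benchmark.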

      • evol@lemmy.today
        English
        arrow-up
        1
        ·
        2 days ago

        GIL-less Python is a good point; it makes Python scale a lot better. Though with every Python service I’ve maintained or created, in the long term I always wish I had never used Python. I understand the calculus for using Python, though the reasoning feels somewhat outdated in the modern world of AI coding, imo.

        Actually, as an aside, how much of the code is created by AI?

        • Rimu@piefed.socialOPM
          English
          arrow-up
          5
          ·
          2 days ago

          Performance is something developers like to use as a way to assess quality, but there are far more important things (which are harder to put an objective number on), like how easy it is for new developers to contribute. Besides, it’s what you do with it that counts - e.g. despite Lemmy being written in Rust, people are finding it much heavier on the server than PieFed.

          People can make slow software using any language or framework.

          I don’t have a way to know whether code that others contribute is written by AI (except when the quality is really bad, then the PR is rejected) so I bet there’s some in there but I avoid it. I can’t afford the brain-rot of becoming dependent, at my age the rot is happening naturally pretty fast already. There is a whole spectrum of ways to do AI-assisted dev and it’s changing all the time so I’m not trying to police that and just focus on the quality of the code in the PR.

          • evol@lemmy.today
            English
            arrow-up
            1
            ·
            edit-2
            2 days ago

            Does PieFed proxy images by default? I noticed the homepage for lemmy.today is really heavy since the proxied images seem to be full resolution. I ask because in that thread about people saying PieFed runs smoother, they said image proxying is on by default in Lemmy due to CSAM issues. The network utilization blog post also seems kind of disingenuous, since it puts equal weight on JavaScript, image compression, and a bad API pattern by the Lemmy dev. Though those are issues, it seems like 99% of the difference is that you guys downscale the images more heavily than lemmy.world (though I like your solution better).

            Do you also manage the piefed.social servers? What kind of CPU/bandwidth/memory utilization do you see on the instances serving the API gateway? I’m curious about the infra setup in general.

            • Rimu@piefed.socialOPM
              English
              arrow-up
              2
              ·
              5 hours ago

              The image thing is complicated.

              I run piefed.social’s infrastructure. It’s just a medium-sized VPS that has the database, web app, API, etc. all in one. There’s also an old server at my home that runs chat.piefed.social and translate.piefed.social and handles some auxiliary stuff like translation services, which are shared across all PieFed instances.

  • BlueÆther@no.lastname.nz
    arrow-up
    4
    arrow-down
    1
    ·
    3 days ago

    I heard that you were a <insert a classic US slur to anyone left of a fucking psychopath>, is this true? I thought it made you sound much like a good kiwi cunt…