I’m running Proxmox 9 on a server that currently hosts a single LXC. At irregular intervals, always within a day and often within an hour, the host stops responding to SSH and pings, and the web UI stops updating. The LXC, however, keeps running happily. The only way to bring the host back up is to power cycle the server.

Has anyone got any troubleshooting tips?

  • SpikesOtherDog@ani.social
    9 days ago

    It could be a failing drive. If you have a spare drive, you could reinstall onto it and import your guests. That could save hours of troubleshooting software ghosts.

    • ProfessorHoover@infosec.pub (OP)
      9 days ago

      Thanks, I’ll try that. Something odd is also happening: I just realised I can reach the host over Tailscale but not over the local network, which narrows it down a bit. I’ll give the spare drive a go first, though.

          • spitfire@lemmy.world
            6 days ago

            There you go :) I had a Pi running Tailscale that was not reachable at its local IP once Tailscale was started, although it was still reachable at its Tailscale IP. When I used Tailscale for site-to-site connectivity (as a subnet router), I ran it in an LXC container on PVE, so I’d advise you to try that. Avoiding additional software on the hypervisor seems like a smart idea; whenever I can, I put things in containers or VMs.

  • AbidanYre@lemmy.world
    9 days ago

    Very niche case, but I saw pretty much the same thing.

    I was running from an SD card on the iDSDM in a Dell server. The module was flaky, and I’d have to power everything off, pop the SD card out and back in, then bring it all back up.

    It’s unlikely to be the same thing, but maybe it gives you some breadcrumbs to start your search.

    • ProfessorHoover@infosec.pub (OP)
      9 days ago

      I wonder if it is the same issue. I installed on an eMMC module, which Proxmox doesn’t recommend, but I thought I’d get away with it. I might try reinstalling to another drive and see if it still happens.

      • AbidanYre@lemmy.world
        8 days ago

        Huh. Maybe it is related, then. For what it’s worth, I moved to a real drive (still solid state) a while ago, and it’s been working fine ever since.

      • AbidanYre@lemmy.world
        9 days ago

        Good point. All my VMs were on a ZFS array.

        It seemed like enough of the system was in RAM that it could keep running, but anything that needed the root file system was out of luck.

  • NinjaTurtle@feddit.online
    8 days ago

    If it’s only the main OS that stops responding to SSH, and the web portal disappears after a couple of minutes of use, your router might be reusing the IP address assigned to Proxmox. Proxmox uses a static IP address, so it will not change once set up. Your router, on the other hand, may hand the same address to another device if it was never reserved for the Proxmox machine. In other words, two devices end up fighting for the same IP address on your network.

    This is what happened to me. It took a while to figure out, until someone suggested checking the IP and MAC addresses on the router. I’ve had no issues since fixing it there. Just reserve the address on the router, reset the device that is using the same address (if you can figure out which one it is), and reset the router. Make sure you are reserving the address for the correct machine by checking the MAC address.
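
The duplicate-address check described above can be sketched in a few lines of Python. This is only an illustration, assuming you have already collected (IP, MAC) pairs by hand from the router's client list or a neighbour table; the function name `find_ip_conflicts` and all addresses below are made up for the example.

```python
from collections import defaultdict


def find_ip_conflicts(observations):
    """Return {ip: sorted MACs} for every IP seen with more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalise case so the same MAC written two ways is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical observations (made-up addresses, not from the thread).
observations = [
    ("192.168.1.10", "AA:BB:CC:00:00:01"),  # e.g. the Proxmox host
    ("192.168.1.10", "aa:bb:cc:00:00:02"),  # a second device on the same IP
    ("192.168.1.20", "aa:bb:cc:00:00:03"),
    ("192.168.1.20", "AA:BB:CC:00:00:03"),  # same device, different case
]

conflicts = find_ip_conflicts(observations)
print(conflicts)
```

Any IP that shows up in the result is claimed by more than one MAC address, which is the situation a DHCP reservation on the router is meant to prevent.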

  • Possibly linux@lemmy.zip
    8 days ago

    Physically log into it and check the “vitals” (logs and resource usage).

    If I had to speculate, I would say that your network card has faulty firmware.
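
As a rough aid for the log check suggested above, here is a minimal Python sketch that sorts saved log lines into "storage" and "network" buckets, since the thread suspects both the drive and the network card. It assumes the lines were saved to text first (for example from `journalctl -b -1 -p err`, which only covers the previous boot if persistent journaling is enabled). The keyword lists and sample lines are illustrative guesses, not real output from this server.

```python
# Hypothetical keyword lists: illustrative guesses, not an official or complete set.
PATTERNS = {
    "storage": ("i/o error", "read-only"),
    "network": ("link is down", "tx timeout"),
}


def classify_log_lines(lines):
    """Group log lines by the first-level category whose keywords they contain."""
    hits = {category: [] for category in PATTERNS}
    for line in lines:
        lowered = line.lower()
        for category, needles in PATTERNS.items():
            if any(needle in lowered for needle in needles):
                hits[category].append(line)
    return hits


# Invented sample lines, used only to exercise the function.
sample = [
    "kernel: example I/O error on root device",
    "kernel: example NIC link is down",
    "systemd: example service started",
    "kernel: example filesystem remounted read-only",
]

hits = classify_log_lines(sample)
for category, matched in hits.items():
    print(category, len(matched))
```

If one bucket fills up shortly before each hang, that points at which suggestion in this thread (drive, IP conflict, or network card) is worth testing first.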