• ☆ Yσɠƚԋσʂ ☆@lemmy.mlOP · 19 hours ago

    Ultimately what matters is whether it gets the correct answer or not. It’s interesting that yours wasn’t able to do the strawberry test while mine did it with a very short thinking cycle.

    • BaconIsAVeg@lemmy.ml · 19 hours ago

      Ultimately what matters is whether it gets the correct answer or not.

      That’s… not true at all. It had the right answer to most of the questions I asked, just as fast as R1, and yet it kept saying “but wait! maybe I’m wrong”. It’s a huge red flag when the CoT is just trying to 1000-monkeys a problem.

      While it did manage to complete the strawberry problem when I adjusted the top_p/top_k, I was using the previous values with other models I’ve tested and never had a CoT go that off-kilter before. And that’s considering even the 7B Deepseek model was able to get the correct answer with a quarter of the VRAM.
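      For reference, top_k and top_p are sampler settings that prune the model’s next-token distribution before a token is drawn, which is why tightening them can rein in a rambling CoT. A minimal sketch of what the two filters do, using toy probabilities rather than a real model’s logits:

      ```python
      # Toy illustration of top_k / top_p (nucleus) filtering on a
      # next-token distribution. Real runtimes (llama.cpp, Ollama, etc.)
      # expose these as sampler parameters; the numbers here are made up.

      def filter_tokens(probs, top_k=0, top_p=1.0):
          """Return (index, renormalized prob) pairs that survive filtering."""
          # Sort tokens by probability, highest first.
          ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
          # top_k: keep only the k most likely tokens (0 disables the filter).
          if top_k > 0:
              ranked = ranked[:top_k]
          # top_p: keep the smallest prefix whose cumulative mass reaches top_p.
          kept, total = [], 0.0
          for idx, p in ranked:
              kept.append((idx, p))
              total += p
              if total >= top_p:
                  break
          # Renormalize so the surviving probabilities sum to 1.
          norm = sum(p for _, p in kept)
          return [(idx, p / norm) for idx, p in kept]

      # Example: a peaked distribution over five candidate tokens.
      probs = [0.5, 0.25, 0.15, 0.07, 0.03]
      print(filter_tokens(probs, top_k=3, top_p=0.9))
      ```

      Lowering either value narrows sampling toward the model’s top choices, which tends to make the output more deterministic; the exact values that fixed things here came from trial and error, not a documented recommendation.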

      • ☆ Yσɠƚԋσʂ ☆@lemmy.mlOP · 18 hours ago

        It’s true for me. I generally don’t read through the think part. I make the query, do something else, and then come back to see what the actual output is. Overall, I find it gives me way better answers than I got with the version of R1 I was able to get running locally. Turns out the settings do matter though.