adt 4 hours ago

https://lifearchitect.ai/models-table/

Love those GPQA scores hovering around 5% when chance (on 4-way multi-choice) would have got them 25%!

  • montebicyclelo 2 hours ago

    So could do better than chance by excluding the option it's picked?
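
    A quick back-of-the-envelope check of that idea, assuming the roughly 5% score and the 4-way format quoted above, exactly one correct option per question, and a uniform guess among the three options the model did not pick (the function name is made up for illustration):

```python
def accuracy_after_excluding_pick(model_accuracy: float, num_options: int = 4) -> float:
    """Expected accuracy when guessing uniformly among the options the model did NOT pick.

    Assumes exactly one correct option per question.
    """
    # The model's pick is wrong with probability (1 - model_accuracy); in that case
    # the correct answer is one of the remaining (num_options - 1) options.
    return (1.0 - model_accuracy) / (num_options - 1)


flipped = accuracy_after_excluding_pick(0.05)
print(f"{flipped:.3f}")  # 0.317
```

    Under those assumptions the "anti-model" lands a little above the 25% chance baseline, at about 32%.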

  • gryfft 3 hours ago

    A stopped clock is right twice a day, but a running clock set to the wrong time is always wrong.

    • cwt137 12 minutes ago

      Not always true! Your statement is only true when the running clock runs at exactly the same speed as real time. Only then will real time and the clock's time never meet.

      If the clock is running faster than regular time, it will at some point catch up to regular time and thus be correct for a split second. If the clock is slower than regular time, regular time will catch up to the clock and the clock will be right for a split second.
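
      A rough sketch of that argument, assuming a 12-hour dial that wraps around and a clock that runs at a constant rate (the function and parameter names are invented for illustration):

```python
from typing import Optional

DIAL_SECONDS = 12 * 60 * 60  # a 12-hour dial repeats every 43,200 seconds


def seconds_until_correct(offset_s: float, rate: float) -> Optional[float]:
    """Real seconds until the clock next shows the right time.

    offset_s: how far ahead of true time the clock starts (negative = behind).
    rate: clock seconds elapsed per real second (1.0 = perfect speed).
    Returns None when the clock never shows the right time.
    """
    offset = offset_s % DIAL_SECONDS  # how far ahead the clock is, wrapped onto the dial
    if offset == 0:
        return 0.0  # already correct
    if rate == 1.0:
        return None  # same speed, nonzero offset: the gap never closes
    if rate > 1.0:
        # A fast clock keeps gaining until it is a whole dial ahead.
        return (DIAL_SECONDS - offset) / (rate - 1.0)
    # A slow clock loses its lead until the offset reaches zero.
    return offset / (1.0 - rate)


# Ten minutes behind and running 1% fast: correct after about 60,000 real seconds.
catch_up = seconds_until_correct(-600, 1.01)
```

      In this model only the rate-exactly-1.0 case stays wrong forever. Because the dial wraps, even a fast clock that starts ahead is eventually right once it has gained a full twelve hours.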

    • parrit 2 hours ago

      The RMS of wrongness of the running clock is probably lower.
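
      To put a rough number on that, assuming a 12-hour dial, a stopped clock whose error is uniformly spread over the dial across a day, and a running clock that keeps perfect speed with a fixed offset (all names below are illustrative):

```python
import math

DIAL_HOURS = 12.0


def stopped_clock_rms_hours() -> float:
    """RMS error of a stopped clock: the signed error is uniform on [-6h, +6h)."""
    # The RMS of a zero-mean uniform distribution of width w is w / sqrt(12).
    return DIAL_HOURS / math.sqrt(12.0)


def running_clock_rms_hours(offset_hours: float) -> float:
    """RMS error of a clock at perfect speed with a constant offset."""
    # Wrap the offset to the nearest signed error in [-6h, +6h).
    wrapped = (offset_hours + DIAL_HOURS / 2) % DIAL_HOURS - DIAL_HOURS / 2
    return abs(wrapped)


stopped = stopped_clock_rms_hours()        # about 3.46 hours
running = running_clock_rms_hours(0.25)    # 15 minutes off -> 0.25 hours
```

      Under these assumptions the running clock has the lower RMS error whenever it is set less than about 3.46 hours off.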

mentalgear 3 hours ago

> chose to make just about everything associated with Bamba open-source — the training recipes, the data, the data loader IBM designed for large-scale distributed training, and a quantization framework aimed at shaving storage and inferencing costs.

roger_ 44 minutes ago

Never got how mamba models work in multiple dimensions and non-causally.

jmward01 6 hours ago

This type of architecture is definitely the future. Unlimited attn is a dead end. As a human you don't need to scan an entire book just to guess what the next word will be and LLMs shouldn't need that either.

  • quantadev 5 hours ago

    Not to be contrarian, but if the next word prediction happens to be someone's name or a place or something discussed in multiple places in the book, then often, yes, a knowledge of the full plot of the book is "required" just to predict the next word, as you get to the middle or end of a book.

    For example you could never fill in the last chapter of any good book without having knowledge of every previous chapter. Not highly detailed knowledge, but still knowledge.

    • parrit 2 hours ago

      What an LLM does is stuff it all into short term memory. Humans dump the first pages into long term memory and "make sense" of it. Humans have a massive context window because of this (and sheer brain size and efficiency).

cubefox 3 hours ago

Another recent transformer/SSM hybrid is "M1", with a more than 3x claimed inference speed-up compared to equivalent transformers: https://arxiv.org/pdf/2504.10449

IBM is claiming at least a 2x inference speed-up with Bamba. Both groups say that future SSM optimizations to vLLM would lead to further inference speed improvement.

antirez 6 hours ago

Dear IBM name pickers: "Bamba", in Italian, means cocaine.

  • _davide_ 3 hours ago

    When I read the title 'IBM crossed a transformer with an SSM and got ‘Bamba’' I laughed so hard I woke up my kid

  • folgoris an hour ago

    A very funny and friendly way to say "cocaine" among Italians. I'm struggling to read it seriously.

  • iddan 5 hours ago

    And in Hebrew it's the name of a snack made of peanut-butter-flavored puffed maize https://en.wikipedia.org/wiki/Bamba_(snack)

    • kridsdale1 5 hours ago

      I imported these to America to feed my infant. Data shows the prevalence of peanut allergies lines up with when AAP guidelines started recommending that babies do NOT eat peanuts. Israel never went along with this and thus has the lowest rates of allergies in the world.

      • cycomanic 3 hours ago

        Latest research does strongly suggest that introducing small amounts of common allergens (peanuts, shellfish, milk products...) as early as possible does significantly reduce the risk of allergies later. Many early childhood organisations already recommend this. Official health recommendations are often slow to catch up (often for good reasons), but introducing peanuts etc. early is already officially recommended in quite a few countries (Australia, NZ, Sweden for example AFAIK). Not all health professionals are always up to date either though.

    • bonzini 5 hours ago

      As an Italian who has tried (only) the Israeli Bamba, I can certify that it is pretty addictive.

  • rdtsc 4 hours ago

    So someone can get fired for picking IBM after all! Or get a bonus, depending on the organization...

  • dismalaf an hour ago

    Seems like a good fit.

  • lenerdenator 2 hours ago

    About time they did something to liven things up at Big Blue.

  • vienzo 4 hours ago

    And in Lithuanian it's a navel

  • rzzzt 6 hours ago

    Para bailar La Bamba / Se necesita una poca de gracia