syx an hour ago

For those wondering about the use case, this is very useful when enabling streaming for structured output in LLM responses, such as JSON responses. For my local Raspberry Pi agent I needed something performant. I've been using streaming-json-js [1], but development appears to have been somewhat dormant over the past year. I'll definitely take a look at your jsonriver and see how it compares!

[1] https://github.com/karminski/streaming-json-js

  • cjonas 37 minutes ago

    Particularly for ReAct-style agents that use a "final" tool call to end the run.

mattvr 28 minutes ago

You could also use JSON Merge Patch (RFC 7396) [1] for a similar use case.

(The downside of JSON Merge Patch is that it doesn't support concatenating string values, so you must send a value like `{"msg": "Hello World"}` as one message; you can't join `{"msg": "Hello"}` with `{"msg": " World"}`.)

[1] https://github.com/pierreinglebert/json-merge-patch
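
To make that limitation concrete, here is a minimal sketch of the merge procedure described in RFC 7396. `mergePatch` and the `Json` type are hand-written for illustration and are not the API of the library linked above. Applying two patches to the same key replaces the string instead of appending to it.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isObject(v: Json | undefined): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Hand-written sketch of the MergePatch procedure from RFC 7396.
function mergePatch(target: Json | undefined, patch: Json): Json {
  if (!isObject(patch)) return patch; // a non-object patch replaces the target outright
  const result: { [key: string]: Json } = isObject(target) ? { ...target } : {};
  for (const key of Object.keys(patch)) {
    const value = patch[key];
    if (value === undefined) continue;
    if (value === null) {
      delete result[key]; // null means "remove this member"
    } else {
      result[key] = mergePatch(result[key], value);
    }
  }
  return result;
}

let doc: Json = {};
doc = mergePatch(doc, { msg: "Hello" });
doc = mergePatch(doc, { msg: " World" });
const mergedJson = JSON.stringify(doc); // the second patch overwrote "Hello"
```

So streaming a long string this way means resending the whole string each time, or adding a concatenation convention on top of the RFC.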

  • Waterluvian 19 minutes ago

    I can’t imagine any sane protocol supporting stuff like that, or it being a good idea most of the time. JSON patch, much like JSON, mostly works while being very, very simple.

zahlman 29 minutes ago

> If you gave this to jsonriver one byte at a time it would yield this sequence of values:

Does it create a new value each time, or just mutate the existing one and keep yielding it?

  • rictic 25 minutes ago

    It mutates the existing value and yields it again (unless the toplevel value is a string, because strings are immutable in JS).
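
One consequence of mutate-and-yield-again is that a reference kept from an earlier iteration keeps changing. The sketch below uses a hypothetical stand-in generator, `growingValues`, rather than jsonriver itself, so it only illustrates the aliasing behavior; copy the value if you need a snapshot.

```typescript
// Hypothetical stand-in for a parser that mutates one object and yields it repeatedly.
function* growingValues(chunks: string[]): Generator<{ name: string }> {
  const value = { name: "" };
  for (const chunk of chunks) {
    value.name += chunk; // mutate in place
    yield value;         // same object identity every time
  }
}

const held: { name: string }[] = [];
const snapshots: { name: string }[] = [];
for (const v of growingValues(["Al", "e", "x"])) {
  held.push(v);             // keeps a reference to the live object
  snapshots.push({ ...v }); // shallow copy; nested data would need a deep copy
}

const allSameObject = held.every((v) => v === held[0]);
const heldNames = held.map((v) => v.name).join(",");          // every entry shows the final state
const snapshotNames = snapshots.map((v) => v.name).join(","); // each entry shows the state at that step
```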

AaronFriel an hour ago

Oh, this is quite similar to an online parser I'd written a few years ago[1]. I have some worked examples on how to use it with the now-standard Chat Completions API for LLMs to stream and filter structured outputs (aka JSON). This is the underlying technology for a "Copilot" or "AI" application I worked on in my last role.

This library is orders of magnitude faster[2] than alternatives for parsing LLM tool calls for the very simple reason that alternative approaches repeatedly parse the entire concatenated response, which requires buffering the entire payload, repeatedly allocating new objects, and for an N token response, you parse the first token N times! All of the "industry standard" approaches here are quadratic, which is going to scale quite poorly as LLMs generate larger and larger responses to meet application needs, and users want low latency outputs.
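
The quadratic cost can be seen by counting how many characters each strategy has to scan. This is a back-of-the-envelope sketch with made-up chunk sizes, not a benchmark of any library mentioned here.

```typescript
// Count characters scanned when every new chunk triggers a re-parse of the whole buffer.
function charsScannedByReparsing(chunkSizes: number[]): number {
  let buffered = 0;
  let scanned = 0;
  for (const size of chunkSizes) {
    buffered += size;    // the buffer grows by one chunk
    scanned += buffered; // and the entire buffer is parsed again
  }
  return scanned;
}

// An incremental parser looks at each character once.
function charsScannedIncrementally(chunkSizes: number[]): number {
  return chunkSizes.reduce((total, size) => total + size, 0);
}

// 1000 chunks of 4 characters each (illustrative numbers only).
const chunkSizes = new Array<number>(1000).fill(4);
const reparseCost = charsScannedByReparsing(chunkSizes);       // 4 * (1 + 2 + ... + 1000) = 2,002,000
const incrementalCost = charsScannedIncrementally(chunkSizes); // 4 * 1000 = 4,000
```

Doubling the number of chunks doubles the incremental cost but roughly quadruples the re-parsing cost.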

One of the most useful features of this approach is filtering LLM tool calls on the server and passing through a subset of the parse events to the client. This makes it relatively easy to put moderation, metadata capture, and other requirements in a single tool call, while still providing low latency streaming UI. It also avoids the problem with many moderation APIs where for cost or speed reasons, one might delegate to a smaller, cheaper model to generate output in a side-channel of the normal output stream. This not only doesn't scale, but it also means the more powerful model is unaware of these requirements, or you end up with a "flash of unapproved content" due to moderation delays, etc.

I found that it was extremely helpful to work at the level of parse events, but recognize that building partial values is also important, so I'm working on something similar in Rust[3], but taking a more holistic view and building more of an "AI SDK" akin to Vercel's, but written in Rust.

[1] https://github.com/aaronfriel/fn-stream

[2] https://github.com/vercel/ai/pull/1883

[3] https://github.com/aaronfriel/jsonmodem

(These are my own opinions, not those of my employer, etc. etc.)

holdenc137 an hour ago

I don't get it (and I'd call this cumulative not incremental)

Why not at least wait until the key is complete - what's the use in a partial key?

  • rictic 22 minutes ago

    Cumulative is a good term too. I come from the browser world where it's typically called incremental parsing, e.g. when web browsers parse and render HTML as it streams in over the wire. I was doing the same thing with JSON from LLMs.

  • simonw an hour ago

    If you're building a UI that renders output from a streaming LLM you might get back something which looks like this:

      {"role": "assistant", "text": "Here's that Python code you aske
    
    Incomplete parsing with incomplete strings is still useful in order to render that to your end user while it's still streaming in.
    • cozzyd an hour ago

      incomplete strings could be fun in certain cases

      {"cleanup_cmd":"rm -rf /home/foo/.tmp" }

      • rictic 39 minutes ago

        Yeah, another fun one is string enums. Could treat "DeleteIfEmpty" as "Delete".

        • Waterluvian 5 minutes ago

          I imagine if you reason about incomplete strings as a sort of “unparsed data” where you might store or transport or render it raw (like a string version of printing response.data instead of response.json()), but not act on it (compare, concat, etc), it’s a reasonably safe model?

          I’m imagining it in my mental model as being typed “unknown”. Anything that prevents accidental use as if it were a whole string… I imagine a more complex type with an “isComplete” flag of sorts would be more powerful but a bit of a blunderbuss.
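
The "isComplete flag" idea could look something like the sketch below. `StreamedString`, `render`, and `toAction` are hypothetical names and not part of jsonriver; the point is only that a tagged type lets rendering accept a prefix while anything that acts on the value must check the tag first.

```typescript
// A string that may still be streaming. The tag forces callers to check before acting.
type StreamedString =
  | { complete: false; partial: string }
  | { complete: true; value: string };

// Rendering is allowed at any point: showing a prefix to a user is harmless.
function render(s: StreamedString): string {
  return s.complete ? s.value : s.partial + "...";
}

// Acting on the value is only allowed once it is complete.
function toAction(s: StreamedString): "Delete" | "DeleteIfEmpty" | undefined {
  if (!s.complete) return undefined; // never match on a prefix
  const v = s.value;
  if (v === "Delete" || v === "DeleteIfEmpty") return v;
  return undefined;
}

const midStream: StreamedString = { complete: false, partial: "Delete" };
const finished: StreamedString = { complete: true, value: "DeleteIfEmpty" };
const midStreamAction = toAction(midStream); // undefined: the prefix "Delete" is not trusted
const finishedAction = toAction(finished);
```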

magicalhippo 5 days ago

I wrote a more traditional JSON parser for my microcontroller project. You could iterate over elements and it would return "needs more data" if it was unable to proceed. You could then call it again after fetching more. Then just simple state machines to consume the objects.

The benefit of that was that you didn't need the memory to hold the whole deserialized JSON object.
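
A pull-style reader of that kind might be sketched as follows. `IntArrayReader` is a hypothetical, deliberately tiny example that only understands an array of non-negative integers and treats commas like whitespace, so it is more lenient than real JSON. It shows the "needs more data" result and the fact that consumed input is dropped rather than kept.

```typescript
type PullResult =
  | { kind: "value"; value: number }
  | { kind: "needMoreData" }
  | { kind: "end" };

// Hypothetical pull-style reader for input such as "[1, 22, 333]".
class IntArrayReader {
  private buf = "";
  private state: "start" | "items" | "done" = "start";

  feed(chunk: string): void {
    this.buf += chunk;
  }

  next(): PullResult {
    if (this.state === "done") return { kind: "end" };
    this.buf = this.buf.replace(/^[\s,]+/, ""); // skip whitespace and separators
    if (this.buf.length === 0) return { kind: "needMoreData" };
    if (this.state === "start") {
      if (this.buf[0] !== "[") throw new Error("expected '['");
      this.buf = this.buf.slice(1);
      this.state = "items";
      return this.next();
    }
    if (this.buf[0] === "]") {
      this.state = "done";
      return { kind: "end" };
    }
    // A number is only complete once a delimiter follows it.
    const match = /^\d+(?=[\s,\]])/.exec(this.buf);
    if (match === null) {
      if (/^\d+$/.test(this.buf)) return { kind: "needMoreData" };
      throw new Error("unexpected input");
    }
    this.buf = this.buf.slice(match[0].length); // consumed text is not retained
    return { kind: "value", value: Number(match[0]) };
  }
}

const reader = new IntArrayReader();
const seen: string[] = [];
for (const chunk of ["[1, 2", "2, 33", "3]"]) {
  reader.feed(chunk);
  let r = reader.next();
  while (r.kind === "value") {
    seen.push(String(r.value));
    r = reader.next();
  }
  seen.push(r.kind); // either "needMoreData" or "end"
}
```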

This seems to be more oriented towards interactivity, which is an interesting use-case I hadn't thought about.

  • rickcarlino 5 days ago

    I found this because I am interested in streaming responses that populate a user interface quickly, or show spinners while it is still loading.

seanalltogether an hour ago

Maybe I'm wrong but it seems like you would only want to parse partial values for objects and arrays, but not strings or numbers. Objects and arrays can be unbounded so it makes sense to process what you can, when you can, whereas a string or number usually is not.

  • rictic 41 minutes ago

    Numbers, booleans, and nulls are atomic with jsonriver, you get them all at once only when they're complete.

    For my use case I wanted a streaming parse of strings. I was rendering JSON produced by an LLM to build a UI incrementally, and some of the strings (descriptions) were long enough that it was nice to see them render as they arrived.

  • everforward an hour ago

    It could be useful if you're doing something with the string that operates sequentially anyways (i.e. block-by-block AES, or SHA sums).

    I _think_ the intended use of this is for people with bad internet connections so your UI can show data that's already been received without waiting for a full response. I.e. if their connection is 1KB/s and you send an 8KB JSON blob that's mostly a single text field, you can show them the first kilobyte after a second rather than waiting 8 seconds to get the whole blob.

    At first I thought maybe it was for handling gigantic JSON blobs that you don't want to entirely load into memory, but the API looks like it still loads the whole thing into memory.

  • AaronFriel an hour ago

    If you're generating long reports, code, etc. with an LLM, partial strings matter quite a lot for user experience.

alganet an hour ago

Interesting approach.

I would expect an object JSON stream to be more like a SAX parser though. It's familiar, fast and simple.

Any thoughts on not choosing the SAX approach?

  • rictic 18 minutes ago

    SAX is often better if you don't need the full final result, especially if you can throw away most of the data after it's been processed. The nice part about this API is that you just get a DeepPartial<FinalResult> so the code to handle a partial result is basically the same as the code to handle the final result.
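
As a rough illustration of that point, here is a sketch with a commonly used recursive `DeepPartial` definition (jsonriver's own type may be defined differently) and a hypothetical `Report` shape. The same render function accepts both the partial and the final value.

```typescript
// A common recursive definition; an assumption, not necessarily the library's own type.
type DeepPartial<T> = T extends (infer U)[]
  ? DeepPartial<U>[]
  : T extends object
    ? { [K in keyof T]?: DeepPartial<T[K]> }
    : T;

// Hypothetical result shape, for illustration only.
interface Report {
  title: string;
  sections: { heading: string; body: string }[];
}

// One renderer handles both cases, because Report is assignable to DeepPartial<Report>.
function renderReport(r: DeepPartial<Report>): string {
  const lines = [`# ${r.title ?? ""}`];
  for (const s of r.sections ?? []) {
    lines.push(`## ${s.heading ?? ""}`, s.body ?? "");
  }
  return lines.join("\n");
}

const partialReport: DeepPartial<Report> = { title: "Quarterly", sections: [{ heading: "Rev" }] };
const completeReport: Report = { title: "Quarterly", sections: [{ heading: "Revenue", body: "Up." }] };
const partialText = renderReport(partialReport);
const completeText = renderReport(completeReport);
```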

  • benatkin an hour ago

    I think this is a lot like etree in python's streaming approach for XML, but with a simpler API, and incremental text parsing. With etree in python, you can access the incomplete tree data and not have to worry about events. So it's missing the SAX API part of a SAX approach, but is built like some real world libraries that use the SAX approach, which end up having a hybrid of events and trees.

    • alganet an hour ago

      It seems to be convenient for some cases. A large object with many keys, for example.

      I don't see it as particularly convenient if I want to stream a large array of small independent objects and read each one of them once, then discard it. The incremental parsed array would get bigger and bigger, eventually containing all the objects I wanted to discard. I would also need to move my array pointer to the last element at each increment.
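
The "move my array pointer" bookkeeping could be sketched like this. The `states` list is a hypothetical stand-in for the successive values a cumulative parser would yield, and the sketch assumes elements are parsed in order, so every element except the last is finished. It does not stop the array itself from growing.

```typescript
interface Item { id: number; label?: string }

// Hypothetical successive states of one growing array.
const states: Item[][] = [
  [{ id: 1 }],
  [{ id: 1, label: "a" }, { id: 2 }],
  [{ id: 1, label: "a" }, { id: 2, label: "b" }, { id: 3, label: "c" }],
];

const processed: string[] = [];
let cursor = 0; // index of the first element not yet handled

function drain(items: Item[], streamEnded: boolean): void {
  // While streaming, the last element may still be incomplete, so stop before it.
  const limit = streamEnded ? items.length : items.length - 1;
  while (cursor < limit) {
    const item = items[cursor];
    if (item !== undefined) processed.push(`${item.id}:${item.label ?? ""}`);
    cursor++;
  }
}

for (const items of states) drain(items, false);
// Once the stream has ended, the trailing element is known to be complete too.
const lastState = states[states.length - 1];
if (lastState !== undefined) drain(lastState, true);
```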

      jq and JSON.sh have similar incremental "mini-object-before-complete" approaches to parsing JSON. However, they do include some tools to shape those mini-objects (pruning, selecting, and so on). Also, they're tuned for pipes (new line is the event), which caters to shell and text-processing tools. I wonder what would be the analogue for that in a higher language.

codesnik an hour ago

I can't imagine a use case. OK, you receive incremental updates, which could be useful, but how do you find out that the JSON object has actually been received in full?

  • Supermancho an hour ago

    When you want to pull multi-gig JSON files and not wait for the full file before processing is where I first used this.

    • rictic 20 minutes ago

      Funnily enough, this was one of the first users of jsonriver at Google. A team needed to parse more JSON than most JS VMs will allow you to fit into a single string, so they had no choice but to use a streaming parser.

  • philipallstar an hour ago

    When its closing brace or square bracket appears.

    EDIT: this is totally wrong and the question is right.

    • rising-sky an hour ago

      Actually, not quite how this works. You always get valid JSON, as in this sequence from the readme:

      ```json
      {"name": "Al"}
      {"name": "Ale"}
      ```

      So the braces are always closed
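
Since every yielded value is well-formed, its shape cannot tell you whether more data is coming. With an iterator-based API the usual signal is the iterator finishing. The sketch below assumes that model and uses a hypothetical synchronous generator, `cumulativeValues`, as a stand-in; a real streaming parser would typically be consumed with `for await`.

```typescript
// Hypothetical stand-in for a cumulative parser: each yielded value is a well-formed object.
function* cumulativeValues(): Generator<{ name: string }> {
  yield { name: "Al" };
  yield { name: "Ale" };
  yield { name: "Alex" };
}

const rendered: string[] = [];
let latest: { name: string } | undefined;
for (const value of cumulativeValues()) {
  latest = value;            // fine for rendering, but possibly still growing
  rendered.push(value.name);
}
// Reaching this line means the iterator is exhausted, so `latest` is the complete value.
const finalName = latest?.name;
```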

florians an hour ago

Noteworthy: Contributions by Claude

  • rictic 8 minutes ago

    Is true. I wrote a ton of tests, testing just about everything I can think of, including using a reverse parser I wrote to exhaustively generate the simplest 65k json values, ensuring that it succeeds with the same values and fails on the same cases as JSON.parse.

    Then added benchmarks and started doing optimization, getting it ~10x faster than my initial naive implementation. Then I threw agents at it, and between Claude, Gemini, and Codex we were able to make it an additional 2x faster.