////
Copyright (c) 2019 Vinnie Falco (vinnie.falco@gmail.com)
Copyright (c) 2025 Dmitry Arkhipov (grisumbras@yandex.ru)

Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

Official repository: https://github.com/boostorg/json
////

[pagelevels=1]
= Parsing
Parsing is the process where a serialized JSON text is validated and decomposed
into elements. The library provides these functions and types to assist with
parsing:

.Parsing Functions and Types
|===
|Name|Description

| <<ref_basic_parser>>
| A SAX push parser implementation which converts a serialized JSON text into
  a series of member function calls to a user provided handler. This allows
  custom behaviors to be implemented for representing the document in memory.

| <<ref_parse_options>>
| A structure used to select which extensions are enabled during parsing.

| <<ref_parse>>
| Parse a string containing a complete serialized JSON text, and return
  a <<ref_value>>.

| <<ref_parser>>
| A stateful DOM parser object which may be used to efficiently parse a series
  of JSON texts each contained in a single contiguous character buffer,
  returning each result as a <<ref_value>>.

| <<ref_stream_parser>>
| A stateful DOM parser object which may be used to efficiently parse a series
  of JSON texts incrementally, returning each result as a <<ref_value>>.

| <<ref_value_stack>>
| A low level building block used for efficiently building a <<ref_value>>. The
  parsers use this internally, and users may use it to adapt foreign parsers to
  produce this library's containers.
|===

The <<ref_parse>> function offers a simple interface for converting
a serialized JSON text to a <<ref_value>> in a single function call. This
overload uses exceptions to indicate errors:

[source]
----
include::../../../test/doc_parsing.cpp[tag=doc_parsing_1,indent=0]
----

Alternatively, an {ref_error_code} can be used:

[source]
----
include::../../../test/doc_parsing.cpp[tag=doc_parsing_2,indent=0]
----

Even when using error codes, exceptions thrown from the underlying
{ref_memory_resource} are still possible:

[source]
----
include::../../../test/doc_parsing.cpp[tag=doc_parsing_3,indent=0]
----

The <<ref_value>> returned in the preceding examples use the
<<default_memory_resource,default memory resource>>. The following code uses
a <<ref_monotonic_resource>>, which results in faster parsing. `jv` is marked
`const` to prevent subsequent modification, because containers using
a monotonic resource waste memory when mutated.

[source]
----
include::../../../test/doc_parsing.cpp[tag=doc_parsing_4,indent=0]
----

== Non-Standard JSON
Unless otherwise specified, the parser in this library is strict. It recognizes
only valid, standard JSON. The parser can be configured to allow certain
non-standard extensions by filling in a <<ref_parse_options>> structure and
passing it by value. By default all extensions are disabled:

[source]
----
include::../../../test/doc_parsing.cpp[tag=doc_parsing_5,indent=0]
----

When building with {cpp}20 or later, the use of
https://en.cppreference.com/w/cpp/language/aggregate_initialization#Designated_initializers[designated
initializers] with <<ref_parse_options>> is possible:

[source]
----
include::../../../test/doc_parsing.cpp[tag=doc_parsing_6,indent=0]
----

When `allow_invalid_utf16` is enabled, the parser will not throw an error in
the case of illegal leading, trailing, or half a surrogate. Instead, it will
replace the invalid UTF-16 code point(s) with the Unicode replacement
character.

[source]
----
include::../../../test/doc_parsing.cpp[tag=doc_parsing_15,indent=0]
----

CAUTION: When enabling comment support take extra care not to drop whitespace
when reading the input. For example, `std::getline` removes the endline
characters from the string it produces.

== Full Precision Number Parsing
The default algorithm that the library uses to parse numbers is fast, but may
result in slight precision loss. This may not be suitable for some
applications, so there is an option to enable an alternative algorithm that
doesn't have that flaw, but is somewhat slower. To do this, you also need to
use <<ref_parse_options>> structure.

[source]
----
include::../../../test/doc_parsing.cpp[tag=doc_parsing_precise,indent=0]
----

Note that full precision number parsing requires the algorithm to see the full
number. This means, that when used with <<ref_stream_parser>>, additional
memory allocations may be necessary to store the number parts which were so far
accepted by the parser. The library does try its best to avoid such
allocations.

== Parser
Instances of <<ref_parser>> and <<ref_stream_parser>> offer functionality
beyond what is available when using the <<ref_parse>> free functions:

* More control over memory
* Streaming API, parse input JSON incrementally
* Improved performance when parsing multiple JSON texts
* Ignore non-JSON content after the end of a JSON text

The parser implementation uses temporary storage space to accumulate values
during parsing. When using the <<ref_parse>> free functions, this storage is
allocated and freed in each call. However, by declaring an instance of
<<ref_parser>> or <<ref_stream_parser>>, this temporary storage can be reused
when parsing more than one JSON text, reducing the total number of dynamic
memory allocations.

To use the <<ref_parser>>, declare an instance. Then call <<ref_parser_write>>
once with the buffer containing representing the input JSON. Finally, call
<<ref_parser_release>> to take ownership of the resulting <<ref_value>> upon
success. This example persists the parser instance in a class member to reuse
across calls:

[source]
----
include::../../../test/doc_parsing.cpp[tag=doc_parsing_7,indent=0]
----

Sometimes a protocol may have a JSON text followed by data that is in
a different format or specification. The JSON portion can still be parsed by
using the function <<ref_parser_write_some>>. Upon success, the return value
will indicate the number of characters consumed from the input, which will
exclude the non-JSON characters:

[source]
----
include::../../../test/doc_parsing.cpp[tag=doc_parsing_8,indent=0]
----

The parser instance may be constructed with parse options which allow some
non-standard JSON extensions to be recognized:

[source]
----
include::../../../test/doc_parsing.cpp[tag=doc_parsing_9,indent=0]
----

== Streaming Parser
The <<ref_stream_parser>> implements
a https://en.wikipedia.org/wiki/Online_algorithm[__streaming algorithm__]; it
allows incremental processing of large JSON inputs using one or more contiguous
character buffers. The entire input JSON does not need to be loaded into memory
at once. A network server can use the streaming interface to process incoming
JSON in fixed-size amounts, providing these benefits:

* CPU consumption per I/O cycle is bounded
* Memory consumption per I/O cycle is bounded
* Jitter, unfairness, and latency is reduced
* Less total memory is required to process the full input

To use the <<ref_stream_parser>>, declare an instance. Then call
<<ref_stream_parser_write>> zero or more times with successive buffers
representing the input JSON. When there are no more buffers, call
<<ref_stream_parser_finish>>. The function <<ref_stream_parser_done>> returns
`true` after a successful call to `write` or `finish` if parsing is complete.

In the following example a JSON text is parsed from standard input a line at
a time. Error codes are used instead. The function <<ref_stream_parser_finish>>
is used to indicate the end of the input:

CAUTION: This example will break, if comments are enabled, because of
`std::getline` use (see the warning in <<non_standard_json>> section).

[source]
----
include::../../../test/doc_parsing.cpp[tag=doc_parsing_10,indent=0]
----

We can complicate the example further by extracting _several_ JSON values from
the sequence of lines.

[source]
----
include::../../../test/doc_parsing.cpp[tag=doc_parsing_14,indent=0]
----

== Controlling Memory
After default construction, or after <<ref_stream_parser_reset>> is called with
no arguments, the <<ref_value>> produced after a successful parse operation
uses the default memory resource. To use a different memory resource, call
`reset` with the resource to use. Here we use a <<ref_monotonic_resource>>,
which is optimized for parsing but not subsequent modification:

[source]
----
include::../../../test/doc_parsing.cpp[tag=doc_parsing_11,indent=0]
----

To achieve performance and memory efficiency, the parser uses a temporary
storage area to hold intermediate results. This storage is reused when parsing
more than one JSON text, reducing the total number of calls to allocate memory
and thus improving performance. Upon construction, the memory resource used to
perform allocations for this temporary storage area may be specified.
Otherwise, the default memory resource is used. In addition to a memory
resource, the parser can make use of a caller-owned buffer for temporary
storage. This can help avoid dynamic allocations for small inputs. The
following example uses a four kilobyte temporary buffer for the parser, and
falls back to the default memory resource if needed:

[source]
----
include::../../../test/doc_parsing.cpp[tag=doc_parsing_12,indent=0]
----

== Avoiding Dynamic Allocations
Through careful specification of buffers and memory resources, it is possible
to eliminate all dynamic allocation completely when parsing JSON, for the case
where the entire JSON text is available in a single character buffer, as shown
here:

[source]
----
include::../../../test/doc_parsing.cpp[tag=doc_parsing_13,indent=0]
----

== Custom Parsers
Users who wish to implement custom parsing strategies may create their own
handler to use with an instance of <<ref_basic_parser>>. The handler implements
the function signatures required by SAX event interface. In
<<examples_validate>> example we define the "null" parser, which throws out the
parsed results, to use in the implementation of a function that determines if
a JSON text is valid.
