I continue to be surprised that in these discussions correctness is treated as some optional highest possible level of quality, not the only reasonable state.
Suppose we're talking about multiplayer game networking, where the central store receives torrents of UDP packets and it is assumed that like half of them will never arrive. It doesn't make sense to view this as "we don't care about the player's actual position". We do. The system just has tolerances for how often the updates must be communicated successfully. Lost packets do not make the system incorrect.
A soft-realtime multiplayer game is always incorrect (unless no one is moving).
There are various decisions the netcode can make about how to reconcile with this incorrectness, and different games make different tradeoffs.
For example in hitscan FPS games, when two players fatally shoot one another at the same time, some games will only process the first packet received, and award the kill to that player, while other games will allow kill trading within some time window.
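To make the two policies concrete, here is a minimal sketch in Python. The `Shot` type, the `resolve_kills` function, and the use of server receive time are all assumptions for illustration; real netcode usually also rewinds time for lag compensation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    shooter: str
    target: str
    server_time_ms: int  # when the server received the fatal shot (assumed)


def resolve_kills(shots: list[Shot], trade_window_ms: int) -> set[str]:
    """Return the players who die.

    trade_window_ms == 0 models "first packet wins"; a positive value
    lets a shot from an already-dead player count if it arrived less
    than trade_window_ms after that player's death.
    """
    time_of_death: dict[str, int] = {}
    for shot in sorted(shots, key=lambda s: s.server_time_ms):
        died_at = time_of_death.get(shot.shooter)
        if died_at is not None and shot.server_time_ms - died_at >= trade_window_ms:
            continue  # shooter was already dead and outside the trade window
        time_of_death.setdefault(shot.target, shot.server_time_ms)
    return set(time_of_death)
```

With shots A→B at 100 ms and B→A at 130 ms, a window of 0 kills only B, while a window of 50 ms kills both. Neither answer is "the truth"; each is a tolerance the designer picked.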
A tolerance is just an amount of incorrectness that the designer of the system can accept.
When it comes to CRUD apps using read-replicas, so long as the designer of the system is aware of and accepts the consistency errors that will sometimes occur, does that make that system correct?
If you’re live streaming video, you can make every frame a P-frame, which brings your bandwidth costs to a minimum, but then a single lost packet permanently corrupts the stream. Or you periodically refresh the stream with I-frames sent over a reliable channel, so that lost packets corrupt the video only momentarily.
Sure, if performance characteristics were the same, people would go for strong consistency. The reason many different consistency models are defined is that there are different tradeoffs, each preferable for a given problem domain with specific business requirements.
If the video is streaming, people don't really care if a few frames drop, hell, most won't notice.
It's only when several frames in a row are dropped that people start to notice, and even then they rarely care as long as the message within the video has enough data points for them to make an (educated) guess.
P/B frames (which is usually most of them) reference other frames to compress motion effectively. So losing a packet doesn't mean a dropped frame, it means corruption that lasts until the next I-frame/slice. This can be seconds. If you've ever seen corrupt video that seems to "smear" wrong colors, etc. across the screen for a bunch of frames, that's what we're talking about here.
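A toy model of that behaviour, as a sketch only: it assumes whole frames are lost (not slices), that every P-frame depends only on the frame before it, and that an I-frame that arrives fully repairs the picture. The function name `corrupted_frames` is made up for this example.

```python
def corrupted_frames(frame_types: str, lost: set[int]) -> list[int]:
    """Return indices of frames that display corrupted.

    frame_types is a string such as "IPPPPIPPPP"; lost holds the
    indices of frames that never arrived.
    """
    corrupted = []
    broken = False
    for index, frame_type in enumerate(frame_types):
        if index in lost:
            broken = True          # missing data: the decoder's reference is now wrong
        elif frame_type == "I":
            broken = False         # a received I-frame needs no reference, so it resets the picture
        if broken:
            corrupted.append(index)
    return corrupted
```

Losing frame 2 of `"IPPPPIPPPP"` corrupts frames 2 to 4 and the I-frame at index 5 repairs it; with no later I-frame, as in `"IPPPPPPPPP"`, the damage never heals.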
Okay but now you're explaining that correctness is not necessarily the only reasonable state. It's possible to sacrifice some degree of correctness for enormous gains in performance because having absolute correctness comes at a cost that might simply not be worth it.
Back in the day there were some P2P RTS games that just sent duplicates. Like each UDP packet would have a new game state and then 1 or more repetitions of previous ones. For lockstep P2P engines, the state that needs to be transferred tends towards just being the client's input, so it's tiny, just a handful of bytes. Makes more sense to just duplicate ahead of time vs ack/nack and resend.
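A rough sketch of that duplication scheme, assuming one byte of input per tick and a made-up wire format (a 4-byte newest-tick number, a 1-byte count, then the inputs from newest to oldest). `encode_packet` and `decode_packet` are hypothetical names, not taken from any particular engine.

```python
import struct

HEADER = struct.Struct("!IB")  # newest tick (uint32), number of inputs carried (uint8)


def encode_packet(history: list[int], redundancy: int) -> bytes:
    """Pack the newest input plus up to `redundancy` earlier ones.

    history[i] is the one-byte input for tick i; it must be non-empty.
    """
    newest = len(history) - 1
    count = min(redundancy + 1, len(history))
    recent = history[len(history) - count:]           # oldest to newest
    return HEADER.pack(newest, count) + bytes(reversed(recent))


def decode_packet(packet: bytes) -> dict[int, int]:
    """Return {tick: input} for every input carried in the packet."""
    newest, count = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + count]
    return {newest - offset: value for offset, value in enumerate(payload)}
```

With `redundancy=2`, every input travels in three consecutive packets, so the receiver only misses an input if three packets in a row are lost, and no ack/nack round trip is needed.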
I think we should stop calling these systems eventually consistent. They are actually never consistent. If the system is complex enough and there are always incoming changes, there is never a point in time in these "eventually consistent systems" that they are in consistent state. The problem of inconsistency is pushed to the users of the data.
Someone else stated this implicitly, but with your reasoning no complex system is ever consistent with ongoing changes. From the perspective of one of many concurrent writers outside of the database there’s no consistency they observe. Within the database there could be pending writes in flight that haven’t been persisted yet.
That’s why these consistency models are defined from the perspective of “if you did no more writes after write X, what happens”.
They are consistent (the C in ACID) for a particular transaction ID / timestamp. You are operating on a consistent snapshot. You can also view consistent states across time if you are archiving the log.
"... with your reasoning no complex system is ever consistent with ongoing changes. From the perspective of one of many concurrent writers outside of the database there’s no consistency they observe."
That was kind of my point. We should stop calling such systems consistent.
It is possible, however, to build a complex system, even with "event sourcing", that has consistency guarantees.
Of course your comment has the key term "outside of the database". You will need to either use a database or build a homegrown system that has similar features to a database.
One way is to pipe everything through a database that enforces the consistency. I have actually built such an event sourcing platform.
The second way is to have a reconciliation process that guarantees consistency at a certain point in time. For example, bank payment systems use reconciliation to achieve end-of-day consistency. Even those are not really "guaranteed" to be consistent, just that inconsistencies are sufficiently improbable that they can be handled manually and with agreed-on timeouts.
The way you're defining "eventually consistent" seems to imply it means "the current state of the system is eventually consistent," which is not what I think that means. Rather, it means "for any given previous state of the system, the current state will eventually reflect that."
"Eventually consistent," as I understand it, always implies a lag, whereas the way you're using it seems to imply that at some point there is no lag.
I was not trying to "define" eventually consistent, but to point out that people typically use the term quite loosely, for example when referring to the state of the system-of-systems of multiple microservices or event sourcing.
Those are never guaranteed to be in consistent state in the sense of C in ACID, which means it becomes the responsibility of the systems that use the data to handle the consistency. I see this often ignored, causing user interfaces to be flaky.
That’s a fair point. To be fair to the academic definitions, “eventually consistent” is a quiescent state in most definitions, and there are more specific ones (like “bounded staleness”, or “monotonic prefix”) that are meaningful to clients of the system.
But I agree with you in general - the dynamic nature of systems means, in my mind, that you need to use client-side guarantees, rather than state guarantees, to reason about this stuff in general. State guarantees are nicer to prove and work with formally (see Adya, for example) while client side guarantees are trickier and feel less fulfilling formally (see Crooks et al “Seeing is Believing”, or Herlihy and Wing).
I have no beef with the academic, careful definitions, although I dislike the practice where academics redefine colloquial terms more formally. That actually causes more, not less confusion. I was talking about the colloquial use of the term.
If I search for "eventual consistency", the AI tells me that one of the cons for using eventual consistency is: "Temporary inconsistencies: Clients may read stale or out-of-date data until synchronization is complete."
I see time and time again in actual companies that have "modern" business systems based on microservices that developers can state the same idea but have never actually paused to think that something is needed to do the "synchronization". Then they build web UIs that just ignore the fact, causing the application to become flaky.
They eventually become consistent from the frame of a single write. They would become consistent if you stopped writes, so they will eventually get there.
Both of your statements are true. But in practice we are rarely interested in single writes when we talk about consistency, but in the consistency of multiple writes ("transactions") to multiple systems such as microservices.
It is difficult to guarantee consistency by stopping writes, because whatever enforces the stopping typically does not know at what point all the writes that belong together have been made.
If you "stop the writes" for sufficiently long, the probability of inconsistencies becomes low, but it is still not guaranteed to be zero.
For instance in bank payment systems, end-of-day consistency is handled by a secondary process called "reconciliation" which makes the end-of-day conflicts so improbable that any conflict is handled by a manual tertiary process. And then there are agreed timeouts for multi-bank transactions etc. so that the payments ultimately end up in consistent state.
Changing terminology is hard once a name sticks. But yeah, "eventual propagation" is probably more accurate. I do get the impression that "eventual consistency" often just means "does not have a well-defined consistency model".
Yes, I agree. I don't really believe we can change the terminology. But maybe we can get some people to at least think about the consistency model when using the term.
I don't see it this way. Let's take a simple example - banks. Your employer sends you the salary from another bank. The transfer is (I'd say) eventually consistent - at some point, you WILL get the money. So how can it be "never consistent"?
Your example is too simple to show the problem with the "eventual consistency" as people use the term in real life.
Let's say you have two systems, one containing customers (A) and other containing contracts for the customers (B).
Now you create a new contract by first creating the customer in system A and then the contract on system B.
It may happen that the web UI shows the contract in system B, which refers to the customer by id (in system A), but that customer only becomes visible in system A slightly later.
The web UI has to either be built to manage the situation where fetching the customer by id may temporarily fail -- or accept the risk, on the grounds that such cases are rare, and just throw an error.
If a system were actually "eventually consistent" in the sense you use the term, it would be possible for the web UI to get a guarantee from the system-of-systems that it can fetch objects in a way that shows either both the contract and the customer info or neither.
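The first option (tolerating a temporarily missing customer) could look roughly like this bounded retry. This is only a sketch: `fetch_with_retry` is a hypothetical helper, and the `fetch` callable stands in for whatever call the UI makes to system A, returning `None` while the customer is not visible yet.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[], Optional[dict]],
    attempts: int = 3,
    delay_s: float = 0.05,
) -> Optional[dict]:
    """Call fetch() until it returns a record or attempts run out."""
    for attempt in range(attempts):
        record = fetch()
        if record is not None:
            return record
        if attempt < attempts - 1:
            time.sleep(delay_s)  # give the other system time to catch up
    return None  # caller must still decide what to show: placeholder or error
```

Note that this only narrows the window. The caller still has to handle `None`, which is exactly the responsibility that gets pushed onto users of the data.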
Because I will have spent it before it becomes available :)
For the record (IMO) banks are an EXCELLENT example of eventually consistent systems.
They're also EXCELLENT for demonstrating Event Sourcing (bank statements are really projections of the bank's internal event log, and enough people have encountered them that most people understand them).
I have worked with core systems in several financial institutions, as well as built several event sourcing production systems used as the core platform. One of these event sourcing systems was actually providing real consistency guarantees.
Based on my experience, I would recommend against using bank systems as an example of event sourcing, because they are actually much more complex than what people typically mean when they talk about event sourcing systems.
Bank systems cannot use normal event sourcing exactly because of the problem I describe. They have various other processes to have sufficiently probable consistency (needed by the bank statements for example), such as "reconciliation". Even those do not actually "guarantee" anything, but you need tertiary manual process to fix any inconsistencies, and timeouts agreed between banks to eventually resolve any issues related to cross-bank payments.
If the bank transaction is eventually consistent, it means that the state can flip and the recipient will "never" be sure. A state saying that the transaction will be finished later is itself a consistent state.
Branches, commits and merges are the means by which people manually resolve conflicts so that a single repository ends up in a state where revisions step forward in perfect lockstep.
In many branching strategies this is called "main". There are alternative branching strategies as well.
Obviously that does not guarantee ordering across repos, hence the popularity of "monorepo".
> If the system is complex enough and there are always incoming changes
You literally don't understand the definition of eventual consistency. The weakest form of eventual consistency, quiescent consistency, requires [0]:
> that in any execution where the updates stop at some point (i.e. where there are only finitely many updates), there must exist some state, such that each session converges to that state (i.e. all but finitely many operations e in each session [f] see that state).
Emphasis on the "updates stop[ping] at some point," or there being only "finitely many updates." By positing that there are always incoming changes you already fail to satisfy the hypothesis of the definition.
In this model all other forms of eventual consistency exhibit at least this property of quiescent consistency (and possibly more).
My point was kind of tongue-in-cheek. Like the other comment suggests, I was talking about how people actually use the term "eventually consistent" for example to refer to system-of-systems of multiple microservices or event sourcing systems. It is possible to define and use the terms more exactly like you suggest. I have no problem with that kind of use. But even if you use the terms more carefully, most people do not, meaning that when you talk about these systems using the misunderstood terms, people may misunderstand you although you are careful.
Eventual consistency arises from necessity -- a need to prioritise AP more. Not every application needs strong consistency as a primary constraint. Why would you optimise for that, at the cost of availability, when eventual consistency is an acceptable default?
Because the incidence and cost of mistaken under-consistency are both generally higher than those of mistaken over-consistency—especially at the scale where people would need to rely on managed off-the-shelf services like aurora instead of being able to build their own.
It's wishful thinking. It's like choosing Newtonian physics over relativity because it's simpler or the equations are neater.
If you have strong consistency, then you have at best availability xor partition tolerance.
"Eventual" consistency is the best tradeoff we have for an AP system.
Computation happens at a time and a place. Your frontend is not the same computer as your backend service, or your database, or your cloud providers, or your partners.
So you can insist on full-ACID on your DB (which it probably isn't running btw - search "READ COMMITTED".) but your DB will only be consistent with itself.
We always talk about multiple bank accounts in these consistency modelling exercises. Do yourself a favour and start thinking about multiple banks.
There is no reason a database can’t be both strongly consistent (linearizable, or equivalent) and available to clients on the majority side of a partition. This is, by far, the common case of real-world partitions in deployments with 3 data centers. One is disconnected or fails. The other two can continue, offering both strong consistency and availability to clients on their side of the partition.
The Gilbert and Lynch definition of CAP calls this state ‘unavailable’, in that it’s not available to all clients. Practically, though, it’s still available for two thirds of clients (or more, if we can reroute clients from the outside), which seems meaningfully ‘available’ to me!
If you don’t believe me, check out Phil Bernstein’s paper (Bernstein and Das) about this. Or read the Gilbert and Lynch proof carefully.
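As a back-of-the-envelope illustration of the majority argument (a sketch that assumes a simple majority-quorum design with one vote per data center; `serving_sides` is a made-up name):

```python
def serving_sides(partition_sizes: list[int]) -> list[bool]:
    """For each side of a partition, can it still form a strict majority?

    partition_sizes lists how many data centers ended up on each side.
    """
    total = sum(partition_sizes)
    return [2 * size > total for size in partition_sizes]
```

For three data centers split 2/1, `serving_sides([2, 1])` gives `[True, False]`: the two connected sites keep serving strongly consistent reads and writes, and only clients stuck on the isolated site see unavailability.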
> read-modify-write is the canonical transactional workload. That applies to explicit transactions (anything that does an UPDATE or SELECT followed by a write in a transaction), but also things that do implicit transactions (like the example above)
Your "implicit transaction" would not be consistent even if there was no replication involved at all. Explicit db transactions exist for a reason - use them.
The point is that, in a disaggregated system, the transaction processor has less flexibility about how to route parts of the same transaction (that section is a point about internal implementation details of transaction systems).
In the read-after-write scenario, why not use something like consistency tokens, and redirect to the primary if the secondary detects it has not caught up?
The argument seems to rely on the point that the replicas are only valuable if you can send reads to them, which I don't think is true. Eventually-consistent replicated databases are valuable on their own terms even if you can only send traffic to the leader.
Blogs like this make me go on the same rant for the n-th time:
Consistency for distributed systems is impossible without APIs returning cookies containing vector clocks.
The idea is simple: every database has a logical sequence number (LSN), which the replicas try to catch up to -- but may be a little bit behind. Every time an API talks to a set of databases (or their replicas) to produce a JSON response (or whatever), it ought to return the LSNs of each database that produced the query in a cookie. Something like "db1:591284;db2:10697438".
Client software must then union this with their existing cookie, and return the result of that to the next API call.
That way if they've just inserted some value into db1 and the read-after-write query ends up going to a read replica that's slightly behind the write master (LSN 591280 instead of 591284) then the replica can either wait until it sees LSN >= 591284, or it can proxy the query back to the write master. A simple "expected latency of waiting vs proxying" heuristic can be used for this decision.
That's (almost entirely) all you need for read-after-write transactional consistency at every layer, even through Redis caches and stateless APIs layers!
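A minimal sketch of the cookie handling described above. It reads "union" as "keep the highest LSN seen per database", which is an interpretation, and all function names here are hypothetical. The cookie format follows the "db1:591284;db2:10697438" example.

```python
def parse_cookie(cookie: str) -> dict[str, int]:
    """Parse "db1:591284;db2:10697438" into {"db1": 591284, "db2": 10697438}."""
    if not cookie:
        return {}
    pairs = (part.split(":") for part in cookie.split(";"))
    return {name: int(lsn) for name, lsn in pairs}


def merge_cookies(mine: str, from_response: str) -> str:
    """Client side: keep the highest LSN seen so far for each database."""
    merged = parse_cookie(mine)
    for name, lsn in parse_cookie(from_response).items():
        merged[name] = max(lsn, merged.get(name, 0))
    return ";".join(f"{name}:{lsn}" for name, lsn in sorted(merged.items()))


def route_read(cookie: str, db: str, replica_lsn: int,
               est_wait_ms: float, est_proxy_ms: float) -> str:
    """Replica side: serve, wait for replication, or proxy to the primary."""
    required = parse_cookie(cookie).get(db, 0)
    if replica_lsn >= required:
        return "replica"
    # Simple latency heuristic; the estimates are inputs the caller must supply.
    return "wait" if est_wait_ms <= est_proxy_ms else "primary"
```

For example, a replica at LSN 591280 that receives a cookie demanding `db1:591284` either waits or proxies, depending on which is expected to be faster.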
(OP here). I don’t love leaking this kind of thing through the API. I think that, for most client/server shaped systems at least, we can offer guarantees like linearizability to all clients with few hard real-world trade-offs. That does require a very careful approach to designing the database, and especially to read scale-out (as you say) but it’s real and doable.
By pushing things like read-scale-out into the core database, and away from replicas and caches, we get to have stronger client and application guarantees with less architectural complexity. A great combination.
For the love of all that’s holy, please stop doing read-after-write. In nearly all cases, it isn’t needed. The only cases I can think of are if you need a DB-generated value (so, DATETIME or UUIDv1) from MySQL, or you did a multi-row INSERT in a concurrent environment.
For MySQL, you can get the first auto-incrementing integer created from your INSERT from the cursor. If you only inserted one row, congratulations, there’s your PK. If you inserted multiple rows, you could also get the number of rows inserted and add that to get the range, but there’s no guarantee that it wasn’t interleaved with other statements. Anything else you wrote, you should already have, because you wrote it.
For MariaDB, SQLite, and Postgres, you can just use the RETURNING clause and get back the entire row with your INSERT, or specific columns.
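Here is a small sketch of the RETURNING approach using Python's built-in `sqlite3` module with an in-memory database. The `notes` table and `insert_note` helper are invented for the example. RETURNING needs SQLite 3.35 or newer, so the sketch falls back to `cursor.lastrowid` plus a lookup on the same connection when the linked SQLite is older.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notes ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " body TEXT NOT NULL,"
    " created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)


def insert_note(conn: sqlite3.Connection, body: str) -> tuple:
    """Insert a row and return (id, body, created_at) without a separate replica read."""
    if sqlite3.sqlite_version_info >= (3, 35, 0):
        rows = conn.execute(
            "INSERT INTO notes (body) VALUES (?) RETURNING id, body, created_at",
            (body,),
        ).fetchall()  # exhaust the cursor before committing
        row = rows[0]
    else:
        cur = conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
        row = conn.execute(
            "SELECT id, body, created_at FROM notes WHERE id = ?",
            (cur.lastrowid,),
        ).fetchone()
    conn.commit()
    return row
```

The database-generated values (`id`, `created_at`) come back from the write itself, so nothing has to be re-read from a replica that may be lagging.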
But that could be applied only in context of a single function. What if I save a resource and then mash F5 in the browser to see what was saved? I could hit a read replica that wasn't fast enough and the consistency promise breaks. I don't know how to solve it.
The comment at [1] hints at a solution: in the response to the write request, return the id of the transaction or its commit position in the TX log (LSN). When routing a subsequent read request to a replica, the system can either wait until the transaction is present on the replica, or it can redirect to the primary node. I discussed a similar solution a while ago in a talk [2], in the context of serving denormalized data views from a cache.
Yep. Your SQL transactions are only consistent to the extent that they stay in the db.
Mashing F5 is a perfect example of stepping outside the bounds of consistency.
If you want to update a counter, do you read the number on your frontend, add 2, then send it back to the backend? If someone else does the same, that's a lost write regardless of how "strongly consistent" your db vendor promises to be.
But that's how the article says programmers work. Read, update, write.
If you thought "that's dumb, just send in (+2)", congrats, that's EC thinking!
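The difference is easy to show with a toy in-memory store (a sketch; `CounterStore` and the two demo functions are made up, and the interleaving is scripted by hand rather than produced by real concurrency):

```python
class CounterStore:
    """Stand-in for a backend holding a single counter."""

    def __init__(self) -> None:
        self.value = 0

    def read(self) -> int:
        return self.value

    def write(self, value: int) -> None:
        self.value = value

    def add(self, delta: int) -> None:
        self.value += delta  # the store applies the change itself


def read_modify_write_demo() -> int:
    store = CounterStore()
    seen_by_a = store.read()      # client A reads 0
    seen_by_b = store.read()      # client B reads 0 before A writes
    store.write(seen_by_a + 2)    # A writes 2
    store.write(seen_by_b + 2)    # B overwrites with 2: A's update is lost
    return store.value


def send_delta_demo() -> int:
    store = CounterStore()
    store.add(2)                  # client A sends "+2"
    store.add(2)                  # client B sends "+2"
    return store.value
```

`read_modify_write_demo()` ends at 2 while `send_delta_demo()` ends at 4, because the increments commute and neither client depends on a value it read earlier.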
So why isn't the section that needs consistency enclosed in a transaction, with all operations between BEGIN TRANSACTION and COMMIT TRANSACTION? That's the standard way to get strong consistency in SQL. It's fully supported in MySQL, at least for InnoDB. You have to talk to the master, not a read slave, when updating, but that's normal.
The point of that section, which maybe isn’t obvious enough, is to reflect on how eventually-consistent read replicas limit the options of the database system builder (rather than the application builder). If I’m building the transaction layer of a database, I want to have a bunch of options for where to send my reads, so I don’t have to send the whole read part of every RMW workload to the single leader.
I don't understand this article, and it's like the author doesn't really know what they're talking about. They don't want eventual consistency, they want read-your-writes, a consistency level that's stronger than EC yet still not strong.
Read-your-writes is indeed useful because it makes code easier to write: every process can behave as if it was the only one in the world, devs can write synchronous code, that's great ! But you don't need strong consistency.
I hope developers learn a little bit more about the domain before going to strong consistency.
I am not an expert, but from the examples in the article I think the author is looking for a bit more than read-your-writes.
E.g. they mention reading a list of attachments and want to ensure they get all currently created attachments, which includes the ones created by other processes.
So they want to have "read-all-writes" or something like that.
Read-your-writes is a client guarantee that requires stickiness (i.e. a definition of “your”) to be meaningful. It’s not a level of consistency I love, because it raises all kinds of edge-case questions. For example, if I have to reconnect, am I still the same “your”? This isn’t even some rare edge case! If I’m automating around a CLI, for example, how is the server meant to know that the next CLI invocation from the same script (a different process) is the same “your”? Sure, I can fix that with some kind of token, but then I’ve made the API more complicated.
Linearizability, as a global guarantee, is much nicer because it avoids all those edge cases.
> So losing a packet doesn't mean a dropped frame, it means corruption that lasts until the next I-frame/slice.
Again - the viewer rarely cares when that happens.
A minor annoyance, maybe, but rage-quitting the application? Not a chance.
If you’re never sending an I-frame then it’s permanently corrupt. Sending an I-frame is the equivalent of eventual consistency.
Your users must be very different from the ones I'm familiar with.
If the area affected literally doesn't change for minutes afterwards it will not get refreshed and fixed.
Okay but now you're explaining that correctness is not necessarily the only reasonable state. It's possible to sacrifice some degree of correctness for enormous gains in performance because having absolute correctness comes at a cost that might simply not be worth it.
Back in the day there were some P2P RTS games that just sent duplicates. Like each UDP packet would have a new game state and then 1 or more repetitions of previous ones. For lockstep P2P engines, the state that needs to be transferred tends towards just being the client's input, so it's tiny, just a handful of bytes. Makes more sense to just duplicate ahead of time vs ack/nack and resend.
I think we should stop calling these systems eventually consistent. They are actually never consistent. If the system is complex enough and there are always incoming changes, there is never a point in time in these "eventually consistent systems" that they are in consistent state. The problem of inconsistency is pushed to the users of the data.
Someone else stated this implicitly, but with your reasoning no complex system is ever consistent with ongoing changes. From the perspective of one of many concurrent writers outside of the database there’s no consistency they observe. Within the database there could be pending writes in flight that haven’t been persisted yet.
That’s why these consistency models are defined from the perspective of “if you did no more writes after write X, what happens”.
They are consistent (the C in ACID) for a particular transaction ID / timestamp. You are operating on a consistent snapshot. You can also view consistent states across time if you are archiving log.
"... with your reasoning no complex system is ever consistent with ongoing changes. From the perspective of one of many concurrent writers outside of the database there’s no consistency they observe."
That was kind of my point. We should stop callings such systems consistent.
It is possible, however, to build a complex system, even with "event sourcing", that has consistency guarantees.
Of course your comment has the key term "outside of the database". You will need to either use a database or built a homegrown system that has similar features as databases do.
One way is to pipe everything through a database that enforces the consistency. I have actually built such an event sourcing platform.
Second way is to have a reconciliation process that guarantees consistency at certain point of time. For example, bank payments systems use reconciliation to achieve end-of-day consistency. Even those are not really "guaranteed" to be consistent, just that inconsistencies are sufficiently improbable, so that they can be handled manually and with agreed on timeouts.
The way you're defining "eventually consistent" seems to imply it means "the current state of the system is eventually consistent," which is not what I think that means. Rather, it means "for any given previous state of the system, the current state will eventually reflect that."
"Eventually consistent," as I understand it, always implies a lag, whereas the way you're using it seems to imply that at some point there is no lag.
I was not trying to "define" eventually consistent, but to point out that people typically use the term quite loosely, for example when referring to the state of the system-of-systems of multiple microservices or event sourcing.
Those are never guaranteed to be in consistent state in the sense of C in ACID, which means it becomes the responsibility of the systems that use the data to handle the consistency. I see this often ignored, causing user interfaces to be flaky.
That’s a fair point. To be fair to the academic definitions, “eventually consistent” is a quiescent state in most definitions, and there are more specific ones (like “bounded staleness”, or “monotonic prefix”) that are meaningful to clients of the system.
But I agree with you in general - the dynamic nature of systems means, in my mind, that you need to use client-side guarantees, rather than state guarantees, to reason about this stuff in general. State guarantees are nicer to prove and work with formally (see Adya, for example) while client side guarantees are trickier and feel less fulfilling formally (see Crooks et al “Seeing is Believing”, or Herlihy and Wing).
I have no beef with the academic, careful definitions, although I dislike the practice where academics redefine colloquial terms more formally. That actually causes more, not less confusion. I was talking about the colloquial use of the term.
If I search for "eventual consistency", the AI tells me that one of the cons for using eventual consistency is: "Temporary inconsistencies: Clients may read stale or out-of-date data until synchronization is complete."
I see time and time again in actual companies that have "modern" business systems based on microservices that developers can state the same idea but have never actually paused to think that you something is needed to do the "synchronization". Then they build web UIs that just ignore the fact, causing application to become flaky.
They eventually become consistent from the frame of a single write. They would become consistent if you stopped writes, so they will eventually get there
Both of your statements are true.
But in practice we are rarely interested in single writes when we talk about consistency, but the consistency of multiple writes ("transactions") to multiple systems such as microservices.
It is difficult to guarantee consistency by stopping writes, because whatever enforces the stopping typically does not know at what point all the writes that belong together have been made.
If you "stop the writes" for sufficiently long, the probability of inconsistencies becomes low, but it is still not guaranteed to be non-existant.
For instance in bank payment systems, end-of-day consistency is handled by a secondary process called "reconciliation" which makes the end-of-day conflicts so improbable that any conflict is handled by a manual tertiary process. And then there are agreed timeouts for multi-bank transactions etc. so that the payments ultimately end up in consistent state.
Changing terminology is hard once a name sticks. But yeah, "eventual propagation" is probably more accurate. I do get the impression that "eventual consistency" often just means "does not have a well-defined consistency model".
Yes, I agree. I don't really believe we can change the terminology. But maybe we can get some people to at least think about the consistency model when using the term.
> They are actually never consistent
I don't see it this way. Let's take a simple example - banks. Your employer sends you the salary from another bank. The transfer is (I'd say) eventually consistent - at some point, you WILL get the money. So how can it be "never consistent"?
Your example is too simple to show the problem with the "eventual consistency" as people use the term in real life.
Let's say you have two systems, one containing customers (A) and other containing contracts for the customers (B).
Now you create a new contract by first creating the customer in system A and then the contract on system B.
It may happen that the web UI shows the contract in system B, which refers to the customer by id (in system A), but that customer becomes visible only slightly later in system A.
The web UI has to either be built to manage the situation where fetching customer by id may temporarily fail -- or accept the risk that such cases are rare and you just throw an error.
If a system were actually "eventually consistent" in the sense you use the term, it would be possible for the web UI to get a guarantee from the system-of-systems to fetch objects in such a way that it would see either both the contract and the customer info or neither.
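To make the first option concrete, here is a minimal sketch of a UI layer that tolerates the customer not being visible yet. Everything in it is hypothetical: `fetch_with_retry`, `render_contract`, the retry counts and the in-memory stand-in for system A are illustrative assumptions, not anything from a real system.

```python
import time


def fetch_with_retry(fetch, key, attempts=3, delay_s=0.01):
    """Call fetch(key) a few times, returning None if it never shows up."""
    for attempt in range(attempts):
        value = fetch(key)
        if value is not None:
            return value
        if attempt < attempts - 1:
            time.sleep(delay_s)  # give system A a moment to catch up
    return None


def render_contract(contract, fetch_customer):
    """Render a contract line, degrading gracefully if the customer is missing."""
    customer = fetch_with_retry(fetch_customer, contract["customer_id"])
    name = customer["name"] if customer else "(customer details pending)"
    return f"Contract {contract['id']} for {name}"


# Hypothetical system A whose customer only becomes visible on the second read.
system_a = {}
calls = {"n": 0}


def lagging_fetch_customer(customer_id):
    calls["n"] += 1
    if calls["n"] == 2:
        system_a[customer_id] = {"name": "Ada"}
    return system_a.get(customer_id)


contract = {"id": "c-1", "customer_id": "cust-7"}
visible_line = render_contract(contract, lagging_fetch_customer)
pending_line = render_contract(contract, lambda _id: None)
```

Note that this only papers over the gap from the UI side. It gives no guarantee that the customer will ever appear, which is the point being made above.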
Because I will have spent it before it becomes available :)
For the record (IMO) banks are an EXCELLENT example of eventually consistent systems.
They're also EXCELLENT for demonstrating Event Sourcing (bank statements, which are really projections of the bank's internal event log, but enough people have encountered them in such a way that most people understand them)
I have worked with core systems in several financial institutions, as well as built several event sourcing production systems used as the core platform. One of these event sourcing systems was actually providing real consistency guarantees.
Based on my experience, I would recommend against using bank systems as an example of event sourcing, because they are actually much more complex than the thing people typically mean when they talk about event sourcing systems.
Bank systems cannot use normal event sourcing exactly because of the problem I describe. They have various other processes to achieve sufficiently probable consistency (needed by the bank statements, for example), such as "reconciliation". Even those do not actually "guarantee" anything, but you need a tertiary manual process to fix any inconsistencies, and timeouts agreed between banks to eventually resolve any issues related to cross-bank payments.
If the bank transaction is eventually consistent, it means that the state can flip and the person receiving will "never" be sure. A state that the transaction will be finished later is a consistent state.
Just like Git. Why bother with all these branches, commits and merges?
Just make it so everyone's revision steps forward in perfect lockstep.
Branches, commits and merges are the means by which people manually resolve conflicts so that a single repository ends up in a state where the revision steps forward in perfect lockstep.
In many branching strategies this is called "main". There are alternative branching strategies as well.
Obviously that does not guarantee ordering across repos, hence the popularity of "monorepo".
Different situations require different solutions.
> If the system is complex enough and there are always incoming changes
You literally don't understand the definition of eventual consistency. The weakest form of eventual consistency, quiescent consistency, requires [0]:
Emphasis on the "updates stop[ping] at some point," or there being only "finitely many updates." By positing that there are always incoming changes you already fail to satisfy the hypothesis of the definition. In this model all other forms of eventual consistency exhibit at least this property of quiescent consistency (and possibly more).
[0] https://www.microsoft.com/en-us/research/wp-content/uploads/...
My point was kind of tongue-in-cheek. Like the other comment suggests, I was talking about how people actually use the term "eventually consistent" for example to refer to system-of-systems of multiple microservices or event sourcing systems. It is possible to define and use the terms more exactly like you suggest. I have no problem with that kind of use. But even if you use the terms more carefully, most people do not, meaning that when you talk about these systems using the misunderstood terms, people may misunderstand you although you are careful.
The GP proposed that the definition should be changed. That in no way implies a lack of understanding of the present definition.
Eventual consistency arises from necessity -- a need to prioritise AP more. Not every application needs strong consistency as a primary constraint. Why would you optimise for that, at the cost of availability, when eventual consistency is an acceptable default?
Because the incidence and cost of mistaken under-consistency are both generally higher than those of mistaken over-consistency—especially at the scale where people would need to rely on managed off-the-shelf services like aurora instead of being able to build their own.
It's wishful thinking. It's like choosing Newtonian physics over relativity because it's simpler or the equations are neater.
If you have strong consistency, then you have at best availability xor partition tolerance.
"Eventual" consistency is the best tradeoff we have for an AP system.
Computation happens at a time and a place. Your frontend is not the same computer as your backend service, or your database, or your cloud providers, or your partners.
So you can insist on full-ACID on your DB (which it probably isn't running btw - search "READ COMMITTED".) but your DB will only be consistent with itself.
We always talk about multiple bank accounts in these consistency modelling exercises. Do yourself a favour and start thinking about multiple banks.
There is no reason a database can’t be both strongly consistent (linearizable, or equivalent) and available to clients on the majority side of a partition. This is, by far, the common case of real-world partitions in deployments with 3 data centers. One is disconnected or fails. The other two can continue, offering both strong consistency and availability to clients on their side of the partition.
The Gilbert and Lynch definition of CAP calls this state ‘unavailable’, in that it’s not available to all clients. Practically, though, it’s still available for two thirds of clients (or more, if we can reroute clients from the outside), which seems meaningfully ‘available’ to me!
If you don’t believe me, check out Phil Bernstein’s paper (Bernstein and Das) about this. Or read the Gilbert and Lynch proof carefully.
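A toy sketch of the counting argument, assuming a plain majority-quorum rule. The `can_serve` helper and the data center names are made up for illustration, and this is only the quorum check, not a consensus protocol.

```python
def can_serve(side, all_nodes):
    """A side of a partition keeps serving if it holds a strict majority."""
    return len(side) > len(all_nodes) / 2


# Three data centers, one of which gets cut off from the other two.
all_dcs = {"dc1", "dc2", "dc3"}
majority_side = {"dc1", "dc2"}
minority_side = {"dc3"}

majority_ok = can_serve(majority_side, all_dcs)
minority_ok = can_serve(minority_side, all_dcs)
```

Clients that can reach the majority side keep getting both consistency and availability. Only clients stuck with the minority side are refused.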
The author addresses that in a linked post: https://brooker.co.za/blog/2024/07/25/cap-again.html
> read-modify-write is the canonical transactional workload. That applies to explicit transactions (anything that does an UPDATE or SELECT followed by a write in a transaction), but also things that do implicit transactions (like the example above)
Your "implicit transaction" would not be consistent even if there was no replication involved at all. Explicit db transactions exist for a reason - use them.
The point is that, in a disaggregated system, the transaction processor has less flexibility about how to route parts of the same transaction (that section is a point about internal implementation details of transaction systems).
I keep wondering how the recent 15h outage has affected these eventually consistent systems.
I really hope to see a paper on the effects of it.
In the read-after-write scenario, why not use something like consistency tokens, and redirect to the primary if the secondary detects it has not caught up?
The argument seems to rely on the point that the replicas are only valuable if you can send reads to them, which I don't think is true. Eventually-consistent replicated databases are valuable on their own terms even if you can only send traffic to the leader.
Blogs like this make me go on the same rant for the n-th time:
Consistency for distributed systems is impossible without APIs returning cookies containing vector clocks.
The idea is simple: every database has a logical sequence number (LSN), which the replicas try to catch up to -- but may be a little bit behind. Every time an API talks to a set of databases (or their replicas) to produce a JSON response (or whatever), it ought to return the LSNs of each database that produced the query in a cookie. Something like "db1:591284;db2:10697438".
Client software must then union this with their existing cookie, and return the result of that to the next API call.
That way if they've just inserted some value into db1 and the read-after-write query ends up going to a read replica that's slightly behind the write master (LSN 591280 instead of 591284) then the replica can either wait until it sees LSN >= 591284, or it can proxy the query back to the write master. A simple "expected latency of waiting vs proxying" heuristic can be used for this decision.
That's (almost entirely) all you need for read-after-write transactional consistency at every layer, even through Redis caches and stateless API layers!
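A rough sketch of the two pieces described above, the cookie union on the client and the wait-or-proxy decision on the replica. The cookie format follows the "db1:591284;db2:10697438" example. The function names, the latency estimates and the string return values are assumptions made up for illustration.

```python
from typing import Dict


def parse_cookie(cookie: str) -> Dict[str, int]:
    """Parse 'db1:591284;db2:10697438' into {'db1': 591284, 'db2': 10697438}."""
    result = {}
    for part in filter(None, cookie.split(";")):
        name, lsn = part.split(":")
        result[name] = int(lsn)
    return result


def format_cookie(lsns: Dict[str, int]) -> str:
    return ";".join(f"{name}:{lsn}" for name, lsn in sorted(lsns.items()))


def merge_cookies(old: str, new: str) -> str:
    """Union of two cookies, keeping the highest LSN seen per database."""
    merged = parse_cookie(old)
    for name, lsn in parse_cookie(new).items():
        merged[name] = max(lsn, merged.get(name, 0))
    return format_cookie(merged)


def route_read(db: str, cookie: str, replica_lsn: int,
               est_wait_ms: float, est_proxy_ms: float) -> str:
    """Decide how a replica should handle a read for `db`."""
    required = parse_cookie(cookie).get(db, 0)
    if replica_lsn >= required:
        return "serve"  # replica has already seen everything the client saw
    # Replica is behind: pick whichever is expected to be cheaper.
    return "wait" if est_wait_ms <= est_proxy_ms else "proxy"


client_cookie = merge_cookies("db1:591280;db2:10697438", "db1:591284")
behind = route_read("db1", client_cookie, 591280, est_wait_ms=5, est_proxy_ms=20)
caught_up = route_read("db1", client_cookie, 591284, est_wait_ms=5, est_proxy_ms=20)
```

Taking the per-database maximum is what makes the union safe to apply in any order, so a client can merge cookies from concurrent responses without coordinating.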
(OP here). I don’t love leaking this kind of thing through the API. I think that, for most client/server shaped systems at least, we can offer guarantees like linearizability to all clients with few hard real-world trade-offs. That does require a very careful approach to designing the database, and especially to read scale-out (as you say) but it’s real and doable.
By pushing things like read-scale-out into the core database, and away from replicas and caches, we get to have stronger client and application guarantees with less architectural complexity. A great combination.
FWIW, I think that’s essentially how Aurora DSQL works, and sort of explained at the end of the article.
For the love of all that’s holy, please stop doing read-after-write. In nearly all cases, it isn’t needed. The only cases I can think of are if you need a DB-generated value (so, DATETIME or UUIDv1) from MySQL, or you did a multi-row INSERT in a concurrent environment.
For MySQL, you can get the first auto-incrementing integer created from your INSERT from the cursor. If you only inserted one row, congratulations, there’s your PK. If you inserted multiple rows, you could also get the number of rows inserted and add that to get the range, but there’s no guarantee that it wasn’t interleaved with other statements. Anything else you wrote, you should already have, because you wrote it.
For MariaDB, SQLite, and Postgres, you can just use the RETURNING clause and get back the entire row with your INSERT, or specific columns.
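A small sketch using Python's built-in `sqlite3` module as a stand-in (the `notes` table is made up). It shows both approaches: taking the generated key from the cursor, and using RETURNING to get the row back. RETURNING needs SQLite 3.35.0 or later, so that part is guarded by a version check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notes ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " body TEXT,"
    " created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
)

# Option 1: take the generated key from the cursor, no second query needed.
cur = conn.execute("INSERT INTO notes (body) VALUES (?)", ("first",))
first_id = cur.lastrowid

# Option 2: have the INSERT hand back the row, including DB-generated values.
returned = None
if sqlite3.sqlite_version_info >= (3, 35, 0):
    rows = conn.execute(
        "INSERT INTO notes (body) VALUES (?) RETURNING id, body, created_at",
        ("second",),
    ).fetchall()  # fetch everything so the statement finishes before commit
    returned = rows[0]

conn.commit()
```

Either way the write itself returns what you need, so nothing has to be read back from a replica that might be behind.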
> please stop doing read-after-write
But that could be applied only in context of a single function. What if I save a resource and then mash F5 in the browser to see what was saved? I could hit a read replica that wasn't fast enough and the consistency promise breaks. I don't know how to solve it.
The comment at [1] hints at a solution: in the response of the write request return the id of the transaction or its commit position in the TX log (LSN). When routing a subsequent read request to a replica, the system can either wait until the transaction is present on the replica, or it can redirect to the primary node. Discussed a similar solution a while ago in a talk [2], in the context of serving denormalized data views from a cache.
[1] https://news.ycombinator.com/item?id=46073630 [2] https://speakerdeck.com/gunnarmorling/keep-your-cache-always...
Yep. Your SQL transactions are only consistent to the extent that they stay in the db.
Mashing F5 is a perfect example of stepping outside the bounds of consistency.
If you want to update a counter, do you read the number on your frontend, add 2 then send it back to the backend? If someone else does the same, that's a lost write regardless of how "strongly consistent" your db vendor promises to be.
But that's how the article says programmers work. Read, update, write.
If you thought "that's dumb, just send in (+2)", congrats, that's EC thinking!
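A toy illustration of the difference, with a made-up in-memory `CounterStore` standing in for the backend. Two clients each want to add 2 to a counter that starts at 10.

```python
class CounterStore:
    """Hypothetical in-memory stand-in for a backend counter."""

    def __init__(self, value=0):
        self.value = value

    def read(self):
        return self.value

    def write(self, value):
        self.value = value

    def add(self, delta):
        self.value += delta


# Read-modify-write on the frontend: both clients read 10 before either writes.
rmw = CounterStore(10)
a_seen = rmw.read()
b_seen = rmw.read()
rmw.write(a_seen + 2)
rmw.write(b_seen + 2)  # overwrites A's update, so one increment is lost

# Sending the operation (+2) instead of the computed result.
delta = CounterStore(10)
delta.add(2)
delta.add(2)  # increments commute, so both survive in either order
```

The first store ends at 12 and the second at 14, and no consistency level on the database side would have saved the first one, because the stale read happened on the frontend.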
Local storage, sticky sessions, consistent hashing cache
I think the point is that read-after-write is exactly the desired property here.
And connection affinity
Assuming that the stickied datastore hasn't experienced an "issue"
So why isn't the section that needs consistency enclosed in a transaction, with all operations between BEGIN TRANSACTION and COMMIT TRANSACTION? That's the standard way to get strong consistency in SQL. It's fully supported in MySQL, at least for InnoDB. You have to talk to the master, not a read slave, when updating, but that's normal.
(OP here).
The point of that section, which maybe isn’t obvious enough, is to reflect on how eventually-consistent read replicas limit the options of the database system builder (rather than the application builder). If I’m building the transaction layer of a database, I want to have a bunch of options for where to send my reads, so I don’t have to send the whole read part of every RMW workload to the single leader.
I don't understand this article and it's like the author doesn't really know what they're talking about. They don't want eventual consistency, they want read-your-writes, a consistency level that's stronger than EC yet still not strong.
https://jepsen.io/consistency/models/read-your-writes
Read-your-writes is indeed useful because it makes code easier to write: every process can behave as if it was the only one in the world, devs can write synchronous code, that's great! But you don't need strong consistency.
I hope developers learn a little bit more about the domain before going to strong consistency.
I am not an expert, but from the examples in the article I think the author is looking for a bit more than read-your-writes.
E.g. they mention reading a list of attachments and want to ensure they get all currently created attachments, which includes the ones created by other processes.
So they want to have "read-all-writes" or something like that.
Read-your-writes is a client guarantee that requires stickiness (i.e. a definition of “your”) to be meaningful. It’s not a level of consistency I love, because it raises all kinds of edge-case questions. For example, if I have to reconnect, am I still the same “your”? This isn’t even some rare edge case! If I’m automating around a CLI, for example, how is the server meant to know that the next CLI invocation from the same script (a different process) is the same “your”? Sure, I can fix that with some kind of token, but then I’ve made the API more complicated.
Linearizability, as a global guarantee, is much nicer because it avoids all those edge cases.