Great talk, there's a lot I can relate to in here.
I find this topic difficult to navigate because of the many trade-offs. One aspect that wasn't mentioned is temporal. A lot of the time, it makes sense to start with a "database-oriented design" (in the pejorative sense), where your types are just whatever shape your data has in Postgres.
However, as time goes on and your understanding of the domain grows, you start to realize the limitations of that approach. At that point, it probably makes sense to introduce a separate domain model and use explicit mapping. But finding that point in time where you want to switch is not trivial.
Should you start with a domain model from the get-go? Maybe, but it's risky because you may end up with domain objects that don't actually do a better job of representing the domain than whatever you have in your SQL tables. It also feels awkward (and is hard to justify in a team) to map back and forth between domain model, SQL SELECT row and JSON response body if they're pretty much the same, at least initially.
So it might very well be that, rather than starting with a domain model, the best approach is to refactor your way into it once you have a better feel for the domain. Err on the side of little or no abstraction, but don't hesitate to introduce abstraction when you feel the pain from too much "concretion". Again, it takes judgment so it's hard to teach (which the talk does an admirable job in pointing out).
Pretty naive question, but what differentiates a "domain model" from these more primitive data representations? I see the term thrown around a lot but I've never been able to grok what people actually mean.
By domain model do you mean something like what a scientist would call a theory? A description of your domain in terms of some fundamental concepts, how they relate to each other, their behaviour, etc? Something like a specification?
Which could of course have many possible concrete implementations (and many possible ways to represent it with data). Where I get confused with this is I'm not sure what it means to map data to and from your domain model (it's an actual code entity?), so I'm probably thinking about this wrong.
A quick example is dates. You can store one as an ISO 8601 string, and often that makes the most sense since it's a shared spec between systems. But when it comes to actually displaying it, a lot of additional concerns creep in, such as localization and timezones. Then you need a data structure that splits out the components, and some components may be used as keys or parameters for logic that produces the final representation, also as a string.
So both the storage and presentation layers are strings, but they differ. To reconcile them, you need an intermediate layer, which contains the structures that are the domain models, plus the logic that manipulates them. To jump from one layer to another you map the data - in this example, string to structs, then back to a string.
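A minimal Python sketch of those three layers (the names, like EventTime, are invented for illustration): an ISO 8601 string in storage, a structured value in the domain, and a localized string for presentation.

    from dataclasses import dataclass
    from datetime import datetime
    from zoneinfo import ZoneInfo

    # Storage layer: an ISO 8601 string, shared across systems.
    stored = "2024-03-01T14:30:00+00:00"

    # Domain layer: a structured value we can reason about and manipulate.
    @dataclass(frozen=True)
    class EventTime:
        moment: datetime  # always timezone-aware

        @classmethod
        def from_iso(cls, raw: str) -> "EventTime":
            return cls(moment=datetime.fromisoformat(raw))

        def to_display(self, tz: str, fmt: str = "%d %B %Y, %H:%M") -> str:
            # Presentation layer: a string again, but localized to the viewer's timezone.
            return self.moment.astimezone(ZoneInfo(tz)).strftime(fmt)

    event = EventTime.from_iso(stored)
    print(event.to_display("Europe/Berlin"))  # 01 March 2024, 15:30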
With MVC and CRUD apps, the layers often have similar models (or the same ones, especially with dynamic languages), so you don't bother with mapping. But as the use cases become more complex, they alter the domain layer and the models within it, and then you need to add mapping code. Your storage layer may have many tables (if using SQL) that collapse into a single struct at the domain layer, which then becomes many models at the presentation layer with duplicated information.
Note: that's why a lot of people don't like most ORM libraries. They're great when the models are similar, but when they start to diverge, you always end up resorting to raw SQL queries, and then it becomes a pain to refactor. The good ORM libraries rely on metaprogramming, and then they're just weird SQL.
ORM libraries have Value conversion functionality for such trivial examples https://learn.microsoft.com/en-us/ef/core/modeling/value-con...
Not really. It's all about the code you need to write. Instead of wrangling the data structures you get from the ORM (usually something like maps and arrays of maps), you have something that makes the domain logic cleaner and clearer. The mapping code is simple, so you just pay the time price of writing it in exchange for maintainable use-case logic.
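For example, a hypothetical mapping from a row-shaped dict (what an ORM or raw query typically hands back) to a domain object can be a handful of boring, explicit lines:

    from dataclasses import dataclass
    from decimal import Decimal

    # What the persistence layer hands back.
    row = {"id": 42, "email": "a@example.com", "balance_cents": 1999, "frozen": 0}

    # What we'd rather write business logic against.
    @dataclass
    class Account:
        id: int
        email: str
        balance: Decimal
        frozen: bool

        def can_withdraw(self, amount: Decimal) -> bool:
            return not self.frozen and amount <= self.balance

    def account_from_row(row: dict) -> Account:
        # The time price you pay for maintainable use-case logic.
        return Account(
            id=row["id"],
            email=row["email"],
            balance=Decimal(row["balance_cents"]) / 100,
            frozen=bool(row["frozen"]),
        )

    print(account_from_row(row).can_withdraw(Decimal("5.00")))  # True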
Generally the ideal format for one problem is not the same as another. For example, to store a graph in a RDBMS, the ideal format is probably an adjacency list with a recursive query to iterate it. But in my app code, it’s probably easiest as an object-graph just pointing at each other. And in the context of my frontend, I don’t even want to talk about the graph, the user can only really talk about one node’s parent/child relationship at a time.
There’s no one data model ideal for all scenarios — so why not have a different model for each scenario? Then I just need to figure out a way to transform between one model and the next, and whatever logic depending on that idealized data model can now be implemented fairly simply (since that’s the nature of a good data model - the rest of the logic often just falls out).
So the data model you’re using is localized to the domain/subject in question. You’re just transitioning the data between models as needed. A domain is just an arbitrary context — the persistence layer, or the UI logic, or something even more specific: I want my model for an accountant to reflect how an accountant UI page would organize it, because I only understand 30% of what they’re asking me to do, so keeping it “in their terms” makes things much easier to implement blindly. Or perhaps the primary purpose of this particular function is various aggregations for reporting, so I start off by organizing my dataset into a hierarchy that largely aligns with the aggregation groups. Once it’s aligned properly, the aggregation logic itself becomes utterly trivial to express.
You could even say that every time you query the database beyond a single table select *, you’re creating a new domain-specific data model. You’re just transforming from the original table representations to a new one.
All domain modeling is specifically choosing a representation that best fits the logic you’re about to write, and then figuring out how to take the model you have and turn it into the model you want. Everything else on the subject is just implementation detail.
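As a toy illustration of one such transform (column names assumed), here are adjacency-list rows from the database being turned into an object graph that the app code can walk directly:

    from dataclasses import dataclass, field

    # Persistence-shaped data: adjacency-list rows, as they might come out of SQL.
    rows = [
        {"id": 1, "parent_id": None, "name": "root"},
        {"id": 2, "parent_id": 1, "name": "left"},
        {"id": 3, "parent_id": 1, "name": "right"},
        {"id": 4, "parent_id": 2, "name": "leaf"},
    ]

    # App-shaped data: nodes that simply point at each other.
    @dataclass
    class Node:
        id: int
        name: str
        parent: "Node | None" = None
        children: list["Node"] = field(default_factory=list)

    def build_tree(rows: list[dict]) -> Node:
        nodes = {r["id"]: Node(id=r["id"], name=r["name"]) for r in rows}
        root = None
        for r in rows:
            node = nodes[r["id"]]
            if r["parent_id"] is None:
                root = node
            else:
                node.parent = nodes[r["parent_id"]]
                nodes[r["parent_id"]].children.append(node)
        return root

    tree = build_tree(rows)
    print([child.name for child in tree.children])  # ['left', 'right']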
> For example, to store a graph in a RDBMS, the ideal format is probably an adjacency list with a recursive query to iterate it
I know this was a minor point, but I think it speaks to the overall topic, so I'll poke at it.
Adjacency lists are perhaps the worst way to store a graph / tree in an RDBMS. They may be the easiest to understand, but they have some of the worst performance characteristics, especially if your RDBMS doesn't have recursive CTEs. This starts to matter at a much lower scale than you might think; several million rows is enough to start showing slowdowns.
This book [0] (Joe Celko's Trees and Hierarchies in SQL For Smarties) shows many other options, though it does lack the closure table approach [1], which is my preferred approach.
And here, we come full circle back to the long-held friction between DBs and applications. You start mentioning triggers, and devs flinch, stating that they don't want logic in the DB. In every case I've ever seen, the replacements they come up with are incredibly convoluted and prone to errors, but hey, it's not in the DB. There is no reason to fear triggers, if and only if you treat them the same way that you'd treat code (because they are code): added/modified/removed only via PRs, with careful review and testing.
[0]: https://ia804505.us.archive.org/19/items/0411-pdf-celko-tree...
[1]: https://dirtsimple.org/2010/11/simplest-way-to-do-tree-based...
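A minimal sketch of the closure-table idea, purely as illustration (not taken from the book or the linked post): every ancestor/descendant pair is stored explicitly, including self-pairs, so subtree queries become plain indexed lookups with no recursion.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE node_closure (
            ancestor   INTEGER NOT NULL,
            descendant INTEGER NOT NULL,
            depth      INTEGER NOT NULL,
            PRIMARY KEY (ancestor, descendant)
        );
    """)

    def add_node(node_id: int, name: str, parent_id: int | None) -> None:
        con.execute("INSERT INTO node VALUES (?, ?)", (node_id, name))
        # Every node is its own ancestor at depth 0.
        con.execute("INSERT INTO node_closure VALUES (?, ?, 0)", (node_id, node_id))
        if parent_id is not None:
            # Copy all of the parent's ancestor rows, one level deeper.
            con.execute(
                """INSERT INTO node_closure (ancestor, descendant, depth)
                   SELECT ancestor, ?, depth + 1 FROM node_closure WHERE descendant = ?""",
                (node_id, parent_id),
            )

    add_node(1, "root", None)
    add_node(2, "child", 1)
    add_node(3, "grandchild", 2)

    # The whole subtree under node 1, no recursive CTE required:
    print(con.execute(
        "SELECT descendant, depth FROM node_closure WHERE ancestor = 1 ORDER BY depth"
    ).fetchall())  # [(1, 0), (2, 1), (3, 2)]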
My understanding is that a database model is one that is fully normalized - tables designed to have no redundant/repeated pieces of information. You know, the one they teach you when you study relational DBs.
In that model, you can navigate from anywhere to anywhere by following references.
The domain model, at least from a DDD perspective, is different in at least a couple of ways: your domain classes expose business behaviours, and you can hide certain entities as such.
For example, imagine an e-commerce application where you have to represent an order.
In the DB model, you will have the `order` table as well as the `order_line` table, where each row of the latter references a row of the former. In your domain model, instead, you might decide to have a single Order class with order lines only accessed via methods and in the form of strings, or tuples, or whatever - just not with an entity. The Order class hides the existence of the order_line table.
Plus, the Order class will have methods such as `markAsPaid()` etc, also hiding the implementation details of how you persist this type of information - an enum? a boolean? another table referencing rows of `order`? It does not matter to callers.
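A rough Python sketch of such an Order (field and method names invented for illustration): order lines live inside the object as plain tuples, and the payment status is behaviour, not exposed storage.

    from dataclasses import dataclass, field
    from decimal import Decimal

    @dataclass
    class Order:
        id: int
        _paid: bool = False
        _lines: list[tuple[str, int, Decimal]] = field(default_factory=list)  # (sku, qty, unit price)

        def add_line(self, sku: str, quantity: int, unit_price: Decimal) -> None:
            if self._paid:
                raise ValueError("cannot modify a paid order")
            self._lines.append((sku, quantity, unit_price))

        def total(self) -> Decimal:
            return sum((qty * price for _, qty, price in self._lines), Decimal("0"))

        def mark_as_paid(self) -> None:
            # Whether this is an enum, a boolean column, or another table
            # referencing rows of `order` is invisible to callers.
            self._paid = True

    order = Order(id=1)
    order.add_line("SKU-1", 2, Decimal("9.99"))
    order.add_line("SKU-2", 1, Decimal("5.00"))
    order.mark_as_paid()
    print(order.total())  # 24.98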
For me, domain model means capturing as much information about the domain you are modeling in the types and data structures you use. Most of the time that ends up meaning using unions to make illegal states unrepresentable. For example, I have not seen a database-native approach to saving union types to a database. In that case, using another domain layer becomes mandatory.
For context: https://fsharpforfunandprofit.com/posts/designing-with-types...
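A small Python sketch of that pattern with a made-up subscription example: the domain side is a union of two explicit shapes, and the mapping layer flattens it into a row with a discriminator column, since the database has no native union type.

    from dataclasses import dataclass
    from typing import Union

    # Domain layer: a subscription is either in trial or active -- never both, never neither.
    @dataclass(frozen=True)
    class Trial:
        ends_on: str  # ISO date, kept as a string to keep the sketch short

    @dataclass(frozen=True)
    class Active:
        renews_on: str

    SubscriptionState = Union[Trial, Active]

    # Storage layer: a flat row with a discriminator and nullable columns.
    def to_row(state: SubscriptionState) -> dict:
        if isinstance(state, Trial):
            return {"kind": "trial", "trial_ends_on": state.ends_on, "renews_on": None}
        return {"kind": "active", "trial_ends_on": None, "renews_on": state.renews_on}

    def from_row(row: dict) -> SubscriptionState:
        if row["kind"] == "trial":
            return Trial(ends_on=row["trial_ends_on"])
        return Active(renews_on=row["renews_on"])

    print(from_row(to_row(Trial(ends_on="2024-06-01"))))  # Trial(ends_on='2024-06-01')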
To me, a domain model is an Object-Oriented API through which I can interact with the data in the system. Another way to interact would be direct SQL calls of course, but then users would need to know how the data is represented in the database schema. Whereas with an OOP API, API methods return instances of multiple model classes.
The way the different classes are associated with each other by method calls makes evident a kind of "theory" of our system: what kinds of objects there are in the system, what operations they can perform, returning other types of objects as results, and so on. So it looks much like a "theory" might in ecological biology: multiple species interacting with each other.
You can model this "theory" in the database itself.
> Should you start with a domain model from the get-go? Maybe, but it's risky because you may end up with domain objects that don't actually do a better job of representing the domain than whatever you have in your SQL tables.
You absolutely should go with a domain model from the get-go. You can take some shortcuts if absolutely necessary, such as simply using a typealias like `type User = PostgresUser`. But you should definitely NOT use postgres-types inside all the rest of your code - that just asks for a terrible refactoring later.
> It also feels awkward (and is hard to justify in a team) to map back and forth between domain model, SQL SELECT row and JSON response body if they're pretty much the same, at least initially.
Absolutely not. This is the most normal thing in the world. And, in fact, they won't be the same anyways. Don't you want to use at least decent calendar/datetime types and descriptive names? Don't you want to at least structure things a bit? And you should really, really use proper types for IDs.
User(name: string, posts: string[]) is terrible.
User(name: UserName, posts: PostId[]) is acceptable. So you will have to do some kind of mapping even in the vast majority of trivial cases.
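A minimal Python version of the same idea, using NewType wrappers (all names invented): the typed wrappers cost one line each, and the tiny mapping function is where they get applied.

    from dataclasses import dataclass
    from typing import NewType

    # Distinct types stop a PostId from silently being passed where a UserName belongs
    # (caught by mypy/pyright; zero runtime overhead).
    UserName = NewType("UserName", str)
    PostId = NewType("PostId", int)

    @dataclass
    class User:
        name: UserName
        posts: list[PostId]

    def user_from_row(row: dict) -> User:
        # Even when row and domain object look almost identical, the mapping
        # is where the proper types get put on.
        return User(
            name=UserName(row["name"]),
            posts=[PostId(p) for p in row["post_ids"]],
        )

    print(user_from_row({"name": "ada", "post_ids": [1, 2, 3]}))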
After decades of experience, I'm starting to acquire a notion that most of modern web app development is simply the obstinate refusal to put the code where it really belongs: inside the database engine.
Impedance mismatch, ORM, type generators, query parameterisation, async, etc... all stem from treating data as this "external" thing instead of the beating heart of the application.
It terrifies me to say this, but sooner or later someone is going to cook up a JavaScript database engine that also has web capability, along with a native client-side cache component... and then it'll be curtains for traditional databases.
Oh, the performance will be atrocious and grey-bearded wise old men will waggle their fingers in warning, but nobody will care. It'll be simple, consistent, integrated, and productive.
That's....certainly a take.
it hurts that it's not exactly wrong.
but i don't think it's 100% right either, there are some things that you just can't do reliably, in current db engines at least.
As soon as you start baking this kind of support into the db, all you have is a db engine that has all the other bits stuffed into it.
They'll still have most of the issues you describe, it'll just be all in the "db layer" of the engine.
Yes, inside the DB, where it cannot be debugged or optimized.
If you're putting advanced feature support into a db engine, you're probably also putting in semi-competent debugging support (at least i'd hope so).
But again, at that point you're really just moving the surface rather than addressing the issues.
I'm not thinking of "current" DB engines, but an entirely new stack that is end-to-end JavaScript (or WASM).
Something like Java's Akka or .NET Orleans combined with React.
So the "data" would be persisted by stateful Actors, which can run either on the server or in the browser, using the exact same code. Actors running in the browser can persist to localStorage and on the server the actors can persist to blob storage or whatever.
Actors have unique addresses that can be used to activate them. In this system these become standard HTTP URIs so that there is a uniform calling convention.
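Very roughly, and with every name invented, the activation-by-address idea might look like this (persistence stubbed out with an in-memory store standing in for localStorage or blob storage):

    class MemoryStore:
        """Stand-in for localStorage / blob storage."""
        def __init__(self) -> None:
            self._data: dict[str, int] = {}
        def load(self, key: str) -> int:
            return self._data.get(key, 0)
        def save(self, key: str, value: int) -> None:
            self._data[key] = value

    class CounterActor:
        def __init__(self, uri: str, store: MemoryStore) -> None:
            self.uri, self.store = uri, store
            self.count = store.load(uri)  # activation = loading persisted state
        def increment(self) -> int:
            self.count += 1
            self.store.save(self.uri, self.count)
            return self.count

    registry: dict[str, CounterActor] = {}

    def call(uri: str, store: MemoryStore) -> int:
        # The URI is the uniform calling convention; unknown actors are activated on demand.
        if uri not in registry:
            registry[uri] = CounterActor(uri, store)
        return registry[uri].increment()

    store = MemoryStore()
    print(call("https://example.test/actors/counter/42", store))  # 1
    print(call("https://example.test/actors/counter/42", store))  # 2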
It sounds interesting, but it's still just moving the problem surface elsewhere.
Bringing model definition and usage closer to the storage layer, thereby reducing the need for translation and transport, might cut down on the need for repeated and variant definitions across layers, but it doesn't remove the other issues related to data storage.
There will need to be a system for storage, and it will have to deal with transactional state management as well as consistency schemes. Having that in an actor layer that is shared with other parts of the system might solve some issues, but it'd need to be really carefully managed so as not to inflict the solutions to those problems on the other parts of the system using the same transport mechanisms.
i dislike javascript with a passion but I'd be down for using a wasm based system that does what you say, my skepticism is usually just me shouting at clouds so it'd be interesting to see a working model.
You're 100% correct, but unless I'm missing something, this has already been done (modulo JavaScript, thankfully): PostgREST [0].
[0]: https://docs.postgrest.org/en/v13/
I disagree.
First and foremost, what if there is not "the" database? What if you have multiple places that store data? For example, a postgres for ACID stuff, something like Kafka/RabbitMQ or similar to easily communicate with other services (or even yourself) - and sure, you could do that in postgres, but it's a trade-off. Then maybe something like redis/memcache for quick lookups/caching, then maybe elasticsearch for indexed search queries, and so on. And you usually also have some http API.
Sure you can say "I just do all with postgres" and honestly, that's often a good choice.
But it shows that it's not where "code (...) really belongs". Even IF you move a lot of logic into your database engine (and you often should), most of the time you will still have another API, and there will be a connection. Well, unless you use shared database tables with another application for communication.
All you do is push it out further to a later point - and often forcefully so.
> It terrifies me to say this, but sooner or later someone is going to cook up a JavaScript database engine that also has web capability, along with a native client-side cache component... and then it'll be curtains for traditional databases.
Not going to happen. Services like https://spacetimedb.com exist. Also, solutions like Spark (where you send your code to the database(s)) exist. And for certain things, they are great. However, it is a trade-off. There is no one-size-fits-all solution.
> Kafka / RabbitMQ
These are different things, and the fact that they're so often conflated is IMO prima facie evidence that they're misused. If you need a queue, you shouldn't be reaching for Kafka, and vice versa.
> Then maybe something like redis/memcache for quick lookups/caching
With proper RDBMS schema, indexing, and queries, these are _very_ frequently not needed. At FAANG scale, sure, but I think most would be shocked at how performant a properly-tuned RDBMS can be. That's of course the problem; RDBMSes are incredibly difficult to run optimally at scale, and so the easier solution is to scale them up and slap a cache in front of them.
> elasticsearch for indexed search queries
Again, probably not needed for most. Postgres, MySQL, and SQLite (and I assume others) all have FTS that works quite well. ES is massively complicated to administer.
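As one example, SQLite's FTS5 (included in standard builds) gives workable full-text search in a few lines; Postgres has tsvector/tsquery and MySQL has FULLTEXT indexes for the same job:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
    con.executemany(
        "INSERT INTO docs (title, body) VALUES (?, ?)",
        [
            ("Design pressure", "Mapping domain models to storage and presentation layers"),
            ("Closure tables", "Storing trees in a relational database without recursion"),
        ],
    )
    # MATCH gives ranked full-text search without running a separate search cluster.
    print(con.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rank", ("domain",)
    ).fetchall())  # [('Design pressure',)]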
What you are saying isn't all that wrong, but it's beside the point that I'm making: it's not required to use ALL (or even more than one) of the tools or things (such as an http API) that I listed.
There are certainly times I would love to see a presentation like this reformatted as an article.
I tried pulling out the YouTube transcript, but it was very uncomfortable to read, with asides and jokes and "ums" that are all native artifacts of speaking in front of a crowd but that only represent noise when converted to long written form.
Shouldn't some AI be able to clean that up for you? This seems like something LLMs should be well-suited for.
---
FWIW, I'm the speaker and let me be honest with you: I'm super unmotivated to write nowadays.
In the past, my usual MO was writing a bunch of blog posts and submit the ones that resonated to CfPs (e.g. <https://hynek.me/articles/python-subclassing-redux/> → <https://hynek.me/talks/subclassing/>).
However, nowadays, thanks to the recent-ish changes in Twitter and Google, my only chance to have my stuff read by a nontrivial number of people is hitting the HN front page, which is a lottery. It's so bad I even got into YouTubing to get a roll at the algorithm wheel.
It takes (me) a lot of work to crystallize and compress my thoughts like this. Giving it as a talk at a big conference at least opens the door to interesting IRL interactions, which are important (to me), because I'm an introvert.
I can't stress enough how we're currently eating the seed corn by killing the public web.
Here's an attempt at cleaning it up with Gemini 2.5 Pro: https://rentry.org/nyznvoy5
I just pasted the YouTube link into AI Studio and gave it this prompt if you want to replicate:
reformat this talk as an article. remove ums/ahs, but do not summarize, the context should be substantively the same. include content from the slides as well if possible.
Pretty good, except it’s not Bismarck but Fontane. ;) Also, I’m comparing myself to CGP Grey, not whatever it’s transcribed as. :D
Thanks, saved me so much time
Parts of the talk remind me of https://www.amundsens-maxim.com/
ha, I wish I saw that while working on that talk! adding it to the resources!
I had the reverse problem a month ago. Greenfield project without existing data, domain model or API. I had no reason to model the API or persistence layer any different than the domain model, so I implemented the same class 3 times, with 2 mappings on top. For what?
Well at some point, you will have API consumers and existing data and you need to be able to change the then-existing system.
Interesting, perhaps modern conveniences encourage coupling.
No wonder there are so many single-monitor, no-LSP savants out there.
i've cultivated the perception of what op calls design pressure my whole career as the primary driver behind code and her shape. i think it's the most important aspect of a successful architecture, and it's purely intuition based, which is also why there's no silver bullet. i've seen people take the most well-intended best practices and drive them into the ground because they lack the design pressure sense.
i believe that design pressure sense is a form of taste, and like taste it needs to be cultivated, and that it can't be easily verbalized or measured. you just know that your architecture is going to have advantageous properties, but to sit down and explain why would take an inordinate amount of effort. the goal is to be able to look at the architecture and see its failure states as it evolves through other people working with it, external pressures, requirement changes, etc. over the course of 2, 3, ... 10, etc. years into the future. i stay in touch with former colleagues from projects where i was the architect, just so that i can learn how the architecture evolved, what the pain points were, etc.
i've met other architects who have that sense, and it's a joy to work with them, because it is vibing. conversely, "best practices or bust" sticklers are insufferable. i make sure that i don't have to contend with such people.
Zen and the Art of Motorcycle Maintenance is a good reference.
Also, it is good to remember what game is actually being played. When someone comes up with or popularizes a given "best practice", why are they doing so? In many cases, Uncle Bob types are doing this just as a form of self-promotion. Most best practices are fundamentally indefensible, with proponents resorting to ad-hominem attacks if their little church is threatened.
That book is such a struggle in the beginning. I was waiting for it to get to the point but I never got there.
Code is for communicating with humans primarily, even though it needs to be run on a machine. All the patterns, principles, and best practices are there to ease understanding and reasoning by other people, including your future self. Flexibility is essential, but common patterns and shared metaphors work wonders.
That's terribly short-sighted. You can have a very clear architecture and code which cannot support the required use cases without almost starting from scratch.
You can also have the most flexible system ever designed, but if the rest of your team doesn't understand it, then good luck implementing those required use cases.
Sure, both extremes are short-sighted. I wasn't arguing for that, to be clear. I'm just saying clarity and ivory-tower architecting have little value if your system can't actually support the intended use case.
Which is what the person I was replying to said with "Code is for communicating with humans primarily, even though it needs to be run on a machine.". If the primary purpose is communication with other humans we wouldn't choose such awkward languages. The primary purpose of code is to run and provide some kind of features supporting use cases. It's really nice however if humans can understand it well.
That aphorism is completely incorrect. Code is primarily for communicating with a machine. If the purpose was to communicate with humans, we'd use human languages. Lawyers do that.
The code does also need to be understandable by other humans, but that is not its primary purpose.
So why do we have Java, Kotlin, Scala, Groovy, and Clojure, all targeting the JVM? And many such families?
The only thing that matters to the machine is opcodes and bits, but that's alien to humans, so we map it to assembly. Any abstraction higher than that is mostly for reasoning about the code and sharing that with other people. And in the process we find some very good abstractions which we then embed into programming languages: procedures, namespacing, OOP, pattern matching, structs, traits/protocols, ...
All these abstractions are good because they are useful when modeling a problem. Some are so good that it's worth writing a whole VM to get them (Lisp's homoiconicity, Smalltalk's consistent world representation, ...).
To allow you to write more readable and extensible code, that can solve real problems more effectively. Solving problems is the point of writing code.
Saying that reading code is the point of writing code is crazy, that's like saying the point of writing scripts is to read them, or the point of writing sheet music is to look at it.
No - the point of writing a script is to have it performed as a play, the point of writing music is to hear it and enjoy it. The point of writing code is to run it.
> All these abstractions are good because they are useful when modeling a problem.
Then what do you do after modeling the problem? You solve it! You run the program! Everything is in service to that.
> Solving problems is the point of writing code.
No one does it in isolation. The goal of having a common formal notation is for everyone to share solutions unambiguously with each other. We have mathematical notation, choreographic notation, music notation, electrical notation, ... because when you've created something, you want to share it as well as possible with others. If not, you could just ship the end result and be done with it.
So no, the point of writing music is not to hear it and enjoy it. To do that you just pick up an instrument and perform; you don't need to do anything else. But to have someone else do it, you can rely on their ears, their sight and their memory to pick things up. Or you just use the common notation to exchange the piece of music.
> No one does it in isolation.
Yes, they do. There are plenty of solo developers out there. And plenty of solo musicians who write and perform music.
Because a secondary goal of code is communication with other humans. That means readability is still a highly valuable trait. Just not as valuable as the primary purpose.
They are different ways to communicate with the machine...
If this were true, we would be writing only assembly in binary (without mnemonic opcodes).
Do you find this an acceptable way to communicate with the machine?
I'd say code is a machine. Even code in a high-level language. A code machine is somewhat special because its details look like words. This misleads us into believing we can reason with these words. We cannot. We can use them to make the machine itself, but the only way to explain how it works is to write a normal technical description, and the normal way to understand it should begin with reading that description. (There's no standard for a normal technical description, though.)
While you are obviously right about it not being the primary purpose, here it seems the discussion is about designing for long term maintainability vs just running code.
The person he replied to said code is primarily for communicating with other people. I'm not sure how else to interpret that than what is literally written down.
Human language is imperfect: they never said "other" people, but just "humans" (this includes oneself, for example).
So you are already not interpreting it literally: none of us can avoid our biases (also, machine code is code too, yet nobody misinterpreted that).
I took that quote to mean that we go through the extra trouble of writing nice code for humans to be able to reason about the code, and especially to update when changes are needed: that makes it the primary reason we invent programming languages instead of going with machine code directly.
You then take primarily to not mean primarily. I mean, sure, you can do that; I just don't find it very convincing. It's possible they misspoke (or miswrote), but that's not really the fault of the person reacting to what they said.
No, I take it to mean "primarily" but to refer to a different aspect of the topic.
When it comes to new employees we search for people living exactly this value. And being a nice human is a must. Everything else can be learned.
This reminds me of the concept of “forces” [0][1][2] in design-pattern descriptions. To decide for or against the use of a given design pattern, or to choose between alternative design patterns, one has to assess and weigh the respective forces in the particular context where it is to be used. They are called forces because they collectively pull the design in a certain direction. Just a different physics analogy versus “pressure”.
[0] https://www.cs.unc.edu/~stotts/COMP723-s13/patterns/forces.h...
[1] https://www.pmi.org/disciplined-agile/structure-of-pattern-p...
[2] Chapter 19 in “Pattern languages of program design 2”, ISBN 0201895277
There was a comment on here saying this was an implied diss of SQLModel, but now that I came back to reply to it it's gone. Weird. Since it's implied I couldn't find it in the slides.
I wrote and then quickly deleted that comment; I never want to speak negatively publicly about open source projects — projects that people work incredibly hard to build and maintain. I felt my original comment crossed that line.
In any case, there is a slide in the talk that has both the Pydantic and SQLAlchemy logos. As far as I know, there’s only one (somewhat popular) library that ties these two together. I think the speaker makes a persuasive case that data, domain, API, and other models should remain related but distinct.
Thanks for explaining where it went and where to find it. I am only writing glue code without a domain model so I haven't seen the problems yet.
I'm not sure I'd take design advice from someone who thought attr.ib and attr.s were a good idea. On the other hand he points out that DDD is a vacuous cult, which is true.
I’d call out patternitis and over-OOPification way before I’d criticize DDD. Yes, the latter can go too far, but the former two are abused on a much more frequent basis. Happily, the pattern craziness has died down a lot, though.
> I'm not sure I'd take design advice from someone who thought attr.ib and attr.s were a good idea
Can you elaborate?
that's a reference to my attrs library which is what data classes are based on. It originally used
    @attr.s
    class C:
        x = attr.ib()
as its main api (with `attr.attrs` and `attr.attrib` as serious business aliases so you didn't have to use it).
That API was always polarizing, some loved it, some hated it.
I will point out though, that it predates type hints and it was an effective way to declare classes with little "syntax noise" which made it easy to write but also easy to read, because you used the import name as part of the APIs.
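For comparison, the type-hint-based spelling that attrs later added (and that dataclasses standardized on) looks roughly like this:

    import attrs

    @attrs.define
    class C:
        x: int = 0

    print(C(42))  # C(x=42)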
Here is more context: https://www.attrs.org/en/stable/names.html
I REGRET NOTHING
For what it’s worth, I was in the “loved it” camp.
(I’m the author of dataclasses, and I owe an immeasurable debt to Hynek).
Thank you for creating dataclasses!
if it's good enough for glyph, it's good enough for me
DDD is nice especially in the first phase. All the concepts are actually rehashed from earlier principles. There’s nothing fully new there.