The Great Migration from MongoDB to PostgreSQL | Hacker News

https://news.ycombinator.com/item?id=39860769

I've run Postgres at large scale (dozens of machines) at multiple companies. I've also run MongoDB at large scale at multiple companies. I like both generally. I don't really care about data modelling differences - you can build the same applications with approximately the same schema with both if you know what you're doing.

I don't understand how folks seemingly ignore Postgres' non-existent out-of-the-box HA and horizontal scaling support. For small-scale projects that don't care about these things, fair enough! But in my experience every Postgres installation is a snowflake with cobbled-together extensions, other third-party software, and home-rolled scripts to make up for this gap. These third-party pieces of software are often buggy, half-maintained, and under-documented. This is exacerbated by Postgres' major-version file format changes making upgrades extremely painful.

As far as I can tell, there is no interest in making these features work well in Postgres core because all of the contributors' companies make their money selling solutions for HA/sharding. This is an area where MySQL is so significantly better than Postgres (because so many large Internet companies use MySQL) that it surprises me people aren't more unhappy with the state of things.

I don't really want to run another Postgres cluster myself again. For a single-node thing where I don't care about HA/scaling I do quite like it, though.
You'll never see true support for horizontal scalability in Postgres, because doing so would require a fundamental shift in what Postgres is and the guarantees it provides. Postgres is available and consistent. It cannot truly be partitionable without impacting availability or consistency.

When an application grows to such a scale that you need a partitionable datastore, it's not something you can just turn on. If you've been expecting consistency and availability, there will be parts of your application that will break when those guarantees are changed. When you hit the point that you need horizontally scalable databases, you must update the application. This is one of the reasons that NewSQL databases like CockroachDB and Vitess are so popular. They expose themselves as a SQL database but make you deal with the availability/consistency problems on day 1, so as your application scales you don't need to change anything.

Context: I've built applications and managed databases on tens of thousands of machines for a public SaaS company.
Because vertical scaling can take you so far these days that 99% of companies will never, ever reach the scale where they need more. There's just little incentive. Especially since:

- Servers will keep getting better and cheaper with time.
- Data is not only in postgres; you probably have redis, clickhouse and others, so the load is balanced. In fact you may have different dedicated postgres instances, like one for GIS tasks.
- Those hacky extensions are damn amazing. No product in the world is that versatile.
- Postgres has much better support from legacy frameworks like django/ror/laravel than nosql alternatives. People shit on ORMs, but they enable a huge, well-integrated plugin ecosystem that makes you super productive, and PG is happily and transparently handling all that.
- If by some miracle you actually reach the point you need this, you'll have plenty of money to pay for commercial HA/sharding, or migrate. So why think about it now?
> vertical scaling can take you so far these days that 99% of companies will never, ever reach the scale where they need more

It's less about the scale and more about HA and service interruption: your service will be down if the server dies.
Never heard of docker/k8s?
I don't think these two words will buy you HA automagically. You will need 3 layers of various open source components on top, and I am not sure if they will improve or reduce HA in the end.
> This is an area where MySQL is so significantly better than Postgres (because so many large Internet companies use MySQL) that it surprises me people aren't more unhappy with the state of things.

I’m not sure precisely what you mean by “HA”, but, in my experience, out-of-the-box support for the most basic replication setup in MySQL is pretty bad. Just to rattle off a few examples:

Adding a replica involves using mysqldump, which is, to put it charitably, not a very good program. And the tools that consume its output are even worse!

There is nothing that ships with MySQL that can help verify that a replica is in sync with its primary.

Want to use GTID (which is the recommended mode and is more or less mandatory for a reasonable HA setup)? Prepare for poor docs. Also prepare for the complete inability of anyone’s managed offering to sync to an existing replica set via mysqldump’s output. RDS will reject the output due to a rather fundamental permission issue, and the recommended (documented!) workaround is simply incorrect. It’s not clear that RDS can do it right. At least Azure sort of documents that one can manually read and modify the mysqldump output and then issue a manual API call (involving the directives that you manually removed from the dump) to set the GTID state.

Want point-in-time recovery? While the replication protocol supports it, there is no first-party tooling. Even just archiving the replication logs is barely supported. Postgres makes it a bit awkward, but at least the mechanisms are supported out of the box.

But maybe the new-ish cluster support actually works well once it's set up, as long as you don't try to add managed RDS-style nodes?
> Adding a replica involves using mysqldump

That's one path, but it is not the only way, and never has been.

MySQL 8.0.17 (released nearly 5 years ago!) added support for physical (binary) copy using the CLONE plugin. And MySQL Shell added logical dump/reload capabilities in 8.0.21, nearly 4 years ago.

Third-party solutions for both physical and logical copy have long been available, e.g. xtrabackup and mydumper, respectively.

And there was always the "shut down the server and copy the files" offline approach in a pinch.
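For reference, a sketch of CLONE-based provisioning (MySQL 8.0.17+; the host, user, and password here are placeholders):

    -- On the recipient (both servers need the plugin and CLONE privileges):
    INSTALL PLUGIN clone SONAME 'mysql_clone.so';
    SET GLOBAL clone_valid_donor_list = 'donor-host:3306';
    CLONE INSTANCE FROM 'clone_user'@'donor-host':3306 IDENTIFIED BY 'secret';
    -- The server restarts with a physical copy of the donor's data, ready
    -- to be pointed at the source with CHANGE REPLICATION SOURCE TO.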
CLONE is indeed nifty. But why is it a plugin? And why don't any of the major hosted services support it? (Or do they? The ones I checked don't document any support.)

I wouldn't call xtrabackup or mydumper an out-of-the-box solution.
What's wrong with CLONE being a MySQL plugin? I mean, a good chunk of this page is people praising Postgres for its plugins.

As for support in hosted cloud providers, that's a question for the cloud providers; no one else can answer this. But my best guess would be that they want you to use their in-house data management offerings, snapshot functionality, etc. instead of porting MySQL's solution into the security restrictions of their managed environment.

Yes, xtrabackup and mydumper are third-party tools, as I noted. If you needed something out-of-the-box prior to CLONE, the paid MySQL Enterprise Edition has always included a first-party solution (MySQL Enterprise Backup, often abbreviated as MEB). Meanwhile Community Edition users often gravitated to Percona's xtrabackup instead as a similar FOSS equivalent, despite it not being a first-party / out-of-the-box tool.
Citus is open source and well financed. This comment may have made sense a few years ago, but no longer.
By "well financed" you mean "owned by Microsoft"?That situation raises a separate set of concerns, especially in the context of Microsoft's main database cash cow being SQL Server, not Postgres/Citus.
How is that different than owned by Oracle?
Yep, exactly. Apologies, my previous comment was semi-sarcastic but in retrospect that was way too vague :)

On average, HN leans anti-MySQL, with concerns about Oracle ownership frequently cited in these discussions (mixed in with some historic distrust of MySQL problems that were solved long ago). But I rarely see the same sentiment being expressed about Citus, despite some obvious similarities to their ownership situation.

Personally I don't necessarily think the ownership is a huge problem/risk in either case, but I can understand why others feel differently.
I'm as skeptical of MS as anyone. However, it is licensed GNU AGPL, so I'm not particularly worried.
I guess some people really, really dislike Oracle (understandably).

And MariaDB is lagging behind, less and less compatible with MySQL, etc., leading to various projects dropping support for it - notably Azure. I wouldn't pick it for a new project.
This depends on what level you consider HA and horizontal scaling to be required. I could make the same argument, based on my personal experience, that postgis ought to be included out of the box. Of course, I'll assume most people don't need it :)
I feel like I have read this exact comment before verbatim
If you like this kind of thing, I also migrated from mongodb to postgresql and wrote about it here: https://blog.stuartspence.ca/2023-05-goodbye-mongo.html

My post is more technical, with examples and graphs, and less business-y.
I like your custom prompt. I'll be trying it out for a while!
I have done this migration twice, and RethinkDB to PostgreSQL once. At this point I think document DBs are as good as dead for new projects. They will live for a really long time still, but are in contraction and rent-seeking mode now. Expect MongoDB licensing and hosting to increase in price and languish in terms of feature development from here on out.
Maybe you’re just experiencing the trough of disillusionment, from the Gartner curve? I think it’s fair to call document databases a mature technology now, rather than “dead”.
If your normal DB can handle documents like a doc-focused DB, but better, then it might no longer be worth having document stores as a category rather than as a feature.
Why did you move from RethinkDB to Postgres? I've very little experience with Rethink and only played around with it a few years ago on some small project, but found it to be really interesting. I'm just curious about what issues you experienced with it.
mongodb is a fortune 100 company and blows through expectations every quarter. doc dbs are far from dead.
> mongodb is a fortune 100 company

Excuse me? MongoDB Inc. posted a mere $1.68 billion in revenue (with a $177 million loss). Coca-Cola posted $45 billion in revenue and ranks at #100. Forget Fortune 100, MongoDB isn't even in the reckoning for Fortune 500.
Huh, I didn't realize database performance correlates so strongly with stock price. Are there Prometheus metrics for this?
I didn't say it wouldn't make a ton of money; there is a huge number of MongoDB instances out there in the wild in the Fortune 500 due to the MEAN stack days (just like RoR before it). I fully expect the company to keep blowing away expectations by increasing how much revenue they extract from that existing customer base. That said, they -still- aren't profitable. They might be profitable next quarter, in fact I expect them to be with the current environment, but to only reach profitability now, after their glory growth days are behind them, seems... well, pretty garbage. If anything it reminds me of GoPro, except instead of cheap Chinese clones it's PostgreSQL and JSONB that is coming for their lunch.
PostgreSQL really is eating the database world.

Although in this case, the authors originally chose an architecture that was poorly suited to their data model. They had relational data and put it in a non-relational store. This was obviously always going to cause problems.
Over the years I’ve learned two things when it comes to picking a database when starting a new project:

1 - most data is inherently relational

2 - Postgres is pretty good at basically any problem you throw at it, pretty scalably. It will fall over at bonkers amounts of scale in some cases, but you probably won’t have bonkers amounts of scale

Thus, when picking a data store for a new project, the default is Postgres. You have to specifically convince me that Postgres is not good enough for this specific use case if you want me on board with anything else as a data store.
It's a pretty good default stance, yeah.

We have been trying to convince people to use our new database [1] for several years and it's an uphill battle, because Postgres really is the best choice for most people. They really have to need our unique feature (version control) to even consider it over Postgres, and I don't blame them.

[1] https://github.com/dolthub/dolt
In my experience, it's always come down to either using a specific database type for the wrong job, or using a database type the wrong way.

- Treating an OLAP like an OLTP.
- Using a document store for relational data because managing relations in SQL is tedious and slows development/progress/features.
- Using OLTP for analytics.
- and so on...

And the usual reaction ensues. Conversations begin about migrating to another product while not really paying attention to interaction patterns and the intent/reason for the various workloads. Of course such migrations only lead to either...

- Place-shifting the issues, or
- Resolving the issues that led to the migration while creating new ones that didn't exist before. Workloads that were suited to the existing DB are now mismatched to the new one.

However, Postgres seems to be a pretty safe bet when such migrations are undertaken, due to how versatile it is. Personally, I don't care which database is used (within reason of course). Just use the right database type for the job.
> As part of that stack, we chose MongoDB + Mongoose ORM because the combination presented least overhead and allowed us to ship quality features quickly. As Sir Tony Hoare states, “premature optimization is the root of all evil,” and there was certainly no need for further optimization at the time.

I find it interesting that one sentence claims they made an optimal choice for feature delivery speed, and the next one that they rationalized it as "non optimal as prescribed by Hoare". Never mind the fact that three sentences down from the original quote Hoare said "It is often a mistake to make a priori judgments about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail."
The nature of the use-case evolved and, at one point, the MongoDB + Mongoose ORM combination was in fact optimal judging from various aspects, ranging from tooling familiarity to the primary use-case being the managed cloud service. As the situation evolved, it no longer was optimal and was superseded by various stack-wide adjustments.

Regarding tooling familiarity: In the earlier stages of a company, I'd argue it to be important to pick what you can ship with fastest. You don't want to prematurely, for example, set up a complex microservices setup and deployment with K8s for an idea that is in its infancy and that might not even get that much traffic to begin with. Instead, deploying a monolithic app to Heroku might be the optimal choice (YC has a more extreme term for this they call "duct tape engineering"). As the company grows and there is clearer product market fit, it makes sense to further optimize the stack and perform adjustments as needed.

Regarding the use-case changing, it wasn't entirely obvious in the beginning that a lot of users would self-host the product, so it was designed more-so with the cloud product / managed service in mind; in hindsight things seem more obvious but not in the moment. As it became clearer that more and more people were going to self-host this product, the team shipped more features to accommodate a simpler self-hosting experience and, as part of that, the PostgreSQL migration.
At some point during the past 10 years or so, the world realized that document databases are actually a bad idea, and relational databases can do the same job better with just a few easy-to-implement QoL improvements (such as JSON operators in SQL).

Meanwhile MongoDB's creators thought this was a great time to make Mongo non-free software, presumably with the goal of making a quick buck from cloud operators.

The result is pretty much the landscape we are seeing today, with Postgres reigning supreme, and even SQLite being experimented with in server setups, while Mongo is on life support and MySQL not far behind.
I wouldn't say "Mongo is on life support" – it's actually a very successful business growing 30% YoY on a massive scale. Yet, I agree that the license switch has definitely damaged their long-term ecosystem.
They grew 30% if you look at revenue, but that's not how people determine whether a business is healthy.

EBITDA is a better metric, and that number was getting dramatically worse every year until 2023. But even in 2023, EBITDA was -$202M. I don't see it ever approaching $0.

So it's still a shit business on top of a shit product that no one ever really needed.
I think anyone who administers mongodb would agree it has a list of very dangerous behaviors for novices.

Try to upgrade a running instance node between some versions, and one may thrash object keys or worse. Best to dump and re-load your data every major upgrade to recover all set properties... and there still may be slight differences in the query language later. Wait, your data sets are 1.5TiB per node... that is a lot of downtime...

The cursor, projection, and re-format queries can be very good at reducing reporting traffic. However, the 3-page-long syntax can be unreadable to the uninitiated. The JSON essentially jams a bunch of equivalent SQL queries into 1 transaction, but you still have to be careful, as there is no real guarantee of ordering without auto-indices in clusters.

mongodb is not as clean as SQL, but works for storing constantly changing metadata, i.e. if you find your SQL schema ends up with tuples that have a lot of poorly designed null entries, table edits, and/or simply implements an object storage class abstraction in massive metadata and object-class catch-all tables.

And yes, the new mongodb licensing model telegraphs an unfavorable policy posture. However, I do understand not wanting Fortune 500 companies exploiting your team's time for free.

Profit is not an indication of quality, but rather fiscal utility. ;-)
I'm actually very bullish on MongoDB.

While the article and significant sentiment in this thread suggest a push off MongoDB toward PostgreSQL, I do think MongoDB has its own place in the stack and that it won't be obsoleted. I've personally had many pleasant experiences working with it in past projects.

Regarding business metrics, I may have a slight bias coming from the startup world, but we often value revenue the most, especially in the earlier stages of a company; we can reduce costs and perform various optimizations in the future, but what stays at the foundation is whether or not there is strong product market fit such that more customers keep coming in and coming back to use the underlying product. I would say MongoDB's current growth trajectory is in line with that.

I'm hopeful overall that they can turn profitable (maybe not this year or next, but eventually so).
HN should be full of Mongo's users and advocates, and yet this site loathes it. That by itself is a sign that it's a zombie.

The reason we don't just look at revenue is that you can "buy" revenue with ads and sales teams. But if that revenue growth costs more than it pays because your customers aren't sticky, then your business doesn't work. You can't apply startup principles to this large, aging public company.
“This site” loved RethinkDB. How is it doing?
I was totally expecting someone to come up with some vanity metric here, but a negative EBITDA of several HUNDRED MILLION? Jesus. How did that happen? A B2B business should be a symphony of ka-chings for them.
MongoDB is great if what you need is a replicated log or queue. It's what Kafka should have been.
As others have said, you can just use your RDBMS for that now. Mongo is a feature of better products.
Postgres doesn't have a sane and working replication story. (MySQL does, but it's full of legacy warts, of course.)

MongoDB is foolproof and works out of the box.

The other alternative here isn't MongoDB, it's CouchDB, but that isn't an RDBMS.
You'd probably save time if you directly suggested what income metric you want to use. In the video, Buffett is basically saying he cares about income after interest, tax, depreciation and amortisation. In other words, that he cares about actual income. That'd be very on-brand for Buffett.

That measure won't be good for MongoDB Inc. either; they've been millions in the red for years. Losing $100 million in a year would be the best result they've achieved in a while now.
so by Buffett's metric, Mongo is doing even worse
Buffett's criticism is that EBITDA is just accounting, and he cares about cash flow, specifically also after paying for CAPEX (which is Investing Cash Flow, so it comes after Operating Cash Flow).

EBITDA is used in the industry because it is a proxy for operating cash flow. Sometimes you don't have all the available data needed to get to OCF, or you're looking at company guidance (for future EBITDA values) or analyst estimates. It's easier to keep the conversation at the EBITDA level because it requires fewer assumptions. Generally the revenue line is ~easy to estimate because you can conceptualize how to go from the current number of customers to some future number of customers, how many dollars per customer, and so on and so forth.

Then as you work your way to EBIT (Operating Income) you still have to assume some gross margin, R&D expense, etc. These are pretty tangible. It should be pretty easy to get to estimated EBIT from what the company discloses in guidance or what analysts forecast. Since D&A is pretty linear over time, people generally assume it just remains constant as a % of revenue, so now you have EBITDA, which is very much like cash flow.

EBITDA is similar to cash flow because it adds back to EBIT the non-cash expense that is D&A. The reason it's good to look at it before interest and taxes is because you're also thinking about how much cash the whole enterprise generates, not how much cash goes to equity holders at the end (which is often called "Free" Cash Flow because it's not tied up with commitments to others).

Coming back to Buffett, in the industries he tends to pick stocks from, CAPEX is a major thing. Companies need to build factories, buy equipment, etc. So if you just look at future EBITDA without accounting for future CAPEX needs, you're fooling yourself.

Truth be told, in those industries everyone also looks at "EBITDA minus CAPEX". Maybe they do so now that he's bemoaned everyone for not doing it in the first place, but IMHO his criticism largely doesn't apply among valuation professionals. Maybe it does for stock traders, but not for valuation purposes like in an M&A context.
Buffett's point is that it can be gamed to look good. In this case, even if they are gaming it to look better than it is... well, it's still awful.
Their "Comprehensive Losses" for the last 3 years are (in thousands)(171,150)(343,375)(309,090)So there you go...
they lost 300 million in one year?
Stock compensation cash flow was positive, meaning they added cash to their account.
Are they literally setting money on fire?
People who use EBITDA are either trying to con you or they're conning themselves.
Probably not used in a conning context when it’s 200-300 million in the red.
Document databases aren't an inherently bad idea. They have their uses. I'm using mongodb because it fits my use case: the app is very front-end oriented, and the data the user needs on a specific page fits neatly in one document. There are (quite a lot of) related documents, but they are only needed in specific parts of the app, and that's when those are retrieved. Pretty simple.

I could argue that in-server SQLite is a bad idea: if you ever need to share the database or distribute tasks, you're fucked. But for some use cases it just works.

"The world" hasn't realized shit. It jumps from fad to fad, fueled by hyped promises and lack of experience.
What about a Postgres database where you store your documents in a key value table of json objects?What mongodb benefits would you be missing?
What benefit does switching to postgresql bring? When mongodb stops being maintained, I'll consider postgres, but until then, I only see downsides.
Lots of useful features? You can join data when needed, get proper transactions, and have schemas where you need schemas.

For example, you start with a basic (key, json) table. Once you begin to stabilise your schema, add columns generated from your json objects; now you can get proper joins, indexes, and validations on said columns.
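A sketch of that path (the events table and user_id key are illustrative):

    -- Start schemaless:
    CREATE TABLE events (
        id  bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        doc jsonb NOT NULL
    );

    -- Once the shape stabilises, promote fields to generated columns:
    ALTER TABLE events
        ADD COLUMN user_id bigint
        GENERATED ALWAYS AS ((doc->>'user_id')::bigint) STORED;

    CREATE INDEX ON events (user_id);  -- joinable/indexable like any column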
I'm not sure that is the case. What I'm seeing is that people use the cloud-native document stores such as DynamoDB instead of Mongo. The only thing zoomers hate more than traditional SQL databases is XML.
Doesn't everybody hate XML?
This seems to gloss over the actual rewrite. How did you ensure queries were equivalent? How did you configure the product to be able to read from and write to both databases? Did migrating reveal any bugs? Were you largely able to port your queries 1:1?
You can write logging for both drivers, dumping results, then check which queries don't return the same result set.
Hopefully they had comprehensive test coverage. That would have helped greatly in that regard.
Some years ago we had so many problems with MongoDB that we also ditched it for Postgres. Never been happier.

Not the reason, but very annoying: Why doesn't (perhaps this has changed?) MongoDB use SQL, but Javascript with endless $s? It was always a pain writing ad-hoc queries.
> Why doesn't MongoDB use SQL

Perhaps not the right answer, but SQL literally stands for Structured Query Language. I suppose you'd need a USQL?
Google says:'The original full name of SQL was SEQUEL, which stood for "Structured English Query Language". A language which has been structured in English or lets say English-Like Query language which follows the syntax structure of English.'
I use MySQL for everything - always have.

Can someone hit me with a few reasons why you would use Postgres over MySQL? I don’t have any familial affinity to any database, but I’m not sure what the benefits of Postgres are relative to MySQL.
Some things that are nice in Postgres and missing in MySQL:

- Create constraints as NOT VALID and later VALIDATE them. This allows you to create them without expensive locks (see the sketch after this list).
- explain (analyze, buffers). I miss this so much.
- Row level security.
- TOAST simplicity for variable text fields. MySQL has so many caveats around row size and what features are allowed and when. Postgres just simplifies it all.
- Rich extension ecosystem. Whether it's full text search or vector data, extensions are pretty simple to use (even in managed environments, a wide range of extensions are available).

Is that (and more) enough for me to migrate a large MySQL to postgres? No. But I would bias towards postgres for new projects.
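To make the first bullet concrete, a sketch of the two-step constraint creation (the orders/customers names are hypothetical):

    -- Adding the constraint without validating existing rows avoids the
    -- long heavy lock; new writes are checked immediately:
    ALTER TABLE orders
        ADD CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id) NOT VALID;

    -- Existing rows are checked later, under a much lighter lock:
    ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_fk;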
How about HA and horizontal scaling? I’ve heard that MySQL excels in that area.
I mostly have used AWS Aurora there, which is significantly better than vanilla MySQL or Postgres, and both are similar enough.

In Aurora, Postgres has Aurora Limitless (in preview), which looks pretty fantastic.

As far as running it yourself, Postgres actually has some advantages. Supporting both streaming replication and logical replication is nice. Streaming replication makes large DDL have much less impact on replica lag than logical replication. As an example, if building a large index takes 10 minutes, then you will see a 10 minute lag with logical replication, since it has to run the same index build job on the replica once finished on the primary, whereas streaming replication will replicate as the index is built.

Postgres 16 added bidirectional logical replication, which allows very simple multi-writer configurations. Expect more improvements here in the future.

The gap really has closed pretty dramatically between MySQL and Postgres in the past 5 years or so.
I do scaling and performance work, mostly with Rails apps, but a significant amount of the work is database level and not language specific. I've used both postgres and MySQL (and a few other databases) going back to 2000.

The best thing I can hear from a company when I start is "We use Postgres". If they're using postgres then I know there's likely a far smoother path to performance than with MySQL. It has better tooling, better features, better metadata.
Two paragraphs about yourself and ending with a very vague sentence that “answers” their question. 10/10
It's context and weight for what would otherwise be just another opinion.
Last time I used MySQL I had some delete triggers to clean up some stuff.

Apparently MySQL does not run delete triggers in case the rows are deleted due to a foreign key cascade.

Every time I used its slightly advanced features, I ran into such problems. With PostgreSQL I do not need to think about whether this would work.
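A sketch of the surprise (the audit_log table is hypothetical):

    -- MySQL: cascaded foreign key actions do not activate triggers.
    CREATE TABLE parent (id INT PRIMARY KEY);
    CREATE TABLE child (
        id INT PRIMARY KEY,
        parent_id INT,
        FOREIGN KEY (parent_id) REFERENCES parent (id) ON DELETE CASCADE
    );

    CREATE TRIGGER child_cleanup AFTER DELETE ON child
    FOR EACH ROW DELETE FROM audit_log WHERE audit_log.child_id = OLD.id;

    DELETE FROM parent WHERE id = 1;
    -- child rows vanish via the cascade, but child_cleanup never fires;
    -- PostgreSQL, by contrast, runs the trigger for each cascaded delete.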
PostGIS was way ahead of the MySQL equivalent last time I checked.

The plugin ecosystem is pretty astonishing. Foreign Data Wrappers... I'm not hands-on so much any more, but there were a lot of things back when I was.
I'll give you the one that matters to me: in MySQL, you can't do DDL statements (create table, alter table, etc) inside a transaction. MySQL will implicitly commit after each DDL statement.
Aww, yes. It’s so nice being able to wrap giant schema and data migrations in a transaction so the whole process is atomic.
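For example, a sketch with illustrative names; in Postgres the whole migration either lands or rolls back together:

    BEGIN;
    ALTER TABLE users ADD COLUMN email_verified boolean NOT NULL DEFAULT false;
    UPDATE users SET email_verified = true WHERE email IS NOT NULL;
    CREATE INDEX users_email_verified_idx ON users (email_verified);
    COMMIT;  -- or ROLLBACK, and the DDL disappears along with the DML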
This is exactly what I would argue. PostgreSQL makes it straightforward to create extremely powerful extensions.

PostGIS is one such extension, and I would argue that if your use case involves geospatial data, then PostGIS alone is enough of a reason to use PostgreSQL!
Not worrying about Oracle suing you is a good reason to use PostgreSQL or MariaDB.
Back when I selected which database to get more fluent in, one of the concerns I had about MySQL was that it was described as playing fast and loose with type casts, at least compared to Postgres.
Stability, reliability, consistency. If you do anything with Unicode you're also much better off in PostgreSQL. Faster indexes, smaller base install (*), and much more complete SQL language support.

(*) note: when PHP was taking off, MySQL had a smaller install base. This has long since changed - PostgreSQL hasn't grown much over the years, and MySQL has, at least since the last time I worked on both circa 2015-ish.
Friendly nitpick, because I had a double-take when reading the second paragraph: Be careful to differentiate between "install base" and "base install". In both cases, you are referring to "installed size of the database in its base configuration". But "install base" commonly means "number of installations". So I was very confused when the second paragraph was implying (using the standard meaning for "install base") that the number of Postgres installations had not grown over the years.
PostgreSQL supports more SQL features and data types out of the box. Also, it looks like MySQL development has stalled after the purchase by Oracle. PostgreSQL has exciting new features in every release; I forget when anything significant last happened in the MySQL world. It's been frozen for like a decade now. There are some new releases, but you won't find anything exciting in the change log.
I know this isn’t a technical reason, but my main reason is “eww, gross, it smells like Oracle here.” I’ve been around long enough to know that even being in the same zip code as Oracle is a bad idea.
I use MySQL mostly, but I would love to have a few features from the Postgres world; namely, the better full-text search, key value store, queue, and vector search. A lot of the projects I have never reach the scale where I need these to be separate data products, so the perfectly fine Postgres versions would suffice.
CrateDB might be a good fit for full text and vector search (it's a SQL database but has dedicated clauses for FT and VS).

Curious how you use PG for key/value and queue - do you use regular tables or some specific extensions? I can imagine kv being a table with a primary key on "key", and for queue a table with a generated timestamp, indexed by this column, with peek/add utilising that index.
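For reference, the common Postgres pattern for the queue half is a regular table plus FOR UPDATE SKIP LOCKED (a sketch; table and column names are illustrative):

    CREATE TABLE jobs (
        id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        payload    jsonb NOT NULL,
        created_at timestamptz NOT NULL DEFAULT now()
    );

    -- Each worker atomically claims and removes the oldest unclaimed job;
    -- SKIP LOCKED lets concurrent workers pass over rows already taken:
    DELETE FROM jobs
    WHERE id = (
        SELECT id FROM jobs
        ORDER BY created_at
        LIMIT 1
        FOR UPDATE SKIP LOCKED
    )
    RETURNING payload;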
I've used (and introduced) MongoDB in production. Though I much prefer PostgreSQL, SQLite, MySQL, or SQL Server, for some use cases a document database is fine.

However, as I discovered myself, once you realise you need to use Mongoose with it, you should usually take that as a prompt to consider going relational.

Don't get me wrong, Mongoose is a good package. But the things it solves could likely be better fixed by moving away from MongoDB - I'd go so far as to say that in most cases Mongoose should only be added to help an existing project; if you need it from the start you probably should go relational.

(YMMV and there will be exceptions.)
Takeaway is this has less to do with technical reasons and more to do with licensing, which is fair.

The other takeaway is that the fact they saw huge gains after switching, as a result of query optimizations with joins, shows that their data wasn't properly modeled for use with a key value store, which probably added to them switching to the right type of store to begin with.
And that’s the core problem with key value stores: your data needs usually grow beyond key value scenarios. In the beginning it might fit, but then you add more pages with different needs, reporting/dashboards with different needs, APIs with different needs, and ETL processes with different needs. Trying to force everything into key value is short term thinking.
What you’re describing isn’t an inherent problem with a key value store. Forcing everything to OLTP SQL is also short term thinking. Pick the right tool for the job.

It’s funny you mention growth as the limiting factor for key value stores; if anything, that’s the one area where they’re objectively superior. Which is why most search databases and caches are key value stores.
Not if you use an RDBMS to store your key/values. Then you can do arbitrary queries with joins over multiple tables, add views, stored procedures, etc.
Sure, but that’s not related to what you said prior. Again, right tool for the job.
Sure. I meant grow in number of ways you use your data, not in volume of data.
What you meant was understood and my point stands
In this world you either evolve to Elastic/Cassandra/etc or return to Postgres :crab:

You really gotta search your soul and ask yourself: is your cool idea really a database, or just a plugin for Postgres? And the answer is often the latter.
The problem is that postgres is not a replacement. The JSON operations are not atomic, which is terrible. If you wanna use PG that's cool, but I'd suggest just avoiding JSON.
> The JSON operations are not atomic

I'm hearing this today for the first time. What exactly is not atomic, and is there a resource with more details?
Not sure if this is what the above comment means by "atomic", but a shortcoming of Postgres' JSON support is that it will have to rewrite an entire JSON object every time a part of it gets updated, no matter how many keys the update really affected. E.g. if I update an integer in a 100MB JSON object, Postgres will write ~100MB (plus WAL, TOAST overhead, etc.), not just a few bytes. I imagine this can be a no-go for certain use cases.
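A sketch of what that looks like (hypothetical profiles table):

    -- A one-key change still rewrites the whole stored jsonb value:
    UPDATE profiles
    SET doc = jsonb_set(doc, '{settings,theme}', '"dark"')
    WHERE id = 42;
    -- Postgres writes a new copy of the entire doc (plus WAL/TOAST work);
    -- there is no in-place sub-document update as in a document store.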
It drives me batty to see people store 100MB JSON objects with a predictable internal structure as single blobs in an RDB rather than destructuring them and storing the fields as a single record. Like, yes, you can design it the worst possible way like that, but why? But I see it all the time.
Because schemas. The whole point of nosql is that you can alter your data model without having to reload the whole database
Actually, that's the whole point of RDBs: that you can alter your data model (in most cases) just by a simple DDL+DML query. And it is with NoSQL that you have to manually download all the affected data from the DB, run the transformation with consistency checks, and upload it back. Or, alternatively, you have to write your business logic so that it can work with/transform on-demand all the different versions of data objects, which to my taste is even more of a nightmarish scenario.
Which is great in the early stages of development, but people actually deploy like this
The benefits of going schemaless in the early stages of development are highly suspect in my experience. The time that one might save in data modeling and migrations comes out from the other end with shittier code that’s harder to reason about.
My perspective is that using NoSQL does not save time in data modeling and migrations. Moreover, one has to pay in increased time for these activities, because (a) in most cases, data has to follow some model in order to be processable anyway, and the question is whether we formally document and enforce it in a relational store, or leave it to external means (which we have to implement) to benefit from some specifically-optimized non-relational storage; (b) NoSQL DBs return data (almost) as stored, one cannot rearrange results as freely as with SQL queries, not even close, thus much more careful design is required (effectively, one has to design not only the schema but also the appropriate denormalization of it); (c) migrations are manual and painful, so one had better arrive at the right design at once rather than iterate on it.

That is, of course, if one doesn't want to deal with piles of shitty code and even more shitty data.
It's not an issue with size. It's an issue with race conditions. With Mongo I can update a.b and a.c concurrently from different nodes and both writes will set the right values.

You can't do that with PG JSONB unless you lock the row for reading...
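For illustration, the lost update appears with the whole-document read-modify-write pattern (a sketch, assuming a docs(id, doc jsonb) table):

    -- Sessions A and B both read {"b":0,"c":0}, patch different keys
    -- client-side, then write the whole object back; the last write wins:
    UPDATE docs SET doc = '{"b":1,"c":0}' WHERE id = 7;  -- session A sets b
    UPDATE docs SET doc = '{"b":0,"c":2}' WHERE id = 7;  -- session B clobbers A's b
    -- Avoiding it means SELECT ... FOR UPDATE, or doing the patch inside a
    -- single UPDATE with jsonb_set so the row lock serializes the writers.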
Yes but that simplified write complexity means you are pushing a ton of read complexity out to your application.
What?? That's an insane argument. That's like saying if one client sets column X to 1 and another client concurrently sets Y to 2, one client's writes will be LOST. It shouldn't, and it doesn't. If it did, nobody would use Postgres. This issue only exists with PG's JSON impl.
What?? That’s an insane way to describe what I’m talking about. Data/transaction isolation is very complex and extremely specific to every use case, which is why database engines worth anything let you describe to them what your needs are. Hence why, when one client writes to Y, they specify what they think X should be if relevant, and get notified to try again if the assumptions are wrong. An advantage of specifying your data and transaction model up front is that it will surface these subtle issues to you before they destructively lose important information in an unrecoverable manner.

https://en.wikipedia.org/wiki/Isolation_(database_systems)
So updating one column on a table is destructive and can lose important information now? :D

How do you increment a value in a column while other people write to the database? You don't grab the whole damn row and rewrite it...
Also this statement:

> which is why database engines worth anything let you describe to them what your needs are

Contradicts your argument. Mongo has atomic updates to update specific fields, or you can do replaceOne() etc. to replace the whole document, while PG only gives you "replace" with JSON.

So I guess postgres isn't worth anything. :)
A JSON object which is 100 MB after compression is quite a huge thing.
How are JSON operations not atomic? Genuinely curious
I think the sense is not "atomic by field" or whatever you'd call that. If you're going to ignore the fact that PG is an actual ORDB and just store gigantic blobs of JSON in it, it will write the entire object whenever you update part of it, because the whole point is you're supposed to store it as a multi-field record.
Isn't that true for Mongo as well?
Yes, but as MongoDB is a document database, storing and updating giant blobs of JSON as a single operation, as opposed to breaking the JSON down into individual fields, is intended behaviour. This works in Postgres too, of course, but then you lose the relational database advantages on top of the large-single-field issues.

All this really comes down to is picking the right database type for the problem you're trying to solve.
I pushed the company I work for to use Postgres instead of Mongo in all new services and refactors. Now I'm leaving it to join a new company, where they use Mongo. Sigh. I'll have to do this all over again.
I don't doubt there are real benefits for them to switch, but BTW this is not true:

> Difficulty configuring database transactions: With MongoDB, setting up transactions was not trivial because it required running MongoDB in cluster mode with various configuration overhead; this made it extremely difficult, for instance, for customers to run a simple POC of Infisical because it required a production setup of MongoDB.

You can run a single instance in clustered mode, just a single-instance replica set. You get the oplog, transactions, etc. No advanced configuration required.
And this has been possible for many years.
Are there any disadvantages to using transactions with a single-instance replica set?
We've run a single-instance replica set for three years in a small application for its transaction function.

The only difference we noticed is the additional replica set setting for a new instance. We haven't found any other disadvantages.
Man, that thread feels prophetic. Since 2017, when this comment was written:

> 100% of my friends who have used Mongo/similar NoSQL have given up and had a nasty rewrite back to pgSQL

...I've led three migrations at different companies from Mongo to Postgres, spending months of my life and millions of dollars.

There really should be a sociology study of why so many people/companies used Mongo even well after the downsides were clear.
"this hurt in particular because our data was very much relational."Umm... why would you not choose a relational engine to start with? This isn't said with the benefit of hindsight. I worked with Lotus 123 back in the early 90s. I get that there's value in document databases, but even then, there were limitations, and the need for the ability to have structured/related data in an engine that easily allowed for ad-hoc queries out of the box was apparent.I watched the entire nosql/mongo movement arise and evolve over time, and it rarely made sense to me. Even when mongo made sense for a problem... it only makes sense for that problem, and not as the primary basis for your entire application. Relational/SQL, with nosql/mongo/etc as secondary/ancillary data store has always made the most sense (imo).JSON columns in major databases now tend to provide a pragmatic balance of "good balance for many use cases". Justifying something other than that as a starting point is possible, but I've rarely seen cases where it makes sense to avoid a decent SQL engine as the core data store for your business apps.
> Umm... why would you not choose a relational engine to start with?

There's typically two answers to this question:

When they first started, things were changing so quickly they didn't want to commit to a schema, and felt like doing real data modeling would slow their velocity down, so they dumped everything into Mongo and called it a day.

Or they had someone on the team who insisted that NoSQL was the way to go. They'll say something like, "I worked on xyz team at Amazon and we just used DocumentDB for everything and never had a problem. Those horror stories about Mongo are all from people who didn't use it right."
I inherited a troubled project using MongoDB and learned a lot about what is, and isn't, really a problem with MongoDB [for this kind of project].

BSON is a terrible format, but I genuinely like that I can use the same schema all the way through the JSON APIs to the database. If they have to deviate in future for some reason, you can deal with it then.

If you are clever with generics, you can compose type-agnostic DB code with type-safe in-memory data structures, without needing any schema generation or ORM. It's a natural fit for data-oriented programming in my workhorses Go and Rust. Unfortunately, the Rust library is very poorly designed in many ways, making many basic things extremely inefficient, and that's not what you want in Rust. The Go library has really slow reflection-based (de)serialization, but other than that, it's very flexible and robust.

If you keep your queries simple, it's easy to mock out and combine with techniques like snapshot testing. Since BSON has a lossless representation in the form of extJSON, you can have human-readable snapshots, diff them easily in review, etc.

Doing bulk writes is easy and efficient, and doing sessions/transactions is easy, but you should basically never combine the two. The server can have its own configured timeout for a session, and if your bulk write exceeds that, it fails and retries, fails again, etc. This is a really serious design flaw that shouldn't have made it past review. If they wanted to enforce a maximum bulk write size, that should be known up-front, not depend on how many seconds it happened to take each time.

Writing data to a replica set is extremely slow, and every index you add makes it slower. This compounds further with the above.

I have concluded that MongoDB only makes sense for read-mostly workloads. If you're going to do your own in-memory data structures and validation anyway, writing out small infrequent changes, MongoDB doesn't do much for you but it also doesn't get in your way very much.
Did anyone here ever see a migration _from_ PostgreSQL/MySQL _to_ MongoDB? I've only ever seen startups pick Mongo, then regret it a few months or years down the line and migrate to a relational database. Did anyone ever see the opposite?
I watched a team go from a sloppy MySQL that they never tried tuning to Mongo (cause it's web-scale). It was another thing that almost killed the company. Now they are on PostgreSQL.
I've seen a couple of successful ones, but there were extremely good reasons: Most of the time we don't need scaling, but when your target Mongo cluster has a couple hundred machines, and your queries really are very simple, NoSQL is not a crazy idea. Same with DynamoDB: Sometimes it really is a very good choice, but you need a use case of the right shape. If you have an extremely good use case for your table to be in 5+ regions, the dataset has billions of rows, and you have no hot keys, maybe running your own replication system really is more effort than letting AWS do it all for you.

It's just far more common for some tech lead in a small company to imagine that they are going to be bigger than Netflix, than for someone to start conservatively with an RDBMS and realize that suddenly they have hundreds of millions of users and some kinds of data where the relational bits don't matter.
I work with the aftermath of going from postgres to couchdb. The migration was done years before I joined. There are still customer records showing postgres ids.It didn't seem like it made things better since there were issues with dates and lost data. Working with it now isn't fun either.
No, but I am currently in a startup that mainly uses DynamoDB (similar to Mongo) with support from Elastic for more search-based queries. We've found DynamoDB to be great and we don't feel hindered by the lack of relational-ness.

I will say, in case anyone doesn't understand how this might be possible: relational querying is possible in document databases, but it essentially requires different ideas to achieve. In Dynamo, you might need to create secondary indices, for example.

Personally I love Postgres and would not mind if we had gone that direction, but the more I use DynamoDB the less I feel like Postgres is the "only way".
I currently use MariaDB for my self-hosted application. Would I gain big advantages switching to PostgreSQL? My main reason for going with MySQL/MariaDB is the availability and ease of installation on most hosting providers.
> Missing out on relational features: With MongoDB, we lost out on many nice features from the relational world like CASCADE which, when specified, deletes all referenced resources across other tables whenever a target resource is deleted; this hurt in particular because our data was very much relational.

I'd be very curious what their data model was in relation to this problem. I wonder if denormalization of the data would have solved the problem without a need for a full database hop.
Is it still difficult to create a cluster of pg dbs, either for redundancy or speed?

Last time I advocated for using pg vs mongodb, the person replied that mongodb clustering was super easy.
Mongo absolutely can be the right answer if you need to horizontally scale to tens or hundreds of TBs. Anything below that I’d rather have a small cluster of big Postgres instances. There’s value in a SQL RDBMS which you just don’t get anywhere else.
Read replicas are pretty easy on postgres using replication. That said, you need to be careful about replica lag.

If you want to distribute your writes, that's a little trickier. There are options like Citus and such. But still not natively supported.
Second that. A client had PG replica lag, was doing backups from the replica, and found out while restoring a backup that some hours of data were missing.
Ouch? Are there standard tools to monitor that? Any best practices to avoid it? I am considering exactly that kind of setup...
I don't know about there being a standard tool, but you can track the LSNs together with a timestamp on the write node, and from there calculate the replica lag by checking what LSN the replica is at.
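Concretely, the built-in functions are enough for a basic check (a sketch; no extra tooling assumed):

    -- On the primary: per-replica lag in bytes.
    SELECT application_name,
           pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
    FROM pg_stat_replication;

    -- On the replica: approximate lag as time.
    SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;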
I started my career doing web development using PHP and MySQL. Then I pivoted to something completely different for about 6-7 years, as I got sick of spending all my time making stuff work in IE6. I heard about the whole nosql thing, didn't understand it, but thought it sounded silly. Then I came back to web dev 5 years ago and found postgres was all the hype and nobody talked about nosql any more. Glad I skipped it!
Could have told most people this back when everyone was going to NoSQL.

"You'll be back". Relational capability is just too useful beyond a certain point.
> we felt that it already delivered significant benefits by virtue of not being MongoDB

Well then, tell us how you really feel.
That's what happens when you choose the next product / framework / methodology because of fashion, or because FAANG is giving you the stuff for free, without thinking about whether it really fits your use case, and forgetting that 99.9999% of companies will not have the problems Google, Facebook, etc. have.
Funnily enough, the one feature I don't recommend in Postgres is CASCADE. Unless you have a very small, controlled set of FKs, it's not a good idea. Batch delete is less problematic. I wish Postgres would do something about it, like elasticsearch's deleteByQuery that can run in the background.
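One common shape for the batched alternative (a sketch; Postgres has no DELETE ... LIMIT, hence the ctid subquery, and the names are illustrative):

    DELETE FROM child_rows
    WHERE ctid IN (
        SELECT ctid FROM child_rows
        WHERE parent_id = 42
        LIMIT 10000
    );
    -- The application repeats this until zero rows are affected, keeping
    -- each transaction (and its locks) small.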
Can somebody please explain to me the use cases in which someone would use Postgres vs MySQL vs something like Cassandra or Convex? I know the latter two have their own types of SQL implementations, so why aren't they used as much, even if they are NoSQL databases?
What is the point of using a query builder if you’re only going to support a single RDBMS? Why not write straight SQL and avoid the unnecessary abstraction layer? Is the query builder really going to be easier to learn to use (particularly for non-trivial queries) than SQL?
If your DB schema is expressed as an object/class it means your IDE will have robust autocompletion. It’s just a convenience to avoid having to look up the schema for whatever tables you’re interacting with. It helps cut down on trivial bugs during the writing phase, before you even get to testing.
It always struck me as odd to see the hard work put into NoSQL databases to make them relational.

Somehow, the work to do this seemed less work than learning SQL.
Hoo boy. I did this once. Only it was CouchDB. And they didn’t use any schema. Data had accumulated in it for years. Then a customer said they wanted their data exported to PowerBI.

I used a library that generated a type that could parse any JSON document you threw at it by feeding it examples. So I wrote some tools to scan through the database, generate parse test cases, etc.

I then wrote a library to version and migrate records. With this I wrote some tools to start extracting meaningful bits out of the big-blob type. I could scan the database, download examples that didn’t work, make test cases, fix those, ask the experts in the system, etc.

Then eventually it started spitting out structured data into Postgres and from there into PowerBI.

Decent article.
:)))

I find the super-plain voice of the annoyed character further amplifies the humor.
Talking to AI 12 years ago :-)
Generally speaking: don't use mongodb.

It was super shitty when it came out, and using postgres gives you a better and faster solution which you can also use in other projects.

I'm always impressed when I see mongodb still making money.
While I'd agree with the general stance of avoiding MongoDB for any new project, I find the statement that postgres always gives a better and faster solution dubious. They don't really solve the same problems. If you happen to really need horizontal scaling, actual HA (not failovers), or documents with many field-level atomic operations, MongoDB might still be a better fit than postgres. For on-premise hosting, not having any HA out of the box can be a major pain point.
Sharding does exist with postgres.

What type of HA can mongo do that postgres can't?
Ah, the old "I used a wrong product for my problem and now I complain the product is bad because it does not suit my case" defence.

MongoDB is a document database. It is not supposed to be good at relations. Also, lacking support with cloud providers and lacking experience with MongoDB is not MongoDB's problem, it is your poor decisionmaking. If you value those things, you should have taken it into account when you were choosing MongoDB in the first place.

Now, I am not a big lover of MongoDB. Some years ago I was forced to use it and I had a very low opinion of it as quite an immature product. It is still far behind in maturity compared to something like Oracle database or PostgreSQL, but in the meantime I learned to appreciate some of the things MongoDB is good at.

I also admit that MongoDB's transactions are a total joke. It should be prominently placed in the documentation that you are using them at your own risk. I don't use MongoDB transactions anymore because there are better ways to architect your application with MongoDB without using transactions.

I like MongoDB for the ease of use when rapidly prototyping things. I like the concept of a no-schema, document database and some of the additional features it provides to deal with the document. I like its reactive Java driver, which is a breeze to use to construct data pipelines quickly. I like change streams.

In the end, I think it is good to have a selection of tools that are good at doing different things. It is our responsibility to choose the right tool for the job. If you chose poorly, don't try to fault the tool for it.
Author of the article here!

The article itself does not "complain the product is bad", but some other comments in this thread would certainly suggest so. Instead, the article says that the use-case evolved, states reasons for why MongoDB was no longer suitable (while at some point it was) and why PostgreSQL was chosen instead, and discusses the migration process involved in the transition and the results.

Lacking support with cloud providers and lacking experience with MongoDB is indeed NOT MongoDB's problem. However, it is a problem for users of the platform who are trying to self-host it with MongoDB as a database dependency, and it therefore justifies the migration, with PostgreSQL being a better candidate for this use-case.

To be clear, the article is not saying that MongoDB is bad, and no fingers are being pointed. I would in fact say that there is an array of use-cases where MongoDB is an excellent choice (e.g. I resonate with the point on rapid prototyping).

Please don't skew the words.
To me the article read more as saying that PostgreSQL might have been the better choice from the beginning - but due to lack of experience in the team, MongoDB was "good enough" at that point.
I recall when my team chose MongoDB ~2011, Postgres & friends didn't have JSON columns, so there was a lot of extra data modelling that probably was unnecessary.

The biggest use case for MongoDB was for huMongous data. Obvs MongoDB was a good fit, because of the name.
Yes, I think it's reasonable to say the NoSQL community drove the JSON columns in the relational databases. And while the JSON-ness of said columns is not necessarily the relevant bit to me, having the escape hatch of being able to just sort of splat out an "other" column has been something I've used several times, and it has relieved some non-trivial stress and strain in relational designs in my opinion. It's not even that hard to migrate something out of the JSON column if you want to raise it to a full column later, and JSON indices mean you may not even really need to.

I have a hard time justifying a document database in 2024 when I have 2024 Postgres; I don't know how large the window is where MongoDB will do something Postgres won't, but I know my systems are not in that window. (Assuming it even exists, which I wouldn't be surprised it doesn't, but for the sake of argument I will assume it does.) But a decade ago that would be a much harder decision if I truly had a document-based use case.
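For instance, the "raise it to a full column later" path is fairly mechanical (a sketch; the tickets table and extra column are hypothetical):

    -- Promote a field out of the catch-all jsonb column:
    ALTER TABLE tickets ADD COLUMN priority int;
    UPDATE tickets SET priority = (extra->>'priority')::int;
    -- (add NOT NULL only once every row has a value)

    -- Or skip the migration entirely with an expression index on the JSON:
    CREATE INDEX tickets_priority_idx ON tickets (((extra->>'priority')::int));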
Horizontal scaling is still challenging in Postgres. That's where "NoSQL" systems still have their place. Or one of the SQL systems layered on top of a NoSQL architecture, with the trade-offs that entails.
I still haven't had a single use case for JSON columns yet. Sometimes I feel that JSON columns are made for people who don't want to properly design their domain model. Big mistake.
Interesting point, but I wonder if there's a little confirmation bias and/or circular logic there. In other words, you don't see a use case for JSON columns because you don't think they're good DB design.

I know I have used them after decades of RDBMS work, and it feels like I'm "cheating" a bit. But they really do make sense for some use cases that can be harder to achieve with a traditionally normalized RDBMS alone. For instance, for configuration options, or scenarios where different objects may have a few different attributes but are largely the same.

If you've ever used an entity-attribute-value data model, or otherwise ended up with table proliferation or a bunch of nullable columns to accommodate the latter use case, you might appreciate the simplification JSON columns offer.
I think it is the same misguided discussion as strongly/statically typed vs weakly typed languages. (Mind, I mean the discussion is misguided, not you.)

The problem is that the relational model is one solution to a certain problem: when do you want to pay the price of enforcing the model of your data? A relational database is paying all the cost upfront. With MongoDB I have other options.

It does not mean they are necessarily better options. But some of them are better in some situations, for example when rapidly prototyping, or when you need to preserve data in different formats as the application evolves.

Personally, if you can, you probably should model your data upfront. But it is good to have other options, sometimes.
The main use case where I find myself reaching for JSON/JSONB fields is unstructured, map-like metadata, like KV tags on a resource, where the data model/domain has no expectation that any given field is set.
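A sketch of what I mean (schema hypothetical): a GIN index on the jsonb column makes containment queries over arbitrary tags reasonably fast, with no expectation that any particular key exists.

    CREATE TABLE resources (
        id   bigserial PRIMARY KEY,
        tags jsonb NOT NULL DEFAULT '{}'  -- free-form KV metadata
    );
    CREATE INDEX resources_tags_idx ON resources USING GIN (tags);

    -- Find rows tagged env=prod, whatever else they carry
    SELECT id FROM resources WHERE tags @> '{"env": "prod"}';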
https://www.postgresql.org/about/featurematrix/#json

You're right, 9.4/9.5 were when JSONB was introduced and expanded (released 2013/2014).

That said, it's a big decision to go with two very, very different technologies (relational vs document store) just for a column type. Storing JSON must have been a pretty major product feature?
I have noticed in recent times that comments on here have been getting more and more like this. An article I posted a few weeks ago got the same treatment, even though I reiterated the same points multiple times because I anticipated these kinds of comments from people who didn't read it fully.
To be fair, MongoDB's PR team did their best to present it as the solution to all problems.

> In 2012, 10gen's VP of Corporate Strategy Matt Asay argued "there will remain a relatively small sphere of applications unsuitable for MongoDB … the majority of application software that developers write will be in use cases that are better fits for MongoDB and other NoSQL technology … Those functions that really help a company innovate and grow revenue [will be NoSQL]." He would note that we were living in the post-transactional future.

https://news.ycombinator.com/item?id=15124306
They were doing that back in 2010. I went to a few conferences where they promoted MongoDB as an all-purpose database that was the "next thing" compared to SQL. Once everyone realized its shortcomings as a general-purpose database, there was a gradual "oh shit this isn't working" migration back to SQL-based databases.

I think the big problem is that the MongoDB programming model and APIs are very nice; we really need a database that is still relational "under the hood" but has an API that's more like MongoDB.
I believe one of the job interview problems at Mongo for software engineers was to design a relational layer on top of the document store.
To be fair, almost all product PR teams do the same. I just ignore most of what they say in favour of my own opinion, which I try to build based on reading actual technical documentation, experimenting with the product, etc.

Anybody who makes tech decisions based solely on what PR teams say is naive and incompetent at best.
MongoDB came into the community like a ton of bricks, taking over conferences, flooding the zone with mugs and t-shirts, and for a time there you couldn't view the front page of Hacker News without at least two MongoDB posts. The death of SQL, now known to be an inferior relic of the past, was a regular topic of discussion. There's a reason the "Web Scale" meme is so famous: that's what actually happened for a while there. SQL was an inferior, leaky abstraction, ACID was not "web scale", and it became all about the CAP theorem (which I hardly ever see anyone writing about these days).

Only a year or two later, when teams that went all in on MongoDB at the behest of MongoDB's marketing department started realizing they'd been sold a bill of goods, did the slow and arduous march back to what continued to be the best generalized solution for 95% of data problems, the ACID-compliant relational database, begin. Given that, this blog post seems really behind the times.
> To be fair, almost all product PR teams do the same.

No, MongoDB's PR approach and results were largely uncommon at the time. They didn't advertise directly but instead targeted and amplified dev communities, as detailed in the article linked earlier. The result was that people not associated with MongoDB talked about MongoDB at various dev conferences and in blog posts. People didn't want to listen to PR teams then either, but they followed their peers.
Again, I will stick to saying that if you use technology you have no experience with, the problems resulting from it are on you. Testing out technology on important projects is reckless, and it is a risk a lot of people take while pushing away any responsibility for it.

As a professional, before I suggest using a technology on a project, at the very minimum I first spend some time with it to learn the ins and outs. Every technology has cons. Just because you don't know them does not absolve you from responsibility for them.

The reality is that most applications are boringly similar once you define the problem in simple words. There is very rarely a need to use new technology. It is my experience that it is usually better to use technology that is not necessarily cutting edge, but that you know well and for which you can easily hire people who know it well. No technology is perfect, but it is much better to know the cons ahead of time and to have people who know the problems and can find reasonable solutions for them.

I am personally sick and tired of people who first advocate for new technology to be added to a project, touting its supposed benefits while not knowing the cons, and then don't want to take responsibility once we actually find the dead bodies.
The funny thing is that they chose Mongo precisely because they were familiar with it...
As a counterexample, Azure Cosmos DB and Amazon DynamoDB are marketed as specific solutions for specific problems. RDBMSs like SQL Server or Postgres are marketed as general-purpose solutions because they are in fact general-purpose solutions. MongoDB, on the other hand, is a specialized "document database" marketed as a replacement for relational databases.
> To be fair, almost all product PR teams do the same.

As it is their "duty" to twist the truth in this way, it is our role as the users of these technologies to present things as they are.
Mongo was absolutely one of the most egregious (dare I say dishonest?) purveyors of this myth. It was quite off-putting for me and my team.
In the 2010-2011 time frame, the Mongo team sponsored an effort inside Twitter to replace MySQL with Mongo. A group of us who were working to migrate the tweet store to Cassandra were able to talk with leadership and get that initiative killed. Turns out migrating HIGHLY structured data into Mongo was never a good idea, especially at that scale.
The issue isn't that MongoDB isn't suitable for some problems. I've used it before and it has not caused issues. It's that the problems it is suitable for are a subset of those that Postgres is suitable for. I can just as easily rapidly prototype in Postgres. If I want to store schemaless JSON in Postgres, I can easily do that.

> In the end, I think it is good to have a selection of tools that are good at doing different things.

In the end, I think every tool needs a niche: something it is good at that its competing tools are not good at. Mongo doesn't have that.
If mongo is just a document database, there's really no reason to use it over elastic. The query story is slightly nicer with mongo, but we're not doing relational algebra here, right? Elastic crushes mongo at literally everything else.
ES had a bug where a syntax error in a query would send it into a weird state in which it would give wrong answers to all queries until the process got rebooted. Which, of course, isn't fast.

I reported the issue, and thankfully I changed jobs and never had to deal with ES again. After a few years they contacted me to ask if the issue was still there; I said I had no idea.

I have no idea if the issue is still there.
I guess Elastic is a heavier setup for a PoC.
That might have been true in the past but it's not hard to PoC with elastic using docker compose.
Elastic is much more of a PITA to maintain and monitor than Mongo.
So what is a good use case for MongoDB? I have never seen an application where a "document database" would have been a good choice. In every project where I have seen MongoDB, it has turned out to be the wrong choice.

Even for prototyping there are many other good choices, since RDBMSs like Postgres have implemented JSON support. MongoDB looks like a solution in search of a problem.
Applications that are essentially form wizards are a great fit for a document database. Think application forms etc.
Is there a benefit of Mongo for that over just Postgres with JSONB columns? You're still storing JSON in Postgres but you'd get the relational aspects too for things like users having many forms, billing and account relationships etc.
No benefits that I can think of at the database layer. Postgres's addition of JSONB columns represents the best of both worlds. Funny to think that all of that noise about NoSQL replacing RDBMSs was essentially nullified by the addition of one column type.

Some people do like the MongoDB API, however.
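For the forms example upthread, a rough sketch (names made up) of how the relational side and the document side coexist in one database:

    CREATE TABLE users (
        id    bigserial PRIMARY KEY,
        email text UNIQUE NOT NULL
    );

    CREATE TABLE form_submissions (
        id      bigserial PRIMARY KEY,
        user_id bigint NOT NULL REFERENCES users (id),
        answers jsonb  NOT NULL  -- shape varies per form
    );

    -- Ordinary join, plus a peek into the document
    SELECT u.email, s.answers->>'company'
    FROM form_submissions s
    JOIN users u ON u.id = s.user_id;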
Easier horizontal scaling and organizational inertia would be the main reasons to use mongo over a jsonb column. I wouldn’t introduce it to a psql shop if they are already great at running psql.
Sorry, 100 times out of 100, "rapid prototyping" means "we built this with the wrong database, now we're stuck with it". If it isn't obvious how to store your data up front, then you either aren't planning your software well, or you don't actually know what you're building.
I was also a bit confused by that part. What is it about Postgres that prevents rapid prototyping? It's not like the schema is set in stone, or even particularly hard to change once you've created it.
Well, yes and no. If you already have your database running in production with many transactions and live connections 24/7, changes to the schema might not be hard per se, but they always need careful planning and execution. Additionally, in Postgres a change like adding a column locks the whole table (although I hear this is going to change), so writes are off for a short time. If this becomes unacceptable, you go for a blue/green deployment, which has its own gotchas.

So, while I agree with your main point that there is nothing in Postgres that prevents rapid prototyping, and I would choose it over Mongo any moment, I understand why some people might prefer the more "dirty" approach.
True enough, and the solutions for lighter-weight schema changes have evolved and weren't always so good, but early-stage startups often don't really have the sort of data-weight issues that make this hard, unless they're starting with large data sets already. Even at the scale of small hundreds of millions of rows per table (like the company whose database I'm in charge of), it's not much of an issue.

I had to look it up: apparently adding a column with a non-null default was "fixed" in PG 11 (2018), but with a null default it had been fast for a while:

https://www.depesz.com/2018/04/04/waiting-for-postgresql-11-...
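Concretely, on PG 11+ both of these are fast, since a constant default is recorded in the catalog rather than rewriting every row (the table and column names here are made up):

    -- Fast on PG 11+: no table rewrite for a constant default
    ALTER TABLE orders ADD COLUMN region text NOT NULL DEFAULT 'unknown';

    -- Fast on older versions too: nullable column, no default
    ALTER TABLE orders ADD COLUMN notes text;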
> MongoDB is a document database.

Thank you. A one-sentence clarification of the issues in Mongo vs RDBMS.
It's a bit more nuanced than that. PostgreSQL supports JSON/JSONB data types, so if you need to, you can use it in a way similar to MongoDB, with less structure (and the caveats that come with it), but with transactions and all the other goodies working out of the box, and with the ability to use standard RDBMS features if needed.
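A small sketch of the point (tables hypothetical): document-style writes simply take part in an ordinary multi-statement transaction.

    BEGIN;

    -- Schemaless write, Mongo-style
    INSERT INTO events (payload)
    VALUES ('{"type": "signup", "plan": "free"}'::jsonb);

    -- Relational write in the same transaction
    UPDATE accounts SET signup_count = signup_count + 1 WHERE id = 42;

    COMMIT;  -- both changes land atomically, or neither does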
> I like MongoDB for the ease of use when rapidly prototyping things.

I think this is similar to when some people say writing tests speeds them up through TDD, while others don't write any tests at all. What I take away is that you can do it either way, as long as you're bought in and are familiar with the process.

For me it's no trouble at all to define my schema in advance. And Postgres provides me with the JSON-column escape hatch. Combined with the massive benefit of the data-validity guarantees later in the app's lifecycle, I pick Postgres every time.
"It is our responsibility to chose the right tool for the job."That perspective doesn't work well for database products, in my opinion. There is a huge pressure for databases to evolve with your business and applications and to adapt to whatever your throw at it.Swapping out a database product is less like changing tools and more like changing a foundation. You can't do it every time a new problem arises.That's not to say you can't use a few different products if that makes sense. But that has its complications.
> As part of that stack, we chose MongoDB + Mongoose ORM because the combination presented least overhead and allowed us to ship quality features quickly. As Sir Tony Hoare states, "premature optimization is the root of all evil," and there was certainly no need for further optimization at the time.

Looks like the tool did a great job while they were getting started. Mongo is very easy to hack something together with; later on, it looks like they just needed to migrate to something more stable.
Yeah... in the early days of Mongo, it was pitched as a replacement for relational because a) it had faster writes without those pesky joins and constraints, and b) it's supposedly too hard to model data, so chances are it's just unstructured. The onslaught of "relational is dead" and "joins are annoying" was tremendous.
Plus there's a good chance they didn't know exactly what they needed to build when starting out.
Postgres and MySQL are easy to hack something together with, too.
That's assuming you're experienced with them. If you know next to nothing about databases, Mongo is a much better choice. In this case it looks like when they got started, the team just didn't have all that much experience with databases.
Nonsense. Choosing the correct database isn't "premature optimization", it's the bare minimum of being a competent programmer. Why do we never have time to do it right, but we always have time to do it over?
It's generally faster to get started with Mongo.

If you're talking about a startup, you might not even get to the let's-do-it-over stage. You might have six months to just hack something out so you can raise more funding, and in that case Mongo is a great choice. The needs of your business might also just evolve over time.
To be fair, it burst onto the scene back in the "all you need is NoSQL... RDBMS is dead" era. So it became one of those software dogma things, and I remember having "debates" right here on HN, including about the necessity of transactions.

So it's kind of ironic to see people now chastising those who fell prey to the dogma. Full circle.
> MongoDB is a document database.

So can PostgreSQL be, with JSON types and queries.
The irony of it all is that MongoDB actually matured, and you no longer have to jump through hoops to have, say, transactions.

I am ALL for Postgres; I even wrote a post about the importance of being fluent in relational databases:

https://renegadeotter.com/2023/11/12/your-database-skills-ar...

What grinds my gears is the whiplash caused by this frantic stampede from one hype to the next. I am just waiting for the AI hype to blow over and for everyone to rediscover Hadoop, or something.
Postgres is not hype, though. There are very few legitimate use cases for not using an RDBMS as a main data store, and Postgres happens to be the most popular one nowadays, for good reasons.
In my opinion, the use case is when your data no longer fits on a single, potentially very large, server. Automatic distribution, replication, and rebalancing of data is a tricky problem, and something like Cassandra handles it very well. Or CockroachDB or similar, if you still need relational capabilities.
I've nothing against Postgres but MySQL is a wonderful option that is likely still in far higher and heavier use, when considering open source databases.
In my mind, MySQL no longer exists. It’s a weird Oracle thing that I wouldn’t touch with a lineman’s pole. MariaDB is what happened after MySQL disappeared.
I think it gained popularity over a decade ago, when it was faster because it was unsafe. In the end, Postgres is the better one. With MySQL, the defaults are bad.
I wrote this in another comment, but it's relevant here I think:

"I do scaling and performance work, mostly with Rails apps, but a significant amount of the work is database-level and not language specific. I've used both Postgres and MySQL (and a few other databases) going back to 2000. The best thing I can hear from a company when I start is 'We use Postgres'. If they're using Postgres then I know there's likely a far smoother path to performance than with MySQL. It has better tooling, better features, better metadata."

Right now I would not choose MySQL over Postgres at all, ever. I can't think of a single way it is materially better.
In my experience MySQL is more popular. I've worked at several companies that used MySQL, none that used Postgres. Kids at universities still use XAMPP to learn databases.
In my experience MySQL is still commonly used in the PHP world, but everything else is mostly Postgres
The number one reason being that it’s free. In a purely technical comparison it is not at the top of the list.
What's top of the list technically?
Oracle
Oracle is a giant mess, is the sense I got from trying to work with it in the past.

While I am loath to recommend any Microsoft product, SQL Server is at least coherent, and has actual transactions like Oracle (and not just "checkpoints").
Aren't MSSQL and Oracle just doing redo logs, like MySQL? What do you consider not actual transactions?

PostgreSQL keeps dead tuples in place until vacuumed, which seems optimized for rollbacks. But that isn't so bad if you're inserting much more than deleting or updating.
Nestable transactions = "real transactions" to me, I guess. I almost don't see the point of non-nestable transactions.
SQL standardizes SAVEPOINT for such nesting. Is that not sufficient? FWIW Postgres supports it.
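For reference, a minimal example (the table is hypothetical): ROLLBACK TO a savepoint gives you partial rollback inside one transaction, which is most of what nesting buys you.

    BEGIN;
    INSERT INTO audit_log (msg) VALUES ('outer work');

    SAVEPOINT inner_work;
    INSERT INTO audit_log (msg) VALUES ('risky inner work');
    ROLLBACK TO SAVEPOINT inner_work;  -- undoes only the inner insert

    COMMIT;  -- the outer insert still commits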
Anybody that has ever used Oracle knows how nice it is. But very few people have that experience because their company isn't spending millions of dollars on the database server.
Yes, that was my point. I get that working with Oracle and their army of lawyers is not cheap or pleasant, but the DB is still excellent.
> There's very few legitimate use cases for not using a RDBMS as a main data store

I don't think that's true: all the old cases that were legitimate before are still legitimate. The price of both persistent storage and memory has simply come way down, to the point where many computational workloads are viable under vertically scalable databases again. I suspect the pendulum will swing the other way yet again one day (and of course there is a rich ecosystem of ways to query horizontally, if not transactionally, with SQL, which will likely temper this swing even further).
The price of persistent storage and memory was the same for Mongo as it was for Postgres back when the NoSQL movement happened. Mongo wasn't made of magic scaling fairies; it still needed resources. As soon as you cranked up the safety of Mongo to a nearly acceptable level, its performance fell through the floor. The only reason people thought it performed amazingly was Mongo's rather deceptive marketing.

I use the term "nearly acceptable" because for a long time Mongo's data consistency was absolutely crap (https://jepsen.io/analyses).

Personally, I think people used Mongo because it was new, shiny, and exciting, and it promised performance (though it didn't really deliver for most people).
I am certainly not going to defend Mongo as ever having been a good choice; I was referring to other "NoSQL" databases (namely, the large number of highly scalable key-value stores).
But you wouldn't use a key-value store as your main store. It's a specialized kind of store. Of course Redis and friends have their place. Just not as a main store.
In addition to changes in ephemeral and persistent memory, the other big difference between now and NoSQL's heyday is improvements in distributed relational transactional databases (NewSQL). We haven't exactly squared the circle on CAP, but we've certainly bent it.
Postgres is no hype; it was already in the plateau of productivity when MongoDB came onto the web-scale scene. Hadoop, just like Mongo, had its hyped time in the sun, but RDBMSs are far more advanced and versatile than either of them ever was.
I am not calling Postgres hype - it should never have been NOT hype. It's a reasonable default for most problems.

Now all I read about is how Postgres is awesome, as if it's this great new thing. I guess that makes sense, as the new generation of engineers is rediscovering stable, reliable, lean technologies after a decade of excesses with "exotic" tech.

For grey beards, it's all very odd. Like, "where have you all been?"
I remember Robert C. Martin ("Uncle Bob") going on about how NoSQL will replace literally all SQL, and that there is literally not a single use case for relational data and SQL. I wonder if he ever walked that back.

Now, my opinion of Martin in general is not especially high, and he's a bit of a controversial figure, but it wasn't just the kids. And Martin is also all about reliable software, so that makes it even more surprising.
I also remember Michael Stonebraker basically saying something similar, but being ignored. Along the lines of "I will see you all map-reduce and NoSQL people in 20 years, after you have all reinvented SQL."

Not an actual quote, but a factual statement.
Because he realized you can parse SQL and generate the equivalent of a map reduce job on the backend to execute it.
He's always been fundamentally a salesman. Hopping on the hype trains lets him sell more.
He's always been fundamentally a zealot who sees most narratives in black and white terms. It's an extremely unfortunate characteristic in a software engineer.

It's the same reason he's so pro Trump, I think.
I didn't even know about that; I checked his Twitter, and the first thing is a reply to the Baltimore bridge collapse:

> Doors and wheels fall off airplanes. Hundreds of illegal immigrants assault and overrun border guards in Texas. We can't even muster support for our allies in the security council. But at least our pronouns are in order.

https://twitter.com/unclebobmartin/status/177262869295654553...

You really need to be a special sort of individual to go from a collapsed bridge to ranting about pronouns with no prompting or any prior mention of that. What a complete spanner.
That happened to esr, too, although he derailed right after 9/11, and it's been downhill ever since.
From the other side, people like me see your odd dismissive behaviour in the same light. This bridge and hundreds of other things going wrong throughout the world at the moment are all symptoms of some sort of rotten problem, and I honestly don't know why you guys are so adamant about dismissing it. Almost as if you want to see it all come crashing down.

Pure evil, as Trump would call it.
Yes, a single accident of the type that's been happening since bridges were invented, pronouns, and unconditional support for an out-of-control government in Israel are all deeply related...

Or he's a hyper-politicized idiot, completely consumed and obsessed by some topics, who needs to talk about them all the fucking time at the drop of a hat, whether appropriate or not. It has nothing to do with "sides"; it has to do with the forceful injection of completely unrelated matters.
> of some sort of rotten problem

Agree. Out-of-control neo-feudal lords taking all the slack out of every system they can get their tendrils into in the name of efficiency, thus endangering us all in order to extract as much profit as possible out of our entire economic system.

The "other side"'s solution tends to be to further deregulate industry and the economy, and to empower and lower the tax burden of these neo-feudal lords, all while pointing at gays, trans folk, and immigrants as if they are the reason people can't afford rent and bridges are falling down.
Another man opens the morning newspaper, and after scanning the headline, declares: "Honey, this planet is going to hell".
Is the rate of these incidents increasing, decreasing, or about the same?
Obviously you can say I'm biased but I think they're increasing.
> From the other side, people like me see your odd dismissive behaviour in the same light. This bridge and hundreds of other things going wrong throughout the world at the moment are all symptoms of some sort of rotten problem

Just one problem, huh? This illustrates what I meant about black and white thinking.
Not sure what angle you're going for. If you want to say that there is more than one rotten problem at the core of society, go for it. I'd be the first to agree with you.

This is not black and white reasoning. Each problem can be on a continuous scale, from "slightly bothersome" to "there is an absolutely horrible evil lurking here".
The reason he says he voted for Trump is taxes. But that seems like a pretty flimsy reason. Have dems in the past couple decades ever reverted any republican-instated tax cuts for the wealthy/corporations, or even made any moves for tax bumps? In my experience dems are just republicans, except they wave a rainbow flag every now and then.
Unpopular opinion: Robert C. Martin has caused more harm than good. People take his ideas and blindly follow them no matter the cost. I'm looking at you, Clean Architecture.
Well yeah, solving problems and thinking critically and learning are hard and boring. "Uncle Bob" gives you handy aphorisms you can spam on your next pull request and then feel like you've done something.
There's definitely a missing piece he hasn't explained well enough. I've seen the best code in the world that was the best because it was clean architecture, and I've also seen some truly atrocious code in the style of "clean architecture".

To me, the difference was in recognizing that the architecture advice is implicit (it's not actually written down in the code; rather, the code exemplifies it) while the business domain is explicitly modeled. When developers name everything "use case" and "interactor" and go crazy with packages and folder structures bearing architectural names, it's a recipe for terribleness.
That's as ridiculous as saying "integer division will be useless in 20 years". Sometimes relational databases really are the optimal solution to certain problems. I really wish CS had better authority figures, although we shouldn't need them in the first place.
His words: "I am also self-taught. I never got a degree of any kind. I entered the industry at the tender age of 17 and never looked back. Frankly, school isn’t all it’s cracked up to be — especially in the software field."https://twitter.com/unclebobmartin/status/107212575854868889...Which means his opinions are more often than not uninformed opinions and should be taken with a grain of salt.
When you turn 23 in a couple years, you’ll look back at college and realize you weren’t at the apex of all knowledge back then.
I think you overestimate college education a little too much. Having a degree doesn't automatically make your opinions better than those of people who didn't have the chance to get one. You wish it did, but it doesn't.
My remark is not about the quality of college education. It's about his mindset.

You don't need a college education to read research articles, books, and journals. But you certainly need an open mind and some discipline to not try to reinvent the wheel all the time. You need to learn about the state of the art in your field. Some ideas and algorithms are proven and battle-tested, and people should learn about them before trying to create something new and claiming it is much better than the old thing. This is very noticeable in some fields, like cryptography. His modus operandi certainly seems to be on the other extreme: everything old is bad, and the new thing around the corner is going to change everything. Which causes a lot of people who believe him to collectively waste millions of hours of their time.

About your comments on the degree... it really depends on how you got it. For most people, I would agree with you. A small group of people in academia are insanely smart and are focused on research.
Postgres as we know it has only existed for about a decade, since the post-9.x era in 2010-2014 when many of its lauded features were added. Replication, heap-only tuples, extensions, foreign data wrappers, JSON, leveraging multiple indexes, parallel index scans, reindex concurrently, JIT compilation, declarative partitioning, stored procedures, and major WAL improvements are all "recent".I love Postgres and it's been my go-to default since 2014 and you can pry it from my cold dead hands, but it's more contemporary with Perl 6 than Perl 5 if we're talking grey beards.
Extensions and stored procedures are not; they've been around for longer.I do not actually use any of the others you listed (well, occasionally bitmap index scans, but it's not a killer). The main difference between 8.x and 15.x for me is the small stream of optimizer features and tweaks (e.g. at some point, it started being able to reorder outer joins).
Postgres 8 supported functions capable of returning tuples (RETURN QUERY), but lacked transaction control and other features that CREATE PROCEDURE added in Postgres 11 (2018).Extensions in Postgres 8 (and earlier) were also half-baked compared to 9.1 (2011), which added CREATE EXTENSION and other quality-of-life improvements including standardization.The latest complete implementations are what most people have in mind when talking about them.
Sure, there are improvements, but it's not like Postgres was a radically different beast back then.Would I pick Postgres 15 over Postgres 8? Yes, any day of the week. But I would also readily pick Postgres 8 over most of today's databases (including e.g. the latest MySQL and MongoDB versions), for most projects, if that was the choice I had.
I would not have risked my business (or budget) on Postgres 8, even with Slony-I. My appreciation for its well-written code and thoughtful architecture simply did not outweigh what it lacked for a reliable operations experience.MySQL foot guns were mostly under control by 2010, and it was operationally efficient to manage a cluster of MySQL servers in diverse configurations with mostly-known boundaries to what could go wrong.The Postgres foundations are the same today, but what you can build on top is very different today than it was pre-9.x.
I operated MySQL clusters in the late 2000s. I am still surprised how many hoops you have to jump through to set up replication in PostgreSQL compared to what you could do, out of the box, with MySQL 15 years ago.
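To be fair, it has improved somewhat: since PG 10, logical replication at least is plain SQL, though streaming replication still means config files and a base backup. A sketch (the connection string and names here are made up):

    -- On the primary
    CREATE PUBLICATION app_pub FOR ALL TABLES;

    -- On the replica (the schema must already exist there)
    CREATE SUBSCRIPTION app_sub
        CONNECTION 'host=primary.example.com dbname=app user=repl'
        PUBLICATION app_pub;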
> I would not have risked my business (or budget) on Postgres 8, even with Slony-I.

This assumes you need replication, though. You can get pretty far without it, especially given how much hardware you can get in a single box now.

> MySQL foot guns were mostly under control by 2010

Eh. :-) I think you would have to specify which footguns in particular.
If you're doing tens of millions in revenue and the RDBMS is critical to your operations, I'm a firm believer that you need replication. Keep in mind this was the era of spinning rust, and a day-old backup could take hours to restore even with a SAN/DAS. A replica was for more than just distributing load; it was for resilience, to keep the business healthy and avoid a SPOF.

Risks to the business are more important to me than my enthusiasm for a technology or product.
Replication can be for HA, not just for scale. It all depends on your business requirements.

Also, replication can be good for other operational reasons, such as zero-downtime major version upgrades. Again, that depends on the business needs/expectations.
Excited, vocal users make the most noise. Somehow the entire MongoDB era passed me by, and I never used it once. I used MySQL in the 2000s and switched to PostgreSQL in the 2010s.
They used it at my first job, a startup. They had an ORM which made everything much slower, and the most common operation they did was adding items to lists, which causes the whole document to be resized and requires it to be moved. It could have been done with one SQL server, but we needed four really heavy MongoDB servers instead. And of course our code was buggy, so the schema was all over the place.
You haven't missed much, TBH. I think the only case where you'd use it in new projects is DynamoDB, if you work with AWS.
Dynamo is not Mongo?
No, DocumentDB is Mongo. What I understand the parent referred to is the NoSQL hype era, not just Mongo, which was probably the most popular option at that time.
My company considered it, but went with Cassandra instead.
This is what happens when you hire 20-somethings who don't know how computers actually work to build your entire business. You either learn about relational databases in school, or through experience. Relational algebra has been around for half a century, at least. If someone doesn't know about its utility, it's a failure of the education system, or a failure of corporate incentives.
> For grey beards, it's all very odd. Like, "where have you all been?"

To be frank, PostgreSQL has evolved a lot since the late nineties. There was a time when people preferred MySQL over it, as it seemed to work faster, certain things were easier, and so on.
I used Postgres in the 90s. The biggest complaint was that Postgres's default config needed a change so you could connect to it remotely, as the default, wisely at the time, held that allowing connections from any IP before you had changed any security setting whatsoever was a real problem.

The MySQL performance advantages were large, with the small caveat that it was fast as long as you were using MyISAM, which achieved its speed by throwing away many safety features, like working transactions and foreign key constraints. Unsurprisingly, when it eventually got an engine that didn't throw caution to the wind, the performance advantage went away.

I spent a lot of time in the early 2000s showing people that many of the issues they were facing with their e-commerce websites were ultimately down to the MySQL engine they had picked. But it's faster! Yes it is, but now your application code has to deal with all the things the engine has offloaded onto it.

It's not that Postgres's changes were key to increasing market share, but that as MySQL matured, it had no option but to lose its 'advantages'.
That mirrors my experience. MySQL was faster for use cases where data loss was acceptable. Will anyone die if you lose some web log analysis stats? Probably not. You wouldn’t want to use it for anything where guaranteed consistency was critical, like anything involving money, but it was an alright replacement for things like an in-memory data store.I say “was”. I hear it’s vastly better now.
If you want a fast DB for web log analysis, you shouldn't look at MySQL or PostgreSQL anyway. ClickHouse will run circles around both of them for the use case of web log analysis.
That was an example of the sort of thing that someone might reasonably have used MySQL+MyISAM for in the past. I wouldn’t use that combination at all today.
There was a time PostgreSQL did not run under Windows, and that is IMO what gave MySQL the market-share edge. Without those windowless years, PostgreSQL would be the most used RDBMS right now.
MySQL was more popular for a while, but from a technical standpoint Postgres was always ahead.
PostgreSQL didn't exist in the 1990s. It was called Ingres. Postgres started as a personal project and was first released in 1999, but was unusable. Around 2004 the project started getting popular, and hipsters started screaming that MySQL didn't have stored procedures and wasn't fully ACID. ACID became a buzzword for PG fanboys, even though data corruption and memory leaks plagued the system for its first decade. It became stable around 2008, as long as you didn't mind restarting your database every few days. PostgreSQL didn't really become a viable option until around 2010.
...and, continuing the cycle next season, devs will (re)discover NoSQL and document DBMSs. I do wonder if this phenomenon of trendy tech is somehow integral to advancing the state of the industry, the SWE equivalent of early adopters. "Rediscovering the old" is a side effect, as old tech is used and reinterpreted in new ways. Kind of neat, really.
There's an entire industry of people selling knowledge of the "next great thing/the one true way", but they need a new thing every few years as people learn whatever the old thing was.
MongoDB is more popular than it ever has been. You can look at their quarterly filings and learn this. The NoSQL space is larger than it has ever been and is the fastest growing segment in the database world.
> The irony of it all is that MongoDB actually matured and you do not have to jump through hoops to have, say, transactions.

Are transactions still limited to a single document in Mongo?
Generally with these migrations, the biggest problem in my experience is that all the operational learnings are not transferable. You often want to build expertise and achieve operational excellence with the tech that you start out with.
Back in the day we used to say "you choose your database once". You can swap in/out a lot of things, but your data store is really the core. It is shockingly hard to change your storage engine once you are serving customers, so you really want to get that right.
Exactly: languages and frameworks come and go, but get the data layer right and you're set.