There's also the shelve[0] module, which allows storing any pickleable object in a persistent key-value store, not just strings/bytes. I've found it very handy for caching while developing scripts that query remote resources, without having to worry about serialization.

[0] https://docs.python.org/3.10/library/shelve.html

Obligatory pickle note: one should be aware of pickle security implications and should not open a "Shelf" provided by untrusted sources, or rather should treat opening a shelf (or any pickle deserialization operation for that matter) as running an arbitrary Python script (which cannot be read).
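
A minimal sketch of the caching pattern described above, using only the standard library. The names `cached_fetch` and `slow_lookup` and the key `"users/42"` are made up for illustration; `slow_lookup` stands in for whatever remote query the script really makes.

```python
import os
import shelve
import tempfile


def cached_fetch(cache_path, key, fetch):
    """Return the cached value for key, calling fetch(key) only on a miss."""
    with shelve.open(cache_path) as shelf:
        if key not in shelf:
            # Any pickleable object can be stored; shelve keys must be str.
            shelf[key] = fetch(key)
        return shelf[key]


calls = []


def slow_lookup(key):
    # Hypothetical stand-in for a slow remote query.
    calls.append(key)
    return {"key": key, "items": [1, 2, 3]}


# Keep the shelf in a fresh temporary directory for this demo.
cache_path = os.path.join(tempfile.mkdtemp(), "dev_cache")
first = cached_fetch(cache_path, "users/42", slow_lookup)
second = cached_fetch(cache_path, "users/42", slow_lookup)  # served from disk
```

The second call reads the value back from the shelf file instead of calling `slow_lookup` again. The security note above still applies: only open shelf files you created yourself.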



There's even an object database built around pickle, ZODB. It even has a networked implementation with replication / failover. Used to be part of Zope, originally written in the 1998 time frame or so - GvR actually committed a bunch of stuff to it.


ZODB is awesome and overlooked, IMHO. I'm biased I guess because I was involved in making Durus which is inspired by ZODB. The ZODB model is not appropriate for all applications (optimistic concurrency control) but for the applications it works for, it's great. Very easy to develop with (no relational-to-OO mismatch) and performance can be great if you design your model carefully. The client caching of data is great. It is a bit like memcache but the data is already there as objects in RAM. The database server will invalidate the cache for you, no manual invalidation needed.


I developed a web application in 2004 using Quixote and Durus. I wonder how many developers outside of MEMS Exchange ever used both of those packages. Somehow I had not yet encountered a proper ORM (I didn't discover SQLObject until later, and Django wasn't out yet), so I liked Durus at the time. That web application is still running, and in the years since then, I've had to handle several escalated customer support cases, and I often wished I had chosen a relational database so I could easily do ad-hoc queries. So Durus probably wasn't the best choice for my application, but that's not your fault. And one thing I liked about Durus was that the implementation was simple enough that I felt I could really understand how it worked. So thanks for that.


My first real software development job was doing Plone development, which uses ZODB as its persistence layer. Good times!


Wow, blast from the past. I investigated Plone as a place to store technical documents back in 2005 at a bank. Makes me want to go check out the project again.


Another problem with pickles in any sort of living code base comes when one modifies the type itself: renaming, refactoring, whatever. Pickling objects (and nested objects) that aren't explicitly meant for data storage/retention/transmission leads to headaches.

It's best to use another dedicated type or library specifically for this task.
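
One way to read "a dedicated type" is a small record with an explicit, versioned on-disk format, so that renaming a field is handled in one place. This is only a sketch: the `User` class, its fields, and the version numbers are hypothetical, and JSON is an assumption (the comment does not name a format).

```python
import json
from dataclasses import dataclass

SCHEMA_VERSION = 2  # hypothetical: version 1 stored the field as "name"


@dataclass
class User:
    full_name: str
    email: str = ""


def user_to_json(user):
    """Serialize with an explicit schema version instead of pickling the object."""
    return json.dumps(
        {"version": SCHEMA_VERSION, "full_name": user.full_name, "email": user.email}
    )


def user_from_json(text):
    """Load any known schema version, migrating old records on the way in."""
    record = json.loads(text)
    version = record.get("version", 1)  # records without a version are treated as v1
    if version == 1:
        return User(full_name=record["name"])
    if version == 2:
        return User(full_name=record["full_name"], email=record.get("email", ""))
    raise ValueError(f"unsupported schema version: {version}")


old_record = '{"name": "Ada"}'  # written before the rename
migrated = user_from_json(old_record)
round_tripped = user_from_json(user_to_json(User("Grace", "grace@example.com")))
```

The stored data no longer depends on the class's module path or attribute names, so refactoring `User` only means updating the two conversion functions.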


I once wrote a locking wrapper around the shelve module so I could use it as a thread- and multiprocess-safe key-value cache (including a wrapper around the requests module's `get()` to transparently cache/validate HTTP resources according to headers):

https://github.com/cristoper/shelfcache

It works despite some cross-platform issues (flock and macOS's version of gdbm interacting to create a deadlock), but if I were to do it again I would just use sqlite (which Python's standard library has an interface for).
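
For comparison, a key-value cache on top of the standard library's `sqlite3` module can be quite small. This is a rough sketch, not what shelfcache does: the `SqliteCache` class and its table layout are invented here, and values are stored as JSON rather than pickle, which is my assumption.

```python
import json
import sqlite3


class SqliteCache:
    """Tiny key-value cache; values must be JSON-serializable."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        self.conn.commit()

    def set(self, key, value):
        # "with self.conn" commits on success and rolls back on an exception.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
                (key, json.dumps(value)),
            )

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT value FROM cache WHERE key = ?", (key,)
        ).fetchone()
        return default if row is None else json.loads(row[0])


cache = SqliteCache()
cache.set("https://example.com/a", {"status": 200, "etag": "abc"})
cache.set("https://example.com/a", {"status": 200, "etag": "def"})  # overwrites
hit = cache.get("https://example.com/a")
miss = cache.get("https://example.com/missing", default="not cached")
```

With a file path instead of `:memory:`, SQLite does its own file locking, so each process can open its own connection. A `sqlite3` connection should not be shared across threads by default, so a threaded program would want one connection per thread.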


> but if I were to do it again I would just use sqlite

Yeah, I tried to use shelve for some very simple stuff because it seemed like a great fit, but ultimately found that I had a much better time with tortoise-orm on top of sqlite.

If you need any kind of real feature, just use sqlite.


I created a password management utility using shelve and the rsa library for a company that is still using it after 15 years. Though today the company is transitioning to Hashicorp Vault, the utility will still serve as an interface to Vault.


I like to use `joblib.Memory` for my caching because it detects changes in a function's code; when the function changes, it reruns it instead of loading from the cache and then overwrites the old result.


Can you safely "pickle" Python objects across different architectures and Python versions (assuming we forget about Python 2)?


Pickle could in theory be architecture dependent, since __getstate__ and __setstate__ are user-provided hooks. But you would have to try to do that on purpose.

And you don't even have to forget about Python 2! If you use format version 2, you can exchange pickled objects with every version from Python 2.3 onward, and all pickle formats are promised to be backwards compatible. If you only care about Python 3, you can use version 3 and it will work for all Python 3.0+.

https://docs.python.org/3/library/pickle.html#data-stream-fo...
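
To make the version choice concrete, here is a short sketch of pinning the protocol with the `protocol` argument to `pickle.dumps`. The sample `data` dict is made up; it uses only built-in types.

```python
import pickle

data = {"name": "example", "values": [1, 2, 3], "nested": {"ok": True}}

# Pin the format to protocol 2 instead of the interpreter's default.
blob_v2 = pickle.dumps(data, protocol=2)

# Whatever this interpreter picks by default; it varies between Python releases.
blob_default = pickle.dumps(data)

# Loading needs no protocol argument: the version is recorded in the stream.
restored_v2 = pickle.loads(blob_v2)
restored_default = pickle.loads(blob_default)

# Streams written with protocol 2 or later start with the PROTO opcode (0x80)
# followed by the protocol number.
header = blob_v2[:2]
```

`pickle.DEFAULT_PROTOCOL` and `pickle.HIGHEST_PROTOCOL` report what the running interpreter uses by default and the newest protocol it supports.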

The reason against using pickle hasn't changed, though: if you wouldn't exec() it, don't unpickle it. If you're going to send it over the network, use a MAC, use a MAC, use a MAC. Seriously, it's built in: the hmac module.
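
A bare-bones sketch of that advice with the standard `hmac` module: prepend a tag when sending, and refuse to unpickle anything whose tag does not verify. `SECRET_KEY`, `dumps_signed`, and `loads_signed` are hypothetical names, and SHA-256 is my choice of digest.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; shared by both ends
TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def dumps_signed(obj):
    """Pickle obj and prepend an HMAC-SHA256 tag over the pickled bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload


def loads_signed(message):
    """Verify the tag first; unpickle only if it matches."""
    tag, payload = message[:TAG_LEN], message[TAG_LEN:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC check failed; refusing to unpickle")
    return pickle.loads(payload)


message = dumps_signed({"user": "alice", "roles": ["admin"]})
received = loads_signed(message)
```

The MAC only proves that the bytes came from someone holding the key. Anyone with the key can still produce a malicious pickle.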


We had a program that was sending pickled session state as a cookie. We solved that by packing the cookie as a random string, a timestamp, the object, and a MAC. We validated the MAC, then checked the timestamp, and finally unpickled the object. It still bothers me that we are unpickling data passed by the client, but I ran out of arguments against doing it.
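
The cookie layout described above could look roughly like this. It is a guess at the shape, not the original program: the field sizes, `MAX_AGE_SECONDS`, `SECRET_KEY`, and the function names are all assumptions. The order of checks follows the comment: MAC first, then timestamp, then unpickle.

```python
import hashlib
import hmac
import os
import pickle
import struct
import time

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side secret
MAX_AGE_SECONDS = 3600  # assumed session lifetime
NONCE_LEN, TS_LEN, MAC_LEN = 16, 8, 32


def make_cookie(obj, now=None):
    """Pack random string + timestamp + pickled object, then append a MAC."""
    now = int(time.time()) if now is None else int(now)
    body = os.urandom(NONCE_LEN) + struct.pack(">Q", now) + pickle.dumps(obj)
    mac = hmac.new(SECRET_KEY, body, hashlib.sha256).digest()
    return body + mac


def read_cookie(cookie, now=None):
    """Validate the MAC, then the timestamp, and only then unpickle."""
    now = int(time.time()) if now is None else int(now)
    if len(cookie) < NONCE_LEN + TS_LEN + MAC_LEN:
        raise ValueError("cookie too short")
    body, mac = cookie[:-MAC_LEN], cookie[-MAC_LEN:]
    expected = hmac.new(SECRET_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("bad MAC")
    (issued,) = struct.unpack(">Q", body[NONCE_LEN:NONCE_LEN + TS_LEN])
    if now - issued > MAX_AGE_SECONDS:
        raise ValueError("cookie expired")
    return pickle.loads(body[NONCE_LEN + TS_LEN:])


cookie = make_cookie({"user_id": 7, "cart": ["book"]}, now=1_000_000)
session = read_cookie(cookie, now=1_000_100)
```

Because the timestamp sits inside the MAC'd body, a client cannot extend a session by editing it. In practice the raw bytes would also need a text-safe encoding such as base64 before going into a cookie header.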


The pickle format is versioned, so you might be able to do it across versions, but I suspect the version has changed for security reasons over time?


One big problem with pickle is that deserialization might fail if the serialized object's class has evolved since it was pickled.



