Boogie query sets
Django’s ORM favors the active record pattern (accessing one row at a time, wrapped in a Python object). We believe it is often a poor abstraction for working with databases and can lead to inefficient usage patterns and a poor architecture. Boogie implements a few extensions to Django’s default query set and manager APIs in order to favor more data-driven approaches.
Fancy slicing API
Boogie managers and querysets implement a fancy indexing interface inspired by Numpy and Pandas. In Boogie, we want to see the database as a 2D table of scalars instead of a collection of complex objects, as is implied by the ORM.
By doing so, we lose some encapsulation. On the other hand, it avoids a host of potential problems such as race conditions, ineffective usage patterns (especially the N + 1 problem), and coupling of business logic with storage, and it often spares us some unnecessarily verbose APIs.
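In order to illustrate fancy indexing in Boogie, let us start by constructing a small group of elements. The examples use a User model with a name and an age field (it is declared in full in the section on overriding query sets and managers).
Now we create a few users, saving them in the database: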
john = User.objects.create(name='John Lennon', age=25, pk=1)
paul = User.objects.create(name='Paul McCartney', age=26, pk=2)
george = User.objects.create(name='George Harrison', age=22, pk=3)
ringo = User.objects.create(name='Ringo Star', age=29, pk=4)
If you are familiar with Pandas, Boogie’s API is highly inspired by the .loc attribute of a Pandas data frame (which in turn is similar to fancy indexing in 2D Numpy arrays). The metaphor is that a Django manager or queryset represents a 2D table of values: each row corresponds to an object and each column corresponds to a field. Fancy indexing allows us to select parts of this table in ways that avoid instantiating lots of different objects.
Let us start with the simple bits. Each cell is indexed by a row and a column. We can fetch the content of a single cell like so:
>>> pk = 1
>>> User.objects[pk, 'name']
'John Lennon'
Of course, we can also use an assignment statement to save or modify values in the database:
>>> users = User.objects # A simple trick to save a few key strokes
>>> users[pk, 'name'] = 'John Winston Lennon'
This prevents an unnecessary instantiation of a User object and the overhead of calling its .save() method to hit the database. Notice that this operation is carried out exclusively at the database level, and any custom logic implemented in the .save() method will not be executed. In fact, we strongly discourage putting complex logic in .save(), or putting business logic in the model at all.
Boogie only activates when users use 2D indexing. This is a deliberate decision to preserve compatibility with the slicing syntax of Django query sets. Thus, in order to fetch a single row from the table we have to use the notation:
>>> users[pk, :]
<User: John Winston Lennon>
2D indices are interpreted as [rows (by pk), columns (by name)]. This is different from Django’s semantics for queryset indices, which are interpreted as the positions of the items in a set of objects. Thus users.all()[0] returns the first element of users.all(), while users[0, :] returns the element with pk=0.
The scalar 2D access is very limited, and we often want to access a group of fields of a specific row all at once. Fancy indexing comes to the rescue:
>>> users[pk, ['name', 'age']]
Row('John Winston Lennon', 25)
Assignment is also supported:
>>> users[pk, ['name', 'age']] = 'John Lennon', 27
In all those examples, we are interested in only a single object/row in the database. Boogie also accepts selectors for multiple rows. Let us extract a single column from the database: for that, just use the standard Python syntax for selecting “all elements” in the row index:
>>> users[:, 'name']
<QuerySet ['John Lennon', 'Paul McCartney', 'George Harrison', 'Ringo Star']>
This call is basically an alias to Django’s users.values_list('name', flat=True). If you are interested in more than one column, just use:
>>> users[:, ['name', 'age']] # doctest: +ELLIPSIS
<QuerySet [Row('John Lennon', 27), Row('Paul McCartney', 26), ...]>
This method returns a sequence of lists representing the selected fields from each object. In fact, each element behaves as a mutable namedtuple and data can be accessed either by position or by attribute name.
The first index may also be a list. If that is the case, it is interpreted as a sequence of primary keys that selects the desired set of rows:
>>> users[[1, 2], :]
<QuerySet [<User: John Lennon>, <User: Paul McCartney>]>
2D indexing is also accepted in many different combinations.
>>> users[[1, 2, 3], 'age']
<QuerySet [27, 26, 22]>
>>> users[[1, 3], ['age', 'name']]
<QuerySet [Row(27, 'John Lennon'), Row(22, 'George Harrison')]>
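The selector combinations shown so far can be summarized with a small in-memory sketch. This is not Boogie’s implementation: ToyTable is a hypothetical class that stores rows in a plain dictionary keyed by primary key, and it only illustrates how a 2D index could be told apart from Django’s positional index inside __getitem__.

```python
from collections import namedtuple


class ToyTable:
    """Hypothetical in-memory stand-in for a manager (not part of Boogie)."""

    def __init__(self, rows):
        self.rows = dict(rows)  # {pk: {"name": ..., "age": ...}}

    def __getitem__(self, index):
        # A non-tuple index keeps positional semantics, like queryset[0].
        if not isinstance(index, tuple):
            return list(self.rows.values())[index]
        row_sel, col_sel = index
        # Row selector: ":" selects every row, a list selects several
        # primary keys, anything else is treated as a single primary key.
        if isinstance(row_sel, slice):
            pks, many = list(self.rows), True
        elif isinstance(row_sel, list):
            pks, many = row_sel, True
        else:
            pks, many = [row_sel], False
        result = [self._columns(self.rows[pk], col_sel) for pk in pks]
        return result if many else result[0]

    def __setitem__(self, index, value):
        # Only the scalar form table[pk, "field"] = value is sketched here.
        pk, col = index
        self.rows[pk][col] = value

    @staticmethod
    def _columns(row, col_sel):
        if isinstance(col_sel, slice):
            return row  # ":" returns the whole row
        if isinstance(col_sel, list):
            # An immutable namedtuple stands in for Boogie's Row type.
            Row = namedtuple("Row", col_sel)
            return Row(*(row[c] for c in col_sel))
        return row[col_sel]  # a single cell
```

With such a table, table[0] is positional while table[0, :] looks up the row whose primary key is 0, which mirrors the distinction described above.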
Finally, the first index can also be a queryset or a query expression:
>>> users[users.filter(age__lt=25), 'name']
<QuerySet ['George Harrison']>
This functionality is more useful and expressive when used in conjunction with Q or F-expressions:
>>> from boogie.models import F, Q
>>> users[F.age < 25, 'name']
<QuerySet ['George Harrison']>
and this also works…
>>> users[Q(age__lt=25), 'name']
<QuerySet ['George Harrison']>
F expressions can also be used to specify fields. You may find them easier to read and type than strings:
>>> users[F.age < 25, [F.name, F.age]]
<QuerySet [Row('George Harrison', 22)]>
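To see why an expression such as F.age < 25 can be passed around as a value, the sketch below uses operator overloading on a field reference. It is a toy under stated assumptions: FieldRef, Condition and the F factory defined here are hypothetical names that filter plain dictionaries, whereas Boogie’s own F presumably produces Django query expressions that are evaluated by the database.

```python
import operator


class Condition:
    """A predicate over a row dict, built by comparing a field reference."""

    def __init__(self, field, op, value):
        self.field, self.op, self.value = field, op, value

    def __call__(self, row):
        return self.op(row[self.field], self.value)


class FieldRef:
    """Reference to a column by name; comparisons return Condition objects."""

    def __init__(self, name):
        self.name = name

    def __lt__(self, other):
        return Condition(self.name, operator.lt, other)

    def __gt__(self, other):
        return Condition(self.name, operator.gt, other)


class _FieldFactory:
    """Attribute access creates a field reference: F.age -> FieldRef('age')."""

    def __getattr__(self, name):
        return FieldRef(name)


F = _FieldFactory()

rows = [
    {"name": "John Lennon", "age": 27},
    {"name": "George Harrison", "age": 22},
]
young = F.age < 25  # nothing is evaluated yet; this is just a description
names = [row["name"] for row in rows if young(row)]
```

The comparison is deferred until it is applied to a row, which is what lets the same expression serve as a row selector in the indexing syntax.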
The db object
Boogie exports an object called db that easily exposes a table-centric view for all models in your project.
>>> from boogie import db
>>> db.auth.user_model[:, 'name']
<QuerySet ['John Lennon', 'Paul McCartney', 'George Harrison', 'Ringo Star']>
It must be used with the db.<app_label>.<model_name> syntax. Under the hood, the db object calls django.apps.apps.get_model() for a model and returns the default manager.
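The attribute-based lookup can be sketched with two small classes that chain __getattr__ calls. This is an illustration, not Boogie’s code: Db and AppNamespace are hypothetical names, and the model lookup function is injected so the sketch runs without a configured Django project. In a real project one would pass django.apps.apps.get_model, and _default_manager is the attribute Django uses for a model’s default manager.

```python
class AppNamespace:
    """Resolves db.<app_label>.<model_name> to the model's default manager."""

    def __init__(self, app_label, get_model):
        self._app_label = app_label
        self._get_model = get_model

    def __getattr__(self, model_name):
        # Avoid treating private/dunder lookups as model names.
        if model_name.startswith("_"):
            raise AttributeError(model_name)
        model = self._get_model(self._app_label, model_name)
        return model._default_manager


class Db:
    """Entry point: attribute access selects an app label."""

    def __init__(self, get_model):
        self._get_model = get_model

    def __getattr__(self, app_label):
        if app_label.startswith("_"):
            raise AttributeError(app_label)
        return AppNamespace(app_label, self._get_model)


# Stand-ins used only to exercise the sketch without Django.
class FakeUserModel:
    _default_manager = "default manager of auth.user"


def fake_get_model(app_label, model_name):
    if (app_label, model_name) != ("auth", "user"):
        raise LookupError(f"no model {app_label}.{model_name}")
    return FakeUserModel


db = Db(fake_get_model)
manager = db.auth.user
```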
We believe that managers and query sets should be the default entry point for accessing your models. Hence, we want to easily expose the model managers instead of the model classes themselves. Boogie managers also define the .new() method as an alias to the model constructor.
Overriding query sets and managers
Implementing custom managers and querysets in Django is rather inconvenient. First, the distinction between the two is confusing, and in most situations the manager is generated from the queryset class via boilerplate. Not only that, but managers and querysets must be defined before the model, since we need to set the objects attribute during class definition. This is not ideal: it is natural to expect that models should be in the topmost part of the file (and hence more convenient to browse). Models declare the structure of tables in the database, and we have almost no chance of understanding the manager methods before peeking at the model first. Boogie lets us organize both classes in a more natural way:
from boogie import models
from boogie.models import F


class User(models.Model):
    name = models.CharField(max_length=100)
    age = models.IntegerField()


#
# Manager and queryset methods
#
@models.manager_method(User)
def create_teen(self, name, age=18):
    return self.create(name=name, age=age)


@models.queryset_method(User)
def advance_age(self, by=1):
    # Increment at the database level by the requested amount
    self.update(age=F.age + by)
This arrangement prevents a few common Django anti-patterns:
- Implementing table logic as class methods of the model class:
  We should create predictable interfaces, and the “Django way” is to put table logic in managers and querysets. Not only that, but class methods cannot be called later in a chain like standard queryset methods, which hurts the usability of our APIs.
- Creating separate models.py and managers.py:
  Putting all models of an app in one file and all managers in another is a poor structure: User and UserQuerySet are much more cohesive than, say, User and Group. We should split our modules by concerns and not by implementation details such as a common base class.
- Manager methods in the queryset:
  Creating separate manager and queryset classes involves a lot of boilerplate. The usual approach is to create a QuerySet subclass and call Manager.from_queryset() to create the corresponding Manager class. This approach makes it very tempting to move some methods that should belong exclusively to the manager (e.g., object creation patterns) to the queryset to avoid an extra class declaration. Doing so is not very problematic, but would allow some spurious API usage such as obj = Model.objects.filter(age__lt=18).my_create_method(name='John', age=42). In Boogie we can mark that a method exists only in the Manager by decorating it with the boogie.models.manager_only() decorator.
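The decorator-based layout shown earlier depends on attaching functions to a class after that class has been defined. The following is a minimal sketch of that mechanism only. The names attach_method and ToyQuerySet are hypothetical, and the sketch makes no claim about how boogie.models.manager_method() or queryset_method() locate the manager and queryset classes of a model.

```python
def attach_method(cls):
    """Return a decorator that installs the decorated function on cls."""

    def decorator(func):
        setattr(cls, func.__name__, func)
        return func

    return decorator


class ToyQuerySet:
    """Hypothetical stand-in for a queryset: it just holds a list of ages."""

    def __init__(self, ages):
        self.ages = list(ages)

    def filter_lt(self, limit):
        return ToyQuerySet(age for age in self.ages if age < limit)


# Defined after the class, yet callable as a regular chained method.
@attach_method(ToyQuerySet)
def advance_age(self, by=1):
    return ToyQuerySet(age + by for age in self.ages)


result = ToyQuerySet([25, 26, 22, 29]).filter_lt(26).advance_age(by=2)
```

Because the function becomes an ordinary method, it chains like any other queryset method, which is the usability property the list above asks for.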
Pandas integration
Sometimes SQL (or Django’s ORM) is simply not powerful enough to perform some advanced multi-row computations. Boogie query sets integrate with Pandas (https://pandas.pydata.org), which is a great package for data manipulation in table-like structures. Compared to many hand-written solutions that iterate over a sequence of objects, Pandas data frames offer simple APIs and can be much more computationally efficient than ad hoc Python solutions.
All Boogie query sets have both a dataframe() and an update_from_dataframe() method. The first returns a dataframe from queryset data:
>>> users[:, ['name', 'age']].dataframe() # doctest: +NORMALIZE_WHITESPACE
               name  age
id
1       John Lennon   27
2    Paul McCartney   26
3   George Harrison   22
4        Ringo Star   29
The second updates the database using data from a pandas dataframe. Dataframe indexes must correspond to primary keys.
>>> df = users[:, 'age'].dataframe()
>>> df['age'] += 1
>>> users.update_from_dataframe(df)
>>> users[:, ['name', 'age']].dataframe() # doctest: +NORMALIZE_WHITESPACE
               name  age
id
1       John Lennon   28
2    Paul McCartney   27
3   George Harrison   23
4        Ringo Star   30
Alternate Meta syntax and integration with model-utils and django-polymorphic
Django introduced the Meta syntax before Python 3 even existed, and at that time it wasn’t possible to pass keyword arguments in a class declaration. We believe that keyword arguments would be a more natural idiom in modern Python, but Django cannot change this interface without breaking backwards compatibility.
In Boogie, the Meta information can be passed either in the traditional way, using the class Meta: ... convention, or as keyword arguments in the model declaration:
from boogie import models


class BaseUser(models.Model, abstract=True, status=True):
    name = models.CharField(max_length=100)
    age = models.IntegerField()
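Keyword arguments in a class declaration are a plain Python 3 feature, and the sketch below shows one way they can be collected next to a traditional inner Meta class. It is an assumption-laden illustration: OptionsBase and its options attribute are hypothetical, and the sketch uses __init_subclass__ for brevity, whereas a Django model base class would more plausibly handle the keywords in its metaclass.

```python
class OptionsBase:
    """Toy base class that records declaration keywords as options."""

    options = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__()
        merged = {}
        # Traditional style: read public attributes of an inner Meta class.
        meta = cls.__dict__.get("Meta")
        if meta is not None:
            merged.update(
                {k: v for k, v in vars(meta).items() if not k.startswith("_")}
            )
        # Keyword style: class Foo(OptionsBase, abstract=True)
        merged.update(kwargs)
        cls.options = merged


class BaseUser(OptionsBase, abstract=True, status=True):
    pass


class LegacyUser(OptionsBase):
    class Meta:
        abstract = True
```

Both declarations end up with the same kind of options dictionary, so downstream code does not need to care which syntax was used.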
Besides all the usual Meta options, Boogie also allows some custom model initialization that integrates with external libraries to provide additional functionality to your models:
- timeframed (bool):
  Makes the model a subclass of Django Model Utils’ TimeFramedModel. Adds start and end nullable DateTimeFields, and a timeframed manager that returns only objects for which the current date-time lies within their time range.
- timestamped (bool):
  Makes the model a subclass of Django Model Utils’ TimeStampedModel. Provides self-updating created and modified fields on any model that inherits from it.
- status (bool):
  Makes the model a subclass of Django Model Utils’ StatusModel. Provides status and status_changed fields that control the current status of an instance based on a list of choices. See the documentation for more details.
- soft_deletable (bool):
  Makes the model a subclass of Django Model Utils’ SoftDeletableModel. Provides the field is_removed, which is set to True instead of removing the instance when it is scheduled for deletion. Entities returned by the default manager are limited to not-deleted instances.
- polymorphic (bool):
  Makes the model a subclass of PolymorphicModel, which adds an additional column ctype that tracks the actual type of each instance in a multiple table inheritance scenario.