I’m currently reading Database Internals by Alex Petrov and came across the “Index Files” section. Primary and secondary indexes were being discussed and it was interesting to see how these indexes interact with data records.
“A primary index is usually assigned to a primary key or a set of keys identified as primary. All other indexes are called secondary.”
I never really thought about what a secondary index was referencing. Just that it was another way to optimize search for a record via secondary indexed column. The secondary index can point directly to the data, which contains the indexed value + pointer to its location on disk. Or it can hold the primary key to that record.
There are some trade-offs to consider when choosing whether to have the secondary index point directly to the data vs primary keys for indirection.
-> Direct data access - Faster reads since disk seeks are reduced - Slower writes since the secondary indexes need to update when data is updated or “moved” -> PK indirection - Slower reads since you have to lookup PK via secondary index - Faster writes since you only need to update the primary indexes
There’s no right choice here, it depends on what your application is doing:
- Time-series monitoring systems would benefit from secondary indexes having direct data access since they’re barely updated once written to, but read from a ton.
- Banking transaction systems would benefit from indirection since they’re write-heavy.