1 | Sparse Arrays
|
---|
2 | =============
|
---|
3 |
|
---|
4 | The `sparse_array.c` file contains an implementation of a sparse array that
|
---|
5 | attempts to be both space and time efficient.
|
---|
6 |
|
---|
7 | The sparse array is represented using a tree structure. Each node in the
|
---|
8 | tree contains a block of pointers to either the user supplied leaf values or
|
---|
9 | to another node.
|
---|
10 |
|
---|
11 | There are a number of parameters used to define the block size:
|
---|
12 |
|
---|
13 | OPENSSL_SA_BLOCK_BITS Specifies the number of bits covered by each block
|
---|
14 | SA_BLOCK_MAX Specifies the number of pointers in each block
|
---|
15 | SA_BLOCK_MASK Specifies a bit mask to perform modulo block size
|
---|
16 | SA_BLOCK_MAX_LEVELS Indicates the maximum possible height of the tree
|
---|
17 |
|
---|
18 | These constants are inter-related:
|
---|
19 |
|
---|
20 | SA_BLOCK_MAX = 2 ^ OPENSSL_SA_BLOCK_BITS
|
---|
21 | SA_BLOCK_MASK = SA_BLOCK_MAX - 1
|
---|
22 | SA_BLOCK_MAX_LEVELS = number of bits in size_t divided by
|
---|
23 | OPENSSL_SA_BLOCK_BITS rounded up to the next multiple
|
---|
24 | of OPENSSL_SA_BLOCK_BITS
|
---|
25 |
|
---|
26 | `OPENSSL_SA_BLOCK_BITS` can be defined at compile time and this overrides the
|
---|
27 | built in setting.
|
---|
28 |
|
---|
29 | As a space and performance optimisation, the height of the tree is usually
|
---|
30 | less than the maximum possible height. Only sufficient height is allocated to
|
---|
31 | accommodate the largest index added to the data structure.
|
---|
32 |
|
---|
33 | The largest index used to add a value to the array determines the tree height:
|
---|
34 |
|
---|
35 | +----------------------+---------------------+
|
---|
36 | | Largest Added Index | Height of Tree |
|
---|
37 | +----------------------+---------------------+
|
---|
38 | | SA_BLOCK_MAX - 1 | 1 |
|
---|
39 | | SA_BLOCK_MAX ^ 2 - 1 | 2 |
|
---|
40 | | SA_BLOCK_MAX ^ 3 - 1 | 3 |
|
---|
41 | | ... | ... |
|
---|
42 | | size_t max | SA_BLOCK_MAX_LEVELS |
|
---|
43 | +----------------------+---------------------+
|
---|
44 |
|
---|
45 | The tree height is dynamically increased as needed based on additions.
|
---|
46 |
|
---|
47 | An empty tree is represented by a NULL root pointer. Inserting a value at
|
---|
48 | index 0 results in the allocation of a top level node full of null pointers
|
---|
49 | except for the single pointer to the user's data (N = SA_BLOCK_MAX for
|
---|
50 | brevity):
|
---|
51 |
|
---|
52 | +----+
|
---|
53 | |Root|
|
---|
54 | |Node|
|
---|
55 | +-+--+
|
---|
56 | |
|
---|
57 | |
|
---|
58 | |
|
---|
59 | v
|
---|
60 | +-+-+---+---+---+---+
|
---|
61 | | 0 | 1 | 2 |...|N-1|
|
---|
62 | | |nil|nil|...|nil|
|
---|
63 | +-+-+---+---+---+---+
|
---|
64 | |
|
---|
65 | |
|
---|
66 | |
|
---|
67 | v
|
---|
68 | +-+--+
|
---|
69 | |User|
|
---|
70 | |Data|
|
---|
71 | +----+
|
---|
72 | Index 0
|
---|
73 |
|
---|
74 | Inserting at element 2N+1 creates a new root node and pushes down the old root
|
---|
75 | node. It then creates a second second level node to hold the pointer to the
|
---|
76 | user's new data:
|
---|
77 |
|
---|
78 | +----+
|
---|
79 | |Root|
|
---|
80 | |Node|
|
---|
81 | +-+--+
|
---|
82 | |
|
---|
83 | |
|
---|
84 | |
|
---|
85 | v
|
---|
86 | +-+-+---+---+---+---+
|
---|
87 | | 0 | 1 | 2 |...|N-1|
|
---|
88 | | |nil| |...|nil|
|
---|
89 | +-+-+---+-+-+---+---+
|
---|
90 | | |
|
---|
91 | | +------------------+
|
---|
92 | | |
|
---|
93 | v v
|
---|
94 | +-+-+---+---+---+---+ +-+-+---+---+---+---+
|
---|
95 | | 0 | 1 | 2 |...|N-1| | 0 | 1 | 2 |...|N-1|
|
---|
96 | |nil| |nil|...|nil| |nil| |nil|...|nil|
|
---|
97 | +-+-+---+---+---+---+ +---+-+-+---+---+---+
|
---|
98 | | |
|
---|
99 | | |
|
---|
100 | | |
|
---|
101 | v v
|
---|
102 | +-+--+ +-+--+
|
---|
103 | |User| |User|
|
---|
104 | |Data| |Data|
|
---|
105 | +----+ +----+
|
---|
106 | Index 0 Index 2N+1
|
---|
107 |
|
---|
108 | The nodes themselves are allocated in a sparse manner. Only nodes which exist
|
---|
109 | along a path from the root of the tree to an added leaf will be allocated.
|
---|
110 | The complexity is hidden and nodes are allocated on an as needed basis.
|
---|
111 | Because the data is expected to be sparse this doesn't result in a large waste
|
---|
112 | of space.
|
---|
113 |
|
---|
114 | Values can be removed from the sparse array by setting their index position to
|
---|
115 | NULL. The data structure does not attempt to reclaim nodes or reduce the
|
---|
116 | height of the tree on removal. For example, now setting index 0 to NULL would
|
---|
117 | result in:
|
---|
118 |
|
---|
119 | +----+
|
---|
120 | |Root|
|
---|
121 | |Node|
|
---|
122 | +-+--+
|
---|
123 | |
|
---|
124 | |
|
---|
125 | |
|
---|
126 | v
|
---|
127 | +-+-+---+---+---+---+
|
---|
128 | | 0 | 1 | 2 |...|N-1|
|
---|
129 | | |nil| |...|nil|
|
---|
130 | +-+-+---+-+-+---+---+
|
---|
131 | | |
|
---|
132 | | +------------------+
|
---|
133 | | |
|
---|
134 | v v
|
---|
135 | +-+-+---+---+---+---+ +-+-+---+---+---+---+
|
---|
136 | | 0 | 1 | 2 |...|N-1| | 0 | 1 | 2 |...|N-1|
|
---|
137 | |nil|nil|nil|...|nil| |nil| |nil|...|nil|
|
---|
138 | +---+---+---+---+---+ +---+-+-+---+---+---+
|
---|
139 | |
|
---|
140 | |
|
---|
141 | |
|
---|
142 | v
|
---|
143 | +-+--+
|
---|
144 | |User|
|
---|
145 | |Data|
|
---|
146 | +----+
|
---|
147 | Index 2N+1
|
---|
148 |
|
---|
149 | Accesses to elements in the sparse array take O(log n) time where n is the
|
---|
150 | largest element. The base of the logarithm is `SA_BLOCK_MAX`, so for moderately
|
---|
151 | small indices (e.g. NIDs), single level (constant time) access is achievable.
|
---|
152 | Space usage is O(minimum(m, n log(n)) where m is the number of elements in the
|
---|
153 | array.
|
---|
154 |
|
---|
155 | Note: sparse arrays only include pointers to types.
|
---|
156 | Thus, `SPARSE_ARRAY_OF(char)` can be used to store a string.
|
---|