Bitbanging 1D Reversible Automata
One-dimensional reversible automaton
I created a demonstration for GFXPrim Library. It implements and displays nearest neighbor, one-dimensional, binary cellular automata. Additionally, it implements a reversible automaton, which is almost identical except for a small change that makes it reversible. The automaton is shown in two dimensions as time passes, with time propagating from top to bottom. Although in a reversible case, time can be reversed.
The working principle of the automaton is as follows:
- Each cell has a state, on or off, black or white, boolean, etc.
- At each time step, the state of the cell in the next step is chosen by rules.
- This rule looks at the current value of a cell and the values of its left and right neighbors.
- have 23= 8
Possible state combinations (modes) of 3 binary units. - Rules specify which patterns will produce black cells in the next time step.
- have 28= 256
possible rules. That’s 256 unique pattern combinations.
Therefore, the pattern is a 3-bit binary number, where each number corresponds to a cell. The middle number is the center unit, the high digit is the unit on the left, and the low digit is the unit on the right. Rules can be displayed by displaying a row of patterns and a row of subsequent states.
These are the rules 110
, 0x6e
or
01101110
. It essentially says match pattern
110
, 101
, 011
,
010
, 001
. Pattern matching causes the cell to be set to 1 at the next time step. If no pattern is matched, or equivalently an inactive pattern is matched, the cell will be set to 0.
Note again that each pattern is similar to a 3-digit binary number. The active mode value is also an 8-bit binary number. We can use this to perform efficient matching of patterns using binary operations.
Assume our CPU natively runs 64-bit integers (called
Character). We can pack 64-unit automata into a 64-bit integer. Each bit corresponds to a unit. If one thing is 1
then it is a black cell 0
For white. In this example, we use integers as bit fields. We don’t care about the integers these bits can represent.
The CPU can perform bit operations in parallel on all 64 bits without branching. This means we can execute a single operation 64 times in parallel.
if we rotate (wrapped >>
) all the bits are shifted one bit to the right and we get a new integer where the left neighbor of one bit is now in its position. Likewise, if we shift all the bits to the left, then we get an integer representing the right neighbor. This gives us 3 integers with the left, middle and right digits in the same position. For example, using only 8 bits:
left: | 0100 1011 | >> |
center: | 1001 0110 | |
Correct: | 0010 1101 | << |
Each mode can be represented as a 3-digit number, plus a 4th digit to indicate whether it is active in a given rule. Because we want to operate on all 64 bits in the left, right and center bit fields at the same time. We can generate 64-bit long mask The value from each bit in the given pattern.
So if we have a pattern where the left cell should be 1, then we can build a 64 bit mask all Those ones. If it should be zero, it’s all zero. The same goes for the center and right cells. Facial mask can be XOR (^
) is displayed together with the corresponding cell field if no match occurred. That is, if the mode is 1 and the cell is zero, or if the cell is 1 and the mode is zero. We can invert this (~
) is given when a match occurs.
To see if all elements of the pattern (left, right, center) match, we can bitwise and (&
) they are together. Then we can bitwise or
(|
) pattern results are paired together to produce the final value.
If we wish to operate on an automaton larger than 64 cells, then we can combine multiple integers into an array. After performing left and right shifts, we get the high or low bits from the next or previous integer in the array. Then set the low and high bits of the left and right bit fields. In other words, we use the ending bits of the left and right bit fields to concatenate them together.
For illustrative purposes, the following are core Part of the automaton algorithm.
/* If bit n is 1 then make all bits 1 otherwise 0 */
#define BIT_TO_MAX(b, n) (((b >> n) & 1) * ~0UL)
/* Numeric representation of the current update rule */
static uint8_t rule = 110;
/* Apply the current rule to a 64bit segment of a row */
static inline uint64_t ca1d_rule_apply(uint64_t c_prev,
uint64_t c,
uint64_t c_next,
uint64_t c_prev_step)
{
int i;
/* These are wrapping shifts when c_prev == c or c_next == c */
uint64_t l = (c >> 1) ^ (c_prev << 63);
uint64_t r = (c << 1) ^ (c_next >> 63);
uint64_t c_next_step = 0;
for (i = 0; i < 8; i++)
uint64_t active = BIT_TO_MAX(rule, i);
uint64_t left = BIT_TO_MAX(i, 2);
uint64_t center = BIT_TO_MAX(i, 1);
uint64_t right = BIT_TO_MAX(i, 0);
c_next_step
/* The automata becomes reversible when we use c_prev_state */
return c_next_step ^ c_prev_step;
}
To make the automaton “reversible”, an extra step can be added. We look at the previous one of a cell (besides the current, left and right) and if it is a then upside down next value. This is equivalent to XORing the previous value with the next value.
I’m not entirely sure what the mathematical meaning of reversibility is. However, it’s important to physics and can make some really cool patterns that mimic nature. And entropy and the second law of thermodynamics, blah blah blah…
The definition of automaton is taken from Stephen Wolfram’s “A New Science”. He proposed at least one obvious C implementation using cell arrays. It also provides a binary expression table for each rule. For example, rule 90 simplifies to l^r
Binary expression. The compiler may automatically reduce my implementation to these minimal expressions.
To understand why, let’s consider Rule 90 for each mode.
01011010 = 90
First is the pattern 000
.
& ~(left ^ l) & ~(center ^ c) & ~(right ^ r);
active = 0 & ~(0 ^ l) & ~(0 ^ c) & ~(0 ^ r);
= 0.`
Active is zero, so the entire line reduces to zero. Now for
001
. Notice 1
here it actually means
~0UL
that is, the largest 64-bit integer.
1 & ~(0 ^ l) & ~(0 ^ c) & ~(1 ^ r);
= ~l & ~c & r.
as expected pattern 001
Contest
l=0, c=0, r=1
. Let’s list the remaining patterns or group them together in their simplified state. Then reduce further. Please note that for
loop into ca1d_rule_apply
Will expanded By the compiler when optimizing performance. This is also very clear c_next_step
Expression or zero depending on the previous iteration. So all pattern matching results will be ORed together.
& c & ~r | l & ~c & ~r | ~l & c & r | ~l & ~c & r;
l = l & ~r | ~l & r;
= l ^ r.
see top row
(l & c & ~r | l & ~c & ~r)
or together c
instead of c
. So we can delete it. Then we get an expression equivalent to xor’ring l
and
r
.
At least in theory, the compiler can see rule
Only 256 values and build a simplified version
ca1d_rule_apply
for each value. When the rendering code is the bottleneck, it doesn’t matter whether you actually do it or not. However, it would be interesting to see if the compiler is able to deduce the optimal solution or if something could go wrong.
From the perspective of dismantling
gcc -O3 -mcpu=native -mtune=native
it might actually do that. In addition it vectorization This code packs four 64-bit integers into 256-bit registers at a time and operates on them. I don’t know which part of the code is being vectorized or how. I think the reduced rule might be related to vectorization.
To render the automaton, we iterate over each pixel in the image. We calculate which cell the pixel falls within and set the pixel’s color to the cell’s color. That’s it.
/* Draws a pixel */
static inline void shade_pixel(gp_pixmap *p, gp_coord x, gp_coord y,
, gp_pixel fg)
gp_pixel bg
;
gp_pixel pxsize_t i = (x * (64 * width)) / p->w;
size_t j = (y * height) / p->h;
size_t k = 63 - (i & 63);
uint64_t c = steps[gp_matrix_idx(width, j, i >> 6)];
= BIT_TO_MAX(c, k);
c = (fg & c) px
GXPrim makes drawing very easy. The code above is fast enough for my purposes, but could be significantly improved. On most newer CPUs, integer division is much slower than floating point multiplication. It’s actually much faster (at least 2x) on my CPU to compute a pair of floating point ratios and then convert them back to integers.
But, you might ask, why do we utilize the CPU for drawing in the first place? This is because GFXPrim is targeted at embedded systems without a graphics processor. Furthermore, the CPU may not even natively support floating point. So integer division might actually be faster in this case. A better approach is to limit the size of the pixmap to 2x is larger than the size of the automaton, where xε ℕ Then we can use shifting instead of division.
2024-12-12 18:41:24