All data was collected on a laptop equipped with an AMD A6-3400M “Llano” quad-core processor which supports a clock rate of up to 2.3 GHz.
Test Parameters | |
Test Filter | 4-pole Butterworth bandpass filter |
Input channels | 256 |
Time samples | 600,000 |
Data filter cycles | 100 |
Compiler | GCC |
The following results tables list the parameters of each experiment, the averaged run time of the program, and an indicator of its real-time processing speed: the number of seconds of computation needed per second of recorded data. This figure is computed as

$$\text{secs/sec} = \frac{f_s \, t}{N}$$

where $f_s$ is the sampling rate of the incoming data (samples/sec), $N$ is the number of samples processed by the filter bank (samples), and $t$ is the time to process all samples (sec). Values below 1 mean the filter bank runs faster than real time.
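As a minimal sketch of how the secs/sec figure in the tables can be reproduced from a measured run time, the C helper below divides the run time by the duration of the recorded data. The function names `realtime_ratio` and `is_realtime` are hypothetical and chosen for illustration only; they are not taken from the original program.

```c
#include <assert.h>

/* Seconds of computation needed per second of recorded data.
 * fs_hz:     sampling rate of the incoming data (samples/sec)
 * n_samples: number of samples processed per channel
 * t_sec:     measured time to process all samples (sec)
 * Hypothetical helper, not part of the original filter bank code. */
double realtime_ratio(double fs_hz, double n_samples, double t_sec)
{
    assert(fs_hz > 0.0 && n_samples > 0.0 && t_sec >= 0.0);
    /* Duration of the recording that the samples represent. */
    double data_duration_sec = n_samples / fs_hz;
    return t_sec / data_duration_sec;
}

/* A ratio below 1 means the data is processed faster than it arrives. */
int is_realtime(double ratio)
{
    return ratio < 1.0;
}
```

For the unoptimized run, calling `realtime_ratio(25000.0, 600000.0 * 100.0, 619.940)` divides 619.940 seconds by the 2400 seconds of data that 60 million samples represent at 25,000 samples/sec, which matches the 0.25831 entry in the first results table.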
Unique Filter Coefficients | ||
Optimization | Time (sec) | secs/sec (25 KS/s) |
None | 619.940 | 0.25831 |
O3 Compiler | 241.131 | 0.10047 |
O3 and SSE3 | 75.589 | 0.03150 |
O3 and our SSE3 | 69.258 | 0.02886 |
Constant Filter Coefficients | ||
Optimization | Time (sec) | secs/sec (25 KS/s) |
O3 and SSE3 | 55.835 | 0.02326 |
O3 and our SSE3 | 49.271 | 0.02053 |
POSIX Threads | ||
Optimization | Time (sec) | secs/sec (25 KS/s) |
0 Threads and our SSE3 | 48.333 | 0.02014 |
1 Thread and our SSE3 | 50.109 | 0.02088 |
2 Threads and our SSE3 | 88.632 | 0.03693 |
4 Threads and our SSE3 | 138.090 | 0.05754 |
8 Threads and our SSE3 | 62.481 | 0.02603 |
16 Threads and our SSE3 | 103.901 | 0.04329 |
32 Threads and our SSE3 | 78.219 | 0.03259 |
0 Threads and SSE3 | 48.075 | 0.02003 |
1 Thread and SSE3 | 64.302 | 0.02679 |
2 Threads and SSE3 | 96.755 | 0.04031 |
4 Threads and SSE3 | 123.931 | 0.05164 |
8 Threads and SSE3 | 67.629 | 0.02818 |
16 Threads and SSE3 | 141.329 | 0.05889 |
32 Threads and SSE3 | 121.134 | 0.05047 |
Reordered Output Data | ||
Optimization | Time (sec) | secs/sec (25 KS/s) |
0 Threads and SSE3 | 50.106 | 0.02088 |
1 Thread and SSE3 | 49.302 | 0.02054 |
2 Threads and SSE3 | 78.888 | 0.03287 |
4 Threads and SSE3 | 89.939 | 0.03747 |
8 Threads and SSE3 | 35.283 | 0.01470 |
16 Threads and SSE3 | 71.337 | 0.02972 |
32 Threads and SSE3 | 109.112 | 0.04546 |
Reordered Intermediate Variables | ||
Optimization | Time (sec) | secs/sec (25 KS/s) |
0 Threads and SSE3 | 71.166 | 0.02965 |
1 Thread and SSE3 | 57.156 | 0.02382 |
2 Threads and SSE3 | 52.639 | 0.02193 |
4 Threads and SSE3 | 48.939 | 0.02039 |
8 Threads and SSE3 | 33.589 | 0.01400 |
16 Threads and SSE3 | 51.543 | 0.02148 |
32 Threads and SSE3 | 110.716 | 0.04613 |
Our best-performing filter design combined several of the methods we used to optimize the filter bank implementation: compiler-level optimization, SSE instructions, and POSIX threads. With 8 threads and reordered intermediate variables, it processed 60 million samples for each of the 256 channels in 33.589 seconds. Assuming a 25 kHz sampling rate, each second of data is processed in 14 milliseconds, which is well within the requirement for real-time processing.
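The 14 millisecond figure can be checked from the test parameters. The symbols $N$, $f_s$, and $t$ are notation assumed here for the sample count per channel, the sampling rate, and the measured run time:

$$N = 600{,}000 \times 100 = 6 \times 10^{7} \text{ samples}, \qquad \frac{N}{f_s} = \frac{6 \times 10^{7}}{25{,}000 \text{ samples/sec}} = 2400 \text{ sec of data}$$

$$\frac{t}{N / f_s} = \frac{33.589 \text{ sec}}{2400 \text{ sec}} \approx 0.0140 \text{ secs/sec} = 14 \text{ ms per second of data}$$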