Abstract:
A hardware semaphore module is designed to support synchronization primitives such as mutexes and barriers. Compared with implementations based on atomic instructions, this approach executes more efficiently and requires fewer instructions. Building on a scratch-pad memory structure, a shared program memory with two addressing modes (absolute address mapping and virtual address mapping) is designed to enable instruction-space sharing, which improves memory utilization. FPGA simulation results show that the proposed design achieves a 14.7% speedup over a traditional shared L2 cache.