
Multi-Head Attention and Self-Attention of Transformers

 

[Figure: The Transformer architecture]


Multi-Head Attention and Self-Attention are key components of the Transformer architecture, introduced in the paper "Attention Is All You Need" by Vaswani et al. in 2017.

Self-Attention (or Intra-Attention)

Self-Attention is a mechanism that allows the model to attend to different parts of the input sequence simultaneously and weigh their importance. It's called "self" because the attention is applied to the input sequence itself, rather than to some external context.

Given an input sequence of tokens (e.g., words or characters), the Self-Attention mechanism computes the representation of each token in the sequence by attending to all other tokens. This is done by:

Query (Q): The input sequence is linearly transformed into a query matrix.
Key (K): The input sequence is linearly transformed into a key matrix.
Value (V): The input sequence is linearly transformed into a value matrix.
Compute Attention Weights: The dot product of Q and K^T is computed, scaled by 1/sqrt(d_k), and passed through a softmax function to obtain attention weights.
Compute Output: The attention weights are multiplied with V to produce the output.
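
For concreteness, here is a minimal NumPy sketch of the first three steps, the Q, K, and V projections; the sequence length, embedding size, and weight matrices are arbitrary placeholders standing in for learned parameters, not values from the paper. The attention computation itself is sketched after the formula below.

import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8                      # sequence length and embedding size (arbitrary)
X = rng.normal(size=(n, d))      # token embeddings, one row per token

# Learned projection matrices (random placeholders here)
W_q = rng.normal(size=(d, d))
W_k = rng.normal(size=(d, d))
W_v = rng.normal(size=(d, d))

Q = X @ W_q   # queries
K = X @ W_k   # keys
V = X @ W_v   # values
print(Q.shape, K.shape, V.shape)   # (4, 8) (4, 8) (4, 8)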

Mathematical Representation

Let's denote the input sequence as X = [x_1, x_2, ..., x_n], where x_i is a token embedding. The self-attention computation can be represented as:
Attention(Q, K, V) = softmax(Q * K^T / sqrt(d_k)) * V
where d_k is the dimensionality of the key (and query) vectors. Dividing by sqrt(d_k) keeps the dot products from growing too large and saturating the softmax.
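
The formula maps almost line for line onto code. Below is a minimal NumPy sketch of scaled dot-product attention; the function and variable names are my own, not from the paper.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # (n, n): similarity of every query to every key
    weights = softmax(scores, axis=-1)     # each row is a probability distribution over tokens
    return weights @ V                     # weighted average of the value vectors

# With the Q, K, V from the projection sketch above:
#   out = attention(Q, K, V)   # (4, 8): one new representation per token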


Multi-Head Attention

Multi-Head Attention is an extension of Self-Attention that allows the model to jointly attend to information from different representation subspaces at different positions.

The main idea is to:

Project the input into multiple attention "heads," each operating on a lower-dimensional subspace of the representation.
Apply Self-Attention to each head independently.
Concatenate the outputs from all heads.
Linearly transform the concatenated output.

Multi-Head Attention Mechanism

Split: The query, key, and value representations are split along the feature dimension into h heads, each with a smaller dimensionality (d/h).
Apply Self-Attention: Self-Attention is applied to each head independently.
Concat: The outputs from all heads are concatenated.
Linear Transform: The concatenated output is linearly transformed.

Mathematical Representation

MultiHead(Q, K, V) = Concat(head_1, ..., head_h) * W^O
where head_i = Attention(Q * W_i^Q, K * W_i^K, V * W_i^V)
and W_i^Q, W_i^K, W_i^V, and W^O are learnable linear projection matrices.
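
Combining the two formulas, here is a hedged NumPy sketch of Multi-Head Attention using the split-along-the-feature-dimension implementation described above, which is how the per-head projections W_i^Q, W_i^K, W_i^V are typically batched in practice; every name, shape, and weight below is an illustrative assumption.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, h):
    """X: (n, d); each W_*: (d, d); h must divide d."""
    n, d = X.shape
    d_h = d // h                                       # per-head dimensionality

    def split_heads(M):
        # (n, d) -> (h, n, d_h): slice the feature dimension into h heads
        return M.reshape(n, h, d_h).transpose(1, 0, 2)

    Q, K, V = (split_heads(X @ W) for W in (W_q, W_k, W_v))

    # Scaled dot-product attention, applied to every head in parallel
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_h)   # (h, n, n)
    heads = softmax(scores, axis=-1) @ V               # (h, n, d_h)

    # Concatenate the heads and apply the output projection W^O
    concat = heads.transpose(1, 0, 2).reshape(n, d)    # (n, d)
    return concat @ W_o

# Tiny usage example with random placeholder weights
rng = np.random.default_rng(0)
n, d, h = 4, 8, 2
X = rng.normal(size=(n, d))
W_q, W_k, W_v, W_o = (rng.normal(size=(d, d)) for _ in range(4))
print(multi_head_attention(X, W_q, W_k, W_v, W_o, h).shape)   # (4, 8)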

Benefits

Multi-Head Attention and Self-Attention provide several benefits:
Parallelization: Self-Attention can be computed for all positions in a sequence at once, unlike recurrent neural networks (RNNs), which must process tokens one at a time.
Expressiveness: Multi-Head Attention lets different heads attend to different positions and representation subspaces, helping the model capture complex patterns and relationships.
Improved Performance: Transformer models built on Multi-Head Attention have achieved state-of-the-art results on a wide range of natural language processing tasks.

Transformer Architecture

The Transformer architecture consists of:
Encoder: A stack of identical layers, each comprising a Self-Attention sub-layer and a position-wise Feed Forward Network (FFN).
Decoder: A stack of identical layers, each comprising (masked) Self-Attention, Encoder-Decoder Attention over the encoder output, and an FFN.
Each Encoder layer therefore has two sub-layers and each Decoder layer has three; a residual connection followed by layer normalization is applied around every sub-layer.
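
To show how these pieces fit together, here is a minimal NumPy sketch of a single Encoder layer following the sub-layer pattern above. For brevity it uses single-head self-attention, omits the learnable gain and bias of layer normalization, and uses random placeholder weights, so it illustrates the structure rather than a faithful implementation.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Simplified layer normalization (no learnable gain/bias)
    return (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + eps)

def self_attention(X, W_q, W_k, W_v):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    return softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1) @ V

def ffn(X, W1, b1, W2, b2):
    return np.maximum(0, X @ W1 + b1) @ W2 + b2        # two linear layers with a ReLU in between

def encoder_layer(X, p):
    # Sub-layer 1: self-attention, wrapped in a residual connection and layer norm
    X = layer_norm(X + self_attention(X, p["W_q"], p["W_k"], p["W_v"]))
    # Sub-layer 2: position-wise feed-forward network, wrapped the same way
    return layer_norm(X + ffn(X, p["W1"], p["b1"], p["W2"], p["b2"]))

# Tiny usage example with random placeholder parameters
rng = np.random.default_rng(0)
n, d, d_ff = 4, 8, 16
p = {"W_q": rng.normal(size=(d, d)), "W_k": rng.normal(size=(d, d)), "W_v": rng.normal(size=(d, d)),
     "W1": rng.normal(size=(d, d_ff)), "b1": np.zeros(d_ff),
     "W2": rng.normal(size=(d_ff, d)), "b2": np.zeros(d)}
print(encoder_layer(rng.normal(size=(n, d)), p).shape)   # (4, 8)

A Decoder layer follows the same pattern but inserts the Encoder-Decoder attention sub-layer between the two shown here.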

The Transformer architecture has revolutionized the field of natural language processing and has been widely adopted for various tasks, including machine translation, text generation, and question answering.
