This all depends on your individual requirements.
But when you want really high-performance transcoding to support a large number of simultaneous streams, there's no way around GPU transcoding.
Just some rough figures for an example:
A while ago I was testing transcoding performance using a fairly average i3-6100 CPU (2 cores, 4 threads).
(The file was a typical H.264 video with nothing special about it: no subtitles, but with scaling.)
SW-Transcoding was 12x speed.
HW-Transcoding ran at 70x speed
(full hw pipeline, even scaling done by the GPU)
Then I found out that the 70x speed wasn't even the GPU limit.
The limiting factor was in fact the CPU, because it couldn't do the audio transcoding any faster (it's a single-threaded operation).
Removing audio transcoding allowed 110x transcoding speed.
Disclaimer for all readers: Don't use these absolute numbers for any calculations or assumptions about performance.
There are so many factors playing a role here. H.264 HD video alone can be encoded at bitrates that differ by a factor of more than 10.
I just want to illustrate how big the difference between hw and sw performance can be (under ideal conditions, though!).
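
To put those rough figures in relative terms, here is a minimal Python sketch. It only does arithmetic on the numbers quoted above; the function names are made up for illustration, and the "slowest stage sets the pace" model is my simplifying assumption for how a single-threaded audio transcode can cap the whole job, not something measured separately.

```python
# Rough figures from the test above, as multiples of real-time playback speed.
SW_SPEED = 12.0            # software transcoding
HW_SPEED_AUDIO = 70.0      # hardware transcoding, audio still transcoded on the CPU
HW_SPEED_NO_AUDIO = 110.0  # hardware transcoding, audio transcoding removed


def speedup(fast: float, slow: float) -> float:
    """How many times faster 'fast' is compared to 'slow'."""
    return fast / slow


def pipeline_speed(*stage_speeds: float) -> float:
    """Assumed model: stages run side by side, so the slowest one sets the pace."""
    return min(stage_speeds)


print(round(speedup(HW_SPEED_AUDIO, SW_SPEED), 1))     # 5.8
print(round(speedup(HW_SPEED_NO_AUDIO, SW_SPEED), 1))  # 9.2

# If the GPU video path can do about 110x and the CPU audio path tops out
# around 70x (inferred from the test, not measured on its own), the job
# runs at the slower of the two.
print(pipeline_speed(HW_SPEED_NO_AUDIO, HW_SPEED_AUDIO))  # 70.0
```

So in this one test the hardware path was roughly 6x to 9x faster than software. The same caveat applies: that ratio will look different for other files, bitrates and hardware.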