Easy.
All of these codecs take voice sampled at 8 kHz (PCM) as input, but each one produces a different number of bytes per frame, and the frame duration also differs, depending on the codec's algorithm. For example:
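G.723.1 (5.3k) – generates 20 bytes every 30 ms, so the rate is (1000 ms / 30 ms) * 20 bytes * 8 bits ≈ 5333 bit/s ≈ 5.3k
G.723.1 (6.4k) – generates 24 bytes every 30 ms, so the rate is (1000 ms / 30 ms) * 24 bytes * 8 bits = 6400 bit/s = 6.4k (the standard names this mode 6.3 kbit/s, because only 189 of the 192 bits in each frame carry data)
G.729A (8k) – generates 10 bytes every 10 ms, so the rate is (1000 ms / 10 ms) * 10 bytes * 8 bits = 8000 bit/s = 8k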
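The same arithmetic can be written as a small Python sketch. The function name codec_bitrate and the CODECS table are made up for illustration; the frame sizes and durations are the ones quoted above, and the result is the payload rate only (no RTP/UDP/IP overhead).

```python
def codec_bitrate(frame_bytes: int, frame_ms: float) -> float:
    """Payload bit rate in bit/s for a codec that emits
    frame_bytes bytes once every frame_ms milliseconds."""
    frames_per_second = 1000.0 / frame_ms
    return frames_per_second * frame_bytes * 8  # 8 bits per byte


# (frame size in bytes, frame duration in ms), taken from the examples above
CODECS = {
    "G.723.1 (5.3k)": (20, 30),
    "G.723.1 (6.4k)": (24, 30),
    "G.729A (8k)": (10, 10),
}

if __name__ == "__main__":
    for name, (size, duration) in CODECS.items():
        print(f"{name}: {codec_bitrate(size, duration):.0f} bit/s")
```

Note that 1000/30 is not a whole number, so keeping the fraction matters: 20 bytes every 30 ms works out to about 5333 bit/s, not 33 * 160 = 5280.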
That should give you a clearer picture. It is hard to find this answer in the RFC documents.