HBM2

Image: AMD Fiji GPU package with GPU, HBM memory and interposer

Main memory on a PC sits in field-replaceable slots, while video cards have their memory soldered to the logic board. Both approaches limit bandwidth.

High Bandwidth Memory (HBM) was developed by AMD and Hynix starting in 2010, proposed as an industry standard, and adopted by JEDEC as JESD235 in October 2013. It is targeted at high-performance graphics accelerators and network devices. HBM achieves higher bandwidth with less power than DDR4 and GDDR5. To do this, it uses 3D packaging: several memory chips (dies) are stacked together and connected with many I/O through-silicon vias (TSVs) and thermal microbumps.

Hynix has recently added new stackable HBM2 memory to its catalog. The company suggests that stacking 4 high would provide much more VRAM, but current games are not short of video memory. The main advantage is more bandwidth, which can benefit complex shaders and general-purpose computing.

The new Hynix HBM2 can stack up to 8 memory chips. This has the potential for 1 TB/s of memory bandwidth. JEDEC has standardized it, so it is now ready for widespread use. AMD and NVIDIA are both looking at HBM2 for their top-of-the-line cards.
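To see where a figure like 1 TB/s could come from, here is a rough sketch of the peak-bandwidth arithmetic. The 1024-bit interface per stack, the 2 Gb/s per-pin rate and the four-stack layout are assumptions taken from commonly quoted HBM2 figures, not from this post, and the function name is made up for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits, pin_rate_gbit_s, stacks=1):
    """Theoretical peak bandwidth in GB/s: bits per transfer * rate / 8 bits per byte."""
    return bus_width_bits * pin_rate_gbit_s / 8 * stacks


# Assumed HBM2 figures (not stated in the post itself).
HBM2_BUS_BITS = 1024   # interface width of one stack, in bits
HBM2_PIN_GBPS = 2.0    # data rate per pin, in Gb/s

per_stack = peak_bandwidth_gb_s(HBM2_BUS_BITS, HBM2_PIN_GBPS)
four_stacks = peak_bandwidth_gb_s(HBM2_BUS_BITS, HBM2_PIN_GBPS, stacks=4)

print(f"one stack:   {per_stack:.0f} GB/s")
print(f"four stacks: {four_stacks:.0f} GB/s")
```

Under these assumptions the bandwidth comes from the width of the interface and the number of stacks, while the number of dies in each stack mainly affects capacity.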

In January 2016, Samsung began mass production of HBM2 memory. The chips are 8Gb parts, which should reduce overall power consumption significantly.

The AMD Fury was the first card to use first-generation HBM. AMD and NVIDIA are both designing new cards around this new generation of high bandwidth memory.

Back in 2015, AMD designed its flagship Fiji GPU with the memory beside the GPU inside the package. Placing the memory as close as possible simplifies board design and increases memory bandwidth considerably. AMD has also indicated that future CPU designs will take a similar approach, placing memory beside the primary logic inside the main package. This means that GPUs and CPUs will become hybrid devices, with more than one component inside the package.

The AMD Fury has 4096MB of VRAM, a limit set by the amount of space available and by the amount of heat generated. Smaller feature sizes will allow for some gains. HBM2 with stacked memory can now make 8GB and 16GB cards easily.
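The capacity numbers can be checked with a small calculation. This is only a sketch: the Fury layout of four stacks of four 2Gb dies and the 4-high HBM2 stacks are assumptions not stated in this post, while the 8Gb die size is the Samsung figure mentioned earlier. The function name is hypothetical.

```python
def card_capacity_gb(stacks, dies_per_stack, die_gbit):
    """Total VRAM in GB: stacks * dies per stack * gigabits per die / 8 bits per byte."""
    return stacks * dies_per_stack * die_gbit / 8


# Assumed first-generation HBM layout on the Fury: 4 stacks, 4 dies each, 2Gb dies.
fury_gb = card_capacity_gb(stacks=4, dies_per_stack=4, die_gbit=2)

# HBM2 with 8Gb dies, assuming 4-high stacks.
hbm2_two_stacks_gb = card_capacity_gb(stacks=2, dies_per_stack=4, die_gbit=8)
hbm2_four_stacks_gb = card_capacity_gb(stacks=4, dies_per_stack=4, die_gbit=8)

print(f"Fury (HBM):       {fury_gb:.0f} GB")
print(f"HBM2, two stacks: {hbm2_two_stacks_gb:.0f} GB")
print(f"HBM2, four stacks: {hbm2_four_stacks_gb:.0f} GB")
```

With denser dies, the same number of stacks and the same package area give four times the memory, which is why 8GB and 16GB configurations become practical.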

GDDR5X on a 256-bit bus can reach 384GB/s, which puts a lot of pressure on HBM2 and will affect its adoption and pricing. Right now the latest flagship GPUs are not bottlenecked by VRAM. Pascal and Polaris will both run well with GDDR5X on a 256-bit bus.
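As a quick check on the GDDR5X figure, the sketch below works backwards from 384GB/s on a 256-bit bus to the per-pin data rate it implies, then estimates how many HBM2 stacks would be needed to match it. The 256 GB/s per HBM2 stack is an assumed figure not taken from this post, and the function name is hypothetical.

```python
import math


def pin_rate_gbit_s(bandwidth_gb_s, bus_width_bits):
    """Per-pin data rate in Gb/s implied by a total bandwidth and a bus width."""
    return bandwidth_gb_s * 8 / bus_width_bits


GDDR5X_BANDWIDTH_GB_S = 384   # figure quoted in the post
GDDR5X_BUS_BITS = 256         # bus width quoted in the post
HBM2_STACK_GB_S = 256         # assumed peak bandwidth of one HBM2 stack

gddr5x_rate = pin_rate_gbit_s(GDDR5X_BANDWIDTH_GB_S, GDDR5X_BUS_BITS)
stacks_to_match = math.ceil(GDDR5X_BANDWIDTH_GB_S / HBM2_STACK_GB_S)

print(f"implied GDDR5X pin rate: {gddr5x_rate:.0f} Gb/s")
print(f"HBM2 stacks needed to match: {stacks_to_match}")
```

If the assumption holds, a two-stack HBM2 design would already exceed this GDDR5X configuration, so the competition is more about cost and packaging complexity than about raw bandwidth.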