Video compression algorithms exploit two types of redundancy: spatial redundancy (neighboring regions within a frame that look alike) and temporal redundancy (regions that change little or not at all from one frame to the next). H.264 and H.265 combine intra-frame (spatial) and inter-frame (temporal) prediction, and typically shrink video by roughly 100–1000× relative to the raw data, depending on content and quality settings.
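To make that ratio concrete, here is a back-of-the-envelope calculation in Python. The resolution, frame rate, bit depth, and the 5 Mbit/s encoded bitrate are illustrative assumptions, not figures taken from either standard.

```python
def raw_bitrate(width: int, height: int, fps: int, bits_per_pixel: int = 24) -> int:
    """Bits per second of uncompressed video (assumes 8-bit RGB by default)."""
    return width * height * bits_per_pixel * fps


def compression_ratio(raw_bps: float, encoded_bps: float) -> float:
    """How many times smaller the encoded stream is than the raw stream."""
    return raw_bps / encoded_bps


# Assumed example: 1080p at 30 fps, encoded at 5 Mbit/s.
raw = raw_bitrate(1920, 1080, 30)
ratio = compression_ratio(raw, 5_000_000)
print(f"raw: {raw / 1e6:.0f} Mbit/s, ratio: {ratio:.0f}x")
# prints "raw: 1493 Mbit/s, ratio: 299x"
```

Under these assumptions the ratio falls inside the 100–1000× range; a lower target bitrate or a higher resolution pushes it toward the upper end.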
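A key frame (I-frame) is coded on its own, without reference to any other frame, so it can be decoded in isolation. The frames that follow store only differences: P-frames are predicted from earlier frames, and B-frames from both earlier and later frames. This is why video is harder than audio to edit at arbitrary frame positions: a cut that does not land on a key frame leaves the following frames without the reference they depend on, so the editor has to decode from the previous key frame and re-encode around the cut.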
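The dependency between key frames and difference frames can be sketched with a toy encoder in Python. This is a deliberately simplified model, not how H.264 or H.265 work internally: real codecs use motion-compensated block prediction, transforms, and entropy coding, and this sketch omits B-frames entirely. The names `encode`, `decode_frame`, and `gop_size` are hypothetical, and frames are modeled as flat lists of pixel values.

```python
from typing import List, Tuple, Union

Frame = List[int]                  # one frame as a flat list of pixel values
Delta = List[Tuple[int, int]]      # (pixel index, new value) for changed pixels
Encoded = Tuple[str, Union[Frame, Delta]]


def encode(frames: List[Frame], gop_size: int = 4) -> List[Encoded]:
    """Store a full frame every `gop_size` frames, and only changes otherwise."""
    encoded: List[Encoded] = []
    for i, frame in enumerate(frames):
        if i % gop_size == 0:
            encoded.append(("I", list(frame)))           # self-contained key frame
        else:
            prev = frames[i - 1]
            delta = [(j, new) for j, (old, new) in enumerate(zip(prev, frame))
                     if old != new]
            encoded.append(("P", delta))                 # differences only
    return encoded


def decode_frame(encoded: List[Encoded], n: int) -> Tuple[Frame, int]:
    """Reconstruct frame n; also return how many stored frames had to be read."""
    start = n
    while encoded[start][0] != "I":                      # walk back to the key frame
        start -= 1
    frame = list(encoded[start][1])
    for _kind, delta in encoded[start + 1 : n + 1]:      # replay every delta up to n
        for j, value in delta:
            frame[j] = value
    return frame, n - start + 1
```

Decoding frame 3 with `gop_size=4` reads four stored frames (the key frame plus three deltas), whereas decoding frame 4 reads only one. That cost of reaching an arbitrary frame is the same one an editor pays when cutting between key frames.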