Learning Visual Correspondences Across Instances And Video Frames