Christos Tzelepis

I am a Senior Research Scientist at Samsung AI Center, Cambridge working with Dr. Georgios (Yorgos) Tzimiropoulos. Prior to that, I was a Lecturer in Computer Science at City, University of London (2023-2024) and a post-doctoral researcher at Queen Mary University of London (2018-2023) working with Prof. Ioannis (Yiannis) Patras. I received the Diploma degree in Electrical and Computer Engineering from Aristotle University of Thessaloniki, Greece (2011), and the Ph.D. degree in Machine Learning and Computer Vision from Queen Mary University of London (2018).

news

Jul 9, 2024	One paper accepted at MICCAI 2024.
Jun 1, 2024	One paper accepted at TPAMI (if=20.8)
Jan 26, 2024	One paper accepted at IJCV (if=19.5)
Nov 3, 2023	One paper accepted at WACV 2024
Sep 22, 2023	One paper accepted at NeurIPS 2023

selected publications

Bilinear Models of Parts and Appearances in Generative Adversarial Networks

James Oldfield, Christos Tzelepis, Yannis Panagakis, and 2 more authors

IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2024

Abs Bib PDF Code

Recent advances in the understanding of Generative Adversarial Networks (GANs) have led to remarkable progress in visual editing and synthesis tasks, capitalizing on the rich semantics that are embedded in the latent spaces of pre-trained GANs. However, existing methods are often tailored to specific GAN architectures and are limited to either discovering global semantic directions that do not facilitate localized control, or require some form of supervision through manually provided regions or segmentation masks. In this light, we present an architecture-agnostic approach that jointly discovers factors representing spatial parts and their appearances in an entirely unsupervised fashion. These factors are obtained by applying a semi-nonnegative tensor factorization on the feature maps, which in turn enables context-aware local image editing with pixel-level control. In addition, we show that the discovered appearance factors correspond to saliency maps that localize concepts of interest, without using any labels. Experiments on a wide range of GAN architectures and datasets show that, in comparison to the state of the art, our method is far more efficient in terms of training time and, most importantly, provides much more accurate localized control. Code to reproduce our results and explore our model is available at: http://github.com/james-oldfield/PandA
@article{oldfield2023tpami, title = {Bilinear Models of Parts and Appearances in Generative Adversarial Networks}, author = {Oldfield, James and Tzelepis, Christos and Panagakis, Yannis and Nicolaou, Mihalis A. and Patras, Ioannis}, journal = {{IEEE} Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)}, volume = {}, number = {}, pages = {}, year = {2024}, url = {}, doi = {} }
One-shot Neural Face Reenactment via Finding Directions in GAN’s Latent Space

Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, and 2 more authors

International Journal of Computer Vision (IJCV), 2024

Abs Bib PDF Code

In this paper, we present our framework for neural face/head reenactment whose goal is to transfer the 3D head orientation and expression of a target face to a source face. Previous methods focus on learning embedding networks for identity and head pose/expression disentanglement which proves to be a rather hard task, degrading the quality of the generated images. We take a different approach, bypassing the training of such networks, by using (fine-tuned) pre-trained GANs which have been shown capable of producing high-quality facial images. Because GANs are characterized by weak controllability, the core of our approach is a method to discover which directions in latent GAN space are responsible for controlling head pose and expression variations. We present a simple pipeline to learn such directions with the aid of a 3D shape model which, by construction, inherently captures disentangled directions for head pose, identity, and expression. Moreover, we show that by embedding real images in the GAN latent space, our method can be successfully used for the reenactment of real-world faces. Our method features several favorable properties including using a single source image (one-shot) and enabling cross-person reenactment. Extensive qualitative and quantitative results show that our approach typically produces reenacted faces of notably higher quality than those produced by state-of-the-art methods for the standard benchmarks of VoxCeleb1 and 2.
@article{bounareli2023ijcv, title = {One-shot Neural Face Reenactment via Finding Directions in {GAN}'s Latent Space}, author = {Bounareli, Stella and Tzelepis, Christos and Argyriou, Vasileios and Patras, Ioannis and Tzimiropoulos, Georgios}, journal = {International Journal of Computer Vision (IJCV)}, volume = {}, number = {}, pages = {}, year = {2024}, url = {}, doi = {} }
Parts of Speech-Grounded Subspaces in Vision-Language Models

James Oldfield, Christos Tzelepis, Yannis Panagakis, and 2 more authors

In Advances in Neural Information Processing Systems (NeurIPS), 2023

Abs Bib PDF Code Website

Latent image representations arising from vision-language models have proved immensely useful for a variety of downstream tasks. However, their utility is limited by their entanglement with respect to different visual attributes. For instance, recent work has shown that CLIP image representations are often biased toward specific visual properties (such as objects or actions) in an unpredictable manner. In this paper, we propose to separate representations of the different visual modalities in CLIP’s joint vision-language space by leveraging the association between parts of speech and specific visual modes of variation (e.g. nouns relate to objects, adjectives describe appearance). This is achieved by formulating an appropriate component analysis model that learns subspaces capturing variability corresponding to a specific part of speech, while jointly minimising variability to the rest. Such a subspace yields disentangled representations of the different visual properties of an image or text in closed form while respecting the underlying geometry of the manifold on which the representations lie. What’s more, we show the proposed model additionally facilitates learning subspaces corresponding to specific visual appearances (e.g. artists’ painting styles), which enables the selective removal of entire visual themes from CLIP-based text-to-image synthesis. We validate the model both qualitatively, by visualising the subspace projections with a text-to-image model and by preventing the imitation of artists’ styles, and quantitatively, through class invariance metrics and improvements to baseline zero-shot classification.
@inproceedings{oldfield2023part, title = {Parts of Speech-Grounded Subspaces in Vision-Language Models}, author = {Oldfield, James and Tzelepis, Christos and Panagakis, Yannis and Nicolaou, Mihalis A. and Patras, Ioannis}, booktitle = {Advances in Neural Information Processing Systems (NeurIPS)}, year = {2023} }
HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces

Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, and 2 more authors

2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023

Abs Bib PDF Code Website

In this paper, we present our method for neural face reenactment, called HyperReenact, that aims to generate realistic talking head images of a source identity, driven by a target facial pose. Existing state-of-the-art face reenactment methods train controllable generative models that learn to synthesize realistic facial images, yet producing reenacted faces that are prone to significant visual artifacts, especially under the challenging condition of extreme head pose changes, or requiring expensive few-shot fine-tuning to better preserve the source identity characteristics. We propose to address these limitations by leveraging the photorealistic generation ability and the disentangled properties of a pretrained StyleGAN2 generator, by first inverting the real images into its latent space and then using a hypernetwork to perform: (i) refinement of the source identity characteristics and (ii) facial pose re-targeting, eliminating this way the dependence on external editing methods that typically produce artifacts. Our method operates under the one-shot setting (i.e., using a single source frame) and allows for cross-subject reenactment, without requiring any subject-specific fine-tuning. We compare our method both quantitatively and qualitatively against several state-of-the-art techniques on the standard benchmarks of VoxCeleb1 and VoxCeleb2, demonstrating the superiority of our approach in producing artifact-free images, exhibiting remarkable robustness even under extreme head pose changes.
@article{bounareli2023iccv, title = {{HyperReenact}: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces}, author = {Bounareli, Stella and Tzelepis, Christos and Argyriou, Vasileios and Patras, Ioannis and Tzimiropoulos, Georgios}, journal = {2023 IEEE/CVF International Conference on Computer Vision (ICCV)}, volume = {}, number = {}, pages = {}, year = {2023}, url = {}, doi = {} }
Attribute-preserving Face Dataset Anonymization via Latent Code Optimization

Simone Barattin*, Christos Tzelepis*, Ioannis Patras, and 1 more author

In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) [* denotes co-first authorship], 2023

Abs Bib PDF Code

This work addresses the problem of anonymizing the identity of faces in a dataset of images, such that the privacy of those depicted is not violated, while at the same time the dataset is useful for downstream task such as for training machine learning models. To the best of our knowledge, we are the first to explicitly address this issue and deal with two major drawbacks of the existing state-of-the-art approaches, namely that they (i) require the costly training of additional, purpose-trained neural networks, and/or (ii) fail to retain the facial attributes of the original images in the anonymized counterparts, the preservation of which is of paramount importance for their use in downstream tasks. We accordingly present a task-agnostic anonymization procedure that directly optimizes the images’ latent representation in the latent space of a pre-trained GAN. By optimizing the latent codes directly, we ensure both that the identity is of a desired distance away from the original (with an identity obfuscation loss), whilst preserving the facial attributes (using a novel feature-matching loss in FaRL’s deep feature space). We demonstrate through a series of both qualitative and quantitative experiments that our method is capable of anonymizing the identity of the images whilst – crucially – better-preserving the facial attributes. We make the code and the pre-trained models publicly available.
@inproceedings{tzelepis2023falco, title = {Attribute-preserving Face Dataset Anonymization via Latent Code Optimization}, author = {Barattin*, Simone and Tzelepis*, Christos and Patras, Ioannis and Sebe, Nicu}, booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) [* denotes co-first authorship]}, year = {2023} }
PandA: Unsupervised Learning of Parts and Appearances in the Feature Maps of GANs

James Oldfield, Christos Tzelepis, Yannis Panagakis, and 2 more authors

In The Eleventh International Conference on Learning Representations , 2023

Abs Bib PDF Code Website

Recent advances in the understanding of Generative Adversarial Networks (GANs) have led to remarkable progress in visual editing and synthesis tasks, capitalizing on the rich semantics that are embedded in the latent spaces of pre-trained GANs. However, existing methods are often tailored to specific GAN architectures and are limited to either discovering global semantic directions that do not facilitate localized control, or require some form of supervision through manually provided regions or segmentation masks. In this light, we present an architecture-agnostic approach that jointly discovers factors representing spatial parts and their appearances in an entirely unsupervised fashion. These factors are obtained by applying a semi-nonnegative tensor factorization on the feature maps, which in turn enables context-aware local image editing with pixel-level control. In addition, we show that the discovered appearance factors correspond to saliency maps that localize concepts of interest, without using any labels. Experiments on a wide range of GAN architectures and datasets show that, in comparison to the state of the art, our method is far more efficient in terms of training time and, most importantly, provides much more accurate localized control. Our code is available at: https://github.com/james-oldfield/PandA.
@inproceedings{oldfield2023panda, title = {{PandA}: Unsupervised Learning of Parts and Appearances in the Feature Maps of {GAN}s}, author = {Oldfield, James and Tzelepis, Christos and Panagakis, Yannis and Nicolaou, Mihalis and Patras, Ioannis}, booktitle = {The Eleventh International Conference on Learning Representations }, year = {2023}, url = {https://openreview.net/forum?id=iUdSB2kK9GY} }
DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval

Giorgos Kordopatis-Zilos, Christos Tzelepis, Symeon Papadopoulos, and 2 more authors

International Journal of Computer Vision (IJCV), 2022

Abs Bib PDF Code

In this paper, we address the problem of high performance and computationally efficient content-based video retrieval in large-scale datasets. Current methods typically propose either: (i) fine-grained approaches employing spatio-temporal representations and similarity calculations, achieving high performance at a high computational cost or (ii) coarse-grained approaches representing/indexing videos as global vectors, where the spatio-temporal structure is lost, providing low performance but also having low computational cost. In this work, we propose a Knowledge Distillation framework, called Distill-and-Select (DnS), that starting from a well-performing fine-grained Teacher Network learns: a) Student Networks at different retrieval performance and computational efficiency trade-offs and b) a Selector Network that at test time rapidly directs samples to the appropriate student to maintain both high retrieval performance and high computational efficiency. We train several students with different architectures and arrive at different trade-offs of performance and efficiency, i.e., speed and storage requirements, including fine-grained students that store/index videos using binary representations. Importantly, the proposed scheme allows Knowledge Distillation in large, unlabelled datasets – this leads to good students. We evaluate DnS on five public datasets on three different video retrieval tasks and demonstrate a) that our students achieve state-of-the-art performance in several cases and b) that the DnS framework provides an excellent trade-off between retrieval performance, computational speed, and storage space. In specific configurations, the proposed method achieves similar mAP with the teacher but is 20 times faster and requires 240 times less storage space. The collected dataset and implementation are publicly available at https://github.com/mever-team/distill-and-select.
@article{kordopatis2022dns, title = {{DnS}: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval}, author = {Kordopatis{-}Zilos, Giorgos and Tzelepis, Christos and Papadopoulos, Symeon and Kompatsiaris, Ioannis and Patras, Ioannis}, journal = {International Journal of Computer Vision (IJCV)}, volume = {130}, number = {10}, pages = {2385--2407}, year = {2022}, url = {https://doi.org/10.1007/s11263-022-01651-3}, doi = {10.1007/s11263-022-01651-3} }
WarpedGANSpace: Finding non-linear RBF paths in GAN latent space

Christos Tzelepis, Georgios Tzimiropoulos, and I. Patras

2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021

Abs Bib PDF Code

This work addresses the problem of discovering, in an unsupervised manner, interpretable paths in the latent space of pretrained GANs, so as to provide an intuitive and easy way of controlling the underlying generative factors. In doing so, it addresses some of the limitations of the state-of-the-art works, namely, a) that they discover directions that are independent of the latent code, i.e., paths that are linear, and b) that their evaluation relies either on visual inspection or on laborious human labeling. More specifically, we propose to learn non-linear warpings on the latent space, each one parametrized by a set of RBF-based latent space warping functions, and where each warping gives rise to a family of non-linear paths via the gradient of the function. Building on the work of Voynov and Babenko, that discovers linear paths, we optimize the trainable parameters of the set of RBFs, so as that images that are generated by codes along different paths, are easily distinguishable by a discriminator network. This leads to easily distinguishable image transformations, such as pose and facial expressions in facial images. We show that linear paths can be derived as a special case of our method, and show experimentally that non-linear paths in the latent space lead to steeper, more disentangled and interpretable changes in the image space than in state-of-the art methods, both qualitatively and quantitatively. We make the code and the pretrained models publicly available at https://github.com/chi0tzp/WarpedGANSpace.
@article{tzelepis2021warpedganspace, title = {{WarpedGANSpace}: Finding non-linear {RBF} paths in {GAN} latent space}, author = {Tzelepis, Christos and Tzimiropoulos, Georgios and Patras, I.}, journal = {2021 IEEE/CVF International Conference on Computer Vision (ICCV)}, year = {2021}, pages = {6373-6382} }
Linear Maximum Margin Classifier for Learning from Uncertain Data

Christos Tzelepis, Vasileios Mezaris, and Ioannis Patras

IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2018

Abs Bib PDF Code

In this paper, we propose a maximum margin classifier that deals with uncertainty in data input. More specifically, we reformulate the SVM framework such that each training example can be modeled by a multi-dimensional Gaussian distribution described by its mean vector and its covariance matrix-the latter modeling the uncertainty. We address the classification problem and define a cost function that is the expected value of the classical SVM cost when data samples are drawn from the multi-dimensional Gaussian distributions that form the set of the training examples. Our formulation approximates the classical SVM formulation when the training examples are isotropic Gaussians with variance tending to zero. We arrive at a convex optimization problem, which we solve efficiently in the primal form using a stochastic gradient descent approach. The resulting classifier, which we name SVM with Gaussian Sample Uncertainty (SVM-GSU), is tested on synthetic data and five publicly available and popular datasets; namely, the MNIST, WDBC, DEAP, TV News Channel Commercial Detection, and TRECVID MED datasets. Experimental results verify the effectiveness of the proposed method.
@article{tzelepis2018svmgsu, title = {Linear Maximum Margin Classifier for Learning from Uncertain Data}, author = {Tzelepis, Christos and Mezaris, Vasileios and Patras, Ioannis}, journal = {{IEEE} Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)}, volume = {40}, number = {12}, pages = {2948--2962}, year = {2018}, url = {https://doi.org/10.1109/TPAMI.2017.2772235}, doi = {10.1109/TPAMI.2017.2772235} }