Publications (Google Scholar Profile)

    Journal Publications

    1. Deep learning-based quality enhancement for 3D point clouds: a survey
      Jianwen Chen, Lili Zhao, Lancao Ren, Zhuoqun Sun, Xinfeng Zhang and Siwei Ma.
      Journal of Image and Graphics, vol. 28, no. 11, pp. 3295-3319, 2023.
      Abstract: With the development of 3D detection technologies, point clouds have gradually become one of the most common data representations of 3D objects or scenes that are widely used in many applications, such as autonomous driving, augmented reality (AR), and virtual reality (VR). However, due to limitations in hardware, environment, and occlusion, the acquired point clouds are usually sparse, noisy, and uneven, hence imposing great challenges to the processing and analysis of point clouds. Therefore, point cloud quality enhancement techniques, which aim to process the original point cloud to obtain a dense, clean, and structurally complete point cloud, are of great significance. In recent years, with the development of hardware and machine learning technologies, deep-learning-based point cloud quality enhancement methods, which have great potential to extract the features of point clouds, have attracted the attention of researchers worldwide. Related works mainly focus on point cloud completion, point cloud upsampling (also known as super-resolution), and point cloud denoising. Point cloud completion fills the incomplete point clouds to restore the complete point cloud information, while point cloud upsampling increases the point number of the original point cloud to obtain a denser point cloud, and point cloud denoising removes the noisy points in the point cloud to obtain a cleaner point cloud. This paper systematically reviews the existing point cloud quality enhancement methods based on deep learning to offer a basis for subsequent research. First, this study briefly introduces the fundamentals and key technologies that are widely used in point cloud analysis. Second, three types of point cloud quality enhancement technologies, namely, upsampling, completion, and denoising, are introduced, classified, and summarized. According to the types of input data, point cloud completion methods can be divided into voxel- and point-based algorithms, with the latter being further sub-divided into two types depending on whether the encoder-decoder structure is exploited or not. The encoder-decoder-structure-based algorithms can be further divided according to whether the generative adversarial network (GAN) structure is used. Point cloud upsampling methods can be classified into convolutional neural network (CNN)-based algorithms, GAN-based algorithms, and graph convolutional network (GCN)-based algorithms. Point cloud denoising methods can also be divided into two types based on whether the encoder-decoder structure is exploited or not. Third, the commonly used datasets and evaluation metrics in point cloud quality enhancement tasks are summarized. The performance evaluation metrics for geometry reconstruction mainly include chamfer distance, earth mover's distance, Hausdorff distance, and point-to-surface distance. This paper then compares the state-of-the-art algorithms of point cloud completion and upsampling on common datasets and identifies the reasons for the differences in their performance. The recent progress and challenges in the field are then summarized, and future research trends are proposed. The findings are summarized as follows: 1) The point cloud features extracted by existing deep learning-based algorithms are highly global, which means that the local features related to the detailed structure cannot be captured well, thus resulting in poor detail reconstruction. Traditional geometric algorithms are known to effectively represent data features based on geometric information. Therefore, how to combine geometric algorithms with deep learning for point cloud quality enhancement is worth exploring. 2) Most algorithms are for dense point clouds of single objects, and only a few studies have focused on sparse LiDAR point clouds containing large-scale outdoor scenes. 3) Most of the related studies only consider the point cloud processing of a single frame and ignore the temporal correlation of point cloud sequences. Therefore, how to utilize the spatial-temporal correlation to improve quality enhancement performance warrants further investigation. 4) In existing methods, the proposed network models are often complex and the inference speed is relatively slow, which fails to meet the real-time requirements of several applications. Therefore, how to further reduce the scale of the model parameters and improve the inference speed is a research direction worth exploring. 5) Most of the existing methods only process the geometric information (3D coordinates) of point clouds and ignore the attribute information (e.g., color and intensity). Therefore, how to simultaneously enhance the quality of geometric and attribute information needs to be explored. Project page: https://github.com/LilydotEE/Point_cloud_quality_enhancement.
      	@ARTICLE{csig_zhao,
      		author={Chen, Jianwen and Zhao, Lili and Ren, Lancao and Sun, Zhuoqun and Zhang, Xinfeng and Ma, Siwei},
      		journal={Journal of Image and Graphics}, 
      		title={Deep learning-based quality enhancement for 3D point clouds: a survey}, 
      		year={2023},
      		volume={28},
      		number={11},
      		pages={3295-3319},
      		doi={10.11834/jig.221076}}
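      A quick note on the geometry metrics surveyed above: the chamfer distance between two point sets can be computed as below. This is a minimal NumPy sketch for illustration only (brute-force, squared-distance variant; it is not code from the survey).

        import numpy as np

        def chamfer_distance(p, q):
            """Symmetric chamfer distance between point sets p (N, 3) and q (M, 3).

            Brute-force, squared-distance variant; real evaluations typically use
            KD-trees or GPU kernels, and some works take the square root instead.
            """
            d = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)  # (N, M) pairwise distances
            return d.min(axis=1).mean() + d.min(axis=0).mean()

        # Example: a cloud versus a slightly jittered copy of itself
        rng = np.random.default_rng(0)
        a = rng.random((1024, 3))
        print(chamfer_distance(a, a + 0.001 * rng.standard_normal(a.shape)))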
    2. Real-Time LiDAR Point Cloud Compression Using Bi-Directional Prediction and Range-Adaptive Floating-Point Coding
      Lili Zhao, Kai-Kuang Ma, Xuhu Lin, Wenyi Wang, and Jianwen Chen.
      IEEE Transactions on Broadcasting, vol. 68, no. 3, pp. 620-635, Sept. 2022.
      Abstract: Due to the large amount of data involved in three-dimensional (3D) LiDAR point clouds, point cloud compression (PCC) becomes indispensable to many real-time applications. In autonomous driving of connected vehicles, for example, point clouds are constantly acquired over time and need to be compressed. Among the existing PCC methods, very few have effectively removed the temporal redundancy inherent in the point clouds. To address this issue, a novel lossy LiDAR PCC system is proposed in this paper, which consists of inter-frame coding and intra-frame coding. For the former, a deep-learning approach is proposed to conduct bi-directional frame prediction using an asymmetric residual module and 3D space-time convolutions; the proposed network is called the bi-directional prediction network (BPNet). For the latter, a novel range-adaptive floating-point coding (RAFC) algorithm is proposed for encoding the reference frames and the B-frame prediction residuals in 32-bit floating-point precision. Since the pixel-value distributions of these two types of data are quite different, various encoding modes are designed for adaptive selection. Extensive simulation experiments have been conducted using multiple point cloud datasets, and the results clearly show that our proposed PCC system consistently outperforms the state-of-the-art MPEG G-PCC in terms of data fidelity and localization, while delivering real-time performance.
      	@article{TBC_zhao2022,
      		author={Zhao, Lili and Ma, Kai-Kuang and Lin, Xuhu and Wang, Wenyi and Chen, Jianwen},
      		journal={IEEE Transactions on Broadcasting}, 
      		title={Real-Time LiDAR Point Cloud Compression Using Bi-Directional Prediction and Range-Adaptive Floating-Point Coding}, 
      		year={2022},
      		volume={68},
      		number={3},
      		pages={620-635},
      		doi={10.1109/TBC.2022.3162406}}
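      As generic background for the floating-point coding idea above, the sketch below splits float32 values into their IEEE-754 sign, exponent, and mantissa bit-fields (NumPy). It is illustrative only; how RAFC groups these fields and selects encoding modes range-adaptively is specific to the paper.

        import numpy as np

        def float32_fields(x):
            """Return the sign, exponent, and mantissa bit-fields of float32 values.

            Generic IEEE-754 decomposition, not the paper's RAFC algorithm.
            """
            bits = np.asarray(x, dtype=np.float32).reshape(-1).view(np.uint32)
            sign = bits >> 31
            exponent = (bits >> 23) & 0xFF
            mantissa = bits & 0x7FFFFF
            return sign, exponent, mantissa

        print(float32_fields([12.375, -0.5]))  # e.g. exponent fields 130 and 126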
      
    3. Real-Time Scene-Aware LiDAR Point Cloud Compression Using Semantic Prior Representation
      Lili Zhao, Kai-Kuang Ma, Zhili Liu, Qian Yin, and Jianwen Chen.
      IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 8, pp. 5623-5637, Aug. 2022.
      Abstract: Existing LiDAR point cloud compression (PCC) methods tend to treat compression as a fidelity issue, without sufficiently addressing its machine perception aspect. The latter issue is often encountered by decoder agents that might aim to conduct scene-understanding related tasks only, such as computing the localization information. For tackling this challenge, a novel LiDAR PCC system is proposed to compress the point cloud geometry, which contains a back channel for allowing the decoder to initiate such a request to the encoder. The key success of our PCC method lies in our proposed semantic prior representation (SPR) and its lossy encoding algorithm with variable precision to generate the final bitstream; the entire process is fast and achieves real-time performance. Note that our SPR is a compact and effective representation of three-dimensional (3D) input point clouds, and it consists of labels, predictions, and residuals. This information can be generated by first applying scene-aware object segmentation to a set of 2D range images (frames) individually, which are generated from the 3D point clouds via a projection process. Based on the generated labels, the pixels associated with moving objects are considered as noisy information and should be removed, not only for saving bit budget on transmission but also, most importantly, for improving the accuracy of localization computed at the decoder. Experimental results conducted on the commonly-used test dataset have shown that our proposed system outperforms the MPEG's G-PCC (TMC13-v14.0) over a large bitrate range. In fact, the performance gap becomes even larger when more and/or larger moving objects are involved in the input point clouds.
      	@ARTICLE{CSVT_zhao,
      		author={Zhao, Lili and Ma, Kai-Kuang and Liu, Zhili and Yin, Qian and Chen, Jianwen},
      		journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
      		title={Real-Time Scene-Aware LiDAR Point Cloud Compression Using Semantic Prior Representation}, 
      		year={2022},
      		volume={32},
      		number={8},
      		pages={5623-5637},
      		doi={10.1109/TCSVT.2022.3145513}}
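      The projection from a 3D point cloud to 2D range images mentioned above is the usual spherical mapping; a minimal NumPy sketch follows, with assumed sensor field-of-view values (they are not taken from the paper).

        import numpy as np

        def to_range_image(points, h=64, w=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
            """Project an (N, 3) LiDAR point cloud onto an h x w range image."""
            fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
            fov = fov_up - fov_down
            x, y, z = points[:, 0], points[:, 1], points[:, 2]
            r = np.linalg.norm(points, axis=1) + 1e-8
            u = 0.5 * (1.0 - np.arctan2(y, x) / np.pi) * w       # azimuth  -> column
            v = (1.0 - (np.arcsin(z / r) - fov_down) / fov) * h  # elevation -> row
            u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
            v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)
            img = np.zeros((h, w), dtype=np.float32)
            img[v, u] = r  # points falling on the same pixel overwrite each other
            return img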
    4. A compatible framework for RGB-D SLAM in dynamic scenes
      Lili Zhao, Zhili Liu, Jianwen Chen, Weitong Cai, Wenyi Wang, and Liaoyuan Zeng.
      IEEE Access, vol. 7, pp. 75604-75614, 2019.
      Abstract: Localization and mapping in a dynamic scene is a crucial problem for an indoor visual simultaneous localization and mapping (SLAM) system. Most existing visual odometry (VO) or SLAM systems are based on the assumption that the environment is static, and their performance may degenerate when operated in a severely dynamic environment. This assumption limits the application of RGB-D SLAM in dynamic environments. In this paper, we propose a workflow to accurately segment the objects that are marked as potentially dynamic-object areas based on semantic information. A novel approach for motion detection and removal from the moving camera is introduced. We integrate the semantics-based motion detection and the segmentation approach with an RGB-D SLAM system. To evaluate the effectiveness of the proposed approach, we conduct experiments on the challenging dynamic sequences of the TUM RGB-D dataset. The experimental results suggest that our approach improves the accuracy of localization and outperforms the state-of-the-art dynamic-removal-based SLAM systems in both severely dynamic and slightly dynamic scenes.
      	@ARTICLE{Zhao_2019,
      		author={Zhao, Lili and Liu, Zhili and Chen, Jianwen and Cai, Weitong and Wang, Wenyi and Zeng, Liaoyuan},
      		journal={IEEE Access}, 
      		title={A Compatible Framework for RGB-D SLAM in Dynamic Scenes}, 
      		year={2019},
      		volume={7},
      		number={},
      		pages={75604-75614},
      		doi={10.1109/ACCESS.2019.2922733}}

    Conference Publications

    1. Learning Spatial-Temporal Embeddings for Sequential Point Cloud Frame Interpolation
      Lili Zhao, Zhuoqun Sun, Lancao Ren, Qian Yin, Lei Yang, and Meng Guo.
      IEEE International Conference on Image Processing (ICIP), 2023.
      Abstract: A point cloud sequence is usually acquired at a low frame rate owing to limitations of the sensing equipment. Consequently, the immersive experience of virtual reality might be greatly degraded. To tackle this issue, a point cloud frame interpolation process can be used to increase the frame rate of the acquired point cloud sequence by generating new frames between the consecutive ones. However, it is still challenging for deep neural networks to synthesize high-fidelity point clouds, especially for those with complex geometric details and large motion. In this paper, a novel frame interpolation network is proposed, which jointly exploits spatial features and scene flows. The key success of our method lies in the developed spatial-temporal feature propagation module and temporal-aware feature-to-point mapping module. The former effectively embeds the spatial features and scene flows into a spatial-temporal feature representation (STFR). The latter generates a much improved target frame from the STFR. Extensive experimental results have demonstrated that our method achieves the best performance in most cases.
      	@INPROCEEDINGS{Zhao_icip23,
      		author={Zhao, Lili and Sun, Zhuoqun and Ren, Lancao and Yin, Qian and Yang, Lei and Guo, Meng},
      		booktitle={2023 IEEE International Conference on Image Processing (ICIP)}, 
      		title={Learning Spatial-Temporal Embeddings for Sequential Point Cloud Frame Interpolation}, 
      		year={2023},
      		volume={},
      		number={},
      		pages={810-814},
      		doi={10.1109/ICIP49359.2023.10221958}}
    2. Spatial-Temporal Consistency Refinement Network for Dynamic Point Cloud Frame Interpolation
      Lancao Ren, Lili Zhao, Zhuoqun Sun, Zhipeng Zhang, and Jianwen Chen.
      IEEE International Conference on Multimedia and Expo Workshops (ICMEW), 2023.
      Abstract: Point cloud frame interpolation aims to improve the frame rate of a point cloud sequence by synthesising intermediate frames between consecutive frames. Most of the existing works only use the scene flow or features, not fully exploring their local geometry context or temporal correlation, which results in inaccurate local structural details or motion estimation. In this paper, we organically combine scene flows and features to propose a two-stage network based on residual learning, which can generate spatially and temporally consistent interpolated frames. In Stage 1, we propose the spatial-temporal warping module to effectively integrate multi-scale local and global spatial features and temporal correlation into a fusion feature, and then transform it into a coarse interpolated frame. In Stage 2, we introduce the residual-learning structure to conduct spatial-temporal consistency refinement. A temporal-aware feature aggregation module is proposed, which helps the network adaptively adjust the contributions of spatial features from the input frames and predict point-wise offsets to compensate for coarse estimation errors. The experimental results demonstrate that our method achieves state-of-the-art performance on most benchmarks under various interpolation modes. Code is available at https://github.com/renlancao/SR-Net.
      	@INPROCEEDINGS{Ren_icme23,
      		author={Ren, Lancao and Zhao, Lili and Sun, Zhuoqun and Zhang, Zhipeng and Chen, Jianwen},
      		booktitle={2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)}, 
      		title={Spatial-Temporal Consistency Refinement Network for Dynamic Point Cloud Frame Interpolation}, 
      		year={2023},
      		volume={},
      		number={},
      		pages={428-433},
      		doi={10.1109/ICMEW59549.2023.00080}}
    3. Rangeinet: Fast Lidar Point Cloud Temporal Interpolation
      Lili Zhao, Xuhu Lin, Wenyi Wang, Kai-Kuang Ma, and Jianwen Chen.
      IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
      Abstract: Due to the low scan rate of LiDAR sensors, LiDAR point cloud streams usually have a low frame rate, which is far below that of other sensors such as cameras. This could incur a frame-rate mismatch when conducting multi-sensor data fusion. LiDAR point cloud temporal interpolation aims to synthesize the non-existing intermediate frame between input frames to improve the frame rate of point clouds. However, the existing methods heavily depend on 3D scene flow or 2D flow estimation, which incurs huge computational complexity and hinders real-time applications. To resolve this issue, we propose a fast method that does not involve flow estimation, which analyzes the LiDAR point cloud by exploiting its corresponding 2D range images (RIs). Specifically, we develop a Siamese context extractor containing asymmetrical convolution kernels to learn the shape context and spatial features of RIs, and 3D space-time convolutions are introduced to precisely capture the temporal characteristics. Experimental results have clearly shown that our method is much faster than the state-of-the-art LiDAR point cloud temporal interpolation methods on various datasets, while delivering either comparable or superior frame interpolation performance.
      	@INPROCEEDINGS{icassp2022,
      		author={Zhao, Lili and Lin, Xuhu and Wang, Wenyi and Ma, Kai-Kuang and Chen, Jianwen},
      		booktitle={Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
      		title={Rangeinet: Fast Lidar Point Cloud Temporal Interpolation}, 
      		year={2022},
      		volume={},
      		number={},
      		pages={2584-2588},
      		doi={10.1109/ICASSP43922.2022.9747825}}
    4. MVFI-Net: Motion-aware Video Frame Interpolation Network
      Xuhu Lin, Lili Zhao, Xi Liu, and Jianwen Chen.
      Asian Conference on Computer Vision (ACCV), 2022.
      Abstract: Video frame interpolation (VFI) aims to synthesize the intermediate frame between successive frames. Most existing learning-based VFI methods generate each target pixel by using the warping operation with either one predicted kernel or flow, or both. However, their performance is often degraded by the limited direction and scope of the reference regions, especially when encountering complex motions. In this paper, we propose a novel motion-aware VFI network (MVFI-Net) to address these issues. One of the key novelties of our method lies in the newly developed warping operation, i.e., motion-aware convolution (MAC). By predicting multiple extensible temporal motion vectors (MVs) and filter kernels for each target pixel, the direction and scope can be enlarged simultaneously. In addition, we make the first attempt to incorporate the pyramid structure into kernel-based VFI, which can decompose large motions into smaller scales to improve the prediction efficiency. The quantitative and qualitative experimental results have demonstrated that the proposed method delivers state-of-the-art performance on diverse benchmarks with various resolutions.
      	@InProceedings{Lin_2022_ACCV,
      		author    = {Lin, XuHu and Zhao, Lili and Liu, Xi and Chen, Jianwen},
      		title     = {MVFI-Net: Motion-aware Video Frame Interpolation Network},
      		booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
      		month     = {December},
      		year      = {2022},
      		pages     = {3690-3706}
      	}
    5. Deep Inter Prediction via Reference Frame Interpolation for Blurry Video Coding
      Zezhi Zhu, Lili Zhao, Xuhu Lin, Xuezhou Guo, and Jianwen Chen.
      IEEE International Conference on Visual Communications and Image Processing (VCIP), Munich, Germany, 2021.
      Abstract: In High Efficiency Video Coding (HEVC), inter prediction is an important module for removing temporal redundancy. The accuracy of inter prediction is much affected by the similarity between the current and reference frames. However, for blurry videos, the performance of inter coding will be degraded by varying motion blur, which arises from camera shake or the acceleration of objects in the scene. To address this problem, we propose to synthesize an additional reference frame via a frame interpolation network. The synthesized reference frame is added into the reference picture lists to supply a more credible reference candidate, and the searching mechanism for motion candidates is changed accordingly. In addition, to make our interpolation network more robust to various inputs with different compression artifacts, we establish a new blurry video database to train our network. With the well-trained frame interpolation network, compared with the reference software HM-16.9, the proposed method achieves on average 1.55% BD-rate reduction under the random access (RA) configuration for blurry videos, and also obtains on average 0.75% BD-rate reduction for common test sequences.
      	@INPROCEEDINGS{vcip2021,
      		author={Zhu, Zezhi and Zhao, Lili and Lin, Xuhu and Guo, Xuezhou and Chen, Jianwen},
      		booktitle={Proceedings of the International Conference on Visual Communications and Image Processing (VCIP)}, 
      		title={Deep Inter Prediction via Reference Frame Interpolation for Blurry Video Coding}, 
      		year={2021},
      		volume={},
      		number={},
      		pages={1-5},
      		doi={10.1109/VCIP53242.2021.9675429}}
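      The BD-rate figures quoted above follow the standard Bjontegaard metric; below is a compact NumPy sketch of the usual cubic-fit computation (illustrative only, not the reference script used for the paper's results).

        import numpy as np

        def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
            """Bjontegaard delta rate (%) of a test codec versus an anchor.

            Negative values mean the test codec saves bitrate at equal quality.
            Each curve needs at least four rate-distortion points.
            """
            la, lt = np.log10(rate_anchor), np.log10(rate_test)
            pa = np.polyfit(psnr_anchor, la, 3)  # log-rate as a cubic in PSNR
            pt = np.polyfit(psnr_test, lt, 3)
            lo = max(min(psnr_anchor), min(psnr_test))
            hi = min(max(psnr_anchor), max(psnr_test))
            int_a = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
            int_t = np.polyval(np.polyint(pt), hi) - np.polyval(np.polyint(pt), lo)
            avg_diff = (int_t - int_a) / (hi - lo)
            return (10 ** avg_diff - 1) * 100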
    6. Lossless Point Cloud Attribute Compression with Normal-based Intra Prediction
      Qian Yin, Qingshan Ren, Lili Zhao, Wenyi Wang, and Jianwen Chen.
      IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Chengdu, China, 2021.
      Abstract: Sparse LiDAR point clouds are becoming increasingly popular in various applications, e.g., autonomous driving. However, for this type of data, there exists much under-explored space in the corresponding compression framework proposed by MPEG, i.e., geometry-based point cloud compression (G-PCC). In G-PCC, only the distance-based similarity is considered in the intra prediction for attribute compression. In this paper, we propose a normal-based intra prediction scheme, which provides more efficient lossless attribute compression by introducing point cloud normals. The angle between normals is used to further explore accurate local similarity, which optimizes the selection of predictors. We implement our method in the G-PCC reference software. Experimental results over LiDAR-acquired datasets demonstrate that our proposed method is able to deliver better compression performance than the G-PCC anchor, with 2.1% gains on average for lossless attribute coding.
      	@INPROCEEDINGS{bmsb1,
      		author={Yin, Qian and Ren, Qingshan and Zhao, Lili and Wang, Wenyi and Chen, Jianwen},
      		booktitle={Proceedings of the IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)}, 
      		title={Lossless Point Cloud Attribute Compression with Normal-based Intra Prediction}, 
      		year={2021},
      		volume={},
      		number={},
      		pages={1-5},
      		doi={10.1109/BMSB53066.2021.9547021}}
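      The normal-based similarity described above reduces to measuring the angle between point normals; a minimal NumPy sketch is given below (the actual predictor-selection rule inside G-PCC is specific to the paper).

        import numpy as np

        def normal_angle(n1, n2):
            """Angle (radians) between two point normals."""
            n1, n2 = n1 / np.linalg.norm(n1), n2 / np.linalg.norm(n2)
            return np.arccos(np.clip(np.dot(n1, n2), -1.0, 1.0))

        def pick_predictor(cur_normal, candidate_normals):
            """Pick the candidate whose normal is most aligned with the current point's."""
            angles = [normal_angle(cur_normal, n) for n in candidate_normals]
            return int(np.argmin(angles))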
    7. RAI-Net: Range-Adaptive LiDAR Point Cloud Frame Interpolation Network
      Lili Zhao, Zezhi Zhu, Xuhu Lin, Xuezhou Guo, Qian Yin, Wenyi Wang, and Jianwen Chen.
      IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Chengdu, China, 2021.
      Abstract: LiDAR point cloud frame interpolation, which synthesizes the intermediate frame between the captured frames, has emerged as an important issue for many applications. In particular, for reducing the amount of point cloud data to be transmitted, the intermediate frame can be predicted from the reference frames to upsample the data to a higher frame rate. However, due to the high-dimensional and sparse characteristics of point clouds, it is more difficult to predict the intermediate frame for LiDAR point clouds than for videos. In this paper, we propose a novel LiDAR point cloud frame interpolation method, which exploits range images (RIs) as an intermediate representation with CNNs to conduct the frame interpolation process. Considering that the inherent characteristics of RIs differ from those of color images, we introduce spatially adaptive convolutions to extract range features adaptively, while a highly efficient flow estimation method is presented to generate optical flows. The proposed model then warps the input frames and range features based on the optical flows to synthesize the interpolated frame. Extensive experiments on the KITTI dataset have clearly demonstrated that our method consistently achieves superior frame interpolation results with better perceptual quality than that of state-of-the-art video frame interpolation methods. The proposed method could be integrated into any LiDAR point cloud compression system for inter prediction.
      	@INPROCEEDINGS{bmsb2,
      		author={Zhao, Lili and Zhu, Zezhi and Lin, Xuhu and Guo, Xuezhou and Yin, Qian and Wang, Wenyi and Chen, Jianwen},
      		booktitle={Proceedings of the IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB)}, 
      		title={RAI-Net: Range-Adaptive LiDAR Point Cloud Frame Interpolation Network}, 
      		year={2021},
      		volume={},
      		number={},
      		pages={1-6},
      		doi={10.1109/BMSB53066.2021.9547131}}
    8. An Unsupervised Optical Flow Estimation for Lidar Image Sequences
      Xuezhou Guo, Xuhu Lin, Lili Zhao, Zezhi Zhu, and Jianwen Chen.
      IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 2021.
      Abstract: In recent years, LiDAR images, as a compact 2D representation of 3D LiDAR point clouds, have been widely applied in various tasks, e.g., 3D semantic segmentation and LiDAR point cloud compression (PCC). Among these works, optical flow estimation for LiDAR image sequences has become a key issue, especially for the motion estimation of the inter prediction in PCC. However, the existing optical flow estimation models are likely to be unreliable for LiDAR images. In this work, we first propose a light-weight flow estimation model for LiDAR image sequences. The key novelty of our method lies in two aspects. One is that, considering the different characteristics of LiDAR images (with spatially varying feature distributions) w.r.t. normal color images, we introduce an attention mechanism into our model to improve the quality of the estimated flow. The other is that, to tackle the lack of large-scale LiDAR-image annotations, we present an unsupervised method that directly minimizes the inconsistency between the reference image and the image reconstructed from the estimated optical flow. Extensive experimental results have shown that our proposed model outperforms other mainstream models on the KITTI dataset, with much fewer parameters.
      	@INPROCEEDINGS{Guo2019,
      		author={Guo, Xuezhou and Lin, Xuhu and Zhao, Lili and Zhu, Zezhi and Chen, Jianwen},
      		booktitle={Proceedings of the IEEE International Conference on Image Processing (ICIP)}, 
      		title={An Unsupervised Optical Flow Estimation for Lidar Image Sequences}, 
      		year={2021},
      		volume={},
      		number={},
      		pages={2613-2617},
      		doi={10.1109/ICIP42928.2021.9506376}}
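      The unsupervised objective described above is, in its simplest form, a photometric reconstruction loss under an estimated flow field; the NumPy sketch below assumes single-channel range images and is illustrative only (it is not the paper's model or training code).

        import numpy as np

        def warp(img, flow):
            """Backward-warp img (H, W) by flow (H, W, 2) with bilinear sampling."""
            h, w = img.shape
            gy, gx = np.mgrid[0:h, 0:w].astype(np.float32)
            x = np.clip(gx + flow[..., 0], 0, w - 1)
            y = np.clip(gy + flow[..., 1], 0, h - 1)
            x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
            x1, y1 = np.clip(x0 + 1, 0, w - 1), np.clip(y0 + 1, 0, h - 1)
            wx, wy = x - x0, y - y0
            top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
            bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
            return top * (1 - wy) + bot * wy

        def photometric_loss(ref, tgt, flow):
            """Mean absolute inconsistency between ref and tgt warped towards it."""
            return np.mean(np.abs(ref - warp(tgt, flow)))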
    9. PRED: A Parallel Network for Handling Multiple Degradations via Single Model in Single Image Super-Resolution
      Guangyang Wu, Lili Zhao, Wenyi Wang, Liaoyuan Zeng, and Jianwen Chen.
      IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 2019.
      Abstract: Existing SISR (single image super-resolution) methods mostly assume that a low-resolution (LR) image is bicubicly downsampled from its high-resolution (HR) counterpart, which inevitably gives rise to poor performance when the actual degradation deviates from this assumption. To address this issue, we propose PRED (parallel residual and encoder-decoder network), a framework with an innovative training strategy to enhance robustness to multiple degradations. Consequently, the network can handle spatially variant degradations, which significantly improves the practicability of the proposed method. Extensive experimental results on real LR images show that the proposed method can not only produce favorable results on multiple degradations, but also reconstruct visually plausible HR images.
      	@INPROCEEDINGS{Wu_2019,
      		author={Wu, Guangyang and Zhao, Lili and Wang, Wenyi and Zeng, Liaoyuan and Chen, Jianwen},
      		booktitle={Proceedings of the IEEE International Conference on Image Processing (ICIP)}, 
      		title={PRED: A Parallel Network for Handling Multiple Degradations via Single Model in Single Image Super-Resolution}, 
      		year={2019},
      		volume={},
      		number={},
      		pages={2881-2885},
      		doi={10.1109/ICIP.2019.8804409}}
    10. Efficient screen content coding based on convolutional neural network guided by a large-scale database
      Lili Zhao, Zhiwen Wei, Weitong Cai, Wenyi Wang, Liaoyuan Zeng, and Jianwen Chen.
      IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 2019.
      Abstract: Screen content videos (SCVs) are becoming popular in many applications. Compared with natural content videos (NCVs), SCVs have different characteristics. Therefore, screen content coding (SCC) based on HEVC adopts new coding tools (e.g., intra block copy and palette mode) to improve coding efficiency, but these tools increase the computational complexity as well. In this paper, we propose to predict the CU partition of SCVs with a convolutional neural network (CNN) trained on a large-scale database that we established for screen content coding. The proposed approach is implemented in the SCC reference software SCM-6.1. Experimental results show that our proposed approach can save 53.2% encoding time with a 2.67% BD-rate increase on average under the All Intra (AI) configuration.
      	@INPROCEEDINGS{scc_icip2019,
      		author={Zhao, Lili and Wei, Zhiwen and Cai, Weitong and Wang, Wenyi and Zeng, Liaoyuan and Chen, Jianwen},
      		booktitle={Proceedings of the IEEE International Conference on Image Processing (ICIP)}, 
      		title={Efficient Screen Content Coding Based on Convolutional Neural Network Guided by a Large-Scale Database}, 
      		year={2019},
      		volume={},
      		number={},
      		pages={2656-2660},
      		doi={10.1109/ICIP.2019.8803294}}
    11. High Efficient VR Video Coding Based on Auto Projection Selection Using Transferable Features
      Lili Zhao, Meng Zhang, Wenyi Wang, Rumin Zhang, Liaoyuan Zeng, and Jianwen Chen.
      IEEE Visual Communications and Image Processing (VCIP), Taichung, Taiwan, 2018.
      Abstract: Given multiple texture projection methods from the sphere surface to the planar surface, this paper proposes an adaptive selection mode that automatically chooses the appropriate projection method to obtain high compression efficiency for VR video. The video compression efficiency is inherently affected by the video content, which is closely related to the projection method in the case of VR video encoding. In order to represent the VR video content in a compact manner, a feature vector (transferable feature) for each frame is extracted by a Res-CNN pre-trained on a large-scale dataset for general classification. Afterwards, the relation between the feature and the optimal projection method is investigated by using PCA-KNN, which can project the initial feature vector to a subspace where the VR videos can be efficiently classified with low ambiguity. The experimental results show that the proposed method can select the projection method that yields the best BD-rate.
      	@INPROCEEDINGS{vrcoding2018,
      		author={Zhao, Lili and Zhang, Meng and Wang, Wenyi and Zhang, Rumin and Zeng, Liaoyuan and Chen, Jianwen},
      		booktitle={Proceedings of the IEEE Visual Communications and Image Processing (VCIP)}, 
      		title={High Efficient VR Video Coding Based on Auto Projection Selection Using Transferable Features}, 
      		year={2018},
      		volume={},
      		number={},
      		pages={1-4},
      		doi={10.1109/VCIP.2018.8698628}}

    Patents

    1. Neural Network-based Video Prediction Coding
      Lili Zhao, Meng Zhang, Wenyi Wang, and Rumin Zhang
      No. CN108924558B[P], October 2021. (Chinese Patent)
    2. A VR social network system based on real-time 3D human reconstruction
      Xiongfeng Peng, Lili Zhao, Liaoyuan Zeng, Jianwen Chen, Rumin Zhang, and Wenyi Wang
      No. CN107194964B[P], October 2020. (Chinese Patent)