Xuanlin (Simon) Li

I am a second-year PhD student at UCSD CSE, advised by Prof. Hao Su (2021-). Previously I was an undergraduate majoring in Mathematics and Computer Science at UC Berkeley (2017-2021). I was also an undergraduate research assistant at Berkeley Artificial Intelligence Research, where I was advised by Prof. Trevor Darrell.

GitHub  /  Google Scholar  /  LinkedIn  /  Twitter



I am interested in many aspects of computer vision, robotics, and NLP. In particular, I'm interested in embodied AI and building general 3D/2D perception and policy models to enable robots to acquire concepts and generalizable skills.
(* = equal first-author contribution)


Frame Mining: a Free Lunch for Learning Robotic Manipulation from 3D Point Clouds

Minghua Liu*, Xuanlin Li*, Zhan Ling*, Yangyan Li, Hao Su
Conference on Robot Learning (CoRL) 2022
paper

We study how the choice of input point cloud coordinate frame affects the learning of manipulation skills from 3D point clouds. A variety of coordinate frames can be used to normalize captured robot-object-interaction point clouds. We find that different frame choices have a profound impact on agent learning performance, and the trend is similar across 3D backbone networks. In particular, the end-effector frame and the target-part frame achieve higher training efficiency than the commonly used world frame and robot-base frame in many tasks, and we analyze how they provide helpful alignments among point clouds across time steps, thus simplifying visual module learning. Moreover, the well-performing frames vary across tasks, and some tasks may benefit from multiple frame candidates. We thus propose FrameMiners to adaptively select candidate frames and fuse their merits in a task-agnostic manner. Experimentally, FrameMiners achieves on-par or significantly higher performance than the best single-frame version on five fully physical manipulation tasks adapted from ManiSkill and OCRTOC. Without changing existing camera placements or adding extra cameras, point cloud frame mining can serve as a free lunch to improve 3D manipulation learning.
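To make the notion of a coordinate frame choice concrete, here is a minimal sketch of re-expressing a world-frame point cloud in another frame, such as the end-effector frame. It is not the FrameMiners implementation. The function name `world_to_frame` and the pose representation (a 3x3 rotation matrix plus a translation, both giving the frame's pose in the world) are assumptions made for illustration.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def world_to_frame(points: Sequence[Vec3], rotation: Mat3, translation: Vec3) -> List[List[float]]:
    """Express world-frame points in a target frame (hypothetical helper).

    `rotation` and `translation` are assumed to describe the pose of the
    target frame (e.g. the end-effector) in the world frame, so each point
    is mapped as p_frame = R^T (p_world - t).
    """
    transformed = []
    for p in points:
        # Offset of the point from the frame origin, still in world axes.
        d = [p[i] - translation[i] for i in range(3)]
        # Multiply by R^T: component j is the dot product of d with column j of R.
        transformed.append(
            [sum(rotation[i][j] * d[i] for i in range(3)) for j in range(3)]
        )
    return transformed


# Example: a frame located at (1, 0, 0) and rotated 90 degrees about the world z-axis.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [1.0, 0.0, 0.0]
cloud_world = [[1.0, 1.0, 0.0], [1.0, 0.0, 2.0]]
cloud_frame = world_to_frame(cloud_world, R, t)
```

Under this assumed convention, a point that stays fixed relative to the gripper keeps the same coordinates in the end-effector frame at every time step, which is one way to picture the alignment across time steps described above.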


ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills

Jiayuan Gu*, Fanbo Xiang*, Zhan Ling*, Xinyue Wei*, Xiqiang Liu*, Xuanlin Li*, Rui Chen*, Stone Tao*, Tongzhou Mu*, Pengwei Xie*, Yunchao Yao*, Yihe Tang, Xiaodi Yuan, Zhiao Huang, Hao Su
arxiv / website / code / implementation

We present ManiSkill2, the next generation of the SAPIEN ManiSkill benchmark, to address critical pain points often encountered by researchers when using benchmarks for generalizable manipulation skills. ManiSkill2 includes 20 manipulation task families with 2000+ object models and 4M+ demonstration frames, which cover stationary/mobile-base, single/dual-arm, and rigid/soft-body manipulation tasks with 2D/3D-input data simulated by fully dynamic engines. It defines a unified interface and evaluation protocol to support a wide range of algorithms (e.g., classic sense-plan-act, RL, IL), visual observations (point cloud, RGBD), and controllers (e.g., action type and parameterization). Moreover, it empowers fast visual input learning algorithms so that a CNN-based policy can collect samples at about 2000 FPS with 1 GPU and 16 processes on a regular workstation.


ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations

Tongzhou Mu*, Zhan Ling*, Fanbo Xiang*, Derek Yang*, Xuanlin Li*, Stone Tao, Zhiao Huang, Zhiwei Jia, Hao Su
Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2021
arxiv / website / video / code / implementation

Object manipulation from 3D visual inputs poses many challenges for building generalizable perception and policy models. However, 3D assets in existing benchmarks mostly lack the diversity of 3D shapes that align with real-world intra-class complexity in topology and geometry. Here we propose the SAPIEN Manipulation Skill Benchmark (ManiSkill) to benchmark manipulation skills over diverse objects in a full-physics simulator. 3D assets in ManiSkill include large intra-class topological and geometric variations. Tasks are carefully chosen to cover distinct types of manipulation challenges. Recent progress in 3D vision also makes us believe that we should customize the benchmark so that the challenge is inviting to researchers working on 3D deep learning. To this end, we simulate a moving panoramic camera that returns ego-centric point clouds or RGB-D images. In addition, we would like ManiSkill to serve a broad set of researchers interested in manipulation research. Besides supporting the learning of policies from interactions, we also support learning-from-demonstrations (LfD) methods by providing a large number of high-quality demonstrations (~36,000 successful trajectories, ~1.5M point cloud/RGB-D frames in total). We provide baselines using 3D deep learning and LfD algorithms. All code of our benchmark (simulator, environment, SDK, and baselines) is open-sourced, and a challenge facing interdisciplinary researchers will be held based on the benchmark.


Discovering Non-Monotonic Autoregressive Orderings with Variational Inference

Xuanlin Li*, Brandon Trabucco*, Dong Huk Park, Yang Gao, Michael Luo, Sheng Shen, Trevor Darrell
International Conference on Learning Representations (ICLR) 2021
arxiv / video_transcripts / code / poster / slides

We propose the first domain-independent unsupervised / self-supervised learner that discovers high-quality autoregressive orders through fully-parallelizable end-to-end training in a data-driven manner, with no domain knowledge required. The learner contains an encoder network and a decoder language model that perform variational inference with autoregressive orders (represented as permutation matrices) as latent variables. The corresponding ELBO is not differentiable, so we develop a practical algorithm for end-to-end optimization using policy gradients. We implement the encoder as a Transformer with non-causal attention that outputs permutations in one forward pass. Permutations then serve as target generation orders for training an insertion-based Transformer language model. Empirical results in language modeling tasks demonstrate that our method is context-aware and discovers orderings that are competitive with or even better than fixed orders.
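As a small illustration of how a generation order can be represented as a permutation matrix, the sketch below builds such a matrix from a list of positions and applies it to a token sequence. The helper names and the row/column convention (row i selects the position generated at step i) are assumptions for illustration and may differ from the convention used in the paper; the encoder, decoder, and policy-gradient training are not shown.

```python
from typing import List, Sequence


def order_to_permutation_matrix(order: Sequence[int]) -> List[List[int]]:
    """Build a permutation matrix P with P[i][j] = 1 iff position j is generated at step i.

    `order` is assumed to list sequence positions in generation order
    (a hypothetical convention chosen for this sketch).
    """
    n = len(order)
    if sorted(order) != list(range(n)):
        raise ValueError("order must be a permutation of 0..n-1")
    return [[1 if order[i] == j else 0 for j in range(n)] for i in range(n)]


def apply_permutation(matrix: Sequence[Sequence[int]], tokens: Sequence[str]) -> List[str]:
    """Reorder tokens into generation order by selecting the token marked in each row."""
    return [tokens[row.index(1)] for row in matrix]


# Example: generate the last token first, then the remaining tokens left to right.
tokens = ["a", "b", "c"]
P = order_to_permutation_matrix([2, 0, 1])
generation_sequence = apply_permutation(P, tokens)
```

A fixed left-to-right order corresponds to the identity matrix under this convention; the method described above instead treats the matrix as a latent variable to be inferred.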


Improving Policy Optimization with Generalist-Specialist Learning

Zhiwei Jia, Xuanlin Li, Zhan Ling, Shuang Liu, Yiran Wu, Hao Su
International Conference on Machine Learning (ICML) 2022
arxiv / website / code

In large-scale RL, a “generalist” agent jointly trained on all goals tends to learn faster at the beginning, yet it often suffers from catastrophic ignorance and forgetting, leading to suboptimal final performance. In contrast, a “specialist” agent trained only on a few variations can often achieve high final returns, yet its initial learning has low sample efficiency. To get the best of both worlds, we propose GSL, a novel generalist-specialist learning framework and a well-principled meta-algorithm. We show that our framework pushes the envelope of policy learning on many challenging and popular benchmarks, including Procgen, Meta-World, and ManiSkill.


Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control

Zhuang Liu*, Xuanlin Li*, Bingyi Kang, Trevor Darrell
International Conference on Learning Representations (ICLR) 2021 (Spotlight)
arxiv / video / code / poster / slides

We present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks. We show that conventional regularization methods from supervised learning, which have been largely ignored in RL methods, can be very effective in policy optimization on continuous control tasks, and that this finding is robust to variations in training hyperparameters. We also analyze why they can help policy generalization from the perspectives of sample complexity, return distribution, weight norm, and noise robustness.

Other Projects

These include coursework, side projects and unpublished research work.


Inferring the Optimal Policy using Markov Chain Monte Carlo

Brandon Trabucco, Albert Qu, Xuanlin Li, Ganeshkumar Ashokavardhanan
Berkeley EECS 126 (Probability and Random Processes)
arxiv

Final course project for EECS 126 (Probability and Random Processes) in Fall 2018.

Honors and Awards

Jacobs School of Engineering PhD Fellowship, UC San Diego CSE, 2021
Arthur M. Hopkin Award, UC Berkeley EECS, 2021
EECS Honors Program & Mathematics Honors Program, UC Berkeley

Design and source code from Jon Barron's website