Rushikesh Zawar


Hi 👋 I'm Rushikesh Zawar

I am currently a graduate student in Computer Vision at CMU (Carnegie Mellon University).

I previously completed a Bachelor's in Computer Science and a Master's in Biological Sciences at BITS Pilani, Pilani Campus, India, and partly at Harvard University.

My interests are in generative vision (diffusion models, GANs) for images, video, and 3D. I am also interested in interpretability and explainable AI (especially for VLMs), as well as neuroscience and genetics.

I will be graduating in Dec 2024. I am looking for full-time opportunities that I can join starting January 2025!

Publications:

  1. Zawar, R.*, Dewan, S.*, et al. DiffusionPID: Interpreting Diffusion via Partial Information Decomposition | Submitted to NeurIPS 2024

Developed a novel method that decomposes the mutual information between two input words of a text prompt and the generated image into its components: synergy, redundancy, and uniqueness. The method yields masks of the image regions that each component corresponds to, along with a quantifiable value for each.

Paper Link: https://arxiv.org/abs/2406.05191


  1. Zawar, R.*, Dewan, S.*, et al. StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images | Submitted to NeurIPS 2024

Leveraging human-generated prompts that correspond to visually interesting stable diffusion generations, we provide 10 image generations per phrase, and extract cross-attention maps for each image. We explore the semantic distribution of generated images, examine the distribution of objects within images, and benchmark captioning and open vocabulary segmentation methods on our data.

Paper Link: https://arxiv.org/html/2406.13735v1

Project Page: https://stablesemantics.github.io/StableSemantics/


  1. Litman, Y., et al. MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors | Submitted to 3DV 2025

We present StableMaterial, a 2D diffusion model prior that refines multi-lit data to estimate the most likely albedo and material from given input appearances. This model is trained on albedo, material, and relit image data derived from BlenderVault, a curated dataset of approximately 12K artist-designed synthetic Blender objects. The SDS signal from the 2D model is used in conjunction with the inverse rendering loss, improving the estimation of albedo and material.

Paper Link: https://drive.google.com/file/d/1zHp4y6y4j_1SgmpJrCpa8y53hh8Fpeyx/view?usp=sharing


  1. Zawar, R., et al. Effect of Jensen-Shannon Divergence in Safe Multiagent RL | Accepted at ICLR 2024 (Tiny paper)

    Here, we extend the Multi-Agent Constrained Policy Optimisation (MACPO) approach that maintains policy consistency using Kullback-Leibler (KL) divergence. We find that Jensen-Shannon (JS) Divergence, a symmetric measure, serves as a better alternative to KL divergence; its symmetric nature is more forgiving of extreme differences in policies.

    (Work done as a project for an introductory course on Reinforcement Learning)

    Paper Link: https://openreview.net/pdf/a6a17722b98fbfca3fda8a043ac1d1bb10a0e5c5.pdf
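
The symmetry argument in this entry can be checked on toy data. The following is a minimal sketch, not the paper's code: the function names and the example action distributions are hypothetical, and it only compares KL and JS divergence on discrete distributions rather than inside MACPO.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total


def js_divergence(p, q):
    """JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), where m is the average of p and q."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical action distributions of an old and a new policy.
old_policy = [0.9, 0.1]
new_policy = [0.5, 0.5]

if __name__ == "__main__":
    print("KL(old || new):", kl_divergence(old_policy, new_policy))
    print("KL(new || old):", kl_divergence(new_policy, old_policy))
    print("JS(old, new):  ", js_divergence(old_policy, new_policy))
    print("JS(new, old):  ", js_divergence(new_policy, old_policy))
```

KL divergence depends on the argument order and becomes infinite when the two policies have disjoint support, whereas JS divergence is the same in both directions and never exceeds ln 2.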


  1. El-Gaaly, T., Zawar, R., et al. (2023). Understanding Diffusion Model Images and Detection in the Frequency Domain.

    In this paper, we study DM-generated images and discover that they contain signature artifacts, not only in the spatio-visual domain (peculiar patterns/textures and a lack of symmetry) but also in the frequency domain, where DM-generated images have lower spectral power at the high end of the frequency spectrum; state-of-the-art GAN-generated and real images do not exhibit this bias. In the spatio-visual domain, this can be interpreted as overly smooth images that may look aesthetically pleasing but, from a detection perspective, are a tell-tale signature of DM-generated images and generally do not occur in natural images.

    Paper Link:
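
To make the "lower spectral power at the high end of the spectrum" observation concrete, here is a rough sketch of one way such a statistic could be measured. It is not the detector from the paper: the function name, the `cutoff` parameter, and the synthetic test images are all assumptions made for illustration, and a blurred noise image merely stands in for an overly smooth image.

```python
import numpy as np


def high_frequency_power_fraction(image, cutoff=0.5):
    """Fraction of spectral power above `cutoff` times the maximum radial frequency.

    `image` is a 2D grayscale array. The mean (DC component) is removed first
    so that overall brightness does not dominate the ratio.
    """
    image = np.asarray(image, dtype=np.float64)
    image = image - image.mean()
    power = np.abs(np.fft.fft2(image)) ** 2

    # Radial frequency of every FFT bin, in cycles per pixel.
    height, width = image.shape
    fy = np.fft.fftfreq(height)
    fx = np.fft.fftfreq(width)
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)

    total = power.sum()
    if total == 0.0:
        return 0.0
    high = power[radius > cutoff * radius.max()].sum()
    return float(high / total)


def box_blur(image):
    """3x3 circular box blur, used here only to create an overly smooth image."""
    acc = np.zeros_like(image, dtype=np.float64)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += np.roll(np.roll(image, dy, axis=0), dx, axis=1)
    return acc / 9.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.standard_normal((64, 64))
    smooth = box_blur(noisy)
    print("noisy :", high_frequency_power_fraction(noisy))
    print("smooth:", high_frequency_power_fraction(smooth))
```

A smoother image yields a smaller fraction, which is the kind of bias a frequency-based classifier could pick up on.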


  1. Talbot, M. B.*, Zawar, R.*, et al. (2022). Lifelong Compositional Feature Replays Beat Image Replays in Stream Learning (* equal contribution) | TNNLS Journal

We propose a new continual learning algorithm, Compositional Replay Using Memory Blocks (CRUMB), which mitigates forgetting by replaying feature maps reconstructed by recombining generic parts. Just as crumbs together form a loaf of bread, we concatenate trainable and re-usable "memory block" vectors to compositionally reconstruct feature map tensors in convolutional neural networks.

Paper Link: https://arxiv.org/abs/2104.02206


  1. Zawar, R. et al. (2022). Detecting Anomalies using Generative Adversarial Networks on Images | IMAVIS Journal

In this work, we developed an anomaly detection method using GANs, in which the GAN learns the distribution of normal images, helping it detect outliers. Trained with three losses, the generator and discriminator are then used for classification.

Paper Link: https://arxiv.org/abs/2104.02206

Work Experiences:

  1. Research Scientist Intern, Adobe Research | Dr. Xue Bai, Dr. Aseem Agarwala | May 2024 - Present

    Seattle, WA, USA

    • Generating and evaluating video captions and video-caption alignment for video generation, and modifying video embeddings to build metadata for a video generation model. Predicting light angles and other film aspects in videos. Created a research tool for metadata insights into millions of videos, actively used by 5+ teams.
    • Designing, curating, and setting up data for human annotation to obtain more metadata and improve alignment.

  1. Research Engineer, Reality Defender | Dr. Gaurav Bharaj | Aug 2022 - Aug 2023

    Remote (New York, USA)
    • Identified frequency patterns and built and deployed a novel diffusion-image classifier based on frequency-domain analysis, which also generalizes to unseen image classes.
    • Built and deployed a self-supervised classifier that identifies face-swap fake images while training only on real images.
    • Built a robust and invisible fingerprint/watermark-based method to mark & identify original images even after many types of real-world media modifications.

  1. Applied Scientist Intern, Amazon | Shobhit Niranjan | Jan 2022 - June 2022

    Bangalore, India

    • Built a systematic outlier detection method to catch fraud and abuse in online shopping, using custom hierarchical clustering and graph methods with Python, SQL, AWS, etc.
    • The method is live and adapts to new data. It captures fraudulent orders worth an average of USD 1.1 million per month (an improvement of about USD 140k) throughout India.

  1. Researcher, Harvard University | Dr. Gabriel Kreiman | June 2021 - Aug 2023

    Cambridge, MA, USA

    • Built a continual learning model based on a brain-inspired replay mechanism to avoid catastrophic forgetting for multiple image classification tasks.
    • Used streams of video data to make the model capable of learning with only 1 pass through the dataset.
    • Improved the model's accuracy by 22.5%, beating the top benchmarks by 11%. The resulting paper was accepted at TNNLS, with me as co-first author.

  1. Research Intern, MIT | Dr. Pattie Maes | May - Aug 2020

    Cambridge, MA, USA

    • Worked on building a personalized assisted learning system to help with habit changes.
    • Used sample-efficient reinforcement learning (RL), Bayesian learning methods, and probabilistic models to recommend behavior change interventions using human-in-the-loop RL.
    • Created a custom gym environment, policy module, and simulator for human behavior, and created and analyzed many visualizations of them.

  1. Research Intern, Tata Institute of Fundamental Research (TIFR) | Dr. Rahul Vaze | April - July 2020

    Remote (Mumbai, India)

    • Worked in game theory to develop a cake-cutting algorithm for the problem of fair resource allocation. It works for any number of agents and allows arbitrary valuations over the cake.
    • The resulting allocation guarantees envy-freeness up to an arbitrarily small tolerance, epsilon. It further ensures that either the resource (cake) is exhausted or all agents get their share.

  1. Intern, MapmyIndia | Ritesh Arora | May - July 2019

    Delhi, India

    • Developed a Kalman filter for path estimation of vehicles in low signal areas.
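
For readers unfamiliar with the technique, the following is a minimal sketch of a constant-velocity Kalman filter that keeps predicting through missing position fixes. It is not the filter built during the internship: the 1D state, the function name, and every noise parameter are assumptions chosen for illustration.

```python
import numpy as np


def kalman_track(measurements, dt=1.0, process_var=0.1, meas_var=4.0):
    """Estimate 1D positions from noisy fixes; `None` marks a missing fix.

    State is [position, velocity]. On a missing fix only the prediction step
    runs, so the estimate coasts along the last estimated velocity.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])            # constant-velocity motion model
    H = np.array([[1.0, 0.0]])                        # only position is observed
    Q = process_var * np.array([[dt**4 / 4, dt**3 / 2],
                                [dt**3 / 2, dt**2]])  # process noise covariance
    R = np.array([[meas_var]])                        # measurement noise covariance

    x = np.zeros((2, 1))       # initial state guess
    P = np.eye(2) * 1000.0     # large initial uncertainty
    estimates = []
    for z in measurements:
        # Predict
        x = F @ x
        P = F @ P @ F.T + Q
        # Update, only when a fix is available
        if z is not None:
            y = np.array([[z]]) - H @ x
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)
            x = x + K @ y
            P = (np.eye(2) - K @ H) @ P
        estimates.append(float(x[0, 0]))
    return estimates


if __name__ == "__main__":
    # Hypothetical track moving 2 units per step, with fixes lost for steps 10-14.
    track = [2.0 * t if not 10 <= t <= 14 else None for t in range(30)]
    print(kalman_track(track))
```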

Projects:

Contact:

Email: rzawar@andrew.cmu.edu, rushikeshzawar10@gmail.com

LinkedIn: https://www.linkedin.com/in/rushikesh-zawar/

📍Seattle, WA, USA.