About Me

Hi! I am Neeraj Sharma, a postdoctoral researcher at Carnegie Mellon University, Pittsburgh. My research focuses on understanding how the human brain analyzes conversational speech. Conversational speech has rich intonation, loudness variation, overlapping utterances, non-speech sounds, and talkers speaking in turns. These features maximize the information exchange between the talkers. In my research, we study this information exchange through behavioral and EEG experiments with conversational speech stimuli. In parallel, we are also interested in improving automatic speech recognition systems using our understanding of how humans attend to conversational speech. I work under the mentorship of Prof. Lori Holt (BrainHub, at CMU) and Prof. Sriram Ganapathy (LEAP Lab, at IISc). My research is generously funded by BrainHub, at CMU.
Prior to my postdoctoral work, I completed a doctoral thesis on "Information-rich sampling of time-varying signals" at the Indian Institute of Science, India. I pursued my research in the wonderful Speech and Audio Group, Dept. of ECE (at IISc), led by Prof. T. V. Sreenivas.
I joined the Indian Institute of Science (IISc) in 2009 after completing a B.Tech in Instrumentation and Electronics Engineering from the College of Engineering and Technology, Bhubaneswar.
EMAIL-ID:
X@Y.com, where X is neerajww and Y is gmail


Courses Taken

Random Processes, Pattern Recognition and Neural Networks, Time-Frequency Analysis, Adaptive Signal Processing, Matrix Theory, Digital Signal Compression, Non-linear Signal Processing, Stochastic Models for Speech Recognition, Digital Image Processing, Introduction to Neuroscience. These are a few of the courses I took from the long list offered at IISc.

Teaching Assistantship

Time Frequency Analysis (E9-213) in Jan-2012. The course was offered by Prof. Chandra Sekhar Seelamantula.
Signal Quantization and Compression (E9-221) in Aug-2011. The course was offered by Prof. T. V. Sreenivas.

Internships, Conferences, and Workshops

Audition Lab, Ecole Normale Superieure (ENS), Paris, [14 Apr to 08 June, 2014]
I was a Visiting Student at the Audition Lab, where I worked on the interesting concept of designing auditory sketches. I worked under the mentorship and support of Daniel Pressnitzer and Laurent Daudet, who had initially proposed the concept. I carried the concept further and designed auditory sketches of a sound by jointly using peaks in the time-frequency and rate-scale-time-frequency planes. I also proposed a metric that quantifies the notion of an auditory sketch and is suitable for comparing two sketches. The work is in progress, and we will likely summarize it soon.
Winter School in Speech and Audio Processing (WiSSAP), 2010-15, India
I have attended all the WiSSAPs in this time span. It is a yearly workshop, and a very good learning experience for discovering what I do not know, and should know! At WiSSAP-2015 I gave a talk on auditory modeling. All the talks are hosted here: click.
Mechanics of Hearing (MoH), 2014, Athens
This is one of the best workshops I have attended. With so many excellent researchers in one hall, examining each other's insights, it was amazing. I had a poster here, and got in touch with wonderful people in auditory modeling.
Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), 2012, Kyoto
My first conference visit outside India. Again, a very nice experience.
Int. Conf. Signal Processing and Communication (SPCOM), 2010, 2012, 2014, IISc Bangalore
My first conference presentation at IISc. Our paper was selected among the top papers in the conference.
Student Chapter Leadership Workshop at Photonics Europe, 2014, Brussels
What makes a leader a leader! I spent 6 hrs with close to 20 people, all meeting for the first time, and each from a different country. Each voiced his/her thoughts on a variety of topics and case-studies. Realized - Once you talk, it is then very easy to talk :-).
Workshops on Signal Processing, Machine Learning in IISc, 2009-14
Learnt a lot, with a trade-off in time consumed.

Peer-reviewed Published Findings

Sparse signal reconstruction based on signal dependent non-uniform samples (In ICASSP'12, Kyoto)

The classical approach to A/D conversion has been uniform sampling, which gives perfect reconstruction of bandlimited signals when the Nyquist sampling theorem is satisfied. We propose a non-uniform sampling scheme based on level crossing (LC) time information. We show stable reconstruction of bandpass signals with the correct scale factor, and hence a unique reconstruction, from only the non-uniform time information. For reconstruction from the level crossings, we use sparse-reconstruction-based optimization, constraining the bandpass signal to be sparse in its frequency content. While the literature resorts to an overdetermined system of equations, we use an underdetermined approach along with a sparse reconstruction formulation. We obtain a reconstruction SNR > 20 dB and perfect support recovery with probability close to 1 in the noiseless case, and with lower probability in the noisy case. Random picking of LCs from different levels, over the same limited signal duration and for the same length of information, is seen to be advantageous for reconstruction.
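
To give a feel for the sampling step only, here is a minimal sketch of how level-crossing time instants could be extracted from a densely sampled signal. It is an illustration under my own assumptions, not the implementation used in the paper: the function name `level_crossing_times`, the 5 Hz test tone, and the level 0.5 are all hypothetical, and the sparse reconstruction stage is not shown.

```python
import numpy as np

def level_crossing_times(t, x, level):
    """Return linearly interpolated times at which x(t) crosses `level`."""
    d = x - level
    # A crossing lies between samples k and k+1 when the sign of d changes.
    idx = np.where(d[:-1] * d[1:] < 0)[0]
    # Linear interpolation between the two bracketing samples.
    frac = d[idx] / (d[idx] - d[idx + 1])
    return t[idx] + frac * (t[idx + 1] - t[idx])

# Hypothetical test signal: a 5 Hz sinusoid on a dense time grid.
t = np.linspace(0.0, 1.0, 10001)
x = np.sin(2 * np.pi * 5 * t)
lc_times = level_crossing_times(t, x, level=0.5)
```

Only the time instants in `lc_times` (together with the known level) would be kept; the amplitudes carry no extra information, which is the sense in which the acquisition is "time information only".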

Event-triggered sampling and reconstruction of sparse trigonometric polynomials (In SPCOM'14, Bangalore)

We propose data acquisition from continuous-time signals belonging to the class of real-valued trigonometric polynomials using an event-triggered sampling paradigm. The sampling schemes proposed are: level crossing (LC), close to extrema LC, and extrema sampling. An analysis of the robustness of these schemes to jitter and bandpass additive Gaussian noise is presented. In general, these sampling schemes result in non-uniformly spaced sample instants. We address the issue of signal reconstruction from the acquired data-set by imposing a sparsity structure on the signal model to circumvent the problem of gap and density constraints. The recovery performance is contrasted amongst the various schemes and with a random sampling scheme. In the proposed approach, both sampling and reconstruction are non-linear operations, and in contrast to the random sampling methodologies proposed in compressive sensing, these techniques may be implemented in practice with low-power circuitry.

Moving Sound Source Parameter Estimation Using A Single Microphone And Signal Extrema Samples (In ICASSP'15, Brisbane)

Estimating the parameters of moving sound sources using only the source signal is of interest in low-power and contact-less source monitoring applications, such as industrial robotics and bio-acoustics. The received signal embeds the motion attributes of the source via the Doppler effect. In this paper, we analyze the Doppler effect on a mixture of time-varying sinusoids. Focusing on the instantaneous frequency (IF) of the received signal, we show that the IF profile, composed of the IF and its first two derivatives, can be used to obtain source motion parameters. This requires a smooth estimate of the IF profile. However, the numerical implementation of traditional approaches, such as the analytic signal and energy separation approaches, gives oscillatory behavior and hence a non-smooth IF estimate. We devise an algorithm using non-uniformly spaced signal extrema samples of the received signal for smooth IF profile estimation. Using the smooth IF profiles for a source moving on a linear trajectory with constant velocity, an accurate estimate of the moving source parameters is obtained. We see promise in this approach for motion parameter estimation on arbitrary trajectories.

Time-instant Sampling Based Encoding of Time-varying Acoustic Spectrum (In MoH'14, Athens)

The inner ear has been shown to characterize an acoustic stimulus by transducing fluid motion in the inner ear to mechanical bending of stereocilia on the inner hair cells (IHCs). The excitation motion/energy transferred to an IHC depends on the frequency spectrum of the acoustic stimulus and the spatial location of the IHC along the length of the basilar membrane (BM). Subsequently, the afferent auditory nerve fiber (ANF) bundle samples the encoded waveform in the IHCs by synapsing with them. In this work we focus on the sampling of information by afferent ANFs from the IHCs, and show computationally that sampling at specific time instants is sufficient for decoding the time-varying acoustic spectrum embedded in the acoustic stimulus. The approach is based on sampling the signal at its zero-crossings and higher-order derivative zero-crossings. We show results of the approach on time-varying acoustic spectrum estimation from a cricket call recording. The framework gives a time-domain and non-spatial processing perspective on auditory signal processing. The approach works on the full-band signal, and does not model any bandpass filtering mimicking the BM action. Instead, we motivate the approach from the perspective of event-triggered sampling by afferent ANFs on the stimulus encoded in the IHCs. Although the approach gives an acoustic spectrum estimate, it remains shallow in its account of a plausible bio-mechanical replication under current insights into mammalian auditory mechanics.

Event-triggered Sampling Using Signal Extrema for Instantaneous Amplitude and Instantaneous Frequency Estimation (In Signal Processing'15, Elsevier)

Event-triggered sampling (ETS) is a new approach towards efficient signal analysis. The goal of ETS need not be only signal reconstruction, but also direct estimation of desired information in the signal by skillful design of the event. We show the promise of the ETS approach for better analysis of oscillatory non-stationary signals modeled by a time-varying sinusoid, when compared to existing uniform Nyquist-rate sampling based signal processing. We examine samples drawn using ETS, with events as zero-crossing (ZC), level-crossing (LC), and extrema, for additive in-band noise and jitter in the detection instant. We find that extrema samples are robust, and also facilitate instantaneous amplitude (IA) and instantaneous frequency (IF) estimation in a time-varying sinusoid. The estimation is proposed solely using extrema samples and a local polynomial regression based least-squares fitting approach. The proposed approach shows improvement, for noisy signals, over the widely used analytic signal, energy separation, and ZC based approaches (which are based on uniform Nyquist-rate sampling based data-acquisition and processing). Further, extrema based ETS in general gives a sub-sampled representation (relative to Nyquist-rate) of a time-varying sinusoid. For the same data-set size captured with extrema based ETS and with uniform sampling, the former gives much better IA and IF estimation.
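
As a rough illustration of why extrema are informative, the sketch below estimates IA and IF of a noiseless chirp from its extrema alone. It is a deliberately simplified stand-in, not the local polynomial regression estimator of the paper: IA is read off as the magnitude at each extremum, and IF is taken as half the reciprocal of the gap between consecutive extrema (one half-cycle per gap). The function names, the parabolic refinement of extrema locations, and the test chirp are all my assumptions for this example.

```python
import numpy as np

def extrema_samples(t, x):
    """Locate local extrema of uniformly sampled x(t), refined by a parabolic fit."""
    dx = np.diff(x)
    # Interior index k is an extremum when the slope changes sign around it.
    k = np.where(dx[:-1] * dx[1:] < 0)[0] + 1
    y0, y1, y2 = x[k - 1], x[k], x[k + 1]
    # Vertex of the parabola through the three samples around each extremum.
    delta = 0.5 * (y0 - y2) / (y0 - 2.0 * y1 + y2)
    h = t[1] - t[0]
    return t[k] + delta * h, y1 - 0.25 * (y0 - y2) * delta

def ia_if_from_extrema(t_ext, x_ext):
    """Crude IA and IF estimates using only the extrema samples."""
    ia = np.abs(x_ext)                        # amplitude at each extremum
    t_mid = 0.5 * (t_ext[:-1] + t_ext[1:])    # time stamp of each IF estimate
    inst_freq = 0.5 / np.diff(t_ext)          # half a cycle between extrema
    return ia, t_mid, inst_freq

# Hypothetical test signal: amplitude 2, IF sweeping linearly from 20 Hz to 30 Hz.
t = np.arange(10000) / 10000.0
x = 2.0 * np.cos(2 * np.pi * (20.0 * t + 5.0 * t ** 2))
t_ext, x_ext = extrema_samples(t, x)
ia, t_mid, inst_freq = ia_if_from_extrema(t_ext, x_ext)
```

For this chirp the estimates in `inst_freq` track 20 + 10 t at the instants `t_mid`, using only about 50 extrema samples in place of the 10000 uniform ones. With noise or a time-varying amplitude, a smoothing fit over neighbouring extrema would be needed, which is where the regression step in the paper comes in.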

Mel-scale sub-band modelling for perceptually improved time-scale modification of speech and audio signals (In NCC, 2017, Chennai)

Good quality time-scale modification (TSM) of speech and audio is a long-standing challenge. The crux of the challenge is to maintain the perceptual subtleties of temporal variations in pitch and timbre even after time-scaling the signal. Widely used approaches, such as the phase vocoder and waveform overlap-add (OLA), are based on a quasi-stationary assumption, and the time-scaled signals have perceivable artifacts. In contrast to these approaches, we propose the application of time-varying sinusoidal modeling for TSM, without any quasi-stationary assumption. The proposed model comprises a mel-scale nonuniform bandwidth filter bank, and the instantaneous amplitude (IA) and instantaneous phase (IP) factorization of sub-band time-varying sinusoids. TSM of the signal is done by time-scaling IA and IP in each sub-band. The lowpass nature of IA and IP allows for time-scaling via interpolation. Formal listening tests on speech and music (solo and polyphonic) show a reduction in TSM artifacts such as phasiness and transient smearing. Further, the proposed approach gives improved quality in comparison to waveform synchronous OLA (WSOLA), phase vocoder with identity phase locking, and the recently proposed harmonic-percussive separation (HPS) based TSM methods. The obtained improvement in TSM quality highlights that speech analysis can benefit from an appropriate choice of time-varying signal models.
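
The idea of time-scaling by interpolating IA and IP can be sketched for a single narrowband component as follows. This is only a toy version under my own assumptions: the mel-scale filter bank is omitted, the IA/IP factorization is obtained here from an FFT-based analytic signal (which may differ from the factorization used in the paper), and the function names and the 100 Hz test tone are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal: keep DC/Nyquist, double positive bins, drop negative bins."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def time_scale_subband(x, alpha):
    """Stretch one narrowband signal by factor alpha via IA and IP interpolation."""
    z = analytic_signal(x)
    ia = np.abs(z)                      # instantaneous amplitude
    ip = np.unwrap(np.angle(z))         # instantaneous phase (unwrapped)
    n = len(x)
    n_out = int(round(alpha * n))
    src = np.arange(n_out) / alpha      # output instants mapped to the input axis
    grid = np.arange(n)
    ia_scaled = np.interp(src, grid, ia)
    ip_scaled = np.interp(src, grid, ip)
    # Multiplying the phase by alpha keeps the instantaneous frequency unchanged.
    return ia_scaled * np.cos(alpha * ip_scaled)

# Hypothetical test: a one-second 100 Hz tone at 8 kHz, stretched to 1.5 seconds.
fs = 8000
n = np.arange(fs)
tone = np.cos(2 * np.pi * 100 * n / fs)
stretched = time_scale_subband(tone, alpha=1.5)
```

The stretched tone is longer but keeps its 100 Hz pitch, because the phase track is resampled and then rescaled by the same factor. In a full system this would be applied per sub-band and the outputs summed.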

Leveraging LSTM models for overlap detection in multi-party meetings (In ICASSP, 2018, Calgary)

The detection of overlapping speech segments is of key importance in speech applications involving analysis of multi-party conversations. The detection problem is challenging because overlapping speech segments are typically captured as short speech utterances in far-field microphone recordings. In this paper, we propose detection of overlap segments using a neural network architecture consisting of long short-term memory (LSTM) models. The neural network architecture learns the presence of overlap in speech by identifying the spectrotemporal structure of overlapping speech segments. In order to evaluate the model performance, we perform experiments on simulated overlapped speech generated from the TIMIT database, and on natural multi-talker conversational speech in the augmented multi-party interaction (AMI) meeting corpus. The proposed approach yields improvements over a Gaussian mixture model based overlap detection system. Furthermore, as an application of overlap detection, integrating overlap detection into the speaker diarization task is shown to improve the diarization error rate.

Multicomponent 2-D AM-FM Modeling of Speech Spectrograms (In Interspeech, 2018, Hyderabad)

In contrast to 1-D short-time analysis of speech, 2-D modeling of spectrograms provides a characterization of speech attributes directly in the joint time-frequency plane. Building on existing 2-D models to analyze a spectrogram patch, we propose a multicomponent 2-D AM-FM representation for spectrogram decomposition. The components of the proposed representation comprise a DC, a fundamental frequency carrier and its harmonics, and a spectrotemporal envelope, all in 2-D. The number of harmonics required is patch-dependent. The estimation of the AM and FM is done using the Riesz transform, and the component weights are estimated using a least-squares approach. The proposed representation provides an improvement over existing state-of-the-art approaches, for both male and female speakers. This is quantified using reconstruction SNR and perceptual evaluation of speech quality (PESQ) metric. Further, we perform an overlap-add on the DC component, pooling all the patches and obtain a time-frequency (t-f) aperiodicity map for the speech signal. We verify its effectiveness in improving speech synthesis quality by using it in an existing state-of-the-art vocoder.


Other than Research

a) Execom Member of IEEE-IISc Student Branch (2012-13)
Got the Best Volunteer Award for the year 2012-13. Together with a very sporty team of volunteers in our Execom, led by Prof. T. Srinivas, we had a wonderful set of activities in and around campus.
b) IISc ECE Dept. WebTeam Member (2014-15)
Together with 3 other members, and spearheaded by Prof. Chandra R. Murthy, I often maintain the ECE website.
c) Sunday Cricket League (SCL, 2014-15)
Every Sunday we have huge fun making our adrenaline flow to bowl, bat and field.
d) Camera clicks
Very often I get amazed by nature, and on getting an opportunity I click-capture-upload some pictures here: click to see. What you see only deciphers your thoughts.:-)
Also, I like the amazing natural beauty in IISc campus. My collection of some photography in the campus is here: click to see. I find it very difficult to prune the selection!

Quotes

Use the sunrise as an alarm. It has no snooze.

Thoughts

Good and bad is a function of surroundings.

Haricharan

Write-ups, Talks, Good books, and ...

"Throwing Light into the Tunnel: auditory models and perception"
[invited talk in WiSSAP-2015, 04-01-2015] Click here to get the PDF.

"Sound Analysis: some knowns and unknowns"
[in SIAM-IISc Chapter Student Talk Series, @IISc, 08-05-2015]
Click here to get the PDF.

"Detect and Sample: an event-triggered approach for data acquisition and processing"
[Work Discussion at ICTS-IISc Workshop, 08-01-2015]

"Turns are Good: Processing Extrema of a Nonstationary Narrowband Signal"
[Delivered in Spectrum Lab, IISc, 22-10-2013]

"Function Approximations"
[Links to some good PDFs, 11-01-2016] Taylor, Fourier, Chebyshev, Pade, ... Click here to get the PDF.

"Detect and Sample: Questioning uniform Nyquist-rate sampling"
[Delivered on IEEE Day celebrations in campus, 01-10-2013]

Technical books I have liked: I sometimes update this rarely updated list here: click.