M.Sc. Computational Linguistics · University of Stuttgart

Research in natural language processing and computational linguistics.

I study how language works and how machines model it — with a focus on multilingual language models, AI-generated text detection, and multimodal learning. I bring a background in Python development and language teaching to research-oriented problems in NLP.

Email · Google Scholar · GitHub · LinkedIn · CV

01 · Research

Publications

Peer-reviewed and preprint work in NLP, multilingual modeling, and multimodal learning.

arXiv preprint · 2026 · cs.CL

A Systematic Analysis of Linguistic Features in AI-Generated Text Detection Across Domains and Models

Yassir El Attar, Esra Dönmez, Maximilian Maurer, Agnieszka Falenska

A large-scale study of 284 interpretable linguistic features across 27 LLMs and ten text domains. Lexical-richness measures stay robust across model families and domains, while many other indicators prove strongly context-dependent.

arXiv · PDF

BUCC 2026 @ LREC 2026 · pp. 108–118

Leveraging Comparable Toxicity Lexicons in Prompt Instructions for Multilingual Text Detoxification

Yassir El Attar, Esra Dönmez, Nina Ohlendorf, Agnieszka Falenska

Uses comparable, language-specific toxicity lexicons inside prompt instructions to guide multilingual detoxification. The lexicons improve both zero-shot prompting and fine-tuning, including in cross-lingual transfer to low-resource languages.

PDF

ArabicNLP 2025 · Shared Tasks · ACL · pp. 608–614

YassirEA at MAHED 2025: Fusion-Based Multimodal Models for Arabic Hate Meme Detection

Yassir El Attar

A fusion-based multimodal system combining visual and textual features to detect hateful Arabic memes, submitted to the MAHED 2025 shared task at the Third Arabic NLP Conference.

ACL Anthology · PDF

02 · Projects

Research & course projects

Selected NLP and multimodal work. Each card states the problem, the approach, and the outcome.

Scientific ML · Foundation Models

In-Context Learning for Differential Equations

Seminar report · Scientific Foundation Models II · 2025

Problem: Can foundation-model techniques cut the heavy simulation cost of learning PDE operators while improving out-of-distribution generalization?
Approach: Reviews two directions — the In-Context Operator Network (ICON) for few-shot operator learning, and unsupervised pre-training of neural operators (FNO, transformers) on unlabeled PDE data via masked-autoencoding and super-resolution proxy tasks.
Results: Combining pre-training with in-context demonstrations reduces simulated-data requirements by up to roughly 1000× and improves OOD generalization on PDEs such as Navier-Stokes and Helmholtz.

Neural Operators
FNO
In-Context Learning
PDEs
Transformers

Report (PDF)

Speech · Foundation Models

Hearing Abilities of Foundation Models

Course project · Current Topics in Speech Technology · 2025

Problem: How well can spoken language models actually “hear” — perceiving speech, audio events, and music?
Approach: Systematic assessment of audio-language foundation models (SALMONN, Pengi, CLAP, ParaCLAP) and self-supervised speech/speaker models across benchmark hearing tasks (SUPERB, Dynamic-SUPERB).
Results: Mapped strengths on discriminative tasks against gaps in generative tasks and audio hallucination; summarized in a conference-style poster.

Audio-Language Models
SALMONN
CLAP
SUPERB
Self-Supervised Speech

Poster (PDF)

Computational Linguistics · LLMs

Bridging LLMs and Linguistics

Seminar report · Foundational Questions Regarding LLMs · 2024–2025

Problem: Can linguistic theory (Generative Grammar, Construction Grammar) and LLMs be integrated rather than treated as rival, separate disciplines?
Approach: A literature review of the Piantadosi–Chesi debate on LLMs as theories of language, and of work integrating Construction Grammar with neural models (HyCxG, CxGBERT, probing studies).
Results: Argues for a unified framework — LLMs offer computable, falsifiable evidence on grammar learnability, while linguistic theory guides interpretation, to the benefit of both fields.

LLMs
Construction Grammar
CxGBERT
Probing

Report (PDF)

Multimodal · VQA

Multimodal Dermatology VQA

Research project · Foundation Models · 2024–2025

Problem: Can a Visual Question Answering system reliably answer clinical questions about dermatological images?
Approach: Fuse visual embeddings from medical images with text representations from clinical descriptions, comparing fusion architectures — UNITER, ViLT, and cross-attention over BERT / DistilBERT.
Results: Analyzed how fusion choices affect answer accuracy and interpretability for AI-assisted dermatology.

VQA
Multimodal Fusion
UNITER
ViLT
BERT

Paper (PDF)

Computational Linguistics · Low-Resource Languages

Linguistic Roadmap to Darija

Course project · Text Technology · Summer Semester 2024

Problem: Moroccan Darija is a primarily spoken, low-resource dialect with inconsistent orthography and few structured digital resources for learners or researchers.
Approach: A full-stack Express.js + MongoDB web application serving vocabulary, grammar, and example sentences, with cross-collection English-to-Darija search, spelling-form statistics, and an XML/XSD-validated pipeline for community contributions.
Results: Delivered an interactive reference tool and quantified orthographic variation across six spelling forms; the contribution feature lets users extend the validated database.

Express.js
MongoDB
Node.js
EJS
Python
Chart.js
XML/XSD

Code · Slides

03 · Résumé

Curriculum vitae

Developer and educator working in NLP research. Below is a short overview — the full CV is available as a PDF.

Download full CV (PDF)

Education

2024 – present · M.Sc. Computational Linguistics · University of Stuttgart, Germany
2014 – 2017 · B.A. English Studies, Linguistics major · FLSH Tetouan, Morocco
2017 – 2018 · DTS, Computer Development (Specialized Technician) · ISMONTIC, Tangier

Certificates: NLP (DFKI & TU Berlin / AI Campus), Python (Univ. of Michigan / Coursera), Linear Algebra (Khan Academy), TEFL, TESOL.

Experience

2024 – present · Teaching Assistant · Institute for NLP (IMS), University of Stuttgart · Parsing; Programming for Computational Linguistics
2025 – present · Student Research Assistant · Diversity-Aware NLP / IRIS, University of Stuttgart
2025 · Student Research Assistant · Inst. for Energy Efficiency in Production & Fraunhofer · energy-systems cost optimization (OEMOF, Pyomo)
2022 – 2024 · Developer & Team Manager · Software Version 7.0, Morocco · Python / PowerShell data-optimization tools
2023 · Developer · IGNITREE Web Development Agency, Morocco · web / CMS applications
2018 – 2022 · ESL Teacher · American Language Center & Ministry of Education, Morocco

Languages

Arabic: Native
English: Advanced (C1–C2)
French: Upper-intermediate
German: B1 (learning)
Spanish: Beginner

Skills

Languages & tools

NLP & ML methods

Research interests

04 · Contact

Get in touch

Open to research collaboration, PhD opportunities, and NLP projects.

Email: [email protected]
Google Scholar: Yassir El Attar
GitHub: github.com/YassirELATTAR
LinkedIn: linkedin.com/in/yassirea
Location: Stuttgart, Germany