M.Sc. Computational Linguistics · University of Stuttgart
Natural language processing and computational linguistics research.
I study how language works and how machines model it — with a focus on multilingual
language models, AI-generated text detection, and multimodal learning. I bring a
background in Python development and language teaching to research-oriented problems in NLP.
Email · Google Scholar · GitHub · LinkedIn · CV
-
arXiv preprint · 2026 · cs.CL
A Systematic Analysis of Linguistic Features in AI-Generated Text Detection
Across Domains and Models
Yassir El Attar, Esra Dönmez, Maximilian Maurer, Agnieszka Falenska
A large-scale study of 284 interpretable linguistic features across 27 LLMs and ten text
domains. Lexical-richness measures stay robust across model families and domains, while
many other indicators prove strongly context-dependent.
arXiv · PDF
-
BUCC 2026 @ LREC 2026 · pp. 108–118
Leveraging Comparable Toxicity Lexicons in Prompt Instructions for
Multilingual Text Detoxification
Yassir El Attar, Esra Dönmez, Nina Ohlendorf, Agnieszka Falenska
Using comparable, language-specific toxicity lexicons inside prompt instructions to guide
multilingual detoxification. Both zero-shot prompting and fine-tuning improve, including
in cross-lingual transfer to low-resource languages.
PDF
-
ArabicNLP 2025 · Shared Tasks · ACL · pp. 608–614
YassirEA at MAHED 2025: Fusion-Based Multimodal Models for Arabic Hate Meme
Detection
Yassir El Attar
A fusion-based multimodal system combining visual and textual features to detect hateful
Arabic memes, submitted to the MAHED 2025 shared task at the Third Arabic NLP Conference.
ACL Anthology · PDF
Scientific ML · Foundation Models
In-Context Learning for Differential Equations
Seminar report · Scientific Foundation Models II · 2025
- Problem
- Can foundation-model techniques cut the heavy simulation cost of learning PDE operators
while improving out-of-distribution generalization?
- Approach
- Reviews two directions — the In-Context Operator Network (ICON) for few-shot operator
learning, and unsupervised pre-training of neural operators (FNO, transformers) on unlabeled
PDE data via masked-autoencoding and super-resolution proxy tasks.
- Results
- Combining pre-training with in-context demonstrations reduces simulated-data needs by up to
roughly 1000× and improves OOD generalization on PDEs such as Navier-Stokes and Helmholtz.
Report (PDF)
Speech · Foundation Models
Hearing Abilities of Foundation Models
Course project · Current Topics in Speech Technology · 2025
- Problem
- How well can spoken language models actually “hear” — perceiving speech, audio events, and
music?
- Approach
- Systematic assessment of audio-language foundation models (SALMONN, Pengi, CLAP, ParaCLAP)
and self-supervised speech/speaker models across benchmark hearing tasks (SUPERB,
Dynamic-SUPERB).
- Results
- Mapped strengths on discriminative tasks against gaps in generative tasks and audio
hallucination; summarized in a conference-style poster.
Poster (PDF)
Computational Linguistics · LLMs
Bridging LLMs and Linguistics
Seminar report · Foundational Questions Regarding LLMs · 2024–2025
- Problem
- Can linguistic theory (Generative Grammar, Construction Grammar) and LLMs be integrated
rather than treated as rival, separate disciplines?
- Approach
- A literature review of the Piantadosi–Chesi debate on LLMs as theories of language, and of
work integrating Construction Grammar with neural models (HyCxG, CxGBERT, probing studies).
- Results
- Argues for a unified framework — LLMs offer computable, falsifiable evidence on grammar
learnability, while linguistic theory guides interpretation, to the benefit of both fields.
Report (PDF)
Multimodal · VQA
Multimodal Dermatology VQA
Research project · Foundation Models · 2024–2025
- Problem
- Can a Visual Question Answering system reliably answer clinical questions about
dermatological images?
- Approach
- Fuse visual embeddings from medical images with text representations from clinical
descriptions, comparing fusion architectures — UNITER, ViLT, and cross-attention over BERT /
DistilBERT.
- Results
- Analyzed how fusion choices affect answer accuracy and interpretability for AI-assisted
dermatology.
Paper (PDF)
Computational Linguistics · Low-Resource Languages
Linguistic Roadmap to Darija
Course project · Text Technology · Summer Semester 2024
- Problem
- Moroccan Darija is a primarily spoken, low-resource dialect with inconsistent orthography
and few structured digital resources for learners or researchers.
- Approach
- A full-stack Express.js + MongoDB web application serving vocabulary, grammar, and example
sentences, with cross-collection English-to-Darija search, spelling-form statistics, and an
XML/XSD-validated pipeline for community contributions.
- Results
- Delivered an interactive reference tool and quantified orthographic variation across six
spelling forms; the contribution feature lets users extend the validated database.
Code · Slides
Education
-
2024 – present
M.Sc. Computational Linguistics
University of Stuttgart, Germany
-
2014 – 2017
B.A. English Studies — Linguistics major
FLSH Tetouan, Morocco
-
2017 – 2018
DTS, Computer Development (Specialized Technician)
ISMONTIC, Tangier
Certificates: NLP (DFKI & TU Berlin / AI Campus), Python (Univ. of Michigan / Coursera),
Linear Algebra (Khan Academy), TEFL, TESOL.
Experience
-
2024 – present
Teaching Assistant
Institute for NLP (IMS), University of Stuttgart — Parsing;
Programming for Computational Linguistics
-
2025 – present
Student Research Assistant
Diversity-Aware NLP / IRIS, University of Stuttgart
-
2025
Student Research Assistant
Inst. for Energy Efficiency in Production & Fraunhofer —
energy-systems cost optimization (OEMOF, Pyomo)
-
2022 – 2024
Developer & Team Manager
Software Version 7.0, Morocco — Python / PowerShell
data-optimization tools
-
2023
Developer
IGNITREE Web Development Agency, Morocco — web / CMS
applications
-
2018 – 2022
ESL Teacher
American Language Center & Ministry of Education, Morocco
Languages
- Arabic: Native
- English: Advanced (C1–C2)
- French: Upper-intermediate
- German: B1 (learning)
- Spanish: Beginner
Skills
Languages & tools
- Python
- PyTorch
- Hugging Face Transformers
- scikit-learn
- spaCy
- NLTK
- TensorFlow
- NumPy / pandas
- SQL
- Git
- Docker
- LaTeX
- Bash / PowerShell
NLP & ML methods
- LLM fine-tuning (LoRA / PEFT)
- Prompt engineering
- In-context learning
- Transformers & attention
- Text classification
- Machine translation
- Multimodal fusion
- Multilingual & cross-lingual transfer
- Embeddings & representation learning
- Model evaluation & benchmarking
- Linguistic feature engineering
- Data annotation
Research interests
- Multilingual NLP
- AI-generated text detection & analysis
- AI safety
- Social impact of NLP
- Linguistics in LLMs
- Interpretability of AI models
- Multimodal learning
- Structured languages in LLMs