M.Sc. Computational Linguistics · University of Stuttgart

Natural Language Processing & computational linguistics research.

I study how language works and how machines model it — with a focus on multilingual language models, AI-generated text detection, and multimodal learning. I bring a background in Python development and language teaching to research-oriented problems in NLP.

01 · Research

Publications

Peer-reviewed and preprint work in NLP, multilingual modeling, and multimodal learning.

  1. arXiv preprint · 2026 · cs.CL

    A Systematic Analysis of Linguistic Features in AI-Generated Text Detection Across Domains and Models

    Yassir El Attar, Esra Dönmez, Maximilian Maurer, Agnieszka Falenska

    A large-scale study of 284 interpretable linguistic features across 27 LLMs and ten text domains. Lexical-richness measures stay robust across model families and domains, while many other indicators prove strongly context-dependent.

  2. BUCC 2026 @ LREC 2026 · pp. 108–118

    Leveraging Comparable Toxicity Lexicons in Prompt Instructions for Multilingual Text Detoxification

    Yassir El Attar, Esra Dönmez, Nina Ohlendorf, Agnieszka Falenska

    Uses comparable, language-specific toxicity lexicons inside prompt instructions to guide multilingual detoxification. Adding the lexicons improves results under both zero-shot prompting and fine-tuning, including in cross-lingual transfer to low-resource languages.

  3. ArabicNLP 2025 · Shared Tasks · ACL · pp. 608–614

    YassirEA at MAHED 2025: Fusion-Based Multimodal Models for Arabic Hate Meme Detection

    Yassir El Attar

    A fusion-based multimodal system combining visual and textual features to detect hateful Arabic memes, submitted to the MAHED 2025 shared task at the Third Arabic NLP Conference.

02 · Projects

Research & course projects

Selected NLP and multimodal work. Each card states the problem, the approach, and the outcome.

Speech · Foundation Models

Hearing Abilities of Foundation Models

Course project · Current Topics in Speech Technology · Jan 2025

Problem
How well can spoken language models actually “hear” — perceiving speech, audio events, and music?
Approach
Systematic assessment of audio-language foundation models (SALMONN, Pengi, CLAP, ParaCLAP) and self-supervised speech and speaker models on hearing benchmarks (SUPERB, Dynamic-SUPERB).
Results
Identified strengths on discriminative tasks and gaps on generative tasks, including audio hallucination; findings summarized in a conference-style poster.

Scientific ML · Foundation Models

In-Context Learning for Differential Equations

Seminar report · Scientific Foundation Models II · 2025

Problem
Can foundation-model techniques cut the heavy simulation cost of learning PDE operators while improving out-of-distribution generalization?
Approach
Reviews two directions — the In-Context Operator Network (ICON) for few-shot operator learning, and unsupervised pre-training of neural operators (FNO, transformers) on unlabeled PDE data via masked-autoencoding and super-resolution proxy tasks.
Results
The reviewed work reports that combining pre-training with in-context demonstrations reduces the need for simulated training data by up to ~1000× and improves OOD generalization on PDEs such as Navier–Stokes and Helmholtz.

Computational Linguistics · LLMs

Bridging LLMs and Linguistics

Seminar report · Foundational Questions Regarding LLMs · WS 2024–2025

Problem
Can linguistic theory (Generative Grammar, Construction Grammar) and LLMs be integrated rather than treated as rival, separate disciplines?
Approach
A literature review of the Piantadosi–Chesi debate on LLMs as theories of language, and of work integrating Construction Grammar with neural models (HyCxG, CxGBERT, probing studies).
Results
Argues for a unified framework — LLMs offer computable, falsifiable evidence on grammar learnability, while linguistic theory guides interpretation, to the benefit of both fields.

Multimodal · VQA

Multimodal Dermatology VQA

Research project · Foundation Models · 2024–2025

Problem
Can a Visual Question Answering system reliably answer clinical questions about dermatological images?
Approach
Fuse visual embeddings from medical images with text representations from clinical descriptions, comparing fusion architectures — UNITER, ViLT, and cross-attention over BERT / DistilBERT.
Results
Analyzed how fusion choices affect answer accuracy and interpretability for AI-assisted dermatology.

03 · Résumé

Curriculum vitae

Developer and educator working in NLP research. Below is a short overview — the full CV is available as a PDF.

Download full CV (PDF)

Education

  • 2024 – present · M.Sc. Computational Linguistics · University of Stuttgart, Germany
  • 2017 – 2018 · DTS in Computer Development (Specialized Technician) · ISMONTIC, Tangier, Morocco
  • 2014 – 2017 · B.A. English Studies, Linguistics major · FLSH Tetouan, Morocco

Certificates: NLP (DFKI & TU Berlin / AI Campus), Python (Univ. of Michigan / Coursera), Linear Algebra (Khan Academy), TEFL, TESOL.

Experience

  • 2024 – present · Teaching Assistant · Institute for NLP (IMS), University of Stuttgart: Parsing; Programming for Computational Linguistics
  • 2025 – present · Student Research Assistant · Diversity-Aware NLP / IRIS, University of Stuttgart
  • 2025 · Student Research Assistant · Institute for Energy Efficiency in Production & Fraunhofer: energy-systems cost optimization (OEMOF, Pyomo)
  • 2022 – 2024 · Developer & Team Manager · Software Version 7.0, Morocco: Python / PowerShell data-optimization tools
  • 2023 · Developer · IGNITREE Web Development Agency, Morocco: web / CMS applications
  • 2018 – 2022 · ESL Teacher · American Language Center & Ministry of Education, Morocco

Languages

  • Arabic · Native
  • English · Advanced (C1–C2)
  • French · Upper-intermediate
  • German · B1 (learning)
  • Spanish · Beginner

Skills

Languages & tools

  • Python
  • PyTorch
  • Hugging Face Transformers
  • scikit-learn
  • spaCy
  • NLTK
  • TensorFlow
  • NumPy / pandas
  • SQL
  • Git
  • Docker
  • LaTeX
  • Bash / PowerShell

NLP & ML methods

  • LLM fine-tuning (LoRA / PEFT)
  • Prompt engineering
  • In-context learning
  • Transformers & attention
  • Text classification
  • Machine translation
  • Multimodal fusion
  • Multilingual & cross-lingual transfer
  • Embeddings & representation learning
  • Model evaluation & benchmarking
  • Linguistic feature engineering
  • Data annotation

Research interests

  • Multilingual NLP
  • AI-generated text detection & analysis
  • AI safety
  • Social impact of NLP
  • Linguistics in LLMs
  • Interpretability of AI models
  • Multimodal learning
  • Structured languages in LLMs

04 · Contact

Get in touch

Open to research collaboration, PhD opportunities, and NLP projects.