
Marc Manceau

PhD, Data scientist

As a freelance data scientist, I offer two complementary services: data analysis and teaching.

Target fields are diverse: experimental results in agronomy or ecology, epidemiology for public health officials, design and analysis of clinical trials, or analysis of genetic data for researchers in all fields of biology. I am keen to work with public or private organizations on questions and projects that aim to make our society a better place. Below are examples of what I can do and teach, in either R or Python.

Exploratory analysis and unsupervised learning

Handling missing data and normalizing data. Dimensionality reduction methods (PCA, MFA) and unsupervised clustering. Graphical representation of datasets.
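A minimal Python sketch of the first two steps, using only the standard library. It assumes missing values are encoded as None and that mean imputation is acceptable, which is one simple choice among many. The function name and the example values are hypothetical.

```python
from statistics import mean, pstdev

def impute_and_standardize(column):
    """Replace None by the mean of the observed values, then rescale
    the column to mean 0 and (population) standard deviation 1."""
    observed = [x for x in column if x is not None]
    fill_value = mean(observed)
    filled = [fill_value if x is None else x for x in column]
    center = mean(filled)
    spread = pstdev(filled)
    if spread == 0:
        # A constant column carries no information: return zeros.
        return [0.0 for _ in filled]
    return [(x - center) / spread for x in filled]

# Hypothetical measurements with one missing value.
measurements = [1.0, None, 3.0]
z_scores = impute_and_standardize(measurements)
```

Standardized columns like these are the usual input to a PCA, so that variables measured on different scales contribute comparably.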

Statistical modeling, model fitting, hypothesis testing

Statistical models, choice of a statistical test, model fitting. Statistical significance, type I and type II errors, power, sample size. Linear models, linear regression, variable selection.
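The link between type I error, power and sample size can be illustrated with a short Python sketch. It assumes a two-sided comparison of two group means under a normal approximation, with the effect expressed as a standardized difference (difference in means divided by the common standard deviation). The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means (normal approximation).

    alpha is the type I error rate; 1 - power is the type II error rate.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the target power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

n_medium = sample_size_per_group(0.5)   # medium standardized effect
n_small = sample_size_per_group(0.2)    # smaller effect needs more subjects
```

With these defaults, a standardized effect of 0.5 requires 63 subjects per group. An exact calculation based on the t distribution gives a slightly larger number.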

Modern supervised learning methods

Training and test sets, model selection. Generalized linear models, logistic/softmax regression. Classification algorithms: KNN (k-nearest neighbours), decision trees, random forests, support vector machines, neural networks. Regularization methods (ridge, lasso) and variable selection.
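As a small illustration of the training/test workflow, here is a sketch of a k-nearest-neighbours classifier in plain Python. The data points, labels and function names are hypothetical, and a real analysis would more likely rely on an established library than on this hand-written version.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points (Euclidean distance)."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_points, train_labels, test_points, test_labels, k=3):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_points, train_labels, p, k) == y
               for p, y in zip(test_points, test_labels))
    return hits / len(test_points)

# Hypothetical, well-separated toy data: the model is fitted on the
# training set and evaluated only on points it has not seen.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
test_x = [(0.5, 0.5), (5.5, 5.5)]
test_y = ["a", "b"]

test_accuracy = accuracy(train_x, train_y, test_x, test_y, k=3)
```

Evaluating on held-out points rather than on the training set is what makes the accuracy an honest estimate of how the model generalizes.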

Genetic data analysis

Multiple sequence alignment, search for open reading frames (ORFs). Searches against genetic sequence databases, annotation. Models of molecular evolution, phylogenetic tree inference (parsimony, maximum likelihood, Bayesian methods).
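The ORF search mentioned above can be sketched in a few lines of Python. This simplified version assumes a DNA sequence given as a string, scans only the three forward reading frames (not the reverse complement), and uses the standard stop codons. The function name, the `min_codons` parameter and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(sequence, min_codons=2):
    """Return (start, end, orf) tuples for every ATG...stop stretch found
    in the three forward reading frames.

    `start` and `end` are 0-based string indices (end exclusive);
    `min_codons` counts the start and stop codons as well.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, sequence[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs

# Hypothetical toy sequence containing a single short ORF.
orfs = find_orfs("CCATGAAATAGGG")
```

On real genomes one would also scan the reverse complement and apply a much larger minimum length to filter out spurious hits.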