Publications

[1] Vincent Simonet. Classifying YouTube channels: a practical system. In Proceedings of the 2nd International Workshop on Web of Linked Entities (WOLE 2013), in Proceedings of the 22nd International conference on World Wide Web companion, pages 1295-1304, Rio de Janeiro (Brasil), May 2013. ©ACM. [ bib | PDF | At Google's | At ACM's ]
This paper presents a framework for categorizing channels of videos in a thematic taxonomy with high precision and coverage. The proposed approach consists of three main steps. First, videos are annotated by semantic entities describing their central topics. Second, semantic entities are mapped to categories using a combination of classifiers.Last, the categorization of channels is obtained by combining the results of both previous steps. This framework has been deployed on the whole corpus of YouTube, in 8 languages, and used to build several user facing products. Beyond the description of the framework, this paper gives insight into practical aspects and experience: rationale from product requirements to the choice of the solution, spam filtering, human-based evaluations of the quality of the results, and measured metrics on the live site.

[2] Vincent Simonet and François Pottier. A constraint-based approach to guarded algebraic data types. ACM Transactions on Programming Languages and Systems, 29(1), January 2007. [ bib | PostScript | PDF | At ACM's ]
We study HMG(X), an extension of the constraint-based type system HM(X) with deep pattern matching, polymorphic recursion, and guarded algebraic data types. Guarded algebraic data types subsume the concepts known in the literature as indexed types, guarded recursive datatype constructors, (first-class) phantom types, and equality qualified types, and are closely related to inductive types. Their characteristic property is to allow every branch of a case construct to be typechecked under different assumptions about the type variables in scope. We prove that HMG(X) is sound and that, provided recursive definitions carry a type annotation, type inference can be reduced to constraint solving. Constraint solving is decidable, at least for some instances of X, but prohibitively expensive. Effective type inference for guarded algebraic data types is left as an issue for future research.

[3] Vincent Simonet and François Pottier. Constraint-based type inference for guarded algebraic data types. Research Report 5462, INRIA, January 2005. [ bib | PostScript | PDF | At INRIA's ]
Guarded algebraic data types subsume the concepts known in the literature as indexed types, guarded recursive datatype constructors, and first-class phantom types, and are closely related to inductive types. They have the distinguishing feature that, when typechecking a function defined by cases, every branch may be checked under different assumptions about the type variables in scope. This mechanism allows exploiting the presence of dynamic tests in the code to produce extra static type information.

We propose an extension of the constraint-based type system HM(X) with deep pattern matching, guarded algebraic data types, and polymorphic recursion. We prove that the type system is sound and that, provided recursive function definitions carry a type annotation, type inference may be reduced to constraint solving. Then, because solving arbitrary constraints is expensive, we further restrict the form of type annotations and prove that this allows producing so-called tractable constraints. Last, in the specific setting of equality, we explain how to solve tractable constraints.

To the best of our knowledge, this is the first generic and comprehensive account of type inference in the presence of guarded algebraic data types.

[4] Vincent Simonet. Inférence de flots d'information pour ML: formalisation et implantation. PhD thesis, Université Paris 7, March 2004. [ bib | PostScript | PDF ]
This thesis describes the conception of an information flow analyser for a language of the ML family, from its theoretical foundation to the practical issues. The first part of the dissertation presents the tool that was implemented, Flow Caml, and illustrates its use on concrete example. The second part gives a formalization of the type system featured by Flow Caml, together with a proof of its correctness. This is the first type system for information flow analysis in a realistic programming language that has been formally proved. Lastly, the third part is devoted to the formalization and the proof of an efficient algorithm for type inference in the presence of structural subtyping and polymorphism. An instance of this algorithm is used to synthesize types in Flow Caml.

[5] Vincent Simonet. Type inference with structural subtyping: A faithful formalization of an efficient constraint solver. In Atsushi Ohori, editor, Proceedings of the Asian Symposium on Programming Languages and Systems (APLAS'03), volume 2895 of Lecture Notes in Computer Science, pages 283-302, Beijing, China, November 2003. Springer-Verlag. ©Springer-Verlag. [ bib | PostScript | PDF | Full version (PostScript) | Full version (PDF) | At Springer's ]
We are interested in type inference in the presence of structural subtyping from a pragmatic perspective. This work combines theoretical and practical contributions: first, it provides a faithful description of an efficient algorithm for solving and simplifying constraints; whose correctness is formally proved. Besides, the framework has been implemented in Objective Caml, yielding a generic type inference engine. Its efficiency is assessed by a complexity result and a series of experiments in realistic cases.

[6] Vincent Simonet. An extension of HM(X) with bounded existential and universal data-types. In Proceedings of the 8th ACM SIGPLAN International Conference on Functional Programming (ICFP 2003), pages 39-50, Uppsala, Sweden, August 2003. ©ACM. [ bib | PostScript | PDF | Full version (PostScript) | Full version (PDF) | At ACM's ]
We propose a conservative extension of HM(X), a generic constraint-based type inference framework, with bounded existential (a.k.a. abstract) and universal (a.k.a. polymorphic) data-types. In the first part of the article, which remains abstract of the type and constraint language (i.e. the logic X), we introduce the type system, prove its safety and define a type inference algorithm which computes principal typing judgments. In the second part, we propose a realistic constraint solving algorithm for the case of structural subtyping, which handles the non-standard construct of the constraint language generated by type inference: a form of bounded universal quantification.

[7] Vincent Simonet. The Flow Caml System: documentation and user's manual. Technical Report 0282, Institut National de Recherche en Informatique et en Automatique (INRIA), July 2003. ©INRIA. [ bib | HTML | PostScript | PDF | At INRIA's ]
Flow Caml is an extension of the Objective Caml language with a type system tracing information flow. Its purpose is basically to allow to write “real” programs and to automatically check that they obey some confidentiality or integrity policy. In Flow Caml, standard ML types are annotated with security levels chosen in a user-definable lattice. Each annotation gives an approximation of the information that the described expression may convey. Because it has full type inference, the system verifies, without requiring source code annotations, that every information flow caused by the analyzed program is legal with regard to the security policy specified by the programmer.

[8] Vincent Simonet. Flow Caml in a nutshell. In Graham Hutton, editor, Proceedings of the first APPSEM-II workshop, pages 152-165, Nottingham, United Kingdom, March 2003. [ bib ]
Flow Caml is an extension of the Objective Caml language with a type system tracing information flow. It automatically checks information flow within Flow Caml programs, then translates them to regular Objective Caml code that can be compiled by the ordinary compiler to produce secure programs. In this paper, we give a short overview of this system, from a practical viewpoint.

[9] François Pottier and Vincent Simonet. Information flow inference for ML. ACM Transactions on Programming Languages and Systems, 25(1):117-158, January 2003. ©ACM. [ bib | PostScript | PDF | At ACM's ]
This paper presents a type-based information flow analysis for a call-by-value λ-calculus equipped with references, exceptions and let-polymorphism, which we refer to as ML. The type system is constraint-based and has decidable type inference. Its noninterference proof is reasonably light-weight, thanks to the use of a number of orthogonal techniques. First, a syntactic segregation between values and expressions allows a lighter formulation of the type system. Second, noninterference is reduced to subject reduction for a nonstandard language extension. Lastly, a semi-syntactic approach to type soundness allows dealing with constraint-based polymorphism separately.

[10] Vincent Simonet. Fine-grained information flow analysis for a λ-calculus with sum types. In Proceedings of the 15th IEEE Computer Security Foundations Workshop (CSFW 15), pages 223-237, Cape Breton, Nova Scotia (Canada), June 2002. ©IEEE. [ bib | PostScript | PDF | Full version (PostScript) | Full version (PDF) ]
This paper presents a new type system tracing information flow for a λ-calculus equipped with polymorphic “let” and with sums (a.k.a. union types or polymorphic variants). The type system allows establishing (weak) non-interference properties. Thanks to original forms of security annotations and constraints, it is more accurate than existing analyses. Through a straightforward encoding into sums, this work also provides a new type-based information flow analysis for programming languages featuring exceptions. From these systems, one may derive constraint-based formulations, in the style of HM(X), which have decidable type inference.

[11] François Pottier and Vincent Simonet. Information flow inference for ML. In Proceedings of the 29th ACM Symposium on Principles of Programming Languages (POPL'02), pages 319-330, Portland, Oregon, January 2002. ©ACM. [ bib | PostScript | PDF | Full version (PostScript) | Full version (PDF) | At ACM's ]
This paper presents a type-based information flow analysis for a call-by-value lambda-calculus equipped with references, exceptions and let-polymorphism, which we refer to as Core ML. The type system is constraint-based and has decidable type inference. Its non-interference proof is reasonably light-weight, thanks to the use of a number of orthogonal techniques. First, a syntactic segregation between values and expressions allows a lighter formulation of the type system. Second, non-interference is reduced to subject reduction for a non-standard language extension. Lastly, a semi-syntactic approach to type soundness allows dealing with constraint-based polymorphism separately.

[12] Vincent Simonet. Inférence de flots d'information pour ML. Master's thesis, DEA « Programmation : Sémantique, Preuves, et Langages », March 2001. In French. [ bib | PostScript | PDF ]