Select Publications

Here is a list of my recent publications:

Rolnick, D., Donti, P., Kaack, L., Kochanski, K., Lacoste, A., Sankaran, K., Ross, A., Milojevic-Dupont, N., Jaques, N., Waldman-Brown, A., & others (2022). Tackling climate change with machine learning. ACM Computing Surveys (CSUR), 55(2), 1–96.

Solaiman, I., Talat, Z., Agnew, W., Ahmad, L., Baker, D., Blodgett, S., Daume III, H., Dodge, J., Evans, E., Hooker, S., & others (2023). Evaluating the Social Impact of Generative AI Systems in Systems and Society. arXiv preprint arXiv:2306.05949.

Li, R., Allal, L., Zi, Y., Muennighoff, N., Kocetkov, D., Mou, C., Marone, M., Akiki, C., Li, J., Chim, J., & others (2023). StarCoder: may the source be with you! arXiv preprint arXiv:2305.06161.

Luccioni, A., Akiki, C., Mitchell, M., & Jernite, Y. (2023). Stable bias: Analyzing societal representations in diffusion models. arXiv preprint arXiv:2303.11408.

Piktus, A., Akiki, C., Villegas, P., Laurençon, H., Dupont, G., Luccioni, A., Jernite, Y., & Rogers, A. (2023). The ROOTS search tool: Data transparency for LLMs. arXiv preprint arXiv:2302.14035.

Luccioni, A., & Hernandez-Garcia, A. (2023). Counting carbon: A survey of factors influencing the emissions of machine learning. arXiv preprint arXiv:2302.08476.

Friedrich, F., Schramowski, P., Brack, M., Struppek, L., Hintersdorf, D., Luccioni, S., & Kersting, K. (2023). Fair diffusion: Instructing text-to-image generation models on fairness. arXiv preprint arXiv:2302.10893.

Mitchell, M., Luccioni, A., Lambert, N., Gerchick, M., McMillan-Major, A., Ozoani, E., Rajani, N., Thrush, T., Jernite, Y., & Kiela, D. (2022). Measuring Data. arXiv preprint arXiv:2212.05129.

Laurençon, H., Saulnier, L., Wang, T., Akiki, C., Moral, A., Le Scao, T., Von Werra, L., Mou, C., González Ponferrada, E., Nguyen, H., & others (2022). The BigScience ROOTS corpus: A 1.6TB composite multilingual dataset. Advances in Neural Information Processing Systems, 35, 31809–31826.

Luccioni, A., Viguier, S., & Ligozat, A.L. (2022). Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model. arXiv preprint arXiv:2211.02001.

Von Werra, L., Tunstall, L., Thakur, A., Luccioni, A., Thrush, T., Piktus, A., Marty, F., Rajani, N., Mustar, V., Ngo, H., & others (2022). Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurement. arXiv preprint arXiv:2210.01970.

Luccioni, A., & Rolnick, D. (2022). Bugs in the data: How ImageNet misrepresents biodiversity. arXiv preprint arXiv:2208.11695.

Luccioni, A., Corry, F., Sridharan, H., Ananny, M., Schultz, J., & Crawford, K. (2022). A Framework for Deprecating Datasets: Standardizing Documentation, Identification, and Communication. In 2022 ACM Conference on Fairness, Accountability, and Transparency (pp. 199–212).

Dodge, J., Prewitt, T., Combes, R., Odmark, E., Schwartz, R., Strubell, E., Luccioni, A., Smith, N., DeCario, N., & Buchanan, W. (2022). Measuring the carbon intensity of AI in cloud instances. In 2022 ACM Conference on Fairness, Accountability, and Transparency (pp. 1877–1894).

Jernite, Y., Nguyen, H., Biderman, S., Rogers, A., Masoud, M., Danchev, V., Tan, S., Luccioni, A., Subramani, N., Johnson, I., & others (2022). Data governance in the age of large-scale data-driven language technology. In 2022 ACM Conference on Fairness, Accountability, and Transparency (pp. 2206–2222).

Lucic, A., Bleeker, M., Bhargav, S., Forde, J., Sinha, K., Dodge, J., Luccioni, S., & Stojnic, R. (2022). Towards reproducible machine learning research in natural language processing. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts (pp. 7–11).

LaCroix, T., & Luccioni, A. (2022). Metaethical Perspectives on ‘Benchmarking’ AI Ethics. arXiv preprint arXiv:2204.05151.

Luccioni, A., & Viviano, J. (2021). What’s in the box? an analysis of undesirable content in the Common Crawl corpus. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) (pp. 182–189).

Schmidt, V., Goyal, K., Joshi, A., Feld, B., Conell, L., Laskaris, N., Blank, D., Wilson, J., Friedler, S., & Luccioni, S. (2021). CodeCarbon: estimate and track carbon emissions from machine learning computing.

Bullock, J., Luccioni, A., Pham, K., Lam, C., & Luengo-Oroz, M. (2020). Mapping the landscape of artificial intelligence applications against COVID-19. Journal of Artificial Intelligence Research, 69, 807–845.

Luengo-Oroz, M., Hoffmann Pham, K., Bullock, J., Kirkpatrick, R., Luccioni, A., Rubel, S., Wachholz, C., Chakchouk, M., Biggs, P., Nguyen, T., & others (2020). Artificial intelligence cooperation to support the global response to COVID-19. Nature Machine Intelligence, 2(6), 295–297.

Lacoste, A., Luccioni, A., Schmidt, V., & Dandres, T. (2019). Quantifying the carbon emissions of machine learning. arXiv preprint arXiv:1910.09700.