cargo/core/resolver/encode.rs
//! Definition of how to encode a `Resolve` into a TOML `Cargo.lock` file
//!
//! This module contains all machinery necessary to parse a `Resolve` from a
//! `Cargo.lock` as well as serialize a `Resolve` to a `Cargo.lock`.
//!
//! ## Changing `Cargo.lock`
//!
//! In general Cargo is quite conservative about changing the format of
//! `Cargo.lock`. Usage of new features in Cargo can change `Cargo.lock` at any
//! time, but otherwise changing the serialization of `Cargo.lock` is a
//! difficult operation that we typically avoid.
//!
//! The main problem with changing the format of `Cargo.lock` is that it can
//! cause quite a bad experience for end users who use different versions of
//! Cargo. If every PR to a project oscillates between the stable channel's
//! encoding of `Cargo.lock` and the nightly channel's encoding then that's a
//! pretty bad experience.
//!
//! We do, however, want to change `Cargo.lock` over time (and we have!). To do
//! this the rules that we currently follow are:
//!
//! * Add support for the new format to Cargo
//! * Continue to, by default, generate the old format
//! * Preserve the new format if found
//! * Wait a "long time" (e.g. 6 months or so)
//! * Change Cargo to emit the new format by default
//!
//! This migration scheme in general means that we'll get *support* for a new
//! format into Cargo ASAP, but it won't really be exercised yet (except in
//! Cargo's own tests, really). Eventually, when stable/beta/nightly all have
//! support for the new format (and maybe a few previous stable versions) we
//! flip the switch. Projects on nightly will quickly start seeing changes, but
//! stable/beta/nightly will all understand this new format and will preserve
//! it.
//!
//! While this does mean that projects' `Cargo.lock` changes over time, it's
//! typically a pretty minimal effort change that's just "check in what's
//! there".
//!
//! ## Historical changes to `Cargo.lock`
//!
//! Listed from most recent to oldest, these are some of the changes we've made
//! to `Cargo.lock`'s serialization format:
//!
//! * The entries in `dependencies` arrays have been shortened and the
//!   `checksum` field now shows up directly in `[[package]]` instead of always
//!   at the end of the file. The goal of this change was to reduce merge
//!   conflicts generated in `Cargo.lock`. Updating the version of a package
//!   now only touches two lines in the file, the checksum and the version
//!   number, most of the time. Dependency edges are specified in a compact
//!   form where possible, where just the name is listed. The version/source on
//!   dependency edges are only listed if necessary to disambiguate which
//!   version or which source is in use.
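//!
//!   As a sketch, a single package entry in the newer format looks roughly
//!   like this (the name, version, checksum, and dependency are illustrative
//!   only):
//!
//!   ```toml
//!   [[package]]
//!   name = "example"
//!   version = "1.0.0"
//!   source = "registry+https://github.com/rust-lang/crates.io-index"
//!   checksum = "0000000000000000000000000000000000000000000000000000000000000000"
//!   dependencies = [
//!    "some-dep",
//!   ]
//!   ```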
//!
//! * A comment at the top of the file indicates that the file is a generated
//!   file and contains the special symbol `@generated` to indicate to common
//!   review tools that it's a generated file.
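//!
//!   The header looks something like the following (the exact wording may
//!   differ between Cargo versions):
//!
//!   ```text
//!   # This file is automatically @generated by Cargo.
//!   # It is not intended for manual editing.
//!   ```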
//!
//! * A `[root]` entry for the "root crate" has been removed; the root crate is
//!   instead now included in `[[package]]` like everything else.
//!
//! * All packages from registries contain a `checksum` which is a sha256
//!   checksum of the tarball the package is associated with. This is all stored
//!   in the `[metadata]` table of `Cargo.lock`, which all versions of Cargo
//!   since 1.0 have preserved. The goal of this was to start recording
//!   checksums so mirror sources can be verified.
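//!
//!   In that scheme each checksum lives under a `checksum <package id>` key in
//!   `[metadata]`, along these lines (the id and value are illustrative only):
//!
//!   ```toml
//!   [metadata]
//!   "checksum example 1.0.0 (registry+https://github.com/rust-lang/crates.io-index)" = "<sha256>"
//!   ```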
//!
//! ## Other oddities about `Cargo.lock`
//!
//! There are a few other miscellaneous weird things about `Cargo.lock` that
//! you may want to be aware of when reading this file:
//!
//! * All packages have a `source` listed to indicate where they come from. For
//!   `path` dependencies, however, no `source` is listed. There's no way we
//!   could emit a filesystem path name and have that be portable across
//!   systems, so all packages from a `path` are not listed with a `source`.
//!   Note that this also means that all packages with `path` sources must have
//!   unique names.
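//!
//!   A `path` package entry therefore has no `source` key at all, roughly
//!   (the name and version are illustrative only):
//!
//!   ```toml
//!   [[package]]
//!   name = "my-local-crate"
//!   version = "0.1.0"
//!   ```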
//!
//! * The `[metadata]` table in `Cargo.lock` is intended to be a generic mapping
//!   of strings to strings that's simply preserved by Cargo. This was a very
//!   early effort to be forward compatible against changes to `Cargo.lock`'s
//!   format. Nowadays, though, this is deemed a bad idea, and historically it
//!   hasn't really been used for much beyond `checksum`s. Using it is not
//!   recommended.
//!
//! * The actual literal on-disk serialization is found in
//!   `src/cargo/ops/lockfile.rs`, which basically renders a `toml::Value` in a
//!   special fashion to make sure we have strict control over the on-disk
//!   format.

use std::collections::{BTreeMap, HashMap, HashSet};
use std::fmt;
use std::str::FromStr;

use log::debug;
use serde::de;
use serde::ser;
use serde::{Deserialize, Serialize};

use crate::core::InternedString;
use crate::core::{Dependency, Package, PackageId, SourceId, Workspace};
use crate::util::errors::{CargoResult, CargoResultExt};
use crate::util::{internal, Graph};

use super::{Resolve, ResolveVersion};

/// The `Cargo.lock` structure.
#[derive(Serialize, Deserialize, Debug)]
pub struct EncodableResolve {
    package: Option<Vec<EncodableDependency>>,
    /// `root` is optional to allow backward compatibility.
    root: Option<EncodableDependency>,
    metadata: Option<Metadata>,
    #[serde(default, skip_serializing_if = "Patch::is_empty")]
    patch: Patch,
}

#[derive(Serialize, Deserialize, Debug, Default)]
struct Patch {
    unused: Vec<EncodableDependency>,
}

pub type Metadata = BTreeMap<String, String>;

impl EncodableResolve {
    /// Convert a `Cargo.lock` to a `Resolve`.
    ///
    /// Note that this `Resolve` is not "complete". For example, the
    /// dependencies do not know the difference between regular/dev/build
    /// dependencies, so they are not filled in. It also does not include
    /// `features`. Care should be taken when using this `Resolve`. One of the
    /// primary uses is to be used with `resolve_with_previous` to guide the
    /// resolver to create a complete `Resolve`.
    pub fn into_resolve(self, original: &str, ws: &Workspace<'_>) -> CargoResult<Resolve> {
        let path_deps = build_path_deps(ws);
        let mut checksums = HashMap::new();

        // We assume an older format is being parsed until we see otherwise.
        let mut version = ResolveVersion::V1;

        let packages = {
            let mut packages = self.package.unwrap_or_default();
            if let Some(root) = self.root {
                packages.insert(0, root);
            }
            packages
        };

        // `PackageId`s in the lock file don't include the `source` part
        // for workspace members, so we reconstruct proper IDs.
        let live_pkgs = {
            let mut live_pkgs = HashMap::new();
            let mut all_pkgs = HashSet::new();
            for pkg in packages.iter() {
                let enc_id = EncodablePackageId {
                    name: pkg.name.clone(),
                    version: Some(pkg.version.clone()),
                    source: pkg.source,
                };

                if !all_pkgs.insert(enc_id.clone()) {
                    anyhow::bail!("package `{}` is specified twice in the lockfile", pkg.name);
                }
                let id = match pkg.source.as_ref().or_else(|| path_deps.get(&pkg.name)) {
                    // We failed to find a local package in the workspace.
                    // It must have been removed and should be ignored.
                    None => {
                        debug!("path dependency now missing {} v{}", pkg.name, pkg.version);
                        continue;
                    }
                    Some(&source) => PackageId::new(&pkg.name, &pkg.version, source)?,
                };

                // If a package has a checksum listed directly on it then record
                // that here, and we also bump our version up to 2 since V1
                // didn't ever encode this field.
                if let Some(cksum) = &pkg.checksum {
                    version = ResolveVersion::V2;
                    checksums.insert(id, Some(cksum.clone()));
                }

                assert!(live_pkgs.insert(enc_id, (id, pkg)).is_none())
            }
            live_pkgs
        };

        // When decoding a V2 version the edges in `dependencies` aren't
        // guaranteed to have either version or source information. This `map`
        // is used to find package ids even if dependencies have missing
        // information. This map is from name to version to source to actual
        // package ID. (various levels to drill down step by step)
        let mut map = HashMap::new();
        for (id, _) in live_pkgs.values() {
            map.entry(id.name().as_str())
                .or_insert_with(HashMap::new)
                .entry(id.version().to_string())
                .or_insert_with(HashMap::new)
                .insert(id.source_id(), *id);
        }

        let mut lookup_id = |enc_id: &EncodablePackageId| -> Option<PackageId> {
            // The name of this package should always be in the larger list of
            // all packages.
            let by_version = map.get(enc_id.name.as_str())?;

            // If the version is provided, look that up. Otherwise if the
            // version isn't provided this is a V2 manifest and we should only
            // have one version for this name. If we have more than one version
            // for the name then it's ambiguous which one we'd use. That
            // shouldn't ever actually happen but in theory bad git merges could
            // produce invalid lock files, so silently ignore these cases.
            let by_source = match &enc_id.version {
                Some(version) => by_version.get(version)?,
                None => {
                    version = ResolveVersion::V2;
                    if by_version.len() == 1 {
                        by_version.values().next().unwrap()
                    } else {
                        return None;
                    }
                }
            };

            // This is basically the same as above. Note though that `source` is
            // always missing for path dependencies regardless of serialization
            // format. That means we have to handle the `None` case a bit more
            // carefully.
            match &enc_id.source {
                Some(source) => by_source.get(source).cloned(),
                None => {
                    // Look through all possible package ids for this
                    // name/version. If there's only one `path` dependency then
                    // we are hardcoded to use that since `path` dependencies
                    // can't have a source listed.
                    let mut path_packages = by_source.values().filter(|p| p.source_id().is_path());
                    if let Some(path) = path_packages.next() {
                        if path_packages.next().is_some() {
                            return None;
                        }
                        Some(*path)

                    // ... otherwise if there's only one then we must be
                    // implicitly using that one due to a V2 serialization of
                    // the lock file
                    } else if by_source.len() == 1 {
                        let id = by_source.values().next().unwrap();
                        version = ResolveVersion::V2;
                        Some(*id)

                    // ... and failing that we probably had a bad git merge of
                    // `Cargo.lock` or something like that, so just ignore this.
                    } else {
                        None
                    }
                }
            }
        };

        let mut g = Graph::new();

        for &(ref id, _) in live_pkgs.values() {
            g.add(id.clone());
        }

        for &(ref id, pkg) in live_pkgs.values() {
            let deps = match pkg.dependencies {
                Some(ref deps) => deps,
                None => continue,
            };

            for edge in deps.iter() {
                if let Some(to_depend_on) = lookup_id(edge) {
                    g.link(id.clone(), to_depend_on);
                }
            }
        }

        let replacements = {
            let mut replacements = HashMap::new();
            for &(ref id, pkg) in live_pkgs.values() {
                if let Some(ref replace) = pkg.replace {
                    assert!(pkg.dependencies.is_none());
                    if let Some(replace_id) = lookup_id(replace) {
                        replacements.insert(id.clone(), replace_id);
                    }
                }
            }
            replacements
        };

        let mut metadata = self.metadata.unwrap_or_default();

        // In the V1 serialization format all checksums were listed in the lock
        // file in the `[metadata]` section, so if we're still V1 then look for
        // that here.
        let prefix = "checksum ";
        let mut to_remove = Vec::new();
        for (k, v) in metadata.iter().filter(|p| p.0.starts_with(prefix)) {
            to_remove.push(k.to_string());
            let k = &k[prefix.len()..];
            let enc_id: EncodablePackageId = k
                .parse()
                .chain_err(|| internal("invalid encoding of checksum in lockfile"))?;
            let id = match lookup_id(&enc_id) {
                Some(id) => id,
                _ => continue,
            };

            let v = if v == "<none>" {
                None
            } else {
                Some(v.to_string())
            };
            checksums.insert(id, v);
        }
        // If `checksum` was listed in `[metadata]` but we were previously
        // listed as `V2` then assume some sort of bad git merge happened, so
        // discard all checksums and let's regenerate them later.
        if !to_remove.is_empty() && version == ResolveVersion::V2 {
            checksums.drain();
        }
        for k in to_remove {
            metadata.remove(&k);
        }

        let mut unused_patches = Vec::new();
        for pkg in self.patch.unused {
            let id = match pkg.source.as_ref().or_else(|| path_deps.get(&pkg.name)) {
                Some(&src) => PackageId::new(&pkg.name, &pkg.version, src)?,
                None => continue,
            };
            unused_patches.push(id);
        }

        // We have a curious issue where in the "v1 format" we buggily had a
        // trailing blank line at the end of lock files under some specific
        // conditions.
        //
        // Cargo is trying to write new lockfiles in the "v2 format" but if you
        // have no dependencies, for example, then the lockfile encoded won't
        // really have any indicator that it's in the new format (no
        // dependencies or checksums listed). This means that if you type `cargo
        // new` followed by `cargo build` it will generate a "v2 format" lock
        // file since none previously existed. When reading this on the next
        // `cargo build`, however, it generates a new lock file because when
        // reading in that lockfile we think it's the v1 format.
        //
        // To help fix this issue we special case here. If our lockfile only has
        // one trailing newline, not two, *and* it only has one package, then
        // this is actually the v2 format.
        if original.ends_with('\n')
            && !original.ends_with("\n\n")
            && version == ResolveVersion::V1
            && g.iter().count() == 1
        {
            version = ResolveVersion::V2;
        }

        Ok(Resolve::new(
            g,
            replacements,
            HashMap::new(),
            checksums,
            metadata,
            unused_patches,
            version,
            HashMap::new(),
        ))
    }
}

fn build_path_deps(ws: &Workspace<'_>) -> HashMap<String, SourceId> {
    // If a crate is **not** a path source, then we're probably in a situation
    // such as `cargo install` with a lock file from a remote dependency. In
    // that case we don't need to fixup any path dependencies (as they're not
    // actually path dependencies any more), so we ignore them.
    let members = ws
        .members()
        .filter(|p| p.package_id().source_id().is_path())
        .collect::<Vec<_>>();

    let mut ret = HashMap::new();
    let mut visited = HashSet::new();
    for member in members.iter() {
        ret.insert(
            member.package_id().name().to_string(),
            member.package_id().source_id(),
        );
        visited.insert(member.package_id().source_id());
    }
    for member in members.iter() {
        build_pkg(member, ws, &mut ret, &mut visited);
    }
    for deps in ws.root_patch().values() {
        for dep in deps {
            build_dep(dep, ws, &mut ret, &mut visited);
        }
    }
    for &(_, ref dep) in ws.root_replace() {
        build_dep(dep, ws, &mut ret, &mut visited);
    }

    return ret;

    fn build_pkg(
        pkg: &Package,
        ws: &Workspace<'_>,
        ret: &mut HashMap<String, SourceId>,
        visited: &mut HashSet<SourceId>,
    ) {
        for dep in pkg.dependencies() {
            build_dep(dep, ws, ret, visited);
        }
    }

    fn build_dep(
        dep: &Dependency,
        ws: &Workspace<'_>,
        ret: &mut HashMap<String, SourceId>,
        visited: &mut HashSet<SourceId>,
    ) {
        let id = dep.source_id();
        if visited.contains(&id) || !id.is_path() {
            return;
        }
        let path = match id.url().to_file_path() {
            Ok(p) => p.join("Cargo.toml"),
            Err(_) => return,
        };
        let pkg = match ws.load(&path) {
            Ok(p) => p,
            Err(_) => return,
        };
        ret.insert(pkg.name().to_string(), pkg.package_id().source_id());
        visited.insert(pkg.package_id().source_id());
        build_pkg(&pkg, ws, ret, visited);
    }
}

impl Patch {
    fn is_empty(&self) -> bool {
        self.unused.is_empty()
    }
}

#[derive(Serialize, Deserialize, Debug, PartialOrd, Ord, PartialEq, Eq)]
pub struct EncodableDependency {
    name: String,
    version: String,
    source: Option<SourceId>,
    checksum: Option<String>,
    dependencies: Option<Vec<EncodablePackageId>>,
    replace: Option<EncodablePackageId>,
}

#[derive(Debug, PartialOrd, Ord, PartialEq, Eq, Hash, Clone)]
pub struct EncodablePackageId {
    name: String,
    version: Option<String>,
    source: Option<SourceId>,
}

impl fmt::Display for EncodablePackageId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.name)?;
        if let Some(s) = &self.version {
            write!(f, " {}", s)?;
        }
        if let Some(s) = &self.source {
            write!(f, " ({})", s.into_url())?;
        }
        Ok(())
    }
}

impl FromStr for EncodablePackageId {
    type Err = anyhow::Error;

    fn from_str(s: &str) -> CargoResult<EncodablePackageId> {
        let mut s = s.splitn(3, ' ');
        let name = s.next().unwrap();
        let version = s.next();
        let source_id = match s.next() {
            Some(s) => {
                if s.starts_with('(') && s.ends_with(')') {
                    Some(SourceId::from_url(&s[1..s.len() - 1])?)
                } else {
                    anyhow::bail!("invalid serialized PackageId")
                }
            }
            None => None,
        };

        Ok(EncodablePackageId {
            name: name.to_string(),
            version: version.map(|v| v.to_string()),
            source: source_id,
        })
    }
}

impl ser::Serialize for EncodablePackageId {
    fn serialize<S>(&self, s: S) -> Result<S::Ok, S::Error>
    where
        S: ser::Serializer,
    {
        s.collect_str(self)
    }
}

impl<'de> de::Deserialize<'de> for EncodablePackageId {
    fn deserialize<D>(d: D) -> Result<EncodablePackageId, D::Error>
    where
        D: de::Deserializer<'de>,
    {
        String::deserialize(d).and_then(|string| {
            string
                .parse::<EncodablePackageId>()
                .map_err(de::Error::custom)
        })
    }
}

impl ser::Serialize for Resolve {
    fn serialize<S>(&self, s: S) -> Result<S::Ok, S::Error>
    where
        S: ser::Serializer,
    {
        let mut ids: Vec<_> = self.iter().collect();
        ids.sort();

        let state = EncodeState::new(self);

        let encodable = ids
            .iter()
            .map(|&id| encodable_resolve_node(id, self, &state))
            .collect::<Vec<_>>();

        let mut metadata = self.metadata().clone();

        if *self.version() == ResolveVersion::V1 {
            for &id in ids.iter().filter(|id| !id.source_id().is_path()) {
                let checksum = match self.checksums()[&id] {
                    Some(ref s) => &s[..],
                    None => "<none>",
                };
                let id = encodable_package_id(id, &state);
                metadata.insert(format!("checksum {}", id), checksum.to_string());
            }
        }

        let metadata = if metadata.is_empty() {
            None
        } else {
            Some(metadata)
        };

        let patch = Patch {
            unused: self
                .unused_patches()
                .iter()
                .map(|id| EncodableDependency {
                    name: id.name().to_string(),
                    version: id.version().to_string(),
                    source: encode_source(id.source_id()),
                    dependencies: None,
                    replace: None,
                    checksum: match self.version() {
                        ResolveVersion::V2 => self.checksums().get(id).and_then(|x| x.clone()),
                        ResolveVersion::V1 => None,
                    },
                })
                .collect(),
        };
        EncodableResolve {
            package: Some(encodable),
            root: None,
            metadata,
            patch,
        }
        .serialize(s)
    }
}

pub struct EncodeState<'a> {
    counts: Option<HashMap<InternedString, HashMap<&'a semver::Version, usize>>>,
}

impl<'a> EncodeState<'a> {
    pub fn new(resolve: &'a Resolve) -> EncodeState<'a> {
        let counts = if *resolve.version() == ResolveVersion::V2 {
            let mut map = HashMap::new();
            for id in resolve.iter() {
                let slot = map
                    .entry(id.name())
                    .or_insert_with(HashMap::new)
                    .entry(id.version())
                    .or_insert(0);
                *slot += 1;
            }
            Some(map)
        } else {
            None
        };
        EncodeState { counts }
    }
}

fn encodable_resolve_node(
    id: PackageId,
    resolve: &Resolve,
    state: &EncodeState<'_>,
) -> EncodableDependency {
    let (replace, deps) = match resolve.replacement(id) {
        Some(id) => (Some(encodable_package_id(id, state)), None),
        None => {
            let mut deps = resolve
                .deps_not_replaced(id)
                .map(|(id, _)| encodable_package_id(id, state))
                .collect::<Vec<_>>();
            deps.sort();
            (None, Some(deps))
        }
    };

    EncodableDependency {
        name: id.name().to_string(),
        version: id.version().to_string(),
        source: encode_source(id.source_id()),
        dependencies: deps,
        replace,
        checksum: match resolve.version() {
            ResolveVersion::V2 => resolve.checksums().get(&id).and_then(|s| s.clone()),
            ResolveVersion::V1 => None,
        },
    }
}

pub fn encodable_package_id(id: PackageId, state: &EncodeState<'_>) -> EncodablePackageId {
    let mut version = Some(id.version().to_string());
    let mut source = encode_source(id.source_id()).map(|s| s.with_precise(None));
    if let Some(counts) = &state.counts {
        let version_counts = &counts[&id.name()];
        if version_counts[&id.version()] == 1 {
            source = None;
            if version_counts.len() == 1 {
                version = None;
            }
        }
    }
    EncodablePackageId {
        name: id.name().to_string(),
        version,
        source,
    }
}

fn encode_source(id: SourceId) -> Option<SourceId> {
    if id.is_path() {
        None
    } else {
        Some(id)
    }
}