// cargo/core/resolver/encode.rs

//! Definition of how to encode a `Resolve` into a TOML `Cargo.lock` file
//!
//! This module contains all machinery necessary to parse a `Resolve` from a
//! `Cargo.lock` as well as serialize a `Resolve` to a `Cargo.lock`.
//!
//! ## Changing `Cargo.lock`
//!
//! In general Cargo is quite conservative about changing the format of
//! `Cargo.lock`. Usage of new features in Cargo can change `Cargo.lock` at any
//! time, but otherwise changing the serialization of `Cargo.lock` is a
//! difficult operation that we typically avoid.
//!
//! The main problem with changing the format of `Cargo.lock` is that it can
//! cause quite a bad experience for end users who use different versions of
//! Cargo. If every PR to a project oscillates between the stable channel's
//! encoding of `Cargo.lock` and the nightly channel's encoding, that's a
//! pretty bad experience.
//!
//! We do, however, want to change `Cargo.lock` over time (and we have!). To do
//! this, the rules that we currently follow are:
//!
//! * Add support for the new format to Cargo.
//! * Continue to, by default, generate the old format.
//! * Preserve the new format if found.
//! * Wait a "long time" (e.g. 6 months or so).
//! * Change Cargo to emit the new format by default.
//!
//! This migration scheme generally means that we'll get *support* for a new
//! format into Cargo ASAP, even though it won't really be exercised yet
//! (except in Cargo's own tests). Eventually, when stable/beta/nightly all
//! have support for the new format (and maybe a few previous stable versions
//! do too), we flip the switch. Projects on nightly will quickly start seeing
//! changes, but stable/beta/nightly will all understand the new format and
//! will preserve it.
//!
//! While this does mean that a project's `Cargo.lock` changes over time, it's
//! typically a pretty minimal-effort change that's just "check in what's
//! there".
//!
//! ## Historical changes to `Cargo.lock`
//!
//! Listed from most recent to oldest, these are some of the changes we've made
//! to `Cargo.lock`'s serialization format:
//!
//! * The entries in `dependencies` arrays have been shortened and the
//!   `checksum` field now shows up directly in `[[package]]` instead of always
//!   at the end of the file. The goal of this change was to reduce the merge
//!   conflicts generated on `Cargo.lock`. Updating the version of a package
//!   now only updates two lines in the file, the checksum and the version
//!   number, most of the time. Dependency edges are specified in a compact
//!   form where possible, where just the name is listed. The version/source on
//!   dependency edges are only listed if necessary to disambiguate which
//!   version or which source is in use.
//!
//! * A comment at the top of the file indicates that the file is generated
//!   and contains the special symbol `@generated` to indicate to common
//!   review tools that it's a generated file.
//!
//! * The `[root]` entry for the "root crate" has been removed; the root crate
//!   is instead now included in `[[package]]` like everything else.
//!
//! * All packages from registries contain a `checksum`, which is a sha256
//!   checksum of the tarball the package is associated with. This is all
//!   stored in the `[metadata]` table of `Cargo.lock`, which all versions of
//!   Cargo since 1.0 have preserved. The goal of this was to start recording
//!   checksums so mirror sources can be verified.
//!
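//! For illustration, with the compact format described above a `[[package]]`
//! entry now looks roughly like this (the concrete version and checksum here
//! are made up for the example):
//!
//! ```toml
//! [[package]]
//! name = "serde"
//! version = "1.0.104"
//! source = "registry+https://github.com/rust-lang/crates.io-index"
//! checksum = "f7902e8d1764d2423fb4d53a6fea7d083e98a16c1b3c4fe77cf3bebd70440f06"
//! dependencies = [
//!  "serde_derive",
//! ]
//! ```
//!
//! The `serde_derive` edge lists just the name because only one version of it
//! from one source appears in the dependency graph; a version and source would
//! be appended only to disambiguate.
//!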
//! ## Other oddities about `Cargo.lock`
//!
//! There are a few other miscellaneous weird things about `Cargo.lock` that
//! you may want to be aware of when reading this file:
//!
//! * All packages have a `source` listed to indicate where they come from. For
//!   `path` dependencies, however, no `source` is listed. There's no way we
//!   could emit a filesystem path name and have that be portable across
//!   systems, so all packages from a `path` source are listed without a
//!   `source`. Note that this also means that all packages with `path` sources
//!   must have unique names.
//!
//! * The `[metadata]` table in `Cargo.lock` is intended to be a generic
//!   mapping of strings to strings that's simply preserved by Cargo. This was
//!   a very early effort to be forward compatible against changes to
//!   `Cargo.lock`'s format. Nowadays, though, this is sort of deemed a bad
//!   idea and we don't really use it much, except historically for
//!   `checksum`s. Using it is not really recommended.
//!
//! * The actual literal on-disk serialization is found in
//!   `src/cargo/ops/lockfile.rs`, which basically renders a `toml::Value` in a
//!   special fashion to make sure we have strict control over the on-disk
//!   format.
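//!
//! In the older V1 format mentioned above, by contrast, checksums lived in the
//! `[metadata]` table under `checksum`-prefixed keys whose remainder is a
//! serialized package ID (the concrete version and checksum here are made up
//! for the example):
//!
//! ```toml
//! [metadata]
//! "checksum serde 1.0.104 (registry+https://github.com/rust-lang/crates.io-index)" = "f7902e8d1764d2423fb4d53a6fea7d083e98a16c1b3c4fe77cf3bebd70440f06"
//! ```
//!
//! The value `"<none>"` marks a package with no checksum. `into_resolve` below
//! strips the `"checksum "` prefix and parses the remainder of the key as an
//! `EncodablePackageId`.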
use std::collections::{BTreeMap, HashMap, HashSet};
use std::fmt;
use std::str::FromStr;

use log::debug;
use serde::de;
use serde::ser;
use serde::{Deserialize, Serialize};

use crate::core::InternedString;
use crate::core::{Dependency, Package, PackageId, SourceId, Workspace};
use crate::util::errors::{CargoResult, CargoResultExt};
use crate::util::{internal, Graph};

use super::{Resolve, ResolveVersion};

/// The `Cargo.lock` structure.
#[derive(Serialize, Deserialize, Debug)]
pub struct EncodableResolve {
    package: Option<Vec<EncodableDependency>>,
    /// `root` is optional to allow backward compatibility.
    root: Option<EncodableDependency>,
    metadata: Option<Metadata>,
    #[serde(default, skip_serializing_if = "Patch::is_empty")]
    patch: Patch,
}

#[derive(Serialize, Deserialize, Debug, Default)]
struct Patch {
    unused: Vec<EncodableDependency>,
}

pub type Metadata = BTreeMap<String, String>;

impl EncodableResolve {
    /// Convert a `Cargo.lock` to a `Resolve`.
    ///
    /// Note that this `Resolve` is not "complete". For example, the
    /// dependencies do not know the difference between regular/dev/build
    /// dependencies, so they are not filled in. It also does not include
    /// `features`. Care should be taken when using this `Resolve`. One of its
    /// primary uses is with `resolve_with_previous`, to guide the resolver
    /// toward creating a complete `Resolve`.
    pub fn into_resolve(self, original: &str, ws: &Workspace<'_>) -> CargoResult<Resolve> {
        let path_deps = build_path_deps(ws);
        let mut checksums = HashMap::new();

        // We assume an older format is being parsed until we see otherwise.
        let mut version = ResolveVersion::V1;

        let packages = {
            let mut packages = self.package.unwrap_or_default();
            if let Some(root) = self.root {
                packages.insert(0, root);
            }
            packages
        };

        // `PackageId`s in the lock file don't include the `source` part
        // for workspace members, so we reconstruct proper IDs.
        let live_pkgs = {
            let mut live_pkgs = HashMap::new();
            let mut all_pkgs = HashSet::new();
            for pkg in packages.iter() {
                let enc_id = EncodablePackageId {
                    name: pkg.name.clone(),
                    version: Some(pkg.version.clone()),
                    source: pkg.source,
                };

                if !all_pkgs.insert(enc_id.clone()) {
                    anyhow::bail!("package `{}` is specified twice in the lockfile", pkg.name);
                }
                let id = match pkg.source.as_ref().or_else(|| path_deps.get(&pkg.name)) {
                    // We failed to find a local package in the workspace.
                    // It must have been removed and should be ignored.
                    None => {
                        debug!("path dependency now missing {} v{}", pkg.name, pkg.version);
                        continue;
                    }
                    Some(&source) => PackageId::new(&pkg.name, &pkg.version, source)?,
                };

                // If a package has a checksum listed directly on it then record
                // that here, and also bump our version up to 2 since V1 never
                // encoded this field.
                if let Some(cksum) = &pkg.checksum {
                    version = ResolveVersion::V2;
                    checksums.insert(id, Some(cksum.clone()));
                }

                assert!(live_pkgs.insert(enc_id, (id, pkg)).is_none());
            }
            live_pkgs
        };

        // When decoding a V2 lock file the edges in `dependencies` aren't
        // guaranteed to have either version or source information. This `map`
        // is used to find package ids even if dependencies have missing
        // information. The map goes from name to version to source to the
        // actual package ID (various levels to drill down step by step).
        let mut map = HashMap::new();
        for (id, _) in live_pkgs.values() {
            map.entry(id.name().as_str())
                .or_insert_with(HashMap::new)
                .entry(id.version().to_string())
                .or_insert_with(HashMap::new)
                .insert(id.source_id(), *id);
        }

        let mut lookup_id = |enc_id: &EncodablePackageId| -> Option<PackageId> {
            // The name of this package should always be in the larger list of
            // all packages.
            let by_version = map.get(enc_id.name.as_str())?;

            // If the version is provided, look that up. Otherwise, if the
            // version isn't provided this is a V2 lock file and we should only
            // have one version for this name. If we have more than one version
            // for the name then it's ambiguous which one we'd use. That
            // shouldn't ever actually happen, but in theory bad git merges
            // could produce invalid lock files, so silently ignore these cases.
            let by_source = match &enc_id.version {
                Some(version) => by_version.get(version)?,
                None => {
                    version = ResolveVersion::V2;
                    if by_version.len() == 1 {
                        by_version.values().next().unwrap()
                    } else {
                        return None;
                    }
                }
            };

            // This is basically the same as above. Note though that `source`
            // is always missing for path dependencies regardless of
            // serialization format. That means we have to handle the `None`
            // case a bit more carefully.
            match &enc_id.source {
                Some(source) => by_source.get(source).cloned(),
                None => {
                    // Look through all possible package ids for this
                    // name/version. If there's only one `path` dependency then
                    // we are hardcoded to use that, since `path` dependencies
                    // can't have a source listed.
                    let mut path_packages = by_source.values().filter(|p| p.source_id().is_path());
                    if let Some(path) = path_packages.next() {
                        if path_packages.next().is_some() {
                            return None;
                        }
                        Some(*path)

                    // ... otherwise if there's only one then we must be
                    // implicitly using that one due to a V2 serialization of
                    // the lock file
                    } else if by_source.len() == 1 {
                        let id = by_source.values().next().unwrap();
                        version = ResolveVersion::V2;
                        Some(*id)

                    // ... and failing that we probably had a bad git merge of
                    // `Cargo.lock` or something like that, so just ignore this.
                    } else {
                        None
                    }
                }
            }
        };

        let mut g = Graph::new();

        for &(ref id, _) in live_pkgs.values() {
            g.add(id.clone());
        }

        for &(ref id, pkg) in live_pkgs.values() {
            let deps = match pkg.dependencies {
                Some(ref deps) => deps,
                None => continue,
            };

            for edge in deps.iter() {
                if let Some(to_depend_on) = lookup_id(edge) {
                    g.link(id.clone(), to_depend_on);
                }
            }
        }

        let replacements = {
            let mut replacements = HashMap::new();
            for &(ref id, pkg) in live_pkgs.values() {
                if let Some(ref replace) = pkg.replace {
                    assert!(pkg.dependencies.is_none());
                    if let Some(replace_id) = lookup_id(replace) {
                        replacements.insert(id.clone(), replace_id);
                    }
                }
            }
            replacements
        };

        let mut metadata = self.metadata.unwrap_or_default();

        // In the V1 serialization format all checksums were listed in the lock
        // file in the `[metadata]` section, so if we're still V1 then look for
        // them here.
        let prefix = "checksum ";
        let mut to_remove = Vec::new();
        for (k, v) in metadata.iter().filter(|p| p.0.starts_with(prefix)) {
            to_remove.push(k.to_string());
            let k = &k[prefix.len()..];
            let enc_id: EncodablePackageId = k
                .parse()
                .chain_err(|| internal("invalid encoding of checksum in lockfile"))?;
            let id = match lookup_id(&enc_id) {
                Some(id) => id,
                _ => continue,
            };

            let v = if v == "<none>" {
                None
            } else {
                Some(v.to_string())
            };
            checksums.insert(id, v);
        }
        // If checksums were listed in `[metadata]` but we otherwise detected a
        // `V2` lock file, then assume some sort of bad git merge happened, so
        // discard all checksums and regenerate them later.
        if !to_remove.is_empty() && version == ResolveVersion::V2 {
            checksums.drain();
        }
        for k in to_remove {
            metadata.remove(&k);
        }

        let mut unused_patches = Vec::new();
        for pkg in self.patch.unused {
            let id = match pkg.source.as_ref().or_else(|| path_deps.get(&pkg.name)) {
                Some(&src) => PackageId::new(&pkg.name, &pkg.version, src)?,
                None => continue,
            };
            unused_patches.push(id);
        }

        // We have a curious issue where in the "v1 format" we buggily had a
        // trailing blank line at the end of lock files under some specific
        // conditions.
        //
        // Cargo tries to write new lock files in the "v2 format", but if you
        // have no dependencies, for example, then the encoded lock file won't
        // really have any indicator that it's in the new format (no
        // dependencies or checksums listed). This means that if you type
        // `cargo new` followed by `cargo build`, Cargo will generate a
        // "v2 format" lock file since none previously existed. On the next
        // `cargo build`, however, Cargo would rewrite the lock file because,
        // when reading it back in, it thinks the file is in the v1 format.
        //
        // To help fix this issue we special case here. If our lock file only
        // has one trailing newline, not two, *and* it only has one package,
        // then this is actually the v2 format.
        if original.ends_with('\n')
            && !original.ends_with("\n\n")
            && version == ResolveVersion::V1
            && g.iter().count() == 1
        {
            version = ResolveVersion::V2;
        }

        Ok(Resolve::new(
            g,
            replacements,
            HashMap::new(),
            checksums,
            metadata,
            unused_patches,
            version,
            HashMap::new(),
        ))
    }
}

fn build_path_deps(ws: &Workspace<'_>) -> HashMap<String, SourceId> {
    // If a crate is **not** a path source, then we're probably in a situation
    // such as `cargo install` with a lock file from a remote dependency. In
    // that case we don't need to fix up any path dependencies (as they're not
    // actually path dependencies any more), so we ignore them.
    let members = ws
        .members()
        .filter(|p| p.package_id().source_id().is_path())
        .collect::<Vec<_>>();

    let mut ret = HashMap::new();
    let mut visited = HashSet::new();
    for member in members.iter() {
        ret.insert(
            member.package_id().name().to_string(),
            member.package_id().source_id(),
        );
        visited.insert(member.package_id().source_id());
    }
    for member in members.iter() {
        build_pkg(member, ws, &mut ret, &mut visited);
    }
    for deps in ws.root_patch().values() {
        for dep in deps {
            build_dep(dep, ws, &mut ret, &mut visited);
        }
    }
    for &(_, ref dep) in ws.root_replace() {
        build_dep(dep, ws, &mut ret, &mut visited);
    }

    return ret;

    fn build_pkg(
        pkg: &Package,
        ws: &Workspace<'_>,
        ret: &mut HashMap<String, SourceId>,
        visited: &mut HashSet<SourceId>,
    ) {
        for dep in pkg.dependencies() {
            build_dep(dep, ws, ret, visited);
        }
    }

    fn build_dep(
        dep: &Dependency,
        ws: &Workspace<'_>,
        ret: &mut HashMap<String, SourceId>,
        visited: &mut HashSet<SourceId>,
    ) {
        let id = dep.source_id();
        if visited.contains(&id) || !id.is_path() {
            return;
        }
        let path = match id.url().to_file_path() {
            Ok(p) => p.join("Cargo.toml"),
            Err(_) => return,
        };
        let pkg = match ws.load(&path) {
            Ok(p) => p,
            Err(_) => return,
        };
        ret.insert(pkg.name().to_string(), pkg.package_id().source_id());
        visited.insert(pkg.package_id().source_id());
        build_pkg(&pkg, ws, ret, visited);
    }
}

impl Patch {
    fn is_empty(&self) -> bool {
        self.unused.is_empty()
    }
}

#[derive(Serialize, Deserialize, Debug, PartialOrd, Ord, PartialEq, Eq)]
pub struct EncodableDependency {
    name: String,
    version: String,
    source: Option<SourceId>,
    checksum: Option<String>,
    dependencies: Option<Vec<EncodablePackageId>>,
    replace: Option<EncodablePackageId>,
}

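/// A serialized reference to a package, as used in `dependencies` arrays and
/// `[metadata]` checksum keys. Per the `Display`/`FromStr` impls below, its
/// string form is the name, optionally followed by a version and then a
/// parenthesized source URL (the crates.io URL here is illustrative):
///
/// ```text
/// serde
/// serde 1.0.104
/// serde 1.0.104 (registry+https://github.com/rust-lang/crates.io-index)
/// ```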
#[derive(Debug, PartialOrd, Ord, PartialEq, Eq, Hash, Clone)]
pub struct EncodablePackageId {
    name: String,
    version: Option<String>,
    source: Option<SourceId>,
}

impl fmt::Display for EncodablePackageId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.name)?;
        if let Some(s) = &self.version {
            write!(f, " {}", s)?;
        }
        if let Some(s) = &self.source {
            write!(f, " ({})", s.into_url())?;
        }
        Ok(())
    }
}

impl FromStr for EncodablePackageId {
    type Err = anyhow::Error;

    fn from_str(s: &str) -> CargoResult<EncodablePackageId> {
        let mut s = s.splitn(3, ' ');
        let name = s.next().unwrap();
        let version = s.next();
        let source_id = match s.next() {
            Some(s) => {
                if s.starts_with('(') && s.ends_with(')') {
                    Some(SourceId::from_url(&s[1..s.len() - 1])?)
                } else {
                    anyhow::bail!("invalid serialized PackageId")
                }
            }
            None => None,
        };

        Ok(EncodablePackageId {
            name: name.to_string(),
            version: version.map(|v| v.to_string()),
            source: source_id,
        })
    }
}

impl ser::Serialize for EncodablePackageId {
    fn serialize<S>(&self, s: S) -> Result<S::Ok, S::Error>
    where
        S: ser::Serializer,
    {
        s.collect_str(self)
    }
}

impl<'de> de::Deserialize<'de> for EncodablePackageId {
    fn deserialize<D>(d: D) -> Result<EncodablePackageId, D::Error>
    where
        D: de::Deserializer<'de>,
    {
        String::deserialize(d).and_then(|string| {
            string
                .parse::<EncodablePackageId>()
                .map_err(de::Error::custom)
        })
    }
}

impl ser::Serialize for Resolve {
    fn serialize<S>(&self, s: S) -> Result<S::Ok, S::Error>
    where
        S: ser::Serializer,
    {
        let mut ids: Vec<_> = self.iter().collect();
        ids.sort();

        let state = EncodeState::new(self);

        let encodable = ids
            .iter()
            .map(|&id| encodable_resolve_node(id, self, &state))
            .collect::<Vec<_>>();

        let mut metadata = self.metadata().clone();

        if *self.version() == ResolveVersion::V1 {
            for &id in ids.iter().filter(|id| !id.source_id().is_path()) {
                let checksum = match self.checksums()[&id] {
                    Some(ref s) => &s[..],
                    None => "<none>",
                };
                let id = encodable_package_id(id, &state);
                metadata.insert(format!("checksum {}", id), checksum.to_string());
            }
        }

        let metadata = if metadata.is_empty() {
            None
        } else {
            Some(metadata)
        };

        let patch = Patch {
            unused: self
                .unused_patches()
                .iter()
                .map(|id| EncodableDependency {
                    name: id.name().to_string(),
                    version: id.version().to_string(),
                    source: encode_source(id.source_id()),
                    dependencies: None,
                    replace: None,
                    checksum: match self.version() {
                        ResolveVersion::V2 => self.checksums().get(id).and_then(|x| x.clone()),
                        ResolveVersion::V1 => None,
                    },
                })
                .collect(),
        };
        EncodableResolve {
            package: Some(encodable),
            root: None,
            metadata,
            patch,
        }
        .serialize(s)
    }
}

pub struct EncodeState<'a> {
    counts: Option<HashMap<InternedString, HashMap<&'a semver::Version, usize>>>,
}

impl<'a> EncodeState<'a> {
    pub fn new(resolve: &'a Resolve) -> EncodeState<'a> {
        let counts = if *resolve.version() == ResolveVersion::V2 {
            let mut map = HashMap::new();
            for id in resolve.iter() {
                let slot = map
                    .entry(id.name())
                    .or_insert_with(HashMap::new)
                    .entry(id.version())
                    .or_insert(0);
                *slot += 1;
            }
            Some(map)
        } else {
            None
        };
        EncodeState { counts }
    }
}

fn encodable_resolve_node(
    id: PackageId,
    resolve: &Resolve,
    state: &EncodeState<'_>,
) -> EncodableDependency {
    let (replace, deps) = match resolve.replacement(id) {
        Some(id) => (Some(encodable_package_id(id, state)), None),
        None => {
            let mut deps = resolve
                .deps_not_replaced(id)
                .map(|(id, _)| encodable_package_id(id, state))
                .collect::<Vec<_>>();
            deps.sort();
            (None, Some(deps))
        }
    };

    EncodableDependency {
        name: id.name().to_string(),
        version: id.version().to_string(),
        source: encode_source(id.source_id()),
        dependencies: deps,
        replace,
        checksum: match resolve.version() {
            ResolveVersion::V2 => resolve.checksums().get(&id).and_then(|s| s.clone()),
            ResolveVersion::V1 => None,
        },
    }
}

pub fn encodable_package_id(id: PackageId, state: &EncodeState<'_>) -> EncodablePackageId {
    let mut version = Some(id.version().to_string());
    let mut source = encode_source(id.source_id()).map(|s| s.with_precise(None));
    if let Some(counts) = &state.counts {
        let version_counts = &counts[&id.name()];
        if version_counts[&id.version()] == 1 {
            source = None;
            if version_counts.len() == 1 {
                version = None;
            }
        }
    }
    EncodablePackageId {
        name: id.name().to_string(),
        version,
        source,
    }
}

fn encode_source(id: SourceId) -> Option<SourceId> {
    if id.is_path() {
        None
    } else {
        Some(id)
    }
}